# Optuna tutorial for beginners: hyperparameter optimization framework

Whenever I build a model (XGBoost, LightGBM, CatBoost, neural network, etc.), I face the same question: how should I tune its hyperparameters?<br>
Some people set the parameters manually and check whether the score improves or not.

**In this tutorial, I will introduce [optuna](https://optuna.org/), a *Define-by-Run Hyperparameter Optimization Framework*, for automated hyperparameter tuning.**

**<p style="color:red">UPDATE: added the <a href="#id6">6. Visualize study history to analyze the hyperparams-performance relationship</a> section, please check it out!!!</p>**

from IPython.display import HTML

HTML('<iframe width="800" height="400" src="https://www.youtube.com/embed/-UeC4MR3PHM?start=0" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>')

# Table of Contents:

The code below is forked and slightly modified from the [ASHRAE: Training LGBM by meter type](https://www.kaggle.com/corochann/ashrae-training-lgbm-by-meter-type) kernel (taken from version 25), up to the data preprocessing step.<br>
Please jump to **[1. Define objective function](#id1)** to start learning optuna usage!

**[ASHRAE - Great Energy Predictor III](#title)** <br>
**[Data Preprocessing](#id0)** <br>
**[1. Define "objective" function](#id1)** <br>
**[2. Use "trial" module to define hyperparameters dynamically!](#id2)** <br>
**[3. Make "study" and let optimize!](#id3)** <br>
**[4. [Advanced] Pruning unpromising trials for faster search](#id4)** <br>
**[5. Check study history to get best hyperparameters](#id5)** <br>
**[6. Visualize study history to analyze the hyperparams-performance relationship](#id6)** <br>
**[More to go](#id10)** <br>

In [1]:
"""
Some visualization methods used in this tutorial are supported in optuna from v0.18.0, which was released recently!
However, this Kaggle kernel pre-installs version 0.16.0.
"""
# !pip install optuna==0.18.1


<a id="title"></a>
# ASHRAE - Great Energy Predictor III

Our aim in this competition is to predict energy consumption of buildings.

There are 4 types of energy to predict:

 - 0: electricity
 - 1: chilledwater
 - 2: steam
 - 3: hotwater

Electricity and water consumption may behave differently!
So I train and predict with a separate model for each meter type.

This kernel builds on my previous [ASHRAE: Simple LGBM submission](https://www.kaggle.com/corochann/ashrae-simple-lgbm-submission) kernel.

In [2]:
import gc
import os
from pathlib import Path
import random
import sys

from tqdm import tqdm_notebook as tqdm
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

import matplotlib.pyplot as plt
import seaborn as sns

from IPython.core.display import display, HTML

# --- plotly ---
from plotly import tools, subplots
import plotly.offline as py
py.init_notebook_mode(connected=True)
import plotly.graph_objs as go
import plotly.express as px
import plotly.figure_factory as ff

# --- models ---
from sklearn import preprocessing
from sklearn.model_selection import KFold
import lightgbm as lgb
import xgboost as xgb
# !pip install catboost
import catboost as cb




In [3]:
# Original code from https://www.kaggle.com/gemartin/load-data-reduce-memory-usage by @gemartin
# Modified to support timestamp type, categorical type
# Modified to add option to use float16 or not. feather format does not support float16.
from pandas.api.types import is_datetime64_any_dtype as is_datetime
from pandas.api.types import is_categorical_dtype

def reduce_mem_usage(df, use_float16=False):
    """ iterate through all the columns of a dataframe and modify the data type
        to reduce memory usage.        
    """
    start_mem = df.memory_usage().sum() / 1024**2
    print('Memory usage of dataframe is {:.2f} MB'.format(start_mem))
    
    for col in df.columns:
        if is_datetime(df[col]) or is_categorical_dtype(df[col]):
            # skip datetime type or categorical type
            continue
        col_type = df[col].dtype
        
        if col_type != object:
            c_min = df[col].min()
            c_max = df[col].max()
            if str(col_type)[:3] == 'int':
                if c_min > np.iinfo(np.int8).min and c_max < np.iinfo(np.int8).max:
                    df[col] = df[col].astype(np.int8)
                elif c_min > np.iinfo(np.int16).min and c_max < np.iinfo(np.int16).max:
                    df[col] = df[col].astype(np.int16)
                elif c_min > np.iinfo(np.int32).min and c_max < np.iinfo(np.int32).max:
                    df[col] = df[col].astype(np.int32)
                elif c_min > np.iinfo(np.int64).min and c_max < np.iinfo(np.int64).max:
                    df[col] = df[col].astype(np.int64)  
            else:
                if use_float16 and c_min > np.finfo(np.float16).min and c_max < np.finfo(np.float16).max:
                    df[col] = df[col].astype(np.float16)
                elif c_min > np.finfo(np.float32).min and c_max < np.finfo(np.float32).max:
                    df[col] = df[col].astype(np.float32)
                else:
                    df[col] = df[col].astype(np.float64)
        else:
            df[col] = df[col].astype('category')

    end_mem = df.memory_usage().sum() / 1024**2
    print('Memory usage after optimization is: {:.2f} MB'.format(end_mem))
    print('Decreased by {:.1f}%'.format(100 * (start_mem - end_mem) / start_mem))
    
    return df

In [4]:
!ls ../ASHRAE/input

building_metadata.csv  test.csv   weather_test.csv
sample_submission.csv  train.csv  weather_train.csv


## Fast data loading

This kernel uses the preprocessed data from my previous kernel, [ASHRAE: feather format for fast loading](https://www.kaggle.com/corochann/ashrae-feather-format-for-fast-loading), to accelerate data loading!

In [5]:
# %%time
root = Path('../ASHRAE/input/')
train_df = pd.read_csv(root/'train.csv')

weather_train_df = pd.read_csv(root/'weather_train.csv')
building_meta_df = pd.read_csv(root/'building_metadata.csv')
train_df.head()

Unnamed: 0,building_id,meter,timestamp,meter_reading
0,0,0,2016-01-01 00:00:00,0.0
1,1,0,2016-01-01 00:00:00,0.0
2,2,0,2016-01-01 00:00:00,0.0
3,3,0,2016-01-01 00:00:00,0.0
4,4,0,2016-01-01 00:00:00,0.0


In [6]:
weather_train_df["timestamp"] = pd.to_datetime(weather_train_df["timestamp"])
train_df["timestamp"] = pd.to_datetime(train_df["timestamp"],format="%Y-%m-%d %H:%M:%S")
train_df['date'] = train_df['timestamp'].dt.date
train_df['meter_reading_log1p'] = np.log1p(train_df['meter_reading'])

In [7]:
np.sum(train_df['meter_reading_log1p'].values < 0)

0
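
The check above returns 0 because `log1p(x) = log(1 + x)` is non-negative for any non-negative reading. Here is a minimal sketch of the transform and its inverse `expm1`, using a few readings like the ones in this dataset. The variable names are only for illustration.

```python
import numpy as np

# A few meter readings of the kind found in this dataset (illustrative values).
meter_reading = np.array([0.0, 0.3746, 23.3036, 175.184])

# log1p(x) = log(1 + x): defined at x = 0 and non-negative for x >= 0.
reading_log1p = np.log1p(meter_reading)

# expm1 is the inverse transform, used to map predictions back to the original scale.
restored = np.expm1(reading_log1p)

print(reading_log1p)
print(restored)
```

A model trained on `meter_reading_log1p` predicts in log space, so its predictions need `np.expm1` before they can be read as meter readings.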

In [8]:
def plot_date_usage(train_df, meter=0, building_id=0):
    train_temp_df = train_df[train_df['meter'] == meter]
    train_temp_df = train_temp_df[train_temp_df['building_id'] == building_id]    
    train_temp_df_meter = train_temp_df.groupby('date')['meter_reading_log1p'].sum()
    train_temp_df_meter = train_temp_df_meter.to_frame().reset_index()
    fig = px.line(train_temp_df_meter, x='date', y='meter_reading_log1p')
    fig.show()

In [9]:
plot_date_usage(train_df, meter=0, building_id=0)

## Removing weird data on site_id 0

As you can see above, this data looks weird until May 20. As reported in [this discussion](https://www.kaggle.com/c/ashrae-energy-prediction/discussion/113054#656588) by @barnwellguy, **all electricity meter readings are 0 until May 20 for site_id == 0**. I will remove these rows from the training data.

It corresponds to `building_id <= 104`.

In [10]:
building_meta_df[building_meta_df.site_id == 0]

Unnamed: 0,site_id,building_id,primary_use,square_feet,year_built,floor_count
0,0,0,Education,7432,2008.0,
1,0,1,Education,2720,2004.0,
2,0,2,Education,5376,1991.0,
3,0,3,Education,23685,2002.0,
4,0,4,Education,116607,1975.0,
5,0,5,Education,8000,2000.0,
6,0,6,Lodging/residential,27926,1981.0,
7,0,7,Education,121074,1989.0,
8,0,8,Education,60809,2003.0,
9,0,9,Office,27000,2010.0,


In [11]:
train_df = train_df.query('not (building_id <= 104 & meter == 0 & timestamp <= "2016-05-20")')
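
The same filter can also be written as a boolean mask, which makes each condition easier to inspect. This is a minimal sketch on a toy frame: the rows are hypothetical and only the column names follow this notebook.

```python
import pandas as pd

# Toy rows (hypothetical); column names follow the notebook.
toy = pd.DataFrame({
    "building_id": [0, 0, 104, 105, 0],
    "meter": [0, 0, 0, 0, 1],
    "timestamp": pd.to_datetime([
        "2016-01-01 00:00:00",
        "2016-05-20 01:00:00",
        "2016-05-20 00:00:00",
        "2016-01-01 00:00:00",
        "2016-01-01 00:00:00",
    ]),
})

# Rows to drop: site 0 buildings, electricity meter, on or before the cutoff.
cutoff = pd.Timestamp("2016-05-20")  # midnight at the start of May 20
drop_mask = (
    (toy["building_id"] <= 104)
    & (toy["meter"] == 0)
    & (toy["timestamp"] <= cutoff)
)
kept = toy[~drop_mask]
print(kept)
```

Note that the cutoff is midnight at the start of May 20, so a reading at 01:00 on May 20 is kept while the one at 00:00 is dropped.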

<a id="id0"></a>
# Data preprocessing

Now let's build a GBDT (Gradient Boosting Decision Tree) model to predict `meter_reading_log1p`. I will use LightGBM in this notebook.


[UPDATE]
 - Processing of 'weekend'
 - Take building stats by building_id **and meter type**
 - Align timestamp in weather data, by @nz0722: https://www.kaggle.com/nz0722/aligned-timestamp-lgbm-by-meter-type

In [12]:
debug = False

## Add time feature

Some of these features were introduced in https://www.kaggle.com/ryches/simple-lgbm-solution by @ryches

Features that are likely predictive:

#### Weather

- time of day
- holiday
- weekend
- cloud_coverage + lags
- dew_temperature + lags
- precip_depth + lags
- sea_level_pressure + lags
- wind_direction + lags
- wind_speed + lags

#### Train

- max, mean, min, std of the specific building historically



However, we should be careful about adding time features. Since we have only 1 year of training data,
including `date` leads to overfitting on the training data.

How about `month`? It would be better to check its effect by cross validation.
I do not use it as a feature in this kernel, for robust modeling.

In [13]:
def preprocess(df):
    df["hour"] = df["timestamp"].dt.hour
#     df["day"] = df["timestamp"].dt.day
    df["month"] = df["timestamp"].dt.month
    df["dayofweek"] = df["timestamp"].dt.dayofweek
    df["weekend"] = df["dayofweek"] >= 5

#     hour_rad = df["hour"].values / 24. * 2 * np.pi
#     df["hour_sin"] = np.sin(hour_rad)
#     df["hour_cos"] = np.cos(hour_rad)

In [14]:
preprocess(train_df)
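
`preprocess` keeps `hour` as a plain integer, so hour 23 and hour 0 look far apart even though they are adjacent. The commented-out lines in `preprocess` point at a sine/cosine encoding that removes this jump. Below is a minimal illustration on a toy frame. The helper `encoded_distance` is hypothetical, and whether this encoding helps a tree model like LightGBM is something to check by cross validation, not a given.

```python
import numpy as np
import pandas as pd

# One row per hour of the day.
toy = pd.DataFrame({"hour": np.arange(24)})

# Map the hour onto a circle, as in the commented-out lines of preprocess().
hour_rad = toy["hour"].values / 24.0 * 2 * np.pi
toy["hour_sin"] = np.sin(hour_rad)
toy["hour_cos"] = np.cos(hour_rad)

def encoded_distance(h1, h2):
    """Euclidean distance between two hours in (sin, cos) space."""
    p1 = toy.loc[h1, ["hour_sin", "hour_cos"]].to_numpy(dtype=float)
    p2 = toy.loc[h2, ["hour_sin", "hour_cos"]].to_numpy(dtype=float)
    return float(np.linalg.norm(p1 - p2))

# 23h -> 0h is now as close as 0h -> 1h; 0h and 12h are the farthest apart.
print(encoded_distance(23, 0), encoded_distance(0, 1), encoded_distance(0, 12))
```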

In [None]:
# take stats by ONLY building_id. meter type is merged in this script...

# df_group = train_df.groupby('building_id')['meter_reading_log1p']
# building_mean = df_group.mean().astype(np.float16)
# building_median = df_group.median().astype(np.float16)
# building_min = df_group.min().astype(np.float16)
# building_max = df_group.max().astype(np.float16)
# building_std = df_group.std().astype(np.float16)

# train_df['building_mean'] = train_df['building_id'].map(building_mean)
# train_df['building_median'] = train_df['building_id'].map(building_median)
# train_df['building_min'] = train_df['building_id'].map(building_min)
# train_df['building_max'] = train_df['building_id'].map(building_max)
# train_df['building_std'] = train_df['building_id'].map(building_std)

In [15]:
df_group = train_df.groupby(['building_id', 'meter'])['meter_reading_log1p']
building_mean = df_group.mean().astype(np.float16)
building_median = df_group.median().astype(np.float16)
building_min = df_group.min().astype(np.float16)
building_max = df_group.max().astype(np.float16)
building_std = df_group.std().astype(np.float16)

In [16]:
building_stats_df = pd.concat([building_mean, building_median, building_min, building_max, building_std], axis=1,
                              keys=['building_mean', 'building_median', 'building_min', 'building_max', 'building_std']).reset_index()
train_df = pd.merge(train_df, building_stats_df, on=['building_id', 'meter'], how='left', copy=False)
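
To see what the two cells above do, here is a compact sketch on toy data that computes the same per-`(building_id, meter)` statistics with a single `agg` call and merges them back. The toy values are made up; only the column names follow this notebook.

```python
import pandas as pd

# Toy readings (made-up values) for two buildings and two meter types.
toy = pd.DataFrame({
    "building_id": [1, 1, 1, 1, 2, 2],
    "meter": [0, 0, 3, 3, 0, 0],
    "meter_reading_log1p": [1.0, 3.0, 0.0, 2.0, 5.0, 7.0],
})

# One row of statistics per (building_id, meter) pair.
stats = (
    toy.groupby(["building_id", "meter"])["meter_reading_log1p"]
    .agg(["mean", "median", "min", "max", "std"])
    .add_prefix("building_")
    .reset_index()
)

# A left merge attaches the group statistics to every original row.
merged = toy.merge(stats, on=["building_id", "meter"], how="left")
print(merged)
```

One thing to keep in mind: these statistics are computed from the target over all training rows, so a row later used for validation also contributes to its own feature values.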

In [17]:
train_df.head()

Unnamed: 0,building_id,meter,timestamp,meter_reading,date,meter_reading_log1p,hour,month,dayofweek,weekend,building_mean,building_median,building_min,building_max,building_std
0,105,0,2016-01-01,23.3036,2016-01-01,3.190624,0,1,4,False,4.316406,4.332031,3.191406,5.164062,0.318115
1,106,0,2016-01-01,0.3746,2016-01-01,0.318163,0,1,4,False,0.751953,0.559082,0.0,2.890625,0.478516
2,106,3,2016-01-01,0.0,2016-01-01,0.0,0,1,4,False,1.023438,0.0,0.0,3.712891,1.268555
3,107,0,2016-01-01,175.184,2016-01-01,5.171529,0,1,4,False,4.570312,5.78125,0.039703,6.382812,2.009766
4,108,0,2016-01-01,91.2653,2016-01-01,4.524668,0,1,4,False,5.457031,5.449219,4.417969,6.113281,0.216187


## Fill NaN values in the weather dataframe by interpolation


The weather data has a lot of NaNs!!

I tried to fill these values by **interpolating** the data.

In [18]:
weather_train_df.head()

Unnamed: 0,site_id,timestamp,air_temperature,cloud_coverage,dew_temperature,precip_depth_1_hr,sea_level_pressure,wind_direction,wind_speed
0,0,2016-01-01 00:00:00,25.0,6.0,20.0,,1019.7,0.0,0.0
1,0,2016-01-01 01:00:00,24.4,,21.1,-1.0,1020.2,70.0,1.5
2,0,2016-01-01 02:00:00,22.8,2.0,21.1,0.0,1020.2,0.0,0.0
3,0,2016-01-01 03:00:00,21.1,2.0,20.6,0.0,1020.1,0.0,0.0
4,0,2016-01-01 04:00:00,20.0,2.0,20.0,-1.0,1020.0,250.0,2.6


In [19]:
weather_train_df.isna().sum()

site_id                   0
timestamp                 0
air_temperature          55
cloud_coverage        69173
dew_temperature         113
precip_depth_1_hr     50289
sea_level_pressure    10618
wind_direction         6268
wind_speed              304
dtype: int64

In [22]:
weather_test_df = pd.read_csv(root/'weather_test.csv')
weather_test_df["timestamp"] = pd.to_datetime(weather_test_df["timestamp"])
weather = pd.concat([weather_train_df, weather_test_df],ignore_index=True)
del weather_test_df


In [24]:
# https://www.kaggle.com/nz0722/aligned-timestamp-lgbm-by-meter-type

weather_key = ['site_id', 'timestamp']

temp_skeleton = weather[weather_key + ['air_temperature']].drop_duplicates(subset=weather_key).sort_values(by=weather_key).copy()

# calculate ranks of hourly temperatures within date/site_id chunks

temp_skeleton['temp_rank'] = temp_skeleton.groupby(['site_id', temp_skeleton.timestamp.dt.date])['air_temperature'].rank('average')

# create a dataframe of site_ids (0-16) x mean hour rank of temperature within day (0-23)
df_2d = temp_skeleton.groupby(['site_id', temp_skeleton.timestamp.dt.hour])['temp_rank'].mean().unstack(level=1)

# Subtract the columnID of temperature peak by 14, getting the timestamp alignment gap.
site_ids_offsets = pd.Series(df_2d.values.argmax(axis=1) - 14)
site_ids_offsets.index.name = 'site_id'

def timestamp_align(df):
    df['offset'] = df.site_id.map(site_ids_offsets)
    df['timestamp_aligned'] = (df.timestamp - pd.to_timedelta(df.offset, unit='h'))
    df['timestamp'] = df['timestamp_aligned']
    del df['timestamp_aligned']
    return df

del weather
del temp_skeleton
gc.collect()

weather_train_df = timestamp_align(weather_train_df)
weather_train_df = weather_train_df.groupby('site_id').apply(lambda group: group.interpolate(limit_direction='both'))

The number of NaNs is reduced by `interpolate`, but some properties never appear for specific `site_id`s, so NaNs remain for those features.
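
A small sketch of this behavior on toy data (made-up values): interpolation within each `site_id` fills gaps between and around known values, but a column that is entirely NaN for a site has nothing to interpolate from. I use `groupby(...).transform` here instead of `apply` only to keep the row index unchanged; it is an illustration, not the exact call used above.

```python
import numpy as np
import pandas as pd

# Toy weather rows (made-up values). Site 1 never reports cloud_coverage.
toy = pd.DataFrame({
    "site_id": [0, 0, 0, 0, 1, 1, 1],
    "air_temperature": [20.0, np.nan, 24.0, np.nan, 5.0, np.nan, 7.0],
    "cloud_coverage": [2.0, np.nan, 4.0, 6.0, np.nan, np.nan, np.nan],
})

value_cols = ["air_temperature", "cloud_coverage"]
filled = toy.copy()
# Interpolate separately inside each site so values never leak across sites.
filled[value_cols] = toy.groupby("site_id")[value_cols].transform(
    lambda s: s.interpolate(limit_direction="both")
)

print(filled)
print(filled.isna().sum())
```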

## lags

Add some lag features.

In [25]:
def add_lag_feature(weather_df, window=3):
    group_df = weather_df.groupby('site_id')
    cols = ['air_temperature', 'cloud_coverage', 'dew_temperature', 'precip_depth_1_hr', 'sea_level_pressure', 'wind_direction', 'wind_speed']
    rolled = group_df[cols].rolling(window=window, min_periods=0)
    lag_mean = rolled.mean().reset_index().astype(np.float16)
    lag_max = rolled.max().reset_index().astype(np.float16)
    lag_min = rolled.min().reset_index().astype(np.float16)
    lag_std = rolled.std().reset_index().astype(np.float16)
    for col in cols:
        weather_df[f'{col}_mean_lag{window}'] = lag_mean[col]
        weather_df[f'{col}_max_lag{window}'] = lag_max[col]
        weather_df[f'{col}_min_lag{window}'] = lag_min[col]
        weather_df[f'{col}_std_lag{window}'] = lag_std[col]

In [26]:
add_lag_feature(weather_train_df, window=3)
add_lag_feature(weather_train_df, window=72)
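
What `add_lag_feature` computes are rolling-window statistics per site: each row gets the mean, max, min and std over the current row and the rows just before it. Here is a minimal sketch for one column on toy data (made-up values). I use `min_periods=1` and drop the `site_id` index level to realign the result with the original rows; both are choices made for this illustration.

```python
import pandas as pd

# Toy hourly temperatures (made-up values) for two sites.
toy = pd.DataFrame({
    "site_id": [0, 0, 0, 0, 1, 1, 1],
    "air_temperature": [10.0, 12.0, 14.0, 16.0, 0.0, 3.0, 6.0],
})

window = 3
rolled = toy.groupby("site_id")["air_temperature"].rolling(window=window, min_periods=1)

# The result has a (site_id, original row index) MultiIndex; dropping the
# site_id level lets pandas align it with the rows of `toy` on assignment.
toy[f"air_temperature_mean_lag{window}"] = rolled.mean().reset_index(level=0, drop=True)
print(toy)
```

The window restarts at each site boundary, so the first row of site 1 is not averaged with the last rows of site 0.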

In [27]:
weather_train_df.head()

Unnamed: 0,site_id,timestamp,air_temperature,cloud_coverage,dew_temperature,precip_depth_1_hr,sea_level_pressure,wind_direction,wind_speed,offset,...,sea_level_pressure_min_lag72,sea_level_pressure_std_lag72,wind_direction_mean_lag72,wind_direction_max_lag72,wind_direction_min_lag72,wind_direction_std_lag72,wind_speed_mean_lag72,wind_speed_max_lag72,wind_speed_min_lag72,wind_speed_std_lag72
0,0,2015-12-31 19:00:00,25.0,6.0,20.0,-1.0,1019.7,0.0,0.0,5,...,1019.5,,0.0,0.0,0.0,,0.0,0.0,0.0,
1,0,2015-12-31 20:00:00,24.4,4.0,21.1,-1.0,1020.2,70.0,1.5,5,...,1019.5,0.353516,35.0,70.0,0.0,49.5,0.75,1.5,0.0,1.060547
2,0,2015-12-31 21:00:00,22.8,2.0,21.1,0.0,1020.2,0.0,0.0,5,...,1019.5,0.288574,23.328125,70.0,0.0,40.40625,0.5,1.5,0.0,0.866211
3,0,2015-12-31 22:00:00,21.1,2.0,20.6,0.0,1020.1,0.0,0.0,5,...,1019.5,0.238037,17.5,70.0,0.0,35.0,0.375,1.5,0.0,0.75
4,0,2015-12-31 23:00:00,20.0,2.0,20.0,-1.0,1020.0,250.0,2.6,5,...,1019.5,0.207397,64.0,250.0,0.0,108.3125,0.819824,2.599609,0.0,1.188477


In [28]:
weather_train_df.columns

Index(['site_id', 'timestamp', 'air_temperature', 'cloud_coverage',
       'dew_temperature', 'precip_depth_1_hr', 'sea_level_pressure',
       'wind_direction', 'wind_speed', 'offset', 'air_temperature_mean_lag3',
       'air_temperature_max_lag3', 'air_temperature_min_lag3',
       'air_temperature_std_lag3', 'cloud_coverage_mean_lag3',
       'cloud_coverage_max_lag3', 'cloud_coverage_min_lag3',
       'cloud_coverage_std_lag3', 'dew_temperature_mean_lag3',
       'dew_temperature_max_lag3', 'dew_temperature_min_lag3',
       'dew_temperature_std_lag3', 'precip_depth_1_hr_mean_lag3',
       'precip_depth_1_hr_max_lag3', 'precip_depth_1_hr_min_lag3',
       'precip_depth_1_hr_std_lag3', 'sea_level_pressure_mean_lag3',
       'sea_level_pressure_max_lag3', 'sea_level_pressure_min_lag3',
       'sea_level_pressure_std_lag3', 'wind_direction_mean_lag3',
       'wind_direction_max_lag3', 'wind_direction_min_lag3',
       'wind_direction_std_lag3', 'wind_speed_mean_lag3',
       'wind_spe

In [29]:
# categorize primary_use column to reduce memory on merge...

primary_use_list = building_meta_df['primary_use'].unique()
primary_use_dict = {key: value for value, key in enumerate(primary_use_list)} 
print('primary_use_dict: ', primary_use_dict)
building_meta_df['primary_use'] = building_meta_df['primary_use'].map(primary_use_dict)

gc.collect()

primary_use_dict:  {'Education': 0, 'Lodging/residential': 1, 'Office': 2, 'Entertainment/public assembly': 3, 'Other': 4, 'Retail': 5, 'Parking': 6, 'Public services': 7, 'Warehouse/storage': 8, 'Food sales and service': 9, 'Religious worship': 10, 'Healthcare': 11, 'Utility': 12, 'Technology/science': 13, 'Manufacturing/industrial': 14, 'Services': 15}


138

In [30]:
reduce_mem_usage(train_df, use_float16=True)
reduce_mem_usage(building_meta_df, use_float16=True)
reduce_mem_usage(weather_train_df, use_float16=True)

Memory usage of dataframe is 1724.40 MB
Memory usage after optimization is: 795.89 MB
Decreased by 53.8%
Memory usage of dataframe is 0.07 MB
Memory usage after optimization is: 0.02 MB
Decreased by 74.9%
Memory usage of dataframe is 25.59 MB
Memory usage after optimization is: 18.13 MB
Decreased by 29.2%


Unnamed: 0,site_id,timestamp,air_temperature,cloud_coverage,dew_temperature,precip_depth_1_hr,sea_level_pressure,wind_direction,wind_speed,offset,...,sea_level_pressure_min_lag72,sea_level_pressure_std_lag72,wind_direction_mean_lag72,wind_direction_max_lag72,wind_direction_min_lag72,wind_direction_std_lag72,wind_speed_mean_lag72,wind_speed_max_lag72,wind_speed_min_lag72,wind_speed_std_lag72
0,0,2015-12-31 19:00:00,25.000000,6.000000,20.000000,-1.0,1019.5,0.0,0.000000,5,...,1019.5,,0.000000,0.0,0.0,,0.000000,0.000000,0.0,
1,0,2015-12-31 20:00:00,24.406250,4.000000,21.093750,-1.0,1020.0,70.0,1.500000,5,...,1019.5,0.353516,35.000000,70.0,0.0,49.50000,0.750000,1.500000,0.0,1.060547
2,0,2015-12-31 21:00:00,22.796875,2.000000,21.093750,0.0,1020.0,0.0,0.000000,5,...,1019.5,0.288574,23.328125,70.0,0.0,40.40625,0.500000,1.500000,0.0,0.866211
3,0,2015-12-31 22:00:00,21.093750,2.000000,20.593750,0.0,1020.0,0.0,0.000000,5,...,1019.5,0.238037,17.500000,70.0,0.0,35.00000,0.375000,1.500000,0.0,0.750000
4,0,2015-12-31 23:00:00,20.000000,2.000000,20.000000,-1.0,1020.0,250.0,2.599609,5,...,1019.5,0.207397,64.000000,250.0,0.0,108.31250,0.819824,2.599609,0.0,1.188477
5,0,2016-01-01 00:00:00,19.406250,4.000000,19.406250,0.0,1019.5,0.0,0.000000,5,...,1019.5,0.231689,53.343750,250.0,0.0,100.31250,0.683105,2.599609,0.0,1.114258
6,0,2016-01-01 01:00:00,21.093750,6.000000,21.093750,-1.0,1019.5,0.0,0.000000,5,...,1019.5,0.305420,45.718750,250.0,0.0,93.81250,0.585938,2.599609,0.0,1.049805
7,0,2016-01-01 02:00:00,21.093750,6.000000,21.093750,0.0,1019.0,210.0,1.500000,5,...,1019.0,0.480957,66.250000,250.0,0.0,104.43750,0.700195,2.599609,0.0,1.024414
8,0,2016-01-01 03:00:00,20.593750,6.000000,20.000000,0.0,1018.0,0.0,0.000000,5,...,1018.0,0.713867,58.875000,250.0,0.0,100.18750,0.622070,2.599609,0.0,0.985840
9,0,2016-01-01 04:00:00,21.093750,6.000000,20.593750,0.0,1019.0,290.0,1.500000,5,...,1018.0,0.697266,82.000000,290.0,0.0,119.43750,0.709961,2.599609,0.0,0.970215


In [32]:
building_meta_df.head()

Unnamed: 0,site_id,building_id,primary_use,square_feet,year_built,floor_count
0,0,0,0,7432,2008.0,
1,0,1,0,2720,2004.0,
2,0,2,0,5376,1991.0,
3,0,3,0,23685,2002.0,
4,0,4,0,116607,1975.0,


## Train model

To win a Kaggle competition, how you evaluate your model is important.
What kind of cross validation strategy is suitable for this competition? This is time series data, so it is better to consider a time-based split.

However, this notebook is a simple tutorial, so I will proceed with KFold splitting without shuffling, so that each validation fold is a contiguous block of rows and near-term data is at least not mixed into it at random.
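
To make the effect of "KFold without shuffling" concrete, here is a small sketch on 10 dummy rows. Each validation fold is a contiguous block of row positions, so it only corresponds to a block of time if the rows are ordered by time.

```python
import numpy as np
from sklearn.model_selection import KFold

# 10 dummy rows standing in for time-ordered samples.
X = np.arange(10).reshape(-1, 1)

# shuffle=False: folds are consecutive blocks. random_state is left at its
# default because it only has a meaning when shuffle=True.
kf = KFold(n_splits=5, shuffle=False)
folds = [valid_idx.tolist() for _, valid_idx in kf.split(X)]

for i, valid_idx in enumerate(folds):
    print(f"fold {i}: valid rows {valid_idx}")
```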

In [33]:
category_cols = ['building_id', 'site_id', 'primary_use']  # , 'meter'
weather_cols = [
    'air_temperature', 'cloud_coverage',
    'dew_temperature', 'precip_depth_1_hr', 'sea_level_pressure',
    'wind_direction', 'wind_speed', 'air_temperature_mean_lag72',
    'air_temperature_max_lag72', 'air_temperature_min_lag72',
    'air_temperature_std_lag72', 'cloud_coverage_mean_lag72',
    'dew_temperature_mean_lag72', 'precip_depth_1_hr_mean_lag72',
    'sea_level_pressure_mean_lag72', 'wind_direction_mean_lag72',
    'wind_speed_mean_lag72', 'air_temperature_mean_lag3',
    'air_temperature_max_lag3',
    'air_temperature_min_lag3', 'cloud_coverage_mean_lag3',
    'dew_temperature_mean_lag3',
    'precip_depth_1_hr_mean_lag3', 'sea_level_pressure_mean_lag3',
    'wind_direction_mean_lag3', 'wind_speed_mean_lag3']
feature_cols = ['square_feet', 'year_built'] + [
    'hour', 'weekend', 'dayofweek', # 'month'
    'building_median'] + weather_cols

In [34]:
def create_X_y(train_df, target_meter):
    target_train_df = train_df[train_df['meter'] == target_meter]
    target_train_df = target_train_df.merge(building_meta_df, on='building_id', how='left')
    target_train_df = target_train_df.merge(weather_train_df, on=['site_id', 'timestamp'], how='left')
    X_train = target_train_df[feature_cols + category_cols]
    y_train = target_train_df['meter_reading_log1p'].values

    del target_train_df
    return X_train, y_train

In [35]:
X_train_0, y_train_0 = create_X_y(train_df, target_meter=0)
X_train_0.head(10)

Unnamed: 0,square_feet,year_built,hour,weekend,dayofweek,building_median,air_temperature,cloud_coverage,dew_temperature,precip_depth_1_hr,...,air_temperature_min_lag3,cloud_coverage_mean_lag3,dew_temperature_mean_lag3,precip_depth_1_hr_mean_lag3,sea_level_pressure_mean_lag3,wind_direction_mean_lag3,wind_speed_mean_lag3,building_id,site_id,primary_use
0,50623,,0,0.0,4,4.332031,3.800781,0.0,2.400391,,...,3.800781,0.0,2.400391,,1021.0,240.0,3.099609,105,1,0
1,5374,,0,0.0,4,0.559082,3.800781,0.0,2.400391,,...,3.800781,0.0,2.400391,,1021.0,240.0,3.099609,106,1,0
2,97532,2005.0,0,0.0,4,5.78125,3.800781,0.0,2.400391,,...,3.800781,0.0,2.400391,,1021.0,240.0,3.099609,107,1,0
3,81580,1913.0,0,0.0,4,5.449219,3.800781,0.0,2.400391,,...,3.800781,0.0,2.400391,,1021.0,240.0,3.099609,108,1,0
4,56995,1953.0,0,0.0,4,5.480469,3.800781,0.0,2.400391,,...,3.800781,0.0,2.400391,,1021.0,240.0,3.099609,109,1,0
5,27814,2006.0,0,0.0,4,5.742188,3.800781,0.0,2.400391,,...,3.800781,0.0,2.400391,,1021.0,240.0,3.099609,110,1,0
6,118338,1909.0,0,0.0,4,6.003906,3.800781,0.0,2.400391,,...,3.800781,0.0,2.400391,,1021.0,240.0,3.099609,111,1,0
7,32206,,0,0.0,4,3.056641,3.800781,0.0,2.400391,,...,3.800781,0.0,2.400391,,1021.0,240.0,3.099609,112,1,0
8,100481,1958.0,0,0.0,4,5.886719,3.800781,0.0,2.400391,,...,3.800781,0.0,2.400391,,1021.0,240.0,3.099609,113,1,0
9,139683,1958.0,0,0.0,4,6.726562,3.800781,0.0,2.400391,,...,3.800781,0.0,2.400391,,1021.0,240.0,3.099609,114,1,0


<a id="id1"></a>
# 1. Define "objective" function

To start hyperparameter tuning, we need an objective function that computes the score to optimize.<br>
**The `objective` function needs to receive a `trial` object as its argument and return the "score" to be optimized.**<br>
(What is the `trial` object? It is explained in the next section.)

```
def objective(trial, ...):
    # calculate score...
    return score
```

In the example below, I train a LightGBM model (only for the electricity meter), take its best validation score, and return it as the final score.

**The `objective` function is called many times** by the `optuna` framework, each time with different hyperparameters, to search for the best ones.

In [None]:
import optuna
from optuna import Trial

optuna.__version__

In [None]:
debug = False

train_df_original = train_df
# Use only 10,000 rows for fast computation while debugging.
train_df = train_df.sample(10000)

In [None]:
def objective(trial: Trial, fast_check=True, target_meter=0, return_info=False):
    folds = 5
    seed = 666
    shuffle = False
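    # random_state only applies when shuffling; recent scikit-learn raises an error otherwise
    kf = KFold(n_splits=folds, shuffle=shuffle, random_state=seed if shuffle else None)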

    X_train, y_train = create_X_y(train_df, target_meter=target_meter)
    y_valid_pred_total = np.zeros(X_train.shape[0])
    gc.collect()
    print('target_meter', target_meter, X_train.shape)

    cat_features = [X_train.columns.get_loc(cat_col) for cat_col in category_cols]
    print('cat_features', cat_features)

    models = []
    valid_score = 0
    for train_idx, valid_idx in kf.split(X_train, y_train):
        train_data = X_train.iloc[train_idx,:], y_train[train_idx]
        valid_data = X_train.iloc[valid_idx,:], y_train[valid_idx]

        print('train', len(train_idx), 'valid', len(valid_idx))
    #     model, y_pred_valid, log = fit_cb(train_data, valid_data, cat_features=cat_features, devices=[0,])
        model, y_pred_valid, log = fit_lgbm(trial, train_data, valid_data, cat_features=category_cols,
                                            num_rounds=1000)
        y_valid_pred_total[valid_idx] = y_pred_valid
        models.append(model)
        gc.collect()
        valid_score += log["valid/l2"]
        if fast_check:
            break
    valid_score /= len(models)
    if return_info:
        return valid_score, models, y_pred_valid, y_train
    else:
        return valid_score

Here I passed the `trial` object to the `fit_lgbm` function, which contains the core training code and defines the hyperparameters.<br>
Let's look inside `fit_lgbm` next.

<a id="id2"></a>
# 2. Use "trial" module to define hyperparameters dynamically!

The `trial` object can be used to get hyperparameters.
As shown in the figure below, we just ask the `trial` object for a hyperparameter value at the point in the code where we want to use it!

<img src="https://optuna.org/assets/img/define-by-run.png"></img>
from [https://optuna.org/](https://optuna.org/)

This scheme is called "Define-by-Run". It lets the user write code that asks for hyperparameters intuitively, instead of defining the whole search space in advance.

You can call these methods to get hyperparameters (ref: [Defining Parameter Spaces](https://optuna.readthedocs.io/en/latest/tutorial/configurations.html#defining-parameter-spaces)):

    # Categorical parameter
    optimizer = trial.suggest_categorical('optimizer', ['MomentumSGD', 'Adam'])

    # Int parameter
    num_layers = trial.suggest_int('num_layers', 1, 3)

    # Uniform parameter
    dropout_rate = trial.suggest_uniform('dropout_rate', 0.0, 1.0)

    # Loguniform parameter
    learning_rate = trial.suggest_loguniform('learning_rate', 1e-5, 1e-2)

    # Discrete-uniform parameter
    drop_path_rate = trial.suggest_discrete_uniform('drop_path_rate', 0.0, 1.0, 0.1)


In [None]:
# Referred https://github.com/pfnet/optuna/blob/master/examples/lightgbm_simple.py

def fit_lgbm(trial, train, val, devices=(-1,), seed=None, cat_features=None, num_rounds=1500):
    """Train Light GBM model"""
    X_train, y_train = train
    X_valid, y_valid = val
    metric = 'l2'
    params = {
        'num_leaves': trial.suggest_int('num_leaves', 2, 256),
        'objective': 'regression',
#               'max_depth': -1,
        'learning_rate': 0.1,
        "boosting": "gbdt",
        'lambda_l1': trial.suggest_loguniform('lambda_l1', 1e-8, 10.0),
        'lambda_l2': trial.suggest_loguniform('lambda_l2', 1e-8, 10.0),
        "bagging_freq": 5,
        "bagging_fraction": trial.suggest_uniform('bagging_fraction', 0.1, 1.0),
        "feature_fraction": trial.suggest_uniform('feature_fraction', 0.4, 1.0),
        "metric": metric,
        "verbosity": -1,
    }
    device = devices[0]
    if device == -1:
        # use cpu
        pass
    else:
        # use gpu
        print(f'using gpu device_id {device}...')
        params.update({'device': 'gpu', 'gpu_device_id': device})

    params['seed'] = seed

    early_stop = 20
    verbose_eval = 20

    d_train = lgb.Dataset(X_train, label=y_train, categorical_feature=cat_features)
    d_valid = lgb.Dataset(X_valid, label=y_valid, categorical_feature=cat_features)
    watchlist = [d_train, d_valid]

    print('training LGB:')
    model = lgb.train(params,
                      train_set=d_train,
                      num_boost_round=num_rounds,
                      valid_sets=watchlist,
                      verbose_eval=verbose_eval,
                      early_stopping_rounds=early_stop)

    # predictions
    y_pred_valid = model.predict(X_valid, num_iteration=model.best_iteration)
    
    print('best_score', model.best_score)
    log = {'train/l2': model.best_score['training']['l2'],
           'valid/l2': model.best_score['valid_1']['l2']}
    return model, y_pred_valid, log

<a id="id3"></a>
# 3. Make "study" and let optimize!

After you define the `objective` function and write the code that gets hyperparameters from the `trial` object, we are ready to go!

Just 2 lines of code do all the troublesome hyperparameter tuning for you. That's all!!

In [None]:
study = optuna.create_study()
study.optimize(objective, n_trials=10)

## Summary of trial & study

 - **`trial` manages a single execution of model training and evaluation, which yields a score for one specific set of hyperparameters**
 - **`study` manages the history of all `trial`s, so that we can find the best hyperparameters, suggest the next hyperparameters to try, etc.**


## Sampler

`optuna` provides several `Sampler` classes to suggest the next hyperparameters. (Ref: [sampler](https://optuna.readthedocs.io/en/stable/tutorial/sampler.html))

The default is the TPE algorithm, the same one used by the well-known `hyperopt` library. It is a kind of Bayesian-optimization-based sampling.


## FAQ: How to define objective functions that have own arguments?

The `study.optimize` method receives an `objective` function that takes only `trial` as its argument. How can we pass a 2nd or 3rd argument?<br>
We can use a class or a lambda expression.<br>
For example, `study.optimize(lambda trial: objective(trial, arg0=1, arg1=2), n_trials=100)`<br>

Refer to [How to define objective functions that have own arguments?](https://optuna.readthedocs.io/en/latest/faq.html#how-to-define-objective-functions-that-have-own-arguments) for details.

<a id="id4"></a>
# 4. [Advanced] Pruning unpromising trials for faster search

Another important functionality is **"pruning"**. When humans tune hyperparameters, we usually stop training early when the learning curve is much worse than the best known result.

`optuna` provides a pruning feature, and also provides integration modules for popular ML frameworks so that users can easily try pruning during hyperparameter optimization.<br>
The supported modules are as follows:

 - XGBoost: `optuna.integration.XGBoostPruningCallback`
 - LightGBM: `optuna.integration.LightGBMPruningCallback`
 - Chainer: `optuna.integration.ChainerPruningExtension`
 - Keras: `optuna.integration.KerasPruningCallback`
 - TensorFlow: `optuna.integration.TensorFlowPruningHook`
 - tf.keras: `optuna.integration.TFKerasPruningCallback`
 - MXNet: `optuna.integration.MXNetPruningCallback`

See [pruning](https://optuna.readthedocs.io/en/stable/tutorial/pruning.html) section for details.

<img src="https://optuna.org/assets/img/pruning-example-with-caption.png"></img>

I just added `LightGBMPruningCallback` to the LightGBM training.<br>
The difference is only 2 lines: define `pruning_callback` and pass it to the `lgb.train` method.

The example below checks the validation score at each iteration to decide whether to prune.

In [None]:
# Referred https://github.com/pfnet/optuna/blob/master/examples/pruning/lightgbm_integration.py

def fit_lgbm_with_pruning(trial, train, val, devices=(-1,), seed=None, cat_features=None, num_rounds=1500):
    """Train Light GBM model"""
    X_train, y_train = train
    X_valid, y_valid = val
    metric = 'l2'
    params = {
        'num_leaves': trial.suggest_int('num_leaves', 2, 256),
        'objective': 'regression',
#               'max_depth': -1,
        'learning_rate': 0.1,
        "boosting": "gbdt",
        'lambda_l1': trial.suggest_loguniform('lambda_l1', 1e-8, 10.0),
        'lambda_l2': trial.suggest_loguniform('lambda_l2', 1e-8, 10.0),
        "bagging_freq": 5,
        "bagging_fraction": trial.suggest_uniform('bagging_fraction', 0.1, 1.0),
        "feature_fraction": trial.suggest_uniform('feature_fraction', 0.4, 1.0),
        "metric": metric,
        "verbosity": -1,
    }
    device = devices[0]
    if device == -1:
        # use cpu
        pass
    else:
        # use gpu
        print(f'using gpu device_id {device}...')
        params.update({'device': 'gpu', 'gpu_device_id': device})

    params['seed'] = seed

    early_stop = 20
    verbose_eval = 20

    d_train = lgb.Dataset(X_train, label=y_train, categorical_feature=cat_features)
    d_valid = lgb.Dataset(X_valid, label=y_valid, categorical_feature=cat_features)
    watchlist = [d_train, d_valid]

    # Add a callback for pruning; it watches the same metric that is set in `params`.
    pruning_callback = optuna.integration.LightGBMPruningCallback(trial, metric, valid_name='valid_1')
    print('training LGB:')
    model = lgb.train(params,
                      train_set=d_train,
                      num_boost_round=num_rounds,
                      valid_sets=watchlist,
                      verbose_eval=verbose_eval,
                      early_stopping_rounds=early_stop,
                      callbacks=[pruning_callback])

    # predictions
    y_pred_valid = model.predict(X_valid, num_iteration=model.best_iteration)
    
    print('best_score', model.best_score)
    log = {'train/l2': model.best_score['training']['l2'],
           'valid/l2': model.best_score['valid_1']['l2']}
    return model, y_pred_valid, log

In [None]:
def objective_with_prune(trial: Trial, fast_check=True, target_meter=0):
    """Same as the previous `objective` function, except that it calls `fit_lgbm_with_pruning`."""
    folds = 5
    seed = 666
    shuffle = False
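    # random_state only has an effect when shuffle=True;
    # newer scikit-learn versions raise an error if it is set with shuffle=False.
    kf = KFold(n_splits=folds, shuffle=shuffle, random_state=seed if shuffle else None)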

    X_train, y_train = create_X_y(train_df, target_meter=target_meter)
    y_valid_pred_total = np.zeros(X_train.shape[0])
    gc.collect()
    print('target_meter', target_meter, X_train.shape)

    cat_features = [X_train.columns.get_loc(cat_col) for cat_col in category_cols]
    print('cat_features', cat_features)

    models0 = []
    valid_score = 0
    for train_idx, valid_idx in kf.split(X_train, y_train):
        train_data = X_train.iloc[train_idx,:], y_train[train_idx]
        valid_data = X_train.iloc[valid_idx,:], y_train[valid_idx]

        print('train', len(train_idx), 'valid', len(valid_idx))
        model, y_pred_valid, log = fit_lgbm_with_pruning(trial, train_data, valid_data, cat_features=category_cols,
                                                         num_rounds=1000)
        y_valid_pred_total[valid_idx] = y_pred_valid
        models0.append(model)
        gc.collect()
        valid_score += log["valid/l2"]
        if fast_check:
            break
    valid_score /= len(models0)
    return valid_score

Then you can create a `study` object again, this time **setting a `pruner` object**, and let it optimize!

The `pruner` class defines the strategy for how to prune at each intermediate step.<br>
`MedianPruner`, used in this example, checks the value reported at each iteration and prunes the trial if its score is worse (here: higher, because we minimize) than the median of the values that previous trials reported at the same iteration.
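
The median rule itself is small enough to write down. Below is a simplified sketch for a minimization problem; `should_prune_by_median` and the numbers are made up for illustration, and the real `MedianPruner` has more options (and may handle the warm-up boundary differently), so please treat this as an approximation of the idea only.

```python
from statistics import median


def should_prune_by_median(current_value, previous_values_at_step, step, n_warmup_steps=0):
    """Simplified median rule for minimization (not optuna's actual implementation)."""
    # Never prune during warm-up, or when there is nothing to compare against
    if step < n_warmup_steps or not previous_values_at_step:
        return False
    # Prune when the current trial is worse (higher) than the median of earlier trials
    return current_value > median(previous_values_at_step)


# Made-up validation l2 values that earlier trials reported at iteration 6
previous_l2_at_step_6 = [0.90, 0.80, 1.10, 0.85, 0.95]  # median is 0.90

print(should_prune_by_median(1.05, previous_l2_at_step_6, step=6, n_warmup_steps=5))  # True
print(should_prune_by_median(0.82, previous_l2_at_step_6, step=6, n_warmup_steps=5))  # False
print(should_prune_by_median(1.05, previous_l2_at_step_6, step=3, n_warmup_steps=5))  # False
```

The last call shows the role of `n_warmup_steps`: even a bad score is not pruned while the trial is still in its first few iterations.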

In [None]:
study = optuna.create_study(pruner=optuna.pruners.MedianPruner(n_warmup_steps=5))
study.optimize(objective_with_prune, n_trials=50)

Now you should see messages like this:

> [I 2019-10-28 13:03:33,408] Setting status of trial#15 as TrialState.PRUNED. Trial was pruned at iteration 6.

Some unpromising trials are pruned before the model is fully trained, so that we can try more trials!

<a id="id5"></a>
# 5. Check study history to get best hyperparameters

Now we've finished the hyperparameter optimization, so we want to know the best hyperparameters.<br>
The whole `trial` history is managed in the `study` object, and we can get the best trial with **`study.best_trial`**.

In [None]:
print('Best trial: score {}, params {}'.format(study.best_trial.value, study.best_trial.params))

# There are aliases too; the line below gives the same result.
# print('Best trial: score {}, params {}'.format(study.best_value, study.best_params))

Another tip: if you want to see the whole history of trials, you can get it as a `pandas.DataFrame` by calling `study.trials_dataframe()`.

In [None]:
trials_df = study.trials_dataframe()
trials_df

<a id="id6"></a>
# 6. Visualize study history to analyze the hyperparams-performance relationship

Since we have the history of trials, we can analyze how each hyperparameter affects the performance.<br>
All the visualizations below can be done in just one line!!!

(Acknowledgement: thank you @shotaro for letting me know about this feature)

## Optimization history

This shows the history of the best score: each blue dot is the score of one trial, and the orange line shows the best score so far.<br>
Note that not every trial has a blue dot, because we turned on pruning and many trials were stopped before reaching a final objective value.
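
The "best score so far" line is just a running minimum over the trial scores. A tiny sketch with made-up values (the numbers below are not from this study):

```python
from itertools import accumulate

# Made-up final scores of completed trials, in the order they finished
trial_values = [0.95, 1.20, 0.88, 0.91, 0.80, 0.97]

# Running minimum: the best (lowest) score seen up to each trial
best_so_far = list(accumulate(trial_values, min))
print(best_so_far)  # [0.95, 0.95, 0.88, 0.88, 0.8, 0.8]
```

The running minimum only moves when a trial beats every earlier one, which is why the line in the plot looks like a staircase.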

In [None]:
optuna.visualization.plot_optimization_history(study)

## Visualize pruning history

You can visualize how the pruning is executed!<br>
Each color shows the loss curve of each trial.

In [None]:
optuna.visualization.plot_intermediate_values(study)

## Visualize plot slice

We can see how optuna's sampler searches the hyperparameter space.<br>
The white dots were searched in the early stage, while the blue dots were searched in the later stage.

We can see the following behavior (this is for training on the small debugging `train_df`; the behavior on the actual dataset may be different!):

 - `feature_fraction`: Low values (0.4~0.8) got very bad scores at the beginning of the trials, and thus high values (0.9~1.0) are extensively searched in the later stage.
 - `num_leaves`      : Intermediate values around 32~64 are extensively searched in the later stage, which may be "reasonable".

In [None]:
optuna.visualization.plot_slice(study)

## Visualize plot contour

We can see a pair plot of 2 parameters, with the objective value shown as contours.

In [None]:
optuna.visualization.plot_contour(study)

## Visualize parallel_coordinate

When I check the line with the best value, which is the bluest line starting from the bottom left, we can see the following behavior:

 - higher feature_fraction seems better
 - lower lambda_l1 seems better
 - lower lambda_l2 seems better
 - not too big num_leaves seems better

This holds for the debugging data. This is very small data, so it is reasonable that it should use all the features and less regularization...

In [None]:
optuna.visualization.plot_parallel_coordinate(study)

## More examples?

See [https://github.com/pfnet/optuna/tree/master/examples/visualization](https://github.com/pfnet/optuna/tree/master/examples/visualization) !!

# Let's start actual hyperparameter optimization

The rehearsal is done! Now let's proceed with the actual tuning.<br>
`study.optimize` can be limited **by time using `timeout`** (in seconds), instead of by the number of trials `n_trials`. This is convenient for Kaggle kernels.
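
As a rough mental model of a time budget, here is a sketch of a loop that keeps starting trials until the budget is used up. `run_until_timeout` and `FakeClock` are hypothetical helpers for illustration, and I am assuming that the budget is only checked between trials (so a trial that has already started runs to the end); please check the `optuna` documentation for the exact behavior of `timeout`.

```python
import time


def run_until_timeout(run_one_trial, timeout, clock=time.monotonic):
    """Start new trials until `timeout` seconds have elapsed (checked between trials)."""
    start = clock()
    n_finished = 0
    while clock() - start < timeout:
        run_one_trial()
        n_finished += 1
    return n_finished


class FakeClock:
    """Deterministic clock, so the example does not really wait."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now

    def advance(self, seconds):
        self.now += seconds


fake_clock = FakeClock()
# Pretend that every trial takes 25 seconds, with a 60 second budget
n_finished = run_until_timeout(lambda: fake_clock.advance(25.0), timeout=60.0, clock=fake_clock)
print(n_finished, fake_clock.now)  # 3 75.0
```

In this sketch the third trial starts at 50 seconds and finishes at 75 seconds, so the total run time can exceed the budget by up to one trial. It may be worth keeping some margin against the kernel's time limit.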

In [None]:
if not debug:
    # Restore to original data size for actual computation
    train_df = train_df_original
    timeout = 60 * 60 * 2  # For 2 hours...
else:
    timeout = 60 * 1       # Debug mode, only for 1 min

print('train_df.shape', train_df.shape)
print(f'timeout {timeout /60} min')

In [None]:
study = optuna.create_study(pruner=optuna.pruners.SuccessiveHalvingPruner(min_resource=2, reduction_factor=4, min_early_stopping_rate=1))
study.optimize(objective_with_prune, timeout=timeout)

In [None]:
print(f'Executed {len(study.trials)} trials, best score {study.best_value} with best_params {study.best_params}')

In [None]:
trials_df = study.trials_dataframe()
trials_df.to_csv('trials_history.csv')

In [None]:
# optuna.visualization.plot_optimization_history(study)
fig = optuna.visualization._get_optimization_history_plot(study)
py.plot(fig, filename='optimization_history.html')
fig.show()

In [None]:
# optuna.visualization.plot_intermediate_values(study)
fig = optuna.visualization._get_intermediate_plot(study)
py.plot(fig, filename='intermediate_values.html')
fig.show()

In [None]:
# optuna.visualization.plot_slice(study)
fig = optuna.visualization._get_slice_plot(study)
py.plot(fig, filename='slice.html')
fig.show()

In [None]:
# optuna.visualization.plot_contour(study)
fig = optuna.visualization._get_contour_plot(study)
py.plot(fig, filename='contour.html')
fig.show()

In [None]:
# optuna.visualization.plot_parallel_coordinate(study)
fig = optuna.visualization._get_parallel_coordinate_plot(study)
py.plot(fig, filename='parallel_coordinate.html')
fig.show()

# Train model and make submission with best params

Above, I tuned the parameters only for meter type 0 and only on the 1st fold.<br>
For simplicity, however, I use these parameters for all of the training below.

In [None]:
print("Training model with best_params {}".format(study.best_params))

When running the training again with specific hyperparameters, we can use a `FixedTrial` object to reuse the `objective` function we defined before.<br>
A `FixedTrial` object always returns the fixed values it was given, so we can run with `best_params`.
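
The idea can be sketched in a few lines of plain Python. `MiniFixedTrial` and `toy_params` below are hypothetical stand-ins that I wrote for illustration (the real class is `optuna.trial.FixedTrial`), and the parameter values are made up.

```python
class MiniFixedTrial:
    """Hypothetical stand-in that mimics the idea behind optuna.trial.FixedTrial."""

    def __init__(self, params):
        self.params = dict(params)

    def _get(self, name):
        # Ignore the search range and return the stored value
        return self.params[name]

    def suggest_int(self, name, low, high):
        return self._get(name)

    def suggest_uniform(self, name, low, high):
        return self._get(name)

    def suggest_loguniform(self, name, low, high):
        return self._get(name)


def toy_params(trial):
    # Same style of suggest_* calls as in the training function
    return {
        'num_leaves': trial.suggest_int('num_leaves', 2, 256),
        'lambda_l1': trial.suggest_loguniform('lambda_l1', 1e-8, 10.0),
        'feature_fraction': trial.suggest_uniform('feature_fraction', 0.4, 1.0),
    }


fixed_params = {'num_leaves': 48, 'lambda_l1': 1e-3, 'feature_fraction': 0.9}  # made-up values
replayed = toy_params(MiniFixedTrial(fixed_params))
print(replayed == fixed_params)  # True
```

Because the code that builds the parameters only sees the `suggest_*` interface, it does not need to know whether the values come from a sampler or from a fixed dictionary.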

In [None]:
# For meter 0
valid_score, models0, y_pred_valid, y_train = objective(optuna.trial.FixedTrial(study.best_params), fast_check=False, target_meter=0, return_info=True)

sns.distplot(y_pred_valid, label='pred')
sns.distplot(y_train, label='ground truth')
plt.legend()
plt.show()

del y_pred_valid, y_train

In [None]:
# For meter 1
valid_score, models1, y_pred_valid, y_train = objective(optuna.trial.FixedTrial(study.best_params), fast_check=False, target_meter=1, return_info=True)

sns.distplot(y_pred_valid, label='pred')
sns.distplot(y_train, label='ground truth')
plt.legend()
plt.show()

del y_pred_valid, y_train

In [None]:
# For meter 2
valid_score, models2, y_pred_valid, y_train = objective(optuna.trial.FixedTrial(study.best_params), fast_check=False, target_meter=2, return_info=True)

sns.distplot(y_pred_valid, label='pred')
sns.distplot(y_train, label='ground truth')
plt.legend()
plt.show()

del y_pred_valid, y_train

In [None]:
# For meter 3
valid_score, models3, y_pred_valid, y_train = objective(optuna.trial.FixedTrial(study.best_params), fast_check=False, target_meter=3, return_info=True)

sns.distplot(y_pred_valid, label='pred')
sns.distplot(y_train, label='ground truth')
plt.legend()
plt.show()

del y_pred_valid, y_train

In [None]:
try:
    del train_df
    del train_df_original
    del weather_train_df
except NameError:
    pass
gc.collect()

## Create test data

In [None]:
print('loading...')
test_df = pd.read_feather(root/'test.feather')
weather_test_df = pd.read_feather(root/'weather_test.feather')

print('preprocessing building...')
test_df['date'] = test_df['timestamp'].dt.date
preprocess(test_df)

print('preprocessing weather...')
weather_test_df = timestamp_align(weather_test_df)
weather_test_df = weather_test_df.groupby('site_id').apply(lambda group: group.interpolate(limit_direction='both'))
weather_test_df.groupby('site_id').apply(lambda group: group.isna().sum())

add_lag_feature(weather_test_df, window=3)
add_lag_feature(weather_test_df, window=72)

print('reduce mem usage...')
weather_test_df = weather_test_df[['site_id', 'timestamp'] + weather_cols]
reduce_mem_usage(test_df, use_float16=True)
reduce_mem_usage(weather_test_df, use_float16=True)

gc.collect()

In [None]:
del df_2d
del site_ids_offsets
gc.collect()

In [None]:
def create_X(test_df, target_meter):
    target_test_df = test_df[test_df['meter'] == target_meter]
    target_test_df = target_test_df.merge(building_meta_df, on='building_id', how='left', copy=False)
    target_test_df = pd.merge(target_test_df, building_stats_df, on=['building_id', 'meter'], how='left', copy=False)
    target_test_df = target_test_df.merge(weather_test_df, on=['site_id', 'timestamp'], how='left', copy=False)
    X_test = target_test_df[feature_cols + category_cols]
    return X_test

def pred(X_test, models, batch_size=1000000):
    iterations = (X_test.shape[0] + batch_size -1) // batch_size
    print('iterations', iterations)

    y_test_pred_total = np.zeros(X_test.shape[0])
    for i, model in enumerate(models):
        print(f'predicting {i}-th model')
        for k in tqdm(range(iterations)):
            y_pred_test = model.predict(X_test[k*batch_size:(k+1)*batch_size], num_iteration=model.best_iteration)
            y_test_pred_total[k*batch_size:(k+1)*batch_size] += y_pred_test

    y_test_pred_total /= len(models)
    return y_test_pred_total

In [None]:
%%time
X_test = create_X(test_df, target_meter=0)
gc.collect()

y_test0 = pred(X_test, models0)

del X_test
gc.collect()

In [None]:
%%time
X_test = create_X(test_df, target_meter=1)
gc.collect()
y_test1 = pred(X_test, models1)

del X_test
gc.collect()

In [None]:
%%time
X_test = create_X(test_df, target_meter=2)
gc.collect()
y_test2 = pred(X_test, models2)

del X_test
gc.collect()

In [None]:
%%time
X_test = create_X(test_df, target_meter=3)
gc.collect()
y_test3 = pred(X_test, models3)

del X_test
gc.collect()

In [None]:
sns.distplot(y_test0)
plt.title('test prediction for meter type 0')

In [None]:
sns.distplot(y_test1)
plt.title('test prediction for meter type 1')

In [None]:
sns.distplot(y_test2)
plt.title('test prediction for meter type 2')

In [None]:
sns.distplot(y_test3)
plt.title('test prediction for meter type 3')

In [None]:
sample_submission = pd.read_feather(os.path.join(root, 'sample_submission.feather'))
reduce_mem_usage(sample_submission)

In [None]:
print(np.sum(y_test0 < 0))
print(np.sum(y_test1 < 0))
print(np.sum(y_test2 < 0))
print(np.sum(y_test3 < 0))

y_test0 = np.where(y_test0 < 0, 0, y_test0)
y_test1 = np.where(y_test1 < 0, 0, y_test1)
y_test2 = np.where(y_test2 < 0, 0, y_test2)
y_test3 = np.where(y_test3 < 0, 0, y_test3)

In [None]:
sample_submission.loc[test_df['meter'] == 0, 'meter_reading'] = np.expm1(y_test0)
sample_submission.loc[test_df['meter'] == 1, 'meter_reading'] = np.expm1(y_test1)
sample_submission.loc[test_df['meter'] == 2, 'meter_reading'] = np.expm1(y_test2)
sample_submission.loc[test_df['meter'] == 3, 'meter_reading'] = np.expm1(y_test3)

In [None]:
sample_submission.to_csv('submission.csv', index=False, float_format='%.4f')

In [None]:
sample_submission.head()

In [None]:
np.log1p(sample_submission['meter_reading']).hist()

In [None]:
def plot_feature_importance(model):
    importance_df = pd.DataFrame(model.feature_importance(),
                                 index=feature_cols + category_cols,
                                 columns=['importance']).sort_values('importance')
    fig, ax = plt.subplots(figsize=(8, 8))
    importance_df.plot.barh(ax=ax)
    fig.show()

In [None]:
plot_feature_importance(models0[1])

In [None]:
plot_feature_importance(models1[1])

In [None]:
plot_feature_importance(models2[1])

In [None]:
plot_feature_importance(models3[1])

<a id="id10"></a>
# More to go

## Other features...

That's all for this tutorial. However, `optuna` provides other useful functionality as well.

 - Store trial information in a relational database (SQLite, PostgreSQL, MySQL) rather than in memory, to save the hyperparameter search history.
 - Parallel distributed optimization: you can run several processes at the same time to asynchronously search hyperparameters with many CPUs in parallel.

See the official documentation and GitHub for details!

 - [web site](https://optuna.org/)
 - [github: pfnet/optuna](https://github.com/pfnet/optuna)
 - [document](https://optuna.readthedocs.io/en/latest/)

**LightGBMTuner was introduced recently (see [this PR#549](https://github.com/pfnet/optuna/pull/549)); it may be quite interesting to try!**

If this kernel helps you, please upvote to keep me motivated 😁<br>Thanks!