# Feature Engineering

In [51]:
import pandas as pd
import numpy as np

import joblib

In [52]:
data = joblib.load('../data/01_data.pkl')

In [53]:
data.head()

| | rented_bike_count | hour | temperature | humidity | wind_speed | visibility | solar_radiation | rainfall | snowfall | seasons | holiday |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 264 | 133 | 0 | -11.0 | 51 | 1.1 | 2000 | 0.0 | 0.0 | 0.0 | Winter | No Holiday |
| 265 | 127 | 1 | -11.2 | 51 | 1.1 | 2000 | 0.0 | 0.0 | 0.0 | Winter | No Holiday |
| 266 | 95 | 2 | -11.5 | 50 | 0.7 | 2000 | 0.0 | 0.0 | 0.0 | Winter | No Holiday |
| 267 | 54 | 3 | -11.6 | 50 | 2.2 | 1995 | 0.0 | 0.0 | 0.0 | Winter | No Holiday |
| 268 | 46 | 4 | -11.6 | 47 | 2.1 | 1982 | 0.0 | 0.0 | 0.0 | Winter | No Holiday |


### Feature Engineering

What we need to do from Part 01:
- Turn the following features into binary features (1 if the value is greater than 0, otherwise 0): `rainfall`, `snowfall`, `solar_radiation`.

In [54]:
for col in ['rainfall', 'snowfall', 'solar_radiation']:
    # 1 if any rain, snow, or solar radiation was recorded in that hour, else 0
    data['binary_' + col] = np.where(data[col] > 0, 1, 0)

We keep the original features in the dataset and leave it to the modelling step to choose which version to include. This way we can build models with either the binary or the original versions and determine whether binarizing is useful.
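One way the modelling step could pick between the two versions is to keep two column lists. The following is a minimal sketch on a toy frame. The names `toy`, `features_original`, and `features_binary` are hypothetical, and the values are illustrative rather than taken from the dataset.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the notebook's `data` (illustrative values only)
toy = pd.DataFrame({
    'temperature': [-11.0, 3.5, 20.1],
    'rainfall': [0.0, 1.5, 0.0],
    'snowfall': [0.0, 0.0, 0.3],
    'solar_radiation': [0.0, 0.8, 2.1],
})

weather_cols = ['rainfall', 'snowfall', 'solar_radiation']
for col in weather_cols:
    toy['binary_' + col] = np.where(toy[col] > 0, 1, 0)

binary_cols = ['binary_' + col for col in weather_cols]

# Two candidate feature sets: one excludes the binary columns,
# the other excludes the original continuous columns.
features_original = [c for c in toy.columns if c not in binary_cols]
features_binary = [c for c in toy.columns if c not in weather_cols]
```

A model can then be fitted once on `toy[features_original]` and once on `toy[features_binary]` to compare the two encodings.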

### Dealing with other categorical features

In [55]:
data.holiday.value_counts()

No Holiday    8064
Holiday        432
Name: holiday, dtype: int64

In [56]:
data.seasons.value_counts()

Spring    2208
Summer    2208
Autumn    2184
Winter    1896
Name: seasons, dtype: int64

`holiday`: map to a binary feature.

`seasons`: create a dummy column for each season, then drop one season so the remaining columns stay linearly independent.
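To see why one season has to go, note that the four season dummies always sum to 1, which duplicates the intercept column of a linear model. The sketch below checks this with a matrix rank on a toy series. The data and variable names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy season labels (illustrative only)
seasons = pd.Series(
    ['Winter', 'Spring', 'Summer', 'Autumn', 'Winter', 'Spring'],
    name='seasons',
)

# All four dummy columns plus an intercept column of ones
full = pd.get_dummies(seasons, dtype=int)
X_full = np.column_stack([np.ones(len(full)), full.to_numpy()])
rank_full = np.linalg.matrix_rank(X_full)

# Same design matrix after dropping one season
reduced = full.drop(columns=['Winter'])
X_reduced = np.column_stack([np.ones(len(reduced)), reduced.to_numpy()])
rank_reduced = np.linalg.matrix_rank(X_reduced)

print(X_full.shape[1], rank_full)        # 5 columns, but rank 4
print(X_reduced.shape[1], rank_reduced)  # 4 columns, rank 4
```

With all four dummies the design matrix is rank deficient. After one season is dropped, every column carries independent information, and the dropped season becomes the baseline that the other coefficients are measured against.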

### Turn `holiday` into a binary feature

In [57]:
data['holiday'] = np.where(data.holiday == 'Holiday', 1, 0)

In [58]:
data.holiday.value_counts()

0    8064
1     432
Name: holiday, dtype: int64
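The `np.where` call above sends every value other than `'Holiday'` to 0, including misspelled or missing labels. Here the counts before and after match, so nothing was lost. As an optional sanity check, an explicit mapping would surface unexpected labels as `NaN`. This is a sketch on made-up values, and `mapping` and `unexpected` are hypothetical names.

```python
import pandas as pd

# Toy labels, including one deliberately malformed value
holiday = pd.Series(['No Holiday', 'Holiday', 'No Holiday', 'holiday '])

mapping = {'No Holiday': 0, 'Holiday': 1}
encoded = holiday.map(mapping)  # labels missing from `mapping` become NaN

# Labels that the mapping did not recognise
unexpected = holiday[encoded.isna()].unique()
print(unexpected)
```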

### Create dummy features for `seasons`

In [59]:
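# `seasons` is the only remaining non-numeric column, so it is the only one encoded
data_dummies = pd.get_dummies(data)

# Drop one season to maintain linear independence (Winter becomes the baseline)
data_dummies = data_dummies.drop(columns=['seasons_Winter'])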
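Calling `pd.get_dummies` on the whole frame encodes every object or categorical column it finds. A more explicit variant names the column to encode through the `columns` parameter and fixes the output type with `dtype`. The sketch below uses a toy frame with illustrative values, not the notebook's data.

```python
import pandas as pd

# Toy frame standing in for `data` (illustrative values only)
toy = pd.DataFrame({
    'hour': [0, 1, 2, 3],
    'seasons': ['Winter', 'Spring', 'Summer', 'Autumn'],
})

# Encode only `seasons`, as integers rather than booleans
toy_dummies = pd.get_dummies(toy, columns=['seasons'], dtype=int)

# Drop one level so the remaining dummies are linearly independent
toy_dummies = toy_dummies.drop(columns=['seasons_Winter'])

print(list(toy_dummies.columns))
```

Passing `dtype=int` matters on recent pandas versions, where dummy columns default to booleans instead of 0/1 integers.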

### Check data

In [60]:
data_dummies.head()

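| | rented_bike_count | hour | temperature | humidity | wind_speed | visibility | solar_radiation | rainfall | snowfall | holiday | binary_rainfall | binary_snowfall | binary_solar_radiation | seasons_Autumn | seasons_Spring | seasons_Summer |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 264 | 133 | 0 | -11.0 | 51 | 1.1 | 2000 | 0.0 | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 265 | 127 | 1 | -11.2 | 51 | 1.1 | 2000 | 0.0 | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 266 | 95 | 2 | -11.5 | 50 | 0.7 | 2000 | 0.0 | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 267 | 54 | 3 | -11.6 | 50 | 2.2 | 1995 | 0.0 | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 268 | 46 | 4 | -11.6 | 47 | 2.1 | 1982 | 0.0 | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |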


### Save data

In [61]:
joblib.dump(data_dummies, '../data/02_data.pkl')

['../data/02_data.pkl']
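A quick round-trip check can confirm that a frame dumped with `joblib` loads back unchanged. The sketch below writes a toy frame to a temporary directory instead of `../data/`, and the file name `example.pkl` is hypothetical.

```python
import os
import tempfile

import joblib
import pandas as pd

# Toy frame (illustrative values only)
frame = pd.DataFrame({'hour': [0, 1], 'holiday': [0, 1]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'example.pkl')
    joblib.dump(frame, path)
    restored = joblib.load(path)

# True when values, dtypes, and index all survive the round trip
round_trip_ok = frame.equals(restored)
print(round_trip_ok)
```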