## Bike rental demand prediction using Machine Learning

### Preprocessing
To make the data suitable for machine learning, we apply some preprocessing: handling missing data, transforming some columns, and encoding categorical variables.

* Use one-hot encoding, for example with pd.get_dummies(), to convert ordinal, binary, and all other categorical columns to numeric indicator columns.
* Data transformation (optional): standardization, normalization, log, or square-root transforms, especially if you are using distance-based algorithms such as KNN, or neural networks.
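
As a minimal sketch of the one-hot encoding step, the toy frame below (made-up values; only the column names mirror the dataset) shows what pd.get_dummies() produces. Passing `dtype=int` is an assumption made here so the indicators come out as 0/1, since recent pandas versions default to boolean columns.

```python
import pandas as pd

# Toy data: values are invented for illustration only.
toy = pd.DataFrame({"season": [1, 2, 2, 4], "humidity": [80, 75, 60, 40]})

# One indicator column per category level, named <prefix>_<level>.
dummies = pd.get_dummies(toy["season"], prefix="season", dtype=int)

# Join the indicators back onto the original frame.
encoded = toy.join(dummies)
print(encoded.columns.tolist())
```

Only levels that actually occur get a column, so a level missing from new data has to be added back by hand (for example by reindexing to the training columns) before prediction.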


### Import libraries

In [1]:
import pandas as pd
import numpy as np
import joblib
import warnings
warnings.filterwarnings('ignore')

In [2]:
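# on_bad_lines='skip' replaces error_bad_lines=False, which was deprecated in pandas 1.3 and removed in 2.0
df = pd.read_csv('../data/train.csv', header=0, on_bad_lines='skip')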

In [3]:
## parse_datetime

# Convert the datetime column to datetime dtype
df['datetime'] = pd.to_datetime(df.datetime)

# Extract month and hour from datetime
df['month'] = df['datetime'].dt.month
df['hour'] = df['datetime'].dt.hour

# Use datetime as the index and rearrange the columns
df = df.set_index('datetime')
df = df[['month', 'hour','season','holiday','workingday','weather','temp','atemp','humidity','windspeed','casual','registered','count']]

In [4]:
## feature_transformation (create a new variable with the categories weekend, holiday, and workday)

df.loc[(df['holiday']==0) & (df['workingday']==0),'day_typ'] = 'weekend'
df.loc[(df['holiday']==1),'day_typ'] = 'holiday'
df.loc[(df['holiday']==0) & (df['workingday']==1),'day_typ'] = 'workday'


In [5]:
# Create dummies for each variable in cat_features and join the dummies dataframe to the original dataframe
cat_features = ['season','day_typ','weather']

for i in cat_features:
    init = pd.get_dummies(df[i], prefix = i)
    df = df.join(init)

In [6]:
##Dimensionality reduction using Principal Component Analysis (PCA)
from sklearn.decomposition import PCA

pca = PCA(n_components=1, random_state=42)
df['mtemp'] = pca.fit_transform(df[['temp','atemp']])
df.drop(['temp','atemp'], axis=1, inplace=True)
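
Collapsing temp and atemp into one component only makes sense if the two are strongly correlated. A quick way to check this is the explained variance ratio of the fitted PCA. The sketch below uses synthetic, deliberately correlated values rather than the real data, and the names `pca_demo` and `pair` are illustrative.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Synthetic stand-ins for temp and atemp (strongly correlated by construction).
temp = rng.uniform(0, 40, size=500)
atemp = 1.1 * temp + rng.normal(0, 1.0, size=500)
pair = np.column_stack([temp, atemp])

pca_demo = PCA(n_components=1, random_state=42)
# fit_transform returns shape (n, 1); ravel() flattens it to (n,).
mtemp = pca_demo.fit_transform(pair).ravel()

# Share of the total variance kept by the single component.
ratio = pca_demo.explained_variance_ratio_[0]
print(f"variance retained by one component: {ratio:.3f}")
```

A ratio close to 1 suggests little information is lost by replacing the two columns with one.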

In [7]:
##Create X and y

X = df.drop(['season', 'holiday', 'day_typ','workingday','weather', 'casual', 'registered', 'count'], axis=1)
y = np.log(df['count'])
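
Because the target is the natural log of count, model predictions come out on the log scale and need np.exp() to be read as rentals. The sketch below shows the round trip on made-up counts. np.log is only defined for positive values, so np.log1p with np.expm1 is a common alternative when zeros can occur.

```python
import numpy as np

# Made-up hourly counts, all >= 1 so that np.log is defined.
counts = np.array([1, 16, 40, 977])

y_log = np.log(counts)      # scale the model is trained on
recovered = np.exp(y_log)   # back to rentals per hour

# Zero-safe variant: log(1 + x) and its inverse exp(x) - 1.
with_zero = np.array([0, 5, 120])
round_trip = np.expm1(np.log1p(with_zero))
```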

In [8]:
#feature scaling/normalization
from sklearn.preprocessing import StandardScaler

numerical_features = ['mtemp','humidity','windspeed']
scaler = StandardScaler() 
# Plain column assignment replaces the columns outright, so integer columns can become float
X[numerical_features] = scaler.fit_transform(X[numerical_features])


### Build & Compare Different ML Regression Models

In [9]:
from sklearn.ensemble import RandomForestRegressor
RF = RandomForestRegressor()
RF.fit(X, y)

RandomForestRegressor()
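
The cell above fits a single model on all of the data. One possible way to compare candidates is k-fold cross-validation, sketched below on synthetic data. The choice of LinearRegression as a baseline, the RMSE metric, and the names `X_demo`, `y_demo`, and `candidates` are assumptions for illustration, not part of the original notebook.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Synthetic features and a non-linear target (not the bike data).
X_demo = rng.uniform(0, 1, size=(300, 4))
y_demo = np.sin(6 * X_demo[:, 0]) + 0.1 * rng.normal(size=300)

candidates = {
    "linear": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
}

scores = {}
for name, est in candidates.items():
    # scikit-learn reports negated RMSE so that higher is better; flip the sign back.
    cv = cross_val_score(est, X_demo, y_demo, cv=5,
                         scoring="neg_root_mean_squared_error")
    scores[name] = -cv.mean()
    print(f"{name}: RMSE = {scores[name]:.3f}")
```

With the real data, the same loop would take X and y from the cells above. Hourly data is ordered in time, so a time-aware split may be more appropriate than plain k-fold.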

In [10]:
## Serializing:
# PCA
joblib.dump(pca, 'pca.joblib') 

# Scaler
joblib.dump(scaler, 'scaler.joblib')

# Trained model
joblib.dump(RF, 'bike-model.joblib')

['bike-model.joblib']
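
The three saved artifacts are meant to be reloaded together at prediction time: the PCA and scaler are applied with transform() only, never refit, and the prediction is exponentiated to undo the log target. The sketch below reproduces that flow end to end on synthetic data in a temporary directory. The reduced feature set and the file names are assumptions chosen to keep the example short.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 200
# Synthetic stand-in for the training data (all values are made up).
raw = pd.DataFrame({
    "hour": rng.integers(0, 24, n),
    "temp": rng.uniform(0, 40, n),
    "humidity": rng.uniform(20, 100, n),
    "windspeed": rng.uniform(0, 40, n),
})
raw["atemp"] = raw["temp"] * 1.1 + rng.normal(0, 1, n)
target = np.log(rng.integers(1, 500, n))
numerical = ["mtemp", "humidity", "windspeed"]

# Fit PCA, scaler, and model in the same order as the notebook.
pca = PCA(n_components=1, random_state=42)
X = raw[["hour", "humidity", "windspeed"]].astype(float)
X["mtemp"] = pca.fit_transform(raw[["temp", "atemp"]]).ravel()
scaler = StandardScaler()
X[numerical] = scaler.fit_transform(X[numerical])
model = RandomForestRegressor(n_estimators=20, random_state=0).fit(X, target)

# Save and reload all three artifacts.
with tempfile.TemporaryDirectory() as tmp:
    for name, obj in [("pca", pca), ("scaler", scaler), ("model", model)]:
        joblib.dump(obj, Path(tmp) / f"{name}.joblib")
    pca2 = joblib.load(Path(tmp) / "pca.joblib")
    scaler2 = joblib.load(Path(tmp) / "scaler.joblib")
    model2 = joblib.load(Path(tmp) / "model.joblib")

# Inference on one new observation: transform only, same column order as training.
new = pd.DataFrame({"hour": [8.0], "temp": [22.0], "atemp": [25.0],
                    "humidity": [55.0], "windspeed": [10.0]})
X_new = new[["hour", "humidity", "windspeed"]].copy()
X_new["mtemp"] = pca2.transform(new[["temp", "atemp"]]).ravel()
X_new[numerical] = scaler2.transform(X_new[numerical])

# The model predicts log(count); np.exp converts back to a rental count.
predicted_count = np.exp(model2.predict(X_new[X.columns]))[0]
print(f"predicted rentals: {predicted_count:.1f}")
```

Keeping the column order identical to training (here via `X.columns`) matters because the model relies on feature positions and names.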

In [11]:
X.columns

Index(['month', 'hour', 'humidity', 'windspeed', 'season_1', 'season_2',
       'season_3', 'season_4', 'day_typ_holiday', 'day_typ_weekend',
       'day_typ_workday', 'weather_1', 'weather_2', 'weather_3', 'weather_4',
       'mtemp'],
      dtype='object')

In [12]:
X

datetime,month,hour,humidity,windspeed,season_1,season_2,season_3,season_4,day_typ_holiday,day_typ_weekend,day_typ_workday,weather_1,weather_2,weather_3,weather_4,mtemp
2011-01-01 00:00:00,1,0,0.993213,-1.567754,1,0,0,0,0,1,0,1,0,0,0,1.207544
2011-01-01 01:00:00,1,1,0.941249,-1.567754,1,0,0,0,0,1,0,1,0,0,0,1.304715
2011-01-01 02:00:00,1,2,0.941249,-1.567754,1,0,0,0,0,1,0,1,0,0,0,1.304715
2011-01-01 03:00:00,1,3,0.681430,-1.567754,1,0,0,0,0,1,0,1,0,0,0,1.207544
2011-01-01 04:00:00,1,4,0.681430,-1.567754,1,0,0,0,0,1,0,1,0,0,0,1.207544
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2012-12-19 19:00:00,12,19,-0.617666,1.617227,0,0,0,1,0,0,1,1,0,0,0,0.528627
2012-12-19 20:00:00,12,20,-0.253919,0.269704,0,0,0,1,0,0,1,1,0,0,0,0.722781
2012-12-19 21:00:00,12,21,-0.046064,0.269704,0,0,0,1,0,0,1,1,0,0,0,0.868444
2012-12-19 22:00:00,12,22,-0.046064,-0.832442,0,0,0,1,0,0,1,1,0,0,0,0.771140
