### Dataset Information

<b>Dataset</b>: Bike Sharing Demand Dataset

<a href='http://archive.ics.uci.edu/ml/datasets/bike+sharing+dataset' target="_blank">http://archive.ics.uci.edu/ml/datasets/bike+sharing+dataset</a>

This dataset contains the hourly and daily counts of rental bikes between 2011 and 2012 in the Capital Bikeshare system, with the corresponding weather and seasonal information.

Number of instances: 17379

Number of Features: 14

<b>Feature Information </b>
- instant: record index
- dteday : date
- season : season (1:spring, 2:summer, 3:fall, 4:winter)
- yr : year (0: 2011, 1:2012)
- mnth : month ( 1 to 12)
- hr : hour (0 to 23)
- holiday : whether the day is a holiday or not (extracted from [Web Link])
- weekday : day of the week
- workingday : 1 if the day is neither a weekend nor a holiday, otherwise 0
- weathersit :
    - 1: Clear, Few clouds, Partly cloudy
    - 2: Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist
    - 3: Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds
    - 4: Heavy Rain + Ice Pellets + Thunderstorm + Mist, Snow + Fog
- temp : Normalized temperature in Celsius. The values are derived via (t-t_min)/(t_max-t_min), t_min=-8, t_max=+39 (only in hourly scale)
- atemp: Normalized feeling temperature in Celsius. The values are derived via (t-t_min)/(t_max-t_min), t_min=-16, t_max=+50 (only in hourly scale)
- hum: Normalized humidity. The values are divided by 100 (max)
- windspeed: Normalized wind speed. The values are divided by 67 (max)
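
The normalization formulas above can be inverted to recover physical units. Below is a minimal sketch, assuming the constants given in the feature descriptions; the helper name `denormalize` is made up for illustration and is not part of the dataset or any library.

```python
def denormalize(value, t_min, t_max):
    """Invert (t - t_min) / (t_max - t_min) to recover the original value."""
    return value * (t_max - t_min) + t_min

# First row of hour.csv: temp=0.24, atemp=0.2879, hum=0.81
temp_c = denormalize(0.24, t_min=-8, t_max=39)      # about 3.28 degrees Celsius
atemp_c = denormalize(0.2879, t_min=-16, t_max=50)  # about 3.0 degrees Celsius
hum_pct = 0.81 * 100                                # humidity was divided by 100

print(round(temp_c, 2), round(atemp_c, 2), round(hum_pct, 1))
```

So the first record corresponds to a cold, humid hour at roughly 3 degrees Celsius.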

<b>What do we want the model to predict?</b>

The model should predict one of the following:
1. casual: count of casual users
2. registered: count of registered users
3. cnt: count of total rental bikes including both casual and registered
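
The three targets are related: `cnt` is described as the total of casual and registered users. Here is a quick sanity check of that relationship, using the first five rows of hour.csv as displayed in the Load Dataset section (the full file is not checked here).

```python
# (casual, registered, cnt) for the first five rows of hour.csv
rows = [(3, 13, 16), (8, 32, 40), (5, 27, 32), (3, 10, 13), (0, 1, 1)]

# cnt should equal the sum of the two user types in every row
consistent = all(casual + registered == cnt for casual, registered, cnt in rows)
print(consistent)
```

If the relationship holds across the whole file, a model that predicts `casual` and `registered` separately also yields `cnt` by adding the two predictions.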


<font color='green'>We will use Entity Embedding for Categorical columns in this example.</font>

### Load Dataset
The dataset has two CSV files: one with data by day and the other with data by the hour. We use the 'hour' dataset, but feel free to use either one.

In [0]:
import pandas as pd

In [2]:
df = pd.read_csv('/gdrive/My Drive/AI-ML/hour.csv')
df.head()

Unnamed: 0,instant,dteday,season,yr,mnth,hr,holiday,weekday,workingday,weathersit,temp,atemp,hum,windspeed,casual,registered,cnt
0,1,2011-01-01,1,0,1,0,0,6,0,1,0.24,0.2879,0.81,0.0,3,13,16
1,2,2011-01-01,1,0,1,1,0,6,0,1,0.22,0.2727,0.8,0.0,8,32,40
2,3,2011-01-01,1,0,1,2,0,6,0,1,0.22,0.2727,0.8,0.0,5,27,32
3,4,2011-01-01,1,0,1,3,0,6,0,1,0.24,0.2879,0.75,0.0,3,10,13
4,5,2011-01-01,1,0,1,4,0,6,0,1,0.24,0.2879,0.75,0.0,0,1,1


In [3]:
df.shape

(17379, 17)

### Analyze Data

Check for any missing values

In [4]:
df.isnull().sum()

instant       0
dteday        0
season        0
yr            0
mnth          0
hr            0
holiday       0
weekday       0
workingday    0
weathersit    0
temp          0
atemp         0
hum           0
windspeed     0
casual        0
registered    0
cnt           0
dtype: int64

Check Datatypes for all columns

In [5]:
df.dtypes

instant         int64
dteday         object
season          int64
yr              int64
mnth            int64
hr              int64
holiday         int64
weekday         int64
workingday      int64
weathersit      int64
temp          float64
atemp         float64
hum           float64
windspeed     float64
casual          int64
registered      int64
cnt             int64
dtype: object

### Prepare Data for Training

Remove the following columns, as they are not useful for model training:

1. instant - Record ID
2. dteday - This information is available in other columns

In [6]:
df.drop(labels=['instant', 'dteday'], axis=1, inplace=True)
df.head()

Unnamed: 0,season,yr,mnth,hr,holiday,weekday,workingday,weathersit,temp,atemp,hum,windspeed,casual,registered,cnt
0,1,0,1,0,0,6,0,1,0.24,0.2879,0.81,0.0,3,13,16
1,1,0,1,1,0,6,0,1,0.22,0.2727,0.8,0.0,8,32,40
2,1,0,1,2,0,6,0,1,0.22,0.2727,0.8,0.0,5,27,32
3,1,0,1,3,0,6,0,1,0.24,0.2879,0.75,0.0,3,10,13
4,1,0,1,4,0,6,0,1,0.24,0.2879,0.75,0.0,0,1,1


### Handling Categorical Data
Some columns in the dataset hold categorical values, i.e. they take only a few possible values.

e.g. the season column can be 1, 2, 3 or 4.

We need to treat categorical data differently. Let's identify the categorical columns by checking the unique values in each.

In [7]:
df.season.unique()

array([1, 2, 3, 4])

In [8]:
df.yr.unique()

array([0, 1])

In [9]:
df.mnth.unique()

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12])

In [10]:
df.hr.unique()

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23])

In [11]:
df.holiday.unique()

array([0, 1])

In [12]:
df.weekday.unique()

array([6, 0, 1, 2, 3, 4, 5])

In [13]:
df.workingday.unique()

array([0, 1])

In [14]:
df.weathersit.unique()

array([1, 2, 3, 4])
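
The per-column checks above can also be condensed into one call with `DataFrame.nunique()`. The sketch below uses a small stand-in frame rather than the real data, and the `max_levels` threshold is an assumption chosen for this toy example.

```python
import pandas as pd

# Small stand-in frame; in the notebook this would be the full `df`
sample = pd.DataFrame({
    "season": [1, 1, 2, 3, 4, 4],
    "holiday": [0, 0, 0, 1, 0, 0],
    "temp": [0.24, 0.22, 0.50, 0.72, 0.30, 0.26],
})

# Number of distinct values per column; low counts suggest categorical columns
unique_counts = sample.nunique()
print(unique_counts)

# Columns with at most `max_levels` distinct values (threshold is an assumption)
max_levels = 4
candidates = unique_counts[unique_counts <= max_levels].index.tolist()
print(candidates)
```

On the full dataset the threshold would need to be at least 24 to capture `hr`. Binary columns such as `holiday` also pass the filter, although this notebook feeds them in as plain numeric inputs.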

Let's drop the 'yr' column, as it may not be useful for predicting future demand.

In [0]:
df.drop(labels=['yr'], axis=1, inplace=True)

#### What are the Categorical Columns here

Based on the unique values in each Column, the following Columns are Categorical:

1. season
2. mnth
3. hr
4. weekday
5. weathersit

In this example, we will let the neural network learn encodings for these categorical values. This approach is also called Entity Embedding.
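
To make the idea concrete: an embedding is a lookup table with one row of numbers per category code. The NumPy sketch below is an illustration only. The random values stand in for weights that the network would learn during training, and the embedding size of 3 is chosen for readability.

```python
import numpy as np

rng = np.random.default_rng(0)

num_categories = 5   # season codes 1..4, plus the unused index 0
embedding_size = 3   # kept small here for readability

# The embedding matrix: one row of learnable numbers per category code
embedding_matrix = rng.normal(size=(num_categories, embedding_size))

season_codes = np.array([1, 3, 3, 4])

# An embedding lookup is plain row indexing ...
looked_up = embedding_matrix[season_codes]

# ... and matches one-hot encoding followed by a matrix product
one_hot = np.eye(num_categories)[season_codes]
via_matmul = one_hot @ embedding_matrix

print(looked_up.shape)
print(np.allclose(looked_up, via_matmul))
```

The lookup avoids building the one-hot matrix, and because the rows are trained along with the rest of the network, categories that behave alike can end up with similar vectors.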

### Features vs Target

In this example, we will predict count of total rental bikes i.e 'cnt' column.

In [16]:
y = df[['cnt']]
y.shape

(17379, 1)

Drop the following columns, as they will not be used as input features:
- 'cnt', 'registered' and 'casual'

In [18]:
# errors='ignore' keeps this cell safe to re-run after the columns are already gone
df.drop(labels=['cnt','registered','casual'], axis=1, inplace=True, errors='ignore')

### Split data between Training and Test

In [0]:
from sklearn.model_selection import train_test_split

In [0]:
train_x, test_x, train_y, test_y = train_test_split(df, y, test_size=0.25)

In [21]:
train_x.shape, train_y.shape

((13034, 11), (13034, 1))

In [22]:
test_x.shape, test_y.shape

((4345, 11), (4345, 1))
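
The split sizes follow from `test_size=0.25`. Here is a quick check of the arithmetic, assuming scikit-learn rounds the test share up and gives the remainder to training, which matches the shapes shown above.

```python
import math

n_samples = 17379
test_size = 0.25

# Round the test share up; the remainder goes to training
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

print(n_train, n_test)
```

Since the split is random, passing a fixed `random_state` to `train_test_split` would make it reproducible across runs.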

### Build the model

Load the TensorFlow library

In [0]:
import tensorflow as tf

In [0]:
tf.keras.backend.clear_session()

In this case, our model will have 6 inputs

1. One input for all numeric columns (binary flags and continuous values) - 'holiday', 'workingday', 'temp', 'atemp', 'hum', 'windspeed'
2. Input for 'season'
3. Input for 'mnth'
4. Input for 'hr'
5. Input for 'weekday'
6. Input for 'weathersit'

For each categorical column, we create an Input layer followed by an Embedding layer. The embedding size for each column is our choice.

In [0]:
#Input layer for Continuous value columns
input_1 = tf.keras.layers.Input(shape=(6,))

Build an Input and Embedding layer for each categorical column

In [26]:
#season column
input_2 = tf.keras.layers.Input(shape=(1,))
embed_2 = tf.keras.layers.Embedding(input_dim=5, #Possible input values : 0 to 4 in this case
                                    output_dim=10, #Embedding size - how many numbers to use
                                    input_length=1 #How many input values to be fed per example
                                   )(input_2)



In [0]:
#mnth column
input_3 = tf.keras.layers.Input(shape=(1,))
embed_3 = tf.keras.layers.Embedding(input_dim=13, #Possible input values : 0 to 12 in this case
                                    output_dim=15, #Embedding size - how many numbers to use
                                    input_length=1 #How many input values to be fed per example
                                   )(input_3)

#hr column
input_4 = tf.keras.layers.Input(shape=(1,))
embed_4 = tf.keras.layers.Embedding(input_dim=24, #Possible input values : 0 to 23 in this case
                                    output_dim=12, #Embedding size - how many numbers to use
                                    input_length=1 #How many input values to be fed per example
                                   )(input_4)

#weekday column
input_5 = tf.keras.layers.Input(shape=(1,))
embed_5 = tf.keras.layers.Embedding(input_dim=7, #Possible input values : 0 to 6 in this case
                                    output_dim=12, #Embedding size - how many numbers to use
                                    input_length=1 #How many input values to be fed per example
                                   )(input_5)

#weathersit column
input_6 = tf.keras.layers.Input(shape=(1,))
embed_6 = tf.keras.layers.Embedding(input_dim=5, #Possible input values : 0 to 4 in this case
                                    output_dim=8, #Embedding size - how many numbers to use
                                    input_length=1 #How many input values to be fed per example
                                   )(input_6)
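
Each Embedding layer stores one vector of length `output_dim` for every possible input value, so its weight count is `input_dim * output_dim`. A small sketch of that arithmetic for the five layers defined above (plain Python, no TensorFlow needed):

```python
# (input_dim, output_dim) for each Embedding layer defined above
embedding_specs = {
    "season": (5, 10),
    "mnth": (13, 15),
    "hr": (24, 12),
    "weekday": (7, 12),
    "weathersit": (5, 8),
}

# One output_dim-sized vector per possible input value
param_counts = {name: n_in * n_out for name, (n_in, n_out) in embedding_specs.items()}

print(param_counts)
print(sum(param_counts.values()))
```

Note that `input_dim` must be larger than the highest category code. That is why 'season' (codes 1 to 4) uses 5, while 'hr' (codes 0 to 23) uses 24.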

Check Output shape of Embedding layer

In [28]:
embed_2.shape

TensorShape([Dimension(None), Dimension(1), Dimension(10)])

Concatenate all Embeddings and Reshape

In [0]:
x = tf.keras.layers.concatenate([embed_2, embed_3, embed_4, embed_5, embed_6])

In [30]:
x.shape

TensorShape([Dimension(None), Dimension(1), Dimension(57)])

In [0]:
x = tf.keras.layers.Reshape((57,))(x)

In [32]:
x.shape

TensorShape([Dimension(None), Dimension(57)])

Concatenate Embeddings with Input for Continuous value columns

In [0]:
x = tf.keras.layers.concatenate([input_1, x])

In [34]:
x.shape

TensorShape([Dimension(None), Dimension(63)])
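
The shape changes in the last few cells can be reproduced with plain NumPy arrays. This is only an illustration of the shape arithmetic, with zero-filled arrays standing in for the Keras tensors and an arbitrary batch size of 4.

```python
import numpy as np

batch = 4
sizes = [10, 15, 12, 12, 8]  # output_dim of the five Embedding layers

# Each embedding output has shape (batch, 1, output_dim)
embeds = [np.zeros((batch, 1, d)) for d in sizes]

merged = np.concatenate(embeds, axis=-1)    # join along the last axis
flat = merged.reshape(batch, sum(sizes))    # drop the length-1 middle axis
numeric = np.zeros((batch, 6))              # the six numeric columns
combined = np.concatenate([numeric, flat], axis=-1)

print(merged.shape, flat.shape, combined.shape)
```

The Reshape step is needed because the numeric input is 2-D, so the 3-D embedding output has to be flattened before the two can be concatenated.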

Add hidden layers with dropout

In [35]:
x = tf.keras.layers.Dense(200, activation='relu')(x)
x = tf.keras.layers.Dropout(0.4)(x)

x = tf.keras.layers.Dense(100, activation='relu')(x)
x = tf.keras.layers.Dropout(0.4)(x)

x = tf.keras.layers.Dense(60, activation='relu')(x)
x = tf.keras.layers.Dropout(0.25)(x)

x = tf.keras.layers.Dense(30, activation='relu')(x)



Add Output Layer

In [0]:
model_output = tf.keras.layers.Dense(1)(x)
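
As a rough size check, a Dense layer has `inputs * units` weights plus one bias per unit, and Dropout layers add no parameters. Below is a sketch of the count for the stack built above, assuming the 63-wide concatenated input.

```python
# Layer widths: 63 concatenated inputs, four hidden layers, one output unit
widths = [63, 200, 100, 60, 30, 1]

# A Dense layer has (inputs * units) weights plus one bias per unit
dense_params = [n_in * n_out + n_out for n_in, n_out in zip(widths, widths[1:])]

print(dense_params)
print(sum(dense_params))
```

Adding the embedding weights (5*10 + 13*15 + 24*12 + 7*12 + 5*8 = 657) should give 41,478 trainable parameters in total.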

#### Build a Non-Sequential Model

In [0]:
model = tf.keras.Model(inputs=[input_1, input_2, input_3, input_4, input_5, input_6], #6 inputs including 5 Categorical
                       outputs=model_output)

Specify Optimizer and Loss function for the model

In [38]:
model.compile(optimizer='adam', loss='mse')

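
The 'mse' loss is the mean of the squared differences between actual and predicted counts. Here is a small worked example in plain Python. The actual values are the `cnt` entries from the first five rows, and the predictions are made up for illustration.

```python
def mean_squared_error(actual, predicted):
    """Average of the squared differences between targets and predictions."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

# Actual `cnt` values from the first five rows, with made-up predictions
actual = [16, 40, 32, 13, 1]
predicted = [20.0, 35.0, 30.0, 10.0, 4.0]

mse = mean_squared_error(actual, predicted)
rmse = mse ** 0.5  # back in units of bikes per hour

print(mse, round(rmse, 2))
```

Taking the square root (RMSE) makes the loss easier to interpret, since it is expressed in bikes per hour rather than squared bikes.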


In [39]:
#Check Model
model.summary()

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_2 (InputLayer)            (None, 1)            0                                            
__________________________________________________________________________________________________
input_3 (InputLayer)            (None, 1)            0                                            
__________________________________________________________________________________________________
input_4 (InputLayer)            (None, 1)            0                                            
__________________________________________________________________________________________________
input_5 (InputLayer)            (None, 1)            0                                            
__________________________________________________________________________________________________
input_6 (I

### Train the Model

We need to provide 6 inputs for both the training and the validation data

In [40]:
model.fit([train_x[[ 'holiday', 'workingday','temp', 'atemp', 'hum', 'windspeed']], 
           train_x[['season']], train_x[['mnth']], train_x[['hr']], train_x[['weekday']], train_x[['weathersit']]],
          train_y, 
          validation_data=([test_x[[ 'holiday', 'workingday','temp', 'atemp', 'hum', 'windspeed']],
                            test_x[['season']], test_x[['mnth']], test_x[['hr']],
                            test_x[['weekday']], test_x[['weathersit']]], test_y),
          epochs=100, 
          batch_size=100)

Train on 13034 samples, validate on 4345 samples
Epoch 1/100
Epoch 2/100
...
Epoch 70/100

<tensorflow.python.keras.callbacks.History at 0x7f768d1290b8>
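
The training call repeats the same column selections for the training and validation data. One way to avoid the repetition is a small helper. This is a sketch, and the name `split_inputs` and the two constants are made up for illustration.

```python
NUMERIC_COLS = ['holiday', 'workingday', 'temp', 'atemp', 'hum', 'windspeed']
CATEGORICAL_COLS = ['season', 'mnth', 'hr', 'weekday', 'weathersit']

def split_inputs(frame):
    """Return the six model inputs, in the order the model's `inputs` list expects."""
    return [frame[NUMERIC_COLS]] + [frame[[col]] for col in CATEGORICAL_COLS]
```

With this helper, the call above could be written as `model.fit(split_inputs(train_x), train_y, validation_data=(split_inputs(test_x), test_y), epochs=100, batch_size=100)`. The same helper works for prediction, e.g. `model.predict(split_inputs(test_x[0:1]))`.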

### Model Prediction
Prediction on the first test example. You need to feed 6 inputs, as that's how the model was trained.

In [0]:
#Prediction
model.predict([test_x[[ 'holiday', 'workingday','temp', 'atemp', 'hum', 'windspeed']][0:1],
               test_x[['season']][0:1], test_x[['mnth']][0:1], test_x[['hr']][0:1],
               test_x[['weekday']][0:1], test_x[['weathersit']][0:1]])[0]

In [0]:
#Actual
test_y[0:1]

<font color='blue'>Try changing the number of hidden layers and the number of neurons in each layer. Change the amount of dropout between layers to improve the model. Can you change the model to predict the counts of both 'registered' and 'casual' users?</font> 