### Bike Sharing Dataset

https://www.pluralsight.com/guides/regression-keras

Data Repo link: https://archive.ics.uci.edu/ml/datasets/Bike+Sharing+Dataset#

Data Set Information:

Bike sharing systems are a new generation of traditional bike rentals in which the whole process, from membership to rental and return, has become automatic. Through these systems, a user can easily rent a bike from one location and return it at another. Currently, there are over 500 bike-sharing programs around the world, comprising over 500 thousand bicycles. Today, there is great interest in these systems because of their important role in traffic, environmental, and health issues.

Apart from the interesting real-world applications of bike sharing systems, the characteristics of the data they generate make them attractive for research. Unlike other transport services such as bus or subway, the duration of travel and the departure and arrival positions are explicitly recorded in these systems. This feature turns a bike sharing system into a virtual sensor network that can be used for sensing mobility in the city. Hence, it is expected that most important events in the city could be detected by monitoring these data.


Attribute Information:

Both hour.csv and day.csv have the following fields, except hr, which is not available in day.csv:

- instant: record index
- dteday : date
- season : season (1:winter, 2:spring, 3:summer, 4:fall)
- yr : year (0: 2011, 1:2012)
- mnth : month ( 1 to 12)
- hr : hour (0 to 23)
- holiday : whether the day is a holiday or not (extracted from [Web Link])
- weekday : day of the week
- workingday : 1 if the day is neither a weekend nor a holiday, otherwise 0
- weathersit :
  - 1: Clear, Few clouds, Partly cloudy
  - 2: Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist
  - 3: Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds
  - 4: Heavy Rain + Ice Pellets + Thunderstorm + Mist, Snow + Fog
- temp : Normalized temperature in Celsius. The values are derived via (t-t_min)/(t_max-t_min), t_min=-8, t_max=+39 (only in hourly scale)
- atemp: Normalized feeling temperature in Celsius. The values are derived via (t-t_min)/(t_max-t_min), t_min=-16, t_max=+50 (only in hourly scale)
- hum: Normalized humidity. The values are divided by 100 (max)
- windspeed: Normalized wind speed. The values are divided by 67 (max)
- casual: count of casual users
- registered: count of registered users
- cnt: count of total rental bikes including both casual and registered
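
The normalization formulas in the list above can be inverted to recover raw values. The following is a minimal sketch, assuming the constants given in the attribute descriptions for the hourly data. The helper name `denormalize_weather` and the output column names are hypothetical and are not part of the dataset or of this notebook.

```python
import pandas as pd

# Constants taken from the attribute descriptions (hourly scale)
TEMP_MIN, TEMP_MAX = -8.0, 39.0
ATEMP_MIN, ATEMP_MAX = -16.0, 50.0
HUM_MAX = 100.0
WINDSPEED_MAX = 67.0


def denormalize_weather(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with extra columns holding the de-normalized weather values."""
    out = frame.copy()
    # Invert (t - t_min) / (t_max - t_min)
    out["temp_c"] = frame["temp"] * (TEMP_MAX - TEMP_MIN) + TEMP_MIN
    out["atemp_c"] = frame["atemp"] * (ATEMP_MAX - ATEMP_MIN) + ATEMP_MIN
    # Invert division by the maximum
    out["hum_pct"] = frame["hum"] * HUM_MAX
    out["windspeed_raw"] = frame["windspeed"] * WINDSPEED_MAX
    return out


# First row of hour.csv as shown by df.head()
sample = pd.DataFrame(
    {"temp": [0.24], "atemp": [0.2879], "hum": [0.81], "windspeed": [0.0]}
)
restored = denormalize_weather(sample)
```

For the first record this gives a temperature of roughly 3.3 degrees Celsius and a humidity of 81 percent.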



In [2]:
import pandas as pd
import numpy as np

import tensorflow as tf

import sklearn
from sklearn.metrics import mean_squared_error, r2_score

In [3]:
df=pd.read_csv("/home/ruchisaboo/Downloads/Bike-Sharing-Dataset/hour.csv")

In [4]:
df.head()

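| | instant | dteday | season | yr | mnth | hr | holiday | weekday | workingday | weathersit | temp | atemp | hum | windspeed | casual | registered | cnt |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2011-01-01 | 1 | 0 | 1 | 0 | 0 | 6 | 0 | 1 | 0.24 | 0.2879 | 0.81 | 0.0 | 3 | 13 | 16 |
| 1 | 2 | 2011-01-01 | 1 | 0 | 1 | 1 | 0 | 6 | 0 | 1 | 0.22 | 0.2727 | 0.8 | 0.0 | 8 | 32 | 40 |
| 2 | 3 | 2011-01-01 | 1 | 0 | 1 | 2 | 0 | 6 | 0 | 1 | 0.22 | 0.2727 | 0.8 | 0.0 | 5 | 27 | 32 |
| 3 | 4 | 2011-01-01 | 1 | 0 | 1 | 3 | 0 | 6 | 0 | 1 | 0.24 | 0.2879 | 0.75 | 0.0 | 3 | 10 | 13 |
| 4 | 5 | 2011-01-01 | 1 | 0 | 1 | 4 | 0 | 6 | 0 | 1 | 0.24 | 0.2879 | 0.75 | 0.0 | 0 | 1 | 1 |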


In [5]:
df.shape

(17379, 17)

In [6]:
# Drop the first column (instant, the record index)
df = df.iloc[:, 1:]
df.shape

(17379, 16)

In [7]:
# Convert dteday to datetime, extract the day of the month, then drop dteday
df['dteday'] = pd.to_datetime(df['dteday'])
df['day'] = pd.DatetimeIndex(df['dteday']).day
df.drop(['dteday'], axis=1, inplace=True)

In [8]:
#Checking null values
df.isnull().sum()

season        0
yr            0
mnth          0
hr            0
holiday       0
weekday       0
workingday    0
weathersit    0
temp          0
atemp         0
hum           0
windspeed     0
casual        0
registered    0
cnt           0
day           0
dtype: int64

In [9]:
#checking datatype
df.dtypes

season          int64
yr              int64
mnth            int64
hr              int64
holiday         int64
weekday         int64
workingday      int64
weathersit      int64
temp          float64
atemp         float64
hum           float64
windspeed     float64
casual          int64
registered      int64
cnt             int64
day             int64
dtype: object

In [10]:
df.columns

Index(['season', 'yr', 'mnth', 'hr', 'holiday', 'weekday', 'workingday',
       'weathersit', 'temp', 'atemp', 'hum', 'windspeed', 'casual',
       'registered', 'cnt', 'day'],
      dtype='object')

In [11]:
category_features = ['season', 'holiday', 'mnth', 'hr', 'weekday', 'workingday', 'weathersit']
number_features = ['temp', 'atemp', 'hum', 'windspeed']

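Convert the categorical columns season, holiday, workingday, and weathersit to dummy variables. The remaining entries of category_features (mnth, hr, weekday) are left as integer codes.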

In [12]:
df1=pd.get_dummies(df['season'],drop_first=True,prefix='season')
df2=pd.get_dummies(df['holiday'],drop_first=True,prefix='holiday')
df3=pd.get_dummies(df['workingday'],drop_first=True,prefix='workingday')
df4=pd.get_dummies(df['weathersit'],drop_first=True,prefix='weathersit')

In [13]:
df.drop(['season','holiday','workingday','weathersit'],axis=1,inplace=True)
df=pd.concat([df,df1,df2,df3,df4],axis=1)
df.head()

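| | yr | mnth | hr | weekday | temp | atemp | hum | windspeed | casual | registered | cnt | day | season_2 | season_3 | season_4 | holiday_1 | workingday_1 | weathersit_2 | weathersit_3 | weathersit_4 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 1 | 0 | 6 | 0.24 | 0.2879 | 0.81 | 0.0 | 3 | 13 | 16 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 1 | 1 | 6 | 0.22 | 0.2727 | 0.8 | 0.0 | 8 | 32 | 40 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 0 | 1 | 2 | 6 | 0.22 | 0.2727 | 0.8 | 0.0 | 5 | 27 | 32 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 0 | 1 | 3 | 6 | 0.24 | 0.2879 | 0.75 | 0.0 | 3 | 10 | 13 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 0 | 1 | 4 | 6 | 0.24 | 0.2879 | 0.75 | 0.0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |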
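
The four separate `get_dummies` calls followed by `drop` and `concat` can also be written as a single call using the `columns` argument of `pd.get_dummies`. The sketch below uses a small toy frame with illustrative values rather than the real data; `toy` and `encoded` are names introduced here. `dtype=int` is passed so the dummies come out as 0/1 integers, as in the output shown above.

```python
import pandas as pd

# Toy frame standing in for df before encoding (values are illustrative only)
toy = pd.DataFrame(
    {
        "season": [1, 2, 3, 4],
        "holiday": [0, 1, 0, 0],
        "workingday": [0, 1, 1, 0],
        "weathersit": [1, 2, 3, 4],
        "cnt": [16, 40, 32, 13],
    }
)

# Encode all four columns at once; the original columns are removed automatically
encoded = pd.get_dummies(
    toy,
    columns=["season", "holiday", "workingday", "weathersit"],
    drop_first=True,  # drop the first level of each column, as in the cells above
    dtype=int,        # 0/1 integers instead of booleans
)
encoded_columns = list(encoded.columns)
```

The untouched columns come first, followed by the dummy columns in the order the source columns were listed.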


In [14]:
# Sequential train/test split: the first 14,000 rows for training, the rest for testing
train=df.iloc[:14000]
test=df.iloc[14000:]

In [15]:
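# Only cnt is dropped here, so casual and registered stay in the features.
# Since cnt includes both casual and registered, the model can reconstruct the target from them.
train_x, test_x = train.drop(['cnt'], axis=1), test.drop(['cnt'], axis=1)
train_y, test_y = train['cnt'], test['cnt']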

In [16]:
train_x.shape, test_x.shape, train_y.shape, test_y.shape

((14000, 19), (3379, 19), (14000,), (3379,))
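
The 19 feature columns still include `casual` and `registered`. The attribute list defines `cnt` as the total of both, so these two columns leak the target, and very high scores on this feature set should be read with that in mind. A minimal sketch of a feature matrix without the leak follows. It uses five rows copied from the `df.head()` output, and the names `toy`, `leaky_columns`, `features`, and `target` are introduced here for illustration.

```python
import pandas as pd

# A few columns from the first five rows of hour.csv, as shown by df.head()
toy = pd.DataFrame(
    {
        "hr": [0, 1, 2, 3, 4],
        "temp": [0.24, 0.22, 0.22, 0.24, 0.24],
        "casual": [3, 8, 5, 3, 0],
        "registered": [13, 32, 27, 10, 1],
        "cnt": [16, 40, 32, 13, 1],
    }
)

# Check that cnt equals casual + registered in these rows
leak_is_exact = bool((toy["casual"] + toy["registered"] == toy["cnt"]).all())

# Drop the target and both leaking columns from the feature matrix
leaky_columns = ["casual", "registered"]
features = toy.drop(columns=["cnt"] + leaky_columns)
target = toy["cnt"]
```

Applying the same drop to `train` and `test` would leave 17 feature columns, so `input_dim` and all reported metrics would need to be recomputed.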

### Define model

In [30]:
model1 = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(),  # no-op for 2-D tabular input
    tf.keras.layers.Dense(512, input_dim=19, activation=tf.nn.relu),  # 19 input features
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(1)  # single linear output for regression
])


In [31]:
optimiser = tf.keras.optimizers.Adam()
model1.compile(optimizer=optimiser, loss='mean_squared_error', metrics=['mean_squared_error'])

In [32]:
model1.fit(train_x, train_y, epochs=20)

Epoch 1/20


To change all layers to have dtype float64 by default, call `tf.keras.backend.set_floatx('float64')`. To change just this layer, pass dtype='float64' to the layer constructor. If you are the author of this layer, you can disable autocasting by passing autocast=False to the base Layer constructor.

Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


<tensorflow.python.keras.callbacks.History at 0x7f42d80c9cf8>

### Predict on the Test Data and Compute Evaluation Metrics

In [33]:
pred_train= model1.predict(train_x)
pred= model1.predict(test_x)

In [34]:
# np.sqrt of the MSE gives the root mean squared error (RMSE)
print('Root mean squared error for training data : ', np.sqrt(mean_squared_error(train_y, pred_train)))
print('Root mean squared error for testing data : ', np.sqrt(mean_squared_error(test_y, pred)))

Root mean squared error for training data :  2.4999572198855557
Root mean squared error for testing data :  3.1310356362561134
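
Because the cell above takes `np.sqrt` of the mean squared error, the reported values are root mean squared errors, expressed in the same units as `cnt`. The computation can be sketched directly in NumPy. The `rmse` helper and the sample values are illustrative and not part of the notebook. Inputs are flattened first, since Keras predictions have shape `(n, 1)` while the targets have shape `(n,)`, and subtracting them without flattening would broadcast to an `(n, n)` array.

```python
import numpy as np


def rmse(y_true, y_pred) -> float:
    """Root mean squared error; inputs are flattened so (n,) and (n, 1) shapes align."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Illustrative values only; predictions are shaped (n, 1) like Keras output
y_true = np.array([16.0, 40.0, 32.0, 13.0])
y_pred = np.array([[14.0], [40.0], [32.0], [15.0]])
score = rmse(y_true, y_pred)
```

Here the squared errors are 4, 0, 0, and 4, so the mean is 2 and the RMSE is the square root of 2.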


In [35]:
print("R2 score for training data : ",r2_score(train_y,pred_train))
print("R2 score for testing data : ",r2_score(test_y,pred))

R2 score for training data :  0.9997779448853589
R2 score for testing data :  0.9997974491141796
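
For reference, the R2 score compares the model's squared error with that of a constant prediction equal to the mean of the targets: 1 minus the ratio of the residual sum of squares to the total sum of squares. A minimal NumPy sketch, assuming non-constant targets, is given below. The `r2` helper and the sample values are illustrative, while the notebook itself uses `sklearn.metrics.r2_score`.

```python
import numpy as np


def r2(y_true, y_pred) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot (assumes non-constant y_true)."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return float(1.0 - ss_res / ss_tot)


y_true = np.array([1.0, 2.0, 3.0])
perfect = r2(y_true, y_true)                           # exact predictions
mean_baseline = r2(y_true, np.full(3, y_true.mean()))  # always predict the mean
```

Exact predictions score 1 and the mean baseline scores 0, so a value near 1 means almost all of the variance in `cnt` is explained by the inputs.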
