# Bike Sharing Assignment

## Introduction
#### A US bike-sharing provider BoomBikes has recently suffered considerable dips in their revenues due to the ongoing Corona pandemic. They want to understand the factors affecting the demand for these shared bikes in the American market. 
#### The company wants to know:
 - Which variables are significant in predicting the demand for shared bikes.
 - How well those variables describe the bike demand
 
#### Model the demand for shared bikes with the available independent variables. It will be used by the management to understand how exactly the demands vary with different features. They can accordingly manipulate the business strategy to meet the demand levels and meet the customer's expectations.

### Action Items:
- Step 1: Importing required libraries
- Step 2: Reading and Understanding the Data
- Step 3: Visualising the Data
- Step 4: Data Preparation
- Step 5: Splitting the Data into Training and Testing Sets
- Step 6: Building a linear model
- Step 7: Residual Analysis of the train data
- Step 8: Making predictions on test data and evaluating
- Step 9: Model Evaluation
- Step 10: Suggestions and Inferences

## Step 1: Importing required libraries

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

import sklearn

import statsmodels.api as sm

import warnings
warnings.filterwarnings('ignore')

## Step 2: Reading and Understanding the Data

In [None]:
day = pd.read_csv('../input/bike-sharing-dataset/day.csv')
day.head()

In [None]:
day.shape

In [None]:
day.info()

### Finding the numerical variables

In [None]:
day_desc = day.describe()
day_desc_cols = day_desc.columns
day_desc

## Step 3: Visualising the Data

In [None]:
day_desc_cols

In [None]:
plt.figure(figsize = (16, 16))
sns.pairplot(day[day_desc_cols])
plt.show()

In [None]:
plt.figure(figsize = (16, 10))
# correlate numeric columns only; 'dteday' is a date string
sns.heatmap(day.select_dtypes('number').corr(), annot = True, cmap="YlGnBu")
plt.show()

### Trend set by number of bikes rented

In [None]:
g=sns.FacetGrid(day, hue='yr', palette='coolwarm',height=6,aspect=2)
g=g.map(plt.hist,'cnt',alpha=0.5, edgecolor='black')
plt.legend()

### Number of bikes rented per season in years 2018 and 2019.

In [None]:
plt.figure(figsize=(11,5))
# barplot shows the mean of 'casual' per group, so the bars are daily averages
sns.barplot(x='yr', y='casual', hue='season', data=day, palette='rainbow', ci=None)
plt.xlabel('Year')
plt.ylabel('Mean number of bikes rented per day on Casual basis')
plt.title('Casual bike rentals per season')

##### We can clearly see that, on average, more bikes were rented on a casual basis in 2019 than in 2018 across all seasons

### User trends across the seasons for working days and non-working days

In [None]:
df_season_winter=day[day['season']==4]
df_season_fall=day[day['season']==3]
df_season_summer=day[day['season']==2]
df_season_spring=day[day['season']==1]

In [None]:
# print each count; a bare expression would only display the last one
print(df_season_winter.mnth.nunique())
print(df_season_fall.mnth.nunique())
print(df_season_summer.mnth.nunique())
print(df_season_spring.mnth.nunique())

In [None]:
# catplot with kind='point' replaces the older factorplot
sns.catplot(x='mnth', y='cnt', hue='workingday', data=df_season_winter, kind='point', ci=None, palette='Set1')
sns.catplot(x='mnth', y='cnt', hue='workingday', data=df_season_fall, kind='point', ci=None, palette='Set1')
sns.catplot(x='mnth', y='cnt', hue='workingday', data=df_season_summer, kind='point', ci=None, palette='Set1')
sns.catplot(x='mnth', y='cnt', hue='workingday', data=df_season_spring, kind='point', ci=None, palette='Set1')

### Bike rentals with respect to weather and climate

In [None]:
# catplot with kind='point' replaces the older factorplot
sns.catplot(x='mnth', y='cnt', hue='weathersit', data=df_season_winter, kind='point', ci=None, palette='Set1')
sns.catplot(x='mnth', y='cnt', hue='weathersit', data=df_season_fall, kind='point', ci=None, palette='Set1')
sns.catplot(x='mnth', y='cnt', hue='weathersit', data=df_season_summer, kind='point', ci=None, palette='Set1')
sns.catplot(x='mnth', y='cnt', hue='weathersit', data=df_season_spring, kind='point', ci=None, palette='Set1')

In [None]:
sns.lmplot(x='temp', y='cnt', row='workingday', col='season', data=day, palette='RdBu_r', fit_reg=False)

In [None]:
plt.figure(figsize=(20,5))
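corr = day.select_dtypes('number').corr()   # numeric columns only
mask = np.zeros_like(corr, dtype=bool)      # np.bool was removed from NumPy; use the builtin bool
mask[np.triu_indices_from(mask)] = True     # hide the diagonal and the upper triangle
sns.heatmap(corr, cmap='RdBu_r', mask=mask, annot=True)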
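
The masking step can be easier to follow without the plot. The sketch below uses a small made-up frame (`toy` and its values are illustrative, not taken from the dataset) and prints the part of the correlation matrix that a masked heatmap would still display.

```python
import numpy as np
import pandas as pd

# Small made-up frame standing in for the numeric columns of `day`
toy = pd.DataFrame({
    "temp": [0.2, 0.4, 0.6, 0.8],
    "hum": [0.9, 0.7, 0.6, 0.3],
    "cnt": [100, 250, 300, 480],
})
corr = toy.corr()

# True marks cells to hide: the diagonal and everything above it
mask = np.zeros(corr.shape, dtype=bool)
mask[np.triu_indices_from(mask)] = True

# Keep only the unmasked (strictly lower-triangle) correlations
lower_only = corr.where(~mask)
print(lower_only.round(2))
```

Since a correlation matrix is symmetric, the lower triangle already contains every distinct pair once.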

In [None]:
day.plot(x='dteday', y='cnt')

In [None]:
plt.figure(figsize = (16, 16))
sns.pairplot(day)
plt.show()

In [None]:
plt.figure(figsize=(20, 12))
plt.subplot(2,3,1)
sns.boxplot(x = 'season', y = 'cnt', data = day)
plt.subplot(2,3,2)
sns.boxplot(x = 'yr', y = 'cnt', data = day)
plt.subplot(2,3,3)
sns.boxplot(x = 'mnth', y = 'cnt', data = day)
plt.subplot(2,3,4)
sns.boxplot(x = 'holiday', y = 'cnt', data = day)
plt.subplot(2,3,5)
sns.boxplot(x = 'weekday', y = 'cnt', data = day)
plt.subplot(2,3,6)
sns.boxplot(x = 'weathersit', y = 'cnt', data = day)
plt.show()

##### Inferences:
- Spring has very low bike rental numbers.
- 2019 has a higher number of rentals than 2018.
- January to March have low numbers of bike rentals.
- Bike rentals on holidays are lower than on non-holidays.
- The box plot shows no rentals for heavy rain/thunderstorm, which may simply mean that no such days occur in the data, and rentals drop significantly during light rain/snow.
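
These inferences are read off the box plots, and a grouped summary can back them up with numbers. The sketch below runs on a tiny made-up frame (the values in `toy` are illustrative only); on the real data the same `groupby` calls would be applied to `day`.

```python
import pandas as pd

# Made-up rows using the same column names as the dataset
toy = pd.DataFrame({
    "season": [1, 1, 2, 2, 3, 3, 4, 4],
    "yr":     [0, 1, 0, 1, 0, 1, 0, 1],
    "cnt":    [1200, 2600, 3500, 6000, 4200, 7000, 3000, 5200],
})
season_names = {1: "spring", 2: "summer", 3: "fall", 4: "winter"}

# Median rentals per season (lowest first) and per year
median_by_season = (
    toy.groupby(toy["season"].map(season_names))["cnt"].median().sort_values()
)
median_by_year = toy.groupby("yr")["cnt"].median()

print(median_by_season)
print(median_by_year)
```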

In [None]:
day.describe()

## Step 4: Data Preparation

### 1. Finding the number of null values

In [None]:
day.isnull().sum()

##### The dataset has no null values, so it appears to be pre-cleaned.

### 2. Categorical Variables - Dummy/Binary encoding

#### i. Season

In [None]:
day['season'] = day['season'].map({1:"spring", 2:"summer", 3:"fall", 4:"winter"})
season = pd.get_dummies(day['season'], drop_first = True)

In [None]:
day = pd.concat([day, season], axis = 1)

In [None]:
day.drop('season',axis=1,inplace=True)

In [None]:
day.head()
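
It may help to see which level `drop_first = True` removes. With string labels, `pd.get_dummies` orders the categories alphabetically, so 'fall' is dropped and becomes the baseline that the other season coefficients are compared against. A minimal sketch on a made-up series (the `dtype=int` argument is added here only to print 0/1 values):

```python
import pandas as pd

# Made-up season codes, mapped to labels the same way as above
codes = pd.Series([1, 2, 3, 4, 1, 3], name="season")
labels = codes.map({1: "spring", 2: "summer", 3: "fall", 4: "winter"})

# drop_first removes the alphabetically first level ('fall')
dummies = pd.get_dummies(labels, drop_first=True, dtype=int)

print(dummies.columns.tolist())
print(dummies)
```

A row with all three dummies equal to 0 is therefore a 'fall' day.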

#### ii. Month

In [None]:
day['mnth'] = day['mnth'].map({1:"Jan", 2:"Feb", 3:"Mar", 4:"Apr", 5:"May", 6:"Jun", 7:"Jul", 8:"Aug", 9:"Sep", 10:"Oct", 11:"Nov", 12:"Dec"})
month = pd.get_dummies(day['mnth'], drop_first = True)

In [None]:
day = pd.concat([day, month], axis = 1)

In [None]:
day.drop('mnth',axis=1,inplace=True)

In [None]:
day.head()

#### iii. Weekday

In [None]:
dmap = {1:'Mon',2:'Tue',3:'Wed',4:'Thu',5:'Fri',6:'Sat',0:'Sun'}
# map the numeric codes straight to day names
day['weekday'] = day['weekday'].map(dmap)

weekday = pd.get_dummies(day['weekday'], drop_first = True)

In [None]:
day = pd.concat([day, weekday], axis = 1)

In [None]:
day.drop('weekday',axis=1,inplace=True)

In [None]:
day.head()

#### iv. Weather

In [None]:
day['weathersit'] = day['weathersit'].map({1:"Clear", 2:"Cloudy", 3:"Snow/Rain", 4:"Heavy Rain"})
weathersit = pd.get_dummies(day['weathersit'], drop_first = True)

In [None]:
day = pd.concat([day, weathersit], axis = 1)

In [None]:
day.drop('weathersit',axis=1,inplace=True)

In [None]:
day.head()

### 3. Rescaling the features

In [None]:
from sklearn.preprocessing import MinMaxScaler

In [None]:
scaler = MinMaxScaler()

num_vars = ['temp', 'atemp', 'hum', 'windspeed', 'casual', 'registered', 'cnt']

In [None]:
day[num_vars] = scaler.fit_transform(day[num_vars])
day.head()

### 4. Dropping unrequired columns

##### The data source contains columns that should not enter our linear model: 'instant' is only a record index, 'dteday' is a raw date string, and 'casual' and 'registered' add up to the target 'cnt'

In [None]:
day.drop(['instant', 'dteday', 'casual', 'registered'], axis=1, inplace=True)

### 5. Splitting data into test and train

In [None]:
# importing libraries for splitting

import sklearn
from sklearn.model_selection import train_test_split

In [None]:
df_train,df_test = train_test_split(day, train_size=0.7, random_state=100)
print(df_train.shape)
print(df_test.shape)

### 6. Creating the Correlation Matrix (Pearson's)

In [None]:
plt.figure(figsize = (20, 15))
sns.heatmap(df_train.corr(), annot = True, cmap="YlGnBu")
plt.show()

## Step 5: Splitting the Data into Training and Testing Sets

#### Since we want to model the total number of bike rentals, we select 'cnt' as our target variable

In [None]:
y_train = df_train.pop('cnt')
X_train = df_train

## Step 6: Building a linear model

### 1. Adding variables to the model

#### Model 1: Adding variable 'temp' to the model

In [None]:
X_train_sm = sm.add_constant(X_train['temp'])

lr = sm.OLS(y_train, X_train_sm)

lr_model = lr.fit()

lr_model.params

In [None]:
lr_model.summary()

##### We can see that the R-Squared value of 0.416 is fairly high for a single predictor, which tells us that we are moving in the right direction.
Further, we can infer that people are more likely to rent a bike at higher temperatures.
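
For reference, the R-Squared reported in the summary is the share of the variance of the target that the fitted line explains, `1 - SS_res / SS_tot`. The sketch below reproduces that calculation with NumPy on synthetic data (the generated `temp` and `cnt` values and the coefficients 0.2 and 0.6 are assumptions for illustration, not results from the dataset).

```python
import numpy as np

# Synthetic stand-ins for a scaled predictor and target
rng = np.random.default_rng(0)
temp = rng.uniform(0, 1, size=200)
cnt = 0.2 + 0.6 * temp + rng.normal(0, 0.15, size=200)

# Least-squares fit with an intercept column (the role of sm.add_constant)
X = np.column_stack([np.ones_like(temp), temp])
coef, *_ = np.linalg.lstsq(X, cnt, rcond=None)
fitted = X @ coef

# R-squared = 1 - residual sum of squares / total sum of squares
ss_res = np.sum((cnt - fitted) ** 2)
ss_tot = np.sum((cnt - cnt.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

print("intercept, slope:", coef.round(3))
print("R-squared:", round(r_squared, 3))
```

With a single predictor and an intercept, this value equals the squared Pearson correlation between the predictor and the target.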

#### Model 2: Adding variable 'atemp' to the model

In [None]:
#adding another variable
X_train_sm = X_train[['temp', 'atemp']]

X_train_sm = sm.add_constant(X_train_sm)

lr = sm.OLS(y_train, X_train_sm)

lr_model = lr.fit()

lr_model.summary()

##### We can see that the R-Squared value of 0.421 is fairly high.
But the p-value for the variable 'temp' is quite high, which tells us that it has low significance once 'atemp' is in the model. This is expected, because 'temp' and 'atemp' are strongly correlated with each other.

#### Model 3: Using only variable 'atemp' in the model

In [None]:
#using only 'atemp'
X_train_sm = X_train[['atemp']]

X_train_sm = sm.add_constant(X_train_sm)

lr = sm.OLS(y_train, X_train_sm)

lr_model = lr.fit()

lr_model.summary()

##### We can see that the R-Squared value of 0.420 is fairly high.
Judging from the p-values, we can say that the variable is significant. Further, 'atemp' describes how the temperature **feels** to the user. This gives us the understanding that bike rentals go up depending on how the temperature feels rather than on the temperature itself.

#### Model 4: Adding variable 'yr' to the model

Since the variable 'yr' takes only two values, 0 and 1, for the years 2018 and 2019, we can read **'yr'** as the number of years that the brand 'BoomBikes' has been operating.

In [None]:
#adding another variable
X_train_sm = X_train[['atemp', 'yr']]

X_train_sm = sm.add_constant(X_train_sm)

lr = sm.OLS(y_train, X_train_sm)

lr_model = lr.fit()

lr_model.summary()

##### We can see that the R-Squared value of 0.697 is high.
This further goes to show that there is a year on year growth


#### Model 6: Adding variable 'holiday' to the model

In [None]:
#adding another variable
X_train_sm = X_train[['atemp', 'yr', 'holiday']]

X_train_sm = sm.add_constant(X_train_sm)

lr = sm.OLS(y_train, X_train_sm)

lr_model = lr.fit()

lr_model.summary()

##### We can see that the R-Squared value of 0.699 is high, but it has not changed significantly from the previous model. However, 'holiday' has a low p-value, so we can say that it is significant to the model.


#### Model 7: Adding variable 'windspeed' to the model

In [None]:
#adding another variable
X_train_sm = X_train[['atemp', 'yr', 'holiday', 'windspeed']]

X_train_sm = sm.add_constant(X_train_sm)

lr = sm.OLS(y_train, X_train_sm)

lr_model = lr.fit()

lr_model.summary()

##### The R-Squared value rises to 0.715 and the p-value of 'windspeed' is 0.0, so 'windspeed' contributes to the model and is highly significant

#### Model 8: Adding variable 'Snow/Rain' to the model

In [None]:
#adding another variable
X_train_sm = X_train[['atemp', 'yr', 'holiday', 'windspeed', 'Snow/Rain']]

X_train_sm = sm.add_constant(X_train_sm)

lr = sm.OLS(y_train, X_train_sm)

lr_model = lr.fit()

lr_model.summary()

##### 'Snow/Rain' has a negative coefficient. The R-Squared value of 0.741 and the p-value of 0.0 show that 'Snow/Rain' contributes to the model and is highly significant

#### Model 9: Adding variable 'summer' to the model

In [None]:
#adding another variable
X_train_sm = X_train[['atemp', 'yr', 'holiday', 'windspeed', 'Snow/Rain', 'summer']]

X_train_sm = sm.add_constant(X_train_sm)

lr = sm.OLS(y_train, X_train_sm)

lr_model = lr.fit()

lr_model.summary()

##### 'summer' has a positive coefficient. The R-Squared value of 0.745 and the p-value of 0.01 show that 'summer' contributes to the model and is significant

#### Model 10: Adding variable 'spring' to the model

In [None]:
#adding another variable
X_train_sm = X_train[['atemp', 'yr', 'holiday', 'windspeed', 'Snow/Rain', 'summer', 'spring']]

X_train_sm = sm.add_constant(X_train_sm)

lr = sm.OLS(y_train, X_train_sm)

lr_model = lr.fit()

lr_model.summary()

##### 'spring' has a negative coefficient. The R-Squared value of 0.79 and the p-value of 0.0 show that 'spring' contributes to the model and is highly significant

#### Model 11: Adding variables 'winter' and 'Cloudy' to the model

In [None]:
#adding another variable
X_train_sm = X_train[['atemp', 'yr', 'holiday', 'windspeed', 'Snow/Rain', 'summer', 'spring', 'winter', 'Cloudy']]

X_train_sm = sm.add_constant(X_train_sm)

lr = sm.OLS(y_train, X_train_sm)

lr_model = lr.fit()

lr_model.summary()

##### 'Cloudy' has a negative coefficient. The R-Squared value of 0.819 and the p-value of 0.0 show that 'Cloudy', added here together with 'winter', contributes to the model and is highly significant

### 2. Adding all variables

In [None]:
cols = X_train.columns
#X_train[cols]

In [None]:
X_train_sm1 = (X_train[cols])

X_train_sm1 = sm.add_constant(X_train_sm1)

lr1 = sm.OLS(y_train, X_train_sm1)

lr_model1 = lr1.fit()

lr_model1.summary()

### 3. Checking VIF

In [None]:
from statsmodels.stats.outliers_influence import variance_inflation_factor

In [None]:
vif = pd.DataFrame()
vif['Features'] = X_train.columns
vif['VIF'] = [variance_inflation_factor(X_train.values, i) for i in range(X_train.shape[1])]
vif['VIF'] = round(vif['VIF'], 2)
vif = vif.sort_values(by = "VIF", ascending = False)
vif
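
As a reminder of what the table measures: the VIF of a feature is `1 / (1 - R²)`, where R² comes from regressing that feature on all the other features. The NumPy sketch below is one way to compute it, assuming an intercept is included in each auxiliary regression; the `variance_inflation_factor` call above works on the matrix exactly as passed, so its numbers can differ when no constant column is present. The synthetic `temp`, `atemp` and `windspeed` arrays are made up for illustration.

```python
import numpy as np

def vif(X):
    """VIF per column: 1 / (1 - R^2) from regressing it on the other columns plus an intercept."""
    X = np.asarray(X, dtype=float)
    out = []
    for i in range(X.shape[1]):
        y = X[:, i]
        others = np.delete(X, i, axis=1)
        A = np.column_stack([np.ones(len(y)), others])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

# Synthetic features: 'atemp' is almost a copy of 'temp', 'windspeed' is unrelated
rng = np.random.default_rng(1)
temp = rng.uniform(0, 1, 300)
atemp = temp + rng.normal(0, 0.02, 300)
windspeed = rng.uniform(0, 1, 300)

vif_values = vif(np.column_stack([temp, atemp, windspeed]))
print(vif_values.round(1))
```

The two near-duplicate columns get very large values while the unrelated one stays close to 1, which is the pattern that motivates dropping one of 'temp' and 'atemp'.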

### 4. Dropping variables based on p-value and VIF

In [None]:
X = X_train.drop('Mar', axis=1)

In [None]:
X_train_lm = sm.add_constant(X)

lr_2 = sm.OLS(y_train, X_train_lm).fit()

print(lr_2.summary())

In [None]:
X = X.drop('Thu', axis=1)

In [None]:
X_train_lm = sm.add_constant(X)

lr_3 = sm.OLS(y_train, X_train_lm).fit()

print(lr_3.summary())

In [None]:
X = X.drop('Oct', axis=1)

In [None]:
X_train_lm = sm.add_constant(X)

lr_4 = sm.OLS(y_train, X_train_lm).fit()

print(lr_4.summary())

In [None]:
X = X.drop('Jun', axis=1)

In [None]:
X_train_lm = sm.add_constant(X)

lr_5 = sm.OLS(y_train, X_train_lm).fit()

print(lr_5.summary())

In [None]:
X = X.drop('atemp', axis=1)

In [None]:
X_train_lm = sm.add_constant(X)

lr_6 = sm.OLS(y_train, X_train_lm).fit()

print(lr_6.summary())

In [None]:
vif = pd.DataFrame()
vif['Features'] = X.columns
vif['VIF'] = [variance_inflation_factor(X.values, i) for i in range(X.shape[1])]
vif['VIF'] = round(vif['VIF'], 2)
vif = vif.sort_values(by = "VIF", ascending = False)
vif

In [None]:
X = X.drop('workingday', axis=1)

In [None]:
X_train_lm = sm.add_constant(X)

lr_7 = sm.OLS(y_train, X_train_lm).fit()

print(lr_7.summary())

In [None]:
vif = pd.DataFrame()
vif['Features'] = X.columns
vif['VIF'] = [variance_inflation_factor(X.values, i) for i in range(X.shape[1])]
vif['VIF'] = round(vif['VIF'], 2)
vif = vif.sort_values(by = "VIF", ascending = False)
vif

In [None]:
X = X.drop('Sat', axis=1)

In [None]:
X_train_lm = sm.add_constant(X)

lr_8 = sm.OLS(y_train, X_train_lm).fit()

print(lr_8.summary())

In [None]:
vif = pd.DataFrame()
vif['Features'] = X.columns
vif['VIF'] = [variance_inflation_factor(X.values, i) for i in range(X.shape[1])]
vif['VIF'] = round(vif['VIF'], 2)
vif = vif.sort_values(by = "VIF", ascending = False)
vif

In [None]:
X = X.drop('Sun', axis=1)

In [None]:
X_train_lm = sm.add_constant(X)

lr_9 = sm.OLS(y_train, X_train_lm).fit()

print(lr_9.summary())



In [None]:
vif = pd.DataFrame()
vif['Features'] = X.columns
vif['VIF'] = [variance_inflation_factor(X.values, i) for i in range(X.shape[1])]
vif['VIF'] = round(vif['VIF'], 2)
vif = vif.sort_values(by = "VIF", ascending = False)
vif

In [None]:
X = X.drop('hum', axis=1)

In [None]:
X_train_lm = sm.add_constant(X)

lr_10 = sm.OLS(y_train, X_train_lm).fit()

print(lr_10.summary())

In [None]:
vif = pd.DataFrame()
vif['Features'] = X.columns
vif['VIF'] = [variance_inflation_factor(X.values, i) for i in range(X.shape[1])]
vif['VIF'] = round(vif['VIF'], 2)
vif = vif.sort_values(by = "VIF", ascending = False)
vif

In [None]:
X = X.drop('Aug', axis=1)

In [None]:
X_train_lm = sm.add_constant(X)

lr_10 = sm.OLS(y_train, X_train_lm).fit()

print(lr_10.summary())

In [None]:
vif = pd.DataFrame()
vif['Features'] = X.columns
vif['VIF'] = [variance_inflation_factor(X.values, i) for i in range(X.shape[1])]
vif['VIF'] = round(vif['VIF'], 2)
vif = vif.sort_values(by = "VIF", ascending = False)
vif

In [None]:
X = X.drop('May', axis=1)

In [None]:
X_train_lm = sm.add_constant(X)

lr_10 = sm.OLS(y_train, X_train_lm).fit()

print(lr_10.summary())

In [None]:
vif = pd.DataFrame()
vif['Features'] = X.columns
vif['VIF'] = [variance_inflation_factor(X.values, i) for i in range(X.shape[1])]
vif['VIF'] = round(vif['VIF'], 2)
vif = vif.sort_values(by = "VIF", ascending = False)
vif

In [None]:
X = X.drop('temp', axis=1)

In [None]:
X_train_lm = sm.add_constant(X)

lr_11 = sm.OLS(y_train, X_train_lm).fit()

print(lr_11.summary())

In [None]:
vif = pd.DataFrame()
vif['Features'] = X.columns
vif['VIF'] = [variance_inflation_factor(X.values, i) for i in range(X.shape[1])]
vif['VIF'] = round(vif['VIF'], 2)
vif = vif.sort_values(by = "VIF", ascending = False)
vif

In [None]:
X = X.drop('winter', axis=1)

In [None]:
X_train_lm = sm.add_constant(X)

lr_12 = sm.OLS(y_train, X_train_lm).fit()

print(lr_12.summary())

In [None]:
X = X.drop('Jul', axis=1)

In [None]:
X_train_lm = sm.add_constant(X)

lr_13 = sm.OLS(y_train, X_train_lm).fit()

print(lr_13.summary())

In [None]:
vif = pd.DataFrame()
vif['Features'] = X.columns
vif['VIF'] = [variance_inflation_factor(X.values, i) for i in range(X.shape[1])]
vif['VIF'] = round(vif['VIF'], 2)
vif = vif.sort_values(by = "VIF", ascending = False)
vif

#### Since we see low VIF as well as low P-Values, we can conclude that the model - 'lr_13' can be used as our final model

## Step 7: Residual Analysis of the train data

In [None]:
y_train_pred = lr_13.predict(X_train_lm)

In [None]:
# Plot the histogram of the error terms
fig = plt.figure()
sns.distplot((y_train - y_train_pred), bins = 20)
fig.suptitle('Error Terms', fontsize = 20)                  # Plot heading 
plt.xlabel('Errors', fontsize = 18)                         # X-label

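#### Qualitatively, the distribution of the error terms is centered around 0, which is consistent with the zero-mean error assumption of linear regression. Therefore, we can accept this model.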
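
One caveat worth keeping in mind: for a least-squares fit with an intercept, the training residuals average to zero by construction, so the centre of the histogram alone says little. A complementary check is to look for structure in the residuals along a predictor. The sketch below illustrates this on synthetic data where the true relationship is deliberately curved (all values and names here are assumptions for illustration).

```python
import numpy as np

# Synthetic data with a deliberately non-linear relationship
rng = np.random.default_rng(2)
x = rng.uniform(0, 1, 150)
y = 0.3 + 0.5 * x ** 2 + rng.normal(0, 0.05, 150)

# Straight-line least-squares fit with an intercept
A = np.column_stack([np.ones_like(x), x])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
resid = y - A @ coef

# The overall mean is ~0 regardless of how well the line fits
print("mean residual:", resid.mean())

# Binned means reveal leftover curvature that the histogram centre hides
low_mean = resid[x < 0.2].mean()
mid_mean = resid[(x > 0.4) & (x < 0.6)].mean()
print("mean residual for x < 0.2:", round(low_mean, 3))
print("mean residual for 0.4 < x < 0.6:", round(mid_mean, 3))
```

If the binned means stay close to zero across the range of each predictor, the linear form is more plausible.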

## Step 8: Making predictions on test data and evaluating

#### 1. Applying the scaling on the test sets

In [None]:
df_test

In [None]:
df_test.describe()

In [None]:
cols = df_test.describe().columns

In [None]:
cols

In [None]:
num_vars = ['temp', 'atemp', 'hum', 'windspeed', 'cnt']

In [None]:
df_test[num_vars] = scaler.fit_transform(df_test[num_vars])

In [None]:
df_test.describe()
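
A common alternative to re-fitting the scaler on the test rows is to reuse the minimum and maximum learned from the training rows, so that both sets are on the same scale. The sketch below writes the min-max formula out with pandas on made-up frames (`train`, `test` and their values are hypothetical); it mirrors calling `fit` on the training data and `transform` on the test data with `MinMaxScaler`.

```python
import pandas as pd

# Made-up training and test rows with two numeric features
train = pd.DataFrame({"temp": [2.0, 10.0, 18.0, 30.0],
                      "windspeed": [5.0, 12.0, 20.0, 25.0]})
test = pd.DataFrame({"temp": [6.0, 34.0],
                     "windspeed": [10.0, 15.0]})

# Statistics come from the training rows only
col_min = train.min()
col_range = train.max() - train.min()

# x_scaled = (x - min_train) / (max_train - min_train)
train_scaled = (train - col_min) / col_range
test_scaled = (test - col_min) / col_range

print(test_scaled)
```

Test values can then fall outside [0, 1] when they lie beyond the training range, which is expected and harmless for a linear model.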

In [None]:
y_test = df_test.pop('cnt')
X_test = df_test

For lr_10

In [None]:
X_test_m10 = sm.add_constant(X_test)

In [None]:
X_test_m10 = X_test_m10.drop(["Mar", "Thu", "Oct", "Jun", "atemp", "workingday", "Sat", "Sun", "hum", "Aug", "May"], axis = 1)

In [None]:
y_pred_m10 = lr_10.predict(X_test_m10)

For lr_13

In [None]:
X_test_m13 = sm.add_constant(X_test)

In [None]:
X_test_m13 = X_test_m13.drop(["Mar", "Thu", "Oct", "Jun", "atemp", "workingday", "Sat", "Sun", "hum", "Aug", "May", "temp", "winter", "Jul"], axis = 1)

In [None]:
y_pred_m13 = lr_13.predict(X_test_m13)
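
Listing every dropped column by hand is easy to get out of sync with the training steps. One option is to select the test columns by the column list of the final training matrix, for example `sm.add_constant(X_test)[X_train_lm.columns]`. The sketch below shows the idea with pandas only; `final_train_cols` and `X_test_toy` are hypothetical stand-ins, not the actual columns of lr_13.

```python
import pandas as pd

# Hypothetical columns kept by a final training matrix (including the constant)
final_train_cols = ["const", "yr", "holiday", "windspeed", "spring"]

# Hypothetical test frame that still carries columns the model no longer uses
X_test_toy = pd.DataFrame({
    "yr": [1, 0], "holiday": [0, 1], "temp": [0.5, 0.3],
    "windspeed": [0.2, 0.4], "spring": [0, 1], "Jul": [1, 0],
})
X_test_toy.insert(0, "const", 1.0)   # the role of sm.add_constant

# Keep exactly the training columns, in the training order
X_test_aligned = X_test_toy[final_train_cols]

print(X_test_aligned.columns.tolist())
```

Selecting by name also raises a `KeyError` if a training column is missing from the test frame, which surfaces mismatches early.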

## Step 9: Model Evaluation

### 1. Plotting y_test and y_pred to understand the spread

#### i.  For lr_10

In [None]:
fig = plt.figure()
plt.scatter(y_test, y_pred_m10)
fig.suptitle('y_test vs y_pred', fontsize = 20)              # Plot heading 
plt.xlabel('y_test', fontsize = 18)                          # X-label
plt.ylabel('y_pred', fontsize = 16)    

#### ii.  For lr_13

In [None]:
fig = plt.figure()
plt.scatter(y_test, y_pred_m13)
fig.suptitle('y_test vs y_pred', fontsize = 20)              # Plot heading 
plt.xlabel('y_test', fontsize = 18)                          # X-label
plt.ylabel('y_pred', fontsize = 16)    

### 2. R2 Score and MSE

#### i. For lr_10

In [None]:
from sklearn.metrics import mean_squared_error, r2_score

r_squared = r2_score(y_test, y_pred_m10)

mse = mean_squared_error(y_test, y_pred_m10)

In [None]:
print('Mean Squared Error :' ,mse)
print('R Square Value :',r_squared)

#### Mean Squared Error : 0.010664552891617373
#### R Square Value : 0.8132258474643972

#### ii. For lr_13

In [None]:
from sklearn.metrics import mean_squared_error, r2_score

r_squared = r2_score(y_test, y_pred_m13)

mse = mean_squared_error(y_test, y_pred_m13)

In [None]:
print('Mean Squared Error :' ,mse)
print('R Square Value :',r_squared)

#### Mean Squared Error : 0.012460755921949618
#### R Square Value : 0.7817679605579628
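
For reference, both metrics are short formulas. Because 'cnt' was min-max scaled, the MSE values above are in squared scaled units rather than squared rental counts. The sketch below writes the two formulas out with NumPy on made-up numbers (`y_true` and `y_pred` are illustrative only).

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean of the squared prediction errors."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.mean((y_true - y_pred) ** 2)

def r2(y_true, y_pred):
    """1 - residual sum of squares / total sum of squares around the mean of y_true."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1 - ss_res / ss_tot

# Made-up scaled targets and predictions
y_true = np.array([0.2, 0.5, 0.7, 0.9])
y_pred = np.array([0.25, 0.45, 0.65, 0.95])

print("MSE:", mse(y_true, y_pred))
print("R2 :", r2(y_true, y_pred))
```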

### 3. Actual vs Predicted

For lr_10

In [None]:
c = list(range(1, len(y_test) + 1))   # one index per test row
fig = plt.figure(figsize = (100, 15))
plt.plot(c,y_test, color="#3498DB", linewidth=5, linestyle="-")
plt.plot(c,y_pred_m10, color="#EC7063",  linewidth=5, linestyle="-")   # predictions of lr_10
fig.suptitle('Actual and Predicted', fontsize=20)              # Plot heading 
plt.xlabel('Index', fontsize=18)                               # X-label
plt.ylabel('Count', fontsize=16)                               # Y-label

For lr_13

In [None]:
c = list(range(1, len(y_test) + 1))   # one index per test row
fig = plt.figure(figsize = (100, 15))
plt.plot(c,y_test, color="#3498DB", linewidth=5, linestyle="-")
plt.plot(c,y_pred_m13, color="#EC7063",  linewidth=5, linestyle="-")
fig.suptitle('Actual and Predicted', fontsize=20)              # Plot heading 
plt.xlabel('Index', fontsize=18)                               # X-label
plt.ylabel('Count', fontsize=16)                               # Y-label

##### From the above graph, we can see that the predicted values closely track the actual values. This is an indicator that our model is sound.

## Step 10: Suggestions and Inferences:
#### We select lr_10 and lr_13 models

In [None]:
print(lr_10.summary())

In [None]:
print(lr_13.summary())

#### From the above validated model, we can make some generalisations. The insights are as follows:
1. Users tend not to rent and use bikes during Snow or Rain.
2. Users generally prefer riding bikes when the temperature is higher.
3. Bike rentals on holidays are lower than on working days.
4. Windspeed has a significant effect on bikes being rented.
5. The spring season generally sees fewer bike rentals.

In [None]:
day_disp = pd.read_csv('../input/bike-sharing-dataset/day.csv')

In [None]:
plt.figure(figsize=(20, 12))
plt.subplot(2,3,1)
sns.boxplot(x = 'weathersit', y = 'cnt', data = day_disp)
plt.subplot(2,3,2)
sns.regplot(x = 'temp', y = 'cnt', data = day_disp)
plt.subplot(2,3,3)
sns.boxplot(x = 'holiday', y = 'cnt', data = day_disp)
plt.subplot(2,3,4)
sns.regplot(x = 'windspeed', y = 'cnt', data = day_disp)
plt.subplot(2,3,5)
sns.boxplot(x = 'season', y = 'cnt', data = day_disp)
plt.subplot(2,3,6)
sns.boxplot(x = 'mnth', y = 'windspeed', data = day_disp)
plt.show()

In [None]:
day_disp['season'] = day_disp['season'].map({1:"spring", 2:"summer", 3:"fall", 4:"winter"})
season = pd.get_dummies(day_disp['season'], drop_first = True)

In [None]:
day_disp = pd.concat([day_disp, season], axis = 1)

In [None]:
day_disp.drop('season',axis=1,inplace=True)

In [None]:
day_disp.head()

In [None]:
plt.figure(figsize=(20, 12))
plt.subplot(2,3,1)
sns.boxplot(x = 'spring', y = 'casual', data = day_disp)
plt.subplot(2,3,2)
sns.boxplot(x = 'spring', y = 'registered', data = day_disp)
plt.subplot(2,3,3)
sns.boxplot(x = 'spring', y = 'cnt', data = day_disp)
plt.show()

#### Suggestions are as follows:

##### 1. Users tend not to rent and use bikes during Snow or Rain.
- We can reduce the price of bike rentals during times of snow or rain by implementing dynamic pricing
- We can also implement hand warmers into the bike handlebars
- At the bike docking station, we can also have thin disposable rain coats

##### 2. Users generally prefer riding bikes when the temperature is higher.
- We can have different offers and coupons sent to riders when we can see a dip in the temperature to increase interest
- We can have dynamic pricing to increase rates by 5% for every 5°C change in temp above the mean of 20°C
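
The pricing rule in the last bullet can be sketched as a small function. The sketch assumes the increase is linear (not compounding), applies once per full 5°C step above 20°C, and leaves prices unchanged at or below 20°C; `price_multiplier` and its parameter names are hypothetical.

```python
import math

def price_multiplier(temp_c, base_temp_c=20.0, step_c=5.0, pct_per_step=0.05):
    """Multiplier on the base rate: +5% per full 5 degree step above the base temperature (assumed linear)."""
    if temp_c <= base_temp_c:
        return 1.0
    full_steps = math.floor((temp_c - base_temp_c) / step_c)
    return 1.0 + pct_per_step * full_steps

# Example temperatures in degrees Celsius
for t in (15, 20, 25, 32):
    print(t, price_multiplier(t))
```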

##### 3. Bike rentals on holidays are lower than on working days.
- Like in the above suggestion, we can send coupons and offers to riders to increase interest on holidays
- We can see that bikes are predominantly rented as a means of workplace commuting during workdays. We can have additional marketing to pose bike rides as a lifestyle activity and host events during holidays so that riders can rent bikes and participate in these events

##### 4. Windspeed has a significant negative effect on bikes being rented.
- Like in the above suggestion, we can send coupons and offers to riders to increase rentals when the windspeed is high
- During months when the windspeed is high on average, like in the case of April, we can set challenges for the number of miles ridden or calories burned during bike rides, and give riders a trophy on the app as well as a voucher if they achieve the targets

##### 5. The spring season generally sees fewer bike rentals.
- This could be due to the spring breaks and other holidays that are taken during this time
- We can use marketing to promote spring as the best season to cycle. Apart from that, events and challenges should be concentrated in this season.
- Spring might also bring plenty of tourists to Washington. They would be classified as casual riders rather than registered riders, so stronger strategies are needed in spring to attract more casual riders.
- We could include guided bicycle tours of tourist destinations.
- Placement of bicycles in docking stations near tourist destinations, parks and restaurants