<a href="https://colab.research.google.com/github/sanyamja1n/Bike-Sharing-Demand-Prediction/blob/main/Bike_Sharing_Demand_Prediction.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    - Bike Sharing Demand Prediction



##### **Project Type**    - Regression
##### **Contribution**    - Team
##### **Team Member 1 -** Sanyam Jain
##### **Team Member 2 -** Yaser Zaidi
##### **Team Member 3 -** Abhishek Bhargava
##### **Team Member 4 -** Yaseer Khan
##### **Team Member 5 -** Shraddha Shandilya

# **Project Summary -**

A bike-sharing system is a service in which bikes are made available for shared use to individuals on a short-term basis, either for a price or for free. Many bike-share systems let people borrow a bike from a computer-controlled "dock": the user enters payment information and the system unlocks the bike. The bike can then be returned to any other dock belonging to the same system. In this project we study the factors that drive the demand for these shared bikes. We want to know:

* Which variables are significant in predicting the demand for shared bikes?
* How well do those variables describe the bike demand?

The goal of this project is also to build a linear regression model, and compare it with tree-based models, to predict the number of rented bikes in a given time frame.

# **GitHub Link -**

https://github.com/sanyamja1n/Bike-Sharing-Demand-Prediction

# **Problem Statement**


***Rental bikes have been introduced in many cities to make getting around more convenient. It is important that rental bikes are available and accessible to the public at the right time, because this reduces waiting time. Providing the city with a stable supply of rental bikes therefore becomes a major concern. The crucial part is predicting the bike count required at each hour so that the supply stays stable.***

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required. 
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits. 
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure that for each and every chart the following format is answered.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 15 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following "UBM" Rule. 

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





6. You may add more ml algorithms for model creation. Make sure for each and every algorithm, the following format should be answered.


*   Explain the ML Model used and its performance using Evaluation metric Score Chart.


*   Cross- Validation & Hyperparameter Tuning

*   Have you seen any improvement? Note down the improvement with updated Evaluation metric Score Chart.

*   Explain each evaluation metric's indication towards business and the business impact of the ML model used.




















# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import pandas as pd
import numpy as np 
import matplotlib.pyplot as plt
import seaborn as sns                

from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.metrics import r2_score, mean_squared_error, accuracy_score
from sklearn.linear_model import Ridge, Lasso, LinearRegression, LogisticRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor

import xgboost as xgb



import warnings
warnings.filterwarnings('ignore')

%matplotlib inline

### Dataset Loading

In [None]:
# Mount Google Drive to import the dataset
from google.colab import drive
drive.mount('/content/drive')

In [None]:
# Load Dataset
df = pd.read_csv("/content/drive/MyDrive/Sample Data/SeoulBikeData.csv", encoding = "unicode_escape")

In [None]:
pd.set_option('display.max_columns',None)

### Dataset First View

In [None]:
# Dataset First Look
df.head()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
df.shape

* *This dataset contains 8760 rows and 14 columns.*
* *The data covers one year.*
* *A day has 24 hours and the year has 365 days, so we expect 365 × 24 = 8760 rows, which matches the number of rows in the dataset.*
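
The row count can be cross-checked against the recording period. A minimal sketch, assuming hourly records from 1 December 2017, 00:00 through 30 November 2018, 23:00 (the period given in the variable description later in this notebook); the variable names are illustrative:

```python
from datetime import datetime

# Assumed recording period: first and last hourly timestamps
first_hour = datetime(2017, 12, 1, 0)
last_hour = datetime(2018, 11, 30, 23)

# Number of hourly records, counting both endpoints
elapsed_hours = int((last_hour - first_hour).total_seconds() // 3600)
expected_rows = elapsed_hours + 1

print(expected_rows)               # 8760
print(expected_rows == 365 * 24)   # True
```

If `df.shape[0]` differs from this number, some hours are missing or duplicated.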

### Dataset Information

In [None]:
# Dataset Info
df.info()

#### Duplicate Values

In [None]:
#Checking for Duplicates
df.duplicated().sum()

There are 0 duplicate rows.

#### Missing Values/Null Values

In [None]:
#check for count of missing values in each column.
df.isnull().sum()

There are 0 null values.

### What did you know about your dataset?

***The dataset contains weather information (Temperature, Humidity, Windspeed, Visibility, Dewpoint, Solar radiation, Snowfall, Rainfall), the number of bikes rented per hour and date information.***

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
df.columns

In [None]:
# Dataset Describe
df.describe()

### <b>Attribute Information: </b>

* ### Date : day/month/year
* ### Rented Bike count - Count of bikes rented at each hour
* ### Hour - Hour of the day
* ### Temperature - Temperature in Celsius
* ### Humidity - %
* ### Windspeed - m/s
* ### Visibility - in units of 10 m
* ### Dew point temperature - Celsius
* ### Solar radiation - MJ/m2
* ### Rainfall - mm
* ### Snowfall - cm
* ### Seasons - Winter, Spring, Summer, Autumn
* ### Holiday - Holiday/No holiday
* ### Functional Day - NoFunc (non-functional hours), Fun (functional hours)

### Variables Description 

**Breakdown of Features:**

**Date**: *The date of the day, covering 365 days from 01/12/2017 to 30/11/2018, formatted as DD/MM/YYYY, type: str. We need to convert it to datetime format.*

**Rented Bike Count**: *Number of bikes rented per hour. This is our dependent variable, the one we need to predict, type: int*

**Hour**: *The hour of the day, from 0 to 23 (24-hour clock), type: int. We need to convert it to the category data type.*

**Temperature(°C)**: *Temperature in Celsius, type: float*

**Humidity(%)**: *Humidity in the air in %, type: int*

**Wind speed (m/s)**: *Speed of the wind in m/s, type: float*

**Visibility (10m)**: *Visibility in units of 10 m, type: int*

**Dew point temperature(°C)**: *Temperature to which the air must cool to become saturated with water vapour, in Celsius, type: float*

**Solar Radiation (MJ/m2)**: *Solar radiation received, in MJ/m2, type: float*

**Rainfall(mm)**: *Amount of rain in mm, type: float*

**Snowfall (cm)**: *Amount of snow in cm, type: float*

**Seasons**: *Season of the year; there are only 4 seasons in the data, type: str*

**Holiday**: *Whether the day is a holiday or not, type: str*

**Functioning Day**: *Whether the day is a functioning day or not, type: str*






### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
df.nunique()

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
#Changing the datatype of date column (the raw dates are DD/MM/YYYY, so the format is given explicitly)
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y')

In [None]:
#Breaking the date column
df['Month']=df['Date'].dt.month
df['Day']=df['Date'].dt.day_name()

In [None]:
#creating a new column of "weekdays_weekend"
df['weekdays_weekend']= df['Day'].apply(lambda x : 'Weekend' if x=='Saturday' or x=='Sunday' else 'Weekday')
df.drop(columns=['Date','Day'],inplace=True)

* **We Extracted the Month and Day from the Date column.**
* **Then created a new column weekdays_weekend.**
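
Because the raw dates are written day first (DD/MM/YYYY), the parsing format matters for the extracted month and day name. A minimal sketch on three made-up date strings (the `raw_dates` series is illustrative, not taken from the dataset), using an explicit format and the same weekday/weekend rule:

```python
import pandas as pd

# Illustrative day-first date strings (hypothetical sample, not dataset rows)
raw_dates = pd.Series(["01/12/2017", "02/12/2017", "13/01/2018"])

# An explicit format removes any day/month ambiguity
parsed = pd.to_datetime(raw_dates, format="%d/%m/%Y")

months = parsed.dt.month.tolist()
day_names = parsed.dt.day_name().tolist()

# Same rule as above: Saturday and Sunday count as weekend
day_type = ["Weekend" if d in ("Saturday", "Sunday") else "Weekday" for d in day_names]

print(months)     # [12, 12, 1]
print(day_names)  # ['Friday', 'Saturday', 'Saturday']
print(day_type)   # ['Weekday', 'Weekend', 'Weekend']
```

Read month first, the string `01/12/2017` would be 12 January instead of 1 December, which would shift both the month and the weekday.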

The Hour and Month columns are stored as integers and weekdays_weekend as object (string), but all three are really categorical, so we change their data types to category.

In [None]:
#Convert the Hour, Month and weekdays_weekend columns to category
cols=['Hour','Month','weekdays_weekend']
for col in cols:
  df[col]=df[col].astype('category')

In [None]:
#let's check the result of data type
df.info()

### What all manipulations have you done and insights you found?

First we changed the data type of the Date column to datetime.

Then we broke the Date column down into Month and Day columns.

Then we created a new column, weekdays_weekend, from the Day column.

Then we dropped the Date and Day columns.

Hour and Month were stored as integers and weekdays_weekend as object (string), but all three are really categorical, so we changed their data types to category.

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In which months/seasons is the demand for rented bikes highest?

In [None]:
# Count of Rented bikes according to Month
plt.figure(figsize=(20, 8))
sns.barplot(data=df,x='Month',y='Rented Bike Count')
plt.title('Count of Rented bikes according to Month')
plt.show()
# Count of Rented bikes according to Seasons
plt.figure(figsize=(12, 5))
sns.barplot(data=df,x='Seasons',y='Rented Bike Count')
plt.title('Count of Rented bikes according to Seasons')
plt.show()

##### 1. Why did you pick the specific chart?

To check the relationship between rented bike count and month/season.

##### 2. What is/are the insight(s) found from the chart?

***From the above bar plots we can clearly say that demand for rented bikes is higher from month 5 to month 10 (May to October) than in the other months. These months span late spring, summer and early autumn rather than summer alone.***

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Yes. Knowing which months have the highest demand helps plan how many bikes to keep available in each part of the year.

#### Chart - 2

How does the bike demand vary from weekdays to weekend?

In [None]:
#Count of Rented bikes according to Weekdays and Weekend
plt.figure(figsize=(8, 8))
sns.barplot(data=df,x='weekdays_weekend',y='Rented Bike Count')
plt.title('Count of Rented bikes according to Weekdays and Weekend')

##### 1. Why did you pick the specific chart?

To check the relationship between rented bike count and weekdays/weekends.

##### 2. What is/are the insight(s) found from the chart?

On weekends, the demand for rented bikes is lower than on weekdays.

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Yes. Knowing that demand is lower on weekends helps match the number of available bikes to the day of the week.

#### Chart - 3

At what time of day are rented bikes used the most?

In [None]:
#Count of Rented bikes according to weekdays_weekend and Hour
plt.figure(figsize=(20, 8))
sns.pointplot(data=df,x='Hour',y='Rented Bike Count',hue='weekdays_weekend')
plt.title('Count of Rented bikes according to weekdays_weekend and Hour')

##### 1. Why did you pick the specific chart?

To check the relationship between rented bike count and hour of the day.

##### 2. What is/are the insight(s) found from the chart?

Peak times are 7 am to 9 am and 5 pm to 7 pm.

On weekends, the demand for rented bikes is very low, especially in the morning hours, but it rises slightly in the evening, from 4 pm to 8 pm.

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Yes. Knowing the peak hours helps make enough bikes available during the morning and evening peaks.

#### Chart - 4

What is the impact of Functioning Day on Rented bike count?

In [None]:
#Count of Rented bikes according to Functioning Day
plt.figure(figsize=(8, 8))
sns.barplot(data=df,x='Functioning Day',y='Rented Bike Count')
plt.title('Count of Rented bikes according to Functioning Day')

##### 1. Why did you pick the specific chart?

To check the relationship between rented bike count and Functioning Day.

##### 2. What is/are the insight(s) found from the chart?

The above plot shows that 0 bikes were rented on non-functioning days.

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Yes. Since no bikes are rented on non-functioning days, no supply needs to be planned for them.

#### Chart - 5

Effect of holidays on rented bike count

In [None]:
#Count of Rented bikes according to Holiday
plt.figure(figsize=(12, 5))
sns.barplot(data=df,x='Holiday',y='Rented Bike Count')
plt.title('Count of Rented bikes according to Holiday')
plt.show()
#Count of Rented bikes according to Holiday and Hour
plt.figure(figsize=(20, 8))
sns.pointplot(data=df,x='Hour',y='Rented Bike Count',hue='Holiday')
plt.title('Count of Rented bikes according to Holiday and Hour')

##### 1. Why did you pick the specific chart?

To check the relationship between rented bike count and Holiday, both overall and by hour.

##### 2. What is/are the insight(s) found from the chart?

On working days people mostly rent bikes around 7 am to 9 am and 5 pm to 7 pm, whereas on holidays people tend to rent bikes mostly in the evening.

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Yes. The different hourly patterns on working days and holidays help decide when bikes need to be available on each type of day.



#### Chart - 6

Effect of different numerical features on rented bike count

In [None]:
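#Assigning the numerical columns to a variable
numcol=list(df.select_dtypes(['int64','float64']).columns)

# Plotting the mean rented bike count against each numerical feature
# (numcol[0] is the target itself, so it is skipped)
for i in numcol[1:]:
  plt.figure(figsize=(15, 5))
  # select the target before aggregating so non-numeric columns are not averaged
  df.groupby(i)['Rented Bike Count'].mean().plot()
  plt.show()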

##### 1. Why did you pick the specific chart?

To check the effect of different numerical features on rented bike count.

##### 2. What is/are the insight(s) found from the chart?

We can see from the above plots that:

* Demand is highest when the temperature is around 30 °C.

* People prefer to ride in summer, i.e. at high temperatures.

* Demand is mostly uniform across wind speeds, but it rises once the wind speed reaches 7 m/s. This should be read with caution: each point is a mean over however many hours recorded that wind speed, and extreme values usually have few observations.

* Demand drops as soon as it starts raining, but it does not keep falling as the rainfall increases.

* The number of rented bikes decreases once snowfall exceeds 4 cm.


##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Yes. Knowing how temperature, wind, rain and snow affect demand helps anticipate how many bikes will be needed under given weather conditions.



#### Chart - 7

Regression plot

In [None]:
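#Regression plot of every numerical feature against the target (numcol[0] is the target itself, so it is skipped)
for col in numcol[1:]:
  fig,ax=plt.subplots(figsize=(10,6))
  sns.regplot(x=df[col],y=df['Rented Bike Count'],scatter_kws={"color": 'orange'}, line_kws={"color": "black"})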

##### 1. Why did you pick the specific chart?

To check the linear trend between each numerical column and rented bike count.

##### 2. What is/are the insight(s) found from the chart?

From the above regression plots of all numerical features we see that 'Temperature', 'Wind speed', 'Visibility', 'Dew point temperature' and 'Solar Radiation' are positively related to the rented bike count, which means the rented bike count increases as these features increase.

On the other hand, 'Rainfall', 'Snowfall' and 'Humidity' are negatively related, which means the rented bike count decreases as these features increase.

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Yes. Knowing which weather features raise or lower demand helps anticipate demand from the expected weather.



#### Chart - 8 - Correlation Heatmap

In [None]:
#Correlation heatmap of the numerical columns (non-numeric columns are excluded)
plt.figure(figsize=(15,11))
sns.heatmap(df.corr(numeric_only=True), cmap='coolwarm', annot=True)

##### 1. Why did you pick the specific chart?

To check for multicollinearity between the numerical features.

##### 2. What is/are the insight(s) found from the chart?

The variable most strongly correlated with Rented Bike Count is Temperature.

From the above correlation heatmap, we can see a very high positive correlation (0.91) between 'Temperature' and 'Dew point temperature'. The two columns vary together and carry almost the same information, so dropping one of them does not affect the outcome of our analysis. We therefore drop the column 'Dew point temperature(°C)'.
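
The same rule (drop one of two features whose absolute correlation exceeds a threshold) can be sketched on synthetic data. The column names, the 0.9 threshold and the generated values below are assumptions for illustration only, not the notebook's actual data:

```python
import numpy as np
import pandas as pd

# Synthetic data: dew_point is built to track temperature closely (illustrative only)
rng = np.random.default_rng(0)
temp = rng.normal(15, 10, size=200)
toy = pd.DataFrame({
    "temperature": temp,
    "dew_point": 0.9 * temp + rng.normal(0, 1, size=200),
    "humidity": rng.uniform(20, 90, size=200),
})

# Absolute correlations, keeping only the upper triangle so each pair is seen once
corr = toy.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))

# Drop the later column of every pair correlated above the (assumed) threshold
threshold = 0.9
to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
reduced = toy.drop(columns=to_drop)

print(to_drop)                # ['dew_point']
print(list(reduced.columns))  # ['temperature', 'humidity']
```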


In [None]:
df.drop(columns=['Dew point temperature(°C)'],inplace=True)

#### Chart - 9 - Pair Plot 

In [None]:
# Pair Plot visualization code
# (no hue: the target is continuous, so using it as hue would create one colour level per distinct count)
sns.pairplot(df)

## ***5. Feature Engineering & Data Pre-processing***

### 1. Handling Missing Values

In [None]:
# No Missing Values

### 2. Data Normalization, Outlier detection and Removal

In [None]:
fig = plt.figure(figsize=(9, 6))
ax = fig.gca()
feature = df['Rented Bike Count']
feature.hist(bins=50, ax = ax)
ax.axvline(feature.mean(), color='red', linestyle='dashed', linewidth=2)
ax.axvline(feature.median(), color='black', linestyle='dashed', linewidth=2)    
ax.set_title('Rented Bike Count')
plt.show()


The above graph shows that Rented Bike Count is moderately right-skewed. Linear regression does not strictly require a normally distributed dependent variable (the normality assumption concerns the residuals), but a strongly skewed target often leads to skewed residuals, so we transform it to bring it closer to a normal distribution.

In [None]:
#Boxplot of Rented Bike Count to check outliers
plt.figure(figsize=(10,6))
plt.ylabel('Rented_Bike_Count')
sns.boxplot(x=df['Rented Bike Count'])
plt.show()

There are some Outliers.

In [None]:
#Applying square root to Rented Bike Count to reduce skewness
sqrt_count = np.sqrt(df['Rented Bike Count'])
plt.figure(figsize=(10,8))

# histplot with a KDE curve replaces the deprecated distplot
ax=sns.histplot(sqrt_count, kde=True, stat="density", color="y")
ax.axvline(sqrt_count.mean(), color='magenta', linestyle='dashed', linewidth=2)
ax.axvline(sqrt_count.median(), color='black', linestyle='dashed', linewidth=2)
ax.set_xlabel('sqrt(Rented Bike Count)')
ax.set_ylabel('Density')

plt.show()

After the square-root transform, the distribution is almost normal.

In [None]:
#After applying sqrt on Rented Bike Count, check whether we still have outliers
plt.figure(figsize=(10,6))

plt.ylabel('Rented_Bike_Count')
sns.boxplot(x=np.sqrt(df['Rented Bike Count']))
plt.show()

After applying Square root to the Rented Bike Count column, we found that there are no outliers present.
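
The effect of the square-root transform on skewness can also be checked numerically. A minimal sketch on synthetic right-skewed values (a gamma-distributed sample chosen for illustration, not the real bike counts):

```python
import numpy as np
import pandas as pd

# Synthetic right-skewed "counts" (gamma-distributed; an assumption, not the real data)
rng = np.random.default_rng(42)
counts = pd.Series(rng.gamma(shape=2.0, scale=350.0, size=5000))

# Sample skewness before and after the square-root transform
skew_raw = counts.skew()
skew_sqrt = np.sqrt(counts).skew()

print(f"skewness before sqrt: {skew_raw:.2f}")
print(f"skewness after sqrt:  {skew_sqrt:.2f}")
```

On the notebook's data the same comparison would be `df['Rented Bike Count'].skew()` against `np.sqrt(df['Rented Bike Count']).skew()`; a value closer to 0 means a more symmetric distribution.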



### 3. Categorical Encoding

In [None]:
#Assign all categorical features to a variable
categorical_features=list(df.select_dtypes(['object','category']).columns)

In [None]:
#create a real copy so the original dataframe is left untouched
df_copy = df.copy()

def one_hot_encoding(data, column):
    """Replace `column` with one dummy column per category."""
    data = pd.concat([data, pd.get_dummies(data[column], prefix=column)], axis=1)
    data = data.drop([column], axis=1)
    return data

for col in categorical_features:
    df_copy = one_hot_encoding(df_copy, col)
df_copy.head()

#### What all categorical encoding techniques have you used & why did you use those techniques?

We created dummy variables for the categorical variables using one-hot encoding.

Many machine learning algorithms cannot work with categorical data directly, so the categories must be converted into numbers.
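
What one-hot encoding does to a single column can be seen on a tiny frame. The values below are made up for illustration; `drop_first=True` is shown only as an option that the notebook itself does not use:

```python
import pandas as pd

# Tiny illustrative frame (hypothetical values)
toy = pd.DataFrame({
    "Rented Bike Count": [254, 930, 120],
    "Seasons": ["Winter", "Summer", "Winter"],
})

# One dummy column per category; the original column is removed
encoded = pd.get_dummies(toy, columns=["Seasons"], prefix="Seasons")
print(list(encoded.columns))
# ['Rented Bike Count', 'Seasons_Summer', 'Seasons_Winter']

# drop_first=True keeps k-1 dummies, which avoids perfectly collinear columns in linear models
encoded_k_minus_1 = pd.get_dummies(toy, columns=["Seasons"], drop_first=True)
print(list(encoded_k_minus_1.columns))
# ['Rented Bike Count', 'Seasons_Winter']
```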

### 4. Feature Manipulation & Selection

#### 1. Feature Manipulation

In [None]:
#Not Required

#### 2. Feature Selection

### 5. Data Transformation

#### Do you think that your data needs to be transformed? If yes, which transformation have you used. Explain Why?

In [None]:
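# Independent variables: every column except the target (a list keeps the column order stable)
indvar = [col for col in df_copy.columns if col != "Rented Bike Count"]
X = df_copy[indvar]
# Square-root-transformed target for the linear models
yLR = np.sqrt(df_copy['Rented Bike Count'])
# Untransformed target for the tree-based models
y = df_copy['Rented Bike Count']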

### 6. Data Splitting

In [None]:
# Split your data to train and test. Choose Splitting ratio wisely.
# 80/20 split on the square-root-transformed target (used by the linear models)
X_train, X_test, y_train, y_test = train_test_split( X,yLR , test_size = 0.2, random_state = 42) 
print(X_train.shape)
print(X_test.shape)

#### What data splitting ratio have you used and why? 

We used an 80:20 split between training and test data. The right ratio depends mainly on how much data is available: neither part should be so small that it stops being representative. With very little data, no single split gives a reliable estimate, so cross-validation is needed. With a large dataset it matters little whether we choose 80:20 or 90:10 (we may even train on less data to keep the computation manageable).
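
The split, together with the scaling step that follows it in this notebook, can be sketched on synthetic data. The arrays and the `_demo` names are assumptions for illustration; the point is that the scaler learns its mean and standard deviation from the training part only:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic feature matrix and target (illustrative only)
rng = np.random.default_rng(0)
X_demo = rng.normal(loc=50, scale=10, size=(100, 3))
y_demo = rng.normal(size=100)

# 80% train, 20% test; random_state makes the split reproducible
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)

# Fit the scaler on the training part only, then reuse its statistics on the test part
scaler_demo = StandardScaler()
X_tr_scaled = scaler_demo.fit_transform(X_tr)
X_te_scaled = scaler_demo.transform(X_te)

print(X_tr.shape, X_te.shape)             # (80, 3) (20, 3)
print(X_tr_scaled.mean(axis=0).round(6))  # approximately 0 for every column
```

Fitting the scaler on the full data instead would let test-set statistics leak into training.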

### 7. Data Scaling

In [None]:
# Fit the scaler on the training data only, then apply the same scaling to the test data
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

## ***7. ML Model Implementation***

### ML Model -  Linear Regression, Ridge, Lasso

In [None]:
models = [
          ['Linear Regression', LinearRegression()],
          ['Lasso', Lasso()],
          ['Ridge', Ridge()],
          ] 

model_eveluation_metrices =[]
for name, model in models:
    working_model ={}
    working_model['name'] = name
    
    # The linear models are trained on the square-root-transformed target
    model.fit(X_train, y_train)
    
    y_pred_train = model.predict(X_train)
    y_pred_test = model.predict(X_test)
    # Square targets and predictions so R2 is reported on the original bike-count scale
    working_model["Train_R2_Score"] = r2_score((y_train)**2, (y_pred_train**2))
    working_model["Test_R2_Score"] = r2_score((y_test)**2,(y_pred_test)**2)
    print( f"{name} coef :{model.coef_}" )

    model_eveluation_metrices.append(working_model)

model_eveluation_metrices_df = pd.DataFrame(model_eveluation_metrices)
model_eveluation_metrices_df
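
The R2 scores above are computed after squaring both the targets and the predictions, i.e. on the original bike-count scale rather than on the square-root scale used for training. A minimal sketch with made-up numbers (the `r2` helper and all values are illustrative) shows that the two scales give different scores:

```python
import numpy as np

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Hypothetical counts and hypothetical predictions made on the square-root scale
counts = np.array([0.0, 100.0, 400.0, 900.0])
sqrt_true = np.sqrt(counts)                  # [0, 10, 20, 30]
sqrt_pred = np.array([1.0, 9.0, 21.0, 29.0])

r2_sqrt_scale = r2(sqrt_true, sqrt_pred)             # score on the transformed scale
r2_count_scale = r2(sqrt_true ** 2, sqrt_pred ** 2)  # score on the original scale

print(round(r2_sqrt_scale, 4))   # 0.992
print(round(r2_count_scale, 4))  # 0.9887
```

Reporting on the original scale keeps the linear models comparable with the tree-based models below, which are trained on the untransformed target.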

### ML Model - Decision Tree, Gradient Boosting, XG Boost

In [None]:
#Train Test Split for the tree-based models (untransformed target, unscaled features)
X_train, X_test, y_train, y_test = train_test_split( X,y , test_size = 0.2, random_state = 42) 

In [None]:
models = [ ['Decision Tree Regressor', DecisionTreeRegressor()],
          ['XG Boost Regressor', xgb.XGBRegressor()],
          ['Gradient Boosting Regressor', GradientBoostingRegressor()],
          ]
for name, model in models:
    working_model ={}
    working_model['name'] = name
    
    model.fit(X_train, y_train)
    
    
    y_pred_train = model.predict(X_train)
    y_pred_test = model.predict(X_test)
    working_model["Train_R2_Score"] = r2_score(y_train, y_pred_train)
    working_model["Test_R2_Score"] = r2_score(y_test,y_pred_test)


    #Feature Importance (importance_df is overwritten on every iteration,
    #so after the loop it holds the values of the last model, Gradient Boosting)
    importances = model.feature_importances_

    importance_dict = {'Feature' : list(X_train.columns),
                   'Feature Importance' : importances}

    importance_df = pd.DataFrame(importance_dict)

    importance_df['Feature Importance'] = importance_df['Feature Importance'].round(2)

     


    
    model_eveluation_metrices.append(working_model)

model_eveluation_metrices_df = pd.DataFrame(model_eveluation_metrices)
model_eveluation_metrices_df

XG Boost Regressor and Gradient Boosting Regressor have the highest test R2 scores, so we take XG Boost forward for hyperparameter tuning.


In [None]:
#Feature Importance
importance_df.sort_values(by=['Feature Importance'],ascending=False).head()

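The most important feature is Temperature. Note that this table comes from the last model fitted in the loop above, the Gradient Boosting Regressor.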
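
To compare feature importances across models, one option is to store one column per model instead of a single table. A minimal sketch on synthetic data, assuming scikit-learn's `DecisionTreeRegressor` and `GradientBoostingRegressor` only (XGBoost is left out to keep the example light); all names and values are illustrative:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.tree import DecisionTreeRegressor

# Synthetic data: the target depends mostly on the first feature (illustrative only)
rng = np.random.default_rng(0)
X_demo = pd.DataFrame(rng.normal(size=(300, 3)), columns=["f0", "f1", "f2"])
y_demo = 5 * X_demo["f0"] + 0.5 * X_demo["f1"] + rng.normal(scale=0.1, size=300)

demo_models = {
    "Decision Tree": DecisionTreeRegressor(random_state=0),
    "Gradient Boosting": GradientBoostingRegressor(random_state=0),
}

# One importance column per model, so no model overwrites another
importance_by_model = pd.DataFrame(index=X_demo.columns)
for model_name, model in demo_models.items():
    model.fit(X_demo, y_demo)
    importance_by_model[model_name] = model.feature_importances_

print(importance_by_model.round(2))
```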

### Hyperparameter Tuning

***We will do Hyperparameter tuning by using Grid Search CV.***

GridSearchCV loops through every combination of the predefined hyperparameter values, fits the model on the training set with cross-validation, and scores each combination. In the end, we can select the best parameters from the listed hyperparameters.
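
How the grid is expanded can be sketched with a small stand-in model. The example below swaps in scikit-learn's `Ridge` on synthetic data instead of XGBoost, purely to keep it light; the grid values are assumptions for illustration:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Synthetic regression problem (illustrative only)
X_demo, y_demo = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# 3 x 2 = 6 hyperparameter combinations
demo_grid = {"alpha": [0.1, 1.0, 10.0], "fit_intercept": [True, False]}

# Each combination is scored with 3-fold cross-validation on the training data
search = GridSearchCV(Ridge(), demo_grid, scoring="r2", cv=3)
search.fit(X_demo, y_demo)

print(len(search.cv_results_["params"]))  # 6
print(search.best_params_)
print(round(search.best_score_, 3))
```

After fitting, `best_estimator_` is the model refit on the whole training set with the best combination, which is what the prediction cell further down uses.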

In [None]:
# creating parameters

learning_rate = [0.2, 0.4, 0.6]
n_estimators = range(50, 400, 50)

# param_dict
param_dict = {
    'learning_rate' : learning_rate, 
    'n_estimators' : n_estimators
}

In [None]:
### Cross validation
# 3 learning rates x 7 n_estimators values = 21 candidates, each scored with 3-fold CV
XGBR = xgb.XGBRegressor()
parameters = param_dict
XGB_regressor = GridSearchCV(XGBR, parameters, scoring='r2', cv=3)
XGB_regressor.fit(X_train, y_train)

In [None]:
print("Best parameters are  \n",XGB_regressor.best_params_)

# predict with the best estimator found by the grid search
y_train_preds = XGB_regressor.best_estimator_.predict(X_train)
y_test_preds = XGB_regressor.best_estimator_.predict(X_test)

print("\n \n")
Train_R2_Score = r2_score(y_train,y_train_preds)


Test_R2_Score = r2_score(y_test, y_test_preds)

print(f"Train_R2_Score  {Train_R2_Score}")
print(f"Test_R2_Score  {Test_R2_Score}")

In [None]:
#Plotting Actual vs Predicted Values on the test set
plt.figure(figsize=(20,6))
plt.plot((y_test_preds))
plt.plot((np.array(y_test)))
plt.legend(["Predicted","Actual"])
plt.show()

# **Conclusion**

In our analysis, we first performed EDA on all the features of our dataset. We analysed our dependent variable, 'Rented Bike Count', and transformed it. Next we analysed the categorical variables and dropped those which are of no use. We also analysed the numerical variables and examined their correlations, distributions and relationships with the dependent variable. We removed a numerical feature that was highly correlated with another variable ('Dew point temperature(°C)') and one-hot encoded the categorical variables.

Next we implemented six machine learning models: Linear Regression, Lasso (L1) and Ridge (L2) regularised regression, Decision Tree, Gradient Boosting and XGBoost. We then tuned the hyperparameters of XGBoost to improve model performance. The results of our evaluation are:


• No overfitting is seen.

• XG Boost, Gradient Boosting and XG Boost with GridSearchCV give the highest test R2 scores: 84.9%, 84.7% and 87.4% respectively.


However, this is not the end. Because the data is time-dependent, variables such as temperature, wind speed and solar radiation will not always follow the patterns seen here, so there will be scenarios where the model does not perform well. We will have to keep checking the model from time to time, and keeping pace with the evolving ML field will help us stay a step ahead.

On the test set, the final tuned XGBoost model reached an R2 score of 87.4%. Although this is good, there are many ways we can continue to improve the model's predictive ability. We can continue to see how other machine learning models perform, and there are other features we believe could help shed light on the trends and patterns in demand for bike rentals. Prediction models decay over time, so no model can be trusted indefinitely. Frequent retraining is one way to address production model maintenance. Sudden changes in the data could cause the model to behave in unwanted ways, and monitoring and observability add one more layer to assure model quality.

### ***Hurrah! You have successfully completed your Machine Learning Capstone Project !!!***