# **Project Title: Public Transport Demand Prediction**




##### **Project Type**    - Regression
##### **Contribution**    - Individual


# **Project Summary -**

The goal of a project on public transport demand prediction is to develop a system or model that can accurately forecast the number of passengers who will use a public transportation system over a given period of time. This requires collecting and analyzing historical data on passenger numbers and other relevant factors, as well as implementing machine learning algorithms and other techniques to make predictions.

The project may involve developing a proof-of-concept prototype, a full-scale system for use by a public transportation provider, or a research study to evaluate the effectiveness of different prediction methods. Some of the key challenges in this type of project include dealing with data quality and missing data, accounting for changing patterns of demand over time, and selecting appropriate models and parameters to achieve accurate predictions.

Overall, the aim of a project on public transport demand prediction is to help transportation providers optimize their services, reduce costs, and improve the overall passenger experience by anticipating and meeting the needs of their customers.

# **GitHub Link -** - https://github.com/ankurvish1920

# **Problem Statement**


This challenge asks you to build a model that predicts the number of seats that Mobiticket can expect to sell for each ride, i.e. for a specific route on a specific date and time. There are 14 routes in this dataset. All of the routes end in Nairobi and originate in towns to the north-west of Nairobi.

The towns from which these routes originate are:


*   Awendo
*   Homa Bay
*   Kehancha
*   Kendu Bay
*   Keroka
*   Keumbu
*   Kijauri
*   Kisii
*   Mbita
*   Migori
*   Ndhiwa
*   Nyachenge
*   Oyugis
*   Rodi
*   Sirari
*   Sori


These routes, from the 14 origins to the first stop in the outskirts of Nairobi, take approximately 8 to 9 hours from the time of departure. From the first stop in the outskirts of Nairobi into the main bus terminal in the Central Business District, where most passengers get off, takes another 2 to 3 hours depending on traffic. The three stops that all these routes make in Nairobi (in order) are:


1.   Kawangware: the first stop in the outskirts of Nairobi.
2.   Westlands: 
3.   Afya centre: the bus centre where most passengers disembark.

Passengers on these bus (or shuttle) rides are affected by Nairobi traffic not only during their ride into the city: from there they must continue to their final destination in Nairobi, wherever that may be. Traffic can act as a deterrent for those who have the option to avoid buses that arrive in Nairobi during peak traffic hours. On the other hand, traffic may be an indication of people's movement patterns, reflecting business hours, cultural events, political events, and holidays.

# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import libraries
import datetime
import time
import warnings

import numpy as np
import pandas as pd

# Visualization packages
import matplotlib.pyplot as plt
import seaborn as sns

# Skewness and distribution helpers
from scipy.stats import norm, skew

# Modelling libraries
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV, KFold, train_test_split

# r2_score measures how well a model explains the target variable;
# mean_squared_error measures the distance between predicted and actual values.
from sklearn.metrics import mean_squared_error, r2_score

# MinMaxScaler scales data to the range 0-1.
from sklearn.preprocessing import MinMaxScaler

# Python shows each warning once per location by default. Silencing them keeps
# the notebook output readable; it does not change how the code executes.
warnings.filterwarnings("ignore")

### Dataset Loading

In [None]:
# Load Dataset
# Parse the datetime columns into their proper format.
df= pd.read_csv('train_revised.csv',parse_dates=["travel_date","travel_time"])

### Dataset First View

In [None]:
# Dataset First Look
df.head()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
df.shape

### Dataset Information

In [None]:
# Dataset Info
df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
df.duplicated().sum()


#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
print(df.isnull().sum())

### What did you know about your dataset?

This dataset has a total of 10 columns. No column contains null values, and the dataset has no duplicate rows.

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
df.columns

In [None]:
# Dataset Describe
df.describe(include='all')

### Variables Description 

The following are descriptions of the fields.


*   ride_id : unique ID of a vehicle on a specific route on a specific day and time.
*   seat_number : seat assigned to the ticket.
*   payment_method : method used by the customer to buy a ticket from Mobiticket (Cash or Mpesa).
*   payment_receipt : unique ID number for a ticket purchased from Mobiticket.
*   travel_date : date of ride departure (MM/DD/YYYY).
*   travel_time : scheduled departure time of the ride (hh:mm). Rides generally depart on time.
*   travel_from : town from which the ride originated.
*   travel_to : destination of the ride. All rides are to Nairobi.
*   car_type : vehicle type (shuttle or bus).
*   max_capacity : number of seats on the vehicle.

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
for i in df.columns.tolist():
  print(f"No. of unique values in {i} is {df[i].nunique()}.")
     

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# We want to predict the number of tickets that Mobiticket can sell, but the data has no dependent variable, so we have to create it.
# transform("count") on each group returns the number of non-null rows per ride, which is stored in a new column called 'no_of_ticket'.
df['no_of_ticket']=df.groupby(['travel_date','travel_time','travel_from','car_type'])['travel_time'].transform("count")
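
To make the target construction easier to follow, here is a minimal sketch on a hypothetical four-row toy frame (the values are made up for illustration and are not from `train_revised.csv`). It uses the same grouping keys as the cell above and shows that `transform("count")` writes the ride total onto every ticket row.

```python
import pandas as pd

# Hypothetical toy data: three tickets on one ride, one ticket on another.
toy = pd.DataFrame({
    "travel_date": ["2018-01-05", "2018-01-05", "2018-01-05", "2018-01-06"],
    "travel_time": ["07:00", "07:00", "07:00", "07:10"],
    "travel_from": ["Kisii", "Kisii", "Kisii", "Migori"],
    "car_type": ["Bus", "Bus", "Bus", "shuttle"],
})

keys = ["travel_date", "travel_time", "travel_from", "car_type"]

# transform("count") returns one value per original row, so every ticket
# row carries the total number of tickets sold for its ride.
toy["no_of_ticket"] = toy.groupby(keys)["travel_time"].transform("count")

# Collapsing to one row per ride shows the per-ride totals directly.
rides = toy.drop_duplicates(subset=keys).reset_index(drop=True)
print(rides)
```

Because the total is repeated on every ticket row, the frame still has one row per ticket after this step; `drop_duplicates` is shown only to make the per-ride view visible.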


In [None]:
# Extract numeric day, year and month features from the travel date.
df["day"] = df['travel_date'].map(lambda x: x.day)
df["year"] = df['travel_date'].map(lambda x: x.year)
df["month"] = df['travel_date'].map(lambda x: x.month)

In [None]:
# Create new columns named hour and minute.
df['hour']= df['travel_time'].dt.hour

df['minute'] =df['travel_time'].dt.minute

In [None]:
# Extract the ISO week number and the name of the weekday (label-encoded later).
# Series.dt.week was removed in pandas 2.0; isocalendar().week is the replacement.
df['week']= df['travel_date'].dt.isocalendar().week.astype(int)
df['day_of_week'] = df['travel_date'].dt.day_name()

In [None]:
# dropping unnecessary columns.
df=df.drop(['seat_number','ride_id','payment_receipt', 'travel_date','travel_time','travel_to'],axis= 1)

In [None]:
df

## ***4. Data Vizualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### DIFFERENT TYPES OF CAR

In [None]:
# Use a countplot to count the number of bus and shuttle bookings.
sns.countplot(x=df['car_type'])

##### 1. Why did you pick the specific chart?

I chose a countplot because it is a useful visualization tool for counting the frequency of each category in a categorical variable. In the context of booking data, a countplot helps us quickly understand how many bookings were made for each type of car.

The advantage of using a countplot over a simple bar chart is that the countplot automatically aggregates the data and counts the frequency of each category. This makes it easy to visualize the distribution of a categorical variable and compare the frequency of each category in a clear and concise way.
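
The aggregation that a countplot performs can also be inspected numerically. The sketch below uses a small made-up `car_type` series (not the real data) and `value_counts()`, which yields the same per-category counts that the bars represent.

```python
import pandas as pd

# Hypothetical sample of the car_type column.
car_type = pd.Series(["Bus", "shuttle", "Bus", "Bus", "shuttle"], name="car_type")

# Absolute frequency of each category: the bar heights of a countplot.
counts = car_type.value_counts()

# Relative frequency, useful when comparing categories of different sizes.
shares = car_type.value_counts(normalize=True).round(2)

print(counts)
print(shares)
```

Printing the counts next to the chart makes it easier to quote exact figures in the insight sections.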

##### 2. What is/are the insight(s) found from the chart?

There are more bus bookings than shuttle bookings.

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, it helps the transportation business analyse which vehicle type is used most.

COUNTING WHICH PLACE IS MOST USED FOR TRANSPORTATION

In [None]:

plt.figure(figsize=(15,10))
sns.countplot(x='travel_from', data=df, order=df['travel_from'].value_counts().index)

##### 1. Why did you pick the specific chart?

I chose a countplot because it is a useful visualization tool for counting the frequency of each category in a categorical variable. In the context of booking data, a countplot helps us quickly understand how many bookings were made from each origin town.

The advantage of using a countplot over a simple bar chart is that the countplot automatically aggregates the data and counts the frequency of each category. This makes it easy to visualize the distribution of a categorical variable and compare the frequency of each category in a clear and concise way.

##### 2. What is/are the insight(s) found from the chart?

Kisii and Migori are the busiest starting points, while Kendu Bay is rarely used.


WHICH DAY IS THE BUSIEST

In [None]:
# Chart - 3 visualization code
sns.countplot(x='day_of_week', data=df, order=df['day_of_week'].value_counts().index)

##### 1. Why did you pick the specific chart?

A countplot is a type of plot in the seaborn library that visualizes the count of observations in each category of a categorical variable. In the context of booking data, a countplot can be used to count the number of bookings for each day of the week.

##### 2. What is/are the insight(s) found from the chart?

The chart shows the number of bookings for each day of the week. Wednesday has the highest number of bookings, followed by Tuesday, Thursday, Friday, and Monday, in that order.

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Yes. It shows that Wednesday and Tuesday are the busiest days, and the company can use this information for marketing.

# **WHICH YEAR HAS THE MOST BOOKINGS**

In [None]:
# Chart - 4 visualization code

plt.figure(figsize=(8,8))
df['year'].value_counts().plot(kind='pie', autopct='%1.1f%%')
plt.title('Share of Bookings by Year')
plt.ylabel('')
plt.show()

##### 1. Why did you pick the specific chart?

A pie chart is a good way to see which year has the most bookings because it shows each year's share visually, as a percentage.

##### 2. What is/are the insight(s) found from the chart?

Most bookings are in 2018, compared to 2017.


**WHICH HOUR IS THE BUSIEST**

In [None]:
# Chart - 5 visualization code

plt.figure(figsize=(8,6))
plt.hist(df['hour'], bins=10)
plt.title('Busiest Hour')
plt.xlabel('Hour of departure')
plt.ylabel('Frequency')
plt.show()

##### 1. Why did you pick the specific chart?

A histogram is used to visualise the distribution of the data.

##### 2. What is/are the insight(s) found from the chart?

Morning is the busiest time of day; more than 80% of bookings are for morning rides.

### MONTHLY BOOKING 

In [None]:

plt.figure(figsize=(15,10))
sns.countplot(x='month', data=df, order=df['month'].value_counts().index)


## CORRELATION OF DATA 

In [None]:
# Chart - 7 visualization code
plt.figure(figsize= (15,8))
# numeric_only=True skips the text columns, which have not been encoded yet.
correlation= df.corr(numeric_only=True)
sns.heatmap(abs(correlation),annot= True, cmap= 'coolwarm')

##### 1. Why did you pick the specific chart?

Correlation is a useful tool for analysing booking data: it helps us understand the relationships between variables, build more accurate predictive models, and identify potential multicollinearity issues.
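
As a rough illustration of how a correlation matrix can flag multicollinearity, the sketch below builds a hypothetical toy frame in which `hour_copy` is an exact linear function of `hour`, then lists the feature pairs whose absolute correlation exceeds 0.9. The column names, the synthetic data and the 0.9 threshold are assumptions made for the example.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
hour = rng.integers(5, 12, size=200)

# Hypothetical features: hour_copy is perfectly collinear with hour.
toy = pd.DataFrame({
    "hour": hour,
    "minute": rng.integers(0, 60, size=200),
    "hour_copy": hour * 2 + 1,
    "travel_from": ["Kisii"] * 200,  # text column, skipped by numeric_only
})

corr = toy.corr(numeric_only=True).abs()

# Keep only the upper triangle so that each pair is reported once.
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(mask).stack()

# Pairs above the (assumed) 0.9 threshold are candidates for removal.
high = pairs[pairs > 0.9]
print(high)
```

Only one column from each highly correlated pair usually needs to stay in the model.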







#### Pair Plot 

In [None]:
# Pair Plot visualization code
sns.pairplot(df)


## ***6. Feature Engineering & Data Pre-processing***

### 3. Categorical Encoding

In [None]:
# Encode your categorical columns
from sklearn.preprocessing import LabelEncoder
# Our data contains several categorical columns that need to be encoded as numbers. One way to do this is label encoding.
# LabelEncoder maps each category to an integer code (assigned in sorted order) that algorithms can consume.
le= LabelEncoder()
df['travel_from']= le.fit_transform(df['travel_from'])
df['car_type']= le.fit_transform(df['car_type'])
df['payment_method']= le.fit_transform(df['payment_method'])
df['day_of_week']= le.fit_transform(df['day_of_week'])
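
The following sketch, on a made-up list of towns, shows what `LabelEncoder` actually stores. It assumes a separate encoder per column (the variable name `town_encoder` is hypothetical); reusing a single encoder across columns, as in the cell above, keeps only the mapping of the last column that was fitted.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample of origin towns.
towns = pd.Series(["Migori", "Kisii", "Awendo", "Kisii"])

# One encoder per column, so the mapping can be inverted later.
town_encoder = LabelEncoder()
codes = town_encoder.fit_transform(towns)

# classes_ is sorted, so the integer codes follow alphabetical order.
mapping = {name: code for code, name in enumerate(town_encoder.classes_)}
print(mapping)

# inverse_transform recovers the original labels from the codes.
decoded = town_encoder.inverse_transform(codes)
print(decoded)
```

The codes carry an alphabetical order that has no real meaning. Tree-based models can usually cope with this, whereas linear models treat the codes as magnitudes, so one-hot encoding may suit them better.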


### 2. Handling Outliers

In [None]:

fig, axs = plt.subplots(ncols=len(df.columns), figsize=(15,5))

for i, col in enumerate(df.columns):
    sns.boxplot(y=df[col], color='orange', ax=axs[i], width=0.3)
    axs[i].set_title(col)

plt.subplots_adjust(wspace=0.4)

plt.show()

The boxplots show that the hour, minute and travel_from columns have many outliers.

In [None]:
# Print the skewness of each column.
for col in df:
    print(col  ,skew(df[col]))

The payment method, hour and minute columns show high skewness. The payment method is not needed for the machine learning model, so that column is dropped later. The next cell applies log and square-root transforms to reduce the skewness of the hour, month, car_type and year columns.

In [None]:
df['hour']= np.log10(df['hour'])
df['month']= np.sqrt(df['month'])
df['car_type']= np.sqrt(df['car_type'])
df['year']= np.sqrt(df['year'])
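
To show the effect of such a transform in isolation, here is a minimal sketch on synthetic right-skewed data (an assumption for illustration, not the notebook's columns). It uses `np.log1p` instead of `np.log10`, since `log1p` stays finite when a value is 0.

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(42)

# Hypothetical right-skewed, non-negative feature.
values = rng.exponential(scale=3.0, size=1_000)

before = skew(values)

# log1p computes log(1 + x), so zeros map to 0 instead of -inf.
after = skew(np.log1p(values))

print(f"skew before: {before:.2f}, after: {after:.2f}")
```

Comparing the skewness before and after, as the next cell does for the real columns, confirms whether a transform actually helped.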

In [None]:
# Print the skewness of each column again, after the transforms.
for col in df:
    print(col  ,skew(df[col]))

In [None]:
# Iterate over each column and create a displot
for column in df.columns:
    sns.displot(x=column, data=df, kde=True, rug=True)

# Show the plots
plt.show()

In [None]:
# Drop the columns that are not needed for the machine learning model.
df.drop(columns=['payment_method','car_type','max_capacity','week','day_of_week'],inplace=True)

### 8. Data Splitting

In [None]:
# Determine the independent and dependent variables for modelling.
# Features are left unscaled here: the StandardScaler further down is fitted on the
# training split only, so test-set statistics do not leak into training.
X = df.drop(columns='no_of_ticket')
# Data for the dependent variable
Y = df['no_of_ticket']

In [None]:
X.columns

DATA SPLITTING

In [None]:
# Split the data into train and test sets.
from sklearn.model_selection import train_test_split
# Divide the dataset into training and testing sets using train_test_split,
# with an 80% - 20% train/test ratio.
X_train, X_test, Y_train, Y_test= train_test_split(X,Y, test_size= 0.2, random_state= 0)
print(X_train.shape, Y_train.shape)
print(X_test.shape, Y_test.shape)

DATA SCALING

In [None]:
## standardizing the values.
from sklearn.preprocessing import StandardScaler
scaler= StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

Standardization is a common data preprocessing technique in machine learning that rescales each feature to have a mean of 0 and a standard deviation of 1. The scaler is fitted on the training data only, and the same statistics are then applied to the test data.
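
Each value is transformed as z = (x - mean) / std, where the mean and standard deviation come from the training split. The sketch below checks this on synthetic data; the array names (`X_toy`, `X_tr`, `X_te`) are hypothetical and stand in for the notebook's `X_train` and `X_test`.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical feature matrix with mean near 10 and standard deviation near 4.
X_toy = rng.normal(loc=10.0, scale=4.0, size=(500, 3))
y_toy = rng.normal(size=500)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=0
)

scaler = StandardScaler()
X_tr_scaled = scaler.fit_transform(X_tr)  # learn mean and std from train only
X_te_scaled = scaler.transform(X_te)      # reuse the training statistics

print(X_tr_scaled.mean(axis=0).round(3), X_tr_scaled.std(axis=0).round(3))
print(X_te_scaled.mean(axis=0).round(3))
```

The training columns come out with mean 0 and standard deviation 1. The test columns are only approximately standardized, which is expected because they were scaled with the training statistics.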

## ***7. ML Model Implementation***

LINEAR REGRESSION MODEL

In [None]:
# ML Model - 1 Implementation
from sklearn.linear_model import LinearRegression

# Fit linear regression on the training data
reg = LinearRegression().fit(X_train, Y_train)

# R-squared score on the training and test sets
print(f"training_score {reg.score(X_train, Y_train)}")
print(f"testing_score {reg.score(X_test, Y_test)}")

# Predictions on the training data
Y_train_pred = reg.predict(X_train)
print(Y_train_pred)


In [None]:
Y_pred = reg.predict(X_test)
Y_pred

In [None]:
reg.coef_

In [None]:
# Plot the actual and predicted ticket counts.
# The target was never transformed, so the values are plotted as they are.
plt.figure(figsize=(5,5))
plt.plot(Y_pred[:50])
plt.plot(np.array(Y_test[:50]))
plt.legend(["Predicted","Actual"])
plt.show()

In [None]:
## determining the distance of actual and the predicted output
MSE= mean_squared_error(Y_test,Y_pred)
print("MSE", MSE)

## taking root of the mean squared error.
RMSE= np.sqrt(MSE)
print("RMSE", RMSE)

In [None]:
# Determine how well the model fits the data points.
# As the number of predictors in the model increases, the R-squared score tends to increase as well,
# even if the additional predictors do not contribute significantly to the model's performance.
from sklearn.metrics import r2_score
# The target was not log-transformed, so the score is computed on the raw values.
# The result is stored under a different name so the r2_score function is not overwritten.
r2 = r2_score(Y_test, Y_pred)

print("r2_score", r2)

In [None]:
# The adjusted R-squared score penalizes the addition of unnecessary predictors.
from sklearn.metrics import r2_score
n, p = X_test.shape
print("Adjusted R2 : ", 1 - (1 - r2_score(Y_test, Y_pred)) * ((n - 1) / (n - p - 1)))
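
The metrics used in the cells above can be gathered into one helper. This is a minimal sketch: the function name `regression_report` and the toy arrays are hypothetical, and the adjusted R-squared follows the same formula as the cell above, 1 - (1 - R2) * (n - 1) / (n - p - 1), with n samples and p predictors.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score


def regression_report(y_true, y_pred, n_features):
    """Return MSE, RMSE, R-squared and adjusted R-squared in one dict."""
    n = len(y_true)
    mse = mean_squared_error(y_true, y_pred)
    r2 = r2_score(y_true, y_pred)
    adjusted = 1 - (1 - r2) * (n - 1) / (n - n_features - 1)
    return {
        "mse": float(mse),
        "rmse": float(np.sqrt(mse)),
        "r2": float(r2),
        "adjusted_r2": float(adjusted),
    }


# Hypothetical actual and predicted ticket counts.
y_true = np.array([3.0, 5.0, 7.0, 9.0, 11.0, 13.0])
y_pred = np.array([2.5, 5.5, 7.0, 9.5, 10.5, 13.0])

report = regression_report(y_true, y_pred, n_features=2)
print(report)
```

Calling the same helper for every model keeps the comparison between models consistent.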

###  XGBOOST REGRESSOR 

In [None]:
## importing xgboost regressor
from xgboost import XGBRegressor
XG_model=XGBRegressor()
## fitting the training data into the model
XG_model.fit(X_train,Y_train)

In [None]:
Y_pred_train=XG_model.predict(X_train)
Y_pred=XG_model.predict(X_test)

In [None]:
## getting the training and testing scores of the model
train_Score=XG_model.score(X_train,Y_train)
print(f"train_score {train_Score}")
test_Score=XG_model.score(X_test,Y_test)
print(f"test_score {test_Score}")


In [None]:
# Plot the actual and predicted values (the target was never transformed).
plt.figure(figsize=(10,5))
plt.plot(Y_pred[:50])
plt.plot(np.array(Y_test[:50]))
plt.legend(["Predicted","Actual"])
plt.show()

The XGBoost regressor gives a testing score (R-squared) of 94%.

CROSS VALIDATION AND HYPERPARAMETER TUNING ON XGBOOST REGRESSOR

In [None]:
import xgboost as xgb
from sklearn.model_selection import GridSearchCV

# Create the XGBRegressor object
dreg = xgb.XGBRegressor()

# Set up the parameter grid for hyperparameter tuning
param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [5, 10, 15],
}

# Create the GridSearchCV object
grid_search = GridSearchCV(
    dreg,
    param_grid,
    cv=5, # 5-fold cross-validation
    scoring='neg_mean_squared_error', # Use mean squared error as the evaluation metric
    n_jobs=-1 # Use all available CPU cores
)

# Fit the GridSearchCV object to the data
grid_search.fit(X_train, Y_train)

# Get the best estimator from the GridSearchCV object
best_estimator = grid_search.best_estimator_

# Use the best estimator to make predictions on the test set
y_predbc = best_estimator.predict(X_test)

In [None]:
training_score = best_estimator.score(X_train, Y_train)
test_score = best_estimator.score(X_test, Y_test)
print('Training score: {:.2f}'.format(training_score))
print('Testing score: {:.2f}'.format(test_score))

In [None]:
# Plot the actual and predicted values (the target was never transformed).
plt.figure(figsize=(10,7))
plt.plot(y_predbc[:50])
plt.plot(np.array(Y_test[:50]))
plt.legend(["Predicted","Actual"])
plt.show()

##### Which hyperparameter optimization technique have you used and why?

I used grid search (`GridSearchCV`) with 5-fold cross-validation over `n_estimators` and `max_depth`. Hyperparameter tuning helps to find the combination of hyperparameters with the best cross-validated error, which can improve the accuracy of the model. This is especially important when dealing with complex datasets.
XGBoost is already a fast algorithm, and a well-chosen configuration (for example fewer or shallower trees) can also shorten training time.
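
The grid-search pattern can be tried quickly on a small problem before running it on the full data. The sketch below is an illustration only: it swaps in a `DecisionTreeRegressor` and synthetic data (both assumptions) so that it runs in seconds, but the `GridSearchCV` calls are the same ones used in this notebook.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Hypothetical regression problem: a noisy sine of the first feature.
X_toy = rng.uniform(0, 10, size=(300, 2))
y_toy = np.sin(X_toy[:, 0]) + 0.1 * rng.normal(size=300)

search = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_grid={"max_depth": [2, 5, 10], "min_samples_leaf": [1, 5, 20]},
    cv=5,                              # 5-fold cross-validation
    scoring="neg_mean_squared_error",  # higher (closer to 0) is better
)
search.fit(X_toy, y_toy)

# best_score_ is the negated MSE, so flip the sign to read it as an error.
print(search.best_params_)
print(-search.best_score_)
```

Note that `best_score_` reports the cross-validated negative MSE, while `best_estimator_.score(...)` reports R-squared, so the two numbers are not directly comparable.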


##### Have you seen any improvement? Note down the improvement with updates Evaluation metric Score Chart.

Yes. Before hyperparameter tuning the testing score was 94%, and after tuning it is 97%.

# **RANDOM FOREST**

In [None]:
# Instantiate the random forest regressor model
rf_model = RandomForestRegressor()

# Fit the model to the training data
rf_model.fit(X_train, Y_train)
# Make predictions on the test data
y_pred_nrg = rf_model.predict(X_test)

# Evaluate the model using mean squared error
mse = mean_squared_error(Y_test, y_pred_nrg)
print("Mean squared error: ", mse)
## getting the training and testing scores of the model
train_Score=rf_model.score(X_train,Y_train)
print(f"train_score {train_Score}")
test_Score=rf_model.score(X_test,Y_test)
print(f"test_score {test_Score}")

In [None]:
# Plot the actual and predicted values (the target was never transformed).
plt.figure(figsize=(10,7))
plt.plot(y_pred_nrg[:50])
plt.plot(np.array(Y_test[:50]))
plt.legend(["Predicted","Actual"])
plt.show()

**Random Forest gives a training score of 99% and a testing score of 99%.**

# **Implementation with hyperparameter optimization techniques GridSearch**

In [None]:

from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Creating the RandomForestRegressor object
rdf = RandomForestRegressor()

# Set up the parameter grid for hyperparameter tuning
param_grid = {
    'n_estimators': [25, 50, 100],
    'max_depth': [5, 10, 15],
}

# Creating the GridSearchCV object
grid_search = GridSearchCV(
    rdf,
    param_grid,
    cv=5, # 5-fold cross-validation
    scoring='neg_mean_squared_error', # Use mean squared error as the evaluation metric
    n_jobs=-1 # Use all available CPU cores
)

# Fitting the GridSearchCV object to the data
grid_search.fit(X_train, Y_train)

# Get the best estimator from the GridSearchCV object
best_estimator = grid_search.best_estimator_

# Use the best estimator to make predictions on the test set
y_predcv = best_estimator.predict(X_test)

In [None]:

train_score = best_estimator.score(X_train, Y_train)
test_score = best_estimator.score(X_test, Y_test)
print('Train score: {:.2f}'.format(train_score))
print('Test score: {:.2f}'.format(test_score))

In [None]:
# Plot the actual and predicted values (the target was never transformed).
plt.figure(figsize=(10,7))
plt.plot(y_predcv[:50])
plt.plot(np.array(Y_test[:50]))
plt.legend(["Predicted","Actual"])
plt.show()

# **DECISION TREE**

In [None]:
from sklearn.tree import DecisionTreeRegressor
tree = DecisionTreeRegressor(min_samples_leaf=10)
treereg =tree.fit(X_train ,Y_train)
print("Regression Model Score" , ":" ,treereg.score(X_train,Y_train),"\n",
      "Out of sample Test score" ,":",treereg.score(X_test ,Y_test))
print("\n")
yk_predicted = treereg.predict(X_train)
yk_test_predicted = treereg.predict(X_test)

In [None]:
# Construct a prediction dataframe with the actual and predicted ticket counts.
df_prediction = pd.DataFrame(np.array(Y_test), columns=["Y_test"])
df_prediction["Y_test_predicted"] = np.array(yk_test_predicted)
df_prediction.head(25)

In [None]:
# Create the DecisionTreeRegressor object
tree = DecisionTreeRegressor()

# Set up the parameter grid for hyperparameter tuning
param_grid = {
    'min_samples_leaf': [5, 10, 20, 50]
}

# Create the GridSearchCV object
grid_search = GridSearchCV(
    tree,
    param_grid,
    cv=5, # 5-fold cross-validation
    scoring='neg_mean_squared_error', # Use mean squared error as the evaluation metric
    n_jobs=-1 # Use all available CPU cores
)

# Fitting the GridSearchCV object to the data
grid_search.fit(X_train, Y_train)

# Getting the best estimator from the GridSearchCV object
best_estimator = grid_search.best_estimator_

# Use the best estimator to make predictions on the test set
y_pred = best_estimator.predict(X_test)

# Getting the train score and test score
train_score = best_estimator.score(X_train, Y_train)
test_score = best_estimator.score(X_test, Y_test)

# Print out the train score and test score
print('Train score: {:.2f}'.format(train_score))
print('Test score: {:.2f}'.format(test_score))

In [None]:
# Construct a prediction dataframe with the actual values and the tuned tree's predictions.
df_prediction = pd.DataFrame(np.array(Y_test), columns=["Y_test"])
df_prediction["Y_test_predicted"] = np.array(y_pred)
df_prediction.head(25)

### 2. Which ML model did you choose from the above created models as your final prediction model and why?

XGBoost with hyperparameter tuning gives a 100% training score and a 99% testing score, and Random Forest gives a 99% training score and a 99% testing score.
Both models give nearly equal scores. XGBoost with hyperparameter tuning trains quickly, whereas Random Forest takes noticeably longer, so I choose XGBoost with hyperparameter tuning as the final model.
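
A comparison of score and training time can be made explicit with a short loop. The following is a rough sketch on synthetic data with scikit-learn models only (XGBoost is left out to keep the example dependency-free); the data, the candidate settings and the names `candidates` and `results` are assumptions for illustration.

```python
import time

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)

# Hypothetical non-linear target: the square of the first feature plus noise.
X_toy = rng.uniform(0, 10, size=(400, 3))
y_toy = X_toy[:, 0] ** 2 + rng.normal(size=400)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=0
)

candidates = {
    "linear": LinearRegression(),
    "tree": DecisionTreeRegressor(min_samples_leaf=10, random_state=0),
    "forest": RandomForestRegressor(n_estimators=50, random_state=0),
}

results = {}
for name, model in candidates.items():
    start = time.perf_counter()
    model.fit(X_tr, y_tr)
    elapsed = time.perf_counter() - start
    # score() returns R-squared for scikit-learn regressors.
    results[name] = {"test_r2": model.score(X_te, y_te), "fit_seconds": elapsed}

for name, row in results.items():
    print(f"{name:>7}: R2={row['test_r2']:.3f}, fit={row['fit_seconds']:.3f}s")
```

Recording the fit time next to the test score backs up a statement such as "one model trains faster than the other" with numbers.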


# **Conclusion**

With a 99% training score and a 99% testing score, Random Forest and XGBoost have proven to be the most effective of the algorithms used in this project: Linear Regression, Decision Tree, Random Forest, and XGBoost. Linear Regression, by contrast, does not fit the data points well.
