<a href="https://colab.research.google.com/github/shubham19nijwala/Bike_Sharing_Demand_Prediction-Regression/blob/main/Bike_Sharing_Demand_Prediction.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    -  *Bike Sharing Demand Prediction*

---



### **Project Type**    - *Regression*
### **Contribution**    - *Individual*- *Shubham Singh Nijwala*


# **Project Summary -**

The emergence of bike and scooter ride-sharing companies in urban areas has created a challenge: accurately predicting the demand for their services. Overestimating demand wastes resources, while underestimating it loses revenue. To address this challenge, this project combines historical bike usage patterns with weather data to forecast bike rental demand.

The project uses the dataset in `SeoulBikeData.csv`, which contains 8760 hourly records and 14 columns. The input variables include 'Date', 'Hour', 'Seasons', 'Holiday', 'Functioning Day', 'Temperature', 'Humidity', 'Dew point temperature', and 'Wind speed', and the target is the hourly 'Rented Bike Count'. Python libraries such as Pandas, Seaborn, NumPy, and scikit-learn (sklearn) are used to develop the prediction algorithm. By evaluating different models, the project seeks to identify algorithms that provide accurate predictions and can be deployed effectively in real-world scenarios.

Accurate bike rental demand forecasting offers significant benefits. Ride-sharing companies can reduce waste and improve resource allocation, resulting in cost savings and increased profitability. By optimizing bike maintenance, parking space allocation, and operational planning based on anticipated demand, these companies can operate more efficiently.

Moreover, accurate demand predictions enhance customer satisfaction and provide a better overall experience for users. By ensuring an adequate supply of bikes and scooters based on anticipated demand, customers are less likely to face unavailability issues. This fosters customer loyalty, positive word-of-mouth, and sustained business growth.

Additionally, bike and scooter ride-sharing services are considered environmentally friendly alternatives to traditional transportation methods. By incorporating weather data into demand forecasting, it becomes possible to align the supply of bikes and scooters with weather conditions suitable for cycling. This encourages more people to choose biking as a means of transportation, resulting in reduced traffic congestion and lower carbon emissions. Accurate demand forecasting contributes to the broader goal of promoting sustainable and eco-friendly urban mobility.

In conclusion, the project's aim to combine historical bike usage patterns with weather data for accurate demand forecasting holds significant potential for the bike and scooter ride-sharing industry. By utilizing advanced algorithms and machine learning techniques, the project seeks to optimize resource allocation, reduce waste, and increase profitability for ride-sharing companies. Simultaneously, it strives to enhance customer satisfaction, promote environmentally friendly transportation alternatives, and mitigate traffic congestion and carbon emissions. Data-driven insights can have a positive impact on both the business and environmental aspects of the bike and scooter ride-sharing industry, leading to a more sustainable and efficient urban mobility landscape.






# **GitHub Link -**

https://github.com/shubham19nijwala/Bike_Sharing_Demand_Prediction-Regression

# **Problem Statement**


**Rental bikes have been introduced in many cities to make urban mobility more comfortable. It is important to make rental bikes available and accessible to the public at the right time, as this reduces waiting time. Providing the city with a stable supply of rental bikes therefore becomes a major concern. The crucial part is predicting the bike count required at each hour to maintain that stable supply.**

# **Data Description**
* **Date** - day/month/year
* **Rented Bike Count** - Count of bikes rented per hour
* **Hour** - Hour of the day
* **Temperature** - Temperature in Celsius
* **Humidity** - Humidity in the air in %
* **Wind speed** - Speed of the wind in m/s
* **Visibility** - Visibility in units of 10 m
* **Dew point temperature** - Dew point temperature in Celsius
* **Solar Radiation** - Solar radiation in MJ/m2
* **Rainfall** - Amount of rainfall in mm
* **Snowfall** - Amount of snowfall in cm
* **Seasons** - Winter, Spring, Summer, Autumn
* **Holiday** - Holiday/No holiday
* **Functioning Day** - Whether the day is a functioning day or not

# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns

from datetime import datetime
import datetime as dt

from sklearn.model_selection import train_test_split,cross_validate,GridSearchCV,RandomizedSearchCV

from sklearn.preprocessing import MinMaxScaler,OneHotEncoder,OrdinalEncoder,LabelEncoder

from sklearn.linear_model import LinearRegression,Lasso,Ridge,ElasticNet

from sklearn.metrics import accuracy_score,mean_absolute_error,mean_squared_error,r2_score,log_loss

import warnings
warnings.filterwarnings('ignore')


### Dataset Loading

In [None]:
# Load Dataset
from google.colab import drive
drive.mount('/content/drive')


In [None]:
bike_df=pd.read_csv('/content/drive/MyDrive/Bike Sharing Demand Prediction/SeoulBikeData.csv',encoding='latin')

### Dataset First View

In [None]:
# Dataset First Look
bike_df.sample(5)

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
bike_df.shape

### Dataset Information

In [None]:
# Dataset Info
bike_df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
bike_df.duplicated().sum()

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
bike_df.isnull().sum()

In [None]:
# Visualizing the missing values
missing_values_per = pd.DataFrame((bike_df.isnull().sum()/len(bike_df))*100).reset_index()
plt.figure(figsize=(15,5))
plt.stem(missing_values_per['index'],missing_values_per[0])
plt.xticks(rotation=45,fontsize=10)
plt.title('Percentage of Missing Values')
plt.ylabel('%')
plt.show()



### What did you know about your dataset?

***There are no missing values or duplicate rows in our dataset. The data contains 8760 rows and 14 columns.***
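
The checks above can be bundled into one small helper so they are easy to rerun after each cleaning step. This is only a sketch: `summarize_quality` and the toy frame are hypothetical names introduced for illustration and are not part of the notebook.

```python
import pandas as pd

def summarize_quality(df: pd.DataFrame) -> dict:
    """Return basic shape and data-quality counts for a DataFrame."""
    return {
        "rows": df.shape[0],
        "columns": df.shape[1],
        "missing_cells": int(df.isnull().sum().sum()),
        "duplicate_rows": int(df.duplicated().sum()),
    }

# Hypothetical toy frame standing in for bike_df (values are made up)
toy_df = pd.DataFrame({
    "Hour": [0, 1, 2, 2],
    "Rented Bike Count": [254, 204, 173, 173],
})

report = summarize_quality(toy_df)
print(report)
```

Calling `summarize_quality(bike_df)` on the real data should report zero missing cells and zero duplicate rows if the observations above hold.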

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
bike_df.columns

In [None]:
# Renaming Columns
bike_df=bike_df.rename(columns={'Rented Bike Count':'Rented_Bike_Count',
                                'Temperature(°C)':'Temperature',
                                'Humidity(%)':'Humidity',
                                'Wind speed (m/s)':'Wind_speed',
                                'Visibility (10m)':'Visibility',
                                'Dew point temperature(°C)':'Dew_point_temperature',
                                'Solar Radiation (MJ/m2)':'Solar_Radiation',
                                'Rainfall(mm)':'Rainfall',
                                'Snowfall (cm)':'Snowfall',
                                'Functioning Day':'Functioning_Day'})

In [None]:
# Dataset Describe
bike_df.describe(include='all')

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
for var in bike_df.columns:
  print(var,' : ', bike_df[var].unique())
  print('--'*70)

In [None]:
# Number of Unique values in each columns
bike_df.nunique()

## 3. ***Data Wrangling***

In [None]:
# Check the data type of the 'Date' column before converting it
bike_df['Date'].dtype

In [None]:
# Convert 'Date' from string to datetime (source format is day/month/year)
bike_df['Date']=pd.to_datetime(bike_df['Date'],format='%d/%m/%Y')

In [None]:
bike_df['Date'].dtype

In [None]:
# Extracting the weekday name ('day'), 'month' and 'year' from the 'Date' column:
bike_df['day']=bike_df['Date'].dt.day_name()
bike_df['month']=bike_df['Date'].dt.month
bike_df['year']=bike_df['Date'].dt.year

In [None]:
bike_df.head()

In [None]:
# Creating new column named: 'weekend'
bike_df['weekend']=bike_df['day'].apply(lambda x: 1 if x== 'Saturday' or x== 'Sunday' else 0)

In [None]:
bike_df['weekend'].value_counts()

In [None]:
# Drop columns: 'Date', 'day' and 'year'
bike_df.drop([ 'Date','day','year'],axis=1,inplace=True)


In [None]:
bike_df.sample(3)

### ***The "Date" column is initially read by pandas as a string, so it is converted to a datetime format to allow time-based analysis. From the converted column we extract the weekday name ("day"), "month", and "year", and derive a binary "weekend" flag (1 for Saturday or Sunday, 0 otherwise). Since "month" and "weekend" carry the temporal information we need, the "Date", "day", and "year" columns are then dropped.***
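
The wrangling steps can be seen end to end on a tiny example. This is a minimal sketch on three made-up date strings in the dataset's day/month/year format. It uses `pd.to_datetime` with an explicit format and `isin` for the weekend flag, which is one idiomatic way to get the same result as the `apply` calls in the cells above.

```python
import pandas as pd

# Hypothetical sample of day/month/year strings, as in the 'Date' column
sample = pd.DataFrame({"Date": ["01/12/2017", "02/12/2017", "03/12/2017"]})

# An explicit format avoids any day/month ambiguity
sample["Date"] = pd.to_datetime(sample["Date"], format="%d/%m/%Y")

# Derive the temporal features
sample["month"] = sample["Date"].dt.month
sample["day"] = sample["Date"].dt.day_name()
sample["weekend"] = sample["day"].isin(["Saturday", "Sunday"]).astype(int)

print(sample)
```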

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

## ***UNIVARIATE ANALYSIS***

#### Chart - 1

In [None]:
bike_df['Rented_Bike_Count'].value_counts().sort_values(ascending=False)

In [None]:
# Chart - 1 : Distribution of Dependent variable

fig, (ax1,ax2) = plt.subplots(1,2,figsize=(18,6))
sns.kdeplot(bike_df,x='Rented_Bike_Count',fill=True,color='g',ax=ax1)
ax1.axvline(bike_df['Rented_Bike_Count'].mean(), color='salmon', linestyle='dashed', linewidth=2)
ax1.axvline(bike_df['Rented_Bike_Count'].median(), color='royalblue', linestyle=':', linewidth=2)  
sns.boxplot(bike_df,x='Rented_Bike_Count',ax=ax2,palette="viridis")
plt.show()

#####  What is/are the insight(s) found from the chart?

### **The dependent variable is positively skewed and has many outliers.**

#####  Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

#### **The dependent variable (Rented Bike Count) is positively skewed with many high-value outliers. These outliers correspond to hours with extremely high rental counts, which may indicate exceptional demand spikes or anomalies. While this does not directly lead to negative growth, it complicates capacity planning, resource allocation, and service delivery, so operations must be managed carefully to meet peak demand without harming customer satisfaction.**
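
Skewness and outliers can also be quantified rather than only read off the plots. The sketch below uses made-up counts (not values from the dataset) and the common 1.5 × IQR rule, which is the rule box plots use by default for drawing whiskers. The choice of this rule for outlier detection is an assumption, not something the notebook prescribes.

```python
import pandas as pd

# Hypothetical right-skewed hourly counts (not taken from the dataset)
counts = pd.Series([100, 120, 130, 150, 160, 180, 200, 220, 250, 3000])

# Positive skewness means a long right tail; the mean is pulled above the median
skewness = counts.skew()

# Flag points outside the 1.5 * IQR fences
q1, q3 = counts.quantile(0.25), counts.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = counts[(counts < lower) | (counts > upper)]

print(f"skewness={skewness:.2f}, mean={counts.mean()}, median={counts.median()}")
print("outliers:", outliers.tolist())
```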

#### Chart - 2

In [None]:
num_features=bike_df.drop('Rented_Bike_Count',axis=1).describe().columns
num_features

In [None]:
# Chart - 2 : Distribution of Numerical Features
for var in num_features:
  fig,(ax1,ax2)=plt.subplots(1,2,figsize=(18,5))
  sns.kdeplot(bike_df,x=var,fill=True,ax=ax1,color='g')
  ax1.axvline(bike_df[var].mean(),color='salmon', linestyle='dashed', linewidth=2)
  ax1.axvline(bike_df[var].median(),color='royalblue', linestyle=':', linewidth=2)
  sns.boxplot(bike_df,x=var,ax=ax2,palette="viridis")
  plt.show()
  print('\n\n')



#####  What is/are the insight(s) found from the chart?

#### **Our numerical features exhibit skewness, and some of them contain outliers.**

 #####  Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

#### **Yes. Identifying skewness and outliers tells us which features need transformation or outlier treatment before modelling. Handling them appropriately should make the demand predictions more reliable, which in turn supports better resource allocation, targeted marketing, and customer satisfaction.**

#### Chart - 3

In [None]:
# Chart - 3 : Plotting graph for categorical features
fig,(ax1,ax2,ax3)=plt.subplots(1,3,figsize=(18,5))
sns.countplot(bike_df,x='Seasons',ax=ax1,palette='viridis')
sns.countplot(bike_df,x='Holiday',ax=ax2,palette='pastel')
sns.countplot(bike_df,x='Functioning_Day',ax=ax3,palette='inferno')
plt.show()


#####  What is/are the insight(s) found from the chart?

##### **The number of records is almost equal across seasons, but it is heavily imbalanced across the 'Holiday' and 'Functioning_Day' columns.**

#####  Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

##### **Knowing how the records are distributed across seasons, holidays, and functioning days helps in interpreting demand fluctuations for each category and in allocating resources accordingly.**

## ***BIVARIATE ANALYSIS***

#### Chart - 4

In [None]:
num_features

In [None]:
# Chart - 4 : Relation between  Numerical Features and dependent variable
for var in num_features:
  plt.figure(figsize=(12,6))
  sns.scatterplot(bike_df,x=var,y='Rented_Bike_Count',color='salmon')
  correlation=bike_df[var].corr(bike_df['Rented_Bike_Count'])
  plt.title('Rented_Bike_Count vs ' + var + ': Correlation = '+str(correlation) )
  z = np.polyfit(bike_df[var], bike_df['Rented_Bike_Count'], 1)
  y_hat = np.poly1d(z)(bike_df[var])
  plt.plot(bike_df[var], y_hat,'r--', lw=1)
  plt.show()
  print('\n\n\n')

 



##### 1. Why did you pick the specific chart?

##### **A scatter plot with a fitted regression line helps visualize the relationship between each numerical feature and the dependent variable.**

##### 2. What is/are the insight(s) found from the chart?

##### **Hour, Temperature, Wind speed, Visibility, Dew point temperature, Solar Radiation, and month are positively correlated with our dependent variable (Rented Bike Count), while the other numerical features are negatively correlated with it.**

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

##### **Yes. The positive correlations of Hour, Temperature, Wind speed, Visibility, Dew point temperature, Solar Radiation, and month with the Rented Bike Count can inform business decisions such as optimizing operational hours, adjusting pricing, and targeting marketing efforts to maximize bike rentals. The negative correlations with the other numerical features indicate the conditions under which demand tends to fall, so supply can be adjusted accordingly.**
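
The per-feature correlations shown in the chart titles can also be collected into a single sorted table. This is a sketch on a made-up frame: the values are hypothetical, and only the column names mirror the renamed columns of `bike_df`.

```python
import pandas as pd

# Hypothetical toy data; only the column names mirror bike_df
toy = pd.DataFrame({
    "Temperature": [-5.0, 0.0, 5.0, 10.0, 20.0, 30.0],
    "Humidity": [90, 80, 75, 60, 50, 40],
    "Seasons": ["Winter", "Winter", "Spring", "Spring", "Summer", "Summer"],
    "Rented_Bike_Count": [100, 150, 300, 500, 900, 1200],
})

# Keep numeric columns only, then read off the target's column of the
# Pearson correlation matrix and sort it
corr_with_target = (
    toy.select_dtypes("number")
       .corr()["Rented_Bike_Count"]
       .drop("Rented_Bike_Count")
       .sort_values(ascending=False)
)
print(corr_with_target)
```

Pearson correlation only captures linear association, so a feature with a strong but non-monotonic effect can still show a weak coefficient.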

#### Chart - 5

In [None]:
# Chart - 5: Relation between Categorical features and Dependent Variable
fig,(ax1,ax2,ax3)=plt.subplots(1,3,figsize=(18,5))
sns.barplot(bike_df,x='Seasons',y='Rented_Bike_Count',ax=ax1,palette='viridis',capsize=0.1)
sns.barplot(bike_df,x='Holiday',y='Rented_Bike_Count',ax=ax2,palette='pastel',capsize=0.1)
sns.barplot(bike_df,x='Functioning_Day',y='Rented_Bike_Count',ax=ax3,palette='inferno',capsize=0.1)
plt.show()


##### 1. Why did you pick the specific chart?

##### **A bar plot shows how the average Rented Bike Count varies with Seasons, Holiday, and Functioning Day.**

##### 2. What is/are the insight(s) found from the chart?

* **The average count is highest in summer and lowest in winter.**
* **The average count drops on holidays.**
* **Non-functioning days contribute almost nothing to the count.**


##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

**Yes, the gained insights about the count being maximum during summer, dropping during winter, and decreasing during holidays can help businesses plan their resources, adjust marketing strategies, and optimize operations to meet customer demand, resulting in a positive business impact. Additionally, the understanding that non-functioning days have an insignificant contribution can guide businesses in allocating resources more efficiently.**

#### Chart - 6

In [None]:
# Chart - 6: plot to analyze the relationship between "Rented_Bike_Count" and "Rainfall"
plt.figure(figsize=(12,6))
# Select the target column before averaging so the non-numeric columns are not aggregated
bike_df.groupby('Rainfall')['Rented_Bike_Count'].mean().plot(color='g')
plt.xlabel('Rainfall in mm')
plt.ylabel('Average rented bike count')
plt.xticks(range(0,37,2))
plt.show()

 #####  What is/are the insight(s) found from the chart?

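**The plot shows that the average rented bike count does not fall steadily as rainfall increases. For instance, there is a pronounced peak at a rainfall of 22-24 mm. Each heavy-rainfall value may be backed by only a few hourly records, so this peak should be checked against the number of observations before it is read as sustained demand in heavy rain.**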

 #####  Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

**Possibly. If the peak under heavy rainfall is confirmed by enough observations, businesses can use it to keep bikes available during rainy periods, meeting customer demand and potentially increasing revenue.**
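
A group mean is only as trustworthy as the number of rows behind it, so it helps to look at the group size next to the mean. The sketch below uses made-up values to show how a single record at an extreme rainfall value can produce a peak in a plot of group means.

```python
import pandas as pd

# Hypothetical toy data: many dry hours, a single very wet hour
toy = pd.DataFrame({
    "Rainfall": [0.0, 0.0, 0.0, 0.0, 0.5, 0.5, 24.0],
    "Rented_Bike_Count": [800, 600, 700, 900, 100, 200, 1000],
})

# Mean and number of observations per rainfall value
by_rain = toy.groupby("Rainfall")["Rented_Bike_Count"].agg(["mean", "count"])
print(by_rain)
```

In this toy frame the 24 mm "peak" rests on one row. The same `agg(["mean", "count"])` call on `bike_df` would show whether the real peak is similarly thin.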

#### Chart - 7

In [None]:
# Chart - 7 : plot to analyze the relationship between "Rented_Bike_Count" and "Wind_speed" 
plt.figure(figsize=(12,6))
# Select the target column before averaging so the non-numeric columns are not aggregated
bike_df.groupby('Wind_speed')['Rented_Bike_Count'].mean().plot(color='g')
plt.xlabel('Wind_Speed in m/s')
plt.ylabel('Average rented bike count')
plt.show()




#####  What is/are the insight(s) found from the chart?

**From the plot above, the average demand for rented bikes is fairly even across wind speeds. However, there is a spike in average rentals at a wind speed of about 7 m/s, which may suggest that people enjoy riding in breezy conditions. Because this spike may rest on only a few observations, it should be verified before conclusions are drawn from it.**

#####  Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

**Yes, the gained insight about the even distribution of bike rentals regardless of wind speed, with a spike at 7 m/s, can have a positive business impact. Businesses can promote biking as an enjoyable activity during breezy conditions, potentially increasing bike rentals and attracting more customers.**

#### Chart - 8

In [None]:
# Chart - 8 
fig, ax = plt.subplots(figsize=(15, 8))
sns.boxplot(data=bike_df, x='Hour', y='Rented_Bike_Count', ax=ax,palette='viridis')
ax.set(title='Count of Rented bikes according to Hour')
plt.show()

##### What is/are the insight(s) found from the chart?

**The plot above shows the distribution of rented bike counts for each hour of the day across the year. Demand peaks during commuting hours, specifically from 7 AM to 9 AM and 5 PM to 7 PM.**

#####  Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

**Yes, the insight that demand peaks during commuting hours can have a positive business impact. Businesses can optimize their operations and marketing efforts during these peak hours to meet customer demand, attract more riders, and potentially increase revenue.**
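
The peak hours can be extracted numerically as well as read from the box plot. This is a small sketch on made-up counts; on the real data the same two lines would be applied to `bike_df`.

```python
import pandas as pd

# Hypothetical toy data with two records per hour
toy = pd.DataFrame({
    "Hour": [3, 3, 8, 8, 13, 13, 18, 18],
    "Rented_Bike_Count": [50, 70, 900, 1100, 500, 700, 1400, 1600],
})

# Average count per hour, then the two busiest hours
hourly_mean = toy.groupby("Hour")["Rented_Bike_Count"].mean()
peak_hours = hourly_mean.nlargest(2).index.tolist()
print(peak_hours)
```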

#### Chart - 9

In [None]:
# Chart - 9 :  
plt.figure(figsize=(14,6))
sns.barplot(x='month',y='Rented_Bike_Count',data=bike_df,palette='viridis')
plt.title('Average count of Bikes Rented per Month')
plt.show()

#####  What is/are the insight(s) found from the chart?

**Demand for rented bikes is high during the summer months and low during the winter months.**

#####  Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

**Yes, the gained insight that demand for rented bikes is high during summer and low during winter can help businesses align their resources and marketing strategies accordingly, maximizing revenue and creating a positive business impact.**

## ***TriVariate Analysis***

#### Chart - 10

In [None]:
# Chart - 10 : 
plt.figure(figsize=(16,6))
sns.lineplot(x='Hour',y= "Rented_Bike_Count",data=bike_df,hue='Seasons',palette='deep',alpha=1)
plt.title('Analysing trend line of "Rented Bike Count" w.r.t "Hour" for different Seasons')
plt.show()

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 11

In [None]:
# Chart - 11 :  
plt.figure(figsize=(16,6))
sns.pointplot(x='Hour',y= "Rented_Bike_Count",data=bike_df,hue='Holiday',palette='rocket')
plt.title('Analysing trend line of "Rented Bike Count" w.r.t "Hour" separately for "Holiday" and "No Holiday"')
plt.show()

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 12

In [None]:
# Chart - 12 
plt.figure(figsize=(16,6))
sns.pointplot(x='Hour',y= "Rented_Bike_Count",data=bike_df,hue='weekend',palette='rocket')
plt.title('Analysing trend line of "Rented Bike Count" w.r.t "Hour" separately for "weekdays" and "weekend"')
plt.show()

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 13

In [None]:
# Chart - 13 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 14 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

#### Chart - 15 - Pair Plot 

In [None]:
# Pair Plot visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

## ***5. Hypothesis Testing***

### Based on your chart experiments, define three hypothetical statements from the dataset. In the next three questions, perform hypothesis testing to obtain final conclusion about the statements through your code and statistical testing.

Answer Here.

### Hypothetical Statement - 1

#### 1. State Your research hypothesis as a null hypothesis and alternate hypothesis.

Answer Here.

#### 2. Perform an appropriate statistical test.

In [None]:
# Perform Statistical Test to obtain P-Value
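
One way this step could be filled in is sketched below. It assumes a hypothetical hypothesis suggested by Chart 5 (H0: the mean rented bike count is the same on holidays and non-holidays; H1: the means differ) and uses Welch's t-test from SciPy, which does not assume equal variances. The two samples are synthetic, so the printed p-value says nothing about the real dataset.

```python
import numpy as np
from scipy import stats

# Synthetic samples standing in for the two groups (not real data)
rng = np.random.default_rng(0)
no_holiday = rng.normal(loc=700, scale=100, size=200)
holiday = rng.normal(loc=500, scale=100, size=50)

# Welch's t-test: two independent samples, unequal variances allowed
t_stat, p_value = stats.ttest_ind(no_holiday, holiday, equal_var=False)

alpha = 0.05  # significance level, an assumption
reject_null = p_value < alpha
print(f"t={t_stat:.2f}, p={p_value:.3g}, reject H0: {reject_null}")
```

On the real data the two samples would be the `Rented_Bike_Count` values selected by the `Holiday` column. Because the counts are skewed, a non-parametric alternative such as the Mann-Whitney U test may also be worth considering.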

##### Which statistical test have you done to obtain P-Value?

Answer Here.

##### Why did you choose the specific statistical test?

Answer Here.

### Hypothetical Statement - 2

#### 1. State Your research hypothesis as a null hypothesis and alternate hypothesis.

Answer Here.

#### 2. Perform an appropriate statistical test.

In [None]:
# Perform Statistical Test to obtain P-Value

##### Which statistical test have you done to obtain P-Value?

Answer Here.

##### Why did you choose the specific statistical test?

Answer Here.

### Hypothetical Statement - 3

#### 1. State Your research hypothesis as a null hypothesis and alternate hypothesis.

Answer Here.

#### 2. Perform an appropriate statistical test.

In [None]:
# Perform Statistical Test to obtain P-Value

##### Which statistical test have you done to obtain P-Value?

Answer Here.

##### Why did you choose the specific statistical test?

Answer Here.

## ***6. Feature Engineering & Data Pre-processing***

### 1. Handling Missing Values

In [None]:
# Handling Missing Values & Missing Value Imputation

#### What all missing value imputation techniques have you used and why did you use those techniques?

Answer Here.

### 2. Handling Outliers

In [None]:
# Handling Outliers & Outlier treatments

##### What all outlier treatment techniques have you used and why did you use those techniques?

Answer Here.

### 3. Categorical Encoding

In [None]:
# Encode your categorical columns
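
A possible encoding for the three categorical columns is sketched below: binary mapping for the two-level columns and one-hot encoding for `Seasons`. The category labels (`'No Holiday'`, `'Yes'`, `'No'`) are assumptions about how the values are spelled in the CSV and should be checked against the output of `bike_df[col].unique()`.

```python
import pandas as pd

# Hypothetical toy frame; label spellings are assumed, not verified
toy = pd.DataFrame({
    "Seasons": ["Winter", "Spring", "Summer", "Autumn"],
    "Holiday": ["No Holiday", "Holiday", "No Holiday", "No Holiday"],
    "Functioning_Day": ["Yes", "Yes", "No", "Yes"],
})

encoded = toy.copy()
# Two-level columns become 0/1 flags
encoded["Holiday"] = (encoded["Holiday"] == "Holiday").astype(int)
encoded["Functioning_Day"] = (encoded["Functioning_Day"] == "Yes").astype(int)
# One-hot encode Seasons; drop_first removes one level to avoid a redundant column
encoded = pd.get_dummies(encoded, columns=["Seasons"], drop_first=True, dtype=int)

print(encoded)
```

Dropping the first level (here `Autumn`, the first alphabetically) matters mainly for linear models, where a full set of dummies is perfectly collinear with the intercept.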

#### What all categorical encoding techniques have you used & why did you use those techniques?

Answer Here.

### 4. Textual Data Preprocessing 
(It's mandatory for textual dataset i.e., NLP, Sentiment Analysis, Text Clustering etc.)

#### 1. Expand Contraction

In [None]:
# Expand Contraction

#### 2. Lower Casing

In [None]:
# Lower Casing

#### 3. Removing Punctuations

In [None]:
# Remove Punctuations

#### 4. Removing URLs & Removing words that contain digits.

In [None]:
# Remove URLs & Remove words that contain digits

#### 5. Removing Stopwords & Removing White spaces

In [None]:
# Remove Stopwords

In [None]:
# Remove White spaces

#### 6. Rephrase Text

In [None]:
# Rephrase Text

#### 7. Tokenization

In [None]:
# Tokenization

#### 8. Text Normalization

In [None]:
# Normalizing Text (i.e., Stemming, Lemmatization etc.)

##### Which text normalization technique have you used and why?

Answer Here.

#### 9. Part of speech tagging

In [None]:
# POS Tagging

#### 10. Text Vectorization

In [None]:
# Vectorizing Text

##### Which text vectorization technique have you used and why?

Answer Here.

### 4. Feature Manipulation & Selection

#### 1. Feature Manipulation

In [None]:
# Manipulate Features to minimize feature correlation and create new features

#### 2. Feature Selection

In [None]:
# Select your features wisely to avoid overfitting

##### What all feature selection methods have you used  and why?

Answer Here.

##### Which all features you found important and why?

Answer Here.

### 5. Data Transformation

#### Do you think that your data needs to be transformed? If yes, which transformation have you used. Explain Why?

In [None]:
# Transform Your data
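
Since Chart 1 found the target positively skewed, a square-root transform is one candidate here. The sketch below is an illustration on constructed data (perfect squares, chosen so the effect is easy to see), not a result from the dataset. A square root is suggested rather than a plain logarithm on the assumption that some hourly counts are zero, where `log` is undefined.

```python
import numpy as np
import pandas as pd

# Constructed right-skewed data: the squares of 1..20
counts = pd.Series([i ** 2 for i in range(1, 21)], dtype=float)
skew_before = counts.skew()

# The square root compresses large values more than small ones
transformed = np.sqrt(counts)
skew_after = transformed.skew()

print(f"skew before: {skew_before:.3f}, after: {skew_after:.3f}")
```

If the model is trained on the transformed target, its predictions must be squared to bring them back to the original scale before computing error metrics.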

### 6. Data Scaling

In [None]:
# Scaling your data

##### Which method have you used to scale your data and why?

### 7. Dimensionality Reduction

##### Do you think that dimensionality reduction is needed? Explain Why?

Answer Here.

In [None]:
# Dimensionality Reduction (If needed)

##### Which dimensionality reduction technique have you used and why? (If dimensionality reduction done on dataset.)

Answer Here.

### 8. Data Splitting

In [None]:
# Split your data to train and test. Choose Splitting ratio wisely.
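
A sketch of how splitting and scaling can fit together, using the `train_test_split` and `MinMaxScaler` classes already imported at the top of the notebook. The 80/20 ratio, the `random_state`, and the synthetic arrays are all assumptions for illustration. The key point is that the scaler is fitted on the training split only, so no information from the test set leaks into preprocessing.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Synthetic features and target standing in for the prepared data
rng = np.random.default_rng(42)
X = rng.uniform(low=-10, high=35, size=(100, 3))
y = rng.integers(low=0, high=2000, size=100)

# Hold out 20% of the rows for testing (ratio is an assumption)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit the scaler on the training data only, then apply it to both splits
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.shape, X_test_scaled.shape)
```

Scaled test values can fall slightly outside the 0 to 1 range, which is expected when the test set contains values beyond the training minimum or maximum.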

##### What data splitting ratio have you used and why? 

Answer Here.

### 9. Handling Imbalanced Dataset

##### Do you think the dataset is imbalanced? Explain Why.

Answer Here.

In [None]:
# Handling Imbalanced Dataset (If needed)

##### What technique did you use to handle the imbalance dataset and why? (If needed to be balanced)

Answer Here.

## ***7. ML Model Implementation***

### ML Model - 1

In [None]:
# ML Model - 1 Implementation

# Fit the Algorithm

# Predict on the model
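
As a baseline, the fit / predict / evaluate cycle could look like the sketch below. It uses `LinearRegression` and the regression metrics imported at the top of the notebook. The data is synthetic with a known linear relationship, so the scores shown are not results for the bike dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic data: y depends linearly on two features plus small noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(200, 2))
y = 300 * X[:, 0] - 100 * X[:, 1] + 50 + rng.normal(0, 5, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Fit the algorithm
model = LinearRegression()
model.fit(X_train, y_train)

# Predict on the held-out data
y_pred = model.predict(X_test)

# Evaluate with regression metrics
mae = mean_absolute_error(y_test, y_pred)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
r2 = r2_score(y_test, y_pred)
print(f"MAE={mae:.2f}, RMSE={rmse:.2f}, R2={r2:.3f}")
```

`accuracy_score` and `log_loss`, although imported earlier, are classification metrics and do not apply to this regression task.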

#### 1. Explain the ML Model used and its performance using Evaluation metric Score Chart.

In [None]:
# Visualizing evaluation Metric Score chart

#### 2. Cross- Validation & Hyperparameter Tuning

In [None]:
# ML Model - 1 Implementation with hyperparameter optimization techniques (i.e., GridSearch CV, RandomSearch CV, Bayesian Optimization etc.)

# Fit the Algorithm

# Predict on the model
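
Cross-validated tuning could be sketched with `GridSearchCV`, which is imported at the top of the notebook. The estimator (`Ridge`), the `alpha` grid, the 5-fold setting, and the synthetic data are all assumptions chosen to keep the example short.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Synthetic data with a known linear relationship (not the bike dataset)
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(200, 2))
y = 300 * X[:, 0] - 100 * X[:, 1] + 50 + rng.normal(0, 5, size=200)

# Candidate regularization strengths (hypothetical grid)
param_grid = {"alpha": [0.01, 0.1, 1.0, 10.0]}

# 5-fold cross-validation, scored by R^2
grid = GridSearchCV(Ridge(), param_grid=param_grid, cv=5, scoring="r2")
grid.fit(X, y)

best_alpha = grid.best_params_["alpha"]
print(f"best alpha={best_alpha}, mean CV R2={grid.best_score_:.3f}")
```

Grid search evaluates every combination, which is fine for a small grid like this one. `RandomizedSearchCV` is the usual alternative when the search space is large.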

##### Which hyperparameter optimization technique have you used and why?

Answer Here.

##### Have you seen any improvement? Note down the improvement with updated Evaluation metric Score Chart.

Answer Here.

### ML Model - 2

#### 1. Explain the ML Model used and its performance using Evaluation metric Score Chart.

In [None]:
# Visualizing evaluation Metric Score chart

#### 2. Cross- Validation & Hyperparameter Tuning

In [None]:
# ML Model - 2 Implementation with hyperparameter optimization techniques (i.e., GridSearch CV, RandomSearch CV, Bayesian Optimization etc.)

# Fit the Algorithm

# Predict on the model

##### Which hyperparameter optimization technique have you used and why?

Answer Here.

##### Have you seen any improvement? Note down the improvement with updated Evaluation metric Score Chart.

Answer Here.

#### 3. Explain each evaluation metric's indication towards business and the business impact of the ML model used.

Answer Here.

### ML Model - 3

In [None]:
# ML Model - 3 Implementation

# Fit the Algorithm

# Predict on the model

#### 1. Explain the ML Model used and its performance using Evaluation metric Score Chart.

In [None]:
# Visualizing evaluation Metric Score chart

#### 2. Cross- Validation & Hyperparameter Tuning

In [None]:
# ML Model - 3 Implementation with hyperparameter optimization techniques (i.e., GridSearch CV, RandomSearch CV, Bayesian Optimization etc.)

# Fit the Algorithm

# Predict on the model

##### Which hyperparameter optimization technique have you used and why?

Answer Here.

##### Have you seen any improvement? Note down the improvement with updated Evaluation metric Score Chart.

Answer Here.

### 1. Which Evaluation metrics did you consider for a positive business impact and why?

Answer Here.

### 2. Which ML model did you choose from the above created models as your final prediction model and why?

Answer Here.

### 3. Explain the model which you have used and the feature importance using any model explainability tool?

Answer Here.

## ***8.*** ***Future Work (Optional)***

### 1. Save the best performing ml model in a pickle file or joblib file format for deployment process.


In [None]:
# Save the File

### 2. Again Load the saved model file and try to predict unseen data for a sanity check.


In [None]:
# Load the File and predict unseen data.
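
The save and reload steps can be sketched together with Python's built-in `pickle` module. The tiny model, the temporary directory, and the file name `bike_demand_model.pkl` are hypothetical; in the notebook the object being saved would be the best-performing fitted model.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.linear_model import LinearRegression

# Tiny stand-in model fitted on y = 2x + 1
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])
model = LinearRegression().fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "bike_demand_model.pkl")  # hypothetical name

    # Save the fitted model
    with open(model_path, "wb") as f:
        pickle.dump(model, f)

    # Load it back
    with open(model_path, "rb") as f:
        loaded_model = pickle.load(f)

# Sanity check: both models agree on an unseen input
unseen = np.array([[4.0]])
same_prediction = np.allclose(model.predict(unseen), loaded_model.predict(unseen))
print(loaded_model.predict(unseen), same_prediction)
```

A pickled scikit-learn model should be loaded with the same library version that saved it, and pickle files from untrusted sources should never be loaded.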

### ***Congrats! Your model is successfully created and ready for deployment on a live server for a real user interaction !!!***

# **Conclusion**

Write the conclusion here.

### ***Hurrah! You have successfully completed your Machine Learning Capstone Project !!!***