<a href="https://colab.research.google.com/github/ruksz/Bike-Sharing-Demand-Prediction-Capstone/blob/main/Capstone_Regression_Bike_Sharing_demand_prediction.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    - Bike Sharing Demand Prediction



##### **Project Type**    - Regression
##### **Contribution**    - Individual
##### **Team Member 1 -** Rukshar Shaikh


# **Project Summary -**

Bike-sharing services have emerged as a popular and sustainable mode of urban transportation, offering people an environmentally friendly way to navigate cities. These services have revolutionized commuting, providing convenient access to bikes on-demand. However, for bike-sharing companies to operate efficiently and meet customer demand, accurate predictions of bike rental demand are essential.

The "Bike Sharing Demand Prediction" project applies data science and machine learning to this challenge. Using historical bike-sharing records together with weather data, it builds regression models that forecast the number of bike rentals. Accurate forecasts let bike-sharing companies allocate resources so that bikes are available when and where they are needed most.



# **GitHub Link -**

https://github.com/ruksz/Bike-Sharing-Demand-Prediction-Capstone

# **Problem Statement**


The project's main objective is to build a predictive model of bike rental demand from historical usage patterns and weather data. The target variable is 'Rented Bike Count', the number of bikes rented in a given hour. The model will use features such as date and hour, season, holiday and functioning-day status, and weather conditions (temperature, humidity, wind speed, visibility, dew point, solar radiation, rainfall and snowfall) to forecast hourly rental demand.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. For each chart, answer the following questions.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 15 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following the "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





6. You may add more ML algorithms for model creation. For each algorithm, answer the following questions.


*   Explain the ML Model used and its performance using an Evaluation metric Score Chart.


*   Cross- Validation & Hyperparameter Tuning

*   Have you seen any improvement? Note down the improvement with an updated Evaluation metric Score Chart.

*   Explain each evaluation metric's indication towards business and the business impact of the ML model used.




















# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import style
import seaborn as sns
import missingno as msno

# Ignore the warnings
import warnings
warnings.filterwarnings('ignore')


### Dataset Loading

In [None]:
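# Load Dataset: mount Google Drive (Colab only) and read the Seoul bike-sharing CSV
from google.colab import drive
drive.mount('/content/drive')

# The file is not UTF-8 encoded (e.g. the '°' in 'Temperature(°C)'), so read it as ISO-8859-1
data = pd.read_csv('/content/drive/MyDrive/MLProject/SeoulBikeData.csv', encoding='ISO-8859-1')

### Dataset First View

In [None]:
# Dataset First Look
data.head()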

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
data.shape

### Dataset Information

In [None]:
# Dataset Info
data.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
data.duplicated().sum()

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
data.isnull().sum()

In [None]:
# Visualizing the missing values
plt.figure(figsize=(8, 6))
sns.heatmap(data.isnull(), cbar=False)
plt.title('Null Values Heatmap')
plt.show()

### What did you know about your dataset?

In [None]:
data.head(5)

Date: The date of the recorded data.

Rented Bike Count: The number of bikes rented during that hour (the target variable).

Hour: The hour of the day (0 to 23).

Temperature(°C): The air temperature in degrees Celsius.

Humidity(%): The relative humidity as a percentage.

Wind speed (m/s): The wind speed in meters per second.

Visibility (10m): The visibility, measured in units of 10 meters.

Dew point temperature(°C): The dew point temperature in degrees Celsius.

Solar Radiation (MJ/m2): The solar radiation in megajoules per square meter.

Rainfall(mm): The amount of rainfall in millimeters.

Snowfall (cm): The amount of snowfall in centimeters.

Seasons: The season (Winter, Spring, Summer or Autumn).

Holiday: Whether the date was a holiday ('Holiday' or 'No Holiday').

Functioning Day: Whether the bike-sharing system was operating that day ('Yes' or 'No').

This dataset seems to be suitable for time-series analysis and regression tasks, as it contains both temporal and weather-related features that can be used to predict the demand for rented bikes.

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
data.columns

In [None]:
# Dataset Describe
data.describe()

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
for col in data.columns:
    print(f"Unique values in {col}: {data[col].nunique()}")

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.
df=data.copy()

In [None]:
# Convert the Date column from object (string) to datetime; the file stores dates as dd/mm/yyyy
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y')
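The Seoul file stores dates as day/month/year (for example 01/12/2017 for 1 December 2017). Without an explicit format, pandas assumes month-first, which either swaps day and month or raises an error once a day above 12 appears, depending on the pandas version. A minimal check with two sample strings (not the real file):

```python
import pandas as pd

# Two sample dates in the day/month/year layout used by SeoulBikeData.csv
sample = pd.Series(['01/12/2017', '13/12/2017'])

# Explicit format: day first, then month, then four-digit year
parsed = pd.to_datetime(sample, format='%d/%m/%Y')

print(parsed.dt.month.tolist())  # month of each date
print(parsed.dt.day.tolist())    # day of each date
```

Both dates come out in December, with days 1 and 13.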

In [None]:
# Split day of week, month and year into three new columns
df['day_of_week'] = df['Date'].dt.day_name()      # weekday name, e.g. 'Friday'
df['month'] = df['Date'].dt.month_name()          # month name, e.g. 'December'
df['year'] = df['Date'].dt.year.astype('object')  # year, stored as object so it is treated as categorical

In [None]:
# drop the Date column
df.drop(columns=['Date'],inplace=True)

In [None]:
df.head(3)

### What all manipulations have you done and insights you found?

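We copied the raw data into df, converted Date to datetime, extracted day_of_week, month and year into new columns, and then dropped Date. Although Hour is stored as a number, it represents a time of day (0 to 23), so it should be treated as a categorical feature rather than as a continuous quantity.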
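One way to treat Hour as categorical, sketched here as an assumption rather than the encoding this notebook settles on, is to one-hot encode it with `pandas.get_dummies`, so that each of the 24 hours gets its own indicator column and no linear ordering is implied. The toy frame below stands in for df:

```python
import pandas as pd

# Toy frame standing in for df; only the Hour column matters here
frame = pd.DataFrame({'Hour': list(range(24)) * 2})

# One indicator column per hour value (Hour_0 ... Hour_23)
hour_dummies = pd.get_dummies(frame['Hour'], prefix='Hour', dtype=int)

# Replace the numeric Hour column with the indicator columns
encoded = pd.concat([frame.drop(columns=['Hour']), hour_dummies], axis=1)
print(encoded.shape)
```

For a linear model with an intercept, passing `drop_first=True` to `get_dummies` avoids perfect collinearity among the indicator columns.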

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# Chart - 1 visualization code
rentals_by_holiday = df.groupby('Holiday')['Rented Bike Count'].mean()

# Create a bar chart
plt.figure(figsize=(8, 6))
rentals_by_holiday.plot(kind='bar', color=['skyblue', 'lightcoral'])
plt.title('Average Bike Rentals on Holidays vs. Non-Holidays')
plt.xlabel('Holiday')
plt.ylabel('Average Bike Rentals')
plt.xticks(rotation=0)  # keep pandas' own index labels, which groupby sorts as 'Holiday', 'No Holiday'

# Show the plot
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

A bar chart of the mean rentals for each Holiday category directly answers whether demand differs between holidays and non-holidays.

##### 2. What is/are the insight(s) found from the chart?

This bar chart compares the average number of bikes rented per hour on holidays and non-holidays. Average rentals are lower on holidays than on non-holidays.
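Tick labels assigned by position only match the bars if they follow the order of `rentals_by_holiday.index`. `groupby` sorts group keys by default, so 'Holiday' comes before 'No Holiday'. A quick check on toy numbers (illustrative only, not the real data):

```python
import pandas as pd

# Toy rows; values are illustrative only
toy = pd.DataFrame({
    'Holiday': ['No Holiday', 'Holiday', 'No Holiday', 'Holiday'],
    'Rented Bike Count': [700, 500, 730, 480],
})

means = toy.groupby('Holiday')['Rented Bike Count'].mean()
print(means.index.tolist())  # group keys in sorted order
print(means.tolist())        # mean rentals per group, in the same order
```

Labelling the ticks from the index itself, rather than from a hard-coded list, keeps labels and bars in sync.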

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 2

In [None]:
# Chart - 2 visualization code
plt.figure(figsize=(10, 6))
sns.lineplot(data=df, x='Hour', y='Rented Bike Count', errorbar=None)  # errorbar replaces the deprecated ci argument (seaborn >= 0.12)
plt.title('Bike Rentals by Hour of the Day')
plt.xlabel('Hour')
plt.ylabel('Rented Bike Count')
plt.xticks(range(24))
plt.show()

##### 1. Why did you pick the specific chart?

A line plot of mean rentals against hour shows the daily usage cycle and makes peak hours easy to spot.

##### 2. What is/are the insight(s) found from the chart?

The line plot illustrates how bike rentals vary by the hour of the day. It reveals peak rental hours and the overall pattern of bike usage throughout the day.
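To put numbers on the peaks rather than reading them off the plot, the mean rentals per hour can be ranked. A sketch with a hypothetical helper `top_hours`, demonstrated on synthetic data (call `top_hours(df)` on the real frame):

```python
import pandas as pd

def top_hours(frame: pd.DataFrame, n: int = 3) -> list:
    """Return the n hours with the highest mean 'Rented Bike Count', busiest first."""
    hourly_mean = frame.groupby('Hour')['Rented Bike Count'].mean()
    return hourly_mean.nlargest(n).index.tolist()

# Synthetic example: hours 8 and 18 are made busiest on purpose
synthetic = pd.DataFrame({
    'Hour': [7, 8, 8, 17, 18, 18, 3],
    'Rented Bike Count': [400, 900, 1100, 600, 1500, 1300, 50],
})
print(top_hours(synthetic, n=2))
```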


##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 3

In [None]:
# Chart - 3 visualization code

plt.figure(figsize=(10, 6))
sns.scatterplot(data=df, x='Temperature(°C)', y='Rented Bike Count')
plt.title('Temperature vs. Bike Rentals')
plt.xlabel('Temperature(°C)')
plt.ylabel('Rented Bike Count')
plt.show()


##### 1. Why did you pick the specific chart?

A scatter plot is the natural choice for two continuous variables: it shows how rental counts vary across the whole temperature range.

##### 2. What is/are the insight(s) found from the chart?

The scatterplot examines the relationship between temperature and bike rentals. It helps us understand how temperature influences bike rental demand.
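With thousands of overlapping points, averaging rentals within temperature bands can make the trend easier to read. A sketch using `pd.cut` with a hypothetical helper `mean_rentals_by_temperature`; the 5 °C band width is an arbitrary choice, and the data below are synthetic:

```python
import numpy as np
import pandas as pd

def mean_rentals_by_temperature(frame: pd.DataFrame, width: float = 5.0) -> pd.Series:
    """Mean 'Rented Bike Count' per temperature band [a, a + width)."""
    temp = frame['Temperature(°C)']
    # Band edges on multiples of width that cover the full temperature range
    start = width * np.floor(temp.min() / width)
    stop = width * (np.floor(temp.max() / width) + 1)
    edges = np.arange(start, stop + width / 2, width)
    bands = pd.cut(temp, bins=edges, right=False)
    # observed=True drops bands that contain no rows
    return frame.groupby(bands, observed=True)['Rented Bike Count'].mean()

synthetic = pd.DataFrame({
    'Temperature(°C)': [-7.0, -2.0, 3.0, 4.0, 12.0, 24.0],
    'Rented Bike Count': [100, 150, 300, 340, 600, 1200],
})
print(mean_rentals_by_temperature(synthetic))
```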


##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 4

In [None]:
# Chart - 4 visualization code
plt.figure(figsize=(10, 6))
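# Plot months in calendar order instead of order of appearance in the data
month_order = ['January', 'February', 'March', 'April', 'May', 'June',
               'July', 'August', 'September', 'October', 'November', 'December']
sns.barplot(data=df, x='month', y='Rented Bike Count', order=month_order, errorbar=None)
plt.title('Average Hourly Bike Rentals by Month')
plt.xlabel('Month')
plt.ylabel('Mean Rented Bike Count')
plt.xticks(rotation=45)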
plt.show()

##### 1. Why did you pick the specific chart?

A bar chart of average rentals per month, in calendar order, shows whether demand follows a seasonal pattern.

##### 2. What is/are the insight(s) found from the chart?

The bar chart presents bike rentals by month, allowing us to identify any seasonal trends in bike usage.


##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 5

In [None]:
# Chart - 5 visualization code
plt.figure(figsize=(10, 6))
sns.boxplot(data=df, x='Seasons', y='Rented Bike Count')
plt.title('Bike Rentals by Seasons')
plt.xlabel('Seasons')
plt.ylabel('Rented Bike Count')
plt.show()


##### 1. Why did you pick the specific chart?

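A box plot shows the median, spread and outliers of rentals within each season, so seasonal differences in both typical demand and variability are visible.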

##### 2. What is/are the insight(s) found from the chart?

This box plot visualizes bike rentals across different seasons: Spring, Summer, Autumn, and Winter. It provides insights into how bike rental counts vary with changing seasons.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 6

In [None]:
# Chart - 6 visualization code
plt.figure(figsize=(10, 6))
sns.scatterplot(data=df, x='Humidity(%)', y='Rented Bike Count')
plt.title('Humidity vs. Bike Rentals')
plt.xlabel('Humidity(%)')
plt.ylabel('Rented Bike Count')
plt.show()


##### 1. Why did you pick the specific chart?

A scatter plot shows how rental counts vary across the full range of humidity levels.

##### 2. What is/are the insight(s) found from the chart?

The scatterplot explores how humidity levels are related to bike rental counts.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 7

In [None]:
# Chart - 7 visualization code
plt.figure(figsize=(10, 6))
sns.lineplot(data=df, x='year', y='Rented Bike Count', errorbar=None)  # errorbar replaces the deprecated ci argument (seaborn >= 0.12)
plt.title('Bike Rentals by Year')
plt.xlabel('Year')
plt.ylabel('Rented Bike Count')
plt.show()


##### 1. Why did you pick the specific chart?

A line plot of average rentals by year gives a first look at whether demand changed between 2017 and 2018.

##### 2. What is/are the insight(s) found from the chart?

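The line plot shows average hourly rentals for each year. The data run from December 2017 to November 2018, so 2017 contains only December (winter) while 2018 covers the other eleven months. The difference between the two points therefore mostly reflects season, not year-over-year growth.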
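A comparison by year is only like-for-like if each year covers the same months. A quick check, sketched with a hypothetical helper `months_per_year` on a synthetic frame shaped like df after the wrangling step (call `months_per_year(df)` on the real data):

```python
import pandas as pd

def months_per_year(frame: pd.DataFrame) -> dict:
    """Map each year to the number of distinct months it contains."""
    return frame.groupby('year')['month'].nunique().to_dict()

synthetic = pd.DataFrame({
    'year': [2017, 2017, 2018, 2018, 2018],
    'month': ['December', 'December', 'January', 'February', 'November'],
})
print(months_per_year(synthetic))
```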

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 8

In [None]:
# Chart - 8 visualization code
rentals_by_day = df.groupby('day_of_week')['Rented Bike Count'].sum()

# Define the order of days of the week for proper sorting
days_of_week_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']

# Create a bar chart
plt.figure(figsize=(10, 6))
rentals_by_day[days_of_week_order].plot(kind='bar', color='skyblue')
plt.title('Total Bike Rentals by Day of the Week')
plt.xlabel('Day of the Week')
plt.ylabel('Total Bike Rentals')

# Show the plot
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

A bar chart of total rentals for each day of the week, in calendar order, shows whether demand depends on the weekday.

##### 2. What is/are the insight(s) found from the chart?

The bar chart shows total bike rentals for each day of the week. Rentals are highest on Thursday and lowest on Sunday.
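Summed counts also depend on how often each weekday occurs in the period covered: a 365-day span contains one weekday 53 times and the rest 52 times. The mean per hour is therefore a fairer comparison. A sketch with a hypothetical helper `mean_by_weekday`, using a synthetic year (starting Friday 1 December 2017) in which every day has identical demand:

```python
import pandas as pd

DAYS = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']

def mean_by_weekday(frame: pd.DataFrame) -> pd.Series:
    """Mean 'Rented Bike Count' per day of week, in calendar order."""
    return frame.groupby('day_of_week')['Rented Bike Count'].mean().reindex(DAYS)

# 365 consecutive days starting on a Friday, each with the same 100 rentals
dates = pd.date_range('2017-12-01', periods=365, freq='D')
toy = pd.DataFrame({'day_of_week': dates.day_name(), 'Rented Bike Count': 100})

sums = toy.groupby('day_of_week')['Rented Bike Count'].sum().reindex(DAYS)
print(sums.tolist())                   # Friday is inflated only because it occurs 53 times
print(mean_by_weekday(toy).tolist())   # identical means, as the demand is identical
```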

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 9

In [None]:
# Chart - 9 visualization code
# Rainfall(mm) is recorded per hour, so this labels each hour (not each day) as rainy or non-rainy
df['Rainy'] = df['Rainfall(mm)'].apply(lambda x: 'Rainy' if x > 0 else 'Non-Rainy')

# Group the data by the 'Rainy' column and calculate the mean of 'Rented Bike Count'
rainy_vs_non_rainy = df.groupby('Rainy')['Rented Bike Count'].mean().reset_index()

# Create a bar chart
plt.figure(figsize=(8, 6))
plt.bar(rainy_vs_non_rainy['Rainy'], rainy_vs_non_rainy['Rented Bike Count'])
plt.title('Average Bike Rentals in Rainy vs. Non-Rainy Hours')
plt.xlabel('Hour Type')
plt.ylabel('Average Bike Rentals per Hour')
plt.show()

##### 1. Why did you pick the specific chart?

A bar chart of mean rentals for the two groups gives a direct comparison of demand with and without rain.

##### 2. What is/are the insight(s) found from the chart?

Average bike rentals are higher in non-rainy hours than in rainy hours.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 10 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code
corr_matrix = df.corr(numeric_only=True)  # string columns such as 'Seasons' and 'Holiday' are skipped

# Create a heatmap
plt.figure(figsize=(10, 6))
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', fmt=".2f", linewidths=0.5)
plt.title('Correlation Heatmap')
plt.show()

##### 1. Why did you pick the specific chart?

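A correlation heatmap summarizes the pairwise linear correlations among all numeric variables in one view, which makes it easy to see which features move with the target and which features are strongly related to each other.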

##### 2. What is/are the insight(s) found from the chart?

Rented Bike Count vs. Temperature(°C): Positive correlation of about 0.54. Rentals tend to rise as temperature increases, consistent with people preferring to ride in warmer weather.

Rented Bike Count vs. Hour: Positive correlation of about 0.41. Because Hour is cyclical, this linear coefficient mainly reflects that rentals are, on average, higher later in the day than in the early morning hours; it does not capture individual commuting peaks.

Rented Bike Count vs. Humidity(%): Weak negative correlation of about -0.20. Rentals tend to fall slightly as humidity rises.

Rented Bike Count vs. Wind speed (m/s): Weak positive correlation of about 0.12, so higher wind speeds coincide with slightly higher rentals.

Rented Bike Count vs. Dew point temperature(°C): Positive correlation of about 0.38. Rentals tend to increase with dew point temperature.

Rented Bike Count vs. Solar Radiation (MJ/m2): Positive correlation of about 0.26. Sunnier hours tend to have more rentals.

Rented Bike Count vs. Rainfall(mm) and Snowfall (cm): Both correlations are negative but weak. Rain and snow tend to reduce rentals, but the linear relationship is not strong.

Overall, temperature, hour and dew point temperature show the strongest linear association with rentals.
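The ranking above can be read straight from the data by taking the target's column of the correlation matrix. A sketch with a hypothetical helper `target_correlations`, demonstrated on synthetic data (call `target_correlations(df)` on the real frame):

```python
import numpy as np
import pandas as pd

def target_correlations(frame: pd.DataFrame, target: str = 'Rented Bike Count') -> pd.Series:
    """Pearson correlation of every numeric column with the target, sorted high to low."""
    corr = frame.corr(numeric_only=True)[target].drop(target)
    return corr.sort_values(ascending=False)

rng = np.random.default_rng(0)
x = rng.normal(size=200)
synthetic = pd.DataFrame({
    'Temperature(°C)': x,
    'Humidity(%)': -x + rng.normal(scale=2.0, size=200),
    'Rented Bike Count': 2 * x + rng.normal(scale=0.5, size=200),
    'Seasons': ['Winter'] * 200,  # non-numeric column, skipped by numeric_only=True
})
print(target_correlations(synthetic))
```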

#### Chart - 11 - Pair Plot

In [None]:
# Pair Plot visualization code
sns.pairplot(df)

##### 1. Why did you pick the specific chart?

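A pair plot shows the pairwise scatter plots and the distribution of every numeric variable in a single grid, which helps reveal nonlinear relationships and skewed distributions that a correlation coefficient can miss.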

##### 2. What is/are the insight(s) found from the chart?

Answer Here

## ***5. Hypothesis Testing***

### Based on your chart experiments, define three hypothetical statements from the dataset. In the next three questions, perform hypothesis testing to obtain final conclusion about the statements through your code and statistical testing.



1.  The average number of rented bikes during holidays is significantly different from the average number of rented bikes on non-holidays.
2.   There is a significant difference in the average number of rented bikes on weekdays compared to weekends.
3.  The temperature (in °C) has a significant impact on the number of rented bikes.



### Hypothetical Statement - 1

#### 1. State Your research hypothesis as a null hypothesis and alternate hypothesis.

Null Hypothesis (H0): The average number of rented bikes during holidays is equal to the average number of rented bikes on non-holidays. (avg_holiday = avg_non-holiday)

Alternative Hypothesis (H1): The average number of rented bikes during holidays is not equal to the average number of rented bikes on non-holidays. (avg_holiday ≠ avg_non-holiday)


#### 2. Perform an appropriate statistical test.

In [None]:
# Perform Statistical Test to obtain P-Value
import scipy.stats as stats

# Separating the data into two groups: rented bike counts during holidays and non-holidays
bike_counts_holiday = df[df['Holiday'] == 'Holiday']['Rented Bike Count']
bike_counts_non_holiday = df[df['Holiday'] == 'No Holiday']['Rented Bike Count']

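# Perform Welch's two-sample t-test (does not assume equal variances)
t_stat, p_value = stats.ttest_ind(bike_counts_holiday, bike_counts_non_holiday, equal_var=False)
print(f"t-statistic = {t_stat:.3f}, p-value = {p_value:.3g}")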

# Set the significance level (alpha)
alpha = 0.05

# Print the results
if p_value < alpha:
    print("Statement 1: Reject the null hypothesis")
    print("Conclusion: The average number of rented bikes during holidays is significantly different from non-holidays.")
else:
    print("Statement 1: Fail to reject the null hypothesis")
    print("Conclusion: There is no significant difference in the average number of rented bikes during holidays and non-holidays.")


##### Which statistical test have you done to obtain P-Value?

Welch's two-sample t-test (an independent-samples t-test that does not assume equal variances)

##### Why did you choose the specific statistical test?

In this case, I am comparing the means of two independent groups (hourly bike counts on holidays and on non-holidays).

A t-test is appropriate for deciding whether the difference between two group means is statistically significant.

We pass equal_var=False to run Welch's version of the test, which does not assume that the two groups have equal variances. This is the safer choice here because the holiday group is much smaller than the non-holiday group.
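Welch's t statistic divides the difference in sample means by a standard error built from each group's own variance, and its degrees of freedom come from the Welch-Satterthwaite approximation. A sketch that computes both by hand (hypothetical helper `welch_t_test`) and checks them against `scipy.stats.ttest_ind(..., equal_var=False)` on synthetic samples:

```python
import numpy as np
from scipy import stats

def welch_t_test(a, b):
    """Two-sided Welch's t-test; returns (t statistic, degrees of freedom, p-value)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    # Squared standard error of each sample mean, using the sample variance (ddof=1)
    se2_a = a.var(ddof=1) / a.size
    se2_b = b.var(ddof=1) / b.size
    t = (a.mean() - b.mean()) / np.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite degrees of freedom
    dof = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (a.size - 1) + se2_b ** 2 / (b.size - 1))
    p = 2 * stats.t.sf(abs(t), dof)
    return t, dof, p

rng = np.random.default_rng(42)
group_a = rng.normal(500, 300, size=400)   # synthetic "holiday-like" sample
group_b = rng.normal(700, 600, size=900)   # synthetic "non-holiday-like" sample

t, dof, p = welch_t_test(group_a, group_b)
t_scipy, p_scipy = stats.ttest_ind(group_a, group_b, equal_var=False)
print(t, dof, p)
print(t_scipy, p_scipy)
```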

### Hypothetical Statement - 2

#### 1. State Your research hypothesis as a null hypothesis and alternate hypothesis.

Null Hypothesis (H0): There is no significant difference in the average number of rented bikes on weekdays compared to weekends. (avg_weekdays = avg_weekends)

Alternative Hypothesis (H1): There is a significant difference in the average number of rented bikes on weekdays compared to weekends. (avg_weekdays ≠ avg_weekends)

#### 2. Perform an appropriate statistical test.

In [None]:
# Perform Statistical Test to obtain P-Value
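# Separate the data into two groups: rented bike counts on weekdays (Mon-Fri) and weekends (Sat-Sun)
# 'day_of_week' was extracted from Date during data wrangling
is_weekend = df['day_of_week'].isin(['Saturday', 'Sunday'])
bike_counts_weekdays = df.loc[~is_weekend, 'Rented Bike Count']
bike_counts_weekends = df.loc[is_weekend, 'Rented Bike Count']

# Perform Welch's two-sample t-test (does not assume equal variances)
t_stat, p_value = stats.ttest_ind(bike_counts_weekdays, bike_counts_weekends, equal_var=False)
print(f"t-statistic = {t_stat:.3f}, p-value = {p_value:.3g}")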

# Set the significance level (alpha)
alpha = 0.05

# Print the results
if p_value < alpha:
    print("Statement 2: Reject the null hypothesis")
    print("Conclusion: There is a significant difference in the average number of rented bikes on weekdays and weekends.")
else:
    print("Statement 2: Fail to reject the null hypothesis")
    print("Conclusion: There is no significant difference in the average number of rented bikes on weekdays and weekends.")


##### Which statistical test have you done to obtain P-Value?

Welch's two-sample t-test (an independent-samples t-test that does not assume equal variances)

##### Why did you choose the specific statistical test?

Similar to Statement 1, we are comparing the means of two independent groups (bike counts on weekdays and weekends).

Again, the t-test is suitable for this comparison when we want to test if there is a significant difference in means between the groups.

### Hypothetical Statement - 3

#### 1. State Your research hypothesis as a null hypothesis and alternate hypothesis.

Null Hypothesis (H0): There is no linear correlation between temperature (°C) and the number of rented bikes. (ρ = 0)

Alternative Hypothesis (H1): There is a linear correlation between temperature (°C) and the number of rented bikes. (ρ ≠ 0)

#### 2. Perform an appropriate statistical test.

In [None]:
# Perform Statistical Test to obtain P-Value
# Pearson correlation test between temperature and rented bike counts
correlation_coefficient, p_value = stats.pearsonr(df['Temperature(°C)'], df['Rented Bike Count'])
print(f"Pearson r = {correlation_coefficient:.3f}, p-value = {p_value:.3g}")

# Set the significance level (alpha)
alpha = 0.05

# Print the results
if p_value < alpha:
    print("Statement 3: Reject the null hypothesis")
    print("Conclusion: Temperature is significantly correlated with the number of rented bikes.")
else:
    print("Statement 3: Fail to reject the null hypothesis")
    print("Conclusion: There is no significant correlation between temperature and the number of rented bikes.")


##### Which statistical test have you done to obtain P-Value?

Pearson correlation test (scipy.stats.pearsonr)

##### Why did you choose the specific statistical test?

Statement 3 involves examining the relationship between two continuous variables (temperature and rented bike counts).

The Pearson Correlation Coefficient is used to measure the strength and direction of a linear relationship between two continuous variables.

A correlation test is appropriate for this scenario to determine if there is a significant correlation between temperature and bike rentals.
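The p-value from `pearsonr` can be reproduced from r itself: under H0 (ρ = 0, with approximately normal data), t = r·√(n − 2) / √(1 − r²) follows a t distribution with n − 2 degrees of freedom. A sketch with a hypothetical helper `pearson_p_value`, checked against `scipy.stats.pearsonr` on synthetic temperature and rental values:

```python
import numpy as np
from scipy import stats

def pearson_p_value(x, y):
    """Correlation r and its two-sided p-value via the t statistic with n - 2 degrees of freedom."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n = x.size
    r = np.corrcoef(x, y)[0, 1]
    t = r * np.sqrt((n - 2) / (1 - r ** 2))
    p = 2 * stats.t.sf(abs(t), n - 2)
    return r, p

rng = np.random.default_rng(7)
temperature = rng.normal(15, 10, size=100)                   # synthetic temperatures
rentals = 20 * temperature + rng.normal(0, 400, size=100)    # synthetic rentals with noise

r, p = pearson_p_value(temperature, rentals)
r_scipy, p_scipy = stats.pearsonr(temperature, rentals)
print(r, p)
print(r_scipy, p_scipy)
```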

## ***6. Feature Engineering & Data Pre-processing***

### 1. Handling Missing Values

In [None]:
# Handling Missing Values & Missing Value Imputation
# Check for missing values
missing_values = df.isnull().sum()
print(missing_values)

No missing values

#### What all missing value imputation techniques have you used and why did you use those techniques?

No imputation was needed: the missing-value check shows no nulls in any column.

### 2. Handling Outliers

In [None]:
# Handling Outliers & Outlier treatments

# Define a Z-score threshold: values more than 3 standard deviations from the mean are outliers
z_score_threshold = 3
numeric_cols = df.select_dtypes(include=['number']).columns

# Absolute Z-scores for every numeric column (population standard deviation, as in stats.zscore)
z_scores = ((df[numeric_cols] - df[numeric_cols].mean()) / df[numeric_cols].std(ddof=0)).abs()

# Columns that contain at least one outlier
columns_with_outliers = [col for col in numeric_cols if (z_scores[col] > z_score_threshold).any()]
print("Columns with Outliers:", columns_with_outliers)

# Keep only rows in which every numeric value is within the threshold, so all columns stay aligned
filtered_data = df[(z_scores <= z_score_threshold).all(axis=1)]
print(f"Rows before: {len(df)}, rows after outlier removal: {len(filtered_data)}")

# Visualize the original data and filtered data using box plots
fig, axes = plt.subplots(1, 2, figsize=(12, 4))
df[columns_with_outliers].boxplot(ax=axes[0], rot=90)
axes[0].set_title('Original Data with Outliers')
filtered_data[columns_with_outliers].boxplot(ax=axes[1], rot=90)
axes[1].set_title('Data after Outlier Removal')
plt.tight_layout()
plt.show()


### 3. Categorical Encoding

In [None]:
# Encode your categorical columns

#### What all categorical encoding techniques have you used & why did you use those techniques?

Answer Here.

### 4. Textual Data Preprocessing
(Mandatory for textual datasets, i.e., NLP, Sentiment Analysis, Text Clustering etc.)

#### 1. Expand Contraction

In [None]:
# Expand Contraction

#### 2. Lower Casing

In [None]:
# Lower Casing

#### 3. Removing Punctuations

In [None]:
# Remove Punctuations

#### 4. Removing URLs & Removing Words Containing Digits

In [None]:
# Remove URLs & remove words containing digits

#### 5. Removing Stopwords & Removing White spaces

In [None]:
# Remove Stopwords

In [None]:
# Remove White spaces

#### 6. Rephrase Text

In [None]:
# Rephrase Text

#### 7. Tokenization

In [None]:
# Tokenization

#### 8. Text Normalization

In [None]:
# Normalizing Text (i.e., Stemming, Lemmatization etc.)

##### Which text normalization technique have you used and why?

Answer Here.

#### 9. Part of speech tagging

In [None]:
# POS Tagging

#### 10. Text Vectorization

In [None]:
# Vectorizing Text

##### Which text vectorization technique have you used and why?

Answer Here.

### 4. Feature Manipulation & Selection

#### 1. Feature Manipulation

In [None]:
# Manipulate Features to minimize feature correlation and create new features

#### 2. Feature Selection

In [None]:
# Select your features wisely to avoid overfitting

##### What all feature selection methods have you used  and why?

Answer Here.

##### Which all features you found important and why?

Answer Here.

### 5. Data Transformation

#### Do you think that your data needs to be transformed? If yes, which transformation have you used? Explain why.

In [None]:
# Transform Your data

### 6. Data Scaling

In [None]:
# Scaling your data

##### Which method have you used to scale your data and why?

### 7. Dimensionality Reduction

##### Do you think that dimensionality reduction is needed? Explain Why?

Answer Here.

In [None]:
# Dimensionality Reduction (If needed)

##### Which dimensionality reduction technique have you used and why? (If dimensionality reduction done on dataset.)

Answer Here.

### 8. Data Splitting

In [None]:
# Split your data to train and test. Choose Splitting ratio wisely.

##### What data splitting ratio have you used and why?

Answer Here.

### 9. Handling Imbalanced Dataset

##### Do you think the dataset is imbalanced? Explain Why.

Answer Here.

In [None]:
# Handling Imbalanced Dataset (If needed)

##### What technique did you use to handle the imbalanced dataset and why? (If needed to be balanced)

Answer Here.

## ***7. ML Model Implementation***

### ML Model - 1

In [None]:
# ML Model - 1 Implementation

# Fit the Algorithm

# Predict on the model

#### 1. Explain the ML Model used and its performance using an Evaluation metric Score Chart.

In [None]:
# Visualizing evaluation Metric Score chart

#### 2. Cross- Validation & Hyperparameter Tuning

In [None]:
# ML Model - 1 Implementation with hyperparameter optimization techniques (i.e., GridSearch CV, RandomSearch CV, Bayesian Optimization etc.)

# Fit the Algorithm

# Predict on the model

##### Which hyperparameter optimization technique have you used and why?

Answer Here.

##### Have you seen any improvement? Note down the improvement with an updated Evaluation metric Score Chart.

Answer Here.

### ML Model - 2

#### 1. Explain the ML Model used and its performance using an Evaluation metric Score Chart.

In [None]:
# Visualizing evaluation Metric Score chart

#### 2. Cross- Validation & Hyperparameter Tuning

In [None]:
# ML Model - 2 Implementation with hyperparameter optimization techniques (i.e., GridSearch CV, RandomSearch CV, Bayesian Optimization etc.)

# Fit the Algorithm

# Predict on the model

##### Which hyperparameter optimization technique have you used and why?

Answer Here.

##### Have you seen any improvement? Note down the improvement with an updated Evaluation metric Score Chart.

Answer Here.

#### 3. Explain each evaluation metric's indication towards business and the business impact of the ML model used.

Answer Here.

### ML Model - 3

In [None]:
# ML Model - 3 Implementation

# Fit the Algorithm

# Predict on the model

#### 1. Explain the ML Model used and its performance using an Evaluation metric Score Chart.

In [None]:
# Visualizing evaluation Metric Score chart

#### 2. Cross- Validation & Hyperparameter Tuning

In [None]:
# ML Model - 3 Implementation with hyperparameter optimization techniques (i.e., GridSearch CV, RandomSearch CV, Bayesian Optimization etc.)

# Fit the Algorithm

# Predict on the model

##### Which hyperparameter optimization technique have you used and why?

Answer Here.

##### Have you seen any improvement? Note down the improvement with an updated Evaluation metric Score Chart.

Answer Here.

### 1. Which Evaluation metrics did you consider for a positive business impact and why?

Answer Here.

### 2. Which ML model did you choose from the above created models as your final prediction model and why?

Answer Here.

### 3. Explain the model which you have used and the feature importance using any model explainability tool?

Answer Here.

## ***8.*** ***Future Work (Optional)***

### 1. Save the best performing ML model in a pickle file or joblib file format for deployment process.


In [None]:
# Save the File

### 2. Again Load the saved model file and try to predict unseen data for a sanity check.


In [None]:
# Load the File and predict unseen data.

### ***Congrats! Your model is successfully created and ready for deployment on a live server for a real user interaction !!!***

# **Conclusion**

Write the conclusion here.

### ***Hurrah! You have successfully completed your Machine Learning Capstone Project !!!***