<a href="https://colab.research.google.com/github/jay-D-Deshmukh/Mobile-Price-Range-Prediclassification-proction-/blob/main/Mobile_Price_pre.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    - Mobile Price Range Prediction



##### **Project Type**    - Classification
##### **Contribution**    - Individual
##### **Name  -**   Jay Deshmukh


# **Project Summary -**

Introduction:
Mobile price prediction is a significant application of machine learning that aims to predict the price range of mobile devices based on various features. Classification algorithms play a vital role in this domain, as they help categorize mobile phones into distinct price brackets. This summary explores the concept of mobile price prediction using classification machine learning techniques, outlining the process, benefits, challenges, and real-world applications.

Conclusion:
Mobile price prediction through classification machine learning empowers consumers and businesses with valuable insights into the mobile phone market. By analyzing features and applying classification algorithms, accurate price range predictions can be achieved. This technology finds applications in consumer decision-making, marketing, market analysis, and competitive pricing. Addressing challenges such as data quality and model interpretability ensures that mobile price prediction remains a valuable tool in the ever-evolving mobile industry.



# **GitHub Link -**

[https://github.com/jay-D-Deshmukh/Mobile-Price-Range-Prediclassification-proction-](https://github.com/jay-D-Deshmukh/Mobile-Price-Range-Prediclassification-proction-)

# **Problem Statement**


 Mobile Price Prediction using Classification Machine Learning

In the dynamic and competitive landscape of the mobile phone industry, accurately predicting the price range of mobile devices based on their features is of paramount importance. Consumers seek smartphones that align with their budget constraints while meeting their performance and feature requirements. Manufacturers and retailers, on the other hand, need insights into market trends and customer preferences to make informed decisions about pricing strategies and product offerings. To address these challenges, the problem at hand is to develop an effective classification machine learning model that can predict the price range of mobile phones based on their technical specifications.
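
The problem statement above can be sketched end to end on made-up data. This is only a minimal illustration, not the notebook's final model: the `demo` frame is synthetic, the assumption that the label follows RAM quartiles is invented for the example, and the choice of `RandomForestClassifier` is just one of the imported candidates.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: two made-up specifications and a 4-class label.
rng = np.random.default_rng(0)
n = 400
demo = pd.DataFrame({
    "ram": rng.integers(256, 4000, size=n),
    "battery_power": rng.integers(500, 2000, size=n),
})
# Assumption for illustration only: the label is driven by RAM quartiles.
demo["price_range"] = pd.qcut(demo["ram"], q=4, labels=False)

X = demo.drop(columns="price_range")
y = demo["price_range"]

# Stratify so each price bracket keeps the same share in train and test.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Hold-out accuracy on the synthetic data: {accuracy:.3f}")
```

With the real dataset, `X` and `y` would come from `df` instead of the synthetic frame.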

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure that each chart follows the format below.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 15 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





6. You may add more ml algorithms for model creation. Make sure for each and every algorithm, the following format should be answered.


*   Explain the ML Model used and its performance using Evaluation metric Score Chart.


*   Cross-Validation & Hyperparameter Tuning

*   Have you seen any improvement? Note down the improvement with updated Evaluation metric Score Chart.

*   Explain each evaluation metric's indication towards business and the business impact of the ML model used.




















# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import pandas as pd
import numpy as np
import warnings
warnings.filterwarnings('ignore')
from scipy.stats import skew
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
from sklearn.model_selection import RandomizedSearchCV, GridSearchCV
# import lime
# import lime.lime_tabular
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import (
    BaggingClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
    VotingClassifier,
)
from sklearn.svm import SVC
import xgboost


### Dataset Loading

In [None]:
# Mount Google Drive (works only inside Google Colab)
from google.colab import drive
drive.mount('/content/drive')

In [None]:
# Load Dataset
df=pd.read_csv("/content/drive/MyDrive/Data sets/data_mobile_price_range.csv")

### Dataset First View

In [None]:
# Dataset First Look
df.head(10)

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
print("Number of rows:", df.shape[0])
print("-"*30)
print("Number of columns:", df.shape[1])

### Dataset Information

In [None]:
# Dataset Info
df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
df.duplicated().sum()

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
df.isnull().sum()

In [None]:
# Visualizing the missing values
plt.figure(figsize=(8, 6))
sns.heatmap(df.isnull(), cmap='viridis', cbar=False)
plt.title('Missing Values Heatmap')
plt.show()


### What did you know about your dataset?

1) The dataset is clean: there are no missing values.
2) The dataset has 2000 entries (index 0 to 1999) and 21 columns.


## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
for i in df.columns:
  print(i)

In [None]:
# Dataset Describe
df.describe(include="all")

### Variables Description



1.  Battery_power - Total energy a battery can store at one time, measured in mAh

2.  Blue - Has Bluetooth or not

3.  Clock_speed - Speed at which the microprocessor executes instructions

4.  Dual_sim - Has dual SIM support or not

5.  Fc - Front camera megapixels

6.  Four_g - Has 4G or not

7.  Int_memory - Internal memory in gigabytes

8.  M_dep - Mobile depth in cm

9.  Mobile_wt - Weight of the mobile phone

10.  N_cores - Number of processor cores

11.  Pc - Primary camera megapixels

12.  Px_height - Pixel resolution height

13.  Px_width - Pixel resolution width

14.  Ram - Random access memory in megabytes

15.  Touch_screen - Has touch screen or not

16.  Wifi - Has Wi-Fi or not

17.  Sc_h - Screen height of the mobile in cm

18.  Sc_w - Screen width of the mobile in cm

19.  Talk_time - Longest time that a single battery charge will last (talk time)

20.  Three_g - Has 3G or not

21.  Price_range - Target variable with values 0 (low cost), 1 (medium cost), 2 (high cost) and 3 (very high cost).











### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
for column in df.columns:
    unique_values = df[column].unique()
    print(f"Unique values in {column}: {unique_values}")
    print("--"*70)

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.
for column in df.columns:
    plt.figure(figsize=(5, 1))
    sns.boxplot(x=column, data=df)
    plt.title(f'Box Plot for {column}')
    plt.show()

In [None]:
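fig, axs = plt.subplots(1, 2, figsize=(15, 4))
sns.kdeplot(df['px_height'], ax=axs[0], color='red')
sns.kdeplot(df['fc'], ax=axs[1], color='red')
# Title each panel; plt.title alone would label only the last axes
axs[0].set_title('KDE Plot of px_height')
axs[1].set_title('KDE Plot of fc')
plt.show()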

In [None]:
print("Skewness coefficient of 'px_height' is :",skew(df['px_height']))
print("Skewness coefficient of 'fc' is :",skew(df['fc']))

### What all manipulations have you done and insights you found?

The dataset has no missing values. Among the features, 'px_height' and 'fc' stand out with right-skewed distributions.
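
To make the skewness coefficient concrete, here is a small sketch on a made-up right-skewed sample. The exponential sample and the square-root transform are assumptions chosen for illustration; nothing here is computed from the project's data.

```python
import numpy as np
from scipy.stats import skew

# Hypothetical right-skewed sample; not taken from the project's dataset.
rng = np.random.default_rng(0)
sample = rng.exponential(scale=500.0, size=2000)

raw_skew = skew(sample)
# A square-root transform compresses large values and pulls in the long right tail.
sqrt_skew = skew(np.sqrt(sample))

print(f"Skewness before transform: {raw_skew:.2f}")
print(f"Skewness after sqrt transform: {sqrt_skew:.2f}")
```

A positive coefficient indicates a longer right tail, and a value closer to zero after the transform indicates a more symmetric distribution.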

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1 Count plot (Univariate)

In [None]:
# Chart - 1 visualization code

#classes
sns.set()
price_plot=df['price_range'].value_counts().plot(kind='bar')
plt.xlabel('price_range')
plt.ylabel('Count')
plt.show()

##### 1. Why did you pick the specific chart?


A count plot is a type of visualization used to show the count of observations in each category of a categorical variable. It's particularly useful for understanding the distribution of categorical data and identifying the most common categories.

##### 2. What is/are the insight(s) found from the chart?

The dependent variable has four categories (0, 1, 2 and 3), each representing a price range, and the classes are balanced.
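
Class balance can also be checked numerically rather than by eye. The sketch below uses a made-up `labels` Series as a stand-in for `df['price_range']`; the 500-per-class counts are an assumption for illustration.

```python
import pandas as pd

# Hypothetical labels standing in for df['price_range']; 500 phones per class
# is assumed here only to illustrate a balanced target.
labels = pd.Series([0, 1, 2, 3] * 500, name="price_range")

# Share of each class; a balanced 4-class target has about 0.25 per class.
class_share = labels.value_counts(normalize=True).sort_index()
# Ratio of the largest to the smallest class; values near 1 indicate balance.
imbalance_ratio = class_share.max() / class_share.min()

print(class_share)
print(f"Imbalance ratio: {imbalance_ratio:.2f}")
```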


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

This chart only shows the distribution of the target variable.

#### Chart - 2 Displot (Univariate)

In [None]:
sns.set(rc={'figure.figsize':(5,5)})
ax=sns.displot(df["battery_power"])
plt.show()

##### 1. Why did you pick the specific chart?

A displot (distribution plot) depicts the variation in the data distribution.

##### 2. What is/are the insight(s) found from the chart?

battery_power takes values in the range of about 500 to 2000, and the highest count (about 175) falls between 500 and 750.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Phones with battery_power up to about 750 are the most common, so this segment may sell more.

#### Chart - 3 Barplot (Bivariate)

In [None]:
# Chart - 3 visualization code

# Analysis of data by visualization
fig, ax = plt.subplots(figsize=(10, 5))
sns.barplot(x=df['blue'], y=df['price_range'], ax=ax)
plt.show()


##### 1. Why did you pick the specific chart?

A bar plot (or bar chart) is a common type of data visualization that uses rectangular bars to represent data values. Bar plots are useful for comparing the magnitudes of different categories or groups.

##### 2. What is/are the insight(s) found from the chart?

Half the devices have Bluetooth, and half do not.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

This is expected to have a positive impact on growth.

#### Chart - 4 KDE plot and Box plot

In [None]:
# Chart - 4 visualization code
fig, axs = plt.subplots(1,2, figsize=(15,5))
sns.kdeplot(data=df, x='px_width', hue='price_range', ax=axs[0])
sns.boxplot(data=df, x='price_range', y='px_width', ax=axs[1])
plt.show()


##### 1. Why did you pick the specific chart?

A Kernel Density Estimation (KDE) plot is a type of data visualization that represents the probability density function of a continuous random variable. It provides a smoothed representation of the data's distribution, allowing you to understand the underlying pattern without making assumptions about the specific distribution.

##### 2. What is/are the insight(s) found from the chart?

Pixel width does not increase continuously as we move from low cost to very high cost. Mobiles in the 'Medium cost' and 'High cost' ranges have almost equal pixel width, so we can say that it would be a driving factor in deciding price_range.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Here we can see that there is no change in price distribution

#### Chart - 5 Count plot and Pie Chart

In [None]:
# Chart - 5 visualization code
# Plot of binary features against price range
binary_features = [ 'four_g', 'three_g']


for col in binary_features:
  fig, (ax1, ax2) = plt.subplots(ncols = 2, figsize = (12, 6))

  df[col].value_counts().plot.pie (autopct='%1.1f%%', ax = ax1, shadow=True, labeldistance=None)
  ax1.set_title(f'Share of phones by {col} support')
  ax1.legend(['Support', 'Does not Support'])
  sns.countplot(x = col, hue = 'price_range', data = df, ax = ax2, color = 'red')
  ax2.set_title('Distribution by price range')
  ax2.set_xlabel(col)
  ax2.legend(['Low Cost', 'Medium Cost', 'High Cost', 'Very High Cost'])
  ax2.set_xticklabels(['Does not Support', 'Support'])

##### 1. Why did you pick the specific chart?

A pie chart is a circular data visualization used to represent the proportions of different categories or parts of a whole. It visually displays the relative sizes of different data categories as slices of a circular pie.

##### 2. What is/are the insight(s) found from the chart?

52.1% of the mobiles support 4G and 47.9% do not; 76.2% support 3G and 23.8% do not. Producing mobiles with 4G and 3G support may help increase sales.


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

This suggests that 3G and 4G mobiles should be produced in larger numbers.

#### Chart - 6 Count plot

In [None]:
# Chart - 6 visualization code
count_df = df[['clock_speed', 'fc','m_dep','n_cores','pc','sc_h', 'sc_w', 'talk_time']]
plt.figure(figsize=(25,20))
j = 1
for i in count_df.columns:
  plt.subplot(5,3,j)
  sns.countplot(x=df[i])

  j=j+1

##### 1. Why did you pick the specific chart?

It plots the counts of all the categorical features.

##### 2. What is/are the insight(s) found from the chart?

Here we plot the count of each categorical feature and find that a clock_speed of 0.5 has the highest count.


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The count plot shows how many phones fall under each value of a feature.

#### Chart - 7 Scatter Plot

In [None]:
# Chart - 7 visualization code
# Derive screen area from screen width and height
sc_area = df['sc_w'] * df['sc_h']
plt.scatter(x=sc_area, y=df['price_range'])
plt.xlabel("Screen Area")
plt.ylabel("Price Range")
plt.show()


##### 1. Why did you pick the specific chart?

A scatter plot is a type of data visualization that is used to display the relationship between two continuous variables. It is particularly useful for exploring and understanding the correlation or pattern between these variables. Scatter plots consist of points, each representing an individual data point with its respective values on the x and y axes

##### 2. What is/are the insight(s) found from the chart?

Here we create a new feature, screen area, by multiplying the screen height and the screen width.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Combining the two features reduces the dimensionality of the dataset and can help improve performance.
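
A minimal sketch of this feature combination follows. The `phones` frame holds made-up rows that reuse the dataset's column names `sc_h` and `sc_w`; the name `sc_area` for the new column is an assumption.

```python
import pandas as pd

# Hypothetical rows using the dataset's column names sc_h and sc_w (both in cm).
phones = pd.DataFrame({
    "sc_h": [9, 17, 11, 16],
    "sc_w": [7, 3, 2, 8],
    "price_range": [1, 2, 2, 2],
})

# Combine the two screen dimensions into one feature (area in square cm).
phones["sc_area"] = phones["sc_h"] * phones["sc_w"]
# Drop the originals so the feature count goes down by one.
reduced = phones.drop(columns=["sc_h", "sc_w"])

print(reduced)
```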

#### Chart - 8 Pie Chart

In [None]:
category_counts = df['n_cores'].value_counts()

# Create a pie chart
plt.figure(figsize=(10, 6))  # Set the figure size

# Colors for the pie chart segments (cycled if there are more segments than colors)
colors = ['red', 'green', 'orange', 'blue']

plt.pie(category_counts, labels=category_counts.index, colors=colors, autopct='%1.1f%%', startangle=140)

# Equal aspect ratio ensures that pie is drawn as a circle
plt.axis('equal')

# Add a title
plt.title('Distribution of Cores Feature')
# Label the legend with the core counts shown in the pie (not clock speeds)
plt.legend(category_counts.index, title='n_cores')
# Display the pie chart
plt.show()

##### 1. Why did you pick the specific chart?


Pie charts are a type of data visualization that display the distribution of categorical data as a circular graph. Each "slice" of the pie represents a different category, and the size of the slice corresponds to the proportion of the data that falls into that category.

##### 2. What is/are the insight(s) found from the chart?

Each core count has an approximately equal share of the dataset.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The number of cores does not appear to affect the price range.

#### Chart - 9

In [None]:
# Chart - 9 visualization code
category_counts = df['clock_speed'].value_counts()

plt.figure(figsize=(10, 6))
colors = [
    '#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd', '#8c564b',
    '#e377c2', '#7f7f7f', '#bcbd22', '#17becf', '#636efa', '#00cc96',
    '#19d3f3', '#f74e8e', '#ffc107', '#6d904f', '#ef553b', '#8c92ac',
    '#c5b0d5', '#c49c94', '#fabc09', '#ffcb7f', '#7f7f7f', '#e7969c',
    '#ce6dbd', '#9cdede'
]
plt.pie(category_counts, labels=category_counts.index, colors=colors, autopct='%1.1f%%')
plt.axis('equal')
plt.title('Distribution of Clock speed Feature')
# Legend entries follow the same order as the pie slices
plt.legend(category_counts.index, title='clock_speed')
plt.show()




##### 1. Why did you pick the specific chart?

It shows the percentage share of each clock speed.

##### 2. What is/are the insight(s) found from the chart?

A clock speed of 0.5 accounts for the largest share of the phones.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

There are 26 different clock speeds.

#### Chart - 10

In [None]:
# Chart - 10 visualization code
grouped = df.groupby('clock_speed')['ram'].mean()
grouped1 = df.groupby('price_range')['ram'].mean()
fig, axs = plt.subplots(1, 2, figsize=(15, 5))
sns.lineplot(grouped, ax=axs[0], marker='o', color='b')
# Title each panel on its own axes; plt.title would label only the last one
axs[0].set_title("Mean RAM by clock_speed")
sns.lineplot(grouped1, ax=axs[1])
axs[1].set_title("Mean RAM by price_range")
plt.show()

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 11

In [None]:
# Chart - 11 visualization code
grouped = df.groupby('price_range')['talk_time'].mean()


sns.lineplot(grouped, marker='o',color='b')
plt.title("Line Plot")

plt.show()


##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 12

In [None]:
# Chart - 12 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 13

In [None]:
# Chart - 13 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 14 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code
correlation_matrix = df.corr()
plt.figure(figsize=(20, 6))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', center=0)
plt.title('Correlation Heatmap')
plt.show()

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

#### Chart - 15 - Pair Plot

In [None]:
# Pair Plot visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

## ***5. Hypothesis Testing***

### Based on your chart experiments, define three hypothetical statements from the dataset. In the next three questions, perform hypothesis testing to obtain final conclusion about the statements through your code and statistical testing.

### Hypothetical Statement - 1

#### 1. State Your research hypothesis as a null hypothesis and alternate hypothesis.

Based on the observations above, "three_g" support will affect the price range.


1.  **Null hypothesis (H0):** The mean price range is the same for phones with and without 3G support.
2.  **Alternative hypothesis (Ha):** The mean price range differs between phones with and without 3G support.



#### 2. Perform an appropriate statistical test.

In [None]:
# Perform Statistical Test to obtain P-Value
from scipy.stats import f_oneway

# Creating separate Series of price_range for phones without and with 3G.

no_3g_prices = df[df["three_g"] == 0]["price_range"]
has_3g_prices = df[df["three_g"] != 0]["price_range"]

# Performing the One-Way Anova Test:

f_value, p_value = f_oneway(no_3g_prices, has_3g_prices)

# defining significance level(alpha = 0.05):

alpha = 0.05

print(f'f_value is {f_value}')
print(f'p_value is {p_value}')

if p_value <= alpha:
  print("We reject the Null Hypothesis")
else:
  print("We fail to reject the Null Hypothesis")


##### Which statistical test have you done to obtain P-Value?

Test applied: One-Way ANOVA.

The p-value is 0.29123661631664455 and the f_value is 1.1144854599995042. Since the p-value is greater than alpha (0.05), we fail to reject the null hypothesis.

##### Why did you choose the specific statistical test?

One-way ANOVA compares the mean of an outcome across groups, here the mean price_range of phones with and without 3G support.
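
Because three_g and price_range are both categorical, a chi-square test of independence is a commonly used alternative for this kind of question. The sketch below is only an illustration: the contingency table is made up, and the name `observed` is an assumption rather than something from this notebook.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical counts of phones by 3G support (rows) and price range (columns);
# these numbers are made up for illustration and are not the project's data.
observed = pd.DataFrame(
    [[120, 115, 125, 117],
     [380, 385, 375, 383]],
    index=["no_3g", "has_3g"],
    columns=[0, 1, 2, 3],
)

# Chi-square test of independence between two categorical variables.
chi2, p_value, dof, expected = chi2_contingency(observed)

alpha = 0.05
print(f"chi2 = {chi2:.3f}, p = {p_value:.3f}, dof = {dof}")
if p_value <= alpha:
    print("We reject the null hypothesis")
else:
    print("We fail to reject the null hypothesis")
```

With the real data, the table could be built with `pd.crosstab(df['three_g'], df['price_range'])`.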

### Hypothetical Statement - 2

#### 1. State Your research hypothesis as a null hypothesis and alternate hypothesis.

Based on the observations above, "four_g" support will affect the price range.



1.  **Null hypothesis (H0):** The mean price range is the same for phones with and without 4G support.
2.  **Alternative hypothesis (Ha):** The mean price range differs between phones with and without 4G support.




#### 2. Perform an appropriate statistical test.

In [None]:
# Perform Statistical Test to obtain P-Value
from scipy.stats import f_oneway

# Creating separate Series of price_range for phones without and with 4G.

no_4g_prices = df[df["four_g"] == 0]["price_range"]
has_4g_prices = df[df["four_g"] != 0]["price_range"]

# Performing the One-Way Anova Test:

f_value, p_value = f_oneway(no_4g_prices, has_4g_prices)

# defining significance level(alpha = 0.05):

alpha = 0.05

print(f'f_value is {f_value}')
print(f'p_value is {p_value}')

if p_value <= alpha:
  print("We reject the Null Hypothesis")
else:
  print("We fail to reject the Null Hypothesis")

##### Which statistical test have you done to obtain P-Value?

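Test applied: One-Way ANOVA. The p-value is 0.5091036529767595 and the f_value is 0.43606566050713613. Since the p-value is greater than alpha (0.05), we fail to reject the null hypothesis.

##### Why did you choose the specific statistical test?

One-way ANOVA compares the mean of price_range between the groups of phones with and without 4G support.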

### Hypothetical Statement - 3

#### 1. State Your research hypothesis as a null hypothesis and alternate hypothesis.

Answer Here.

#### 2. Perform an appropriate statistical test.

In [None]:
# Perform Statistical Test to obtain P-Value

##### Which statistical test have you done to obtain P-Value?

Answer Here.

##### Why did you choose the specific statistical test?

Answer Here.

## ***6. Feature Engineering & Data Pre-processing***

### 1. Handling Missing Values

In [None]:
# Handling Missing Values & Missing Value Imputation

#### What all missing value imputation techniques have you used and why did you use those techniques?

Answer Here.

### 2. Handling Outliers

In [None]:
# Handling Outliers & Outlier treatments

##### What all outlier treatment techniques have you used and why did you use those techniques?

Answer Here.

### 3. Categorical Encoding

In [None]:
# Encode your categorical columns

#### What all categorical encoding techniques have you used & why did you use those techniques?

Answer Here.

### 4. Textual Data Preprocessing
(It's mandatory for textual dataset i.e., NLP, Sentiment Analysis, Text Clustering etc.)

#### 1. Expand Contraction

In [None]:
# Expand Contraction

#### 2. Lower Casing

In [None]:
# Lower Casing

#### 3. Removing Punctuations

In [None]:
# Remove Punctuations

#### 4. Removing URLs & Removing words and digits contain digits.

In [None]:
# Remove URLs & Remove words and digits contain digits

#### 5. Removing Stopwords & Removing White spaces

In [None]:
# Remove Stopwords

In [None]:
# Remove White spaces

#### 6. Rephrase Text

In [None]:
# Rephrase Text

#### 7. Tokenization

In [None]:
# Tokenization

#### 8. Text Normalization

In [None]:
# Normalizing Text (i.e., Stemming, Lemmatization etc.)

##### Which text normalization technique have you used and why?

Answer Here.

#### 9. Part of speech tagging

In [None]:
# POS Tagging

#### 10. Text Vectorization

In [None]:
# Vectorizing Text

##### Which text vectorization technique have you used and why?

Answer Here.

### 4. Feature Manipulation & Selection

#### 1. Feature Manipulation

In [None]:
# Manipulate Features to minimize feature correlation and create new features

#### 2. Feature Selection

In [None]:
# Select your features wisely to avoid overfitting

##### What all feature selection methods have you used  and why?

Answer Here.

##### Which all features you found important and why?

Answer Here.

### 5. Data Transformation

#### Do you think that your data needs to be transformed? If yes, which transformation have you used. Explain Why?

In [None]:
# Transform Your data

### 6. Data Scaling

In [None]:
# Scaling your data

##### Which method have you used to scale your data and why?

### 7. Dimensionality Reduction

##### Do you think that dimensionality reduction is needed? Explain Why?

Answer Here.

In [None]:
# Dimensionality Reduction (If needed)

##### Which dimensionality reduction technique have you used and why? (If dimensionality reduction done on dataset.)

Answer Here.

### 8. Data Splitting

In [None]:
# Split your data to train and test. Choose Splitting ratio wisely.

##### What data splitting ratio have you used and why?

Answer Here.

### 9. Handling Imbalanced Dataset

##### Do you think the dataset is imbalanced? Explain Why.

Answer Here.

In [None]:
# Handling Imbalanced Dataset (If needed)

##### What technique did you use to handle the imbalance dataset and why? (If needed to be balanced)

Answer Here.

## ***7. ML Model Implementation***

### ML Model - 1

In [None]:
# ML Model - 1 Implementation

# Fit the Algorithm

# Predict on the model

#### 1. Explain the ML Model used and its performance using Evaluation metric Score Chart.

In [None]:
# Visualizing evaluation Metric Score chart

#### 2. Cross- Validation & Hyperparameter Tuning

In [None]:
# ML Model - 1 Implementation with hyperparameter optimization techniques (i.e., GridSearch CV, RandomSearch CV, Bayesian Optimization etc.)

# Fit the Algorithm

# Predict on the model
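
The fit and predict steps with cross-validated tuning could look like the sketch below. It is a hedged illustration only: the data comes from `make_classification`, the `DecisionTreeClassifier` and the small `param_grid` are assumptions, and none of it reflects results from this project.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical 4-class data standing in for the prepared feature matrix.
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=5,
    n_classes=4, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Small illustrative grid; the values are assumptions, not tuned choices.
param_grid = {"max_depth": [3, 5, None], "min_samples_leaf": [1, 5]}
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid,
    cv=5,
    scoring="accuracy",
)

# Fit the algorithm: every grid point is scored with 5-fold cross-validation.
search.fit(X_train, y_train)

# Predict on the model: the best estimator is refit on the full training split.
test_accuracy = search.score(X_test, y_test)
print("Best parameters:", search.best_params_)
print(f"Test accuracy: {test_accuracy:.3f}")
```

`RandomizedSearchCV` follows the same pattern and samples a fixed number of parameter combinations instead of trying every one.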

##### Which hyperparameter optimization technique have you used and why?

Answer Here.

##### Have you seen any improvement? Note down the improvement with updated Evaluation metric Score Chart.

Answer Here.

### ML Model - 2

#### 1. Explain the ML Model used and its performance using Evaluation metric Score Chart.

In [None]:
# Visualizing evaluation Metric Score chart

#### 2. Cross- Validation & Hyperparameter Tuning

In [None]:
# ML Model - 2 Implementation with hyperparameter optimization techniques (i.e., GridSearch CV, RandomSearch CV, Bayesian Optimization etc.)

# Fit the Algorithm

# Predict on the model

##### Which hyperparameter optimization technique have you used and why?

Answer Here.

##### Have you seen any improvement? Note down the improvement with updated Evaluation metric Score Chart.

Answer Here.

#### 3. Explain each evaluation metric's indication towards business and the business impact of the ML model used.

Answer Here.

### ML Model - 3

In [None]:
# ML Model - 3 Implementation

# Fit the Algorithm

# Predict on the model

#### 1. Explain the ML Model used and its performance using Evaluation metric Score Chart.

In [None]:
# Visualizing evaluation Metric Score chart

#### 2. Cross- Validation & Hyperparameter Tuning

In [None]:
# ML Model - 3 Implementation with hyperparameter optimization techniques (i.e., GridSearch CV, RandomSearch CV, Bayesian Optimization etc.)

# Fit the Algorithm

# Predict on the model

##### Which hyperparameter optimization technique have you used and why?

Answer Here.

##### Have you seen any improvement? Note down the improvement with updated Evaluation metric Score Chart.

Answer Here.

### 1. Which Evaluation metrics did you consider for a positive business impact and why?

Answer Here.

### 2. Which ML model did you choose from the above created models as your final prediction model and why?

Answer Here.

### 3. Explain the model which you have used and the feature importance using any model explainability tool?

Answer Here.

## ***8.*** ***Future Work (Optional)***

### 1. Save the best performing ml model in a pickle file or joblib file format for deployment process.


In [None]:
# Save the File

### 2. Again Load the saved model file and try to predict unseen data for a sanity check.


In [None]:
# Load the File and predict unseen data.
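
Saving and reloading a model with joblib could be sketched as follows. The `LogisticRegression` model, the random data and the file name `best_model.joblib` are all assumptions for illustration; in the notebook the saved object would be whichever fitted estimator performed best.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical stand-in model trained on made-up data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] > 0).astype(int)
model = LogisticRegression().fit(X, y)

# Save to a temporary directory (the file name is an assumption).
model_path = os.path.join(tempfile.mkdtemp(), "best_model.joblib")
joblib.dump(model, model_path)

# Load it back and confirm it predicts the same labels as the original.
restored = joblib.load(model_path)
unseen = rng.normal(size=(5, 3))
same_predictions = np.array_equal(model.predict(unseen), restored.predict(unseen))
print(f"Restored model matches original: {same_predictions}")
```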

### ***Congrats! Your model is successfully created and ready for deployment on a live server for a real user interaction !!!***

# **Conclusion**

Write the conclusion here.

### ***Hurrah! You have successfully completed your Machine Learning Capstone Project !!!***