# Importing Necessary Libraries

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

Reading the training CSV file

In [None]:
df_train = pd.read_csv("/content/drive/MyDrive/mobile price prediction/train.csv")

df_train.head()

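| | battery_power | blue | clock_speed | dual_sim | fc | four_g | int_memory | m_dep | mobile_wt | n_cores | ... | px_height | px_width | ram | sc_h | sc_w | talk_time | three_g | touch_screen | wifi | price_range |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 842 | 0 | 2.2 | 0 | 1 | 0 | 7 | 0.6 | 188 | 2 | ... | 20 | 756 | 2549 | 9 | 7 | 19 | 0 | 0 | 1 | 1 |
| 1 | 1021 | 1 | 0.5 | 1 | 0 | 1 | 53 | 0.7 | 136 | 3 | ... | 905 | 1988 | 2631 | 17 | 3 | 7 | 1 | 1 | 0 | 2 |
| 2 | 563 | 1 | 0.5 | 1 | 2 | 1 | 41 | 0.9 | 145 | 5 | ... | 1263 | 1716 | 2603 | 11 | 2 | 9 | 1 | 1 | 0 | 2 |
| 3 | 615 | 1 | 2.5 | 0 | 0 | 0 | 10 | 0.8 | 131 | 6 | ... | 1216 | 1786 | 2769 | 16 | 8 | 11 | 1 | 0 | 0 | 2 |
| 4 | 1821 | 1 | 1.2 | 0 | 13 | 1 | 44 | 0.6 | 141 | 2 | ... | 1208 | 1212 | 1411 | 8 | 2 | 15 | 1 | 1 | 0 | 1 |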





*   Check how many rows and columns the dataset has.
*   The training dataset has **2000** rows and **21** columns (20 features plus the `price_range` target).



In [None]:
df_train.shape

(2000, 21)

* Here we group the training data by `price_range`, our target: the value we want the model to predict for the test dataset.
* The values 0, 1, 2 and 3 in the `price_range` column mean:

*  0 - low cost
*  1 - medium cost
*  2 - high cost
*  3 - very high cost




In [None]:
df_train.groupby("price_range")["price_range"].agg("count").head(10)

price_range
0    500
1    500
2    500
3    500
Name: price_range, dtype: int64
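Each class has exactly 500 phones, so the training data is balanced. The sketch below uses a small hypothetical toy frame rather than the real CSV. It shows that `value_counts()` gives the same counts as the `groupby(...).agg("count")` call above, and how to attach the class names listed earlier. The `labels` mapping is only an illustration; it is not stored in the dataset.

```python
import pandas as pd

# Hypothetical toy stand-in for df_train: only the target column, two phones per class
toy = pd.DataFrame({"price_range": [0, 1, 2, 3, 3, 2, 1, 0]})

# Illustrative names for the four classes, as listed in this notebook
labels = {0: "low cost", 1: "medium cost", 2: "high cost", 3: "very high cost"}

# The groupby/count pattern used above ...
grouped = toy.groupby("price_range")["price_range"].agg("count")
# ... produces the same numbers as the shorter value_counts(), sorted by class code
counts = toy["price_range"].value_counts().sort_index()

# Replace the numeric class codes in the index with readable names
named = counts.rename(index=labels)
print(named)
```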

# Data Cleaning



*   Here we check the dataset for null values.
*   There are no null values in the training dataset.



In [None]:
df_train.isnull().sum()

battery_power    0
blue             0
clock_speed      0
dual_sim         0
fc               0
four_g           0
int_memory       0
m_dep            0
mobile_wt        0
n_cores          0
pc               0
px_height        0
px_width         0
ram              0
sc_h             0
sc_w             0
talk_time        0
three_g          0
touch_screen     0
wifi             0
price_range      0
dtype: int64
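With 21 columns, a single total can be quicker to read than the per-column list. Here is a minimal sketch on a hypothetical toy frame with one deliberately missing value. Applied to `df_train`, `total` would be 0 and `has_nulls` would be `False`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with one deliberately missing "ram" value
toy = pd.DataFrame({"ram": [2549, 2631, np.nan], "wifi": [1, 0, 0]})

per_column = toy.isnull().sum()        # nulls per column, as in the cell above
total = int(per_column.sum())          # one number for the whole frame
has_nulls = toy.isnull().values.any()  # True if any cell is missing

print(per_column)
print("total nulls:", total, "| any nulls:", has_nulls)
```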

* Here we read the test dataset with pandas `read_csv`.
* The test dataset has no `price_range` column. Once the model is trained, we apply it to the rows of this dataset to predict the price range of each phone.

In [None]:
df_test = pd.read_csv("/content/drive/MyDrive/mobile price prediction/test.csv")
df_test.head(3)

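| | id | battery_power | blue | clock_speed | dual_sim | fc | four_g | int_memory | m_dep | mobile_wt | ... | pc | px_height | px_width | ram | sc_h | sc_w | talk_time | three_g | touch_screen | wifi |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1043 | 1 | 1.8 | 1 | 14 | 0 | 5 | 0.1 | 193 | ... | 16 | 226 | 1412 | 3476 | 12 | 7 | 2 | 0 | 1 | 0 |
| 1 | 2 | 841 | 1 | 0.5 | 1 | 4 | 1 | 61 | 0.8 | 191 | ... | 12 | 746 | 857 | 3895 | 6 | 0 | 7 | 1 | 0 | 0 |
| 2 | 3 | 1807 | 1 | 2.8 | 0 | 1 | 0 | 27 | 0.9 | 186 | ... | 4 | 1270 | 1366 | 2396 | 17 | 10 | 10 | 0 | 1 | 1 |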


* Now let's check the shape (number of rows and columns) of the test dataset.
* It has **1000** rows and **21** columns: an `id` column plus the same 20 features as the training data.

In [None]:
df_test.shape

(1000, 21)
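Because the test set has an extra `id` column, its columns do not line up one-to-one with the training inputs, and the model expects the same features in the same order it was trained on. Here is a minimal sketch with hypothetical toy frames. On the real data, the analogous step would be something like `df_test.drop(columns=["id"])[inputs.columns]`, once `inputs` is defined in the next section.

```python
import pandas as pd

# Hypothetical toy frames: training inputs, and a test frame that also carries "id"
toy_inputs = pd.DataFrame({"battery_power": [842, 1021], "ram": [2549, 2631]})
toy_test = pd.DataFrame({"id": [1, 2], "battery_power": [1043, 841], "ram": [3476, 3895]})

# Drop the identifier, then reorder columns to match the training inputs exactly
test_features = toy_test.drop(columns=["id"])[toy_inputs.columns]

print(list(test_features.columns))
```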

* Check whether the test dataset has any null values; it has none.

In [None]:
df_test.isnull().sum()

id               0
battery_power    0
blue             0
clock_speed      0
dual_sim         0
fc               0
four_g           0
int_memory       0
m_dep            0
mobile_wt        0
n_cores          0
pc               0
px_height        0
px_width         0
ram              0
sc_h             0
sc_w             0
talk_time        0
three_g          0
touch_screen     0
wifi             0
dtype: int64

# Machine Learning Model

### Inputs & Target
* Let's define the inputs and the target for the model.
* All columns except `price_range` are used as inputs. `price_range` is the target variable: the value we want the model to predict.

In [None]:
inputs = df_train.drop(["price_range"],axis=1)
target = df_train["price_range"]
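Here is the same split on a hypothetical three-row toy frame. It adds two sanity checks: the target must not leak into the inputs, and inputs and target must have the same number of rows. `drop(columns=[...])` is equivalent to the `drop([...], axis=1)` form used above. The names carry a `toy_` prefix so they do not overwrite the notebook's `inputs` and `target`.

```python
import pandas as pd

# Hypothetical toy stand-in for df_train
toy = pd.DataFrame({
    "battery_power": [842, 1021, 563],
    "ram": [2549, 2631, 2603],
    "price_range": [1, 2, 2],
})

# Every column except the target becomes an input feature
toy_inputs = toy.drop(columns=["price_range"])
# The target is the single column we want to predict
toy_target = toy["price_range"]

# Sanity checks: no target leakage, and matching row counts
assert "price_range" not in toy_inputs.columns
assert len(toy_inputs) == len(toy_target)
print(toy_inputs.shape, toy_target.shape)
```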

### Importing the Machine Learning Models
* Here we import the machine learning models we will compare for predicting the mobile price range.
* I've imported 6 models. You can import more if you want to compare additional models and find the one best suited to this dataset.

In [None]:
from sklearn import svm
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.naive_bayes import MultinomialNB
from sklearn.tree import DecisionTreeClassifier

* Here I've created a Python dictionary, `model_params`, that maps each imported model to the hyperparameter values we want to try for price prediction.

In [None]:
model_params = {
    'svm': {
        'model': svm.SVC(gamma='auto'),
        'params' : {
            'C': [1,10,20],
            'kernel': ['rbf','linear']
        }  
    },
    'random_forest': {
        'model': RandomForestClassifier(),
        'params' : {
            'n_estimators': [1,5,10]
        }
    },
    'logistic_regression' : {
        'model': LogisticRegression(solver='liblinear'),  # multi_class='auto' is the default and is deprecated in newer scikit-learn
        'params': {
            'C': [1,5,10]
        }
    },
    'naive_bayes_gaussian': {
        'model': GaussianNB(),
        'params': {}
    },
    'naive_bayes_multinomial': {
        'model': MultinomialNB(),
        'params': {}
    },
    'decision_tree': {
        'model': DecisionTreeClassifier(),
        'params': {
            'criterion': ['gini','entropy'],
            
        }
    }     
}
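GridSearchCV tries every combination in each `params` grid and fits each one once per cross-validation fold, which helps explain the run time noted further down. The short sketch below counts the candidates with scikit-learn's `ParameterGrid`, using the same grids as `model_params`; the model objects themselves are left out. The count excludes the one extra refit per model that GridSearchCV performs on the full data with the best parameters (`refit=True` is the default).

```python
from sklearn.model_selection import ParameterGrid

# The same search spaces as in model_params above (model objects omitted)
grids = {
    "svm": {"C": [1, 10, 20], "kernel": ["rbf", "linear"]},
    "random_forest": {"n_estimators": [1, 5, 10]},
    "logistic_regression": {"C": [1, 5, 10]},
    "naive_bayes_gaussian": {},
    "naive_bayes_multinomial": {},
    "decision_tree": {"criterion": ["gini", "entropy"]},
}

cv_folds = 5  # matches cv=5 in the GridSearchCV loop

# An empty grid still counts as one candidate: the model with default parameters
candidates = {name: len(ParameterGrid(grid)) for name, grid in grids.items()}
total_fits = sum(candidates.values()) * cv_folds

print(candidates)
print("cross-validation fits:", total_fits)
```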

* Next we use **GridSearchCV** to find which model, and which hyperparameters, work best on our dataset.

* Running the code gives the results summarized in the table below. The SVM has the highest mean 5-fold cross-validation accuracy, **97.30%**, so it is the best choice among these models for this dataset.
---



| | model | best_score | best_params |
|---|---|---|---|
| 0 | **svm** | **0.9730** | **{'C': 1, 'kernel': 'linear'}** |
| 1 | random_forest | 0.8135 | {'n_estimators': 10} |
| 2 | logistic_regression | 0.8185 | {'C': 10} |
| 3 | naive_bayes_gaussian | 0.8090 | {} |
| 4 | naive_bayes_multinomial | 0.5195 | {} |
| 5 | decision_tree | 0.8445 | {'criterion': 'entropy'} |

Note: the GridSearchCV code that follows can take a while to run, approximately **7-8 minutes**.

In [None]:
from sklearn.model_selection import GridSearchCV
import pandas as pd

scores = []

# Run a 5-fold cross-validated grid search for every model in model_params
for model_name, mp in model_params.items():
    clf = GridSearchCV(mp['model'], mp['params'], cv=5, return_train_score=False)
    clf.fit(inputs, target)
    # Keep the best mean CV accuracy and the hyperparameters that achieved it
    scores.append({
        'model': model_name,
        'best_score': clf.best_score_,
        'best_params': clf.best_params_
    })

df = pd.DataFrame(scores, columns=['model', 'best_score', 'best_params'])
df

| | model | best_score | best_params |
|---|---|---|---|
| 0 | svm | 0.973 | {'C': 1, 'kernel': 'linear'} |
| 1 | random_forest | 0.8135 | {'n_estimators': 10} |
| 2 | logistic_regression | 0.8185 | {'C': 10} |
| 3 | naive_bayes_gaussian | 0.809 | {} |
| 4 | naive_bayes_multinomial | 0.5195 | {} |
| 5 | decision_tree | 0.8445 | {'criterion': 'entropy'} |
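Below is a scaled-down sketch of the same loop, run on synthetic data from `make_classification` instead of the phone dataset so that it runs on its own. The results are sorted so the best model comes first. `n_jobs` is a real GridSearchCV parameter: setting it to `-1` runs the fits in parallel on all CPU cores, which may shorten the wait. It is kept at 1 here for portability. The scores on synthetic data say nothing about the phone dataset.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic 4-class data standing in for the phone dataset (an assumption)
X_demo, y_demo = make_classification(n_samples=300, n_features=20, n_informative=5,
                                     n_classes=4, random_state=0)

# A reduced, illustrative version of model_params
small_params = {
    "svm": {"model": SVC(), "params": {"C": [1, 10], "kernel": ["rbf", "linear"]}},
    "decision_tree": {"model": DecisionTreeClassifier(random_state=0),
                      "params": {"criterion": ["gini", "entropy"]}},
}

N_JOBS = 1  # -1 would run the fits in parallel on all CPU cores

demo_scores = []
for name, mp in small_params.items():
    search = GridSearchCV(mp["model"], mp["params"], cv=5, n_jobs=N_JOBS)
    search.fit(X_demo, y_demo)
    demo_scores.append({"model": name, "best_score": search.best_score_,
                        "best_params": search.best_params_})

# Sort so the best-scoring model is listed first
results = pd.DataFrame(demo_scores).sort_values("best_score", ascending=False)
print(results)
```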


## Train Test Split

Now we use the train-test split method to divide our dataset into a training set and a testing set for the model.

In [None]:
from sklearn.model_selection import train_test_split

`test_size=0.2` means the dataset is split into 80% for training and 20% for testing.

In [None]:
X_train, X_test, y_train, y_test = train_test_split(inputs, target, test_size=0.2)


In [None]:
len(X_train)

1600

In [None]:
len(X_test)


400
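No `random_state` is passed in the split above, so the rows in each part, and the score computed later, change every time the cell runs. Here is a minimal sketch on toy arrays with the same size and class balance as `df_train` (2000 rows, 500 per class). It uses `random_state` for a reproducible split and `stratify` to keep the classes equally represented in both parts. Separate names are used so the notebook's own split is not overwritten.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data shaped like df_train: 2000 rows, 500 phones in each of the 4 classes
X_demo = np.arange(2000).reshape(-1, 1)
y_demo = np.repeat([0, 1, 2, 3], 500)

# random_state fixes the shuffle; stratify keeps the class proportions in both parts
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

print(len(X_tr), len(X_te))   # 1600 400
print(np.bincount(y_te))      # [100 100 100 100]
```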

In [None]:
from sklearn.svm import SVC

# Use the best hyperparameters found by GridSearchCV
model = SVC(C=1, kernel='linear')

#### Fitting the Model

In [None]:
model.fit(X_train, y_train)

### Score of the Model

Using the hyperparameters that GridSearchCV found to work best, we now compute the model's score on the 20% held-out test split:

In [None]:
model.score(X_test, y_test)

0.9625
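For a classifier, `score()` returns accuracy: the fraction of test rows predicted correctly. Here, 0.9625 corresponds to 385 of the 400 held-out phones. A confusion matrix shows which price ranges the remaining mistakes fall into. Below is a minimal sketch on synthetic data, used so the block runs on its own, with the same `SVC(C=1, kernel='linear')` settings.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic 4-class data standing in for the phone dataset (an assumption)
X_demo, y_demo = make_classification(n_samples=400, n_features=20, n_informative=5,
                                     n_classes=4, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=0)

demo_model = SVC(C=1, kernel="linear").fit(X_tr, y_tr)
y_pred = demo_model.predict(X_te)

# score() and accuracy_score() agree: correct predictions divided by all predictions
print(demo_model.score(X_te, y_te), accuracy_score(y_te, y_pred))

# Rows are true classes 0-3, columns are predicted classes; off-diagonal cells are errors
cm = confusion_matrix(y_te, y_pred, labels=[0, 1, 2, 3])
print(cm)
```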

## Prediction

#### Now let's predict the price range of a phone, first using a row from the training dataset

* battery_power - 842
* blue - 0
* clock_speed - 2.2
* dual_sim - 0
* fc - 1
* four_g - 0
* int_memory - 7
* m_dep - 0.6
* mobile_wt - 188
* n_cores - 2
* pc - 2
* px_height - 20
* px_width - 756
* ram - 2549
* sc_h - 9
* sc_w - 7
* talk_time - 19
* three_g - 0
* touch_screen - 0
* wifi - 1
* Its actual price range in the dataset is 1.




---
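* The model also predicts 1, so the prediction matches the dataset. Keep in mind that this row comes from the training data and may have been among the 80% of rows the model was fitted on.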





In [None]:
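# Wrap the row in a DataFrame with the training column names, matching how the model was fitted
model.predict(pd.DataFrame([[842,0,2.2,0,1,0,7,0.6,188,2,2,20,756,2549,9,7,19,0,0,1]], columns=inputs.columns))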



array([1])

In [None]:
model.predict(pd.DataFrame([[1589,1,0.6,1,0,1,58,0.9,85,7,7,319,1206,3464,19,10,6,1,1,1]], columns=inputs.columns))



array([3])
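The numeric output can be turned into one of the class names listed at the start. Below is a minimal sketch of a hypothetical helper, `predict_label`, demonstrated on a toy decision tree with two made-up features. With the notebook's model, it would be called with the fitted `model`, a list of the 20 feature values in training-column order, and `inputs.columns`.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Illustrative class names, as listed in this notebook
LABELS = {0: "low cost", 1: "medium cost", 2: "high cost", 3: "very high cost"}


def predict_label(fitted_model, values, columns):
    """Hypothetical helper: predict one phone and return (class code, class name).

    Wrapping the row in a DataFrame with the training column names keeps the
    feature order explicit and avoids scikit-learn's missing-feature-names warning.
    """
    row = pd.DataFrame([values], columns=columns)
    code = int(fitted_model.predict(row)[0])
    return code, LABELS[code]


# Toy training data with two hypothetical features, one phone per class
toy_X = pd.DataFrame({"ram": [300, 1500, 2600, 3800],
                      "battery_power": [600, 900, 1200, 1800]})
toy_y = [0, 1, 2, 3]
toy_model = DecisionTreeClassifier(random_state=0).fit(toy_X, toy_y)

print(predict_label(toy_model, [3900, 1700], toy_X.columns))  # (3, 'very high cost')
```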

#### Testing dataset 



*   Now let's use rows from the test dataset, which was provided for checking the machine learning model.
*   The following predictions are for phones in the test dataset, so the owner can estimate each phone's price range from its features.



In [None]:
model.predict(pd.DataFrame([[1043,1,1.8,1,14,0,5,0.1,193,3,16,226,1412,3476,12,7,2,0,1,0]], columns=inputs.columns))



array([3])

In [None]:
model.predict(pd.DataFrame([[1972,0,2.1,0,2,0,48,0.6,188,3,14,480,748,366,5,3,17,1,1,0]], columns=inputs.columns))



array([0])
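Rather than predicting phones one at a time, the whole test set can be scored in one call. Below is a minimal sketch with hypothetical toy frames standing in for `df_train` and `df_test`. The `id` column is excluded from the features but kept next to each prediction. To save the result, `result.to_csv("predictions.csv", index=False)` would work; the file name is only an example.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy frames standing in for df_train and df_test
toy_train = pd.DataFrame({"ram": [300, 1500, 2600, 3800],
                          "px_width": [700, 1000, 1400, 1900],
                          "price_range": [0, 1, 2, 3]})
toy_test = pd.DataFrame({"id": [1, 2, 3],
                         "ram": [3476, 366, 1600],
                         "px_width": [1412, 748, 1100]})

features = toy_train.drop(columns=["price_range"])
toy_clf = DecisionTreeClassifier(random_state=0).fit(features, toy_train["price_range"])

# Predict every test phone at once, using the training column order and skipping "id"
predictions = toy_clf.predict(toy_test[features.columns])

# Keep the id next to each prediction so results can be traced back to the test file
result = pd.DataFrame({"id": toy_test["id"], "price_range": predictions})
print(result)
```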