# Notebook Context - Mobile Price Classification
This project is for a new mobile phone company that wants to compete with big companies such as Apple and Samsung.

We will estimate the price of this new company's mobile phones. In a competitive mobile phone market, prices cannot simply be assumed. To solve this problem, sales data for mobile phones were collected from various companies.

With this data we can find the relation between the features of a mobile phone (e.g. RAM, internal memory) and its selling price.

In this problem we will not predict the actual price but a price range indicating how high the price is.

## Importing Libraries

In [151]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

## Reading Datasets

In [159]:
dbtrain = pd.read_csv('train.csv')
dbtest = pd.read_csv('test.csv')

In [160]:
# Preview the first rows of the training data
dbtrain.head()

Unnamed: 0,battery_power,blue,clock_speed,dual_sim,fc,four_g,int_memory,m_dep,mobile_wt,n_cores,...,px_height,px_width,ram,sc_h,sc_w,talk_time,three_g,touch_screen,wifi,price_range
0,842,0,2.2,0,1,0,7,0.6,188,2,...,20,756,2549,9,7,19,0,0,1,1
1,1021,1,0.5,1,0,1,53,0.7,136,3,...,905,1988,2631,17,3,7,1,1,0,2
2,563,1,0.5,1,2,1,41,0.9,145,5,...,1263,1716,2603,11,2,9,1,1,0,2
3,615,1,2.5,0,0,0,10,0.8,131,6,...,1216,1786,2769,16,8,11,1,0,0,2
4,1821,1,1.2,0,13,1,44,0.6,141,2,...,1208,1212,1411,8,2,15,1,1,0,1


In [152]:
# Preview the first rows of the test data
dbtest.head()

Unnamed: 0,id,battery_power,blue,clock_speed,dual_sim,fc,four_g,int_memory,m_dep,mobile_wt,...,pc,px_height,px_width,ram,sc_h,sc_w,talk_time,three_g,touch_screen,wifi
0,1,1043,1,1.8,1,14,0,5,0.1,193,...,16,226,1412,3476,12,7,2,0,1,0
1,2,841,1,0.5,1,4,1,61,0.8,191,...,12,746,857,3895,6,0,7,1,0,0
2,3,1807,1,2.8,0,1,0,27,0.9,186,...,4,1270,1366,2396,17,10,10,0,1,1
3,4,1546,0,0.5,1,18,1,25,0.5,96,...,20,295,1752,3893,10,0,7,1,1,0
4,5,1434,0,1.4,0,11,1,49,0.5,108,...,18,749,810,1773,15,8,7,1,0,1


## Searching for missing values

In [154]:
dbtrain.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2000 entries, 0 to 1999
Data columns (total 21 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   battery_power  2000 non-null   int64  
 1   blue           2000 non-null   int64  
 2   clock_speed    2000 non-null   float64
 3   dual_sim       2000 non-null   int64  
 4   fc             2000 non-null   int64  
 5   four_g         2000 non-null   int64  
 6   int_memory     2000 non-null   int64  
 7   m_dep          2000 non-null   float64
 8   mobile_wt      2000 non-null   int64  
 9   n_cores        2000 non-null   int64  
 10  pc             2000 non-null   int64  
 11  px_height      2000 non-null   int64  
 12  px_width       2000 non-null   int64  
 13  ram            2000 non-null   int64  
 14  sc_h           2000 non-null   int64  
 15  sc_w           2000 non-null   int64  
 16  talk_time      2000 non-null   int64  
 17  three_g        2000 non-null   int64  
 18  touch_sc
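
`info()` reports non-null counts per column; a more direct check is to count nulls explicitly. The sketch below is only an illustration: `toy` is a hypothetical stand-in for `dbtrain`, with one missing value planted on purpose so the check has something to find.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for dbtrain; the NaN is planted on purpose.
toy = pd.DataFrame({
    "battery_power": [842, 1021, 563],
    "ram": [2549, np.nan, 2603],
})

# Count missing values per column, then overall.
missing_per_column = toy.isnull().sum()
total_missing = int(missing_per_column.sum())

print(missing_per_column)
print("Total missing values:", total_missing)
```

Running the same two lines on `dbtrain` should report zero for every column if the non-null counts above all equal the number of rows.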

## Separating data into train and test sets

In [26]:
X = dbtrain.drop('price_range', axis=1)
y = dbtrain['price_range']

In [119]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=1)
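
A possible refinement, not used in this notebook, is to pass `stratify=y` so that each price range keeps the same proportion in both splits. The sketch below uses synthetic data (`X_demo`, `y_demo` are hypothetical stand-ins for `X` and `y`) to show the effect.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data: 200 rows, 4 balanced classes.
rng = np.random.default_rng(1)
X_demo = pd.DataFrame({
    "ram": rng.integers(256, 4000, size=200),
    "battery_power": rng.integers(500, 2000, size=200),
})
y_demo = pd.Series(np.repeat([0, 1, 2, 3], 50), name="price_range")

# stratify=y_demo keeps the class proportions roughly equal in both splits.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.33, random_state=1, stratify=y_demo
)

print(y_te.value_counts().sort_index())
```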

## Support Vector Classification with a linear kernel

The idea of Support Vector Classification with a linear kernel is simple: we are using a small dataset, and the algorithm creates a line or hyperplane that separates the data into classes.

In [155]:
from sklearn.svm import SVC
svc = SVC(kernel='linear')
svc.fit(X_train, y_train)
prediction = svc.predict(X_test)
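
Support vector machines are not scale invariant, and the features here span very different ranges (compare `ram` with `m_dep`). One common variation, which this notebook does not use, is to standardize the features inside a pipeline before fitting. The sketch below assumes synthetic data from `make_classification`; the names `X_demo`, `y_demo`, and `scaled_svc` are illustrative only.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic 4-class problem standing in for the mobile price data.
X_demo, y_demo = make_classification(
    n_samples=400, n_features=10, n_informative=6, n_classes=4, random_state=1
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.33, random_state=1
)

# The scaler is fitted on the training split only, then applied to both splits.
scaled_svc = make_pipeline(StandardScaler(), SVC(kernel="linear"))
scaled_svc.fit(X_tr, y_tr)

pred = scaled_svc.predict(X_te)
test_score = scaled_svc.score(X_te, y_te)
print("Test accuracy on synthetic data:", test_score)
```

Keeping the scaler inside the pipeline means the same transformation is applied automatically whenever `predict` is called on new data.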

## Model evaluation and metrics

In [156]:
from sklearn.metrics import classification_report, confusion_matrix
print(classification_report(y_test, prediction))
print('------------------------------------------------')
print('Confusion Matrix: ','\n',confusion_matrix(y_test, prediction))
print('------------------------------------------------')
print("Model Score train data: ",svc.score(X_train,y_train))
print("Model Score test data: ",svc.score(X_test,y_test))

              precision    recall  f1-score   support

           0       0.99      0.99      0.99       148
           1       0.95      0.97      0.96       169
           2       0.97      0.93      0.95       182
           3       0.96      0.99      0.98       161

    accuracy                           0.97       660
   macro avg       0.97      0.97      0.97       660
weighted avg       0.97      0.97      0.97       660

------------------------------------------------
Confusion Matrix:  
 [[146   2   0   0]
 [  2 164   3   0]
 [  0   7 169   6]
 [  0   0   2 159]]
------------------------------------------------
Model Score train data:  0.994776119402985
Model Score test data:  0.9666666666666667
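
The numbers in the report can be recomputed directly from the confusion matrix printed above: accuracy is the diagonal sum over the total, recall divides each diagonal entry by its row sum, and precision divides it by its column sum. A short check:

```python
import numpy as np

# Confusion matrix copied from the output above (rows: true class, columns: predicted).
cm = np.array([
    [146,   2,   0,   0],
    [  2, 164,   3,   0],
    [  0,   7, 169,   6],
    [  0,   0,   2, 159],
])

accuracy = np.trace(cm) / cm.sum()          # correct predictions / all predictions
recall = np.diag(cm) / cm.sum(axis=1)       # per class: correct / true count
precision = np.diag(cm) / cm.sum(axis=0)    # per class: correct / predicted count

print("Accuracy:", accuracy)
print("Recall:", np.round(recall, 2))
print("Precision:", np.round(precision, 2))
```

The results agree with the report: 638 of 660 test samples are classified correctly, and every misclassification falls in a neighbouring price range.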


## Predicting values for the test data set

In [157]:
predicttestdata = svc.predict(dbtest.drop('id', axis=1))
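
A fitted scikit-learn model expects the same features, in the same order, as it saw during `fit`. A small defensive step before predicting is to select the test columns explicitly in the training order. The sketch below uses a hypothetical three-column subset (`train_cols`) and a toy test frame rather than the real data.

```python
import pandas as pd

# Hypothetical subset of the training feature order (in the notebook: list(X.columns)).
train_cols = ["battery_power", "ram", "wifi"]

# Toy test frame with an extra 'id' column and a different column order.
test_demo = pd.DataFrame({
    "id": [1, 2],
    "wifi": [0, 1],
    "battery_power": [1043, 841],
    "ram": [3476, 3895],
})

features = test_demo.drop("id", axis=1)

# Fail early if a training feature is absent from the test data.
missing = set(train_cols) - set(features.columns)
if missing:
    raise ValueError(f"Test data is missing columns: {sorted(missing)}")

# Reorder the columns to match the training order.
aligned = features[train_cols]
print(aligned)
```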

## Building a new dataset from the test data, including the predicted price ranges

In [158]:
dbtest['predicted_price_range'] = predicttestdata
FinalResult = dbtest.drop('id', axis=1)
FinalResult

Unnamed: 0,battery_power,blue,clock_speed,dual_sim,fc,four_g,int_memory,m_dep,mobile_wt,n_cores,...,px_height,px_width,ram,sc_h,sc_w,talk_time,three_g,touch_screen,wifi,predicted_price_range
0,1043,1,1.8,1,14,0,5,0.1,193,3,...,226,1412,3476,12,7,2,0,1,0,3
1,841,1,0.5,1,4,1,61,0.8,191,5,...,746,857,3895,6,0,7,1,0,0,3
2,1807,1,2.8,0,1,0,27,0.9,186,3,...,1270,1366,2396,17,10,10,0,1,1,2
3,1546,0,0.5,1,18,1,25,0.5,96,8,...,295,1752,3893,10,0,7,1,1,0,3
4,1434,0,1.4,0,11,1,49,0.5,108,6,...,749,810,1773,15,8,7,1,0,1,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
995,1700,1,1.9,0,0,1,54,0.5,170,7,...,644,913,2121,14,8,15,1,1,0,2
996,609,0,1.8,1,0,0,13,0.9,186,4,...,1152,1632,1933,8,1,19,0,1,1,1
997,1185,0,1.4,0,1,1,8,0.5,80,1,...,477,825,1223,5,0,14,1,0,0,0
998,1533,1,0.5,1,0,0,50,0.4,171,2,...,38,832,2509,15,11,6,0,1,0,2


## Exporting data to CSV format

In [150]:
FinalResult.to_csv('result_mobile_price_classification.csv')
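
By default `to_csv` also writes the row index as the first column, and reading such a file back with `pd.read_csv` produces an extra `Unnamed: 0` column. If the index carries no information, passing `index=False` avoids this. The sketch below round-trips a toy frame (`result_demo`, hypothetical) through an in-memory buffer instead of a file.

```python
import io
import pandas as pd

# Toy stand-in for FinalResult.
result_demo = pd.DataFrame({
    "battery_power": [1043, 841],
    "predicted_price_range": [3, 3],
})

# Write without the index, then read the CSV text back.
buffer = io.StringIO()
result_demo.to_csv(buffer, index=False)
buffer.seek(0)
reloaded = pd.read_csv(buffer)

print(reloaded.columns.tolist())
```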