- LinkedIn: www.linkedin.com/in/harpreet-singh-8218761a9

## Import libraries

In [None]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt


## Load data

In [None]:
df_train = pd.read_csv('../input/mobile-price-classification/train.csv')

In [None]:
df_test = pd.read_csv('../input/mobile-price-classification/test.csv')

The dataset describes various mobile phones by their features, together with the price range each phone falls into.

***In this notebook:***

- We explore the relationship between phone features (e.g. RAM, internal memory, dual SIM) and the selling price range.
- We compare two machine learning models, Logistic Regression and Linear Regression, and evaluate how well each one predicts the price range.



In this problem we do not predict the actual price but a price range: 0 (low cost), 1 (medium cost), 2 (high cost) or 3 (very high cost), indicating how high the price is.
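
The four codes can be turned into readable labels with a plain dictionary and `Series.map`. This is a small sketch: the name `PRICE_LABELS` is our own, and the toy series only stands in for `df_train['price_range']`.

```python
import pandas as pd

# Hypothetical mapping from price_range code to label, taken from the description above
PRICE_LABELS = {0: 'low cost', 1: 'medium cost', 2: 'high cost', 3: 'very high cost'}

# Toy stand-in for df_train['price_range'] (illustrative values, not real data)
price_range = pd.Series([0, 3, 1, 2, 3])

price_labels = price_range.map(PRICE_LABELS)
print(price_labels.tolist())
# ['low cost', 'very high cost', 'medium cost', 'high cost', 'very high cost']
```

On the real data, `df_train['price_range'].map(PRICE_LABELS)` would give a label column that is handy for plot legends.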

***USE***

- This kind of prediction can help companies estimate phone prices and stay competitive with other manufacturers.
- It can help consumers check that they are paying a fair price for a phone.

## Data exploration and cleaning

***Checking the first rows with the head() function***


In [None]:
df_train.head()

***Quick view of some statistical details***

In [None]:
df_train.describe().T

***Getting column names***

In [None]:
df_train.columns
#df_test.columns

### Column descriptions

- `battery_power`: Battery capacity, i.e. the total charge the battery can store at one time, measured in mAh
- `blue`: Has Bluetooth or not
- `clock_speed`: Speed at which the microprocessor executes instructions
- `dual_sim`: Has dual SIM support or not
- `fc`: Front camera megapixels
- `four_g`: Has 4G or not
- `int_memory`: Internal memory in gigabytes
- `m_dep`: Mobile depth in cm
- `mobile_wt`: Weight of the mobile phone
- `n_cores`: Number of processor cores
- `pc`: Primary camera megapixels
- `px_height`: Pixel resolution height
- `px_width`: Pixel resolution width
- `ram`: Random access memory in megabytes
- `sc_h`: Screen height of the mobile in cm
- `sc_w`: Screen width of the mobile in cm
- `talk_time`: Longest time that a single battery charge will last when you are talking
- `three_g`: Has 3G or not
- `touch_screen`: Has a touch screen or not
- `wifi`: Has Wi-Fi or not
- `price_range`: The target variable, with values 0 (low cost), 1 (medium cost), 2 (high cost) and 3 (very high cost)
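
Several columns above are yes/no flags (`blue`, `dual_sim`, `four_g`, `three_g`, `touch_screen`, `wifi`), so they can be separated from the numeric measurements programmatically. A minimal sketch, assuming the flags are encoded as 0 and 1; `split_binary_columns` is a hypothetical helper and the toy frame only borrows column names from the list.

```python
import pandas as pd

def split_binary_columns(df):
    """Split column names into (binary, other): binary columns contain only 0/1 values."""
    binary = [col for col in df.columns if set(df[col].dropna().unique()) <= {0, 1}]
    other = [col for col in df.columns if col not in binary]
    return binary, other

# Toy frame reusing a few column names from the list above (values are made up)
toy = pd.DataFrame({
    'blue': [0, 1, 1],
    'ram': [512, 1024, 2048],
    'wifi': [1, 0, 1],
    'clock_speed': [0.5, 1.8, 2.5],
})
binary_cols, other_cols = split_binary_columns(toy)
print(binary_cols, other_cols)  # ['blue', 'wifi'] ['ram', 'clock_speed']
```

Applied to `df_train`, the binary list is a natural input for the pie charts and box plots further down.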


***Checking the shape of the data***

In [None]:
df_train.shape
#df_test.shape   (had same number of columns (21) and 1000 rows)
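
Having the same number of columns does not guarantee the same columns. If, as is usual for competition files, `test.csv` has no `price_range` target, some other column fills the 21st slot. Comparing the column sets makes this explicit; `column_differences` is a hypothetical helper and the `id` column in the toy frames is only illustrative.

```python
import pandas as pd

def column_differences(train, test):
    """Return (only_in_train, only_in_test) as sorted lists of column names."""
    only_train = sorted(set(train.columns) - set(test.columns))
    only_test = sorted(set(test.columns) - set(train.columns))
    return only_train, only_test

# Empty toy frames standing in for df_train and df_test
train = pd.DataFrame(columns=['battery_power', 'ram', 'price_range'])
test = pd.DataFrame(columns=['id', 'battery_power', 'ram'])
print(column_differences(train, test))  # (['price_range'], ['id'])
```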

***Getting information about the data***

In [None]:
df_train.info()
#df_test.info()

***Are there any null values?***

In [None]:
df_train.isnull().sum()
#df_test.isnull().sum()

Luckily, neither dataset contains any null values.

***Checking for duplicate rows in the data***

In [None]:
df_train.duplicated(keep=False).any()
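
`keep=False` marks every copy of a duplicated row, while the default `keep='first'` marks only the repeats after the first occurrence. For `.any()` the answer is the same either way, but `keep=False` is the useful form when inspecting the offending rows. A toy illustration:

```python
import pandas as pd

# Toy frame with one repeated row (illustrative values)
toy = pd.DataFrame({'ram': [512, 512, 1024], 'wifi': [1, 1, 0]})

# keep=False flags every copy of a duplicated row
all_copies = toy.duplicated(keep=False).tolist()
# the default keep='first' flags only the repeats after the first occurrence
repeats_only = toy.duplicated().tolist()
print(all_copies, repeats_only)  # [True, True, False] [False, True, False]
```

On the real data, `df_train[df_train.duplicated(keep=False)]` shows all rows involved in a duplication, if any.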

## Data visualization

### Pie chart

***Phones that support 3G***

In [None]:
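# value_counts() sorts by frequency, so reindex to [1, 0] to keep the values aligned with the labels
labels = ['Supported 3G', 'Not supported']
values = df_train['three_g'].value_counts().reindex([1, 0]).values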

In [None]:
plt.axis('equal')
plt.pie(values, labels=labels, autopct='%.0f%%', explode=[0.1, 0], shadow=True)
plt.show()

About 76% of the phones support 3G.
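
`value_counts()` orders categories by frequency, not by value. Pairing its output with a fixed label list therefore only works while the "supported" group happens to be the larger one; reindexing to an explicit order removes that dependence. The toy series below is made up.

```python
import pandas as pd

# Toy flag column where 0 is the majority (made-up values)
flag = pd.Series([0, 0, 0, 1])

order_by_frequency = flag.value_counts().index.tolist()
print(order_by_frequency)  # [0, 1]: most frequent value first

# Force the order [supported (1), not supported (0)] regardless of frequencies
counts = flag.value_counts().reindex([1, 0])
print(counts.tolist())  # [1, 3]
```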

***Phones that support 4G***

In [None]:
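# value_counts() sorts by frequency, so reindex to [1, 0] to keep the values aligned with the labels
labels = ['Supported 4G', 'Not supported']
values = df_train['four_g'].value_counts().reindex([1, 0]).values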

In [None]:
plt.axis('equal')
plt.pie(values, labels=labels, autopct='%.0f%%', explode=[0.1, 0], shadow=True)
plt.show()

Only about 52% of the phones support 4G.
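
The percentages in both pie charts can also be read off directly: for a 0/1 flag, the column mean is the share of phones that have the feature. A sketch on a made-up stand-in for `df_train[['three_g', 'four_g']]`:

```python
import pandas as pd

# Toy stand-in for df_train[['three_g', 'four_g']] (illustrative values only)
toy = pd.DataFrame({'three_g': [1, 1, 1, 0], 'four_g': [1, 0, 1, 0]})

# For 0/1 flags, the mean is the fraction of rows equal to 1
share_pct = toy[['three_g', 'four_g']].mean() * 100
print(share_pct.to_dict())  # {'three_g': 75.0, 'four_g': 50.0}
```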

### Bar graph visualization

***Dual sim slot***

In [None]:
sns.countplot(x='dual_sim', data=df_train)

Dual-SIM phones are slightly more common than single-SIM phones.


***Front camera***

In [None]:
# histplot replaces the deprecated distplot for plain histograms
sns.histplot(data=df_train, x='fc', color='red')

Most phones have a front camera of between 3 and 7 megapixels.
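
A histogram suggests where the bulk of the values lies; a one-line check quantifies it. `share_between` below is a hypothetical helper, and the toy values only stand in for `df_train['fc']`.

```python
import pandas as pd

def share_between(values, low, high):
    """Fraction of values with low <= value <= high (both ends inclusive)."""
    return values.between(low, high).mean()

# Toy megapixel values standing in for df_train['fc'] (made up)
fc = pd.Series([0, 2, 3, 5, 7, 8, 4, 1])
print(share_between(fc, 3, 7))  # 0.5
```

`share_between(df_train['fc'], 3, 7)` tests whether the 3 to 7 megapixel range really holds a majority, and the same call works for the screen-width claim with `df_train['sc_w']`.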

***Battery power***

In [None]:
sns.histplot(data=df_train, x='battery_power', color='maroon')

Phones with lower battery capacity are slightly more common.

***Weight of mobile***

In [None]:
sns.histplot(data=df_train, x='mobile_wt', color='teal')

Phone weights are spread almost evenly across their range.

***Screen width of mobile***

In [None]:
sns.histplot(data=df_train, x='sc_w', color='chocolate')

Screen widths mainly fall in the range of 3 to 7 cm.

### Correlation with heatmap

In [None]:
plt.figure(figsize=(12, 9))
corr = df_train.corr()
# Plot the full correlation matrix on a fixed -1..1 color scale
sns.heatmap(corr, vmin=-1, vmax=1, annot=True, fmt='.2f')

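Some features are clearly correlated with each other, most notably `ram` with `price_range`, along with related pairs such as `fc`/`pc` (front and primary camera) and `three_g`/`four_g`.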
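
With 21 columns the heatmap is crowded, so listing only the strongly correlated pairs can be easier to read. A sketch assuming a cut-off of |r| >= 0.5 (our choice, not a standard); `strong_correlations` is a hypothetical helper and the toy frame is made up.

```python
import numpy as np
import pandas as pd

def strong_correlations(df, threshold=0.5):
    """Return feature pairs whose absolute Pearson correlation is >= threshold."""
    corr = df.corr()
    # Upper triangle without the diagonal, so each pair is listed once
    upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(upper).stack().dropna()
    strong = pairs[pairs.abs() >= threshold]
    # Sort by absolute strength, strongest first
    return strong.reindex(strong.abs().sort_values(ascending=False).index)

# Toy data: b is a multiple of a, c is only weakly related to either (made up)
toy = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [2, 4, 6, 8], 'c': [1, -1, 1, -1]})
result = strong_correlations(toy)
print(result)
```

Here `c` correlates with `a` and `b` at about -0.45, so only the pair (`a`, `b`) passes the threshold. On the notebook data the call is `strong_correlations(df_train)`.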

### Box plot for price range

***Dual sim VS Price range***

In [None]:
sns.boxplot(x='dual_sim', y='price_range', data=df_train)

Dual-SIM phones appear to sit in a slightly higher price range than single-SIM phones, although the boxes overlap heavily.

***4G VS Price range***

In [None]:
sns.boxplot(x='four_g', y='price_range', data=df_train)

Phones with 4G appear to sit in a somewhat higher price range than phones without 4G.

***3G VS Price range***

In [None]:
sns.boxplot(x='three_g', y='price_range', data=df_train)

Phones with 3G appear to sit in a somewhat higher price range than phones without 3G.

***RAM VS Price range***

In [None]:
sns.jointplot(x='price_range', y='ram', data=df_train, kind='kde')

The price range clearly rises as RAM increases.
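
A numeric summary complements the KDE plot: if the visual impression holds, the median RAM should increase from one price range to the next. Sketch on a made-up stand-in for `df_train[['price_range', 'ram']]`:

```python
import pandas as pd

# Toy stand-in for df_train[['price_range', 'ram']] (illustrative values only)
toy = pd.DataFrame({
    'price_range': [0, 0, 1, 1, 2, 2, 3, 3],
    'ram': [300, 700, 1200, 1600, 2200, 2600, 3300, 3700],
})

median_ram = toy.groupby('price_range')['ram'].median()
print(median_ram.tolist())  # [500.0, 1400.0, 2400.0, 3500.0]

# True when each price range has a higher median RAM than the one before
print(median_ram.is_monotonic_increasing)  # True
```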

***Wifi VS Price range***

In [None]:
sns.boxplot(x='wifi', y='price_range', data=df_train)

Phones with Wi-Fi appear to sit in a slightly higher price range.

***Touch screen VS Price range***

In [None]:
sns.boxplot(x='touch_screen', y='price_range', data=df_train)

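Touch-screen phones appear to sit in a slightly lower price range, which is surprising given that 4G, 3G and Wi-Fi phones appear in higher price ranges. Because the boxes overlap heavily, none of these yes/no features looks like a strong price predictor on its own.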
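
Box plots of a target with only four levels can make small shifts look large, since a quartile can jump a whole level after a small change in counts. Comparing the mean price range for each flag value gives a finer check. `mean_target_by_flag` is a hypothetical helper and the toy frame is made up.

```python
import pandas as pd

def mean_target_by_flag(df, flags, target='price_range'):
    """Mean of the target for flag == 0 and flag == 1, one column per flag."""
    return pd.DataFrame({flag: df.groupby(flag)[target].mean() for flag in flags})

# Toy frame (made-up values) standing in for df_train
toy = pd.DataFrame({
    'price_range':  [0, 1, 2, 3, 0, 3],
    'wifi':         [0, 0, 1, 1, 0, 1],
    'touch_screen': [1, 0, 1, 0, 1, 0],
})
summary = mean_target_by_flag(toy, ['wifi', 'touch_screen'])
print(summary)
```

On the notebook data, `mean_target_by_flag(df_train, ['dual_sim', 'four_g', 'three_g', 'wifi', 'touch_screen'])` puts all the box-plot comparisons in one table.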

## Train and test split

In [None]:
# The features are on very different scales, so standardize them with StandardScaler
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

X = df_train.drop('price_range', axis=1)
y = df_train['price_range']

# Split first, then fit the scaler on the training part only, so no test statistics leak into training
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=31)

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
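
The split above is purely random. Since `price_range` has four classes, passing `stratify=y` to `train_test_split` keeps each class at the same share in the training and test parts. This is an optional variant, not what the notebook ran; the toy data below is illustrative.

```python
from collections import Counter

import numpy as np
from sklearn.model_selection import train_test_split

# Toy balanced target with four classes, 10 samples each (illustrative only)
X_toy = np.arange(40).reshape(-1, 1)
y_toy = np.repeat([0, 1, 2, 3], 10)

# stratify=y_toy keeps the class proportions identical in the train and test parts
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.3, random_state=31, stratify=y_toy
)
print(Counter(y_te.tolist()))  # each class appears 3 times in the 12 test rows
```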

## Logistic regression

In [None]:
from sklearn.linear_model import LogisticRegression
# A higher iteration cap gives the solver room to converge
model = LogisticRegression(max_iter=1000)

In [None]:
model.fit(X_train,y_train)

***Accuracy of Model***

In [None]:
model.score(X_test,y_test)

***The logistic regression model reaches a good accuracy of about 95% on the test set.***
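
A single 70/30 split yields one accuracy figure that depends on `random_state`. K-fold cross-validation over a `Pipeline` gives an average and a spread instead, and refits the scaler inside each fold. The sketch below uses synthetic four-class data from `make_classification` as a stand-in; on the notebook data, the unscaled `X` and `y` would go in, since the pipeline does the scaling itself.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic 4-class data as a stand-in for X and y (not the phone dataset)
X_demo, y_demo = make_classification(
    n_samples=400, n_features=20, n_informative=6, n_classes=4, random_state=0
)

# The scaler is fit on the training folds only, inside each of the 5 splits
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scores = cross_val_score(pipe, X_demo, y_demo, cv=5, scoring='accuracy')
print(scores.mean(), scores.std())
```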

## Confusion matrix
***Checking where the logistic regression model fails***

In [None]:
from sklearn.metrics import confusion_matrix

In [None]:
y_pred= model.predict(X_test)

In [None]:
cm= confusion_matrix(y_test,y_pred)

In [None]:
plt.figure(figsize=(8, 5))
# Rows are true classes, columns are predicted classes; fmt='d' prints integer counts
sns.heatmap(cm, annot=True, fmt='d')
plt.xlabel('Predicted')
plt.ylabel('Truth')
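
In scikit-learn's `confusion_matrix`, row *i* holds the samples whose true class is *i* and column *j* those predicted as *j*. Dividing the diagonal by the row sums therefore gives per-class recall, which shows directly which price ranges the model confuses most. A toy illustration with made-up labels:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Toy labels (made up): one class-1 sample is predicted as class 2
y_true = [0, 1, 2, 2]
y_pred = [0, 2, 2, 2]

cm_toy = confusion_matrix(y_true, y_pred)
print(cm_toy)
# [[1 0 0]
#  [0 0 1]
#  [0 0 2]]

# Diagonal over row sums = fraction of each true class predicted correctly (recall)
recall_per_class = np.diag(cm_toy) / cm_toy.sum(axis=1)
print(recall_per_class)  # [1. 0. 1.]
```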

## Linear Regression

In [None]:
from sklearn.linear_model import LinearRegression
lr=LinearRegression()
lr.fit(X_train,y_train)

***R² score of the model***

In [None]:
lr.score(X_test,y_test)

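***The linear regression model reaches an R² of about 0.91 on the test set. `LinearRegression.score` returns the coefficient of determination, not classification accuracy, so this number cannot be compared directly with the 95% accuracy of logistic regression. Since `price_range` is a set of discrete classes, logistic regression is also the more natural model here.***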
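
To put both models on the same scale, one option (our suggestion, not part of the original comparison) is to round the continuous linear-regression outputs to the nearest class, clip them into 0..3, and score them as a classifier. `regression_to_class_accuracy` is a hypothetical helper.

```python
import numpy as np
from sklearn.metrics import accuracy_score

def regression_to_class_accuracy(y_true, y_continuous, n_classes=4):
    """Round continuous predictions to the nearest class and score them as a classifier."""
    # np.rint rounds to the nearest integer (exact halves go to the nearest even number)
    y_class = np.clip(np.rint(y_continuous), 0, n_classes - 1).astype(int)
    return accuracy_score(y_true, y_class)

# Toy example: predictions outside 0..3 are clipped to the nearest valid class
acc = regression_to_class_accuracy([0, 1, 2, 3], [-0.4, 1.2, 2.6, 3.9])
print(acc)  # 0.75
```

In the notebook, `regression_to_class_accuracy(y_test, lr.predict(X_test))` would give an accuracy directly comparable with `model.score(X_test, y_test)`.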