# Mobile Price Prediction

In this project we predict the likely price range of a mobile phone from its specifications. The mobile industry is expanding rapidly, and new models with varied specifications are released all the time. This can make it hard to compare phones and decide which one is worth buying.


# Dataset Introduction

**Columns**
* id: ID
* battery_power: Battery power in mAh (total energy the battery can store at one time)
* blue: Has Bluetooth or not
* clock_speed: Speed at which the microprocessor executes instructions
* dual_sim: Dual SIM or not
* fc: Front camera megapixels
* four_g: Has 4G or not
* int_memory: Internal memory in gigabytes
* m_dep: Mobile depth in cm
* mobile_wt: Weight
* n_cores: Number of processor cores
* pc: Primary camera megapixels
* px_height: Pixel resolution height
* px_width: Pixel resolution width
* ram: RAM in megabytes
* sc_h: Screen height of mobile in cm
* sc_w: Screen width of mobile in cm
* talk_time: Longest time a single battery charge will last
* three_g: Has 3G or not
* touch_screen: Touch screen or not
* wifi: Has wifi or not


In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Charts
import matplotlib.pyplot as plt
import seaborn as sns

import os

# Import dataset and initial dataset review

In [None]:
# Import Datasets

# Train DF
df_train = pd.read_csv('../input/mobile-price-classification/train.csv')

# Test DF
df_test = pd.read_csv('../input/mobile-price-classification/test.csv')

In [None]:
df_train.head()

In [None]:
df_train.info()

df_test.info()

The train dataset has 2000 records and the test dataset has 1000 records.

# Missing Values Corrections

In [None]:
# Check for missing values in both datasets (train on the left, test on the right)
plt.figure(figsize=(18,18))
plt.subplot(221)
sns.heatmap(data=df_train.isnull())
plt.subplot(222)
sns.heatmap(data=df_test.isnull())

Both datasets are clean: neither contains any missing values.
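A heatmap gives a visual impression, but a numeric count is a quick cross-check. The sketch below uses a small made-up frame (`demo` is a hypothetical stand-in, not the real data) so that it runs on its own; on the notebook's data the same idea would be `df_train.isnull().sum()`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df_train; the values are made up for illustration.
demo = pd.DataFrame({
    "battery_power": [842, 1021, np.nan, 615],
    "ram": [2549, 2631, 2603, np.nan],
    "four_g": [0, 1, 1, 0],
})

# Number of missing values in each column
missing_per_column = demo.isnull().sum()
print(missing_per_column)

# Total number of missing cells; 0 would mean the frame is complete
total_missing = int(missing_per_column.sum())
print("Total missing:", total_missing)
```

A total of 0 on both real frames would confirm what the heatmaps suggest.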

# Data Visualization

In [None]:
# Let's see how many devices fall into each price range in the train dataset

plt.figure(figsize=(10,4))
sns.set_style('whitegrid')
sns.set(font_scale=1)

sns.countplot(x='price_range', data=df_train)
plt.xlabel('Price Ranges')
plt.ylabel('Number of devices')
plt.title('Number of devices per price range')
plt.show()

In [None]:
# Count 4G devices
df_train.four_g.value_counts()

1043 devices have 4G and 957 devices do not.

In [None]:
# Percentage of mobiles with 4G feature

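# Derive the labels from the value_counts index so they always match the counts
counts = df_train['four_g'].value_counts()
labels = ['4G Supported' if flag == 1 else 'Not supported' for flag in counts.index]
plt.pie(counts.values, labels=labels, autopct='%1.1f%%', shadow=True, startangle=90)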

In [None]:
# Visualize price range against internal memory and RAM, two of the main parameters people look at when buying a phone

g = sns.FacetGrid(df_train, col="price_range", hue="price_range")
g.map(sns.scatterplot, "int_memory", "ram")
g.set_axis_labels("Internal Memory (GB)", "RAM (MB)")

Here we can see that price range 0 has the lowest configuration, and the configuration improves as the price range increases.
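The trend seen in the scatter plots can also be summarized numerically by averaging each feature per price range. This is a minimal sketch on made-up numbers (`demo` is hypothetical); on the notebook's data the equivalent call would be `df_train.groupby("price_range")[["ram", "int_memory"]].mean()`.

```python
import pandas as pd

# Hypothetical toy data; the values are invented for illustration only.
demo = pd.DataFrame({
    "price_range": [0, 0, 1, 1, 2, 2, 3, 3],
    "ram": [500, 700, 1500, 1700, 2500, 2700, 3500, 3700],
    "int_memory": [8, 16, 16, 32, 32, 64, 32, 64],
})

# Mean RAM and internal memory for each price range
summary = demo.groupby("price_range")[["ram", "int_memory"]].mean()
print(summary)
```

If a column's means rise steadily from range 0 to range 3, that feature is a good candidate for separating the price ranges.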

In [None]:
# Visualize pixel resolution: width vs. height

plt.figure(figsize=(10,4))
sns.set_style('whitegrid')
sns.set(font_scale=1)

sns.scatterplot(data=df_train, x='px_width', y='px_height')
plt.xlabel('Pixel width')
plt.ylabel('Pixel height')
plt.title('Pixel Resolution')
plt.show()

The pixel resolution data is consistent: as height increases, width increases too, which is the expected pattern.

In [None]:
# Visualize screen size (height x width) vs. price range

sns.set_style('whitegrid')
sns.set(font_scale=1)
# jointplot creates its own figure, so a separate plt.figure() call is not needed
g = sns.jointplot(data=df_train, x="sc_w", y="sc_h", hue="price_range")
plt.show()

In [None]:
# Mobile weight vs. price range

sns.jointplot(x='mobile_wt', y='price_range',data=df_train, kind='kde')

# Data Splitting
The test dataset contains only feature columns and no labels, so we split the train dataset and evaluate the models on a held-out part of it.

In [None]:
train_x = df_train.drop('price_range', axis=1)
train_y = df_train.price_range

In [None]:
print("Feature shape: ", train_x.shape, " | Labels", train_y.shape)

In [None]:
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(train_x, train_y, test_size=0.35, random_state=5)
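One optional refinement, not used in this notebook, is a stratified split, which keeps the share of each price range the same in both parts. The sketch below runs on synthetic data (`features` and `labels` are hypothetical stand-ins for `train_x` and `train_y`) and reuses the same `test_size` and `random_state` as above.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 200 rows, four balanced classes
rng = np.random.default_rng(5)
features = pd.DataFrame({
    "ram": rng.integers(256, 4000, size=200),
    "battery_power": rng.integers(500, 2000, size=200),
})
labels = pd.Series(np.repeat([0, 1, 2, 3], 50), name="price_range")

# stratify=labels keeps the class proportions equal in both parts
xs_train, xs_test, ys_train, ys_test = train_test_split(
    features, labels, test_size=0.35, random_state=5, stratify=labels
)

print(ys_train.value_counts().sort_index())
print(ys_test.value_counts().sort_index())
```

With balanced classes a plain random split is usually close to stratified already, so this matters most when some classes are rare.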

# Model Evaluation

In [None]:
# Importing required libraries
from sklearn.model_selection import GridSearchCV

from sklearn.linear_model import LinearRegression, LogisticRegression, Lasso
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

from sklearn.metrics import make_scorer, accuracy_score, classification_report, confusion_matrix

In [None]:
# Define the candidate models in a dictionary
models = {
    "Linear Regression": LinearRegression(fit_intercept=True),
    "KNN": KNeighborsClassifier(),
    "Decisiontree": DecisionTreeClassifier(),
    'RandomForest': RandomForestClassifier(max_features='sqrt', random_state=5),
    'LogisticRegression': LogisticRegression(),
    'Lasso': Lasso(alpha=0.1)
}

In [None]:
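# Loop through the models, fit each one on the training split and print its test score.
# Note: .score() returns accuracy for classifiers but R^2 for LinearRegression and Lasso,
# so the two kinds of score are not directly comparable.
for name, m in models.items():
    m.fit(x_train, y_train)
    print(name)
    print("Score: ", m.score(x_test, y_test))
    print("")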

From the test above, we found that KNeighborsClassifier predicts the price range accurately, so we use it for the final predictions.
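A single train/test split gives one score per model, and that score depends on which rows land in the test part. Cross-validation averages over several splits and is a common way to make the comparison more stable. The sketch below is an illustration on synthetic data from `make_classification` (`X_demo`, `y_demo`, and the `classifiers` dictionary are assumptions, not the notebook's data), restricted to classifiers so that every score is an accuracy.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic 4-class problem standing in for the mobile data
X_demo, y_demo = make_classification(
    n_samples=400, n_features=10, n_informative=6,
    n_classes=4, random_state=5,
)

classifiers = {
    "KNN": KNeighborsClassifier(),
    "DecisionTree": DecisionTreeClassifier(random_state=5),
}

# 5-fold cross-validated accuracy for each classifier
cv_results = {}
for name, clf in classifiers.items():
    scores = cross_val_score(clf, X_demo, y_demo, cv=5, scoring="accuracy")
    cv_results[name] = scores.mean()
    print(f"{name}: mean accuracy {scores.mean():.3f} (+/- {scores.std():.3f})")
```

On the notebook's data, `train_x` and `train_y` would take the place of `X_demo` and `y_demo`.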

# Predicting Test Data

In [None]:
# Preparing test dataset
data_test = df_test.drop('id',axis=1)

In [None]:
data_test.info()
data_test.shape

In [None]:
x = df_train.drop('price_range', axis=1)
y = df_train['price_range']

In [None]:
# Predict price range using KNN model
knn = KNeighborsClassifier()
knn.fit(x, y)
pred_price = knn.predict(data_test)
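The prediction step relies on the test features having the same columns, in the same order, as the features used for fitting. Here the train features are obtained by dropping `price_range` and the test features by dropping `id`. A small check like the following can guard against a mismatch; the two miniature frames are hypothetical and only mirror the layout.

```python
import pandas as pd

# Hypothetical miniature frames mirroring the notebook's layout
demo_train = pd.DataFrame({
    "battery_power": [842, 1021],
    "ram": [2549, 2631],
    "price_range": [1, 2],
})
demo_test = pd.DataFrame({
    "id": [1, 2],
    "ram": [3476, 3895],
    "battery_power": [1043, 841],
})

feature_cols = demo_train.drop("price_range", axis=1).columns
test_features = demo_test.drop("id", axis=1)

# Both frames must contain the same set of feature columns
assert set(feature_cols) == set(test_features.columns)

# Reorder the test columns to match the training order
test_features = test_features[feature_cols]
print(list(test_features.columns))
```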

In [None]:
# Add the predicted price range to the test dataset
data_test['price_range'] = pred_price

In [None]:
data_test.head()


**To be continued**