# Mobile Phone Price Prediction

#### About Dataset :

- In this project, we predict the price range of a mobile phone from its specifications, such as battery power, 3G support, Wi-Fi, Bluetooth, and RAM. To know more about the data, see https://www.kaggle.com/iabhishekofficial/mobile-price-classification

##### In this data:
- id: ID
- battery_power: Total energy the battery can store at one time, measured in mAh
- blue: Has Bluetooth or not
- clock_speed: Speed at which the microprocessor executes instructions
- dual_sim: Has dual SIM support or not
- fc: Front camera megapixels
- four_g: Has 4G or not
- int_memory: Internal memory in gigabytes
- m_dep: Mobile depth in cm
- mobile_wt: Weight of the mobile phone
- n_cores: Number of processor cores
- pc: Primary camera megapixels
- px_height: Pixel resolution height
- px_width: Pixel resolution width
- ram: Random access memory in megabytes
- sc_h: Screen height of the mobile in cm
- sc_w: Screen width of the mobile in cm
- talk_time: Longest time that a single battery charge will last during calls
- three_g: Has 3G or not
- touch_screen: Has touch screen or not
- wifi: Has Wi-Fi or not
- __price_range__: The target variable, with values __0 (low cost)__, __1 (medium cost)__, __2 (high cost)__ and __3 (very high cost)__

### 1. Importing libraries & loading the dataset

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
dataset = pd.read_csv('../input/mobile-price-classification/train.csv')
dataset.head()

- The target variable in this dataset has 4 classes.
- So this is a multiclass classification problem.

- 0 - Low
- 1 - Medium
- 2 - High
- 3 - Very High
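
As a small illustration, the class codes can be translated into readable names with a lookup table. The name `PRICE_LABELS` and the sample codes below are assumptions made for this sketch; they are not part of the dataset.

```python
# Hypothetical lookup table for the four price_range codes listed above.
PRICE_LABELS = {0: "low", 1: "medium", 2: "high", 3: "very high"}

# Made-up example codes, standing in for a few rows of dataset['price_range'].
sample_codes = [1, 2, 2, 0, 3]
sample_names = [PRICE_LABELS[code] for code in sample_codes]
print(sample_names)
```

With the real data, the same dictionary could be passed to `dataset['price_range'].map(...)` to get a readable column for plots.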

### 2. EDA

#### Checking the missing values.

In [None]:
dataset.isnull().sum()

- There are no null values, so now we can check the data types.


#### 2.1 Checking the datatype

In [None]:
dataset.dtypes

- There seem to be no categorical features stored as text. All columns are numeric, so we can proceed.

In [None]:
dataset.shape

- We have 2000 samples and 21 columns: 20 features plus the target.
- The last column is the target, which means we have a labelled dataset.

#### 2.2 Descriptive Analysis

In [None]:
dataset.describe()

### 2.3 Data Visualization & Analysis

#### 2.3.1 Checking whether the dataset is balanced or imbalanced

In [None]:
# Count rows per class, sorted by class code so the counts line up with the labels
value_counts = dataset['price_range'].value_counts().sort_index()
label = ['low', 'medium', 'high', 'very high']  # codes 0, 1, 2, 3
colors = ['yellow', 'turquoise', 'lightblue', 'pink']
fig1, axarr = plt.subplots()

# '%%' prints a literal percent sign after each share
axarr.pie(value_counts.values, autopct='%0.01f%%', explode=[0.1, 0.1, 0.1, 0.1], shadow=True, labels=label, colors=colors)

axarr.set_title('Balanced or imbalanced?')
plt.show()


- The pie chart above shows that all classes have the same number of samples:
- 0 - 500 (low price)
- 1 - 500 (medium price)
- 2 - 500 (high price)
- 3 - 500 (very high price)
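
The same check can be done numerically instead of reading it off a chart. The sketch below uses a synthetic array with 500 rows per class as a stand-in for `dataset['price_range']`; the ratio of smallest to largest class is just one simple, assumed way to summarize balance.

```python
import numpy as np

# Synthetic stand-in for dataset['price_range']: 500 rows per class.
price_range = np.repeat([0, 1, 2, 3], 500)

classes, counts = np.unique(price_range, return_counts=True)
shares = counts / counts.sum()

for cls, n, share in zip(classes, counts, shares):
    print(f"class {cls}: {n} rows ({share:.0%})")

# A simple balance measure: ratio of the smallest to the largest class.
balance_ratio = counts.min() / counts.max()
print("balance ratio:", balance_ratio)
```

A ratio of 1.0 means perfectly balanced classes, in which case plain accuracy is a reasonable metric.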

#### 2.3.2 Effect of RAM on price

In [None]:
sns.jointplot(x = 'ram', y = 'price_range', data = dataset, kind = 'kde', color = 'green')

#### 2.3.4 Internal memory vs price

In [None]:
sns.pointplot(y = 'int_memory', x = 'price_range', data = dataset)

#### 2.3.5 Battery Power vs Price range

In [None]:
sns.boxplot(x = 'price_range', y = 'battery_power',data = dataset)

#### 2.3.6 4g supported or not

In [None]:
# Reindex so the counts follow the label order: 1 = supported, 0 = not supported
values = dataset['four_g'].value_counts().reindex([1, 0])
label = ['4G-supported', 'Not supported']
color = ['lightgreen', 'lightpink']
fig, ax1 = plt.subplots()
ax1.pie(values, autopct='%0.01f%%', labels=label, startangle=90, colors=color, shadow=True)
ax1.set_title('4G supported or not supported?')
plt.show()

#### 2.3.7 3G support or not

In [None]:
# Reindex so the counts follow the label order: 1 = supported, 0 = not supported
values = dataset['three_g'].value_counts().reindex([1, 0])
label = ['3G supported', 'Not supported']
fig, ax1 = plt.subplots()
ax1.pie(values, startangle=70, labels=label, autopct='%0.01f%%', explode=[0, 0.1], shadow=True)
ax1.set_title('3G supported or not supported?')
plt.show()

#### 2.3.8 No. of Phones vs Camera megapixels of front and primary camera

In [None]:
plt.figure(figsize=(10,6))
dataset['fc'].hist(alpha=0.5,color='blue',label='Front camera')
dataset['pc'].hist(alpha=0.5,color='red',label='Primary camera')
plt.legend()
plt.xlabel('MegaPixels')
plt.ylabel('Number of phones')
plt.show()


####  2.3.9 Mobile Weight vs price

In [None]:
sns.jointplot(x = 'mobile_wt',y = 'price_range', data = dataset,kind = 'kde', color = 'green')
plt.show()

#### 2.3.10 Talk time vs price_range

In [None]:
# pointplot has no 'kind' argument, so it is dropped here
sns.pointplot(y = 'talk_time', x = 'price_range', data = dataset, color = 'gold')
plt.show()

#### 2.3.11 Finding the relationships between the features

In [None]:
#sns.pairplot(data = dataset, hue = 'price_range')

#### 2.3.12 Finding the correlation between the features

In [None]:
dataset.corr()

In [None]:
plt.figure(figsize = (20,20))
sns.heatmap(dataset.corr(), annot = True, cmap = 'RdYlGn')
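
A 21 x 21 heatmap is hard to read, so it can help to look only at the target column of the correlation matrix and rank features by absolute correlation. The sketch below does this on a synthetic frame (`toy`, with made-up `ram` and `noise` columns) and is only meant to show the pattern. Pearson correlation on an ordinal target is a rough guide, not a feature-importance measure.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 400

# Synthetic stand-in for the real data: 'ram' drives the target, 'noise' does not.
ram = rng.uniform(256, 4000, size=n)
noise = rng.normal(size=n)
price_range = np.digitize(ram, bins=[1200, 2100, 3000])  # codes 0..3

toy = pd.DataFrame({"ram": ram, "noise": noise, "price_range": price_range})

# Correlation of each feature with the target, ranked by absolute value.
target_corr = (
    toy.corr()["price_range"]
    .drop("price_range")
    .abs()
    .sort_values(ascending=False)
)
print(target_corr)
```

On the real data, replacing `toy` with `dataset` would give the same kind of ranking for all 20 features.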

## 3. Data preparing

##### 3.1 Dependent and independent variables

In [None]:
X  = dataset.iloc[:,:-1]
y = dataset.iloc[:,-1]

In [None]:
X

In [None]:
y

#### 3.2 Splitting data into train and test set

In [None]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size = 0.20, random_state = 0)
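
One optional variation is a stratified split, which keeps the class proportions identical in the training and test parts. The sketch below uses a small synthetic dataset (`X_toy`, `y_toy` are assumptions for illustration) and the `stratify` argument of `train_test_split`.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 40 rows, 4 balanced classes, 3 made-up features.
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(40, 3))
y_toy = np.repeat([0, 1, 2, 3], 10)

# stratify=y_toy keeps the class proportions the same in both parts.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.20, random_state=0, stratify=y_toy
)

print("train class counts:", np.bincount(y_tr))
print("test class counts: ", np.bincount(y_te))
```

With a balanced dataset like this one, a plain random split is usually close to stratified already, so this mainly guards against an unlucky draw.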

#### 3.3 Feature Scaling

In [None]:
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
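
One caveat: when cross-validation is later run on the already-scaled `X_train`, each validation fold has been scaled with statistics that include its own rows. A possible alternative, sketched here on synthetic data from `make_classification` (an assumption, not the notebook's data), is to put the scaler and the model in one pipeline so the scaler is re-fitted inside every fold.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic 4-class data standing in for the mobile-price features.
X_toy, y_toy = make_classification(
    n_samples=400, n_features=10, n_informative=6,
    n_classes=4, random_state=0,
)

# The scaler is re-fitted on the training part of every CV fold.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scores = cross_val_score(model, X_toy, y_toy, cv=5)
print("mean CV accuracy:", scores.mean())
```

For standardization the difference is usually small, but the pipeline form also makes it harder to forget the scaling step when predicting on new data.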

In [None]:
X_train

In [None]:
X_test

## 4. Modeling

### 4.1 Logistic Regression

In [None]:
from sklearn.linear_model import LogisticRegression
lr = LogisticRegression(multi_class='multinomial',solver = 'sag') # (sag = Stochastic Average Gradient)
lr.fit(X_train, y_train)

# Predict the test set
y_pred = lr.predict(X_test)

# Evaluate the performance
from sklearn.metrics import confusion_matrix, accuracy_score, classification_report
cm = confusion_matrix(y_test, y_pred)
print(cm)
print(accuracy_score(y_test, y_pred))

In [None]:
from sklearn.model_selection import cross_val_score
cvs = cross_val_score(estimator = lr,X = X_train, y = y_train)
print('accuracy of validation set :', cvs.mean())
print('accuracy of the training set :', lr.score(X_train,y_train))
print('accuracy of the testset :', lr.score(X_test, y_test))
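
To make the link between the confusion matrix and the accuracy score explicit: in scikit-learn's convention the rows are true classes and the columns are predicted classes, so accuracy is the diagonal sum divided by the total. The matrix `cm_toy` below is made up for illustration and is not a result from this notebook.

```python
import numpy as np

# Made-up 4x4 confusion matrix: rows are true classes, columns are predictions.
cm_toy = np.array([
    [9, 1, 0, 0],
    [1, 8, 1, 0],
    [0, 1, 8, 1],
    [0, 0, 1, 9],
])

# Accuracy: correctly classified rows (the diagonal) over all rows.
accuracy = np.trace(cm_toy) / cm_toy.sum()

# Per-class recall: diagonal entry divided by the row total.
recall = np.diag(cm_toy) / cm_toy.sum(axis=1)

print("accuracy:", accuracy)
print("recall per class:", recall)
```

Per-class recall shows which price ranges are confused with their neighbours, which a single accuracy number hides.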

### 4.2 DecisionTreeClassifier

In [None]:
from sklearn.tree import DecisionTreeClassifier
dt = DecisionTreeClassifier(criterion = 'entropy')
dt.fit(X_train,y_train)

# Predict the test set
y_pred = dt.predict(X_test)

# Evaluate the performance
from sklearn.metrics import confusion_matrix, accuracy_score
cm = confusion_matrix(y_test, y_pred)
print(cm)
print(accuracy_score(y_test, y_pred))

In [None]:
from sklearn.model_selection import cross_val_score
cvs = cross_val_score(estimator = dt,X = X_train, y = y_train)
print('accuracy of validation set :', cvs.mean())
print('accuracy of the training set :', dt.score(X_train,y_train))
print('accuracy of the testset :', dt.score(X_test, y_test))

###  4.3 RandomForest Classifier

In [None]:
from sklearn.ensemble import RandomForestClassifier
rf = RandomForestClassifier(n_estimators = 100, criterion = 'entropy', random_state = 0)
rf.fit(X_train, y_train)

# Predict the test set
y_pred = rf.predict(X_test)

# Evaluate the performance
from sklearn.metrics import confusion_matrix, accuracy_score
cm = confusion_matrix(y_test, y_pred)
print(cm)
print(accuracy_score(y_test, y_pred))

In [None]:
from sklearn.model_selection import cross_val_score
cvs = cross_val_score(estimator = rf,X = X_train, y = y_train)
print('accuracy of validation set :', cvs.mean())
print('accuracy of the training set :', rf.score(X_train,y_train))
print('accuracy of the testset :', rf.score(X_test, y_test))

### 4.4  Gaussian  Naive Bayes Classifier

In [None]:
from sklearn.naive_bayes import GaussianNB
nb = GaussianNB()
nb.fit(X_train, y_train)

# Predict the test set
y_pred = nb.predict(X_test)

# Evaluate the performance
from sklearn.metrics import confusion_matrix, accuracy_score
cm = confusion_matrix(y_test, y_pred)
print(cm)
print(accuracy_score(y_test, y_pred))

In [None]:
from sklearn.model_selection import cross_val_score
cvs = cross_val_score(estimator = nb,X = X_train, y = y_train)
print('accuracy of validation set :', cvs.mean())
print('accuracy of the training set :', nb.score(X_train,y_train))
print('accuracy of the testset :', nb.score(X_test, y_test))

### 4.5 SVM

##### Using grid search to find the best parameters

In [None]:
parameters ={
'C' : [1,0.1,0.25,0.5,2,0.75],
'kernel' : ["linear","rbf"],
'gamma' : ["auto",0.01,0.001,0.0001,1],
'decision_function_shape' : ["ovo" ,"ovr"]}

In [None]:
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

grid_search = GridSearchCV(estimator = SVC(),
                           param_grid = parameters,
                           scoring = 'accuracy',
                           cv = 10,
                           )
grid_search = grid_search.fit(X_train, y_train)
best_accuracy = grid_search.best_score_
best_parameters = grid_search.best_params_
print("Best Accuracy: {:.2f} %".format(best_accuracy*100))
print("Best Parameters:", best_parameters)
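
It is worth counting how much work this search does. The sketch below repeats the parameter grid defined earlier and counts the candidates. It also shows a hypothetical split grid (`split_grid` is an assumed name) that lists `gamma` only for the `rbf` kernel, since `gamma` has no effect on a linear kernel. `GridSearchCV` accepts such a list of dictionaries as `param_grid`.

```python
from itertools import product

# Same grid as defined earlier in this section.
parameters = {
    "C": [1, 0.1, 0.25, 0.5, 2, 0.75],
    "kernel": ["linear", "rbf"],
    "gamma": ["auto", 0.01, 0.001, 0.0001, 1],
    "decision_function_shape": ["ovo", "ovr"],
}

n_candidates = len(list(product(*parameters.values())))
n_fits = n_candidates * 10  # cv = 10
print(n_candidates, "candidates,", n_fits, "fits")

# Sketch of a split grid: gamma is only listed for the rbf kernel.
split_grid = [
    {"kernel": ["linear"], "C": parameters["C"],
     "decision_function_shape": parameters["decision_function_shape"]},
    {"kernel": ["rbf"], "C": parameters["C"], "gamma": parameters["gamma"],
     "decision_function_shape": parameters["decision_function_shape"]},
]
n_split = sum(len(list(product(*g.values()))) for g in split_grid)
print(n_split, "candidates after the split")
```

The split grid covers the same distinct models with fewer fits, which matters more as the dataset or the grid grows.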

#### Applying SVM with best Parameters

In [None]:
from sklearn.svm import SVC

svc=SVC(C=2,gamma="auto",decision_function_shape="ovo",kernel="linear",random_state=0)
svc.fit(X_train, y_train)


# Predict the test set
y_pred = svc.predict(X_test)

# Evaluate the performance
from sklearn.metrics import confusion_matrix, accuracy_score
cm = confusion_matrix(y_test, y_pred)
print(cm)
print(accuracy_score(y_test, y_pred))

In [None]:
from sklearn.model_selection import cross_val_score
cvs = cross_val_score(estimator = svc,X = X_train, y = y_train)
print('accuracy of validation set :', cvs.mean())
print('accuracy of the training set :', svc.score(X_train,y_train))
print('accuracy of the testset :', svc.score(X_test, y_test))

## 5. Conclusion:

In [None]:
plt.figure(figsize = (12,6))
label = ['Logistic Regression', 'Decision Tree', 'Random Forest', 'GaussianNB', 'Support Vector Machine']
acc_score = [0.95, 0.85, 0.87, 0.83, 0.95]

plt.bar(label,acc_score, color=['lightblue', 'pink', 'lightgrey','gold', 'cyan'])
plt.title('Which model is the most accurate?')
plt.xlabel('')
plt.ylabel('Accuracy Scores')
plt.show()

- After training five different models on our dataset, we conclude that __SVM__ and __Logistic Regression__ are the best models for it, sharing the highest accuracy score of 0.95.
- Here I select __SVM__ to predict on the test dataset, but Logistic Regression could be used as well.
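
The comparison can also be made in code rather than by eye. This minimal sketch reuses the accuracy scores hard-coded in the bar chart and reports every model that shares the top score; the dictionary name `acc_scores` is an assumption for illustration.

```python
# Accuracy scores as hard-coded in the bar chart above.
acc_scores = {
    "Logistic Regression": 0.95,
    "Decision Tree": 0.85,
    "Random Forest": 0.87,
    "GaussianNB": 0.83,
    "Support Vector Machine": 0.95,
}

best_score = max(acc_scores.values())
best_models = [name for name, score in acc_scores.items() if score == best_score]
print(best_score, best_models)
```

Filling such a dictionary from each model's `score(X_test, y_test)` call would keep the chart in sync with the actual results.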

## 6. Applying the SVM to Test dataset

#### 6.1 Loading test data

In [None]:
test_data = pd.read_csv('../input/mobile-price-classification/test.csv')
test_data.head()

- Note: 'train.csv' has no id column, so we can drop this column from the test data to give both inputs the same dimensions.

#### 6.2 Dropping the 'id' column

In [None]:
test_df  = test_data.drop('id', axis = 1)

In [None]:
test_df

- The id column has been removed from test_data.

##### 6.3 Applying Feature scaling to test set

In [None]:

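# Reuse the scaler fitted on the training data; fitting a new one on the
# test data would scale it with different means and standard deviations
test_df1 = sc.transform(test_df)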
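
New data should be scaled with the statistics learned from the training data, because that is the scale the model was trained on. The sketch below shows the effect with plain NumPy on made-up values for a single feature. `StandardScaler` stores the same mean and population standard deviation when it is fitted.

```python
import numpy as np

# Made-up values for one feature (e.g. RAM in MB) in training and new data.
train = np.array([500.0, 1000.0, 1500.0, 2000.0])
new = np.array([2000.0, 2500.0, 3000.0, 3500.0])

# Statistics learned from the training data only (what StandardScaler.fit stores).
mean, std = train.mean(), train.std()

# Consistent: new data scaled with the training statistics.
new_consistent = (new - mean) / std

# Inconsistent: new data scaled with its own statistics.
new_refit = (new - new.mean()) / new.std()

print("train-scaled value of 2000:", (train[-1] - mean) / std)
print("consistent scaling of 2000:", new_consistent[0])
print("re-fitted scaling of 2000: ", new_refit[0])
```

In this example the same raw value of 2000 is above average under the training statistics but below average under the re-fitted ones, so a model would see it as a different phone.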


#### 6.4 Applying SVM to test_df

In [None]:
predicted_price_range = svc.predict(test_df1) 

In [None]:
predicted_price_range

- Above, the SVM model has predicted a price range for each row of the __test_df__ dataset. Now we are going to add __predicted_price_range__ to the __test_df__ dataset.

#### 6.5 Adding the predicted price to test_df

In [None]:
test_df['price_range'] = predicted_price_range

In [None]:
test_df

- __We have achieved our goal and predicted the price range of each mobile phone in the new dataset__.