# Mobile Price Prediction
Dataset Courtesy of Abhishek Sharma (https://www.kaggle.com/iabhishekofficial/).

Exercise:
I was recently gifted a smartphone, and I want to know what it is likely worth.

In this notebook, I look for the best predictive model for price range, based on the mobile phone features in this structured dataset.

In this exercise, we explore the algorithms:
* Logistic Regression
* KNN
* Decision Tree
* Random Forest

A Linear Regression model is included as a bonus.

We will then use the highest-scoring model to predict the price range of my own handset. Let's hope it lands in a good one.


In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

In [None]:
# Read the training CSV into the train_df DataFrame
train_df = pd.read_csv('../input/mobile-price-classification/train.csv')

In [None]:
# Preview train_df's head
train_df.head()

In [None]:
# Preview the first 50 rows
train_df[:50]

In [None]:
# Explore dataset with .info()
train_df.info()

In [None]:
# Use describe function to explore dataset
train_df.describe()

# Model Creation

In [None]:
# Create X and y datasets
X = train_df.drop('price_range', axis=1)
y = train_df['price_range']

In [None]:
# Split dataset
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Creating and Training Logistic Regression Model

In [None]:
# Import and instantiate Logistic Regression model
from sklearn.linear_model import LogisticRegression
logmodel = LogisticRegression(max_iter=1000)

In [None]:
# Fit model
logmodel.fit(X_train, y_train)

In [None]:
# Check score of Logistic Regression model (max_iter=1000)
logmodel.score(X_test, y_test)

In [None]:
# Instantiate a second Logistic Regression model with more iterations (max_iter=10000)
# LogisticRegression is already imported above
logmodel_10k = LogisticRegression(max_iter=10000)

In [None]:
# Fit model
logmodel_10k.fit(X_train, y_train)

In [None]:
# Check score of Logistic Regression model (max_iter=10000)
logmodel_10k.score(X_test, y_test)

# Creating & Training KNN Model

In [None]:
# Import, instantiate and fit model
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors=10)
knn.fit(X_train,y_train)

In [None]:
# Check score of KNN model
knn.score(X_test,y_test)

# Elbow Method For optimum value of K

In [None]:
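# Record the test error rate for each K from 1 to 19.
# A separate variable (knn_k) is used so the K=10 model stored in `knn` is not overwritten.
error_rate = []
for k in range(1, 20):
    knn_k = KNeighborsClassifier(n_neighbors=k)
    knn_k.fit(X_train, y_train)
    pred_k = knn_k.predict(X_test)
    error_rate.append(np.mean(pred_k != y_test))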

In [None]:
# Reveal plot of Error rate vs K Value (Elbow Method)
plt.figure(figsize=(10,6))
plt.plot(range(1,20),error_rate,color='red', linestyle='dashed', marker='o',
         markerfacecolor='yellow', markersize=5)
plt.title('Error Rate vs. K Value')
plt.xlabel('K')
plt.ylabel('Error Rate')

From the plot of error rate vs. K value above, we can conclude visually that setting the n_neighbors parameter to 10 is close enough to optimal.
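Reading the elbow off the plot by eye can also be cross-checked in code. The sketch below picks the smallest K whose error rate is within a small tolerance of the lowest error rate. The helper name `smallest_k_within_tolerance`, the tolerance and the error rates are all made up for illustration; they are not values from this notebook.

```python
def smallest_k_within_tolerance(ks, error_rates, tolerance=0.01):
    """Return the smallest K whose error rate is within `tolerance` of the minimum."""
    best = min(error_rates)
    for k, err in zip(ks, error_rates):
        if err <= best + tolerance:
            return k

# Made-up error rates for K = 1..6 (NOT results from this notebook)
example_ks = list(range(1, 7))
example_errors = [0.12, 0.10, 0.08, 0.071, 0.07, 0.072]

chosen_k = smallest_k_within_tolerance(example_ks, example_errors, tolerance=0.005)
print(chosen_k)
```

In the notebook, the call would be `smallest_k_within_tolerance(range(1, 20), error_rate)`. Because the error rates are measured on X_test, the chosen K is tuned to that particular split.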

# Creating & Training Decision Tree Model

In [None]:
# Import, instantiate and fit Decision Tree Model
from sklearn.tree import DecisionTreeClassifier
dtree = DecisionTreeClassifier()
dtree.fit(X_train,y_train)

In [None]:
# Check score of Decision Tree model
dtree.score(X_test,y_test)

# Visualizing the Decision Tree model

In [None]:
# Feature names, used to label splits when visualizing the tree
feature_names = ['battery_power', 'blue', 'clock_speed', 'dual_sim', 'fc', 'four_g',
                 'int_memory', 'm_dep', 'mobile_wt', 'n_cores', 'pc', 'px_height',
                 'px_width', 'ram', 'sc_h', 'sc_w', 'talk_time', 'three_g',
                 'touch_screen', 'wifi']
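The cell above only defines the feature names, so here is a minimal sketch of one way to inspect a fitted tree as text, using scikit-learn's `export_text`. The tiny dataset and the two-column feature list are made up for illustration and are not taken from train.csv.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Made-up toy data (NOT from train.csv): columns are [ram, wifi]
toy_X = [[500, 1], [1000, 0], [3000, 1], [3500, 0]]
toy_y = [0, 0, 3, 3]  # hypothetical price_range labels
toy_feature_names = ['ram', 'wifi']

toy_tree = DecisionTreeClassifier(random_state=0)
toy_tree.fit(toy_X, toy_y)

# export_text renders the learned split rules as indented plain text
tree_text = export_text(toy_tree, feature_names=toy_feature_names)
print(tree_text)
```

In this notebook, the equivalent call would be `export_text(dtree, feature_names=feature_names)`. For a graphical view, `sklearn.tree.plot_tree` draws the same structure as a matplotlib figure.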

# Creating & Training Random Forest Model

In [None]:
# Import, instantiate and fit model
from sklearn.ensemble import RandomForestClassifier
rfc = RandomForestClassifier(n_estimators=200)
rfc.fit(X_train, y_train)

In [None]:
# Check score of Random Forest Classifier model
rfc.score(X_test, y_test)

# Comparing the Models

From the scores above, we can rank the models. The KNN model performed best, followed by the Random Forest Classifier, then the Decision Tree, and finally the Logistic Regression model.
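One way to produce such a ranking without comparing cell outputs by eye is to collect each model's score in a dictionary and sort it. This is a small sketch; the helper `rank_by_score` and the placeholder scores are hypothetical and are not results from this notebook.

```python
def rank_by_score(scores):
    """Return (name, score) pairs sorted from highest to lowest score."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Placeholder values only (NOT results from this notebook).
# In the notebook, each value would come from model.score(X_test, y_test).
placeholder_scores = {"model_a": 0.60, "model_b": 0.90, "model_c": 0.75}

ranking = rank_by_score(placeholder_scores)
for name, score in ranking:
    print("{}: {:.2f}".format(name, score))
```

In the notebook, the dictionary could be built as `{'KNN': knn.score(X_test, y_test), 'Random Forest': rfc.score(X_test, y_test), ...}` before calling `rank_by_score`.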

Let's assess the classification report and confusion matrix of the KNN and Random Forest Classifier models.

In [None]:
# Import Classification Report and Confusion Matrix functions
from sklearn.metrics import classification_report, confusion_matrix

Let's assess KNN's performance first

In [None]:
# Generate predictions from the X_test dataset for the KNN and Random Forest Classifier models
knn_pred = knn.predict(X_test)
rfc_pred = rfc.predict(X_test)

In [None]:
# Reveal Classification Report for the KNN model
print(classification_report(y_test,knn_pred))

In [None]:
# Reveal Confusion Matrix for KNN model
print(confusion_matrix(y_test, knn_pred))

In [None]:
# Reveal Classification Report for the Random Forest Classifier model
print(classification_report(y_test,rfc_pred))

In [None]:
# Reveal Confusion Matrix for Random Forest Classifier model
print(confusion_matrix(y_test, rfc_pred))

In [None]:
# Importing own mobile sample to test prediction
own_mobile_test = pd.read_csv('/kaggle/input/mobilesample/own_mobile2.csv')

In [None]:
# Show data sample
own_mobile_test

In [None]:
# Drop the target column to get the feature row for my own handset
own_mobile_X_test = own_mobile_test.drop('price_range', axis=1)

For context, my own handset is a premium-priced phone with high-end specifications, but in a form factor usually associated with a lower price range.

In [None]:
# Create predictions from the 4 models and compare

# KNN Model prediction
knn_own_pred = knn.predict(own_mobile_X_test)
knn_own_pred

In [None]:
# Logistic Regression Model prediction
logmodel_own_pred = logmodel.predict(own_mobile_X_test)
logmodel_own_pred

In [None]:
# Decision Tree Model prediction
dtree_own_pred = dtree.predict(own_mobile_X_test)
dtree_own_pred

In [None]:
# Random Forest Classifier Model prediction
rfc_own_pred = rfc.predict(own_mobile_X_test)
rfc_own_pred

It seems all 4 models predicted the highest price range, which matches real-life experience.

# Creating & Training Linear Regression Model

In [None]:
# Import, instantiate and fit Linear Regression Model
from sklearn.linear_model import LinearRegression
lr = LinearRegression()
lr.fit(X_train, y_train)

In [None]:
y_pred_linear = lr.predict(X_test)

In [None]:
# Compute RMSE as the square root of the mean squared error
from sklearn.metrics import mean_squared_error
rmse = np.sqrt(mean_squared_error(y_test, y_pred_linear))
print("Root Mean Squared Error: {}".format(rmse))

In [None]:
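# Check the R^2 score of the Linear Regression model
# (this is not an accuracy, so it is not directly comparable to the classifiers' scores)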
lr.score(X_test, y_test)

In [None]:
# Let's check the prediction with the Linear Regression Model on my own mobile sample
lr_own_pred = lr.predict(own_mobile_X_test)
lr_own_pred

# Conclusion
The Linear Regression model predicts a value above the highest price range in the training dataset, whereas the classifier models predict the dataset's maximum price_range of 3. Classifiers can only output classes seen during training, while linear regression can extrapolate beyond them.

These results reflect the fact that the sample mobile phone was released many years after the date of the original dataset and thus has features that far exceed almost all of the phone specifications in the existing dataset. A prediction from linear regression that exceeds the price_range of 3 is entirely plausible.
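Since the classifiers can only return one of the existing price_range labels while the regression output is continuous, one simple way to compare the two is to round the regression output and clip it to the label range. This is a minimal sketch, assuming the labels run from 0 to 3; the helper name `to_price_range` and the sample inputs are made up.

```python
def to_price_range(prediction, lowest=0, highest=3):
    """Round a continuous prediction to the nearest label and clip it to [lowest, highest].

    Assumes integer labels from `lowest` to `highest` (0 to 3 is an assumption here).
    Note that Python's round() sends exact .5 values to the nearest even integer.
    """
    nearest = int(round(prediction))
    return max(lowest, min(highest, nearest))

# Made-up inputs (NOT outputs from this notebook)
for value in [4.37, -0.4, 1.6]:
    print(value, "->", to_price_range(value))
```

In the notebook, this could be applied as `to_price_range(lr_own_pred[0])`. The clipped label hides how far the raw prediction overshoots, so it is worth reporting both values.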