# Used Phones & Tablets Pricing Dataset

CONTEXT: The used and refurbished device market has grown considerably over the past decade because it provides cost-effective alternatives for consumers and businesses looking to save money on a purchase. Extending the life of devices through second-hand trade also reduces their environmental impact by cutting waste and supporting recycling. This sample dataset contains normalized used and new prices of refurbished and used devices.

OBJECTIVE: Perform exploratory data analysis and apply linear regression to build a model that helps price such devices.


More Information about the Dataset

device_brand: Name of manufacturing brand   
os: OS on which the device runs   
screen_size: Size of the screen in cm            
4g: Whether 4G is available or not                 
5g: Whether 5G is available or not                           
rear_camera_mp: Resolution of the rear camera in megapixels                    
front_camera_mp: Resolution of the front camera in megapixels                    
internal_memory: Amount of internal memory (ROM) in GB                     
ram: Amount of RAM in GB                                 
battery: Energy capacity of the device battery in mAh                  
weight: Weight of the device in grams                          
release_year: Year when the device model was released                    
days_used: Number of days the used/refurbished device has been used                       
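normalized_new_price: Normalized price of a new device of the same model (this column does not appear in the CSV loaded below)  
normalized_used_price: Normalized price of the used/refurbished device (the target column in this notebook)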
                               
Kaggle Dataset Link:            
https://www.kaggle.com/datasets/ahsan81/used-handheld-device-data

In [1]:
# Importing the important libraries and dependencies

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import r2_score,mean_squared_error

In [2]:
mobile_data = pd.read_csv("c://users/santhosh reddy/desktop/untitled folder/projects/used_device_data.csv")

In [3]:
mobile_data.head()

Unnamed: 0,device_brand,os,screen_size,4g,5g,rear_camera_mp,front_camera_mp,internal_memory,ram,battery,weight,release_year,days_used,normalized_used_price
0,Honor,Android,14.5,yes,no,13.0,5.0,64.0,3.0,3020.0,146.0,2020,127,4.307572
1,Honor,Android,17.3,yes,yes,13.0,16.0,128.0,8.0,4300.0,213.0,2020,325,5.162097
2,Honor,Android,16.69,yes,yes,13.0,8.0,128.0,8.0,4200.0,213.0,2020,162,5.111084
3,Honor,Android,25.5,yes,yes,13.0,8.0,64.0,6.0,7250.0,480.0,2020,345,5.135387
4,Honor,Android,15.32,yes,no,13.0,8.0,64.0,3.0,5000.0,185.0,2020,293,4.389995


In [4]:
mobile_data.isnull().sum()

device_brand               0
os                         0
screen_size                0
4g                         0
5g                         0
rear_camera_mp           179
front_camera_mp            2
internal_memory            4
ram                        4
battery                    6
weight                     7
release_year               0
days_used                  0
normalized_used_price      0
dtype: int64

In [5]:
mobile_data.shape

(3454, 14)

In [6]:
mobile_data = mobile_data.dropna(axis=0)

In [7]:
mobile_data.isnull().sum()

device_brand             0
os                       0
screen_size              0
4g                       0
5g                       0
rear_camera_mp           0
front_camera_mp          0
internal_memory          0
ram                      0
battery                  0
weight                   0
release_year             0
days_used                0
normalized_used_price    0
dtype: int64
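
Dropping every row with a missing value reduces the data from 3454 rows to the 3253 rows seen later in the notebook. As a possible alternative, the following is a minimal sketch of median imputation for numeric columns. The `toy` frame and its values are hypothetical stand-ins for `mobile_data`, and whether imputation actually improves the models here is an assumption that would need to be checked.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for mobile_data (values are illustrative, not from the dataset)
toy = pd.DataFrame({
    "rear_camera_mp": [13.0, np.nan, 8.0, 5.0],
    "battery": [3020.0, 4300.0, np.nan, 5000.0],
    "os": ["Android", "Android", "iOS", "Others"],
})

# Fill missing numeric values with each column's median instead of dropping the row
numeric_cols = toy.select_dtypes(include="number").columns
toy_imputed = toy.copy()
toy_imputed[numeric_cols] = toy[numeric_cols].fillna(toy[numeric_cols].median())

print(toy_imputed.isnull().sum())
```

The median is used here rather than the mean because it is less sensitive to extreme values such as unusually large batteries or cameras.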

In [8]:
# Importing the Regression models

from sklearn.linear_model import LinearRegression
from sklearn.linear_model import Ridge
from sklearn.linear_model import Lasso
from sklearn.linear_model import BayesianRidge
from sklearn.linear_model import ElasticNet
from sklearn.tree import DecisionTreeRegressor
from sklearn.svm import SVR
from sklearn.ensemble import GradientBoostingRegressor

# LabelEncoding the dataset

In [9]:
le = LabelEncoder()

In [10]:
print(mobile_data['device_brand'].unique())
print(mobile_data['os'].unique())
print(mobile_data['4g'].unique())
print(mobile_data['5g'].unique())

['Honor' 'Others' 'HTC' 'Huawei' 'Lava' 'Lenovo' 'LG' 'Micromax' 'Nokia'
 'Oppo' 'Samsung' 'Vivo' 'Xiaomi' 'ZTE' 'Apple' 'Asus' 'Acer' 'Alcatel'
 'BlackBerry' 'Celkon' 'Coolpad' 'Gionee' 'Google' 'Karbonn' 'Meizu'
 'Microsoft' 'Motorola' 'OnePlus' 'Panasonic' 'Realme' 'Sony' 'Spice'
 'XOLO']
['Android' 'Others' 'iOS' 'Windows']
['yes' 'no']
['no' 'yes']


In [11]:
mobile_data.device_brand = le.fit_transform(mobile_data.device_brand)

In [12]:
mobile_data['os'] = le.fit_transform(mobile_data['os'])
mobile_data['4g'] = le.fit_transform(mobile_data['4g'])
mobile_data['5g'] = le.fit_transform(mobile_data['5g'])

In [13]:
mobile_data.head()

Unnamed: 0,device_brand,os,screen_size,4g,5g,rear_camera_mp,front_camera_mp,internal_memory,ram,battery,weight,release_year,days_used,normalized_used_price
0,10,0,14.5,1,0,13.0,5.0,64.0,3.0,3020.0,146.0,2020,127,4.307572
1,10,0,17.3,1,1,13.0,16.0,128.0,8.0,4300.0,213.0,2020,325,5.162097
2,10,0,16.69,1,1,13.0,8.0,128.0,8.0,4200.0,213.0,2020,162,5.111084
3,10,0,25.5,1,1,13.0,8.0,64.0,6.0,7250.0,480.0,2020,345,5.135387
4,10,0,15.32,1,0,13.0,8.0,64.0,3.0,5000.0,185.0,2020,293,4.389995
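
Label encoding turns `device_brand` and `os` into integers, which a linear model may read as an ordering (for example, brand 10 being "larger" than brand 3). A common alternative is one-hot encoding. The sketch below uses a small hypothetical frame rather than the real dataset, and it is only one way to handle nominal columns.

```python
import pandas as pd

# Toy frame standing in for the categorical columns of mobile_data
toy = pd.DataFrame({
    "device_brand": ["Honor", "Apple", "Samsung", "Honor"],
    "os": ["Android", "iOS", "Android", "Android"],
    "4g": ["yes", "yes", "no", "yes"],
    "5g": ["no", "yes", "no", "yes"],
})

# Binary yes/no flags map cleanly to 1/0
for col in ["4g", "5g"]:
    toy[col] = toy[col].map({"no": 0, "yes": 1})

# Nominal columns become one indicator column per category
encoded = pd.get_dummies(toy, columns=["device_brand", "os"], dtype=int)
print(encoded.columns.tolist())
```

With 33 brands in the real data, this would add 33 indicator columns, so the trade-off between interpretability and width is worth weighing.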


# Standardizing the data using StandardScaler

In [14]:
scalar = StandardScaler()

In [15]:
# Separating the features and target values

feature_data = mobile_data.drop(columns='normalized_used_price')
target_data = mobile_data['normalized_used_price']

In [16]:
mobile_std_data = scalar.fit_transform(feature_data)

In [17]:
mobile_std_data

array([[-0.88416818, -0.24776263,  0.22917174, ..., -0.40393231,
         1.93547792, -2.41568453],
       [-0.88416818, -0.24776263,  0.95626927, ...,  0.33766177,
         1.93547792, -1.57462696],
       [-0.88416818, -0.24776263,  0.79786588, ...,  0.33766177,
         1.93547792, -2.26701274],
       ...,
       [-1.89782648, -0.24776263,  0.56675274, ..., -0.19362951,
         1.93547792, -2.10134988],
       [-1.89782648, -0.24776263,  0.56675274, ..., -0.24897236,
         1.93547792, -2.32223369],
       [-1.89782648, -0.24776263, -0.20448999, ..., -0.16042381,
         1.93547792, -2.20754402]])

In [19]:
# Splitting into training and testing data

x_train, x_test, y_train, y_test = train_test_split(mobile_std_data, target_data, test_size=0.2, random_state=3) 

In [20]:
print(x_train.shape, x_test.shape, mobile_std_data.shape)

(2602, 13) (651, 13) (3253, 13)
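
In the cells above, the scaler is fitted on the full feature matrix before the split, so the test rows contribute to the scaling statistics. A minimal sketch of an alternative order, splitting first and scaling inside a pipeline, is shown below. It uses synthetic data from `make_regression` as a stand-in for `feature_data` and `target_data`, so the numbers it produces say nothing about the device dataset.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for feature_data / target_data (13 features, like the notebook)
X, y = make_regression(n_samples=500, n_features=13, noise=10.0, random_state=3)

# Split first, so the test rows never influence the scaler's mean and variance
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=3)

# The pipeline fits StandardScaler on X_train only, then applies it to X_test
pipe = make_pipeline(StandardScaler(), LinearRegression())
pipe.fit(X_train, y_train)
test_r2 = pipe.score(X_test, y_test)
print(X_train.shape, X_test.shape)
```

A pipeline like this can also be passed directly to `GridSearchCV`, so that scaling is refitted inside each cross-validation fold.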


In [21]:
print(x_train, y_train)

[[-0.09576729 -0.24776263 -0.85108743 ... -0.53675513 -0.35035917
   0.87208598]
 [ 0.69263361 -0.24776263 -0.85108743 ...  0.23804465 -0.80752658
   1.47526868]
 [ 0.58000491  1.9543462  -0.85108743 ... -0.24897236 -0.80752658
   1.24164158]
 ...
 [-0.54628208 -0.24776263 -0.27200619 ... -0.58102941 -1.264694
  -0.26631518]
 [-0.54628208 -0.24776263 -0.8770552  ... -0.71385223 -1.264694
  -0.55516324]
 [ 0.12949011 -0.24776263 -0.20448999 ... -0.21576665  0.10680825
   1.46677315]] 1873    3.969348
2241    4.294424
639     4.216857
324     3.404857
2225    4.309993
          ...   
1056    4.904904
3119    4.236278
1757    4.241902
1778    3.746677
1993    4.434145
Name: normalized_used_price, Length: 2602, dtype: float64


In [22]:
# Helper that prints the R2 score and mean squared error (regression metrics, not classification accuracy)

def accuracy_score(actual_values, predicted_values):
    r2_scr = r2_score(actual_values, predicted_values)
    print('R2 score of the model is :',r2_scr)
    
    mean_sqsr = mean_squared_error(actual_values, predicted_values)
    print('Mean Squared error of the model is :',mean_sqsr)
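
The helper above prints its results, which makes them hard to reuse later, and its name matches scikit-learn's classification metric `sklearn.metrics.accuracy_score`. The sketch below shows one possible variant that returns the values instead. The name `regression_metrics` and the added RMSE entry are assumptions for illustration, not part of the original notebook.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

def regression_metrics(actual_values, predicted_values):
    """Return R2, MSE and RMSE as a dict (hypothetical helper, not from the notebook)."""
    mse = mean_squared_error(actual_values, predicted_values)
    return {
        "r2": r2_score(actual_values, predicted_values),
        "mse": mse,
        "rmse": float(np.sqrt(mse)),
    }

# Tiny worked example: two exact predictions and one that is off by 1
metrics = regression_metrics([3.0, 4.0, 5.0], [3.0, 4.0, 6.0])
print(metrics)
```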

# Linear Regression Model

In [23]:
linear_model = LinearRegression()

In [24]:
linear_model.fit(x_train, y_train)

In [25]:
# Linear_regression model train data Prediction

linear_train_prediction = linear_model.predict(x_train)

In [26]:
accuracy_score(y_train, linear_train_prediction)

R2 score of the model is : 0.7353726272889094
Mean Squared error of the model is : 0.09004666367300126


In [27]:
# Linear_regression model test data prediction

linear_test_pred = linear_model.predict(x_test)

In [28]:
accuracy_score(y_test, linear_test_pred)

R2 score of the model is : 0.7589991600642522
Mean Squared error of the model is : 0.07634491813086854


# Ridge Regression model

In [29]:
ridge_param = {'alpha':[0.2,0.5,0.8,1],
              'solver':['svd','sparse_cg','lsqr','saga'],
              }

In [30]:
ridge_grid = GridSearchCV(Ridge(), ridge_param, cv=5)

In [31]:
ridge_grid.fit(x_train, y_train)

In [32]:
ridge_grid.best_params_

{'alpha': 1, 'solver': 'svd'}
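
Retyping the selected parameters into a new estimator works, but it is easy to mistype a value. With the default `refit=True`, `GridSearchCV` already refits the best configuration on the full training data and exposes it as `best_estimator_`. The sketch below illustrates this on synthetic data with a shortened parameter grid, both of which are assumptions made to keep the example small.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for x_train / y_train
X, y = make_regression(n_samples=200, n_features=13, noise=5.0, random_state=3)

ridge_param = {"alpha": [0.2, 0.5, 0.8, 1], "solver": ["svd", "lsqr"]}
ridge_grid = GridSearchCV(Ridge(), ridge_param, cv=5)
ridge_grid.fit(X, y)

# With the default refit=True, best_estimator_ is already refitted on all of X, y
best_ridge = ridge_grid.best_estimator_
print(ridge_grid.best_params_, ridge_grid.best_score_)
```

`best_ridge` can then be used for `predict` directly, which keeps the fitted model and the reported best parameters in sync.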

In [33]:
ridge_model = Ridge(alpha=1, solver='svd', copy_X=True, tol=0.0001)

In [34]:
ridge_model.fit(x_train, y_train)

In [35]:
# Ridge regression train data prediction

ridge_train_pred = ridge_model.predict(x_train)

In [36]:
accuracy_score(y_train, ridge_train_pred)

R2 score of the model is : 0.7353725418382075
Mean Squared error of the model is : 0.09004669274992755


In [37]:
# Ridge regression test data prediction

ridge_test_pred = ridge_model.predict(x_test)

In [38]:
accuracy_score(y_test, ridge_test_pred)

R2 score of the model is : 0.7590088508230046
Mean Squared error of the model is : 0.07634184826528724


# Lasso Regression model

In [39]:
lasso_param = {'alpha':[0.1,0.2,0.5,0.8,1],
              'max_iter':[100,1000,2000],
              'selection':['cyclic','random']
              }

In [40]:
lasso_grid = GridSearchCV(Lasso(), lasso_param, cv=5)

In [41]:
lasso_grid.fit(x_train, y_train)

In [42]:
lasso_grid.best_params_

{'alpha': 0.1, 'max_iter': 100, 'selection': 'cyclic'}

In [43]:
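# Note: alpha=0.01 and max_iter=1000 are set manually here and differ from the
# grid-search result above (alpha=0.1, max_iter=100); alpha=0.01 was not in the grid.
lasso_model = Lasso(alpha=0.01, max_iter=1000, selection='cyclic', fit_intercept=True, copy_X=True)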

In [44]:
lasso_model.fit(x_train, y_train)

In [45]:
# Lasso regression train data prediction

lasso_train_pred = lasso_model.predict(x_train)

In [46]:
accuracy_score(y_train, lasso_train_pred)

R2 score of the model is : 0.7321692778716615
Mean Squared error of the model is : 0.09113669047048206


In [47]:
# Lasso regression test data prediction

lasso_test_pred = lasso_model.predict(x_test)

In [48]:
accuracy_score(y_test, lasso_test_pred)

R2 score of the model is : 0.758539244643847
Mean Squared error of the model is : 0.07649061142026667


# BayesianRidge Regression model

In [49]:
bridge_model = BayesianRidge(tol=0.001, fit_intercept=True)

In [50]:
bridge_model.fit(x_train, y_train)

In [51]:
# BayesianRidge regression train prediction

bridge_train_pred = bridge_model.predict(x_train)

In [52]:
accuracy_score(y_train, bridge_train_pred)

R2 score of the model is : 0.7353646302291036
Mean Squared error of the model is : 0.09004938489018803


In [53]:
# BayesianRidge regression test prediction

bridge_test_pred = bridge_model.predict(x_test)

In [54]:
accuracy_score(y_test, bridge_test_pred)

R2 score of the model is : 0.7590858425825692
Mean Squared error of the model is : 0.07631745860099279


# ElasticNet Regression Model

In [55]:
elasticnet_param = {'alpha':[0.1,0.2,0.5,0.8,1],
                   'l1_ratio':[0.01,0.2,0.5,0.8]
                   }

In [56]:
elasticnet_grid = GridSearchCV(ElasticNet(), elasticnet_param, cv=5)

In [57]:
elasticnet_grid.fit(x_train, y_train)

In [58]:
elasticnet_grid.best_params_

{'alpha': 0.1, 'l1_ratio': 0.01}

In [59]:
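# Note: alpha=0.01 is set manually here and differs from the grid-search result
# above (alpha=0.1); alpha=0.01 was not in the grid.
elasticnet_model = ElasticNet(alpha=0.01, l1_ratio=0.01, fit_intercept=True, copy_X=True)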

In [60]:
elasticnet_model.fit(x_train, y_train)

In [61]:
# ElasticNet regression train prediction

elasticnet_train_pred = elasticnet_model.predict(x_train)

In [62]:
accuracy_score(y_train, elasticnet_train_pred)

R2 score of the model is : 0.735313724266215
Mean Squared error of the model is : 0.09006670702913465


In [63]:
# ElasticNet regression test prediction

elasticnet_test_pred = elasticnet_model.predict(x_test)

In [64]:
accuracy_score(y_test, elasticnet_test_pred)

R2 score of the model is : 0.7592134772947634
Mean Squared error of the model is : 0.07627702612094119


# DecisionTree Regression model

In [65]:
decision_param = {'criterion':['squared_error','friedman_mse','absolute_error'],
                 'min_samples_split':[2,4,5,8],
                 'min_samples_leaf':[1,3,5],
                  'random_state':[3]
                 }

In [66]:
decision_grid = GridSearchCV(DecisionTreeRegressor(), decision_param, cv=5)

In [67]:
decision_grid.fit(x_train, y_train)

In [68]:
decision_grid.best_params_

{'criterion': 'friedman_mse',
 'min_samples_leaf': 5,
 'min_samples_split': 2,
 'random_state': 3}

In [69]:
decision_model = DecisionTreeRegressor(criterion='friedman_mse', min_samples_leaf=5, min_samples_split=2, random_state=3)

In [70]:
decision_model.fit(x_train, y_train)

In [71]:
# Decision Tree Regressor train prediction

decision_train_pred = decision_model.predict(x_train)

In [72]:
accuracy_score(y_train, decision_train_pred)

R2 score of the model is : 0.899645479992919
Mean Squared error of the model is : 0.03414835592616067


In [73]:
# Decision Tree Regressor test prediction

decision_test_pred = decision_model.predict(x_test)

In [74]:
accuracy_score(y_test, decision_test_pred)

R2 score of the model is : 0.7198959236101645
Mean Squared error of the model is : 0.08873215041825482


# SVM regression model

In [75]:
svr_param = {'kernel':['linear','rbf','sigmoid'],
            'coef0':[0.01,0.5,0.8],
            'C':[0.1,0.7,1]
            }

In [76]:
svr_grid = GridSearchCV(SVR(), svr_param, cv=5)

In [77]:
svr_grid.fit(x_train, y_train)

In [78]:
svr_grid.best_params_

{'C': 1, 'coef0': 0.01, 'kernel': 'rbf'}

In [79]:
svr_model = SVR(C=1, coef0=0.01, kernel='rbf')

In [80]:
svr_model.fit(x_train, y_train)

In [81]:
# SVR train prediction

svr_train_pred = svr_model.predict(x_train)

In [82]:
accuracy_score(y_train, svr_train_pred)

R2 score of the model is : 0.8346149049131398
Mean Squared error of the model is : 0.05627677848002792


In [83]:
# SVR test prediction

svr_test_pred = svr_model.predict(x_test)

In [84]:
accuracy_score(y_test, svr_test_pred)

R2 score of the model is : 0.8178212096710455
Mean Squared error of the model is : 0.05771110522499734


# Gradient Boosting Regression model

In [111]:
gboost_param = {'loss':['squared_error','absolute_error','quantile'],
               'learning_rate':[0.01,0.1,0.3],
               'n_estimators':[50,100,500],
               'max_features':['sqrt','log2'],
               'alpha':[0.01,0.1],
               }

In [112]:
gboost_grid = GridSearchCV(GradientBoostingRegressor(), gboost_param, cv=5)

In [113]:
gboost_grid.fit(x_train, y_train)

In [114]:
gboost_grid.best_params_

{'alpha': 0.1,
 'learning_rate': 0.1,
 'loss': 'squared_error',
 'max_features': 'log2',
 'n_estimators': 500}

In [116]:
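# Note: n_estimators=1000 and alpha=0.01 are set manually here and differ from the
# grid-search result above (n_estimators=500, alpha=0.1). In scikit-learn, alpha is
# only used when loss is 'huber' or 'quantile', so it has no effect with 'squared_error'.
gboost_model = GradientBoostingRegressor(alpha=0.01, learning_rate=0.1, loss='squared_error', max_features='log2', n_estimators=1000)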

In [117]:
gboost_model.fit(x_train, y_train)

In [118]:
# Gradient Boosting model train prediction

gboost_train_pred = gboost_model.predict(x_train)

In [119]:
accuracy_score(y_train, gboost_train_pred)

R2 score of the model is : 0.923763000886942
Mean Squared error of the model is : 0.025941713240932328


In [120]:
# Gradient Boosting model test prediction

gboost_test_pred = gboost_model.predict(x_test)

In [121]:
accuracy_score(y_test, gboost_test_pred)

R2 score of the model is : 0.8134327475914652
Mean Squared error of the model is : 0.059101294480251844


# Finding the model with least mean squared error 

In [122]:
# Note: the tuned ElasticNet model from the earlier section is not included in this list
models_list = [LinearRegression(),
              Ridge(alpha=1, solver='svd', copy_X=True, tol=0.0001),
              Lasso(alpha=0.01, max_iter=1000, selection='cyclic', fit_intercept=True, copy_X=True),
              BayesianRidge(tol=0.001, fit_intercept=True),
              DecisionTreeRegressor(criterion='friedman_mse', min_samples_leaf=5, min_samples_split=2, random_state=3),
              SVR(C=1, coef0=0.01, kernel='rbf'),
              GradientBoostingRegressor(alpha=0.01, learning_rate=0.1, loss='squared_error', max_features='log2', n_estimators=1000)
              ]

In [129]:
for model in models_list:
    model.fit(x_train, y_train)
    train_pred = model.predict(x_train)
    test_pred = model.predict(x_test)
    
    mse_train = mean_squared_error(y_train, train_pred)
    mse_test = mean_squared_error(y_test, test_pred)
    
    model_name = str(model)
    model_name = model_name.split('(')
    print(model_name[0])
    print('Mean Squared Error of the train data is :',mse_train)
    print('Mean Squared Error of the test data is :',mse_test)
    print('-----------------------------------------------------------')

LinearRegression
Mean Squared Error of the train data is : 0.09004666367300126
Mean Squared Error of the test data is : 0.07634491813086854
-----------------------------------------------------------
Ridge
Mean Squared Error of the train data is : 0.09004669274992755
Mean Squared Error of the test data is : 0.07634184826528724
-----------------------------------------------------------
Lasso
Mean Squared Error of the train data is : 0.09113669047048206
Mean Squared Error of the test data is : 0.07649061142026667
-----------------------------------------------------------
BayesianRidge
Mean Squared Error of the train data is : 0.09004938489018803
Mean Squared Error of the test data is : 0.07631745860099279
-----------------------------------------------------------
DecisionTreeRegressor
Mean Squared Error of the train data is : 0.03414835592616067
Mean Squared Error of the test data is : 0.08873215041825482
-----------------------------------------------------------
SVR
Mean Squared Err

Comparing the test results of all the models, these two have the lowest Mean Squared Error:  
1. Support Vector Regressor (test MSE ≈ 0.0577)  
2. Gradient Boosting Regressor (test MSE ≈ 0.0591)  

Either of these models can be used to build the predictive system.
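
The comparison loop prints one block per model, which makes ranking by eye tedious. As a sketch, the same loop can collect its results into a DataFrame sorted by test error. The example below runs on synthetic data with only three models to stay short. The column names `mse_train` and `mse_test` are illustrative choices.

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the notebook's train/test split
X, y = make_regression(n_samples=500, n_features=13, noise=10.0, random_state=3)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=3)

rows = []
for model in [LinearRegression(), Ridge(alpha=1), DecisionTreeRegressor(random_state=3)]:
    model.fit(X_train, y_train)
    rows.append({
        "model": type(model).__name__,  # class name without the parameter list
        "mse_train": mean_squared_error(y_train, model.predict(X_train)),
        "mse_test": mean_squared_error(y_test, model.predict(X_test)),
    })

# Lowest test error first
results = pd.DataFrame(rows).sort_values("mse_test").reset_index(drop=True)
print(results)
```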