# Mobile Price Range Classification

### Table of Contents:
<li><a href='#intro'>Introduction</a></li>
<li><a href='#dc'>Data Collecting</a></li>
<li><a href='#sql'>Export Data to SQL</a></li>
<li><a href='#dq'>Data Questions</a></li>

<a id='intro'></a>
## Introduction

* **Context**
<p>Bob has started his own mobile company. He wants to give big companies like Apple, Samsung, etc. a tough fight.

He does not know how to estimate the price of the mobile phones his company makes. In this competitive mobile phone market, you cannot simply assume things. To solve this problem, he collects sales data of mobile phones from various companies.

Bob wants to find the relationship between the features of a mobile phone (e.g., RAM, internal memory) and its selling price. But he is not very good at machine learning, so he needs your help to solve this problem.

In this problem, you do not have to predict the actual price, only a price range indicating how high the price is.</p>
You can find the dataset [here](https://www.kaggle.com/iabhishekofficial/mobile-price-classification).
* **Dataset files** 
    * train.csv - relationship between the features of a mobile phone (e.g., RAM, internal memory) and its selling price range.
    * test.csv - new data whose price range needs to be predicted.
* **Project Goal**
<p>Predict a price range indicating how high the price is.</p>
* **Attribute Definitions**
<li>battery_power: Total energy a battery can store at one time, measured in mAh</li>
Hint (capacity loss): the faster the charging speed, the greater the capacity loss.
<li>blue: Has Bluetooth or not</li>
<li>clock_speed: Speed at which the microprocessor executes instructions</li>
<li>dual_sim: Has dual sim support or not</li>
<li>fc: Front Camera mega pixels</li>
<li>four_g: Has 4G or not</li>
<li>int_memory: Internal Memory in Gigabytes</li>
<li>m_dep: Mobile Depth in cm</li>
<li>mobile_wt: Weight of mobile phone</li>
<li>n_cores: Number of cores of processor</li>
<li>pc: Primary Camera mega pixels</li>
<li>px_height: Pixel Resolution Height</li>
<li>px_width: Pixel Resolution Width</li>
<li>ram: Random Access Memory in Megabytes</li>
<li>sc_h: Screen Height of mobile in cm</li>
<li>sc_w: Screen Width of mobile in cm</li>
<li>talk_time: Longest time that a single battery charge will last when you are talking</li>
<li>three_g: Has 3G or not</li>
<li>touch_screen: Has touch screen or not</li>
<li>wifi: Has wifi or not</li>
<li>price_range: This is the target variable with value of 0(low cost), 1(medium cost), 2(high cost) and 3(very high cost).</li>
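The attribute list above mixes binary flags (0/1) with continuous measurements, and encodes the target as integer codes. A minimal sketch of capturing that schema in code, assuming the column names listed above; `BINARY_FEATURES`, `PRICE_LABELS`, `check_binary_flags` and `label_price_range` are hypothetical helpers, not part of the original notebook.

```python
import pandas as pd

# Binary (0/1) flags, taken from the attribute list above (assumed schema)
BINARY_FEATURES = ['blue', 'dual_sim', 'four_g', 'three_g', 'touch_screen', 'wifi']

# Meaning of the integer target codes, as documented for price_range
PRICE_LABELS = {0: 'low cost', 1: 'medium cost', 2: 'high cost', 3: 'very high cost'}


def check_binary_flags(df: pd.DataFrame) -> list:
    '''Return the binary feature columns that contain values other than 0 and 1.'''
    return [col for col in BINARY_FEATURES
            if col in df.columns and not df[col].isin([0, 1]).all()]


def label_price_range(codes: pd.Series) -> pd.Series:
    '''Map price_range codes 0-3 to an ordered categorical of readable labels.'''
    dtype = pd.CategoricalDtype(categories=list(PRICE_LABELS.values()), ordered=True)
    return codes.map(PRICE_LABELS).astype(dtype)
```

Using an ordered categorical keeps the natural "low < medium < high < very high" ordering when the labels are sorted or compared.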


```python
# Basic imports
import sqlite3

import numpy as np
import pandas as pd

pd.set_option('display.max_columns', 500)
```

<a id='dc'></a>
## Data Collecting
#### 1- Load Datasets

```python
# File paths
sales = '../Datasets/train.csv'
test = '../Datasets/test.csv'
```

```python
# Create a CSV loading function
def load_csv(fpath):
    '''Load a CSV file into a DataFrame.

    Args:
        fpath (str): path to the CSV file. The file name without its
            extension is stored as the DataFrame's `name` attribute.
    Returns:
        DataFrame, or None if the file does not exist.
    '''
    try:
        df = pd.read_csv(fpath)
    except FileNotFoundError:
        print(f'ERROR: No such file or directory: {fpath}')
        return None
    # Tag the frame with its file name (e.g. 'train') for later reporting
    df.name = fpath.split('/')[-1].split('.')[0]
    print(f'{df.name} dataset loaded successfully')
    return df
```

```python
# Load sales (train) data
df_sales = load_csv(sales)
```

```
train dataset loaded successfully
```

```python
# Load test data
df_test = load_csv(test)
```

```
test dataset loaded successfully
```
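`load_csv` stores the dataset name as an ad-hoc `df.name` attribute, which pandas does not carry over to copies (for example `df.copy()`). A minimal sketch of an alternative, assuming pandas 1.0 or later: the hypothetical `load_csv_with_attrs` keeps the name in `DataFrame.attrs`, a metadata dict that pandas documents as experimental, and uses `os.path` to derive the name.

```python
import os

import pandas as pd


def load_csv_with_attrs(fpath: str) -> pd.DataFrame:
    '''Hypothetical variant of load_csv that stores the dataset name in df.attrs.'''
    df = pd.read_csv(fpath)
    # basename strips the directories; splitext drops only the last extension
    df.attrs['name'] = os.path.splitext(os.path.basename(fpath))[0]
    return df
```

With this variant, `explore_data` and `check_data` would read `df.attrs.get('name', 'new_df')` instead of `df.name`.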


#### 2- Check Datasets

```python
from IPython.display import display


def explore_data(df, num_rows=5):
    '''Explore a DataFrame.

    Args:
        df (DataFrame): DataFrame to explore.
        num_rows (int, default 5): number of top and bottom rows to show.
    Prints:
        The DataFrame's number of rows & columns,
        followed by its first and last `num_rows` rows.
    '''
    pd.set_option('display.max_columns', df.shape[1])
    # Fall back to a generic name if the frame was not tagged by load_csv
    df_name = getattr(df, 'name', 'new_df')
    print(f'{df_name} contains {df.shape[0]} rows & {df.shape[1]} columns')
    print(f'Top {num_rows} rows:')
    display(df.head(num_rows))
    print(f'Last {num_rows} rows:')
    display(df.tail(num_rows))
```

```python
def check_data(df):
    '''Summarize each feature of a DataFrame.

    Args:
        df (DataFrame): DataFrame to check.
    Prints:
        For each feature: first value, number of unique values, most frequent
        value (top_freq), data type, count of missing values and % missing.
    '''
    # Fall back to a generic name if the frame was not tagged by load_csv
    name = getattr(df, 'name', 'new_df')
    dt_df = pd.DataFrame()
    dt_df['features'] = df.columns
    dt_df['first_val'] = pd.Series([df[col].iloc[0] for col in df.columns])
    dt_df['nunique'] = pd.Series([df[col].nunique() for col in df.columns])
    dt_df['top_freq'] = pd.Series([df[col].value_counts().index[0] for col in df.columns])
    dt_df.set_index('features', inplace=True)
    # These are Series indexed by feature name, so they align on the new index
    dt_df['dtype'] = df.dtypes
    dt_df['cnt_missing'] = df.isna().sum()
    dt_df['missing_%'] = df.isna().sum() / df.shape[0] * 100
    print(f'{name} datatypes & missing values information')
    pd.set_option('display.max_columns', dt_df.shape[1])
    display(dt_df)
```
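A vectorized sketch of the same per-feature summary, under the hypothetical name `feature_summary`: it returns the table instead of displaying it, so it can be tested or saved. One assumed difference: `top_freq` here comes from `DataFrame.mode()`, which returns tied modes in sorted order, so it picks the smallest of several equally frequent values, whereas `value_counts().index[0]` in `check_data` may pick a different one when there are ties.

```python
import pandas as pd


def feature_summary(df: pd.DataFrame) -> pd.DataFrame:
    '''Vectorized sketch of check_data's table, returned instead of displayed.'''
    missing = df.isna().sum()
    return pd.DataFrame({
        'first_val': df.iloc[0],
        'nunique': df.nunique(),            # NaN is not counted as a value
        'top_freq': df.mode().iloc[0],      # smallest value among tied modes
        'dtype': df.dtypes,
        'cnt_missing': missing,
        'missing_%': missing / len(df) * 100,
    }).rename_axis('features')
```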

#### Sales DataFrame "Train"

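```python
# Apply the explore_data function
explore_data(df_sales, num_rows=5)
```

```
train contains 2000 rows & 21 columns
Top 5 rows:
```

| | battery_power | blue | clock_speed | dual_sim | fc | four_g | int_memory | m_dep | mobile_wt | n_cores | pc | px_height | px_width | ram | sc_h | sc_w | talk_time | three_g | touch_screen | wifi | price_range |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 842 | 0 | 2.2 | 0 | 1 | 0 | 7 | 0.6 | 188 | 2 | 2 | 20 | 756 | 2549 | 9 | 7 | 19 | 0 | 0 | 1 | 1 |
| 1 | 1021 | 1 | 0.5 | 1 | 0 | 1 | 53 | 0.7 | 136 | 3 | 6 | 905 | 1988 | 2631 | 17 | 3 | 7 | 1 | 1 | 0 | 2 |
| 2 | 563 | 1 | 0.5 | 1 | 2 | 1 | 41 | 0.9 | 145 | 5 | 6 | 1263 | 1716 | 2603 | 11 | 2 | 9 | 1 | 1 | 0 | 2 |
| 3 | 615 | 1 | 2.5 | 0 | 0 | 0 | 10 | 0.8 | 131 | 6 | 9 | 1216 | 1786 | 2769 | 16 | 8 | 11 | 1 | 0 | 0 | 2 |
| 4 | 1821 | 1 | 1.2 | 0 | 13 | 1 | 44 | 0.6 | 141 | 2 | 14 | 1208 | 1212 | 1411 | 8 | 2 | 15 | 1 | 1 | 0 | 1 |

```
Last 5 rows:
```

| | battery_power | blue | clock_speed | dual_sim | fc | four_g | int_memory | m_dep | mobile_wt | n_cores | pc | px_height | px_width | ram | sc_h | sc_w | talk_time | three_g | touch_screen | wifi | price_range |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1995 | 794 | 1 | 0.5 | 1 | 0 | 1 | 2 | 0.8 | 106 | 6 | 14 | 1222 | 1890 | 668 | 13 | 4 | 19 | 1 | 1 | 0 | 0 |
| 1996 | 1965 | 1 | 2.6 | 1 | 0 | 0 | 39 | 0.2 | 187 | 4 | 3 | 915 | 1965 | 2032 | 11 | 10 | 16 | 1 | 1 | 1 | 2 |
| 1997 | 1911 | 0 | 0.9 | 1 | 1 | 1 | 36 | 0.7 | 108 | 8 | 3 | 868 | 1632 | 3057 | 9 | 1 | 5 | 1 | 1 | 0 | 3 |
| 1998 | 1512 | 0 | 0.9 | 0 | 4 | 1 | 46 | 0.1 | 145 | 5 | 5 | 336 | 670 | 869 | 18 | 10 | 19 | 1 | 1 | 1 | 0 |
| 1999 | 510 | 1 | 2.0 | 1 | 5 | 1 | 45 | 0.9 | 168 | 6 | 16 | 483 | 754 | 3919 | 19 | 4 | 2 | 1 | 1 | 1 | 3 |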


```python
# Apply the check_data function
check_data(df_sales)
```

```
train datatypes & missing values information
```

| features | first_val | nunique | top_freq | dtype | cnt_missing | missing_% |
|---|---|---|---|---|---|---|
| battery_power | 842.0 | 1094 | 1872.0 | int64 | 0 | 0.0 |
| blue | 0.0 | 2 | 0.0 | int64 | 0 | 0.0 |
| clock_speed | 2.2 | 26 | 0.5 | float64 | 0 | 0.0 |
| dual_sim | 0.0 | 2 | 1.0 | int64 | 0 | 0.0 |
| fc | 1.0 | 20 | 0.0 | int64 | 0 | 0.0 |
| four_g | 0.0 | 2 | 1.0 | int64 | 0 | 0.0 |
| int_memory | 7.0 | 63 | 27.0 | int64 | 0 | 0.0 |
| m_dep | 0.6 | 10 | 0.1 | float64 | 0 | 0.0 |
| mobile_wt | 188.0 | 121 | 182.0 | int64 | 0 | 0.0 |
| n_cores | 2.0 | 8 | 4.0 | int64 | 0 | 0.0 |


* Sales DataFrame information:
    * The sales DataFrame, named **train**, contains 2000 rows & 21 columns (20 features plus the `price_range` target).
    * All features have valid, numeric data types.
    * The data contains no missing values.
    * `price_range` is a qualitative (ordinal) variable: its integer codes 0 to 3 stand for low, medium, high and very high cost.
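The observations above can also be checked programmatically rather than by reading the summary table. A minimal sketch, assuming the target column is named `price_range`; `validate_train` is a hypothetical helper that raises `ValueError` when a check fails and otherwise returns the number of rows per price range.

```python
import pandas as pd


def validate_train(df: pd.DataFrame, target: str = 'price_range') -> pd.Series:
    '''Check the observations above; return the number of rows per target class.'''
    # 1. Every column should have a numeric (int or float) dtype
    non_numeric = [c for c in df.columns if not pd.api.types.is_numeric_dtype(df[c])]
    if non_numeric:
        raise ValueError(f'non-numeric columns: {non_numeric}')
    # 2. No missing values anywhere in the frame
    n_missing = int(df.isna().sum().sum())
    if n_missing:
        raise ValueError(f'{n_missing} missing values found')
    # 3. The target may only use the documented codes 0-3
    if not df[target].isin([0, 1, 2, 3]).all():
        raise ValueError(f'{target} contains codes outside 0-3')
    # Class distribution, ordered by price range code
    return df[target].value_counts().sort_index()
```

In the notebook this would be called as `validate_train(df_sales)`; the returned counts also show whether the classes are balanced.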

#### Test Dataframe

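```python
# Apply the explore_data function
explore_data(df_test, num_rows=5)
```

```
test contains 1000 rows & 21 columns
Top 5 rows:
```

| | id | battery_power | blue | clock_speed | dual_sim | fc | four_g | int_memory | m_dep | mobile_wt | n_cores | pc | px_height | px_width | ram | sc_h | sc_w | talk_time | three_g | touch_screen | wifi |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1043 | 1 | 1.8 | 1 | 14 | 0 | 5 | 0.1 | 193 | 3 | 16 | 226 | 1412 | 3476 | 12 | 7 | 2 | 0 | 1 | 0 |
| 1 | 2 | 841 | 1 | 0.5 | 1 | 4 | 1 | 61 | 0.8 | 191 | 5 | 12 | 746 | 857 | 3895 | 6 | 0 | 7 | 1 | 0 | 0 |
| 2 | 3 | 1807 | 1 | 2.8 | 0 | 1 | 0 | 27 | 0.9 | 186 | 3 | 4 | 1270 | 1366 | 2396 | 17 | 10 | 10 | 0 | 1 | 1 |
| 3 | 4 | 1546 | 0 | 0.5 | 1 | 18 | 1 | 25 | 0.5 | 96 | 8 | 20 | 295 | 1752 | 3893 | 10 | 0 | 7 | 1 | 1 | 0 |
| 4 | 5 | 1434 | 0 | 1.4 | 0 | 11 | 1 | 49 | 0.5 | 108 | 6 | 18 | 749 | 810 | 1773 | 15 | 8 | 7 | 1 | 0 | 1 |

```
Last 5 rows:
```

| | id | battery_power | blue | clock_speed | dual_sim | fc | four_g | int_memory | m_dep | mobile_wt | n_cores | pc | px_height | px_width | ram | sc_h | sc_w | talk_time | three_g | touch_screen | wifi |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 995 | 996 | 1700 | 1 | 1.9 | 0 | 0 | 1 | 54 | 0.5 | 170 | 7 | 17 | 644 | 913 | 2121 | 14 | 8 | 15 | 1 | 1 | 0 |
| 996 | 997 | 609 | 0 | 1.8 | 1 | 0 | 0 | 13 | 0.9 | 186 | 4 | 2 | 1152 | 1632 | 1933 | 8 | 1 | 19 | 0 | 1 | 1 |
| 997 | 998 | 1185 | 0 | 1.4 | 0 | 1 | 1 | 8 | 0.5 | 80 | 1 | 12 | 477 | 825 | 1223 | 5 | 0 | 14 | 1 | 0 | 0 |
| 998 | 999 | 1533 | 1 | 0.5 | 1 | 0 | 0 | 50 | 0.4 | 171 | 2 | 12 | 38 | 832 | 2509 | 15 | 11 | 6 | 0 | 1 | 0 |
| 999 | 1000 | 1270 | 1 | 0.5 | 0 | 4 | 1 | 35 | 0.1 | 140 | 6 | 19 | 457 | 608 | 2828 | 9 | 2 | 3 | 1 | 0 | 1 |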


```python
# Apply the check_data function
check_data(df_test)
```

```
test datatypes & missing values information
```

| features | first_val | nunique | top_freq | dtype | cnt_missing | missing_% |
|---|---|---|---|---|---|---|
| id | 1.0 | 1000 | 1.0 | int64 | 0 | 0.0 |
| battery_power | 1043.0 | 721 | 1074.0 | int64 | 0 | 0.0 |
| blue | 1.0 | 2 | 1.0 | int64 | 0 | 0.0 |
| clock_speed | 1.8 | 26 | 0.5 | float64 | 0 | 0.0 |
| dual_sim | 1.0 | 2 | 1.0 | int64 | 0 | 0.0 |
| fc | 14.0 | 20 | 0.0 | int64 | 0 | 0.0 |
| four_g | 0.0 | 2 | 0.0 | int64 | 0 | 0.0 |
| int_memory | 5.0 | 63 | 56.0 | int64 | 0 | 0.0 |
| m_dep | 0.1 | 10 | 0.1 | float64 | 0 | 0.0 |
| mobile_wt | 193.0 | 121 | 83.0 | int64 | 0 | 0.0 |


* Test DataFrame information:
    * The test DataFrame contains 1000 rows & 21 columns: an `id` column plus the 20 features.
    * All features have valid, numeric data types.
    * The test dataset has no `price_range` column; the price range will be predicted later in the project.
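Before predicting on the test set, its columns have to line up with the training features. A minimal sketch, assuming the target is `price_range` and the test identifier is `id`; `compare_columns` and `align_test_features` are hypothetical helpers. Based on the headers printed above, `compare_columns(df_sales, df_test)` should return `(['price_range'], ['id'])`.

```python
import pandas as pd


def compare_columns(train: pd.DataFrame, test: pd.DataFrame) -> tuple:
    '''Return (columns only in train, columns only in test), each sorted.'''
    train_cols, test_cols = set(train.columns), set(test.columns)
    return sorted(train_cols - test_cols), sorted(test_cols - train_cols)


def align_test_features(train: pd.DataFrame, test: pd.DataFrame,
                        target: str = 'price_range', id_col: str = 'id') -> pd.DataFrame:
    '''Drop the id column and order the test columns like the train features.'''
    feature_cols = [c for c in train.columns if c != target]
    # Selecting by list raises KeyError if any training feature is missing from test
    return test.drop(columns=id_col)[feature_cols]
```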

<a id='sql'></a>
## Export Data to SQL

* Only the `train` sales DataFrame is exported.

```python
db_path = "../Database/data_db.db"
```

```python
def export_to_sql(db_path, sql_tbname, save_df):
    '''Insert a DataFrame or Series into a SQLite table.

    Args:
        db_path (str): database path.
        sql_tbname (str): SQL table name.
        save_df (DataFrame or Series): data to insert into SQLite.
    Prints:
        Whether the data was inserted successfully or the insert failed.
    '''
    connection = sqlite3.connect(db_path)
    try:
        print(f"Updating {sql_tbname} table in data_db Database...")
        # if_exists='append' adds rows to an existing table. Tables created by
        # to_sql have no primary key or UNIQUE constraint, so re-running this
        # cell appends duplicate rows instead of raising IntegrityError.
        save_df.to_sql(sql_tbname, con=connection, if_exists='append', index=False)
        print("Data inserted successfully")
    except sqlite3.IntegrityError:
        # Only raised if the table was created with a UNIQUE/PRIMARY KEY constraint
        print("Duplicate Error: Data Inserted before")
    except Exception as e:
        print(f"{type(e).__name__}: {e}")
        print('Failed to insert new data')
    finally:
        connection.close()
```

#### Export df_sales to Features_tb

```python
export_to_sql(db_path, sql_tbname='Features_tb', save_df=df_sales)
```

```
Updating Features_tb table in data_db Database...
Data inserted successfully
```
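Because `to_sql` creates tables without a primary key, `export_to_sql` with `if_exists='append'` adds another full copy of the data each time the cell is re-run. A minimal sketch of an idempotent alternative, assuming it is acceptable to overwrite the whole table: the hypothetical `replace_table` helper uses `if_exists='replace'` and reports the resulting row count, and the caller owns the connection.

```python
import sqlite3
from contextlib import closing

import pandas as pd


def replace_table(conn: sqlite3.Connection, table: str, df: pd.DataFrame) -> int:
    '''Overwrite `table` with `df` and return the table's row count.

    `table` is interpolated into the SQL text, so it must come from trusted
    code, never from user input.
    '''
    # 'replace' drops the table if it exists and recreates it from df,
    # so running this twice still leaves exactly len(df) rows
    df.to_sql(table, con=conn, if_exists='replace', index=False)
    (count,) = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()
    return count


if __name__ == '__main__':
    demo = pd.DataFrame({'ram': [2549, 2631], 'price_range': [1, 2]})
    # closing() guarantees the connection is closed even if an error occurs
    with closing(sqlite3.connect(':memory:')) as conn:
        print(replace_table(conn, 'Features_tb', demo))  # prints 2
        print(replace_table(conn, 'Features_tb', demo))  # re-run: still 2
```

In the notebook, `sqlite3.connect(':memory:')` would be replaced by `sqlite3.connect(db_path)` and `demo` by `df_sales`.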


#### All required data has been successfully exported to the data_db database. A new file will answer the following questions using SQLite from Python.

```python
df_sales.columns
```

```
Index(['battery_power', 'blue', 'clock_speed', 'dual_sim', 'fc', 'four_g',
       'int_memory', 'm_dep', 'mobile_wt', 'n_cores', 'pc', 'px_height',
       'px_width', 'ram', 'sc_h', 'sc_w', 'talk_time', 'three_g',
       'touch_screen', 'wifi', 'price_range'],
      dtype='object')
```

<a id='dq'></a>
## Data Questions
* One of the project's main goals is to answer the following questions:
<ol>
    <li>What are the maximum, minimum & average mobile battery power?</li>
    <li>How many sales are there for mobiles that have Bluetooth?</li>
    <li>Which price range has the most sales?</li>
    <li>What is the average mobile weight per price range?</li>
    <li>Does the number of processor cores affect sales?</li>
    <li>What is the average RAM of mobiles that have a front camera, per price range?</li>
    <li>What are the average battery power & average clock speed of mobiles per price range & Wi-Fi support?</li>
    <li>What are the average RAM, average battery power & number of sales for mobiles that support 4G?</li>
    <li>What are the average px_height & px_width of mobiles that have a front camera?</li>
    <li>How many sales are there for mobiles that have both a touch screen & Wi-Fi?</li>
</ol>
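A sketch of how a few of these questions (1, 3, 4 and 6) could be answered with SQL against `Features_tb`, assuming the table written by `export_to_sql` above. `QUESTION_QUERIES` and `answer_questions` are hypothetical names; "sales" is taken to mean the number of rows, and treating `fc > 0` as "has a front camera" is an assumption.

```python
import sqlite3

import pandas as pd

# Hypothetical query set against the Features_tb table
QUESTION_QUERIES = {
    # Q1: maximum, minimum & average battery power
    'q1_battery_stats': '''
        SELECT MAX(battery_power) AS max_power,
               MIN(battery_power) AS min_power,
               AVG(battery_power) AS avg_power
        FROM Features_tb''',
    # Q3: number of sales (rows) per price range, most sales first; ties are not broken
    'q3_sales_per_range': '''
        SELECT price_range, COUNT(*) AS n_sales
        FROM Features_tb
        GROUP BY price_range
        ORDER BY n_sales DESC''',
    # Q4: average weight per price range
    'q4_weight_per_range': '''
        SELECT price_range, AVG(mobile_wt) AS avg_weight
        FROM Features_tb
        GROUP BY price_range
        ORDER BY price_range''',
    # Q6: average RAM per price range, assuming fc > 0 means "has a front camera"
    'q6_ram_with_front_camera': '''
        SELECT price_range, AVG(ram) AS avg_ram
        FROM Features_tb
        WHERE fc > 0
        GROUP BY price_range
        ORDER BY price_range''',
}


def answer_questions(conn: sqlite3.Connection) -> dict:
    '''Run every query and return {query name: result DataFrame}.'''
    return {name: pd.read_sql_query(sql, conn) for name, sql in QUESTION_QUERIES.items()}
```

For example, `answer_questions(sqlite3.connect(db_path))['q3_sales_per_range']` would return the table for question 3. Keeping the SQL in one dictionary makes it easy to add the remaining questions the same way.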