## Data Description
The dataset describes used and refurbished phones and tablets. Each attribute is defined in the data dictionary below.

**Data Dictionary**

- brand_name: Name of manufacturing brand
- os: OS on which the device runs
- screen_size: Size of the screen in cm
- 4g: Whether 4G is available or not
- 5g: Whether 5G is available or not
- main_camera_mp: Resolution of the rear camera in megapixels
- selfie_camera_mp: Resolution of the front camera in megapixels
- int_memory: Amount of internal memory (ROM) in GB
- ram: Amount of RAM in GB
- battery: Capacity of the device battery in mAh
- weight: Weight of the device in grams
- release_year: Year when the device model was released
- days_used: Number of days the used/refurbished device has been used
- new_price: Price of a new device of the same model in euros
- used_price: Price of the used/refurbished device in euros
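
The dictionary above can double as a quick schema check before any analysis. The snippet below is a minimal sketch: the column names come from the dictionary, while the split into categorical and numeric columns, the `find_missing_columns` helper, and the tiny `example` frame are assumptions made for illustration.

```python
import pandas as pd

# Column names taken from the data dictionary; the grouping into
# categorical and numeric columns is an assumption for illustration.
CATEGORICAL_COLUMNS = ["brand_name", "os", "4g", "5g"]
NUMERIC_COLUMNS = [
    "screen_size", "main_camera_mp", "selfie_camera_mp", "int_memory",
    "ram", "battery", "weight", "release_year", "days_used",
    "new_price", "used_price",
]
EXPECTED_COLUMNS = CATEGORICAL_COLUMNS + NUMERIC_COLUMNS


def find_missing_columns(frame: pd.DataFrame) -> list:
    """Return the expected columns that are absent from `frame` (hypothetical helper)."""
    return [col for col in EXPECTED_COLUMNS if col not in frame.columns]


# Tiny stand-in frame that deliberately lacks most of the columns
example = pd.DataFrame(
    {"brand_name": ["Honor"], "os": ["Android"], "screen_size": [14.5]}
)
missing = find_missing_columns(example)
print(missing)
```

On the real dataset, an empty list from `find_missing_columns(df)` would indicate that every documented column is present.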

### Import Libraries 

In [1]:
## for data preprocessing
import numpy as np
import pandas as pd

## for data visualization
import matplotlib.pyplot as plt
import seaborn as sns


In [4]:
import os
os.getcwd()

'/home/moro/Documents/ml_weekendz_projects/ml_price_prediction_project'

### Load Data

In [5]:
## read dataset
data = pd.read_csv('dataset/used_device_data.csv')

In [6]:
## create a copy of the dataset
df = data.copy()

In [7]:
## let's preview the dataset
df.head()

,brand_name,os,screen_size,4g,5g,main_camera_mp,selfie_camera_mp,int_memory,ram,battery,weight,release_year,days_used,new_price,used_price
0,Honor,Android,14.5,yes,no,13.0,5.0,64.0,3.0,3020.0,146.0,2020,127,111.62,74.26
1,Honor,Android,17.3,yes,yes,13.0,16.0,128.0,8.0,4300.0,213.0,2020,325,249.39,174.53
2,Honor,Android,16.69,yes,yes,13.0,8.0,128.0,8.0,4200.0,213.0,2020,162,359.47,165.85
3,Honor,Android,25.5,yes,yes,13.0,8.0,64.0,6.0,7250.0,480.0,2020,345,278.93,169.93
4,Honor,Android,15.32,yes,no,13.0,8.0,64.0,3.0,5000.0,185.0,2020,293,140.87,80.64


In [10]:
## let's check the number of rows and columns in the data
print(f"The number of rows: {df.shape[0]} -> The number of columns: {df.shape[1]}")

The number of rows: 3454 -> The number of columns: 15


In [11]:
## let's view a random sample of the data
df.sample(n=10, random_state=11)

,brand_name,os,screen_size,4g,5g,main_camera_mp,selfie_camera_mp,int_memory,ram,battery,weight,release_year,days_used,new_price,used_price
3426,Samsung,Android,15.42,yes,no,8.0,10.0,256.0,8.0,3300.0,183.0,2020,355,918.0,237.72
1907,Micromax,Android,12.7,no,no,16.0,8.0,32.0,4.0,2350.0,154.0,2014,888,239.57,90.89
3142,ZTE,Android,10.29,yes,no,8.0,5.0,32.0,4.0,2200.0,154.0,2015,711,68.59,57.62
2484,Samsung,Android,12.83,yes,no,13.0,5.0,16.0,4.0,3000.0,170.0,2017,706,180.62,79.71
2606,Samsung,Android,7.75,no,no,3.0,1.3,16.0,4.0,1300.0,118.2,2013,600,72.31,21.64
3006,XOLO,Android,12.7,no,no,5.0,0.3,16.0,4.0,2500.0,,2015,679,77.26,46.39
2696,Sony,Android,12.7,yes,no,13.0,8.0,16.0,4.0,2300.0,137.4,2016,555,200.55,94.0
14,Honor,Android,14.5,yes,no,13.0,5.0,32.0,2.0,3020.0,146.0,2019,230,88.9,61.03
984,Coolpad,Android,12.83,yes,no,13.0,5.0,64.0,4.0,2800.0,170.0,2016,703,380.6,135.81
2832,Vivo,Android,16.74,yes,no,13.0,16.0,128.0,4.0,4000.0,199.0,2018,583,698.66,125.31


In [12]:
## let's check the column names and data types
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3454 entries, 0 to 3453
Data columns (total 15 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   brand_name        3454 non-null   object 
 1   os                3454 non-null   object 
 2   screen_size       3454 non-null   float64
 3   4g                3454 non-null   object 
 4   5g                3454 non-null   object 
 5   main_camera_mp    3275 non-null   float64
 6   selfie_camera_mp  3452 non-null   float64
 7   int_memory        3450 non-null   float64
 8   ram               3450 non-null   float64
 9   battery           3448 non-null   float64
 10  weight            3447 non-null   float64
 11  release_year      3454 non-null   int64  
 12  days_used         3454 non-null   int64  
 13  new_price         3454 non-null   float64
 14  used_price        3454 non-null   float64
dtypes: float64(9), int64(2), object(4)
memory usage: 404.9+ KB
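
The summary above shows four `object` columns, two of which (`4g` and `5g`) hold yes/no flags. The snippet below is a minimal sketch of one way to give them tighter types. It runs on a small stand-in frame (`sample`) rather than the real data, and the choice of boolean and `category` dtypes is an assumption, not a step this notebook performs.

```python
import pandas as pd

# Small stand-in frame with values like those in the preview rows
sample = pd.DataFrame({
    "brand_name": ["Honor", "Honor", "Samsung"],
    "os": ["Android", "Android", "Android"],
    "4g": ["yes", "yes", "no"],
    "5g": ["no", "yes", "no"],
})

typed = sample.copy()

# True where the flag is "yes"; any other value becomes False
for flag in ["4g", "5g"]:
    typed[flag] = typed[flag].eq("yes")

# Store the repeated text labels as pandas categoricals
for col in ["brand_name", "os"]:
    typed[col] = typed[col].astype("category")

print(typed.dtypes)
```

Boolean flags are easier to filter and aggregate, and categoricals usually take less memory than repeated strings when a column has few distinct values.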


In [14]:
## let's check for the number of duplicated rows
df.duplicated().sum()

0

In [15]:
## let's check for missing values in our dataset
df.isnull().sum()

brand_name            0
os                    0
screen_size           0
4g                    0
5g                    0
main_camera_mp      179
selfie_camera_mp      2
int_memory            4
ram                   4
battery               6
weight                7
release_year          0
days_used             0
new_price             0
used_price            0
dtype: int64
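
Raw counts are easier to judge as a share of the rows, and the gaps eventually need some treatment. The snippet below is a minimal sketch of both steps on a stand-in frame with made-up values; only the column names come from the dataset. Filling with the per-brand median is one possible choice, assumed here for illustration and not necessarily the treatment applied later.

```python
import numpy as np
import pandas as pd

# Stand-in frame with made-up values; only the column names come from the dataset
sample = pd.DataFrame({
    "brand_name": ["Honor", "Honor", "Honor", "Samsung", "Samsung"],
    "main_camera_mp": [13.0, np.nan, 8.0, 8.0, np.nan],
    "weight": [146.0, 213.0, np.nan, 183.0, 170.0],
})

# Percentage of missing values per column
missing_pct = sample.isnull().mean() * 100
print(missing_pct.round(1))

# One possible treatment (an assumption): fill each gap with the
# median of the same column within the same brand
filled = sample.copy()
for col in ["main_camera_mp", "weight"]:
    brand_median = filled.groupby("brand_name")[col].transform("median")
    filled[col] = filled[col].fillna(brand_median)

print(filled)
```

A group-wise median keeps the filled values close to devices of the same brand, but a brand with no observed values in a column would still be left with gaps and would need a fallback such as the overall median.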

### Perform Exploratory Data Analysis