# Using Specs to Predict Laptop Prices

## Introduction
Technological devices have become an integral part of modern society. Because computer components evolve constantly, it can be difficult to research which laptop fits an individual's specific needs. Our algorithm will provide price estimates based on the hardware specifications in our dataset. For instance, we could use variables such as CPU speed and RAM size to predict the price of a laptop. Ultimately, our goal is to assist users and companies by providing a price estimate for their ideal laptop, reducing the time needed for research. Our predictive question is therefore **"What will be the price of a laptop based on its specifications?"** The dataset we will be using is an open-source file from Kaggle. Link to the original dataset: https://www.kaggle.com/datasets/ehtishamsadiq/uncleaned-laptop-price-dataset/data

## Methods
To conduct our preliminary data analysis, we will first clean the dataset and wrangle its columns into a usable dataframe. We will then visualize the distribution of each variable using charts built with the Altair library, and use those visualizations to decide how to structure our predictive model.

In [1]:
### Uncomment cell below whenever Altair stops working to reinstall latest version

## For some reason, whenever the jupyter server restarts, it
## sends you back to the old version of altair (4.2.2)

In [2]:
#pip install -U altair

In [3]:
## If the text below says anything below version 5.0.0,
## run the code above
import altair as alt; alt.__version__

'5.1.2'

In [4]:
### Run this cell before continuing.

import altair as alt
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn import set_config
from sklearn.model_selection import train_test_split


# Simplify working with large datasets in Altair
alt.data_transformers.disable_max_rows()

# Output dataframes instead of arrays
set_config(transform_output="pandas")
    
np.random.seed(1137110237) #Randomly picked seed

In [5]:
# Loading csv file data as a pandas dataframe
laptop_data = pd.read_csv("https://raw.githubusercontent.com/fyip3/ds_project/main/data/laptopData.csv")

# Cleaning data
laptop_data = laptop_data.drop(columns=["Unnamed: 0"])          # Drop the leftover index column
laptop_data = laptop_data.dropna()                              # Remove rows with missing values
# Strip the non-numeric parts (units) so the columns can be converted to numbers
laptop_data['Ram'] = laptop_data['Ram'].str.extract(r'(\d+)', expand=False)
laptop_data['Weight'] = laptop_data['Weight'].str.removesuffix("kg")
laptop_data['Memory'] = laptop_data['Memory'].str.extract(r'(\d+)', expand=False)  # Keeps only the leading number; its unit (GB or TB) is discarded
laptop_data["Price"] = laptop_data["Price"] * 0.017                         # Convert Price from INR to CAD
laptop_data = laptop_data.rename(columns={"Inches": "ScreenSize_Inches", "Ram": "Memory_GB", "Memory" : "Storage", "Weight" : "Weight_Kg", "Price" : "Price_CAD"})
# Convert columns from strings to int/float
laptop_data["Memory_GB"] = pd.to_numeric(laptop_data.Memory_GB, errors='coerce')
laptop_data["Weight_Kg"] = pd.to_numeric(laptop_data.Weight_Kg, errors='coerce')
laptop_data["ScreenSize_Inches"] = pd.to_numeric(laptop_data.ScreenSize_Inches, errors='coerce')
laptop_data["Storage"] = pd.to_numeric(laptop_data.Storage, errors='coerce')
laptop_data.dtypes
count = laptop_data.nunique()
count

Company               19
TypeName               6
ScreenSize_Inches     24
ScreenResolution      40
Cpu                  118
Memory_GB             10
Storage               13
Gpu                  110
OpSys                  9
Weight_Kg            180
Price_CAD            777
dtype: int64
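Because the cleaning step keeps only the leading digits of the `Memory` column, a value such as `"1TB HDD"` would end up as `1` and sit on the same scale as gigabyte values. A minimal sketch of a unit-aware alternative is shown below. The helper name `storage_to_gb`, the example strings, and the choice of 1 TB = 1000 GB are assumptions made for illustration, not part of the original notebook.

```python
import pandas as pd


def storage_to_gb(memory: pd.Series) -> pd.Series:
    """Return the first capacity found in each string, expressed in GB.

    Hypothetical helper: assumes capacities are written as a number
    followed by either "GB" or "TB".
    """
    # Capture the number and its unit as two separate columns (0 and 1)
    parts = memory.str.extract(r"(\d+(?:\.\d+)?)\s*(GB|TB)", expand=True)
    size = pd.to_numeric(parts[0], errors="coerce")
    # Assumed conversion factor: 1 TB = 1000 GB
    factor = parts[1].map({"GB": 1, "TB": 1000})
    return size * factor


# Made-up example strings in the style of the Memory column
example = pd.Series(["128GB SSD", "1TB HDD", "256GB SSD +  1TB HDD"])
print(storage_to_gb(example).tolist())
```

Only the first drive in a combined entry is converted here. Whether to sum multiple drives instead is a separate modelling decision.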

In [6]:
laptop_data

| | Company | TypeName | ScreenSize_Inches | ScreenResolution | Cpu | Memory_GB | Storage | Gpu | OpSys | Weight_Kg | Price_CAD |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Apple | Ultrabook | 13.3 | IPS Panel Retina Display 2560x1600 | Intel Core i5 2.3GHz | 8 | 128.0 | Intel Iris Plus Graphics 640 | macOS | 1.37 | 1213.437614 |
| 1 | Apple | Ultrabook | 13.3 | 1440x900 | Intel Core i5 1.8GHz | 8 | 128.0 | Intel HD Graphics 6000 | macOS | 1.34 | 814.223894 |
| 2 | HP | Notebook | 15.6 | Full HD 1920x1080 | Intel Core i5 7200U 2.5GHz | 8 | 256.0 | Intel HD Graphics 620 | No OS | 1.86 | 520.812000 |
| 3 | Apple | Ultrabook | 15.4 | IPS Panel Retina Display 2880x1800 | Intel Core i7 2.7GHz | 16 | 512.0 | AMD Radeon Pro 455 | macOS | 1.83 | 2298.320712 |
| 4 | Apple | Ultrabook | 13.3 | IPS Panel Retina Display 2560x1600 | Intel Core i5 3.1GHz | 8 | 256.0 | Intel Iris Plus Graphics 650 | macOS | 1.37 | 1633.628736 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1298 | Lenovo | 2 in 1 Convertible | 14.0 | IPS Panel Full HD / Touchscreen 1920x1080 | Intel Core i7 6500U 2.5GHz | 4 | 128.0 | Intel HD Graphics 520 | Windows 10 | 1.80 | 577.874880 |
| 1299 | Lenovo | 2 in 1 Convertible | 13.3 | IPS Panel Quad HD+ / Touchscreen 3200x1800 | Intel Core i7 6500U 2.5GHz | 16 | 512.0 | Intel HD Graphics 520 | Windows 10 | 1.30 | 1357.734240 |
| 1300 | Lenovo | Notebook | 14.0 | 1366x768 | Intel Celeron Dual Core N3050 1.6GHz | 2 | 64.0 | Intel HD Graphics | Windows 10 | 1.50 | 207.419040 |
| 1301 | HP | Notebook | 15.6 | 1366x768 | Intel Core i7 6500U 2.5GHz | 6 | 1.0 | AMD Radeon R5 M330 | Windows 10 | 2.19 | 692.000640 |


In [7]:
laptop_train, laptop_test = train_test_split(
    laptop_data,
    test_size=.25,
)
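The split above relies on the global NumPy seed set in an earlier cell, so its result depends on which cells have run since the seed was set. As a hedged alternative, `train_test_split` also accepts a `random_state` argument, which ties reproducibility to the call itself. The sketch below uses a made-up toy dataframe and reuses the notebook's seed value purely as an example.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up toy data standing in for laptop_data
toy = pd.DataFrame({
    "Memory_GB": [4, 8, 8, 16, 4, 8, 16, 32],
    "Price_CAD": [400, 700, 750, 1500, 380, 690, 1400, 2600],
})

# Passing random_state makes each call reproducible on its own,
# independent of the global NumPy random state
first_train, first_test = train_test_split(
    toy, test_size=0.25, random_state=1137110237
)
second_train, second_test = train_test_split(
    toy, test_size=0.25, random_state=1137110237
)

# Both calls select the same rows for the test set
print(first_test.index.equals(second_test.index))
```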

In [8]:
laptop_train.head(10)

| | Company | TypeName | ScreenSize_Inches | ScreenResolution | Cpu | Memory_GB | Storage | Gpu | OpSys | Weight_Kg | Price_CAD |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 466 | Acer | Notebook | 15.6 | 1366x768 | Intel Core i3 6006U 2GHz | 4 | 500.0 | Nvidia GeForce GTX 940MX | Windows 10 | 2.2 | 424.80144 |
| 1224 | Dell | 2 in 1 Convertible | 15.0 | Full HD / Touchscreen 1920x1080 | Intel Core i3 7100U 2.4GHz | 4 | 500.0 | Intel HD Graphics 620 | Windows 10 | 2.08 | 461.03184 |
| 240 | Lenovo | Notebook | 15.6 | 1366x768 | Intel Core i3 6006U 2GHz | 8 | 128.0 | Intel HD Graphics 520 | Windows 10 | 7.2 | 533.49264 |
| 757 | HP | Workstation | 15.6 | Full HD 1920x1080 | Intel Core i7 6700HQ 2.6GHz | 8 | 256.0 | Nvidia Quadro M1000M | Windows 7 | 2.59 | 1413.89136 |
| 147 | Asus | Notebook | 15.6 | Full HD 1920x1080 | Intel Celeron Dual Core N3350 1.1GHz | 4 | 1.0 | Intel HD Graphics 500 | Windows 10 | 2.0 | 311.58144 |
| 950 | HP | Workstation | 15.6 | IPS Panel Full HD 1920x1080 | Intel Core i7 6820HQ 2.7GHz | 8 | 8.0 | Nvidia Quadro M1000M | Windows 10 | 2.0 | 2037.05424 |
| 641 | HP | Notebook | 17.0 | 1600x900 | AMD A9-Series 9420 3GHz | 8 | 1.0 | AMD Radeon R5 | Windows 10 | 2.6 | 471.810384 |
| 473 | Dell | Ultrabook | 13.3 | Full HD / Touchscreen 1920x1080 | Intel Core i5 8250U 1.6GHz | 8 | 256.0 | Intel UHD Graphics 620 | Windows 10 | 1.42 | 1668.274056 |
| 1074 | Lenovo | Ultrabook | 12.5 | IPS Panel Full HD / Touchscreen 1920x1080 | Intel Core i7 6500U 2.5GHz | 8 | 256.0 | Intel HD Graphics 520 | Windows 10 | 1.3 | 1552.010702 |
| 488 | Lenovo | Ultrabook | 12.5 | IPS Panel Full HD 1920x1080 | Intel Core i7 7500U 2.7GHz | 16 | 512.0 | Intel HD Graphics 620 | Windows 10 | 1.36 | 1628.55648 |


In [9]:
laptop_brand_avg_price = (
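    # Select the price column explicitly; the first positional argument of .mean() is numeric_only, not a column list
    laptop_train.groupby("Company")["Price_CAD"]
        .mean()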
        .reset_index()
        .rename(columns = {"Price_CAD" : "Average Price"})
)

laptop_brand_plot = alt.Chart(laptop_brand_avg_price).mark_bar().encode(
    x=alt.X("Company")
        .title("Laptop Brand"),
    y=alt.Y("Average Price")
        .title("Average Price of Laptops"),
    color=alt.Color("Company")
            .scale(scheme="category20b")
).configure_axisX(labelAngle=-45)
laptop_brand_plot
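The same group-and-average step is repeated for each column that is plotted. One possible way to factor it out is a small helper, sketched below. The name `average_price_by` is hypothetical and not part of the original notebook, and the toy dataframe is made up for illustration.

```python
import pandas as pd


def average_price_by(df: pd.DataFrame, column: str,
                     price_col: str = "Price_CAD") -> pd.DataFrame:
    """Return one row per value of `column` with its mean price.

    Hypothetical helper; the output has two columns:
    `column` and "Average Price".
    """
    return (
        df.groupby(column, as_index=False)[price_col]
        .mean()
        .rename(columns={price_col: "Average Price"})
    )


# Made-up toy data standing in for laptop_train
toy = pd.DataFrame({
    "Company": ["A", "A", "B"],
    "Price_CAD": [100.0, 300.0, 500.0],
})
print(average_price_by(toy, "Company"))
```

With a helper of this kind, each plotting cell would only need to name the grouping column, for example `average_price_by(laptop_train, "TypeName")`.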

In [10]:
laptop_type_avg_price = (
    laptop_train.groupby("TypeName")["Price_CAD"]
        .mean()
        .reset_index()
        .rename(columns = {"Price_CAD" : "Average Price"})
)

laptop_type_plot = alt.Chart(laptop_type_avg_price).mark_bar().encode(
    x=alt.X("TypeName")
        .title("Laptop type"),
    y=alt.Y("Average Price")
        .title("Average Price of Laptops"),
    color=alt.Color("TypeName")
            .scale(scheme="category20b")
).configure_axisX(labelAngle=-45)
laptop_type_plot

In [11]:
laptop_screen_size_avg_price = (
    laptop_train.groupby("ScreenSize_Inches")["Price_CAD"]
        .mean()
        .reset_index()
        .rename(columns = {"Price_CAD" : "Average Price"})
)

laptop_screen_size_plot = alt.Chart(laptop_screen_size_avg_price).mark_point().encode(
    x=alt.X("ScreenSize_Inches")
        .title("Screen Size in inches").scale(zero=False),
    y=alt.Y("Average Price")
        .title("Average Price of Laptops"),
    color=alt.Color("ScreenSize_Inches")
            .scale(scheme="category20b")
)
laptop_screen_size_plot

In [12]:
laptop_resolution_avg_price = (
    laptop_train.groupby("ScreenResolution")["Price_CAD"]
        .mean()
        .reset_index()
        .rename(columns = {"Price_CAD" : "Average Price"})
)

laptop_resolution_plot = alt.Chart(laptop_resolution_avg_price).mark_bar().encode(
    x=alt.X("ScreenResolution")
        .title("Screen Resolution"),
    y=alt.Y("Average Price")
        .title("Average Price of Laptops"),
    color=alt.Color("ScreenResolution")
            .scale(scheme="category20b")
).configure_axisX(labelAngle=-45)
laptop_resolution_plot

In [13]:
laptop_cpu_avg_price = (
    laptop_train.groupby("Cpu")["Price_CAD"]
        .mean()
        .reset_index()
        .rename(columns = {"Price_CAD" : "Average Price"})
)

laptop_cpu_plot = alt.Chart(laptop_cpu_avg_price).mark_bar().encode(
    x=alt.X("Cpu")
        .title("Processor"),
    y=alt.Y("Average Price")
        .title("Average Price of Laptops"),
    color=alt.Color("Cpu")
            .scale(scheme="category20b")
).configure_axisX(labelAngle=-45)
laptop_cpu_plot
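The introduction mentions CPU speed as a possible predictor, but the `Cpu` column stores the clock speed inside a free-text model string, which is why it has so many distinct values. A minimal sketch of pulling the speed out as a number might look like the following. The helper name `cpu_speed_ghz` is hypothetical, and the pattern assumes every speed is written with a `GHz` suffix, as in the rows displayed earlier.

```python
import pandas as pd


def cpu_speed_ghz(cpu: pd.Series) -> pd.Series:
    """Extract the clock speed (the number directly before "GHz") as a float.

    Hypothetical helper; strings without a "GHz" value become NaN.
    """
    speed = cpu.str.extract(r"(\d+(?:\.\d+)?)\s*GHz", expand=False)
    return pd.to_numeric(speed, errors="coerce")


# Example strings copied from the Cpu column shown earlier
example = pd.Series([
    "Intel Core i5 2.3GHz",
    "Intel Core i3 6006U 2GHz",
    "AMD A9-Series 9420 3GHz",
])
print(cpu_speed_ghz(example).tolist())
```

A numeric speed column could then be plotted against price with a scatter plot instead of a bar chart with one bar per processor model.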

In [20]:
laptop_ram_avg_price = (
    laptop_train.groupby("Memory_GB")["Price_CAD"]
        .mean()
        .reset_index()
        .rename(columns = {"Price_CAD" : "Average Price"})
)

laptop_ram_plot = alt.Chart(laptop_ram_avg_price).mark_line().encode(
    x=alt.X("Memory_GB")
        .title("Installed Memory (GB)"),
    y=alt.Y("Average Price")
        .title("Average Price of Laptops"),
    # color=alt.Color("Memory_GB")
    #         .scale(scheme="category20b")
)
laptop_ram_plot

In [22]:
laptop_storage_avg_price = (
    laptop_train.groupby("Storage")["Price_CAD"]
        .mean()
        .reset_index()
        .rename(columns = {"Price_CAD" : "Average Price"})
)

laptop_storage_plot = alt.Chart(laptop_storage_avg_price).mark_point().encode(
    x=alt.X("Storage")
        .title("Storage Capacity"),
    y=alt.Y("Average Price")
        .title("Average Price of Laptops"),
    color=alt.Color("Storage")
            .scale(scheme="category20b")
)
laptop_storage_plot

In [16]:
laptop_gpu_avg_price = (
    laptop_train.groupby("Gpu")["Price_CAD"]
        .mean()
        .reset_index()
        .rename(columns = {"Price_CAD" : "Average Price"})
)

laptop_gpu_plot = alt.Chart(laptop_gpu_avg_price).mark_bar().encode(
    x=alt.X("Gpu")
        .title("Graphics Card"),
    y=alt.Y("Average Price")
        .title("Average Price of Laptops"),
    color=alt.Color("Gpu")
            .scale(scheme="category20b")
).configure_axisX(labelAngle=-45)
laptop_gpu_plot

In [17]:
laptop_os_avg_price = (
    laptop_train.groupby("OpSys")["Price_CAD"]
        .mean()
        .reset_index()
        .rename(columns = {"Price_CAD" : "Average Price"})
)

laptop_os_plot = alt.Chart(laptop_os_avg_price).mark_bar().encode(
    x=alt.X("OpSys")
        .title("Operating System"),
    y=alt.Y("Average Price")
        .title("Average Price of Laptops"),
    color=alt.Color("OpSys")
            .scale(scheme="category20b")
).configure_axisX(labelAngle=-45)
laptop_os_plot

In [18]:
# Use the training split, like the other exploratory plots, so the test set stays unseen
laptop_screen_size_plot_not_avg = alt.Chart(laptop_train).mark_point(opacity = 0.3).encode(
    x=alt.X("ScreenSize_Inches")
        .title("Screen Size in inches").scale(zero=False),
    y=alt.Y("Price_CAD")
        .title("Price of Laptops"),
   # color=alt.Color("ScreenSize_Inches")
).facet('TypeName')
laptop_screen_size_plot_not_avg

## Expected Outcomes and Significance

We expect to determine how accurately the model can predict the price of a laptop from a selection of its hardware specifications. One impact of this model could be as a tool for laptop manufacturers and retail companies to price their products more appropriately and cater to consumer expectations. If the model turns out to be accurate, a major use case could be customers using it to set expectations of how much they would need to spend for their desired specifications. A question that could arise is whether the price of a laptop should depend only on its specifications, and whether other factors such as brand, location, and aesthetic features affect, or should affect, price.
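To make the idea of measuring prediction accuracy concrete, the sketch below fits a regressor on synthetic data and reports the root mean squared prediction error (RMSPE) on a held-out test set. The proposal does not say which model will be used, so the k-nearest-neighbours regressor, the two chosen predictors, the value `n_neighbors=5`, and the synthetic data are all assumptions made purely for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the cleaned laptop data (not real prices)
rng = np.random.default_rng(0)
n = 200
toy = pd.DataFrame({
    "Memory_GB": rng.choice([4, 8, 16, 32], size=n),
    "Weight_Kg": rng.uniform(1.0, 3.0, size=n),
})
toy["Price_CAD"] = (
    60 * toy["Memory_GB"] + 100 * toy["Weight_Kg"] + rng.normal(0, 50, size=n)
)

toy_train, toy_test = train_test_split(toy, test_size=0.25, random_state=0)
features = ["Memory_GB", "Weight_Kg"]

# Standardize the predictors so neither dominates the distance calculation,
# then fit an (assumed) k-NN regressor on the training split only
model = make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=5))
model.fit(toy_train[features], toy_train["Price_CAD"])

# RMSPE: typical size of the prediction error on unseen laptops, in CAD
predictions = model.predict(toy_test[features])
rmspe = float(np.sqrt(np.mean((toy_test["Price_CAD"] - predictions) ** 2)))
print(f"RMSPE on toy test set: {rmspe:.2f} CAD")
```

An error expressed in the same units as the price is easy to compare against typical laptop prices when judging whether the model is accurate enough to be useful.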