# Using … to predict laptop prices


## Introduction:
Technological devices have become an integral part of modern society. Because computer components evolve constantly, it can be difficult to research which laptop fits an individual's specific needs. Our algorithm will provide price estimates based on the hardware specifications in our dataset. For instance, we could use variables such as CPU and OS to predict the price of a laptop. Ultimately, our goal is to assist users and companies by providing a price estimate for their ideal laptop, thus reducing the time needed for research.

## Methods:
To conduct our preliminary data analysis, we will visualize the distribution of each variable in the dataset using charts built with the altair library.



In [1]:
## For some reason, whenever the jupyter server restarts, it
## sends you back to the old version of altair (4.2.2)

## If the text below says anything below version 5.0.0,
## run the code below
import altair as alt; alt.__version__

'5.1.2'

In [2]:
#pip install -U altair

In [3]:
### Run this cell before continuing.

import altair as alt
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn import set_config
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import GridSearchCV, cross_validate, train_test_split
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor

# Simplify working with large datasets in Altair
alt.data_transformers.disable_max_rows()

# Output dataframes instead of arrays
set_config(transform_output="pandas")

# Function needed to visualize images
# code below sourced from: https://gist.github.com/daviddalpiaz/ae62ae5ccd0bada4b9acd6dbc9008706
def show_digit(arr784):
    plt.imshow(np.array(arr784)[1:].reshape(28, 28), cmap="gray")
    
np.random.seed(1137110237) #Randomly picked seed

In [4]:
# Loading csv file data as a pandas dataframe
laptop_data = pd.read_csv("https://raw.githubusercontent.com/fyip3/ds_project/main/data/laptopData.csv")

# Cleaning data
laptop_data = laptop_data.drop(columns=["Unnamed: 0", "TypeName"])          # Remove redundant columns
laptop_data = laptop_data.dropna()                                          # Drop rows with missing values
# Keep only the numeric part of Ram (raw string avoids an invalid-escape warning; values remain strings)
laptop_data['Ram'] = laptop_data['Ram'].str.extract(r'(\d+)', expand=False)
laptop_data["Price"] = laptop_data["Price"] * 0.017                         # Convert Price from INR to CAD at a fixed rate of 0.017
laptop_data = laptop_data.rename(columns={"Inches": "ScreenSize_Inches", "Ram": "Memory_GB", "Memory" : "Storage_And_Type", "Weight" : "Weight_Kg", "Price" : "Price_CAD"})
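The extraction above leaves `Memory_GB` as text, and `Weight_Kg` still carries its `kg` suffix in the displayed output, so both would need converting before they can serve as numeric predictors. The following is a minimal sketch of one way to do that on a toy frame. The raw `Ram` strings such as `"8GB"` are an assumption about the original file's format, and the weights are copied from the table shown in this notebook.

```python
import pandas as pd

# Toy frame; the "8GB"-style Ram strings are an assumed raw format
toy = pd.DataFrame(
    {
        "Ram": ["8GB", "16GB", "4GB"],
        "Weight": ["1.37kg", "1.83kg", "2kg"],
    }
)

# Pull out the digits and cast to int so the column is numeric rather than text
toy["Memory_GB"] = toy["Ram"].str.extract(r"(\d+)", expand=False).astype(int)

# Pull out an integer or decimal number and cast to float, dropping the "kg" suffix
toy["Weight_Kg"] = toy["Weight"].str.extract(r"(\d+(?:\.\d+)?)", expand=False).astype(float)
```

If the real columns contain entries with no digits at all, `astype` would raise an error on the resulting missing values; `pd.to_numeric(..., errors="coerce")` followed by `dropna()` is a more forgiving alternative.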


In [5]:
laptop_data

Unnamed: 0,Company,ScreenSize_Inches,ScreenResolution,Cpu,Memory_GB,Storage_And_Type,Gpu,OpSys,Weight_Kg,Price_CAD
0,Apple,13.3,IPS Panel Retina Display 2560x1600,Intel Core i5 2.3GHz,8,128GB SSD,Intel Iris Plus Graphics 640,macOS,1.37kg,1213.437614
1,Apple,13.3,1440x900,Intel Core i5 1.8GHz,8,128GB Flash Storage,Intel HD Graphics 6000,macOS,1.34kg,814.223894
2,HP,15.6,Full HD 1920x1080,Intel Core i5 7200U 2.5GHz,8,256GB SSD,Intel HD Graphics 620,No OS,1.86kg,520.812000
3,Apple,15.4,IPS Panel Retina Display 2880x1800,Intel Core i7 2.7GHz,16,512GB SSD,AMD Radeon Pro 455,macOS,1.83kg,2298.320712
4,Apple,13.3,IPS Panel Retina Display 2560x1600,Intel Core i5 3.1GHz,8,256GB SSD,Intel Iris Plus Graphics 650,macOS,1.37kg,1633.628736
...,...,...,...,...,...,...,...,...,...,...
1298,Lenovo,14,IPS Panel Full HD / Touchscreen 1920x1080,Intel Core i7 6500U 2.5GHz,4,128GB SSD,Intel HD Graphics 520,Windows 10,1.8kg,577.874880
1299,Lenovo,13.3,IPS Panel Quad HD+ / Touchscreen 3200x1800,Intel Core i7 6500U 2.5GHz,16,512GB SSD,Intel HD Graphics 520,Windows 10,1.3kg,1357.734240
1300,Lenovo,14,1366x768,Intel Celeron Dual Core N3050 1.6GHz,2,64GB Flash Storage,Intel HD Graphics,Windows 10,1.5kg,207.419040
1301,HP,15.6,1366x768,Intel Core i7 6500U 2.5GHz,6,1TB HDD,AMD Radeon R5 M330,Windows 10,2.19kg,692.000640


In [6]:
laptop_train, laptop_test = train_test_split(
    laptop_data,
    test_size=.25,
)
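The split above relies on the global NumPy seed set earlier, so it is only reproducible when the cells run in order from a fresh kernel. A hedged alternative is to pass `random_state` directly to `train_test_split`, as sketched below on a hypothetical toy frame. The seed value 2023 is arbitrary and not taken from the notebook.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy frame standing in for laptop_data
toy = pd.DataFrame({"Memory_GB": [2, 4, 4, 8, 8, 8, 16, 16], "Price_CAD": range(8)})

# With an explicit random_state, the same rows land in each split on every call
train_a, test_a = train_test_split(toy, test_size=0.25, random_state=2023)
train_b, test_b = train_test_split(toy, test_size=0.25, random_state=2023)

same_split = train_a.index.equals(train_b.index) and test_a.index.equals(test_b.index)
```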

In [7]:
laptop_train.head(50)

Unnamed: 0,Company,ScreenSize_Inches,ScreenResolution,Cpu,Memory_GB,Storage_And_Type,Gpu,OpSys,Weight_Kg,Price_CAD
466,Acer,15.6,1366x768,Intel Core i3 6006U 2GHz,4,500GB HDD,Nvidia GeForce GTX 940MX,Windows 10,2.2kg,424.80144
1224,Dell,15.0,Full HD / Touchscreen 1920x1080,Intel Core i3 7100U 2.4GHz,4,500GB HDD,Intel HD Graphics 620,Windows 10,2.08kg,461.03184
240,Lenovo,15.6,1366x768,Intel Core i3 6006U 2GHz,8,128GB SSD,Intel HD Graphics 520,Windows 10,7.2kg,533.49264
757,HP,15.6,Full HD 1920x1080,Intel Core i7 6700HQ 2.6GHz,8,256GB SSD,Nvidia Quadro M1000M,Windows 7,2.59kg,1413.89136
147,Asus,15.6,Full HD 1920x1080,Intel Celeron Dual Core N3350 1.1GHz,4,1TB HDD,Intel HD Graphics 500,Windows 10,2kg,311.58144
950,HP,15.6,IPS Panel Full HD 1920x1080,Intel Core i7 6820HQ 2.7GHz,8,8GB SSD,Nvidia Quadro M1000M,Windows 10,2.0kg,2037.05424
641,HP,17.0,1600x900,AMD A9-Series 9420 3GHz,8,1TB HDD,AMD Radeon R5,Windows 10,2.6kg,471.810384
473,Dell,13.3,Full HD / Touchscreen 1920x1080,Intel Core i5 8250U 1.6GHz,8,256GB SSD,Intel UHD Graphics 620,Windows 10,1.42kg,1668.274056
1074,Lenovo,12.5,IPS Panel Full HD / Touchscreen 1920x1080,Intel Core i7 6500U 2.5GHz,8,256GB SSD,Intel HD Graphics 520,Windows 10,1.3kg,1552.010702
488,Lenovo,12.5,IPS Panel Full HD 1920x1080,Intel Core i7 7500U 2.7GHz,16,512GB SSD,Intel HD Graphics 620,Windows 10,1.36kg,1628.55648


In [8]:
laptop_brand_avg_price = (
    laptop_train.groupby("Company")["Price_CAD"]  # "Price" was renamed to "Price_CAD" during cleaning
        .mean()
        .reset_index()
        .rename(columns={"Price_CAD": "Average Price"})
)

laptop_brand_plot = alt.Chart(laptop_brand_avg_price).mark_bar().encode(
    x=alt.X("Company").title("Laptop Brand"),
    y=alt.Y("Average Price")
        .title("Average Price of Laptops (CAD)")
)
laptop_brand_plot

alt.Chart(...)

## Expected outcomes and significance:
We expect to determine how accurately the model can predict the price of a laptop from a selection of its hardware specifications. One potential impact of our findings is that companies and laptop sellers would be able to price their products more appropriately and better cater to consumers. Another is greater consumer awareness, as buyers would better understand the true worth of the items they are purchasing. A question that could arise is whether the price of a laptop should depend only on its specifications, or whether other factors such as brand, location, and aesthetic features affect, or should affect, the price.
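To make "accuracy" concrete, the sketch below shows one plausible way to measure it, assuming a K-nearest-neighbours regressor (which the notebook imports but does not yet use) and root mean squared prediction error on the held-out test set as the metric. The data are synthetic, generated only for illustration, and the choices of `n_neighbors=5` and the two predictors are assumptions rather than the project's final design.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data for illustration only; not drawn from the laptop dataset
rng = np.random.default_rng(0)
n = 200
toy = pd.DataFrame(
    {
        "Memory_GB": rng.choice([4, 8, 16, 32], size=n),
        "ScreenSize_Inches": rng.choice([13.3, 14.0, 15.6, 17.3], size=n),
    }
)
toy["Price_CAD"] = 300 + 60 * toy["Memory_GB"] + rng.normal(0, 100, size=n)

X_train, X_test, y_train, y_test = train_test_split(
    toy[["Memory_GB", "ScreenSize_Inches"]],
    toy["Price_CAD"],
    test_size=0.25,
    random_state=0,
)

# Standardize predictors so no single feature dominates the distance calculation
model = make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=5))
model.fit(X_train, y_train)

# Root mean squared prediction error on unseen data, in the same units as the price (CAD)
predictions = model.predict(X_test)
rmspe = float(np.sqrt(mean_squared_error(y_test, predictions)))
```

Because the error is expressed in dollars, it can be compared directly against typical laptop prices to judge whether the estimates are useful in practice.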
