# Regression Model

## Introduction
Laptops have become an integral part of everyday life, but computer components evolve quickly, which makes it difficult to research which laptop fits an individual's specific needs. In this project, we estimate laptop prices from the hardware specifications provided in our dataset. For instance, we could use variables such as CPU speed and RAM size to predict the price of a laptop. Our goal is to assist users and companies by providing a price estimate for their ideal laptop, reducing the time needed for research. Our predictive question is therefore **"What will be the price of a laptop based on its specifications?"** The dataset we will be using is an open-source file from Kaggle. Link to the original dataset: https://www.kaggle.com/datasets/ehtishamsadiq/uncleaned-laptop-price-dataset/data

## Preparing Libraries and Setup

Python does not natively support everything we need to create this model, and it would be inefficient to reinvent the wheel. As such, we will be using the following libraries in our code:
#### **Pandas**
##### We use the DataFrame object from pandas to store and manipulate our data

#### **Altair**
##### Visualization library that allows us to make charts to help see the data

#### **Numpy**
##### Math library. We use it to set the random seed and to convert numeric types. It is also used internally by pandas in many operations

#### **Scikit-learn**
##### Machine learning library. This will be the backbone of our regression model. We use it to build and train the model on our data, and to test the results.
 
#### **Matplotlib**
##### (insert something here)



In [1]:
### Uncomment cell below whenever Altair stops working to reinstall latest version

## For some reason, whenever the jupyter server restarts, it
## sends you back to the old version of altair (4.2.2)

In [2]:
# pip install -U altair      #<---- UNCOMMENT

In [3]:
## If the text below says anything below version 5.0.0,
## run the code above
import altair as alt; alt.__version__

'5.2.0'

In [8]:
### Run this cell before continuing.

import altair as alt
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn import set_config
from sklearn.model_selection import train_test_split


# Simplify working with large datasets in Altair
alt.data_transformers.disable_max_rows()

# Output dataframes instead of arrays
set_config(transform_output="pandas")
    
np.random.seed(1137110237)  # Arbitrarily chosen seed, set so that results are reproducible

### Importing and Wrangling Data

(INSERT DESCRIPTION)
<br> Link to the original dataset: https://www.kaggle.com/datasets/ehtishamsadiq/uncleaned-laptop-price-dataset/data

In [9]:
# Loading csv file data as a pandas dataframe
laptop_data = pd.read_csv("https://raw.githubusercontent.com/fyip3/ds_project/main/data/laptopData.csv")

# Cleaning data
laptop_data = laptop_data.drop(columns=["Unnamed: 0"])          # Drop the unneeded "Unnamed: 0" column
laptop_data = laptop_data.dropna()                              # Drop rows with missing values
# Remove the redundant non-numeric parts (such as units) from the values
laptop_data['Ram'] = laptop_data['Ram'].str.extract(r'(\d+)', expand=False)
laptop_data['Weight'] = laptop_data['Weight'].str.removesuffix("kg")
laptop_data['Memory'] = laptop_data['Memory'].str.extract(r'(\d+)', expand=False)
laptop_data["Price"] = laptop_data["Price"] * 0.017                         # Convert Price from INR to CAD
laptop_data = laptop_data.rename(columns={"Inches": "ScreenSize_Inches", "Ram": "Memory_GB", "Memory" : "Storage", "Weight" : "Weight_Kg", "Price" : "Price_CAD"})
# Convert columns from strings to int/float
laptop_data["Memory_GB"] = pd.to_numeric(laptop_data.Memory_GB, errors='coerce')
laptop_data["Weight_Kg"] = pd.to_numeric(laptop_data.Weight_Kg, errors='coerce')
laptop_data["ScreenSize_Inches"] = pd.to_numeric(laptop_data.ScreenSize_Inches, errors='coerce')
laptop_data["Storage"] = pd.to_numeric(laptop_data.Storage, errors='coerce')
laptop_data.dtypes
count = laptop_data.nunique()  # Number of unique values for each variable
count

Company               19
TypeName               6
ScreenSize_Inches     24
ScreenResolution      40
Cpu                  118
Memory_GB             10
Storage               13
Gpu                  110
OpSys                  9
Weight_Kg            180
Price_CAD            777
dtype: int64
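
Some rows in the cleaned data end up with a `Storage` value of `1.0`, which suggests that terabyte entries lose their unit when only the first number is extracted. The sketch below shows one way the unit could be kept and converted to gigabytes. The sample strings are hypothetical (the real `Memory` column may use other formats), and treating 1 TB as 1000 GB is an assumption.

```python
import pandas as pd

# Hypothetical raw "Memory" strings; the real column may be formatted differently.
raw_memory = pd.Series(["128GB SSD", "1TB HDD", "256GB Flash Storage", "2TB HDD"])

# Capture the first number together with the unit that follows it.
parts = raw_memory.str.extract(r"(?P<size>\d+(?:\.\d+)?)\s*(?P<unit>GB|TB)")

# Convert the captured number to a numeric type.
size = pd.to_numeric(parts["size"], errors="coerce")

# Keep gigabyte values as they are and scale terabyte values to gigabytes
# (assumption: 1 TB = 1000 GB).
storage_gb = size.where(parts["unit"] != "TB", size * 1000)

print(storage_gb.tolist())
```

With a conversion like this, a 1 TB drive would no longer be treated as smaller than a 128 GB one.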

In [10]:
laptop_data

| | Company | TypeName | ScreenSize_Inches | ScreenResolution | Cpu | Memory_GB | Storage | Gpu | OpSys | Weight_Kg | Price_CAD |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Apple | Ultrabook | 13.3 | IPS Panel Retina Display 2560x1600 | Intel Core i5 2.3GHz | 8 | 128.0 | Intel Iris Plus Graphics 640 | macOS | 1.37 | 1213.437614 |
| 1 | Apple | Ultrabook | 13.3 | 1440x900 | Intel Core i5 1.8GHz | 8 | 128.0 | Intel HD Graphics 6000 | macOS | 1.34 | 814.223894 |
| 2 | HP | Notebook | 15.6 | Full HD 1920x1080 | Intel Core i5 7200U 2.5GHz | 8 | 256.0 | Intel HD Graphics 620 | No OS | 1.86 | 520.812000 |
| 3 | Apple | Ultrabook | 15.4 | IPS Panel Retina Display 2880x1800 | Intel Core i7 2.7GHz | 16 | 512.0 | AMD Radeon Pro 455 | macOS | 1.83 | 2298.320712 |
| 4 | Apple | Ultrabook | 13.3 | IPS Panel Retina Display 2560x1600 | Intel Core i5 3.1GHz | 8 | 256.0 | Intel Iris Plus Graphics 650 | macOS | 1.37 | 1633.628736 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1298 | Lenovo | 2 in 1 Convertible | 14.0 | IPS Panel Full HD / Touchscreen 1920x1080 | Intel Core i7 6500U 2.5GHz | 4 | 128.0 | Intel HD Graphics 520 | Windows 10 | 1.80 | 577.874880 |
| 1299 | Lenovo | 2 in 1 Convertible | 13.3 | IPS Panel Quad HD+ / Touchscreen 3200x1800 | Intel Core i7 6500U 2.5GHz | 16 | 512.0 | Intel HD Graphics 520 | Windows 10 | 1.30 | 1357.734240 |
| 1300 | Lenovo | Notebook | 14.0 | 1366x768 | Intel Celeron Dual Core N3050 1.6GHz | 2 | 64.0 | Intel HD Graphics | Windows 10 | 1.50 | 207.419040 |
| 1301 | HP | Notebook | 15.6 | 1366x768 | Intel Core i7 6500U 2.5GHz | 6 | 1.0 | AMD Radeon R5 M330 | Windows 10 | 2.19 | 692.000640 |
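
The introduction mentions CPU speed as a possible predictor, but the `Cpu` column stores it inside a longer string. A minimal sketch of one way to pull it out, assuming the clock speed always appears as a number directly followed by `GHz`; the name `cpu_speed_ghz` is hypothetical and not part of the dataset.

```python
import pandas as pd

# Sample Cpu strings in the format shown in this notebook's output.
cpu = pd.Series([
    "Intel Core i5 2.3GHz",
    "Intel Core i5 7200U 2.5GHz",
    "Intel Celeron Dual Core N3050 1.6GHz",
    "Intel Core i3 6006U 2GHz",
])

# Capture the integer or decimal number that sits right before "GHz",
# then convert it from text to a float.
cpu_speed_ghz = pd.to_numeric(
    cpu.str.extract(r"(\d+(?:\.\d+)?)\s*GHz", expand=False),
    errors="coerce",
)

print(cpu_speed_ghz.tolist())
```

Anchoring the pattern on `GHz` keeps model numbers such as `7200U` or `N3050` from being mistaken for the clock speed.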


### Separation Into Train and Test Data

We split the data into training data, which will be used to develop the model, and test data, which will be used to gauge the accuracy of the model. The model will not see the test data until it is time to evaluate how well it performs.

In [11]:
laptop_train, laptop_test = train_test_split(
    laptop_data,
    test_size=.25,   # Hold out a quarter of the full data set for testing; train on the rest
)

In [13]:
laptop_train.head(5)

| | Company | TypeName | ScreenSize_Inches | ScreenResolution | Cpu | Memory_GB | Storage | Gpu | OpSys | Weight_Kg | Price_CAD |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 466 | Acer | Notebook | 15.6 | 1366x768 | Intel Core i3 6006U 2GHz | 4 | 500.0 | Nvidia GeForce GTX 940MX | Windows 10 | 2.2 | 424.80144 |
| 1224 | Dell | 2 in 1 Convertible | 15.0 | Full HD / Touchscreen 1920x1080 | Intel Core i3 7100U 2.4GHz | 4 | 500.0 | Intel HD Graphics 620 | Windows 10 | 2.08 | 461.03184 |
| 240 | Lenovo | Notebook | 15.6 | 1366x768 | Intel Core i3 6006U 2GHz | 8 | 128.0 | Intel HD Graphics 520 | Windows 10 | 7.2 | 533.49264 |
| 757 | HP | Workstation | 15.6 | Full HD 1920x1080 | Intel Core i7 6700HQ 2.6GHz | 8 | 256.0 | Nvidia Quadro M1000M | Windows 7 | 2.59 | 1413.89136 |
| 147 | Asus | Notebook | 15.6 | Full HD 1920x1080 | Intel Celeron Dual Core N3350 1.1GHz | 4 | 1.0 | Intel HD Graphics 500 | Windows 10 | 2.0 | 311.58144 |
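
A quick sanity check on a split like the one above is to confirm the sizes and verify that no row lands in both sets. The sketch below does this on a made-up 100-row frame rather than the laptop data. It also passes `random_state` directly, which is an alternative to relying on the global NumPy seed and does not depend on the order in which cells are run.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up stand-in data: 100 rows with two of the notebook's column names.
toy = pd.DataFrame({"Memory_GB": range(100), "Price_CAD": range(100)})

# Same 75/25 proportion as above, with the seed passed explicitly.
toy_train, toy_test = train_test_split(toy, test_size=0.25, random_state=0)

# Row labels that appear in both sets; this should be empty.
overlap = toy_train.index.intersection(toy_test.index)

print(len(toy_train), len(toy_test), len(overlap))
```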


## Building the Model

(DESCRIPTION)
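
As one possible starting point, and not necessarily the model this project will end up using, the sketch below fits a scikit-learn `LinearRegression` on two numeric predictors. The data is synthetic: the prices are generated from a made-up linear rule plus noise, so the fitted coefficients can be compared with known values. The choice of linear regression and of `Memory_GB` and `Weight_Kg` as predictors are both assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the cleaned laptop data; every value is made up.
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "Memory_GB": rng.choice([4, 8, 16, 32], size=200),
    "Weight_Kg": rng.uniform(1.0, 3.0, size=200),
})

# Generate prices from a known linear rule plus random noise.
toy["Price_CAD"] = (
    200 + 60 * toy["Memory_GB"] - 50 * toy["Weight_Kg"]
    + rng.normal(0, 20, size=200)
)

# Fit on the training portion only; the test portion stays untouched.
toy_train, toy_test = train_test_split(toy, test_size=0.25, random_state=0)
features = ["Memory_GB", "Weight_Kg"]
model = LinearRegression().fit(toy_train[features], toy_train["Price_CAD"])

# The coefficients should land near the 60 and -50 used to generate the data.
print(dict(zip(features, model.coef_.round(1))))
```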

## Putting the Model to the Test

(DESCRIPTION)
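
One common way to gauge a regression model on held-out data is the root mean squared prediction error (RMSPE): the square root of the average squared difference between predicted and actual prices. Whether this project will use that metric is an assumption, and the numbers below are made up purely to show the calculation.

```python
import numpy as np

# Made-up actual and predicted prices in CAD, for illustration only.
actual = np.array([500.0, 800.0, 1200.0, 650.0])
predicted = np.array([520.0, 760.0, 1230.0, 640.0])

# Error for each laptop, then the root of the mean squared error.
errors = predicted - actual
rmspe = np.sqrt(np.mean(errors ** 2))

print(round(rmspe, 2))
```

Because the errors are squared before averaging, the result is in the same units as the price (CAD) and penalizes large misses more heavily than small ones.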