## Homework: 01

### Set up the environment

You need to install Python, NumPy, Pandas, Matplotlib and Seaborn. For that, you can follow the instructions from
[06-environment.md](../../../01-intro/06-environment.md).

In [1]:
# Import required modules.
import numpy as np
import pandas as pd

### Q1. Pandas version
What's the version of Pandas that you installed?

You can get the version information using the `__version__` attribute:

In [2]:
pd.__version__

'2.2.3'

#### Question 01 Answer: `'2.2.3'`

### Getting the data 

For this homework, we'll use the Laptops Price dataset. Download it from 
[here](https://raw.githubusercontent.com/alexeygrigorev/datasets/master/laptops.csv).

You can do it with wget:

```bash
wget https://raw.githubusercontent.com/alexeygrigorev/datasets/master/laptops.csv
```

In [3]:
# Define the data source (PREFIX is the remote URL) and the download location (POSTFIX is the local file path).
PREFIX = 'https://raw.githubusercontent.com/alexeygrigorev/datasets/master/laptops.csv'
POSTFIX = '../data/laptops.csv'

In [4]:
# Download the data.
!wget -O $POSTFIX $PREFIX

--2024-11-30 22:26:52--  https://raw.githubusercontent.com/alexeygrigorev/datasets/master/laptops.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.111.133, 185.199.108.133, 185.199.109.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.111.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 298573 (292K) [text/plain]
Saving to: ‘../data/laptops.csv’


2024-11-30 22:26:52 (86.4 MB/s) - ‘../data/laptops.csv’ saved [298573/298573]
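
If `wget` is not available, the same download can be sketched with Python's standard library. The helper name `download_if_missing` and the constant `DATA_URL` are illustrative choices, not part of the homework; the function skips the download when the file already exists.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Same URL as in the wget command above.
DATA_URL = "https://raw.githubusercontent.com/alexeygrigorev/datasets/master/laptops.csv"


def download_if_missing(url: str, destination: str) -> Path:
    """Download `url` to `destination` unless the file is already there.

    Hypothetical helper for illustration only.
    """
    path = Path(destination)
    if path.exists():
        return path
    # Create the target directory (for example ../data) if needed.
    path.parent.mkdir(parents=True, exist_ok=True)
    urlretrieve(url, str(path))
    return path


# Example call (requires network access), left commented out on purpose:
# download_if_missing(DATA_URL, "../data/laptops.csv")
```

Checking for an existing file first avoids downloading the dataset again every time the notebook is re-run.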



In [5]:
df = pd.read_csv(POSTFIX)

In [6]:
# Show the head of the data.
df.head()

| | Laptop | Status | Brand | Model | CPU | RAM | Storage | Storage type | GPU | Screen | Touch | Final Price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ASUS ExpertBook B1 B1502CBA-EJ0436X Intel Core... | New | Asus | ExpertBook | Intel Core i5 | 8 | 512 | SSD | | 15.6 | No | 1009.0 |
| 1 | Alurin Go Start Intel Celeron N4020/8GB/256GB ... | New | Alurin | Go | Intel Celeron | 8 | 256 | SSD | | 15.6 | No | 299.0 |
| 2 | ASUS ExpertBook B1 B1502CBA-EJ0424X Intel Core... | New | Asus | ExpertBook | Intel Core i3 | 8 | 256 | SSD | | 15.6 | No | 789.0 |
| 3 | MSI Katana GF66 12UC-082XES Intel Core i7-1270... | New | MSI | Katana | Intel Core i7 | 16 | 1000 | SSD | RTX 3050 | 15.6 | No | 1199.0 |
| 4 | HP 15S-FQ5085NS Intel Core i5-1235U/16GB/512GB... | New | HP | 15S | Intel Core i5 | 16 | 512 | SSD | | 15.6 | No | 669.01 |


### Q2. Records count

How many records are in the dataset?

- 12
- 1000
- 2160
- 12160

In [7]:
# Show the shape of the data.
df.shape

(2160, 12)

#### Question 02 Answer: `2160`

### Q3. Laptop brands

How many laptop brands are represented in the dataset?

- 12
- 27
- 28
- 2160

In [8]:
# Show number of brands.
df.Brand.nunique()

27

#### Question 03 Answer: `27`

### Q4. Missing values

How many columns in the dataset have missing values?

- 0
- 1
- 2
- 3

In [9]:
# Count the columns that contain at least one missing value.
int(df.isna().any().sum())

3

#### Question 04 Answer: `3`
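To see which columns contain the gaps, and not only how many, the per-column counts can be inspected directly. The sketch below uses a small made-up DataFrame (`toy` and its values are hypothetical, not taken from the laptops dataset) so that it runs on its own.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with gaps in two of the three columns.
toy = pd.DataFrame({
    "RAM": [8, 16, 8],
    "GPU": ["RTX 3050", None, None],
    "Screen": [15.6, np.nan, 14.1],
})

# Number of missing values in each column.
missing_per_column = toy.isna().sum()

# Names of the columns that have at least one missing value.
columns_with_missing = missing_per_column[missing_per_column > 0].index.tolist()

# Count of such columns, converted to a plain Python int.
n_columns_with_missing = int(toy.isna().any().sum())

print(missing_per_column)
print(columns_with_missing, n_columns_with_missing)
```

Applied to the homework data, the same calls would be made on `df` instead of `toy`.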

### Q5. Maximum final price

What's the maximum final price of Dell notebooks in the dataset?

- 869
- 3691
- 3849
- 3936

In [10]:
# Get maximum final price.
df[df.Brand=='Dell']['Final Price'].max()

3936.0

#### Question 05 Answer: `3936`
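The filter-then-aggregate pattern used here can also be written with `.loc`, and `groupby` gives the maximum for every brand at once. A minimal sketch on made-up data (the `toy` frame and its prices are hypothetical):

```python
import pandas as pd

# Hypothetical toy data, not taken from the laptops dataset.
toy = pd.DataFrame({
    "Brand": ["Dell", "HP", "Dell", "Asus"],
    "Final Price": [999.0, 669.01, 1499.5, 789.0],
})

# Boolean mask selects the rows, the column label selects the values.
dell_max = toy.loc[toy["Brand"] == "Dell", "Final Price"].max()

# Maximum price per brand in one pass.
max_by_brand = toy.groupby("Brand")["Final Price"].max()

print(dell_max)
print(max_by_brand)
```

Selecting rows and the column in a single `.loc` call avoids chained indexing, which matters mostly when assigning values but is a reasonable habit for reads too.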

### Q6. Median value of Screen

1. Find the median value of the `Screen` column in the dataset.
2. Next, calculate the most frequent value of the same `Screen` column.
3. Use the `fillna` method to fill the missing values in the `Screen` column with the most frequent value from the previous step.
4. Now, calculate the median value of `Screen` once again.

Has it changed?

> Hint: refer to existing `mode` and `median` functions to complete the task.

- Yes
- No

In [11]:
# Get the median of the Screen column.
screen_med1 = df.Screen.median()
screen_med1

15.6

In [12]:
# Get the mode of the Screen column.
screen_mod = df.Screen.mode()
screen_mod

0    15.6
Name: Screen, dtype: float64

In [13]:
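# Fill the missing values in the Screen column with the most frequent value.
# mode() returns a Series, so take its first element to fill with a scalar.
df.fillna({'Screen': screen_mod.iloc[0]}, inplace=True)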

In [14]:
# Get the median of the Screen column again.
screen_med2 = df.Screen.median()
screen_med2

15.6

In [15]:
# Compare the two medians: True means the median did not change.
screen_med1 == screen_med2

True

#### Question 06 Answer: `No`
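The four steps can be tried on a small made-up Series (the values below are hypothetical). One detail worth noting is that `mode()` returns a Series, because a column can have several equally frequent values; filling with that Series aligns on the index and may leave gaps, so the sketch takes the first element as a scalar.

```python
import numpy as np
import pandas as pd

# Hypothetical screen sizes with one missing entry.
screen = pd.Series([15.6, 15.6, 14.0, 17.3, np.nan], name="Screen")

# Step 1: median before filling (NaN values are skipped).
median_before = screen.median()

# Step 2: most frequent value, taken as a scalar from the mode Series.
most_frequent = screen.mode().iloc[0]

# Step 3: fill the gaps with that scalar.
filled = screen.fillna(most_frequent)

# Step 4: median after filling.
median_after = filled.median()
median_changed = median_before != median_after

# Filling with the mode Series itself aligns on the index (label 0 only),
# so the gap at label 4 stays missing.
still_missing = int(screen.fillna(screen.mode()).isna().sum())

print(median_before, median_after, median_changed, still_missing)
```

On this toy data the median stays the same because the filled value equals the existing median; with a different distribution it could shift.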

### Q7. Sum of weights

1. Select all the "Innjoo" laptops from the dataset.
2. Select only columns `RAM`, `Storage`, `Screen`.
3. Get the underlying NumPy array. Let's call it `X`.
4. Compute matrix-matrix multiplication between the transpose of `X` and `X`. To get the transpose, use `X.T`. Let's call the result `XTX`.
5. Compute the inverse of `XTX`.
6. Create an array `y` with values `[1100, 1300, 800, 900, 1000, 1100]`.
7. Multiply the inverse of `XTX` with the transpose of `X`, and then multiply the result by `y`. Call the result `w`.
8. What's the sum of all the elements of the result?

> **Note**: You just implemented linear regression. We'll talk about it in the next lesson.

- 0.43
- 45.29
- 45.58
- 91.30

In [16]:
# Select all the "Innjoo" laptops from the dataset.
df_Innjoo = df[df.Brand == "Innjoo"]
df_Innjoo

| | Laptop | Status | Brand | Model | CPU | RAM | Storage | Storage type | GPU | Screen | Touch | Final Price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1478 | InnJoo Voom Excellence Intel Celeron N4020/8GB... | New | Innjoo | Voom | Intel Celeron | 8 | 256 | SSD | | 15.6 | No | 311.37 |
| 1479 | InnJoo Voom Excellence Pro Intel Celeron N4020... | New | Innjoo | Voom | Intel Celeron | 8 | 512 | SSD | | 15.6 | No | 392.55 |
| 1480 | Innjoo Voom Intel Celeron N3350/4GB/64GB eMMC/... | New | Innjoo | Voom | Intel Celeron | 4 | 64 | eMMC | | 14.1 | No | 251.4 |
| 1481 | Innjoo Voom Laptop Max Intel Celeron N3350/6GB... | New | Innjoo | Voom | Intel Celeron | 6 | 64 | eMMC | | 14.1 | No | 383.61 |
| 1482 | Innjoo Voom Laptop Pro Intel Celeron N3350/6GB... | New | Innjoo | Voom | Intel Celeron | 6 | 128 | SSD | | 14.1 | No | 317.02 |
| 1483 | Innjoo Voom Pro Intel Celeron N3350/6GB/128GB ... | New | Innjoo | Voom | Intel Celeron | 6 | 128 | eMMC | | 14.1 | No | 431.38 |


In [17]:
# Select only columns RAM, Storage, Screen from df_Innjoo.
df_select = df_Innjoo[['RAM', 'Storage', 'Screen']]
df_select

| | RAM | Storage | Screen |
|---|---|---|---|
| 1478 | 8 | 256 | 15.6 |
| 1479 | 8 | 512 | 15.6 |
| 1480 | 4 | 64 | 14.1 |
| 1481 | 6 | 64 | 14.1 |
| 1482 | 6 | 128 | 14.1 |
| 1483 | 6 | 128 | 14.1 |


In [18]:
# Use df_select to create a matrix.
X = df_select.values
X

array([[  8. , 256. ,  15.6],
       [  8. , 512. ,  15.6],
       [  4. ,  64. ,  14.1],
       [  6. ,  64. ,  14.1],
       [  6. , 128. ,  14.1],
       [  6. , 128. ,  14.1]])

In [19]:
# Multiply the transpose of X by X to get XTX.
XTX = X.T @ X
XTX

array([[2.52000e+02, 8.32000e+03, 5.59800e+02],
       [8.32000e+03, 3.68640e+05, 1.73952e+04],
       [5.59800e+02, 1.73952e+04, 1.28196e+03]])

In [20]:
# Compute the inverse of XTX.
XTX_inv = np.linalg.inv(XTX)
XTX_inv

array([[ 2.78025381e-01, -1.51791334e-03, -1.00809855e-01],
       [-1.51791334e-03,  1.58286725e-05,  4.48052175e-04],
       [-1.00809855e-01,  4.48052175e-04,  3.87214888e-02]])

In [21]:
# Create the array y.
y = np.array([1100, 1300, 800, 900, 1000, 1100])

In [22]:
# Compute w by multiplying the inverse of XTX with the transpose of X, then with y.
w = (XTX_inv @ X.T) @ y
w

array([45.58076606,  0.42783519, 45.29127938])

In [23]:
# Sum all the elements of w.
w.sum()

91.2998806299555

#### Question 07 Answer: `91.30`
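The steps above compute the least-squares weights from the normal equation, `w = (XᵀX)⁻¹ Xᵀ y`. As a cross-check, the sketch below recomputes `w` from the `X` and `y` values shown in this notebook and compares the result with two NumPy alternatives that avoid forming the explicit inverse, `np.linalg.solve` and `np.linalg.lstsq`. The variable names `w_normal`, `w_solve` and `w_lstsq` are illustrative.

```python
import numpy as np

# Feature matrix for the six Innjoo laptops (RAM, Storage, Screen), as printed above.
X = np.array([
    [8.0, 256.0, 15.6],
    [8.0, 512.0, 15.6],
    [4.0, 64.0, 14.1],
    [6.0, 64.0, 14.1],
    [6.0, 128.0, 14.1],
    [6.0, 128.0, 14.1],
])
y = np.array([1100, 1300, 800, 900, 1000, 1100])

# Normal equation with an explicit inverse, as in the homework steps.
w_normal = np.linalg.inv(X.T @ X) @ X.T @ y

# Same system solved without forming the inverse.
w_solve = np.linalg.solve(X.T @ X, X.T @ y)

# Least-squares solver working on X directly.
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(w_normal, w_normal.sum())
print(np.allclose(w_normal, w_solve), np.allclose(w_normal, w_lstsq))
```

All three approaches should agree up to floating-point error here; on larger or ill-conditioned problems, `solve` or `lstsq` is usually the numerically safer choice than an explicit inverse.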