## Homework

In [72]:
import pandas as pd
import numpy as np

### Q1. Pandas version

What's the version of Pandas that you installed?

You can get the version information from the `__version__` attribute:

In [106]:
pd.__version__

'2.2.3'

### Getting the data 

For this homework, we'll use the Laptops Price dataset. Download it from [here](https://raw.githubusercontent.com/alexeygrigorev/datasets/master/laptops.csv).

You can do it with wget:

```bash
wget https://raw.githubusercontent.com/alexeygrigorev/datasets/master/laptops.csv
```

Or just open it with your browser and click "Save as...".

Now read it with Pandas.

### Q2. Records count

How many records are in the dataset?

- 12
- 1000
- 2160
- 12160

In [107]:
df = pd.read_csv('laptops.csv')
df.head()

| | Laptop | Status | Brand | Model | CPU | RAM | Storage | Storage type | GPU | Screen | Touch | Final Price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ASUS ExpertBook B1 B1502CBA-EJ0436X Intel Core... | New | Asus | ExpertBook | Intel Core i5 | 8 | 512 | SSD | NaN | 15.6 | No | 1009.0 |
| 1 | Alurin Go Start Intel Celeron N4020/8GB/256GB ... | New | Alurin | Go | Intel Celeron | 8 | 256 | SSD | NaN | 15.6 | No | 299.0 |
| 2 | ASUS ExpertBook B1 B1502CBA-EJ0424X Intel Core... | New | Asus | ExpertBook | Intel Core i3 | 8 | 256 | SSD | NaN | 15.6 | No | 789.0 |
| 3 | MSI Katana GF66 12UC-082XES Intel Core i7-1270... | New | MSI | Katana | Intel Core i7 | 16 | 1000 | SSD | RTX 3050 | 15.6 | No | 1199.0 |
| 4 | HP 15S-FQ5085NS Intel Core i5-1235U/16GB/512GB... | New | HP | 15S | Intel Core i5 | 16 | 512 | SSD | NaN | 15.6 | No | 669.01 |


In [108]:
df.index

RangeIndex(start=0, stop=2160, step=1)

### Q3. Laptop brands

How many laptop brands are represented in the dataset?

- 12
- 27
- 28
- 2160

In [109]:
df["Brand"].unique()

array(['Asus', 'Alurin', 'MSI', 'HP', 'Lenovo', 'Medion', 'Acer', 'Apple',
       'Razer', 'Gigabyte', 'Dell', 'LG', 'Samsung', 'PcCom', 'Microsoft',
       'Primux', 'Prixton', 'Dynabook Toshiba', 'Thomson', 'Denver',
       'Deep Gaming', 'Vant', 'Innjoo', 'Jetwing', 'Millenium', 'Realme',
       'Toshiba'], dtype=object)

In [110]:
df["Brand"].nunique()

27

### Q4. Missing values

How many columns in the dataset have missing values?

- 0
- 1
- 2
- 3

In [111]:
df.isnull().sum()

Laptop             0
Status             0
Brand              0
Model              0
CPU                0
RAM                0
Storage            0
Storage type      42
GPU             1371
Screen             4
Touch              0
Final Price        0
dtype: int64
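The per-column counts above can also be reduced to a single number. The following is a minimal sketch on a small, made-up frame (the `toy` data and its values are assumptions for illustration, not rows from `laptops.csv`); the same two expressions should work unchanged on `df`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two of the four columns contain missing values.
toy = pd.DataFrame({
    "Brand": ["Asus", "HP", "Dell", "MSI"],
    "GPU": [np.nan, np.nan, "RTX 3050", np.nan],
    "Screen": [15.6, np.nan, 14.0, 15.6],
    "Final Price": [1009.0, 669.01, 899.0, 1199.0],
})

# Count of missing values per column, keeping only the columns that have any.
missing_per_column = toy.isnull().sum()
columns_with_missing = missing_per_column[missing_per_column > 0]

# Number of columns with at least one missing value.
n_columns_with_missing = int(toy.isnull().any().sum())

print(columns_with_missing)
print(n_columns_with_missing)
```

`isnull().any()` yields one boolean per column, so summing it counts the affected columns directly instead of reading them off the table by eye.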

### Q5. Maximum final price

What's the maximum final price of Dell notebooks in the dataset?

- 869
- 3691
- 3849
- 3936

In [30]:
df[df["Brand"] == "Dell"]["Final Price"].describe()

count      84.000000
mean     1153.839881
std       671.795071
min       379.000000
25%       699.000000
50%      1003.000000
75%      1313.810000
max      3936.000000
Name: Final Price, dtype: float64
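If only the maximum is needed, `describe()` can be replaced by a direct `.max()` on the filtered column. This is a small sketch on made-up rows (the `toy` frame is an assumption for illustration); it uses `.loc` with a boolean mask, which selects the rows and the column in one step.

```python
import pandas as pd

# Hypothetical toy data, not the real dataset.
toy = pd.DataFrame({
    "Brand": ["Dell", "HP", "Dell", "Asus"],
    "Final Price": [379.0, 669.01, 3936.0, 1009.0],
})

# Boolean mask for the brand, then pick the price column and take its maximum.
is_dell = toy["Brand"] == "Dell"
dell_max = toy.loc[is_dell, "Final Price"].max()

print(dell_max)
```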

### Q6. Median value of Screen

1. Find the median value of the `Screen` column in the dataset.
2. Next, calculate the most frequent value of the same `Screen` column.
3. Use the `fillna` method to fill the missing values in the `Screen` column with the most frequent value from the previous step.
4. Now, calculate the median value of `Screen` once again.

Has it changed?

> Hint: refer to existing `mode` and `median` functions to complete the task.

- Yes
- No


In [101]:
df["Screen"].describe()

count    2156.000000
mean       15.168112
std         1.203329
min        10.100000
25%        14.000000
50%        15.600000
75%        15.600000
max        18.000000
Name: Screen, dtype: float64

In [102]:
df["Screen"].median()

np.float64(15.6)

In [103]:
df.Screen.mode()

0    15.6
Name: Screen, dtype: float64

In [112]:
df['Screen'] = df['Screen'].fillna(df['Screen'].mode()[0])

In [113]:
df.isnull().sum()

Laptop             0
Status             0
Brand              0
Model              0
CPU                0
RAM                0
Storage            0
Storage type      42
GPU             1371
Screen             0
Touch              0
Final Price        0
dtype: int64

In [114]:
df["Screen"].describe()

count    2160.000000
mean       15.168912
std         1.202357
min        10.100000
25%        14.000000
50%        15.600000
75%        15.600000
max        18.000000
Name: Screen, dtype: float64
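The before/after comparison can also be made explicit rather than read from two `describe()` outputs. The sketch below runs the four steps on a short, made-up `Screen` series (the values are assumptions for illustration); on the real data, the two `describe()` outputs above show the 50% row staying at 15.6.

```python
import numpy as np
import pandas as pd

# Hypothetical screen sizes with two missing entries.
screen = pd.Series([14.0, 15.6, 15.6, np.nan, 17.3, 15.6, np.nan], name="Screen")

# Step 1: median of the original column (NaN values are skipped by default).
median_before = screen.median()

# Step 2: most frequent value; mode() returns a Series, so take its first element.
most_frequent = screen.mode()[0]

# Step 3: fill the gaps with the most frequent value.
filled = screen.fillna(most_frequent)

# Step 4: median after filling, and whether it changed.
median_after = filled.median()
changed = median_before != median_after

print(median_before, most_frequent, median_after, changed)
```

When the mode equals the median, as it does here, filling with the mode only adds more copies of the middle value, so the median cannot move.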

### Q7. Sum of weights

1. Select all the "Innjoo" laptops from the dataset.
2. Select only columns `RAM`, `Storage`, `Screen`.
3. Get the underlying NumPy array. Let's call it `X`.
4. Compute matrix-matrix multiplication between the transpose of `X` and `X`. To get the transpose, use `X.T`. Let's call the result `XTX`.
5. Compute the inverse of `XTX`.
6. Create an array `y` with values `[1100, 1300, 800, 900, 1000, 1100]`.
7. Multiply the inverse of `XTX` with the transpose of `X`, and then multiply the result by `y`. Call the result `w`.
8. What's the sum of all the elements of the result?

> **Note**: You just implemented linear regression. We'll talk about it in the next lesson.

- 0.43
- 45.29
- 45.58
- 91.30


In [115]:
new_df = df[df["Brand"] == "Innjoo"]

In [126]:
new_df = new_df[["RAM", "Storage", "Screen"]]
new_df

| | RAM | Storage | Screen |
|---|---|---|---|
| 1478 | 8 | 256 | 15.6 |
| 1479 | 8 | 512 | 15.6 |
| 1480 | 4 | 64 | 14.1 |
| 1481 | 6 | 64 | 14.1 |
| 1482 | 6 | 128 | 14.1 |
| 1483 | 6 | 128 | 14.1 |
1483,6,128,14.1


In [127]:
X = np.array(new_df)
X

array([[  8. , 256. ,  15.6],
       [  8. , 512. ,  15.6],
       [  4. ,  64. ,  14.1],
       [  6. ,  64. ,  14.1],
       [  6. , 128. ,  14.1],
       [  6. , 128. ,  14.1]])

In [128]:
transposed = X.T
transposed

array([[  8. ,   8. ,   4. ,   6. ,   6. ,   6. ],
       [256. , 512. ,  64. ,  64. , 128. , 128. ],
       [ 15.6,  15.6,  14.1,  14.1,  14.1,  14.1]])

In [129]:
XTX = transposed.dot(X)
XTX

array([[2.52000e+02, 8.32000e+03, 5.59800e+02],
       [8.32000e+03, 3.68640e+05, 1.73952e+04],
       [5.59800e+02, 1.73952e+04, 1.28196e+03]])

In [130]:
XTX_inv = np.linalg.inv(XTX)
XTX_inv

array([[ 2.78025381e-01, -1.51791334e-03, -1.00809855e-01],
       [-1.51791334e-03,  1.58286725e-05,  4.48052175e-04],
       [-1.00809855e-01,  4.48052175e-04,  3.87214888e-02]])

In [131]:
y = [1100, 1300, 800, 900, 1000, 1100]

In [132]:
w = XTX_inv.dot(transposed)

In [133]:
w

array([[ 0.26298349, -0.12560233, -0.40646389,  0.14958687,  0.05244042,
         0.05244042],
       [-0.00110155,  0.00295059,  0.00125892, -0.00177691, -0.00076387,
        -0.00076387],
       [-0.08772226,  0.0269791 ,  0.17140891, -0.0302108 , -0.00153546,
        -0.00153546]])

In [134]:
w = w.dot(y)
w

array([45.58076606,  0.42783519, 45.29127938])

In [135]:
w.sum()

np.float64(91.29988062995753)
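The steps above compute `w = (XᵀX)⁻¹ Xᵀ y` cell by cell. As a compact recap, here is the same calculation in one block, using the `X` and `y` values printed earlier in this notebook. The cross-check against `np.linalg.lstsq` is an addition for illustration and not part of the homework; it solves the same least-squares problem without forming the inverse explicitly.

```python
import numpy as np

# Feature matrix for the six Innjoo laptops (RAM, Storage, Screen), as printed above.
X = np.array([
    [8.0, 256.0, 15.6],
    [8.0, 512.0, 15.6],
    [4.0, 64.0, 14.1],
    [6.0, 64.0, 14.1],
    [6.0, 128.0, 14.1],
    [6.0, 128.0, 14.1],
])
y = np.array([1100, 1300, 800, 900, 1000, 1100])

# Normal equation: w = (X^T X)^-1 X^T y
XTX = X.T @ X
XTX_inv = np.linalg.inv(XTX)
w = XTX_inv @ X.T @ y

# Cross-check (added here as an assumption-free sanity test): least-squares solver.
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(w)
print(w.sum())
```

Using a separate name for the intermediate product, or chaining the multiplications as above, avoids reassigning `w` halfway through, which makes the cells safer to re-run out of order.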