In [1]:
import pandas as pd

### Q1. Pandas version

What's the version of Pandas that you installed?

You can read the version from the `__version__` attribute:

```python
pd.__version__
```

In [2]:
pd.__version__

'2.2.2'

### Getting the data 

For this homework, we'll use the Laptops Price dataset. Download it from 
[here](https://raw.githubusercontent.com/alexeygrigorev/datasets/master/laptops.csv).

You can do it with wget:

```bash
wget https://raw.githubusercontent.com/alexeygrigorev/datasets/master/laptops.csv
```

Or just open it with your browser and click "Save as...".

Now read it with Pandas.

### Q2. Records count

How many records are in the dataset?

- 12
- 1000
- **2160** (answer)
- 12160


In [4]:
FILEPATH = "laptops.csv"
df_laptop = pd.read_csv(FILEPATH)

In [5]:
print(df_laptop.shape)

(2160, 12)
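
`shape` returns a `(rows, columns)` tuple, so the record count is its first element. A minimal sketch on a made-up frame (the `toy` data is an assumption for illustration, not the laptops dataset):

```python
import pandas as pd

# Hypothetical stand-in for the laptops data: 3 rows, 2 columns.
toy = pd.DataFrame({"Brand": ["Asus", "HP", "Dell"], "RAM": [8, 16, 32]})

n_rows, n_cols = toy.shape  # unpack (rows, columns)
print("rows:", n_rows)
print("len(toy) gives the same row count:", len(toy) == n_rows)
```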


### Q3. Laptop brands

How many laptop brands are represented in the dataset?

- 12
- **27** (answer)
- 28
- 2160

In [6]:
df_laptop.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2160 entries, 0 to 2159
Data columns (total 12 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Laptop        2160 non-null   object 
 1   Status        2160 non-null   object 
 2   Brand         2160 non-null   object 
 3   Model         2160 non-null   object 
 4   CPU           2160 non-null   object 
 5   RAM           2160 non-null   int64  
 6   Storage       2160 non-null   int64  
 7   Storage type  2118 non-null   object 
 8   GPU           789 non-null    object 
 9   Screen        2156 non-null   float64
 10  Touch         2160 non-null   object 
 11  Final Price   2160 non-null   float64
dtypes: float64(2), int64(2), object(8)
memory usage: 202.6+ KB


In [8]:
laptop_brands = df_laptop['Brand'].unique()
print(laptop_brands)
print(len(laptop_brands))

['Asus' 'Alurin' 'MSI' 'HP' 'Lenovo' 'Medion' 'Acer' 'Apple' 'Razer'
 'Gigabyte' 'Dell' 'LG' 'Samsung' 'PcCom' 'Microsoft' 'Primux' 'Prixton'
 'Dynabook Toshiba' 'Thomson' 'Denver' 'Deep Gaming' 'Vant' 'Innjoo'
 'Jetwing' 'Millenium' 'Realme' 'Toshiba']
27
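
If only the count is needed, `Series.nunique()` avoids materialising the array of unique values first. A small sketch on made-up data (the `toy` frame is hypothetical):

```python
import pandas as pd

# Hypothetical brand column with repeats.
toy = pd.DataFrame({"Brand": ["Asus", "HP", "Asus", "Dell", "HP"]})

n_brands = toy["Brand"].nunique()  # number of distinct non-null values
print(n_brands)

# value_counts() additionally shows how many rows each brand has.
print(toy["Brand"].value_counts())
```

Note that `nunique()` ignores missing values by default, while `unique()` would include `NaN` as an entry; the `Brand` column here has no missing values, so both approaches agree.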


### Q4. Missing values

How many columns in the dataset have missing values?

- 0
- 1
- 2
- **3** (answer)

In [9]:
df_laptop.isna().sum() 

Laptop             0
Status             0
Brand              0
Model              0
CPU                0
RAM                0
Storage            0
Storage type      42
GPU             1371
Screen             4
Touch              0
Final Price        0
dtype: int64
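
The per-column counts can be reduced to a single number by asking, for each column, whether it contains any missing value and then summing the resulting booleans. A minimal sketch on a made-up frame (column values are assumptions for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical frame: only the GPU column has missing values.
toy = pd.DataFrame({
    "Brand": ["Asus", "HP", "Dell"],
    "RAM": [8, 16, 32],
    "GPU": [None, "RTX 3050", None],
    "Screen": [15.6, 14.0, 13.3],
})

missing_per_column = toy.isna().sum()                   # NaN count per column
n_columns_with_missing = int(toy.isna().any().sum())    # columns with at least one NaN
print(n_columns_with_missing)
```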

### Q5. Maximum final price

What's the maximum final price of Dell notebooks in the dataset?

- 869
- 3691
- 3849
- **3936** (answer)

In [12]:
df_laptop.loc[df_laptop['Brand'] == 'Dell', 'Final Price'].max()

np.float64(3936.0)

### Q6. Median value of Screen

1. Find the median value of the `Screen` column in the dataset.
2. Next, calculate the most frequent value of the same `Screen` column.
3. Use the `fillna` method to fill the missing values in the `Screen` column with the most frequent value from the previous step.
4. Now, calculate the median value of `Screen` once again.

Has it changed?

> Hint: refer to existing `mode` and `median` functions to complete the task.

- Yes
- **No** (answer)

In [13]:
prev_median_screen = df_laptop['Screen'].median()
print("prev_median_screen:", prev_median_screen)

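# mode() returns a Series (ties are possible), so take its first value as a scalar;
# passing the Series itself to fillna would only fill rows whose index label matches.
mode_screen = df_laptop['Screen'].mode()[0]
df_laptop['Screen'] = df_laptop['Screen'].fillna(mode_screen)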
curr_median_screen = df_laptop['Screen'].median()
print("curr_median_screen:", curr_median_screen)


prev_median_screen: 15.6
curr_median_screen: 15.6
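
One detail in step 3 is easy to miss: `Series.mode()` returns a Series rather than a scalar, and `fillna` aligns a Series argument by index label. The sketch below uses made-up screen sizes (not the real data) to show the difference between filling with the Series and filling with its first value:

```python
import numpy as np
import pandas as pd

# Hypothetical screen sizes with missing values at labels 0 and 3.
s = pd.Series([np.nan, 15.6, 15.6, np.nan, 14.0])

mode_series = s.mode()            # Series with a single entry at label 0
mode_value = mode_series.iloc[0]  # the scalar most frequent value

filled_by_series = s.fillna(mode_series)  # aligned by label: only label 0 is filled
filled_by_scalar = s.fillna(mode_value)   # every NaN is filled

print("NaNs left (Series fill):", filled_by_series.isna().sum())
print("NaNs left (scalar fill):", filled_by_scalar.isna().sum())
```

With the Series fill, a missing value is replaced only if its index label happens to exist in the mode Series, so most gaps can silently remain.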


### Q7. Sum of weights

1. Select all the "Innjoo" laptops from the dataset.
2. Select only columns `RAM`, `Storage`, `Screen`.
3. Get the underlying NumPy array. Let's call it `X`.
4. Compute matrix-matrix multiplication between the transpose of `X` and `X`. To get the transpose, use `X.T`. Let's call the result `XTX`.
5. Compute the inverse of `XTX`.
6. Create an array `y` with values `[1100, 1300, 800, 900, 1000, 1100]`.
7. Multiply the inverse of `XTX` with the transpose of `X`, and then multiply the result by `y`. Call the result `w`.
8. What's the sum of all the elements of the result?

> **Note**: You just implemented linear regression. We'll talk about it in the next lesson.

- 0.43
- 45.29
- 45.58
- **91.30** (answer)


In [20]:
import numpy as np
from scipy.linalg import inv

df_innjo = df_laptop[df_laptop["Brand"] == "Innjoo"]
print("df_innjo.shape:", df_innjo.shape)

df_innjo = df_innjo[["RAM", "Storage", "Screen"]]

X = np.array(df_innjo)
print("X.shape:", X.shape)
X_T = X.T

XTX = np.matmul(X_T, X)
inv_XTX = inv(XTX)

y = np.array([1100, 1300, 800, 900, 1000, 1100])

w = np.matmul(np.matmul(inv_XTX, X_T), y)

print("np.sum(w):", np.sum(w))

df_innjo.shape: (6, 12)
X.shape: (6, 3)
np.sum(w): 91.29988062995648
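
The steps above compute the normal-equation solution $w = (X^T X)^{-1} X^T y$. As a cross-check, the same weights can be obtained without forming the inverse explicitly, which is usually preferred numerically. This is a sketch on a made-up feature matrix (the values in `X` are assumptions, not the Innjoo rows), comparing the explicit inverse with `np.linalg.solve` and `np.linalg.lstsq`:

```python
import numpy as np

# Hypothetical 6x3 feature matrix (RAM, Storage, Screen); values are made up.
X = np.array([
    [8.0, 512.0, 15.6],
    [4.0, 128.0, 14.0],
    [16.0, 1000.0, 17.3],
    [8.0, 256.0, 13.3],
    [4.0, 64.0, 11.6],
    [16.0, 512.0, 15.6],
])
y = np.array([1100, 1300, 800, 900, 1000, 1100], dtype=float)

XTX = X.T @ X

# Explicit inverse, following the homework steps.
w_inverse = np.linalg.inv(XTX) @ X.T @ y

# Solve the linear system XTX w = X^T y without inverting XTX.
w_solve = np.linalg.solve(XTX, X.T @ y)

# Least-squares solver working directly on X and y.
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print("sum via inverse:", w_inverse.sum())
print("sum via solve:  ", w_solve.sum())
print("sum via lstsq:  ", w_lstsq.sum())
```

All three agree up to floating-point error when `X` has full column rank; they can diverge when `XTX` is close to singular, which is where `lstsq` tends to be the safer choice.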
