## Homework

### Q1. Pandas version

What's the version of Pandas that you installed?

You can get the version information from the `__version__` attribute:

```python
pd.__version__
```

In [1]:
# Importing the necessary libraries
import numpy as np
import pandas as pd

In [61]:
# Change how NumPy scalars are printed (needs NumPy 2.0+).
# Otherwise, NumPy shows 'np.float64(3.0)' instead of '3.0'
np.set_printoptions(legacy="1.25")

In [2]:
# Q1 answer
pd.__version__

'2.2.3'

### Getting the data

For this homework, we'll use the Laptops Price dataset. Download it from 
[here](https://raw.githubusercontent.com/alexeygrigorev/datasets/master/laptops.csv).

You can do it with wget:

```bash
wget https://raw.githubusercontent.com/alexeygrigorev/datasets/master/laptops.csv
```

Or just open it with your browser and click "Save as...".

Now read it with Pandas.

In [39]:
# Creating a DataFrame from the dataset (here the CSV was saved under ./data/)
df = pd.read_csv('./data/laptops.csv')

In [4]:
# Checking the df
df.head()

| | Laptop | Status | Brand | Model | CPU | RAM | Storage | Storage type | GPU | Screen | Touch | Final Price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ASUS ExpertBook B1 B1502CBA-EJ0436X Intel Core... | New | Asus | ExpertBook | Intel Core i5 | 8 | 512 | SSD | NaN | 15.6 | No | 1009.0 |
| 1 | Alurin Go Start Intel Celeron N4020/8GB/256GB ... | New | Alurin | Go | Intel Celeron | 8 | 256 | SSD | NaN | 15.6 | No | 299.0 |
| 2 | ASUS ExpertBook B1 B1502CBA-EJ0424X Intel Core... | New | Asus | ExpertBook | Intel Core i3 | 8 | 256 | SSD | NaN | 15.6 | No | 789.0 |
| 3 | MSI Katana GF66 12UC-082XES Intel Core i7-1270... | New | MSI | Katana | Intel Core i7 | 16 | 1000 | SSD | RTX 3050 | 15.6 | No | 1199.0 |
| 4 | HP 15S-FQ5085NS Intel Core i5-1235U/16GB/512GB... | New | HP | 15S | Intel Core i5 | 16 | 512 | SSD | NaN | 15.6 | No | 669.01 |


### Q2. Records count

How many records are in the dataset?

- 12
- 1000
- 2160
- 12160

In [26]:
# Number of records in the dataset
df.shape[0]

2160

### Q3. Laptop brands

How many laptop brands are represented in the dataset?

- 12
- 27
- 28
- 2160

In [32]:
# Number of unique brands in the dataset (excluding NA values)
df['Brand'].nunique()

27
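
`nunique()` skips missing values by default, which is why the comment above notes that NA values are excluded. The sketch below uses a hypothetical toy Series (not the laptops data) to show what changes when `dropna=False` is passed:

```python
import pandas as pd

# Hypothetical toy column with one missing brand
brands = pd.Series(["Asus", "HP", "Asus", None])

n_without_na = brands.nunique()              # missing values are ignored
n_with_na = brands.nunique(dropna=False)     # the missing value counts as its own category

print(n_without_na, n_with_na)
```

In the homework dataset the `Brand` column has no missing values, so both calls would give the same count there.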

### Q4. Missing values

How many columns in the dataset have missing values?

- 0
- 1
- 2
- 3

In [40]:
# Number of NA values per column
df.isna().sum()

Laptop             0
Status             0
Brand              0
Model              0
CPU                0
RAM                0
Storage            0
Storage type      42
GPU             1371
Screen             4
Touch              0
Final Price        0
dtype: int64

In [62]:
# Number of columns with NA values
df.isna().any(axis=0).sum()

3
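
To see which columns are affected rather than only how many, the boolean mask from `isna().any()` can be used to index itself. A minimal sketch on a hypothetical toy DataFrame (the column names mirror the dataset, the values are made up):

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: 'GPU' and 'Screen' contain missing values
toy = pd.DataFrame({
    "Brand": ["Asus", "HP", "MSI"],
    "GPU": [np.nan, np.nan, "RTX 3050"],
    "Screen": [15.6, np.nan, 15.6],
})

has_na = toy.isna().any(axis=0)                 # one boolean per column
cols_with_na = has_na[has_na].index.tolist()    # keep only the True entries
n_cols_with_na = int(has_na.sum())              # True counts as 1

print(cols_with_na, n_cols_with_na)
```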

### Q5. Maximum final price

What's the maximum final price of Dell notebooks in the dataset?

- 869
- 3691
- 3849
- 3936

In [63]:
# Maximum final price among the Dell laptops
df.loc[df['Brand'] == 'Dell', 'Final Price'].max()

3936.0

### Q6. Median value of Screen

1. Find the median value of the `Screen` column in the dataset.
2. Next, calculate the most frequent value of the same `Screen` column.
3. Use the `fillna` method to fill the missing values in the `Screen` column with the most frequent value from the previous step.
4. Now, calculate the median value of `Screen` once again.

Has it changed?

> Hint: refer to existing `mode` and `median` functions to complete the task.

- Yes
- No

In [64]:
# Median value of the column 'Screen'
screen_median = df['Screen'].median()
screen_median

15.6

In [70]:
# Most frequent value of the column 'Screen'
screen_frq_val = df['Screen'].mode()[0]
screen_frq_val

15.6

In [72]:
# Filling the missing values with the most frequent value
df_screen_no_na = df['Screen'].fillna(screen_frq_val)
df_screen_no_na

0       15.6
1       15.6
2       15.6
3       15.6
4       15.6
        ... 
2155    17.3
2156    17.3
2157    17.3
2158    13.4
2159    13.4
Name: Screen, Length: 2160, dtype: float64

In [75]:
# Checking that there are no missing values
df_screen_no_na.isna().sum()

0

In [78]:
# Median of the column 'Screen' with no missing values
df_screen_no_na.median()

15.6

In [81]:
# Checking if the median value is the same
df['Screen'].median() == df_screen_no_na.median()

True
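
The median stays the same because the fill value (the mode) happens to equal the current median, and adding more copies of the median does not move the middle of the sorted values. A minimal sketch on a hypothetical toy Series, chosen so that 15.6 is both the median and the mode:

```python
import pandas as pd

# Hypothetical toy column with three missing values
screen = pd.Series([13.3, 14.0, 15.6, 15.6, 15.6, 17.3, None, None, None])

median_before = screen.median()       # NaN values are skipped by default
most_frequent = screen.mode()[0]      # mode() returns a Series; take the first entry
filled = screen.fillna(most_frequent)
median_after = filled.median()

print(median_before, most_frequent, median_after)
```

If the mode were different from the median, filling enough missing values could shift the result, so the answer depends on the data and not on `fillna` itself.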

### Q7. Sum of weights

1. Select all the "Innjoo" laptops from the dataset.
2. Select only columns `RAM`, `Storage`, `Screen`.
3. Get the underlying NumPy array. Let's call it `X`.
4. Compute matrix-matrix multiplication between the transpose of `X` and `X`. To get the transpose, use `X.T`. Let's call the result `XTX`.
5. Compute the inverse of `XTX`.
6. Create an array `y` with values `[1100, 1300, 800, 900, 1000, 1100]`.
7. Multiply the inverse of `XTX` with the transpose of `X`, and then multiply the result by `y`. Call the result `w`.
8. What's the sum of all the elements of the result?

In [84]:
# Selecting the 'Innjoo' laptops
df_innjoo = df[df['Brand'] == 'Innjoo']
df_innjoo

| | Laptop | Status | Brand | Model | CPU | RAM | Storage | Storage type | GPU | Screen | Touch | Final Price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1478 | InnJoo Voom Excellence Intel Celeron N4020/8GB... | New | Innjoo | Voom | Intel Celeron | 8 | 256 | SSD | NaN | 15.6 | No | 311.37 |
| 1479 | InnJoo Voom Excellence Pro Intel Celeron N4020... | New | Innjoo | Voom | Intel Celeron | 8 | 512 | SSD | NaN | 15.6 | No | 392.55 |
| 1480 | Innjoo Voom Intel Celeron N3350/4GB/64GB eMMC/... | New | Innjoo | Voom | Intel Celeron | 4 | 64 | eMMC | NaN | 14.1 | No | 251.4 |
| 1481 | Innjoo Voom Laptop Max Intel Celeron N3350/6GB... | New | Innjoo | Voom | Intel Celeron | 6 | 64 | eMMC | NaN | 14.1 | No | 383.61 |
| 1482 | Innjoo Voom Laptop Pro Intel Celeron N3350/6GB... | New | Innjoo | Voom | Intel Celeron | 6 | 128 | SSD | NaN | 14.1 | No | 317.02 |
| 1483 | Innjoo Voom Pro Intel Celeron N3350/6GB/128GB ... | New | Innjoo | Voom | Intel Celeron | 6 | 128 | eMMC | NaN | 14.1 | No | 431.38 |


In [85]:
# Selecting the columns 'RAM', 'Storage' and 'Screen'
df_innjoo = df_innjoo[['RAM', 'Storage', 'Screen']]
df_innjoo

| | RAM | Storage | Screen |
|---|---|---|---|
| 1478 | 8 | 256 | 15.6 |
| 1479 | 8 | 512 | 15.6 |
| 1480 | 4 | 64 | 14.1 |
| 1481 | 6 | 64 | 14.1 |
| 1482 | 6 | 128 | 14.1 |
| 1483 | 6 | 128 | 14.1 |


In [None]:
# Converting the DataFrame to a NumPy array
X = df_innjoo.to_numpy()
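
Steps 4 to 8 of Q7 can be sketched as follows. This is a minimal sketch, not necessarily how the notebook continues: it hard-codes the six Innjoo rows from the table above so that it runs on its own, and the names `XTX_inv` and `w_sum` are introduced here for illustration.

```python
import numpy as np

# Rows copied from the Innjoo table above: RAM, Storage, Screen
X = np.array([
    [8, 256, 15.6],
    [8, 512, 15.6],
    [4, 64, 14.1],
    [6, 64, 14.1],
    [6, 128, 14.1],
    [6, 128, 14.1],
])

XTX = X.T @ X                                       # step 4: 3x3 matrix X^T X
XTX_inv = np.linalg.inv(XTX)                        # step 5: inverse of XTX
y = np.array([1100, 1300, 800, 900, 1000, 1100])    # step 6
w = XTX_inv @ X.T @ y                               # step 7: (X^T X)^-1 X^T y
w_sum = w.sum()                                     # step 8: sum of the weights

print(w_sum)
```

The expression in step 7 is the normal-equation solution of a least-squares fit of `y` on `X`, so `np.linalg.lstsq(X, y, rcond=None)` should return approximately the same `w`.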