
### Set up the environment

You need to install Python, NumPy, Pandas, Matplotlib and Seaborn.

In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
#import seaborn as sns

### Question 1

What's the version of Pandas that you installed?

You can get the version information from the `__version__` attribute:

```python
pd.__version__
```

In [3]:
pd.__version__

'2.2.2'

### Getting the data

For this homework, we'll use the Laptops Price dataset. Download it from
[here](https://raw.githubusercontent.com/alexeygrigorev/datasets/master/laptops.csv).

In [4]:
!wget https://raw.githubusercontent.com/alexeygrigorev/datasets/master/laptops.csv

--2024-10-01 19:47:36--  https://raw.githubusercontent.com/alexeygrigorev/datasets/master/laptops.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.109.133, 185.199.108.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 298573 (292K) [text/plain]
Saving to: 'laptops.csv.1'

     0K .......... .......... .......... .......... .......... 17%  283K 1s
    50K .......... .......... .......... .......... .......... 34%  491K 1s
   100K .......... .......... .......... .......... .......... 51%  506K 0s
   150K .......... .......... .......... .......... .......... 68%  195K 0s
   200K .......... .......... .......... .......... .......... 85%  802K 0s
   250K .......... .......... .......... .......... .         100%  441K=0.8s

2024-10-01 19:47:38 (369 KB/s) - 'laptops.csv.1' saved [298573/298573]
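
The log above shows `wget` saving to `laptops.csv.1`, which is what it does when a file named `laptops.csv` already exists in the working directory. The later `pd.read_csv('laptops.csv')` call therefore reads the earlier copy. A minimal sketch of one way to avoid duplicate downloads, assuming a hypothetical helper named `download_if_missing` (not part of the original notebook):

```python
from pathlib import Path
from urllib.request import urlretrieve

DATA_URL = "https://raw.githubusercontent.com/alexeygrigorev/datasets/master/laptops.csv"


def download_if_missing(url: str, path: str) -> bool:
    """Download `url` to `path` unless `path` already exists.

    Returns True if a download happened, False if the file was already there.
    (Hypothetical helper, shown for illustration only.)
    """
    target = Path(path)
    if target.exists():
        return False
    urlretrieve(url, str(target))
    return True


# Usage (needs network access the first time it runs):
# download_if_missing(DATA_URL, "laptops.csv")
```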



### Question 2. Records count

How many records are in the dataset?

- 12
- 1000
- 2160
- 12160

In [7]:
# Get the number of records in the dataset
df_laptops = pd.read_csv('laptops.csv')
df_laptops.shape[0]

2160

In [11]:
df_laptops.head()

| Unnamed: 0 | Laptop | Status | Brand | Model | CPU | RAM | Storage | Storage type | GPU | Screen | Touch | Final Price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ASUS ExpertBook B1 B1502CBA-EJ0436X Intel Core... | New | Asus | ExpertBook | Intel Core i5 | 8 | 512 | SSD | | 15.6 | No | 1009.0 |
| 1 | Alurin Go Start Intel Celeron N4020/8GB/256GB ... | New | Alurin | Go | Intel Celeron | 8 | 256 | SSD | | 15.6 | No | 299.0 |
| 2 | ASUS ExpertBook B1 B1502CBA-EJ0424X Intel Core... | New | Asus | ExpertBook | Intel Core i3 | 8 | 256 | SSD | | 15.6 | No | 789.0 |
| 3 | MSI Katana GF66 12UC-082XES Intel Core i7-1270... | New | MSI | Katana | Intel Core i7 | 16 | 1000 | SSD | RTX 3050 | 15.6 | No | 1199.0 |
| 4 | HP 15S-FQ5085NS Intel Core i5-1235U/16GB/512GB... | New | HP | 15S | Intel Core i5 | 16 | 512 | SSD | | 15.6 | No | 669.01 |


### Question 3. Laptop brands

How many laptop brands are represented in the dataset?
- 12
- 27
- 28
- 2160

In [12]:
#  Count the number of unique laptop brands
laptop_brands = df_laptops['Brand'].nunique()
print(laptop_brands)

27
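
As a side note, `nunique()` skips missing values by default, which matters when the column being counted has gaps. A small sketch on made-up data (the `brands` series below is illustrative, not taken from the dataset) that also shows `value_counts()` for a per-brand breakdown:

```python
import numpy as np
import pandas as pd

# Toy data, invented for illustration only
brands = pd.Series(["Asus", "HP", "Asus", "MSI", np.nan, "HP"], name="Brand")

n_brands = brands.nunique()                # distinct non-missing values
n_with_nan = brands.nunique(dropna=False)  # counts NaN as its own value
counts = brands.value_counts()             # rows per brand, NaN excluded by default

print(n_brands, n_with_nan)
print(counts)
```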


### Question 4. Missing values

How many columns in the dataset have missing values?

- 0
- 1
- 2
- 3

In [13]:
# Check for columns with missing values
missing_columns = df_laptops.isnull().sum()

In [14]:
# Count how many columns have missing values
missing_columns_count = (missing_columns > 0).sum()
print(missing_columns_count)

3
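
The count alone does not say which columns are affected. A minimal sketch on a made-up frame (the `toy` data is an assumption for illustration) that lists the column names as well:

```python
import numpy as np
import pandas as pd

# Toy data, invented for illustration only
toy = pd.DataFrame({
    "Brand": ["Asus", "HP", "MSI"],
    "GPU": [None, "RTX 3050", None],
    "Screen": [15.6, np.nan, 14.0],
})

# Number of missing values in each column
missing_per_column = toy.isnull().sum()

# Names of the columns that contain at least one missing value
columns_with_missing = missing_per_column[missing_per_column > 0].index.tolist()

# Equivalent count using any(): True for each column with a missing value
n_columns_with_missing = toy.isnull().any().sum()

print(columns_with_missing, n_columns_with_missing)
```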


### Question 5. Maximum final price

What's the maximum final price of Dell notebooks in the dataset?

- 869
- 3691
- 3849
- 3936

In [18]:
# Filter rows where the manufacturer is 'Dell' and find the maximum final price
dell_laptops = df_laptops[df_laptops['Brand'] == 'Dell']
max_dell_price = dell_laptops['Final Price'].max()

print(max_dell_price)

3936.0
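
The same filter-then-aggregate pattern can be written in one step with `.loc`, or generalised to every brand with `groupby`. A sketch on made-up prices (the `toy` frame is illustrative, not the real dataset):

```python
import pandas as pd

# Toy data, invented for illustration only
toy = pd.DataFrame({
    "Brand": ["Dell", "HP", "Dell", "HP"],
    "Final Price": [899.0, 1200.0, 1499.0, 650.0],
})

# Boolean mask and column selection in a single .loc call
max_dell = toy.loc[toy["Brand"] == "Dell", "Final Price"].max()

# Maximum price for every brand at once
max_by_brand = toy.groupby("Brand")["Final Price"].max()

print(max_dell)
print(max_by_brand)
```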


### Question 6. Median value of Screen

1. Find the median value of `Screen` column in the dataset.
2. Next, calculate the most frequent value of the same `Screen` column.
3. Use `fillna` method to fill the missing values in `Screen` column with the most frequent value from the previous step.
4. Now, calculate the median value of `Screen` once again.

Has it changed?
> Hint: refer to existing `mode` and `median` functions to complete the task.

- Yes
- No

In [19]:
# 1. Find the median value of the 'Screen' column
screen_median_before = df_laptops['Screen'].median()

In [20]:
# 2. Calculate the most frequent value (mode) of the 'Screen' column
screen_mode = df_laptops['Screen'].mode()[0]

In [21]:
# 3. Fill missing values in the 'Screen' column with the most frequent value.
# Assigning the result back avoids the chained-assignment FutureWarning that
# df[col].fillna(value, inplace=True) emits in pandas 2.2.
df_laptops['Screen'] = df_laptops['Screen'].fillna(screen_mode)


In [22]:
# 4. Calculate the median value of 'Screen' again after filling missing values
screen_median_after = df_laptops['Screen'].median()

In [24]:
# Check if the median has changed
screen_median_before, screen_median_after, screen_median_before != screen_median_after

(15.6, 15.6, False)
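
One way to see why the median can stay the same: `median()` ignores missing values, and filling the gaps with a value that already equals the median adds no weight to either side of it. A small sketch on a made-up series (the numbers are illustrative, not taken from the dataset):

```python
import numpy as np
import pandas as pd

# Toy data, invented for illustration only
screen = pd.Series([15.6, 15.6, 14.0, 17.3, np.nan, 15.6], name="Screen")

median_before = screen.median()    # NaN values are skipped by default
most_frequent = screen.mode()[0]   # mode() returns a Series; take the first entry

# fillna returns a new Series; the original is left untouched
filled = screen.fillna(most_frequent)
median_after = filled.median()

print(median_before, most_frequent, median_after)
```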

### Question 7. Sum of weights

1. Select all the "Innjoo" laptops from the dataset.
2. Select only columns `RAM`, `Storage`, `Screen`.
3. Get the underlying NumPy array. Let's call it `X`.
4. Compute matrix-matrix multiplication between the transpose of `X` and `X`. To get the transpose, use `X.T`. Let's call the result `XTX`.
5. Compute the inverse of `XTX`.
6. Create an array `y` with values `[1100, 1300, 800, 900, 1000, 1100]`.
7. Multiply the inverse of `XTX` with the transpose of `X`, and then multiply the result by `y`. Call the result `w`.
8. What's the sum of all the elements of the result?

> Note: You just implemented linear regression. We'll talk about it in the next lesson.

- 0.43
- 45.29
- 45.58
- 91.30

In [26]:
# Step 1: Select all "Innjoo" laptops
innjoo_laptops = df_laptops[df_laptops['Brand'] == 'Innjoo']

In [27]:
# Step 2: Select only columns 'RAM', 'Storage', 'Screen'
innjoo_laptops_selected = innjoo_laptops[['RAM', 'Storage', 'Screen']]

In [28]:
# Step 3: Get the underlying NumPy array
X = innjoo_laptops_selected.to_numpy()

In [29]:
# Step 4: Compute matrix-matrix multiplication X.T @ X
XTX = X.T @ X

In [30]:
# Step 5: Compute the inverse of XTX
XTX_inv = np.linalg.inv(XTX)

In [31]:
# Step 6: Create an array y
y = np.array([1100, 1300, 800, 900, 1000, 1100])

In [32]:
# Step 7: Multiply the inverse of XTX with the transpose of X, and then multiply by y
w = XTX_inv @ X.T @ y

In [33]:
# Step 8: Calculate the sum of all elements of the result
result_sum = np.sum(w)
result_sum

91.29988062995588
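
The steps above compute `w = (X.T @ X)^-1 @ X.T @ y`, the normal-equation solution of a least-squares problem. As a sanity check, the same weights can be obtained from `np.linalg.lstsq`, which avoids forming the explicit inverse. A sketch on a made-up feature matrix (the values in `X_toy` are assumptions for illustration, not the Innjoo rows):

```python
import numpy as np

# Toy feature matrix (RAM, Storage, Screen), invented for illustration only
X_toy = np.array([
    [8, 256, 15.6],
    [16, 512, 14.0],
    [8, 512, 15.6],
    [4, 128, 14.1],
    [16, 1000, 17.3],
    [8, 256, 13.3],
])
y_toy = np.array([1100, 1300, 800, 900, 1000, 1100])

# Normal equation, as in the homework steps
w_normal = np.linalg.inv(X_toy.T @ X_toy) @ X_toy.T @ y_toy

# Least-squares solver; returns (solution, residuals, rank, singular values)
w_lstsq, _, rank, _ = np.linalg.lstsq(X_toy, y_toy, rcond=None)

print(w_normal)
print(w_lstsq)
```

Both approaches should agree up to floating-point error when `X.T @ X` is well conditioned; `lstsq` tends to be the more robust choice when it is not.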