In [2]:
# Libraries

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns



### Q1. Pandas version
What's the version of Pandas that you installed?

You can get the version information from the `__version__` attribute:

In [7]:
pd.__version__

'1.4.4'

### Getting the data

In [4]:
data = "https://raw.githubusercontent.com/alexeygrigorev/datasets/master/laptops.csv"
!wget $data 

--2024-09-18 08:30:32--  https://raw.githubusercontent.com/alexeygrigorev/datasets/master/laptops.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.111.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.111.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 298573 (292K) [text/plain]
Saving to: ‘laptops.csv’


2024-09-18 08:30:34 (191 KB/s) - ‘laptops.csv’ saved [298573/298573]



In [5]:
df = pd.read_csv('laptops.csv')

In [6]:
df.head()

|   | Laptop | Status | Brand | Model | CPU | RAM | Storage | Storage type | GPU | Screen | Touch | Final Price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ASUS ExpertBook B1 B1502CBA-EJ0436X Intel Core... | New | Asus | ExpertBook | Intel Core i5 | 8 | 512 | SSD | NaN | 15.6 | No | 1009.0 |
| 1 | Alurin Go Start Intel Celeron N4020/8GB/256GB ... | New | Alurin | Go | Intel Celeron | 8 | 256 | SSD | NaN | 15.6 | No | 299.0 |
| 2 | ASUS ExpertBook B1 B1502CBA-EJ0424X Intel Core... | New | Asus | ExpertBook | Intel Core i3 | 8 | 256 | SSD | NaN | 15.6 | No | 789.0 |
| 3 | MSI Katana GF66 12UC-082XES Intel Core i7-1270... | New | MSI | Katana | Intel Core i7 | 16 | 1000 | SSD | RTX 3050 | 15.6 | No | 1199.0 |
| 4 | HP 15S-FQ5085NS Intel Core i5-1235U/16GB/512GB... | New | HP | 15S | Intel Core i5 | 16 | 512 | SSD | NaN | 15.6 | No | 669.01 |


### Q2. Records count

How many records are in the dataset?

In [8]:
df.shape

(2160, 12)

2160 records
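`df.shape` returns a `(rows, columns)` tuple, so the record count is its first element. Here is a minimal sketch on a small hypothetical DataFrame (made-up values, not the laptops data) that shows equivalent ways to get the row count:

```python
import pandas as pd

# Small hypothetical DataFrame standing in for the laptops data
toy = pd.DataFrame({
    "Brand": ["Asus", "Dell", "HP"],
    "Final Price": [1000.0, 2000.0, 700.0],
})

n_rows, n_cols = toy.shape  # shape is a (rows, columns) tuple
print(n_rows, n_cols)       # 3 2
print(len(toy))             # len() of a DataFrame counts rows: 3
```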

### Q3. Laptop brands
How many laptop brands are present in the dataset?

In [16]:
df['Brand'].nunique()

27

27 laptop brands
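`Series.nunique()` counts distinct values directly. Like `value_counts()`, it ignores missing values by default, so both approaches give the same count. A sketch on a hypothetical brand column that contains a repeat and a missing entry:

```python
import numpy as np
import pandas as pd

# Hypothetical brand column with a repeated value and one missing entry
brands = pd.Series(["Asus", "Dell", "Asus", np.nan, "HP"])

# nunique() counts distinct non-null values (dropna=True by default)
n_brands = brands.nunique()
# value_counts() also drops NaN by default, so its length matches
n_via_counts = len(brands.value_counts())
# With dropna=False, NaN is counted as its own distinct value
n_with_nan = brands.nunique(dropna=False)

print(n_brands, n_via_counts, n_with_nan)  # 3 3 4
```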

### Q4. Missing values
How many columns in the dataset have missing values?

- 0
- 1
- 2
- 3

In [18]:
df.isnull().sum()

Laptop             0
Status             0
Brand              0
Model              0
CPU                0
RAM                0
Storage            0
Storage type      42
GPU             1371
Screen             4
Touch              0
Final Price        0
dtype: int64

Three (3) columns (Storage type, GPU, and Screen) have missing values.
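The count can also be computed directly instead of read off the per-column totals. Here is a sketch on a hypothetical DataFrame whose missing entries are made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical data: three of the four columns contain at least one NaN
toy = pd.DataFrame({
    "Storage type": ["SSD", np.nan, "SSD"],
    "GPU": [np.nan, np.nan, "RTX 3050"],
    "Screen": [15.6, 14.0, np.nan],
    "Touch": ["No", "Yes", "No"],
})

missing_per_column = toy.isnull().sum()  # NaN count per column
cols_with_missing = missing_per_column[missing_per_column > 0].index.tolist()
# any() is True for a column with at least one NaN; summing counts those columns
n_cols_with_missing = int(toy.isnull().any().sum())

print(cols_with_missing)    # ['Storage type', 'GPU', 'Screen']
print(n_cols_with_missing)  # 3
```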

### Q5. Maximum final price

What's the maximum final price of Dell notebooks in the dataset?

- 869
- 3691
- 3849
- 3936

In [31]:
df_dell = df[df['Brand'] == 'Dell']
df_dell.sort_values('Final Price', ascending=False).head()

|   | Laptop | Status | Brand | Model | CPU | RAM | Storage | Storage type | GPU | Screen | Touch | Final Price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1335 | Dell Precision 5770 Intel Core i7-12700H/16GB/... | New | Dell | Precision | Intel Core i7 | 16 | 512 | SSD | RTX A2000 | 17.0 | Yes | 3936.0 |
| 1334 | Dell Precision 5470 Intel Core i7-12800H/16GB/... | New | Dell | Precision | Intel Core i7 | 16 | 512 | SSD | RTX A1000 | 14.0 | No | 3469.0 |
| 1346 | Dell XPS 15 9520 Intel Core i7-12700H/16GB RAM... | New | Dell | XPS | Intel Core i7 | 16 | 1000 | SSD | RTX 3050 | 15.6 | No | 3012.09 |
| 1347 | Dell XPS 15 9520 Intel Core i7-12700H/32GB/1TB... | Refurbished | Dell | XPS | Intel Core i7 | 32 | 1000 | SSD | RTX 3050 | 15.6 | Yes | 2818.09 |
| 1323 | Dell Latitude 5520 Intel Core i5-1135G7/8GB/25... | New | Dell | Latitude | Intel Core i5 | 8 | 256 | SSD | NaN | 15.6 | No | 2450.71 |


The maximum final price of Dell notebooks in the dataset is 3936.
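Sorting works, but when only the maximum is needed it is more direct to filter with `.loc` and call `.max()`. `groupby` returns the maximum for every brand at once. A sketch on a small hypothetical DataFrame (the prices are made up):

```python
import pandas as pd

# Hypothetical prices, not taken from the laptops dataset
toy = pd.DataFrame({
    "Brand": ["Dell", "Dell", "HP", "Dell"],
    "Final Price": [1500.0, 2300.0, 800.0, 1200.0],
})

# Filter rows and select the column in one .loc call, then take the max
dell_max = toy.loc[toy["Brand"] == "Dell", "Final Price"].max()

# groupby computes the maximum price per brand in one pass
max_by_brand = toy.groupby("Brand")["Final Price"].max()

print(dell_max)            # 2300.0
print(max_by_brand["HP"])  # 800.0
```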

### Q6. Median value of Screen

1. Find the median value of the Screen column in the dataset.
2. Next, calculate the most frequent value of the same Screen column.
3. Use the fillna method to fill the missing values in the Screen column with the most frequent value from the previous step.
4. Now, calculate the median value of Screen once again.


Has it changed?

Hint: refer to existing mode and median functions to complete the task.

- Yes
- No

In [33]:
# Median value of the Screen column
screen_median = df['Screen'].median()
screen_median

15.6

In [54]:
# Most frequent value of the screen column
screen_mode = df['Screen'].mode()[0]
screen_mode

15.6
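`Series.mode()` returns a Series rather than a scalar, because several values can tie for most frequent. The result is sorted, so taking the first element picks the smallest of the tied values. A small sketch with hypothetical screen sizes:

```python
import pandas as pd

# Hypothetical screen sizes where 14.0 and 15.6 tie for most frequent
screens = pd.Series([14.0, 15.6, 14.0, 15.6, 17.3])

modes = screens.mode()      # Series of all tied modes, sorted ascending
first_mode = modes.iloc[0]  # positional access to the first (smallest) mode

print(modes.tolist())  # [14.0, 15.6]
print(first_mode)      # 14.0
```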

In [55]:
# Fill the missing values in the Screen column with the most frequent value
df['Screen'] = df['Screen'].fillna(screen_mode)

In [57]:
df['Screen'].median()

15.6

No. The median value has not changed.
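The median stays the same here because the mode (15.6) equals the median. Adding copies of the median value leaves the number of values below and above it unchanged, so the median cannot move. The sketch below uses hypothetical data to contrast this with a case where the mode differs from the median:

```python
import numpy as np
import pandas as pd


def median_after_mode_fill(values):
    """Return (median before, median after filling NaN with the mode)."""
    s = pd.Series(values, dtype=float)
    before = s.median()  # median() skips NaN by default
    filled = s.fillna(s.mode().iloc[0])
    return float(before), float(filled.median())


# Case 1 (hypothetical): mode equals median, so the median is unchanged
same = median_after_mode_fill([14.0, 15.6, 15.6, 15.6, 17.3, np.nan, np.nan])
# Case 2 (hypothetical): mode 13.3 differs from the median, so the median shifts
shifted = median_after_mode_fill([13.3, 13.3, 15.6, 16.0, 17.3, np.nan, np.nan])

print(same)     # (15.6, 15.6)
print(shifted)  # (15.6, 13.3)
```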

### Q7. Sum of weights

1. Select all the "Innjoo" laptops from the dataset.
2. Select only columns RAM, Storage, Screen.
3. Get the underlying NumPy array. Let's call it X.
4. Compute matrix-matrix multiplication between the transpose of X and X. To get the transpose, use X.T. Let's call the result XTX.
5. Compute the inverse of XTX.
6. Create an array y with values [1100, 1300, 800, 900, 1000, 1100].
7. Multiply the inverse of XTX with the transpose of X, and then multiply the result by y. Call the result w.
8. What's the sum of all the elements of the result?

Note: You just implemented linear regression. We'll talk about it in the next lesson.

- 0.43
- 45.29
- 45.58
- 91.30
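The steps above compute the ordinary least-squares weights in closed form, using the normal equation. X has no column of ones, so this model has no intercept term. The formula requires $X^\top X$ to be invertible:

$$
\hat{w} = \arg\min_{w} \lVert Xw - y \rVert_2^2 = (X^\top X)^{-1} X^\top y
$$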


In [61]:
# Select the Innjoo laptops
df_innjoo = df[df['Brand'] == 'Innjoo']

In [87]:
# Keep only the RAM, Storage, and Screen columns
df_subset = df_innjoo[['RAM', 'Storage', 'Screen']]

In [91]:
# NumPy array of df_subset
X = df_subset.to_numpy()
X

array([[  8. , 256. ,  15.6],
       [  8. , 512. ,  15.6],
       [  4. ,  64. ,  14.1],
       [  6. ,  64. ,  14.1],
       [  6. , 128. ,  14.1],
       [  6. , 128. ,  14.1]])

In [92]:
# X Transpose
XT = X.T
XT

array([[  8. ,   8. ,   4. ,   6. ,   6. ,   6. ],
       [256. , 512. ,  64. ,  64. , 128. , 128. ],
       [ 15.6,  15.6,  14.1,  14.1,  14.1,  14.1]])

In [93]:
# Matrix-matrix product of X transpose and X
XTX = XT.dot(X)
XTX

array([[2.52000e+02, 8.32000e+03, 5.59800e+02],
       [8.32000e+03, 3.68640e+05, 1.73952e+04],
       [5.59800e+02, 1.73952e+04, 1.28196e+03]])
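In NumPy the `@` operator performs matrix multiplication, so `X.T @ X` is an equivalent way to write `XT.dot(X)`. The quick check below uses the Innjoo X values printed above. It also confirms that XᵀX is symmetric, which holds for any X:

```python
import numpy as np

# X as printed above for the six Innjoo laptops (RAM, Storage, Screen)
X = np.array([
    [8, 256, 15.6],
    [8, 512, 15.6],
    [4, 64, 14.1],
    [6, 64, 14.1],
    [6, 128, 14.1],
    [6, 128, 14.1],
])

XTX = X.T @ X  # @ is matrix multiplication, same as X.T.dot(X)

print(np.allclose(XTX, X.T.dot(X)))  # True
print(np.allclose(XTX, XTX.T))       # True: X^T X is always symmetric
print(XTX[0, 0], XTX[0, 1])          # 252.0 8320.0
```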

In [94]:
# Inverse of XTX
XTX_inv = np.linalg.inv(XTX)
XTX_inv

array([[ 2.78025381e-01, -1.51791334e-03, -1.00809855e-01],
       [-1.51791334e-03,  1.58286725e-05,  4.48052175e-04],
       [-1.00809855e-01,  4.48052175e-04,  3.87214888e-02]])

In [95]:
# Target values y given in the question
y = np.array([1100, 1300, 800, 900, 1000, 1100])
y

array([1100, 1300,  800,  900, 1000, 1100])

In [96]:
# Multiply the inverse of XTX by X transpose, then by y
w = XTX_inv.dot(XT).dot(y)
w

array([45.58076606,  0.42783519, 45.29127938])
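Forming the explicit inverse works for this small example. `np.linalg.solve` instead solves the system (XᵀX) w = Xᵀy directly, which is generally cheaper and numerically more stable. A sketch that reuses the X and y values from above:

```python
import numpy as np

# X and y as given above for the six Innjoo laptops
X = np.array([
    [8, 256, 15.6],
    [8, 512, 15.6],
    [4, 64, 14.1],
    [6, 64, 14.1],
    [6, 128, 14.1],
    [6, 128, 14.1],
])
y = np.array([1100, 1300, 800, 900, 1000, 1100])

XTX = X.T @ X
# Solve the linear system (X^T X) w = X^T y without forming the inverse
w_solve = np.linalg.solve(XTX, X.T @ y)
# Same result via the explicit inverse, as in the cell above
w_inv = np.linalg.inv(XTX) @ X.T @ y

print(np.allclose(w_solve, w_inv))  # True
print(w_solve)
```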

In [98]:
round(w.sum(), 1)

91.3

The sum of all the elements of w is 91.30.
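`np.linalg.lstsq` fits the same least-squares problem directly from X and y without forming XᵀX. It is the usual choice when XᵀX is close to singular. A sketch that recomputes w and the requested sum from the same values:

```python
import numpy as np

# X and y as given above for the six Innjoo laptops
X = np.array([
    [8, 256, 15.6],
    [8, 512, 15.6],
    [4, 64, 14.1],
    [6, 64, 14.1],
    [6, 128, 14.1],
    [6, 128, 14.1],
])
y = np.array([1100, 1300, 800, 900, 1000, 1100])

# rcond=None uses the machine-precision-based cutoff for small singular values
w, residuals, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)
total = w.sum()

print(rank)             # 3: the three columns are linearly independent
print(round(total, 2))  # 91.3
```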