# Homework

## Set up the environment

You need to install Python, NumPy, Pandas, Matplotlib, and Seaborn. For that, you can use the instructions from
[06-environment.md](../../../01-intro/06-environment.md).
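
Once everything is installed, a quick check like the one below can confirm that the libraries are visible to the interpreter you are running. This is only a sketch: `PACKAGES`, `installed_version`, and `report` are names made up for this example, and it assumes the distributions are published under the same names as their import names.

```python
import sys
from importlib import metadata

# Distribution names of the libraries this homework asks for.
PACKAGES = ["numpy", "pandas", "matplotlib", "seaborn"]


def installed_version(name):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


# Map each package name to its version string (or None).
report = {name: installed_version(name) for name in PACKAGES}

print("Python", sys.version.split()[0])
for name, version in report.items():
    print(name, version if version is not None else "NOT INSTALLED")
```

Any package reported as `NOT INSTALLED` still needs to be installed in the environment you are using.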

## Q1. Pandas version

What version of Pandas did you install?

You can get the version information using the `__version__` field:

```python
pd.__version__
```

In [93]:
import pandas as pd
import numpy as np

In [4]:
pd.__version__

'2.3.2'

## Getting the data

For this homework, we'll use the Car Fuel Efficiency dataset. Download it from [here](https://raw.githubusercontent.com/alexeygrigorev/datasets/master/car_fuel_efficiency.csv).

You can do it with `wget`:

```bash
wget https://raw.githubusercontent.com/alexeygrigorev/datasets/master/car_fuel_efficiency.csv
```

Or just open it with your browser and click "Save as...".

Now read it with Pandas.
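
As a rough illustration of what reading a CSV with Pandas looks like, the sketch below parses a tiny in-memory sample instead of the real file. The rows are made up and only mimic a few of the dataset's columns; `sample_csv` and `sample_df` are hypothetical names used for this example only.

```python
import io

import pandas as pd

# Made-up rows that only mimic the layout of the real file.
sample_csv = io.StringIO(
    "horsepower,origin,fuel_type,fuel_efficiency_mpg\n"
    "150,Asia,Gasoline,14.2\n"
    ",Europe,Diesel,12.9\n"
    "162,USA,Gasoline,16.0\n"
)

# With the downloaded file, pass its path instead:
# pd.read_csv("car_fuel_efficiency.csv")
sample_df = pd.read_csv(sample_csv)

# shape is a (rows, columns) tuple; empty fields are parsed as NaN.
n_rows, n_cols = sample_df.shape
print(n_rows, n_cols)
```

`len(sample_df)` gives the same row count as `sample_df.shape[0]`.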

## Q2. Records count

How many records are in the dataset?

- 4704
- 8704
- 9704
- 17704

In [6]:
!wget https://raw.githubusercontent.com/alexeygrigorev/datasets/master/car_fuel_efficiency.csv

--2025-09-29 10:13:46--  https://raw.githubusercontent.com/alexeygrigorev/datasets/master/car_fuel_efficiency.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 2606:50c0:8002::154, 2606:50c0:8003::154, 2606:50c0:8000::154, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|2606:50c0:8002::154|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 874188 (854K) [text/plain]
Saving to: ‘car_fuel_efficiency.csv’


2025-09-29 10:13:47 (3.19 MB/s) - ‘car_fuel_efficiency.csv’ saved [874188/874188]



In [7]:
!ls

Untitled.ipynb          car_fuel_efficiency.csv homework.md


In [64]:
df = pd.read_csv('car_fuel_efficiency.csv')

In [65]:
df.shape

(9704, 11)

answer: 9704

## Q3. Fuel types

How many fuel types are present in the dataset?

- 1
- 2
- 3
- 4

In [66]:
df.columns

Index(['engine_displacement', 'num_cylinders', 'horsepower', 'vehicle_weight',
       'acceleration', 'model_year', 'origin', 'fuel_type', 'drivetrain',
       'num_doors', 'fuel_efficiency_mpg'],
      dtype='object')

In [67]:
df.fuel_type.value_counts()

fuel_type
Gasoline    4898
Diesel      4806
Name: count, dtype: int64

In [68]:
df.fuel_type.nunique()

2

answer: 2
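
One detail worth keeping in mind when counting categories: `nunique()` and `value_counts()` both skip missing values by default. The sketch below shows the difference on a small, made-up Series (the values are hypothetical, not taken from the dataset).

```python
import numpy as np
import pandas as pd

# Hypothetical values, not taken from the dataset.
fuel = pd.Series(["Gasoline", "Diesel", "Gasoline", np.nan, "Diesel"])

n_types = fuel.nunique()                 # NaN is ignored by default
n_with_nan = fuel.nunique(dropna=False)  # NaN counted as its own value
counts = fuel.value_counts()             # rows per category, NaN excluded

print(n_types, n_with_nan)
print(counts)
```

In the output above, the two `fuel_type` counts add up to 9704, the total number of rows, so the column has no missing values and both approaches agree.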

## Q4. Missing values

How many columns in the dataset have missing values?

- 0
- 1
- 2
- 3
- 4

In [69]:
(df.isnull().sum() > 0).sum()

np.int64(4)

In [70]:
df.isnull().any().sum()

np.int64(4)

answer: 4
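
The cells above only count the affected columns. If you also want to know which columns they are, something along these lines should work. It is a sketch on a made-up frame: `toy`, `missing_per_column`, and `columns_with_missing` are hypothetical names, and the values are not from the dataset.

```python
import numpy as np
import pandas as pd

# Made-up frame with two incomplete columns and one complete column.
toy = pd.DataFrame({
    "horsepower": [150.0, np.nan, 162.0],
    "num_cylinders": [4.0, 6.0, np.nan],
    "origin": ["Asia", "Europe", "USA"],
})

# Number of missing values in each column.
missing_per_column = toy.isnull().sum()

# Keep only the columns that have at least one missing value.
columns_with_missing = missing_per_column[missing_per_column > 0].index.tolist()

print(missing_per_column)
print(columns_with_missing)
```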

## Q5. Max fuel efficiency

What's the maximum fuel efficiency of cars from Asia?

- 13.75
- 23.75
- 33.75
- 43.75

In [71]:
df_asia = df[df.origin == 'Asia']

In [72]:
df_asia.fuel_efficiency_mpg.sort_values(ascending=False)

9387    23.759123
343     23.204566
7739    23.033673
9401    22.919968
5416    22.858156
          ...    
2890     7.485553
9120     7.329314
6581     7.317353
3891     6.939508
1095     6.886245
Name: fuel_efficiency_mpg, Length: 3247, dtype: float64

In [73]:
df_asia.fuel_efficiency_mpg.max()


23.759122836520497

answer: 23.75
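
The filter-then-aggregate pattern used here can also be written in a single step with `.loc`, or for every origin at once with `groupby`. The sketch below uses a made-up frame (hypothetical values, with `toy`, `asia_max`, and `max_by_origin` as illustrative names).

```python
import pandas as pd

# Made-up rows, not taken from the dataset.
toy = pd.DataFrame({
    "origin": ["Asia", "Europe", "Asia", "USA"],
    "fuel_efficiency_mpg": [14.5, 12.0, 18.25, 16.0],
})

# Boolean mask selects the rows, the column label selects the values.
asia_max = toy.loc[toy["origin"] == "Asia", "fuel_efficiency_mpg"].max()

# Maximum for each origin in one pass.
max_by_origin = toy.groupby("origin")["fuel_efficiency_mpg"].max()

print(asia_max)
print(max_by_origin)
```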

## Q6. Median value of horsepower

1. Find the median value of the `horsepower` column in the dataset.
2. Next, calculate the most frequent value of the same `horsepower` column.
3. Use the `fillna` method to fill the missing values in the `horsepower` column with the most frequent value from the previous step.
4. Now, calculate the median value of `horsepower` once again.

Has it changed?


- Yes, it increased
- Yes, it decreased
- No

In [74]:
median_hp = df.horsepower.median()

In [75]:
mode_hp = df.horsepower.mode()[0]

In [76]:
mode_hp

np.float64(152.0)

In [77]:
median_hp

149.0

In [78]:
df['horsepower'].isna().sum()

np.int64(708)

In [79]:
df['horsepower'] = df['horsepower'].fillna(mode_hp)

In [80]:
df['horsepower'].isna().sum()

np.int64(0)

In [81]:
updated_median_hp = df.horsepower.median()
updated_median_hp

152.0

In [84]:
median_hp

149.0

In [83]:
df.horsepower.median()

152.0

answer: Yes, it increased
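
One way to see why the median moves: the 708 missing values are all replaced by 152, which is above the original median of 149, so the middle of the sorted column shifts upward. The sketch below reproduces the same effect on a small, made-up Series (hypothetical values chosen so the arithmetic is easy to follow).

```python
import numpy as np
import pandas as pd

# Made-up values: five known entries and three missing ones.
hp = pd.Series([100.0, 140.0, 150.0, 152.0, 152.0, np.nan, np.nan, np.nan])

median_before = hp.median()    # NaN entries are skipped
most_frequent = hp.mode()[0]   # mode() returns a Series; take its first entry

# Fill the gaps with the most frequent value and recompute the median.
hp_filled = hp.fillna(most_frequent)
median_after = hp_filled.median()

print(median_before, most_frequent, median_after)
```

Note that `mode()` can return several values when there is a tie; indexing with `[0]` takes the first of them.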

## Q7. Sum of weights

1. Select all the cars from Asia.
2. Select only columns `vehicle_weight` and `model_year`.
3. Select the first 7 values.
4. Get the underlying NumPy array. Let's call it `X`.
5. Compute matrix-matrix multiplication between the transpose of `X` and `X`. To get the transpose, use `X.T`. Let's call the result `XTX`.
6. Invert `XTX`.
7. Create an array `y` with values `[1100, 1300, 800, 900, 1000, 1100, 1200]`.
8. Multiply the inverse of `XTX` with the transpose of `X`, and then multiply the result by `y`. Call the result `w`.
9. What's the sum of all the elements of the result?

> **Note**: You just implemented linear regression. We'll talk about it in the next lesson.

- 0.051
- 0.51
- 5.1
- 51

In [108]:
X = np.array(df_asia[['vehicle_weight', 'model_year']][:7])
X

array([[2714.21930965, 2016.        ],
       [2783.86897424, 2010.        ],
       [3582.68736772, 2007.        ],
       [2231.8081416 , 2011.        ],
       [2659.43145076, 2016.        ],
       [2844.22753389, 2014.        ],
       [3761.99403819, 2019.        ]])

In [109]:
XTX = np.dot(X.T, X)
XTX

array([[62248334.33150762, 41431216.5073268 ],
       [41431216.5073268 , 28373339.        ]])

In [110]:
XTX_inv = np.linalg.inv(XTX)
XTX_inv

array([[ 5.71497081e-07, -8.34509443e-07],
       [-8.34509443e-07,  1.25380877e-06]])

In [111]:
y = np.array([1100, 1300, 800, 900, 1000, 1100, 1200])
y

array([1100, 1300,  800,  900, 1000, 1100, 1200])

In [112]:
# Multiply the inverse of `XTX` with the transpose of `X`, and then multiply the result by `y`. Call the result `w`.

w = np.dot(np.dot(XTX_inv, X.T), y)
w

array([0.01386421, 0.5049067 ])

answer: 0.51
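
The steps above compute $w = (X^T X)^{-1} X^T y$, often called the normal equation. As a cross-check, the sketch below recomputes `w` from the `X` and `y` values printed in the cells above (with `X` rounded to the displayed digits) in three ways: the explicit inverse from the homework steps, `np.linalg.solve`, and `np.linalg.lstsq`. The names `w_inv`, `w_solve`, and `w_lstsq` are introduced only for this example.

```python
import numpy as np

# X and y as printed in the cells above (X rounded to the displayed digits).
X = np.array([
    [2714.21930965, 2016.0],
    [2783.86897424, 2010.0],
    [3582.68736772, 2007.0],
    [2231.8081416, 2011.0],
    [2659.43145076, 2016.0],
    [2844.22753389, 2014.0],
    [3761.99403819, 2019.0],
])
y = np.array([1100, 1300, 800, 900, 1000, 1100, 1200])

# Normal equation with an explicit inverse, as in the homework steps.
w_inv = np.linalg.inv(X.T @ X) @ X.T @ y

# Same linear system, solved without forming the inverse.
w_solve = np.linalg.solve(X.T @ X, X.T @ y)

# Least-squares solver that works on X directly.
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(w_inv, w_inv.sum())
```

All three should agree up to floating-point error. In practice, `solve` or `lstsq` is usually preferred over an explicit inverse because it tends to be more numerically stable.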