In [1]:
import numpy as np
import pandas as pd

### Q1. Pandas version

What's the version of Pandas that you installed?

In [2]:
pd.__version__

'2.2.2'

### Getting the data

In [3]:
!wget https://raw.githubusercontent.com/alexeygrigorev/datasets/master/car_fuel_efficiency.csv

--2025-09-26 17:52:15--  https://raw.githubusercontent.com/alexeygrigorev/datasets/master/car_fuel_efficiency.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.109.133, 185.199.111.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.109.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 874188 (854K) [text/plain]
Saving to: ‘car_fuel_efficiency.csv.1’


2025-09-26 17:52:15 (11.1 MB/s) - ‘car_fuel_efficiency.csv.1’ saved [874188/874188]



In [4]:
df = pd.read_csv('car_fuel_efficiency.csv')
df.head()

| | engine_displacement | num_cylinders | horsepower | vehicle_weight | acceleration | model_year | origin | fuel_type | drivetrain | num_doors | fuel_efficiency_mpg |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 170 | 3.0 | 159.0 | 3413.433759 | 17.7 | 2003 | Europe | Gasoline | All-wheel drive | 0.0 | 13.231729 |
| 1 | 130 | 5.0 | 97.0 | 3149.664934 | 17.8 | 2007 | USA | Gasoline | Front-wheel drive | 0.0 | 13.688217 |
| 2 | 170 | NaN | 78.0 | 3079.038997 | 15.1 | 2018 | Europe | Gasoline | Front-wheel drive | 0.0 | 14.246341 |
| 3 | 220 | 4.0 | NaN | 2542.392402 | 20.2 | 2009 | USA | Diesel | All-wheel drive | 2.0 | 16.912736 |
| 4 | 210 | 1.0 | 140.0 | 3460.87099 | 14.4 | 2009 | Europe | Gasoline | All-wheel drive | 2.0 | 12.488369 |


### Q2. Records count

How many records are in the dataset?

In [5]:
len(df)

9704

### Q3. Fuel types

How many fuel types are present in the dataset?

In [6]:
df['fuel_type'].nunique()

2
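Note that `nunique` skips missing values by default. A minimal sketch of the difference, using a small made-up series (the `fuel` values below are hypothetical, not taken from the dataset):

```python
import numpy as np
import pandas as pd

# Hypothetical toy series with one missing entry; illustrative only.
fuel = pd.Series(["Gasoline", "Diesel", "Gasoline", np.nan])

# By default NaN is not counted as a distinct value.
n_types = fuel.nunique()

# With dropna=False, NaN is counted as its own value.
n_types_with_nan = fuel.nunique(dropna=False)

# value_counts shows how often each non-missing value occurs.
counts = fuel.value_counts()

print(n_types, n_types_with_nan)
print(counts)
```

If the real `fuel_type` column had missing entries, the two counts would differ in the same way.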

### Q4. Missing values

How many columns in the dataset have missing values?

In [7]:
int(df.isnull().any().sum())

4
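To see which columns have missing values, and not only how many, the per-column null counts can be filtered. A minimal sketch on a small made-up frame (the `toy` data is hypothetical; only the column names are borrowed from the dataset):

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; values are illustrative only.
toy = pd.DataFrame({
    "horsepower": [159.0, np.nan, 78.0],
    "num_cylinders": [3.0, 5.0, np.nan],
    "origin": ["Europe", "USA", "Asia"],
})

# Count missing values per column, then keep columns with at least one.
null_counts = toy.isnull().sum()
cols_with_nulls = null_counts[null_counts > 0]

print(cols_with_nulls)
print(len(cols_with_nulls))  # number of columns with missing values
```

The same two lines should work unchanged on `df`.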

### Q5. Max fuel efficiency

What's the maximum fuel efficiency of cars from Asia?

In [8]:
df['origin'].unique()

array(['Europe', 'USA', 'Asia'], dtype=object)

In [9]:
df[df['origin'] == 'Asia']['fuel_efficiency_mpg'].max()

23.759122836520497
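The same filter can be written as a single `.loc` lookup, and `groupby` gives the maximum for every origin at once. A minimal sketch on made-up numbers (the `toy` frame is hypothetical, not the real data):

```python
import pandas as pd

# Hypothetical toy frame; values are illustrative only.
toy = pd.DataFrame({
    "origin": ["Europe", "Asia", "USA", "Asia"],
    "fuel_efficiency_mpg": [13.2, 15.5, 13.7, 18.1],
})

# Row mask and column label in one .loc call.
asia_max = toy.loc[toy["origin"] == "Asia", "fuel_efficiency_mpg"].max()

# Maximum per origin, useful for comparing regions side by side.
max_by_origin = toy.groupby("origin")["fuel_efficiency_mpg"].max()

print(asia_max)
print(max_by_origin)
```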

### Q6. Median value of horsepower

1. Find the median value of the `horsepower` column in the dataset.

In [10]:
df['horsepower'].median()

149.0

2. Next, calculate the most frequent value of the same `horsepower` column.

In [11]:
most_frequent_horsepower = float(df['horsepower'].mode()[0])
most_frequent_horsepower

152.0

Use the `fillna` method to fill the missing values in the `horsepower` column with the most frequent value from the previous step.

In [12]:
# Assign the result back to the column; calling fillna(..., inplace=True) on
# df['horsepower'] raises a FutureWarning in pandas 2.2 and will stop working in pandas 3.0.
df['horsepower'] = df['horsepower'].fillna(most_frequent_horsepower)


Now, calculate the median value of `horsepower` once again.

In [13]:
df['horsepower'].median()

152.0

Has it changed?

* Yes, it increased from 149.0 to 152.0.
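The median moves because every filled entry takes the same value, which pulls the middle of the sorted column toward the mode. A minimal sketch with made-up numbers (the `hp` series is hypothetical, not the real column):

```python
import numpy as np
import pandas as pd

# Hypothetical toy column with three missing entries; illustrative only.
hp = pd.Series([100.0, 120.0, 150.0, 150.0, np.nan, np.nan, np.nan])

# median() skips NaN by default, so only the four known values count here.
median_before = hp.median()

# mode() returns a Series of the most frequent value(s); take the first.
most_frequent = hp.mode()[0]

# Fill the gaps with the mode and assign the result to a new series.
hp_filled = hp.fillna(most_frequent)
median_after = hp_filled.median()

print(median_before, most_frequent, median_after)
```

In this toy case the mode sits above the original median, so the median rises after filling, as it does for `horsepower` above.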


### Q7. Sum of weights

1. Select all the cars from Asia

In [14]:
df_asia = df[df['origin'] == 'Asia']

2. Select only columns `vehicle_weight` and `model_year`

In [15]:
df_asia = df_asia[['vehicle_weight', 'model_year']]

3. Select the first 7 values

In [16]:
df_asia_first_7 = df_asia.head(7)

4. Get the underlying NumPy array. Let's call it `X`.

In [17]:
X = df_asia_first_7.to_numpy()
X

array([[2714.21930965, 2016.        ],
       [2783.86897424, 2010.        ],
       [3582.68736772, 2007.        ],
       [2231.8081416 , 2011.        ],
       [2659.43145076, 2016.        ],
       [2844.22753389, 2014.        ],
       [3761.99403819, 2019.        ]])

5. Compute matrix-matrix multiplication between the transpose of `X` and `X`. To get the transpose, use `X.T`. Let's call the result `XTX`.

In [18]:
XTX = np.dot(X.T, X)
XTX

array([[62248334.33150762, 41431216.5073268 ],
       [41431216.5073268 , 28373339.        ]])

6. Invert `XTX`.

In [19]:
XTX_inv = np.linalg.inv(XTX)
XTX_inv

array([[ 5.71497081e-07, -8.34509443e-07],
       [-8.34509443e-07,  1.25380877e-06]])

7. Create an array `y` with values `[1100, 1300, 800, 900, 1000, 1100, 1200]`.

In [20]:
y = np.array([1100, 1300, 800, 900, 1000, 1100, 1200])

8. Multiply the inverse of `XTX` with the transpose of `X`, and then multiply the result by `y`. Call the result `w`.

In [21]:
w = XTX_inv @ np.transpose(X) @ y
w

array([0.01386421, 0.5049067 ])

9. What's the sum of all the elements of the result?

In [22]:
float(w.sum())

0.5187709081074016
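The weights `w` from step 8 are the normal-equation solution $w = (X^T X)^{-1} X^T y$. As a cross-check, the same system can be solved without forming the inverse explicitly. The sketch below copies `X` as printed in step 4 (rounded to 8 decimals) and `y` from step 7; the names `w_solve` and `w_lstsq` are introduced here for illustration.

```python
import numpy as np

# X copied from the printed output of step 4; y from step 7.
X = np.array([
    [2714.21930965, 2016.0],
    [2783.86897424, 2010.0],
    [3582.68736772, 2007.0],
    [2231.8081416, 2011.0],
    [2659.43145076, 2016.0],
    [2844.22753389, 2014.0],
    [3761.99403819, 2019.0],
])
y = np.array([1100, 1300, 800, 900, 1000, 1100, 1200])

# Solve (X^T X) w = X^T y directly instead of inverting X^T X.
w_solve = np.linalg.solve(X.T @ X, X.T @ y)

# Least-squares solver applied to X and y; for a full-column-rank X
# it should agree with the normal-equation solution.
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(w_solve, w_lstsq)
print(float(w_solve.sum()))
```

Both results should match the `w` computed above up to floating-point error. `np.linalg.solve` is generally preferred over an explicit inverse because it does less work and tends to be more accurate.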