# 01 - Introduction

## Q1. Pandas version

What version of Pandas did you install?

You can read the version from the `__version__` attribute:

In [12]:
import pandas as pd
import numpy as np

In [13]:
pd.__version__

'2.3.3'

## Getting the data


```sh
wget https://raw.githubusercontent.com/alexeygrigorev/datasets/master/car_fuel_efficiency.csv
```

In [14]:
data = pd.read_csv("car_fuel_efficiency.csv")

In [15]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9704 entries, 0 to 9703
Data columns (total 11 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   engine_displacement  9704 non-null   int64  
 1   num_cylinders        9222 non-null   float64
 2   horsepower           8996 non-null   float64
 3   vehicle_weight       9704 non-null   float64
 4   acceleration         8774 non-null   float64
 5   model_year           9704 non-null   int64  
 6   origin               9704 non-null   object 
 7   fuel_type            9704 non-null   object 
 8   drivetrain           9704 non-null   object 
 9   num_doors            9202 non-null   float64
 10  fuel_efficiency_mpg  9704 non-null   float64
dtypes: float64(6), int64(2), object(3)
memory usage: 834.1+ KB


## Q2. Records Count

How many records are in the dataset?

In [16]:
len(data)

9704

## Q3. Fuel Types

How many fuel types are present in the dataset?

In [17]:
data["fuel_type"].unique().size

2
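As a side note, the same count can be obtained with `Series.nunique()`. The sketch below uses a made-up toy column (not the real dataset) to show where the two approaches differ: `unique()` keeps `NaN` as a distinct entry, while `nunique()` drops it by default. Since `fuel_type` has no missing values according to `data.info()`, both should agree on the real data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy column for illustration, not the real dataset
fuel = pd.Series(["Gasoline", "Diesel", "Gasoline", np.nan])

# unique() keeps NaN as its own entry
n_with_nan = len(fuel.unique())

# nunique() excludes NaN by default (dropna=True)
n_without_nan = fuel.nunique()

# value_counts() gives the per-category counts, also excluding NaN by default
counts = fuel.value_counts()

print(n_with_nan, n_without_nan)
print(counts)
```

On a column with missing values, `nunique()` is usually the safer choice when the question is "how many distinct categories are there?".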

## Q4. Missing Values

How many columns in the dataset have missing values?

In [18]:
# Count missing values per column, then keep only the columns that have any
missing_values = data.isnull().sum()
missing_values = missing_values[missing_values > 0]
missing_values.size

4
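A more compact way to express the same idea is `isnull().any().sum()`. The following is a minimal sketch on a hypothetical toy frame (the column values are invented for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame for illustration, not the real dataset
df = pd.DataFrame({
    "horsepower": [150.0, np.nan, 160.0],
    "num_doors": [4.0, 2.0, np.nan],
    "model_year": [2003, 2010, 2015],
})

# isnull().any() is True for each column with at least one missing value;
# summing the booleans counts those columns
has_missing = df.isnull().any()
n_cols_with_missing = int(has_missing.sum())

# The same boolean mask also gives the column names
cols_with_missing = df.columns[has_missing].tolist()

print(n_cols_with_missing, cols_with_missing)
```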

## Q5. Max Fuel Efficiency

What's the maximum fuel efficiency of cars from Asia?

In [19]:
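# Note: this is the maximum over all rows; the question asks about cars from Asia,
# which requires filtering on data['origin'] == 'Asia' before calling max()
max_fuel_efficiency = data['fuel_efficiency_mpg'].max()
print(f"Exact max value: {max_fuel_efficiency}")

print("Top 5 highest values:")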
print(data['fuel_efficiency_mpg'].nlargest(5))

options = [13.75, 23.75, 33.75, 43.75]
print(f"\nClosest option to {max_fuel_efficiency}: {min(options, key=lambda x: abs(x - max_fuel_efficiency))}")

max_fuel_efficiency

Exact max value: 25.96722204888372
Top 5 highest values:
5014    25.967222
5815    24.971452
9387    23.759123
560     23.556075
8300    23.369176
Name: fuel_efficiency_mpg, dtype: float64

Closest option to 25.96722204888372: 23.75


np.float64(25.96722204888372)
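The question asks about cars from Asia, whereas the cell above takes the maximum over the whole dataset. A minimal sketch of the filtered version follows, using a hypothetical toy frame with invented values; on the real data the same pattern would be applied to `data`.

```python
import pandas as pd

# Hypothetical toy frame for illustration, not the real dataset
df = pd.DataFrame({
    "origin": ["Asia", "Europe", "Asia", "USA"],
    "fuel_efficiency_mpg": [14.2, 19.8, 16.5, 12.1],
})

# Boolean mask selects the Asian cars; .loc picks the column in the same step
asia_max = df.loc[df["origin"] == "Asia", "fuel_efficiency_mpg"].max()

# Unfiltered maximum, for comparison
overall_max = df["fuel_efficiency_mpg"].max()

# Maximum per origin in one pass
per_origin_max = df.groupby("origin")["fuel_efficiency_mpg"].max()

print(asia_max, overall_max)
print(per_origin_max)
```

In the toy frame the overall maximum belongs to a European car, so the filtered and unfiltered results differ. The two can also differ on the real data.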

## Q6. Median value of horsepower

1. Find the median value of the horsepower column in the dataset.
2. Next, calculate the most frequent value of the same horsepower column.
3. Use the fillna method to fill the missing values in the horsepower column with the most frequent value from the previous step.
4. Now, calculate the median value of horsepower once again.

Has it changed?

In [20]:
# Step 1: Find the median value of the horsepower column
median_original = data['horsepower'].median()
print(f"Original median horsepower: {median_original}")

# Step 2: Calculate the most frequent value (mode) of horsepower
mode_horsepower = data['horsepower'].mode()[0]  # mode() returns a Series, take first value
print(f"Most frequent horsepower value (mode): {mode_horsepower}")

# Step 3: Fill missing values with the most frequent value
data_filled = data.copy()
data_filled['horsepower'] = data_filled['horsepower'].fillna(mode_horsepower)

# Step 4: Calculate the median again
median_after_fill = data_filled['horsepower'].median()
print(f"Median after filling with mode: {median_after_fill}")

# Step 5: Check if it changed
print("\nComparison:")
if median_after_fill > median_original:
    print("Yes, it increased")
elif median_after_fill < median_original:
    print("Yes, it decreased")
else:
    print("No")

median_original

Original median horsepower: 149.0
Most frequent horsepower value (mode): 152.0
Median after filling with mode: 152.0

Comparison:
Yes, it increased


np.float64(149.0)
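The median moves because every missing entry (9704 - 8996 = 708 rows here) is replaced by the same value, and that value, 152, lies above the original median of 149. The sketch below reproduces the effect on a hypothetical toy series with invented numbers:

```python
import numpy as np
import pandas as pd

# Hypothetical toy series for illustration, not the real dataset
hp = pd.Series([100.0, 120.0, 120.0, np.nan, np.nan, 90.0, 110.0])

# median() skips NaN: the non-missing values are 90, 100, 110, 120, 120
median_before = hp.median()

# mode() returns a Series (there can be ties), so take the first entry
mode_value = hp.mode()[0]

# Every NaN becomes the mode, adding extra mass at that single value
hp_filled = hp.fillna(mode_value)
median_after = hp_filled.median()

print(median_before, mode_value, median_after)
```

Here the mode sits above the original median, so the filled values pull the median up, which is the same direction as in the homework data.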

## Q7. Sum of Weights

1. Select all the cars from Asia.
2. Select only columns vehicle_weight and model_year.
3. Select the first 7 values.
4. Get the underlying NumPy array. Let's call it X.
5. Compute matrix-matrix multiplication between the transpose of X and X. To get the transpose, use X.T. Let's call the result XTX.
6. Invert XTX.
7. Create an array y with values [1100, 1300, 800, 900, 1000, 1100, 1200].
8. Multiply the inverse of XTX with the transpose of X, and then multiply the result by y. Call the result w.
9. What's the sum of all the elements of the result?


In [21]:
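# Steps 1-3: cars from Asia, two columns, first 7 rows
asia_cars = data[data['origin'] == 'Asia'][['vehicle_weight', 'model_year']].head(7)
# Step 4: underlying NumPy array
X = asia_cars.to_numpy()
# Steps 5-6: Gram matrix X^T X and its inverse
XTX = X.T @ X
XTX_inv = np.linalg.inv(XTX)
# Step 7: target vector
y = np.array([1100, 1300, 800, 900, 1000, 1100, 1200])
# Step 8: w = (X^T X)^{-1} X^T y
w = XTX_inv @ X.T @ y
print(f"Sum of all elements in w: {w.sum()}")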

Sum of all elements in w: 0.5187709081074006
