Question 1

What's the version of Pandas that you installed?

You can get the version information from the `__version__` attribute:

`pd.__version__`

In [1]:
import pandas as pd
pd.__version__

'1.5.3'

Getting the data

In [2]:
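import os
import requests

url = "https://raw.githubusercontent.com/alexeygrigorev/datasets/master/housing.csv"
# A timeout keeps the cell from hanging indefinitely on a stalled connection
response = requests.get(url, timeout=30)

if response.status_code == 200:
    # Create the target directory first; open() fails if it does not exist
    os.makedirs("data", exist_ok=True)
    with open("data/housing.csv", "wb") as file:
        file.write(response.content)
else:
    print("Failed to download the file. Status code:", response.status_code)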

Question 2

How many columns are in the dataset?

In [3]:
df = pd.read_csv('data/housing.csv')
len(df.columns)

10

Question 3

Which columns in the dataset have missing values?

In [4]:
missing_values_df = df.isna().sum()
print(missing_values_df[missing_values_df > 0])

total_bedrooms    207
dtype: int64
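
To make the idiom above easier to follow, here is a minimal sketch on a small made-up DataFrame (the `toy` frame and its values are illustrative assumptions, not part of the housing data). `isna()` produces a boolean mask, `sum()` counts the `True` values per column, and the boolean filter keeps only the columns with a nonzero count.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, used only to illustrate the idiom
toy = pd.DataFrame({
    "rooms": [3, 4, 5, 6],
    "bedrooms": [1.0, np.nan, 2.0, np.nan],
})

# Count missing values per column, then keep only columns that have any
missing_counts = toy.isna().sum()
columns_with_missing = missing_counts[missing_counts > 0]
print(columns_with_missing)
```

Only `bedrooms` survives the filter here, just as only `total_bedrooms` does in the housing data.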


Question 4

How many unique values does the `ocean_proximity` column have?

In [5]:
df['ocean_proximity'].unique()

array(['NEAR BAY', '<1H OCEAN', 'INLAND', 'NEAR OCEAN', 'ISLAND'],
      dtype=object)

In [6]:
df['ocean_proximity'].nunique()

5

Question 5

What's the average value of the `median_house_value` for the houses located near the bay?

In [7]:
average_near_bay = df[df['ocean_proximity'] == 'NEAR BAY']['median_house_value'].mean()
print("Average median_house_value for houses near the bay:", average_near_bay)

Average median_house_value for houses near the bay: 259212.31179039303
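
The boolean-mask filter above computes the mean for one category. As a hedged illustration, the sketch below uses a made-up `toy` frame (its values are assumptions, not taken from the dataset) to show that `groupby` gives the same number while also producing the mean for every other category.

```python
import pandas as pd

# Hypothetical toy data, used only to compare the two approaches
toy = pd.DataFrame({
    "ocean_proximity": ["NEAR BAY", "INLAND", "NEAR BAY", "INLAND"],
    "median_house_value": [300000.0, 100000.0, 200000.0, 150000.0],
})

# Boolean-mask approach, as in the cell above
mask_mean = toy[toy["ocean_proximity"] == "NEAR BAY"]["median_house_value"].mean()

# Equivalent groupby approach: one mean per category
group_means = toy.groupby("ocean_proximity")["median_house_value"].mean()

print(mask_mean)
print(group_means["NEAR BAY"])
```

The mask is the more direct choice when only one category is needed; `groupby` is handy when you want to compare all of them.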


Question 6

1. Calculate the average of `total_bedrooms` column in the dataset.
2. Use the `fillna` method to fill the missing values in `total_bedrooms` with the mean value from the previous step.
3. Now, calculate the average of `total_bedrooms` again.
4. Has it changed?

In [8]:
# Step 1: Calculate the average of total_bedrooms column
average_total_bedrooms_before_fillna = round(df['total_bedrooms'].mean(), 3)
print("Average total_bedrooms before filling missing values:", average_total_bedrooms_before_fillna)

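# Step 2: Fill missing values in total_bedrooms with the mean value
# (assign the result back; fillna(inplace=True) on a selected column
# may not update df in newer pandas versions)
df['total_bedrooms'] = df['total_bedrooms'].fillna(average_total_bedrooms_before_fillna)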

# Step 3: Calculate the average of total_bedrooms again
average_total_bedrooms_after_fillna = round(df['total_bedrooms'].mean(), 3)
print("Average total_bedrooms after filling missing values:", average_total_bedrooms_after_fillna)

Average total_bedrooms before filling missing values: 537.871
Average total_bedrooms after filling missing values: 537.871
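
The two printed averages match, so the answer to step 4 is that the mean has not changed (to three decimal places). This is expected: if n non-missing values have mean m and k missing values are replaced by m, the new mean is (n*m + k*m) / (n + k) = m. The sketch below checks this on a made-up series `s` (hypothetical values, not from the dataset). In the cell above the fill value was rounded first, so equality there holds only up to that rounding.

```python
import numpy as np
import pandas as pd

# Hypothetical series with two missing values
s = pd.Series([1.0, 2.0, np.nan, 6.0, np.nan])

mean_before = s.mean()            # NaN values are skipped by default
filled = s.fillna(mean_before)    # replace each NaN with the mean
mean_after = filled.mean()

print(mean_before, mean_after)
```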


Question 7

1. Select all the rows located on islands.
2. Select only columns `housing_median_age`, `total_rooms`, `total_bedrooms`.
3. Get the underlying NumPy array. Let's call it `X`.
4. Compute matrix-matrix multiplication between the transpose of `X` and `X`. To get the transpose, use `X.T`. Let's call the result `XTX`.
5. Compute the inverse of `XTX`.
6. Create an array `y` with values `[950, 1300, 800, 1000, 1300]`.
7. Multiply the inverse of `XTX` with the transpose of `X`, and then multiply the result by `y`. Call the result `w`.
8. What's the value of the last element of `w`?

In [9]:
import numpy as np

# Step 1: Select options located on islands
islands_data = df[df['ocean_proximity'] == 'ISLAND']

# Step 2: Select specific columns
selected_columns = islands_data[['housing_median_age', 'total_rooms', 'total_bedrooms']]

# Step 3: Get the underlying NumPy array (X)
X = selected_columns.to_numpy()

# Step 4: Compute XTX
XTX = np.dot(X.T, X)

# Step 5: Compute the inverse of XTX
XTX_inverse = np.linalg.inv(XTX)

# Step 6: Create array y
y = np.array([950, 1300, 800, 1000, 1300])

# Step 7: Calculate w (weights) using the normal equation: w = (X^T X)^-1 X^T y
w = np.dot(np.dot(XTX_inverse, X.T), y)

# Step 8: Get the last element of w
last_element_w = w[-1]

print("The value of the last element of w is:", round(last_element_w, 4))

The value of the last element of w is: 5.6992
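
Steps 4 to 7 implement the normal equation for least squares, w = (X^T X)^-1 X^T y. As a hedged sketch, the code below repeats those steps on a made-up matrix `X` (an assumption for illustration, not the island rows from the dataset) and compares the result with two NumPy routines that avoid forming the inverse explicitly, `np.linalg.solve` and `np.linalg.lstsq`.

```python
import numpy as np

# Hypothetical toy data: 5 rows, 3 features (not the island rows)
X = np.array([
    [1.0, 2.0, 0.0],
    [0.0, 1.0, 1.0],
    [2.0, 0.0, 1.0],
    [1.0, 1.0, 1.0],
    [0.0, 3.0, 2.0],
])
y = np.array([950, 1300, 800, 1000, 1300])

# Normal equation with an explicit inverse, as in the cell above
XTX = X.T @ X
w_inv = np.linalg.inv(XTX) @ X.T @ y

# Solve the linear system XTX w = X^T y without inverting XTX
w_solve = np.linalg.solve(XTX, X.T @ y)

# Least-squares solver applied directly to X and y
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.allclose(w_inv, w_solve), np.allclose(w_inv, w_lstsq))
```

For a small, well-conditioned `XTX` all three agree. The explicit inverse is what the question asks for, while the other two are generally the more numerically robust choice on larger or nearly singular problems.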
