# Intro

This exercise is based on the following video:
https://www.youtube.com/watch?v=TrzUlo4BImM

The video explains the different evaluation metrics used for regression problems.

This exercise covers the following metrics:
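
As a rough illustration, and assuming the metrics in question are the usual regression ones (MAE, MSE, RMSE and R²), the sketch below computes them by hand with NumPy. The arrays `y_true` and `y_pred` are made-up values, not results from this notebook.

```python
import numpy as np

# Hypothetical true values and predictions (made up for illustration)
y_true = np.array([3.0, 2.0, 4.0, 5.0])
y_pred = np.array([2.5, 2.0, 5.0, 4.0])

errors = y_true - y_pred

# Mean Absolute Error: average size of the errors, in the target's units
mae = np.mean(np.abs(errors))

# Mean Squared Error: average squared error, penalizes large errors more
mse = np.mean(errors ** 2)

# Root Mean Squared Error: square root of MSE, back in the target's units
rmse = np.sqrt(mse)

# R^2: 1 minus (residual sum of squares / total sum of squares)
ss_res = np.sum(errors ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

print(f"MAE:  {mae:.4f}")
print(f"MSE:  {mse:.4f}")
print(f"RMSE: {rmse:.4f}")
print(f"R2:   {r2:.4f}")
```

In practice `sklearn.metrics` offers `mean_absolute_error`, `mean_squared_error` and `r2_score` for the same quantities; the hand-written version is only meant to make the definitions visible.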


In [102]:
import pandas as pd
from sklearn.datasets import fetch_california_housing

data = fetch_california_housing()
features = data["data"]
target = data["target"]

# Source Data

In [103]:
print(data.DESCR)

.. _california_housing_dataset:

California Housing dataset
--------------------------

**Data Set Characteristics:**

:Number of Instances: 20640

:Number of Attributes: 8 numeric, predictive attributes and the target

:Attribute Information:
    - MedInc        median income in block group
    - HouseAge      median house age in block group
    - AveRooms      average number of rooms per household
    - AveBedrms     average number of bedrooms per household
    - Population    block group population
    - AveOccup      average number of household members
    - Latitude      block group latitude
    - Longitude     block group longitude

:Missing Attribute Values: None

This dataset was obtained from the StatLib repository.
https://www.dcc.fc.up.pt/~ltorgo/Regression/cal_housing.html

The target variable is the median house value for California districts,
expressed in hundreds of thousands of dollars ($100,000).

This dataset was derived from the 1990 U.S. census, using one row per ce
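
Because the target is expressed in units of $100,000, a `MedHouseVal` of 4.526 corresponds to roughly $452,600. The sketch below shows that conversion on the `MedHouseVal` values of the first four rows of the dataset; the names `med_house_val` and `med_house_val_usd` are illustrative and not part of the notebook.

```python
import pandas as pd

# MedHouseVal of the first four rows, in units of $100,000
med_house_val = pd.Series([4.526, 3.585, 3.521, 3.413], name="MedHouseVal")

# Multiply by 100,000 to get US dollars; round to drop floating-point noise
med_house_val_usd = (med_house_val * 100_000).round(0)

# Format with a dollar sign and thousands separators for display
formatted = med_house_val_usd.map("${:,.0f}".format).tolist()
print(formatted)
```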

In [104]:
# Build DataFrames for the features and the target
features_df = pd.DataFrame(features, columns=data.feature_names)
target_df = pd.DataFrame(target, columns=data.target_names)

# Combine the features and the target into a single DataFrame
housing_df = pd.concat(
    [features_df, target_df],
    axis=1,  # concatenate column-wise (side by side)
)
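
One detail worth keeping in mind: with `axis=1`, `pd.concat` matches rows by index label, not by position. In this notebook both frames share the default index, so the rows line up. The sketch below uses small made-up frames (`toy_features`, `toy_target`, `shifted_target` are hypothetical names) to show what happens when the indexes agree and when they do not.

```python
import pandas as pd

# Toy stand-ins for features_df and target_df (values are made up)
toy_features = pd.DataFrame({"MedInc": [8.3, 7.2], "HouseAge": [41.0, 52.0]})
toy_target = pd.DataFrame({"MedHouseVal": [4.5, 3.5]})

# Same default index (0, 1): rows line up one to one
toy_housing = pd.concat([toy_features, toy_target], axis=1)
print(toy_housing.shape)

# Different index (1, 2): rows are aligned by label and gaps become NaN
shifted_target = pd.DataFrame({"MedHouseVal": [4.5, 3.5]}, index=[1, 2])
misaligned = pd.concat([toy_features, shifted_target], axis=1)
print(misaligned.shape)
print(misaligned["MedHouseVal"].isna().sum())
```

If the two frames ever come from different sources, calling `reset_index(drop=True)` on each before concatenating is one way to force positional alignment.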

In [105]:
print("---- Data -----")

print(f"Rows: {housing_df.shape[0]}")
print(f"Columns: {housing_df.shape[1]}")

print("Target:", data.target_names[0])

print(f"Features ({len(data.feature_names)}):")
for feature in data.feature_names:
    print(" -", feature)

print("")

example_nrows = 4
print("Example:")
housing_df.head(example_nrows)

---- Data -----
Rows: 20640
Columns: 9
Target: MedHouseVal
Features (8):
 - MedInc
 - HouseAge
 - AveRooms
 - AveBedrms
 - Population
 - AveOccup
 - Latitude
 - Longitude

Example:


   MedInc  HouseAge  AveRooms  AveBedrms  Population  AveOccup  Latitude  Longitude  MedHouseVal
0  8.3252      41.0  6.984127    1.02381       322.0  2.555556     37.88    -122.23        4.526
1  8.3014      21.0  6.238137    0.97188      2401.0  2.109842     37.86    -122.22        3.585
2  7.2574      52.0  8.288136   1.073446       496.0   2.80226     37.85    -122.24        3.521
3  5.6431      52.0  5.817352   1.073059       558.0  2.547945     37.85    -122.25        3.413
