This chapter walks through an example machine learning project from start to finish. The main steps are:
<br>
<ol>
    <li> Look at the big picture. </li>
    <li> Get the data. </li>
    <li> Discover and visualize the data to gain insights. </li>
    <li> Prepare the data for Machine Learning algorithms. </li>
    <li> Select a model and train it. </li>
    <li> Fine-tune your model. </li>
    <li> Present your solution. </li>
    <li> Launch, monitor, and maintain your system. </li>
</ol>

The book suggests working from a project checklist and provides one in the back of the book.

<b>Business Objective</b>: Determine if a given area is worth investing in by building a model to predict a district's median housing price.

<b>What type of ML to use?</b>
<br>This is a supervised learning task because the training data is labeled. We need to predict a value, so it is a regression task; more specifically, it is multiple regression because the system uses several features, and univariate regression because it predicts a single value per district. There is also no continuous flow of incoming data, so batch learning is sufficient and online learning is not needed.

<b> Performance measure </b> : The Root Mean Square Error (RMSE) is the usual choice for regression tasks. The Mean Absolute Error (MAE) is an alternative that is less sensitive to outliers.
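
To make the difference between the two measures concrete, here is a minimal sketch in plain Python. The function names `rmse` and `mae` and the sample values are assumptions made up for illustration; they are not from the book or the housing dataset.

```python
import math

def rmse(y_true, y_pred):
    """Root Mean Square Error: square root of the mean squared error."""
    squared_errors = [(p - t) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def mae(y_true, y_pred):
    """Mean Absolute Error: mean of the absolute errors."""
    absolute_errors = [abs(p - t) for t, p in zip(y_true, y_pred)]
    return sum(absolute_errors) / len(absolute_errors)

# Made-up values; the last prediction is deliberately far off.
actual = [200.0, 150.0, 300.0, 250.0]
predicted = [210.0, 140.0, 290.0, 350.0]

rmse_value = rmse(actual, predicted)
mae_value = mae(actual, predicted)
print("RMSE:", rmse_value)
print("MAE:", mae_value)
```

Because RMSE squares each error before averaging, the single large miss pulls it well above the MAE on the same predictions.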

In [3]:
# Fetch the data: a small script downloads a .tgz archive
# and extracts a .csv file from it.
import os
import tarfile
import urllib.request

DOWNLOAD_ROOT = "https://raw.githubusercontent.com/ageron/handson-ml/master/"
HOUSING_PATH = os.path.join("datasets", "housing")
HOUSING_URL = DOWNLOAD_ROOT + "datasets/housing/housing.tgz"

# Create a datasets/housing directory in the workspace, download
# the .tgz file into it, and extract housing.csv from the archive.
def fetch_housing_data(housing_url=HOUSING_URL, housing_path=HOUSING_PATH):
    os.makedirs(housing_path, exist_ok=True)  # no error if it already exists
    tgz_path = os.path.join(housing_path, "housing.tgz")
    urllib.request.urlretrieve(housing_url, tgz_path)
    # The context manager closes the archive even if extraction fails.
    with tarfile.open(tgz_path) as housing_tgz:
        housing_tgz.extractall(path=housing_path)
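
The extraction step can be tried without a network connection. The sketch below is only an illustration under stated assumptions: the helper `make_stand_in_archive`, the column names, and the two data rows are made up, and the archive is a tiny stand-in for the real `housing.tgz`. It packs a small CSV into a `.tgz` in a temporary directory, then extracts it the same way `fetch_housing_data` does.

```python
import csv
import os
import tarfile
import tempfile

def make_stand_in_archive(directory):
    """Hypothetical helper: write a tiny housing.csv and pack it into housing.tgz."""
    csv_path = os.path.join(directory, "housing.csv")
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["district", "median_house_value"])  # made-up columns
        writer.writerow(["a", 100000])                       # made-up rows
        writer.writerow(["b", 250000])
    tgz_path = os.path.join(directory, "housing.tgz")
    with tarfile.open(tgz_path, "w:gz") as archive:
        archive.add(csv_path, arcname="housing.csv")
    os.remove(csv_path)  # leave only the archive, as after a download
    return tgz_path

with tempfile.TemporaryDirectory() as workdir:
    tgz_path = make_stand_in_archive(workdir)
    housing_path = os.path.join(workdir, "datasets", "housing")
    os.makedirs(housing_path, exist_ok=True)
    # Same extraction calls as in fetch_housing_data.
    with tarfile.open(tgz_path) as housing_tgz:
        housing_tgz.extractall(path=housing_path)
    extracted = sorted(os.listdir(housing_path))
    with open(os.path.join(housing_path, "housing.csv"), newline="") as f:
        rows = list(csv.reader(f))

print(extracted)
print(rows)
```

Values read back through `csv.reader` are strings; a loader such as `pd.read_csv` would infer numeric types instead.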

In [5]:
import pandas as pd

# Load housing.csv and return it as a pandas DataFrame.
def load_housing_data(housing_path=HOUSING_PATH):
    csv_path = os.path.join(housing_path, "housing.csv")
    return pd.read_csv(csv_path)