# Enable Virtual Environment For This Notebook.

### Activate Conda Environment

<b>`$ conda activate`</b>

### Install or upgrade the software required for the virtual environment

<b>`$ sudo apt-get install --upgrade python3-pip`</b>

<b>`$ sudo pip3 install --upgrade virtualenv`</b>

<b>`$ sudo pip3 install --upgrade setuptools`</b>

Next, change to the directory in which the virtual environment will be created. The path contains spaces, so it must be quoted.

<b>`$ cd "/media/mujahid7292/Data/GoogleDriveSandCorp2014/ML_With_TensorFlow_On_GCP/05.Art_And_Science_Of_Machine_Learning/WEEK_1/01.Improve model accuracy by hand-tuning hyperparameters/Practice"`</b>

### Deactivate conda environment

<b>`$ conda deactivate`</b>

### Create Virtual Environment

<b>`$ virtualenv Venv`</b>

### Activate newly created virtual environment

<b>`$ source Venv/bin/activate`</b>

<b>`(Venv) $ which python`</b>

<b>`(Venv) $ pip list`</b>

<b>`(Venv) $ pip3 install jupyter`</b>
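
Besides `which python`, the notebook itself can confirm that its kernel runs inside a virtual environment. The following is a minimal sketch using only the standard library; the helper name `in_virtual_env` is made up for illustration, and the check relies on the assumption that the environment was created by `virtualenv` or `venv` (a plain conda environment would report `False`).

```python
import sys


def in_virtual_env():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Older virtualenv releases set sys.real_prefix; venv and newer virtualenv
    # releases make sys.base_prefix differ from sys.prefix instead.
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return hasattr(sys, "real_prefix") or base_prefix != sys.prefix


print("Interpreter:", sys.executable)
print("Inside a virtual environment:", in_virtual_env())
```

If the second line prints `False` while `Venv` is supposed to be active, the notebook kernel is probably using a different interpreter than the shell.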

In [21]:
%%writefile requirements.txt
numpy
pandas
tensorflow==1.8.0

Writing requirements.txt


In [22]:
%%bash
pip3 install -r requirements.txt



In [23]:
%%bash
pip3 list

Package            Version  
------------------ ---------
absl-py            0.9.0    
astor              0.8.1    
attrs              19.3.0   
backcall           0.1.0    
bleach             1.5.0    
decorator          4.4.2    
defusedxml         0.6.0    
entrypoints        0.3      
gast               0.3.3    
grpcio             1.27.2   
html5lib           0.9999999
importlib-metadata 1.5.0    
ipykernel          5.2.0    
ipython            7.13.0   
ipython-genutils   0.2.0    
ipywidgets         7.5.1    
jedi               0.16.0   
Jinja2             2.11.1   
jsonschema         3.2.0    
jupyter            1.0.0    
jupyter-client     6.1.0    
jupyter-console    6.1.0    
jupyter-core       4.6.3    
Markdown           3.2.1    
MarkupSafe         1.1.1    
mistune            0.8.4    
nbconvert          5.6.1    
nbformat           5.0.4    
notebook           6.0.3    
numpy              1.18.2   
pandas             1.0.3    
pandocfilters      1.4.2    
parso         

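Solution: <a href="https://github.com/GoogleCloudPlatform/training-data-analyst/blob/master/courses/machine_learning/deepdive/05_artandscience/a_handtuning.ipynb">https://github.com/GoogleCloudPlatform/training-data-analyst/blob/master/courses/machine_learning/deepdive/05_artandscience/a_handtuning.ipynb</a>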

# Hand-tuning hyperparameters

**Learning Objectives:**
   * Use the `LinearRegressor` class in TensorFlow to predict median housing price, at the granularity of city blocks, based on one input feature.
   * Evaluate the accuracy of the model's predictions using Root Mean Squared Error (RMSE).
   * Improve the accuracy of the model by hand-tuning its hyperparameters.

The data is based on 1990 census data from California. It is aggregated at the city-block level, so features such as `total_rooms` and `population` report the total number of rooms in a block and the total number of people who live on that block, respectively. Using only one input feature, the number of rooms, we will predict the house value.
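
RMSE, the metric named in the learning objectives, is the square root of the average squared difference between labels and predictions. A minimal sketch in plain Python follows; the helper `rmse` and the sample values are illustrative only and are not taken from the housing data or from TensorFlow.

```python
import math


def rmse(labels, predictions):
    """Root mean squared error between two equal-length sequences."""
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    squared_errors = [(y - p) ** 2 for y, p in zip(labels, predictions)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Illustrative values only, not taken from the housing data.
labels = [3.0, 5.0, 7.0]
predictions = [2.0, 5.0, 9.0]
print(rmse(labels, predictions))
```

Because the errors are squared before averaging, a few large misses raise RMSE more than many small ones, and the result is expressed in the same units as the label.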

## Set Up
In this first cell, we'll load the necessary libraries.

In [1]:
import math
import shutil
import numpy as np
import pandas as pd
import tensorflow as tf
print(tf.__version__)

tf.logging.set_verbosity(tf.logging.INFO)
pd.options.display.max_rows = 10
pd.options.display.float_format = '{:.1f}'.format


1.8.0


Next, we'll load our data set.

In [2]:
df = pd.read_csv('https://storage.googleapis.com/ml_universities/california_housing_train.csv', sep=',')

## Examine the data

It's a good idea to get to know your data a little bit before you work with it.

We'll look at the first few rows and then print a quick summary of a few useful statistics on each column.

The summary includes the mean, standard deviation, max, min, and various quantiles.

In [3]:
df.head()

| | longitude | latitude | housing_median_age | total_rooms | total_bedrooms | population | households | median_income | median_house_value |
|---|---|---|---|---|---|---|---|---|---|
| 0 | -114.3 | 34.2 | 15.0 | 5612.0 | 1283.0 | 1015.0 | 472.0 | 1.5 | 66900.0 |
| 1 | -114.5 | 34.4 | 19.0 | 7650.0 | 1901.0 | 1129.0 | 463.0 | 1.8 | 80100.0 |
| 2 | -114.6 | 33.7 | 17.0 | 720.0 | 174.0 | 333.0 | 117.0 | 1.7 | 85700.0 |
| 3 | -114.6 | 33.6 | 14.0 | 1501.0 | 337.0 | 515.0 | 226.0 | 3.2 | 73400.0 |
| 4 | -114.6 | 33.6 | 20.0 | 1454.0 | 326.0 | 624.0 | 262.0 | 1.9 | 65500.0 |


In [4]:
df.describe()

| | longitude | latitude | housing_median_age | total_rooms | total_bedrooms | population | households | median_income | median_house_value |
|---|---|---|---|---|---|---|---|---|---|
| count | 17000.0 | 17000.0 | 17000.0 | 17000.0 | 17000.0 | 17000.0 | 17000.0 | 17000.0 | 17000.0 |
| mean | -119.6 | 35.6 | 28.6 | 2643.7 | 539.4 | 1429.6 | 501.2 | 3.9 | 207300.9 |
| std | 2.0 | 2.1 | 12.6 | 2179.9 | 421.5 | 1147.9 | 384.5 | 1.9 | 115983.8 |
| min | -124.3 | 32.5 | 1.0 | 2.0 | 1.0 | 3.0 | 1.0 | 0.5 | 14999.0 |
| 25% | -121.8 | 33.9 | 18.0 | 1462.0 | 297.0 | 790.0 | 282.0 | 2.6 | 119400.0 |
| 50% | -118.5 | 34.2 | 29.0 | 2127.0 | 434.0 | 1167.0 | 409.0 | 3.5 | 180400.0 |
| 75% | -118.0 | 37.7 | 37.0 | 3151.2 | 648.2 | 1721.0 | 605.2 | 4.8 | 265000.0 |
| max | -114.3 | 42.0 | 52.0 | 37937.0 | 6445.0 | 35682.0 | 6082.0 | 15.0 | 500001.0 |


In this exercise, we'll be trying to predict median_house_value. It will be our label (sometimes also called a target). Can we use total_rooms as our input feature?  What's going on with the values for that feature?

Because the data is at the city-block level, `total_rooms` counts every room in the block rather than the rooms in one house. Let's create a different, more appropriate feature. Because we are predicting the price of a single house, we should try to make all our features correspond to a single house as well.

In [5]:
df['num_rooms'] = df['total_rooms'] / df['households']
df.head()

| | longitude | latitude | housing_median_age | total_rooms | total_bedrooms | population | households | median_income | median_house_value | num_rooms |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -114.3 | 34.2 | 15.0 | 5612.0 | 1283.0 | 1015.0 | 472.0 | 1.5 | 66900.0 | 11.9 |
| 1 | -114.5 | 34.4 | 19.0 | 7650.0 | 1901.0 | 1129.0 | 463.0 | 1.8 | 80100.0 | 16.5 |
| 2 | -114.6 | 33.7 | 17.0 | 720.0 | 174.0 | 333.0 | 117.0 | 1.7 | 85700.0 | 6.2 |
| 3 | -114.6 | 33.6 | 14.0 | 1501.0 | 337.0 | 515.0 | 226.0 | 3.2 | 73400.0 | 6.6 |
| 4 | -114.6 | 33.6 | 20.0 | 1454.0 | 326.0 | 624.0 | 262.0 | 1.9 | 65500.0 | 5.5 |


## Split the dataset into training and evaluation sets

In [6]:
np.random.seed(seed=1) # This will make the split reproducible
msk = np.random.rand(len(df)) < 0.8
# Roughly 80% of the rows go to training
train_df = df[msk]
# The rest go to evaluation
eval_df = df[~msk]
# Use len() to count rows; DataFrame.size counts rows x columns
print('Training Data Size = {}'.format(len(train_df)))
print('Evaluation Data Size = {}'.format(len(eval_df)))

Training Data Size = 13612
Evaluation Data Size = 3388
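
The random-mask split above can be sanity-checked in isolation. The sketch below assumes only NumPy and the 17,000-row count reported by `df.describe()`; it works on row positions rather than on the DataFrame itself, and the variable names are illustrative.

```python
import numpy as np

rng = np.random.RandomState(1)      # fixed seed, so the mask is reproducible
n_rows = 17000                      # row count reported by df.describe()
mask = rng.rand(n_rows) < 0.8       # True for roughly 80% of the rows

train_idx = np.flatnonzero(mask)    # positions that go to training
eval_idx = np.flatnonzero(~mask)    # positions that go to evaluation

print("train rows:", train_idx.size, "eval rows:", eval_idx.size)
print("train fraction:", train_idx.size / n_rows)
```

Each row is assigned independently, so the training share is close to 80% but not exactly 80%, and the two index sets never overlap.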


In [7]:
# Check that the summary statistics (e.g. the means) stay roughly the same after the split
train_df.describe()

| | longitude | latitude | housing_median_age | total_rooms | total_bedrooms | population | households | median_income | median_house_value | num_rooms |
|---|---|---|---|---|---|---|---|---|---|---|
| count | 13612.0 | 13612.0 | 13612.0 | 13612.0 | 13612.0 | 13612.0 | 13612.0 | 13612.0 | 13612.0 | 13612.0 |
| mean | -119.6 | 35.6 | 28.7 | 2632.0 | 536.0 | 1423.3 | 498.1 | 3.9 | 207986.5 | 5.5 |
| std | 2.0 | 2.1 | 12.6 | 2163.3 | 416.7 | 1126.0 | 379.3 | 1.9 | 116514.3 | 2.6 |
| min | -124.3 | 32.5 | 1.0 | 8.0 | 1.0 | 3.0 | 1.0 | 0.5 | 14999.0 | 0.8 |
| 25% | -121.8 | 33.9 | 18.0 | 1461.0 | 296.0 | 787.0 | 281.0 | 2.6 | 119600.0 | 4.4 |
| 50% | -118.5 | 34.2 | 29.0 | 2117.5 | 432.0 | 1168.0 | 408.0 | 3.6 | 180800.0 | 5.2 |
| 75% | -118.0 | 37.7 | 37.0 | 3146.0 | 644.2 | 1715.0 | 602.0 | 4.8 | 266300.0 | 6.1 |
| max | -114.3 | 42.0 | 52.0 | 37937.0 | 5471.0 | 35682.0 | 5189.0 | 15.0 | 500001.0 | 141.9 |


In [8]:
# The evaluation set should show similar statistics to the training set
eval_df.describe()

| | longitude | latitude | housing_median_age | total_rooms | total_bedrooms | population | households | median_income | median_house_value | num_rooms |
|---|---|---|---|---|---|---|---|---|---|---|
| count | 3388.0 | 3388.0 | 3388.0 | 3388.0 | 3388.0 | 3388.0 | 3388.0 | 3388.0 | 3388.0 | 3388.0 |
| mean | -119.6 | 35.7 | 28.3 | 2690.4 | 553.0 | 1454.8 | 513.7 | 3.8 | 204546.3 | 5.4 |
| std | 2.0 | 2.1 | 12.6 | 2245.5 | 440.2 | 1231.5 | 404.7 | 1.8 | 113802.5 | 2.1 |
| min | -124.3 | 32.5 | 2.0 | 2.0 | 2.0 | 6.0 | 2.0 | 0.5 | 22500.0 | 0.9 |
| 25% | -121.8 | 33.9 | 18.0 | 1467.0 | 300.0 | 796.0 | 283.8 | 2.5 | 118800.0 | 4.4 |
| 50% | -118.6 | 34.3 | 28.0 | 2171.5 | 441.0 | 1160.0 | 414.0 | 3.5 | 178650.0 | 5.2 |
| 75% | -118.0 | 37.7 | 37.0 | 3167.2 | 667.0 | 1756.2 | 615.2 | 4.7 | 258825.0 | 6.0 |
| max | -114.6 | 41.9 | 52.0 | 32627.0 | 6445.0 | 28566.0 | 6082.0 | 15.0 | 500001.0 | 59.9 |


## Build the first model

In this exercise, we'll be trying to predict `median_house_value`. It will be our label (sometimes also called a target). We'll use `num_rooms` as our input feature.

To train our model, we'll use the [LinearRegressor](https://www.tensorflow.org/api_docs/python/tf/estimator/LinearRegressor) estimator. The Estimator takes care of a lot of the plumbing, and exposes a convenient way to interact with data, training, and evaluation.
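
A linear model on a single feature learns just two numbers, a weight and a bias. As a rough illustration of what that means, the sketch below fits a line by closed-form least squares with `np.polyfit`. This is not how the Estimator trains (it optimizes iteratively over mini-batches), and the data is synthetic: the slope `0.5`, intercept `1.0`, and noise level are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.RandomState(0)

# Synthetic stand-in for (num_rooms, scaled house value); not real data.
x = rng.uniform(1.0, 10.0, size=200)
y = 0.5 * x + 1.0 + rng.normal(0.0, 0.1, size=200)

# A degree-1 polynomial fit returns [slope, intercept].
weight, bias = np.polyfit(x, y, deg=1)
predictions = weight * x + bias
rmse = np.sqrt(np.mean((y - predictions) ** 2))

print("weight:", weight, "bias:", bias, "rmse:", rmse)
```

A closed-form fit like this gives a useful lower bound for comparison: an iteratively trained single-feature linear model should not be expected to reach a lower training RMSE than the least-squares line.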

In [11]:
OUTDIR = './housing_trained'
def train_and_evaluate(output_dir, num_of_train_steps):
    """Train and evaluate a LinearRegressor on num_rooms, reporting RMSE."""
    # Create a Linear Regressor Estimator object.
    estimator = tf.estimator.LinearRegressor(
        feature_columns=[tf.feature_column.numeric_column('num_rooms')],
        model_dir=output_dir
    )
    
    # Add RMSE evaluation metric.
    def rmse(labels, predictions):
        """Return the RMSE of the predictions as an extra evaluation metric."""
        pred_values = tf.cast(predictions['predictions'], tf.float64)
        return {'rmse': tf.metrics.root_mean_squared_error(labels, pred_values)}
    
    # Now add this above rmse evaluation metric to the estimator.
    estimator = tf.contrib.estimator.add_metrics(estimator, rmse)
    
    # Specify the training specification
    train_spec = tf.estimator.TrainSpec(
        input_fn=tf.estimator.inputs.pandas_input_fn(
            x=train_df[['num_rooms']],
            y=train_df['median_house_value'], # Labels are not scaled in this first model
            num_epochs=None,
            shuffle=True
        ),
        max_steps=num_of_train_steps
    )
    
    # Specify the evaluation specification
    eval_spec = tf.estimator.EvalSpec(
        input_fn=tf.estimator.inputs.pandas_input_fn(
            x=eval_df[['num_rooms']],
            y=eval_df['median_house_value'],
            num_epochs=1,
            shuffle=False
        ),
        steps=None,
        start_delay_secs=1, # Start evaluating after N seconds
        throttle_secs=10 # Evaluate every N seconds
    )
    
    tf.estimator.train_and_evaluate(
        estimator=estimator,
        train_spec=train_spec,
        eval_spec=eval_spec
    )

# Now run  the train_and_evaluate function
shutil.rmtree(OUTDIR,ignore_errors=True) # Start fresh each time
train_and_evaluate(OUTDIR, num_of_train_steps=100)

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_model_dir': './housing_trained', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': None, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f02a36fa278>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}

## 1. Scale the output
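
Dividing the labels by `SCALE` keeps the values the optimizer sees in a small range. Because RMSE scales linearly with its inputs, multiplying both labels and predictions by `SCALE` inside the metric reports the error in the original units. A small sketch of that identity in plain Python follows; the predictions are invented for illustration, and the helper `rmse` is not the TensorFlow metric.

```python
import math

SCALE = 100000


def rmse(labels, predictions):
    """Root mean squared error of two equal-length sequences."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(labels, predictions)) / len(labels))


# Labels are the first three median_house_value rows shown by df.head();
# the predictions are made-up numbers for illustration.
labels = [66900.0, 80100.0, 85700.0]
predictions = [70000.0, 75000.0, 90000.0]

scaled_labels = [y / SCALE for y in labels]
scaled_predictions = [p / SCALE for p in predictions]

rmse_original = rmse(labels, predictions)
rmse_rescaled = SCALE * rmse(scaled_labels, scaled_predictions)
print(rmse_original, rmse_rescaled)
```

The two printed values agree up to floating-point rounding, which is why the metric can be reported in the original units even though training uses scaled labels.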

In [18]:
SCALE = 100000 # 100,000 (1 lakh)
OUTDIR = './housing_trained'
def train_and_evaluate(output_dir, num_train_steps):
    """Train and evaluate on labels divided by SCALE; RMSE is reported in original units."""
    # Create Linear Regressor estimator object.
    estimator = tf.estimator.LinearRegressor(
        feature_columns=[tf.feature_column.numeric_column('num_rooms')],
        model_dir=output_dir
    )
    
    # Add RMSE evaluation metric
    def rmse(labels, predictions):
        """Return RMSE multiplied back by SCALE, i.e. in the original label units."""
        pred_values = tf.cast(x=predictions['predictions'], dtype=tf.float64)
        return {'rmse':tf.metrics.root_mean_squared_error(
            labels=labels*SCALE, 
            predictions=pred_values*SCALE
        )}
    
    # Now add this above RMSE evaluation metric to the estimator object
    estimator = tf.contrib.estimator.add_metrics(estimator=estimator, metric_fn=rmse)
    
    # Now create the training specification
    train_spec = tf.estimator.TrainSpec(
        input_fn=tf.estimator.inputs.pandas_input_fn(
            x=train_df[['num_rooms']],
            y=train_df['median_house_value'] / SCALE, # Note the scaling
            num_epochs=None,
            shuffle=True
        ),
        max_steps=num_train_steps
    )
    
    # Now create the evaluation specification
    eval_spec = tf.estimator.EvalSpec(
        input_fn=tf.estimator.inputs.pandas_input_fn(
            x=eval_df[['num_rooms']],
            y=eval_df["median_house_value"] / SCALE, # Note the scaling
            num_epochs=1,
            shuffle=False
        ),
        steps=None,
        start_delay_secs=1, # Start evaluation after N seconds
        throttle_secs=10 # Evaluate every N seconds
    )
    
    tf.estimator.train_and_evaluate(estimator,train_spec,eval_spec)

# Now run the training
shutil.rmtree(path=OUTDIR,ignore_errors=True) # Start fresh every time
train_and_evaluate(OUTDIR,num_train_steps=100)

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_model_dir': './housing_trained', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': None, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f029838a780>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}

## 2. Change Learning Rate and Batch Size
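
To see why these two hyperparameters matter when the number of training steps is fixed, here is a sketch of plain mini-batch gradient descent in NumPy. It is a simplified stand-in, not the `FtrlOptimizer` used by the Estimator. The data is synthetic, and the comparison setting `(0.001, 128)` is an arbitrary assumption chosen only to contrast with the `(0.2, 512)` values used in this section.

```python
import numpy as np


def train_linear(x, y, learning_rate, batch_size, num_steps, seed=0):
    """Fit y ~ w * x + b with plain mini-batch gradient descent; return (w, b)."""
    rng = np.random.RandomState(seed)
    w, b = 0.0, 0.0
    for _ in range(num_steps):
        idx = rng.randint(0, len(x), size=batch_size)  # sample one mini-batch
        error = w * x[idx] + b - y[idx]
        # Gradients of the mean squared error over the mini-batch.
        w -= learning_rate * 2.0 * np.mean(error * x[idx])
        b -= learning_rate * 2.0 * np.mean(error)
    return w, b


def rmse(x, y, w, b):
    """RMSE of the line w * x + b on the full data set."""
    return float(np.sqrt(np.mean((w * x + b - y) ** 2)))


# Synthetic data, not the housing set: y = 0.4 * x + 0.2 plus noise.
data_rng = np.random.RandomState(1)
x = data_rng.uniform(0.0, 2.0, size=2000)
y = 0.4 * x + 0.2 + data_rng.normal(0.0, 0.05, size=2000)

results = {}
for learning_rate, batch_size in [(0.001, 128), (0.2, 512)]:
    w, b = train_linear(x, y, learning_rate, batch_size, num_steps=100)
    results[(learning_rate, batch_size)] = rmse(x, y, w, b)
    print(learning_rate, batch_size, results[(learning_rate, batch_size)])
```

With only 100 steps, the small learning rate barely moves the weights away from zero, while the larger one gets close to the noise floor. A larger batch averages the gradient over more rows, which makes each step less noisy at the cost of more computation per step.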

In [19]:
SCALE=100000 # 100,000 (1 lakh)
OUTDIR='./housing_trained'
def train_and_evaluate(output_dir, num_train_steps):
    """Train and evaluate with a custom FTRL learning rate and a larger training batch size."""
    # Create an optimizer with a learning rate 0.2
    my_opt = tf.train.FtrlOptimizer(learning_rate=0.2) # Note the learning rate
    
    # Create Linear Regressor estimator object
    estimator = tf.estimator.LinearRegressor(
        feature_columns=[tf.feature_column.numeric_column('num_rooms')],
        model_dir=output_dir,
        optimizer=my_opt
    )
    
    # Add rmse evaluation metric
    def rmse(labels, predictions):
        """Return RMSE multiplied back by SCALE, i.e. in the original label units."""
        pred_values = tf.cast(x=predictions['predictions'],dtype=tf.float64)
        return {'rmse' : tf.metrics.root_mean_squared_error(labels*SCALE, pred_values*SCALE)}
    
    # Attach this above custom evaluation metric to the estimator object
    estimator = tf.contrib.estimator.add_metrics(estimator, metric_fn=rmse)
    
    # Create training specification
    train_spec = tf.estimator.TrainSpec(
        input_fn=tf.estimator.inputs.pandas_input_fn(
            x=train_df[['num_rooms']],
            y=train_df['median_house_value'] / SCALE, # Note the scaling
            num_epochs=None,
            batch_size=512, # Note the batch size
            shuffle=True
        ),
        max_steps=num_train_steps
    )
    
    # Create evaluation specification
    eval_spec = tf.estimator.EvalSpec(
        input_fn=tf.estimator.inputs.pandas_input_fn(
            x=eval_df[['num_rooms']],
            y=eval_df['median_house_value'] / SCALE, # Note the scaling
            num_epochs=1,
            shuffle=False
        ),
        steps=None,
        start_delay_secs=1, # Start evaluating after N seconds
        throttle_secs=10 # Evaluate every N seconds
    )
    
    tf.estimator.train_and_evaluate(estimator,train_spec,eval_spec)

# Run the training
shutil.rmtree(OUTDIR,ignore_errors=True) # Start fresh every time
train_and_evaluate(OUTDIR,num_train_steps=100)

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_model_dir': './housing_trained', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': None, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f0298da6e48>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}

# Clean Up Local Directory

In [2]:
%%bash
if [ -d housing_trained ]; then
  rm -rf housing_trained
fi
if [ -f requirements.txt ]; then
  rm -f requirements.txt
fi
if [ -d Venv ]; then
  rm -rf Venv
fi
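
The same cleanup can be done from Python, which avoids having to choose between the `-d` and `-f` tests for each path. This is a sketch under stated assumptions: `clean_up` is a hypothetical helper, and the demo runs in a throwaway temporary directory rather than in the real working directory.

```python
import shutil
import tempfile
from pathlib import Path


def clean_up(paths):
    """Delete each path if it exists, whether it is a directory or a regular file."""
    for path in map(Path, paths):
        if path.is_dir():
            shutil.rmtree(path, ignore_errors=True)
        elif path.exists():
            path.unlink()


# Demonstrate on a throwaway directory rather than the real working directory.
workdir = Path(tempfile.mkdtemp())
(workdir / "housing_trained").mkdir()
(workdir / "requirements.txt").write_text("numpy\n")

targets = [workdir / name for name in ("housing_trained", "requirements.txt", "Venv")]
clean_up(targets)                   # "Venv" does not exist here and is skipped
remaining = [p.name for p in targets if p.exists()]
print(remaining)

shutil.rmtree(workdir, ignore_errors=True)
```

To clean the notebook's own directory, the call would be `clean_up(["housing_trained", "requirements.txt", "Venv"])`.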