# Enable a Virtual Environment for This Notebook

First, go to the directory where we will create our virtual environment.

<b>`$ cd "/media/mujahid7292/Data/GoogleDriveSandCorp2014/ML_With_TensorFlow_On_GCP/06. End_To_End_ML_With_TensorFlow_On_GCP/Week_2/Lab_3/Practice"`</b>
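The notebook does not show the command that creates the environment. As a minimal sketch, assuming the environment lives in a folder named `Venv` (the name activated below), it could be created with the standard-library `venv` module. `create_env` is a hypothetical helper name, not part of the original lab.

```python
import os
import venv


def create_env(path="Venv", with_pip=True):
    """Create a virtual environment at `path` (hypothetical helper).

    with_pip=True bootstraps pip into the new environment so packages such as
    tensorflow can be installed into it afterwards.
    """
    venv.EnvBuilder(with_pip=with_pip).create(path)
    # Every venv has a pyvenv.cfg file at its root; its presence is a quick
    # sanity check that creation succeeded.
    return os.path.isfile(os.path.join(path, "pyvenv.cfg"))
```

Calling `create_env()` from the practice directory would produce the `Venv/bin/activate` script used in the activation step.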

### Deactivate conda environment

<b>`$ conda deactivate`</b>

### Activate newly created virtual environment

<b>`$ source Venv/bin/activate`</b>
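Activating the environment in a terminal does not by itself change the interpreter of an already running notebook kernel. A small check like the one below (a sketch; `in_virtualenv` is a hypothetical name) can confirm from inside the notebook which interpreter the kernel is actually using.

```python
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a virtual environment."""
    # The stdlib venv module leaves sys.base_prefix pointing at the base
    # interpreter while sys.prefix points at the environment directory.
    # Older releases of the third-party virtualenv tool set sys.real_prefix instead.
    base = getattr(sys, "real_prefix", None) or getattr(sys, "base_prefix", sys.prefix)
    return sys.prefix != base


print("Kernel running inside a virtual environment:", in_virtualenv())
print("Interpreter prefix:", sys.prefix)
```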

# Notebook <a href="https://github.com/GoogleCloudPlatform/training-data-analyst/blob/master/courses/machine_learning/deepdive/06_structured/3_tensorflow_wd.ipynb">Link</a>

# Import the Necessary Python Packages

In [1]:
import os
import shutil
import numpy as np
import tensorflow as tf
# Load the TensorBoard notebook extension (not provided by the TensorBoard release installed here, hence the message in the output)
%load_ext tensorboard
print(tf.__version__)



The tensorboard module is not an IPython extension.
1.8.0


# Python Variables

In [2]:
# change these to try this notebook out
ACCOUNT = 'student-03-ec074aee9d21@qwiklabs.net'
SAC = 'jupyter-notebook-sac-d'
SAC_KEY_DESTINATION = '/media/mujahid7292/Data/Gcloud_Tem_SAC'
PROJECT = 'qwiklabs-gcp-03-d8cb7ba1a6f4'
BUCKET = 'bucket-qwiklabs-gcp-03-d8cb7ba1a6f4'
REGION = 'us-central1'

# Bash Variables

In [3]:
os.environ['ACCOUNT'] = ACCOUNT
os.environ['SAC'] = SAC
os.environ['SAC_KEY_DESTINATION'] = SAC_KEY_DESTINATION
os.environ['BUCKET'] = BUCKET
os.environ['PROJECT'] = PROJECT
os.environ['REGION'] = REGION
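Setting `os.environ` works because `%%bash` cells run as child processes of the kernel and inherit a copy of its environment. The sketch below demonstrates the same inheritance with `subprocess` (`value_seen_by_child` is a hypothetical helper). Note that the copy flows one way only: an `export` inside a `%%bash` cell does not change the kernel's `os.environ`.

```python
import os
import subprocess
import sys


def value_seen_by_child(name):
    """Read an environment variable from inside a freshly spawned child process."""
    # subprocess.run passes a copy of os.environ to the child by default,
    # which is the same inheritance that lets %%bash cells expand $PROJECT.
    code = "import os, sys; sys.stdout.write(os.environ.get(sys.argv[1], ''))"
    result = subprocess.run(
        [sys.executable, "-c", code, name],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout


os.environ["REGION"] = "us-central1"
print(value_seen_by_child("REGION"))  # us-central1
```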

# Set Google Application Credentials

In [4]:
import os
os.environ["GOOGLE_APPLICATION_CREDENTIALS"]='{}/{}.json'.format(SAC_KEY_DESTINATION,SAC)
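Google client libraries read `GOOGLE_APPLICATION_CREDENTIALS` to locate the service-account key, so a typo in the path only surfaces later as an authentication error. A minimal sketch of an early check follows; `credentials_path` and `credentials_file_exists` are hypothetical helpers, and `os.path.join` is used instead of string formatting to build the same path.

```python
import os


def credentials_path(key_dir, sac):
    """Build the key-file path the same way the cell above does."""
    return os.path.join(key_dir, "{}.json".format(sac))


def credentials_file_exists(env=None):
    """Return True if GOOGLE_APPLICATION_CREDENTIALS names an existing file."""
    env = os.environ if env is None else env
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    return bool(path) and os.path.isfile(path)


if not credentials_file_exists():
    print("Warning: GOOGLE_APPLICATION_CREDENTIALS is unset or points to a missing file")
```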

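Check Whether GOOGLE_APPLICATION_CREDENTIALS Was Set Successfully Outside the Virtual Environment. The `%%bash` cell below runs as a child process of the notebook kernel, so it inherits the variable set above.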

In [5]:
%%bash
set | grep GOOGLE_APPLICATION_CREDENTIALS 

GOOGLE_APPLICATION_CREDENTIALS=/media/mujahid7292/Data/Gcloud_Tem_SAC/jupyter-notebook-sac-d.json


# Set Default Project And Region

In [6]:
%%bash
gcloud config set account $ACCOUNT
gcloud config set project $PROJECT
gcloud config set compute/region $REGION

Updated property [core/account].
Updated property [core/project].
Updated property [compute/region].


<h2> Create TensorFlow model using TensorFlow's Estimator API </h2>
<p>
First, write an input_fn to read the data.

In [7]:
# Determine CSV, label, and key columns
CSV_COLUMNS = 'weight_pounds,is_male,mother_age,plurality,gestation_weeks,key'.split(',')
LABEL_COLUMN = 'weight_pounds'
KEY_COLUMN = 'key'

# Set default values for each CSV column
DEFAULTS = [[0.0], ['null'], [0.0], ['null'], [0.0], ['nokey']]
TRAIN_STEPS = 1000
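`tf.decode_csv` fills empty fields from `record_defaults`, and each default's type sets the column's type. The pure-Python stand-in below (not TensorFlow; `decode_line` is a hypothetical name) mimics that behavior for a single line so the role of `DEFAULTS` is easy to see. It assumes, like the original pipeline, that the files have no header row.

```python
import csv

CSV_COLUMNS = 'weight_pounds,is_male,mother_age,plurality,gestation_weeks,key'.split(',')
LABEL_COLUMN = 'weight_pounds'
DEFAULTS = [[0.0], ['null'], [0.0], ['null'], [0.0], ['nokey']]


def decode_line(line):
    """Split one CSV line into (features, label), filling blanks from DEFAULTS."""
    row = next(csv.reader([line]))
    if len(row) != len(DEFAULTS):
        raise ValueError("expected {} fields, got {}".format(len(DEFAULTS), len(row)))
    values = []
    for field, (default,) in zip(row, DEFAULTS):
        # An empty field takes the default; otherwise the field is converted
        # to the default's type (float or str), as record_defaults implies.
        values.append(default if field == '' else type(default)(field))
    features = dict(zip(CSV_COLUMNS, values))
    label = features.pop(LABEL_COLUMN)
    return features, label


features, label = decode_line('7.5,True,,Single(1),39,abc')
print(label)  # 7.5
print(features)
```

Here the blank `mother_age` field becomes `0.0`, which the model cannot distinguish from a real value of zero; that is a property of the defaults chosen in the notebook.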

In [8]:
# Create an input function reading a file using the Dataset API
# Then provide the results to the Estimator API
def read_dataset(filename, mode, batch_size=512):
    def _input_fn():
        def decode_csv(value_column):
            columns = tf.decode_csv(value_column, record_defaults=DEFAULTS)
            features = dict(zip(CSV_COLUMNS, columns))
            label = features.pop(LABEL_COLUMN)
            return features, label

        # Create list of files that match pattern
        file_list = tf.gfile.Glob(filename)

        # Create dataset from file list
        dataset = (tf.data.TextLineDataset(file_list)  # Read text file
                   .map(decode_csv))  # Transform each elem by applying decode_csv fn

        if mode == tf.estimator.ModeKeys.TRAIN:
            num_epochs = None  # indefinitely
            dataset = dataset.shuffle(buffer_size=10*batch_size)
        else:
            num_epochs = 1  # end-of-input after this

        dataset = dataset.repeat(num_epochs).batch(batch_size)
        return dataset
    return _input_fn
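The difference between the TRAIN and EVAL branches is easiest to see on a toy list. The sketch below is plain Python, not `tf.data`; `shuffle_buffer` and `make_batches` are hypothetical names. It imitates the same order of operations: a buffered shuffle on each pass, repeat forever for training or once for evaluation, then batching.

```python
import itertools
import random


def shuffle_buffer(iterable, buffer_size, rng):
    """Buffered shuffle: keep up to buffer_size items and emit a random one."""
    buf = []
    for item in iterable:
        buf.append(item)
        if len(buf) >= buffer_size:
            # Move a random element to the end so pop() removes it in O(1).
            idx = rng.randrange(len(buf))
            buf[idx], buf[-1] = buf[-1], buf[idx]
            yield buf.pop()
    # Input exhausted: drain the remaining items in random order.
    rng.shuffle(buf)
    yield from buf


def make_batches(records, train, batch_size=512, seed=0):
    """Mimic read_dataset's pipeline on an in-memory list of records."""
    if train and not records:
        raise ValueError("cannot repeat an empty dataset forever")
    rng = random.Random(seed)

    def stream():
        if train:
            # TRAIN: num_epochs=None, so repeat forever, reshuffling every pass
            # with a buffer of 10 * batch_size as in read_dataset.
            while True:
                yield from shuffle_buffer(records, 10 * batch_size, rng)
        else:
            # EVAL: num_epochs=1, a single pass in file order.
            yield from records

    it = stream()
    while True:
        batch = list(itertools.islice(it, batch_size))
        if not batch:
            return
        # As with Dataset.batch's default, the final eval batch may be short.
        yield batch


print(list(make_batches(list(range(10)), train=False, batch_size=4)))
# [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

Because the training stream never ends, training length is controlled by `max_steps` in the `TrainSpec`, while evaluation stops on its own after one pass (`steps=None`).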

Next, define the feature columns.

In [9]:
def get_wide_and_deep():
    """
    Define the wide and the deep feature columns.
    """
    is_male, mother_age, plurality, gestation_weeks = [
        tf.feature_column.categorical_column_with_vocabulary_list(
            key='is_male',
            vocabulary_list=['True', 'False', 'Unknown']
        ),
        tf.feature_column.numeric_column('mother_age'),
        tf.feature_column.categorical_column_with_vocabulary_list(
            key='plurality',
            vocabulary_list=['Single(1)', 'Twins(2)', 'Triplets(3)',
                             'Quadruplets(4)', 'Quintuplets(5)', 'Multiple(2+)']
        ),
        tf.feature_column.numeric_column('gestation_weeks')
    ]
    
    # Discretize
    age_buckets = tf.feature_column.bucketized_column(
        source_column=mother_age,
        boundaries=np.arange(start=15, stop=45, step=1).tolist()
    )
    
    gestation_buckets = tf.feature_column.bucketized_column(
        source_column=gestation_weeks,
        boundaries=np.arange(start=17, stop=47, step=1).tolist()
    )
    
    # Sparse columns are wide and have a linear relationship with the output.
    wide = [
        is_male,
        plurality,
        age_buckets,
        gestation_buckets
    ]
    
    # Feature-cross all the wide columns and embed the cross into a low-dimensional space.
    crossed = tf.feature_column.crossed_column(keys=wide, hash_bucket_size=20000)
    embed = tf.feature_column.embedding_column(categorical_column=crossed, dimension=3)
    
    # Continuous columns are deep and have a complex relationship with the output.
    deep = [
        mother_age,
        gestation_weeks,
        embed
    ]
    
    return wide, deep
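The arithmetic behind the bucketized and crossed columns can be sketched without TensorFlow. `bucketized_column` uses left-inclusive buckets, so 30 boundaries give 31 buckets, and `crossed_column` hashes each combination of values into `hash_bucket_size` ids. In the sketch below, MD5 is a stand-in for TensorFlow's own fingerprint hash (so the ids will not match), and `bucketize`/`cross_bucket` are hypothetical names.

```python
import bisect
import hashlib

# Same boundaries as np.arange(start, stop, step=1).tolist() in get_wide_and_deep.
AGE_BOUNDARIES = list(range(15, 45))
GESTATION_BOUNDARIES = list(range(17, 47))


def bucketize(value, boundaries):
    """Bucket index with left-inclusive, right-exclusive buckets.

    Boundaries [15, 16, ...] give buckets (-inf, 15), [15, 16), ..., [44, +inf).
    """
    return bisect.bisect_right(boundaries, value)


def cross_bucket(values, hash_bucket_size=20000):
    """Map a combination of categorical values to one of hash_bucket_size ids.

    MD5 is a stand-in hash; the idea (hash the combination, then take it
    modulo hash_bucket_size) is what matters here.
    """
    key = "_X_".join(str(v) for v in values)
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16) % hash_bucket_size


# Distinct (is_male, plurality, age bucket, gestation bucket) combinations.
NUM_COMBINATIONS = 3 * 6 * (len(AGE_BOUNDARIES) + 1) * (len(GESTATION_BOUNDARIES) + 1)
# Parameters in the embedding table: one 3-vector per hash bucket.
EMBEDDING_PARAMS = 20000 * 3

example = ["True", "Single(1)",
           bucketize(28.0, AGE_BOUNDARIES), bucketize(39.0, GESTATION_BOUNDARIES)]
print(NUM_COMBINATIONS)  # 17298
print(example, "->", cross_bucket(example))
```

With 17,298 possible combinations and 20,000 buckets, hash collisions are still possible but relatively rare. The `dimension=3` embedding then compresses each bucket id into three learned numbers for the deep part.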

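To predict with the TensorFlow model, we also need a serving input function. It accepts all four inputs (`is_male`, `mother_age`, `plurality`, `gestation_weeks`) from the user; the label and the key column are not part of a prediction request.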

In [10]:
# Create serving input function to be able to serve predictions later using provided inputs
def serving_input_fn():
    feature_placeholders = {
        'is_male': tf.placeholder(tf.string, [None]),
        'mother_age': tf.placeholder(tf.float32, [None]),
        'plurality': tf.placeholder(tf.string, [None]),
        'gestation_weeks': tf.placeholder(tf.float32, [None])
    }
    features = {
        key: tf.expand_dims(tensor, -1)
        for key, tensor in feature_placeholders.items()
    }
    return tf.estimator.export.ServingInputReceiver(features, feature_placeholders)
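The placeholders receive one flat value per instance (shape `[None]`), and `tf.expand_dims(tensor, -1)` adds a trailing axis so the features reach the model with shape `[None, 1]`. The pure-Python sketch below uses nested lists in place of tensors to show that reshaping; `to_serving_features` is a hypothetical name and the request values are made up for illustration.

```python
SERVING_KEYS = ['is_male', 'mother_age', 'plurality', 'gestation_weeks']


def to_serving_features(instances):
    """Turn request dicts into (features, receiver) shaped like serving_input_fn."""
    # Receiver side: one flat list per input, i.e. shape [batch].
    receiver = {key: [inst[key] for inst in instances] for key in SERVING_KEYS}
    # Feature side: expand_dims(tensor, -1) wraps each value, i.e. shape [batch, 1].
    features = {key: [[v] for v in values] for key, values in receiver.items()}
    return features, receiver


requests = [
    {'is_male': 'True', 'mother_age': 26.0, 'plurality': 'Single(1)', 'gestation_weeks': 39.0},
    {'is_male': 'False', 'mother_age': 32.0, 'plurality': 'Twins(2)', 'gestation_weeks': 37.0},
]
features, receiver = to_serving_features(requests)
print(receiver['mother_age'])  # [26.0, 32.0]
print(features['mother_age'])  # [[26.0], [32.0]]
```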

## Train And Evaluate

In [11]:
def train_and_evaluate(output_dir):
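    """
    Train a wide-and-deep regressor and periodically evaluate it, writing
    checkpoints, summaries, and exported models to output_dir.
    """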
    wide, deep = get_wide_and_deep()
    EVAL_INTERVAL = 300
    
    run_config = tf.estimator.RunConfig(
        save_checkpoints_secs=EVAL_INTERVAL,
        keep_checkpoint_max=3
    )
    
    estimator = tf.estimator.DNNLinearCombinedRegressor(
        model_dir=output_dir,
        linear_feature_columns=wide,
        dnn_feature_columns=deep,
        dnn_hidden_units=[64,32],
        config=run_config
    )
    
    train_spec = tf.estimator.TrainSpec(
        input_fn=read_dataset('train.csv',tf.estimator.ModeKeys.TRAIN),
        max_steps=TRAIN_STEPS
    )
    
    exporter = tf.estimator.LatestExporter(
        name='exporter',
        serving_input_receiver_fn=serving_input_fn
    )
    
    eval_spec = tf.estimator.EvalSpec(
        input_fn=read_dataset('eval.csv',tf.estimator.ModeKeys.EVAL),
        steps=None,
        start_delay_secs=60, # Start evaluating after N seconds
        throttle_secs=EVAL_INTERVAL, # Evaluate every N seconds
        exporters=exporter
    )
    
    tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)
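After training, `LatestExporter` writes SavedModels into timestamped subdirectories. As far as we know for TF 1.x Estimators, these live under `<model_dir>/export/<exporter name>/`. A minimal sketch for locating the newest one follows; `latest_export_dir` is a hypothetical helper.

```python
import os


def latest_export_dir(output_dir, exporter_name='exporter'):
    """Return the newest timestamped export under output_dir/export/<exporter_name>."""
    base = os.path.join(output_dir, 'export', exporter_name)
    if not os.path.isdir(base):
        return None
    # Finished exports are named with integer timestamps, so the largest
    # numeric name is the most recent; non-numeric names are skipped.
    stamps = [d for d in os.listdir(base) if d.isdigit()]
    if not stamps:
        return None
    return os.path.join(base, max(stamps, key=int))


print(latest_export_dir('babyweight_trained_tf_wd'))
```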

#### Monitor The Training In TensorBoard
Run the command below in a terminal, from the directory that contains `babyweight_trained_tf_wd`:

<b>`$ tensorboard --logdir babyweight_trained_tf_wd`</b>

Then open <a href='http://localhost:6006/'>TensorBoard</a> in a browser.
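TensorBoard shows nothing until event files exist. The Estimator writes training summaries into the model directory and evaluation summaries into its `eval` subdirectory, and both use the `events.out.tfevents.*` file prefix. A small sketch to check for them before launching TensorBoard (`find_event_files` is a hypothetical helper):

```python
import glob
import os


def find_event_files(logdir):
    """List TensorBoard event files anywhere under logdir, including logdir/eval."""
    # '**' with recursive=True also matches zero directories, so files
    # directly inside logdir are found as well.
    pattern = os.path.join(logdir, '**', 'events.out.tfevents.*')
    return sorted(glob.glob(pattern, recursive=True))


print(find_event_files('babyweight_trained_tf_wd'))
```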

## Finally, train!

In [12]:
# Run the model
OUTPUT_DIR='babyweight_trained_tf_wd'
shutil.rmtree(path=OUTPUT_DIR, ignore_errors=True) # Start fresh each time
tf.summary.FileWriterCache.clear() # Ensure file writer cache is clear for tensorboard event file
train_and_evaluate(OUTPUT_DIR)

INFO:tensorflow:Using config: {'_model_dir': 'babyweight_trained_tf_wd', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 300, '_session_config': None, '_keep_checkpoint_max': 3, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f5c135b5c18>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
INFO:tensorflow:Running training and evaluation locally (non-distributed).
INFO:tensorflow:Start train and evaluate loop. The evaluate will happen after 300 secs (eval_spec.throttle_secs) or training is finished.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.

<h2> Monitor and experiment with training </h2>

To start TensorBoard from within AI Platform Notebooks, click the **+** symbol in the top-left corner and select the **Tensorboard** icon to create a new TensorBoard instance.

In TensorBoard, look at the learned embeddings. Are they forming clusters? What about the weights of the hidden layers? What happens if you train for longer, or change the batch size?
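When experimenting with the batch size, keep in mind that each training step consumes one batch, so with `max_steps=TRAIN_STEPS` fixed at 1000, a larger batch means the model sees more examples overall. The sketch below makes that arithmetic explicit; the function names are hypothetical, and `num_train_examples` is a value you would supply yourself, since the notebook does not state how many rows `train.csv` has.

```python
def examples_seen(train_steps, batch_size):
    """Each training step consumes one batch: steps * batch_size examples."""
    return train_steps * batch_size


def epochs_covered(train_steps, batch_size, num_train_examples):
    """Number of passes over the training file (num_train_examples is user-supplied)."""
    return examples_seen(train_steps, batch_size) / num_train_examples


for batch_size in (128, 512, 2048):
    print(batch_size, examples_seen(1000, batch_size))
# 128 128000
# 512 512000
# 2048 2048000
```

Comparing runs at equal `examples_seen` rather than equal steps separates the effect of the batch size itself from the effect of simply training on more data.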