# Develop a TensorFlow Model Locally

This notebook is tested using the `Data Science - Python 3 Kernel` running on an `ml.t3.medium` instance. Please ensure that you see `Python 3 (Data Science)` in the top right of your notebook.

-----------------------
BROKEN KERNEL (Fix should be imminent):

This notebook is tested using the `TensorFlow 2.6 Python 3.8 CPU Optimized - Python 3 Kernel` running on an `ml.t3.medium` instance. Please ensure that you see `Python 3 (TensorFlow 2.6 Python 3.8 CPU Optimized)` in the top right of your notebook.


------------------------------



## Overview

In this notebook, we'll use a Studio notebook to prototype our data loading and model architecture.


In [None]:
# Install TensorFlow if it is missing (e.g., when running in the wrong kernel)
%pip install tensorflow

## Loading stored variables
Run the cell below to load any previously created variables from the prior notebook in this lab. You should see a print-out of the existing variables. If nothing is printed, you most likely skipped the final cell of the previous notebook.

In [1]:
%store -r
%store

Stored variables and their in-db values:
data_bucket             -> 'sagemaker-us-west-1-176842773820/nyc-taxi/data/pr


Important: You must have run the previous sequential notebooks to retrieve variables using the StoreMagic command.
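
If a later cell fails with a `NameError`, the stored variable was probably never restored. The snippet below is a minimal sketch of an explicit check. The helper name `require_stored` and the example bucket value are hypothetical; in the notebook you would pass `globals()` as the namespace.

```python
def require_stored(name, namespace):
    """Return namespace[name], or raise a descriptive error if it was never restored."""
    if name not in namespace:
        raise NameError(
            f"'{name}' is not defined. Run the final cell of the previous notebook, "
            "then re-run `%store -r`."
        )
    return namespace[name]


# Stand-in namespace for illustration only; in the notebook, pass globals() instead.
example_namespace = {"data_bucket": "example-bucket/example/prefix"}
data_bucket_value = require_stored("data_bucket", example_namespace)
print(data_bucket_value)
```

Failing early with a clear message tends to be easier to debug than a bare `NameError` several cells later.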



## Download a Sample of Data for Local Model Building

In [2]:
import sagemaker

data_bucket_s3_uri = "s3://" + data_bucket

# Keep only the CSV files under the data prefix
csv_files = [x for x in sagemaker.s3.S3Downloader.list(data_bucket_s3_uri) if x.endswith(".csv")]

# Download one csv file
sagemaker.s3.S3Downloader.download(csv_files[0], "demo_data")

In [2]:
import glob
import pandas as pd

csv_file = glob.glob("demo_data/*.csv")[0]

column_headers = ["day_of_week","month","hour","pickup_latitude","pickup_longitude",
                  "dropoff_latitude","dropoff_longitude","trip_distance","fare_amount"]

raw_dataset = pd.read_csv(csv_file, names=column_headers)
raw_dataset.head()

| | day_of_week | month | hour | pickup_latitude | pickup_longitude | dropoff_latitude | dropoff_longitude | trip_distance | fare_amount |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 0 | 40.575375 | -73.96907 | 40.726715 | 40.726715 | 20.8 | 57.0 |
| 1 | 1 | 1 | 0 | 40.575977 | -73.990845 | 40.65662 | 40.65662 | 12.0 | 41.0 |
| 2 | 1 | 1 | 0 | 40.576317 | -73.981339 | 40.583519 | 40.583519 | 0.59 | 4.0 |
| 3 | 1 | 1 | 0 | 40.587311 | -73.95401 | 40.631557 | 40.631557 | 3.9 | 15.0 |
| 4 | 1 | 1 | 0 | 40.587357 | -73.954361 | 40.595753 | 40.595753 | 1.2 | 6.5 |


In [3]:
linear_input = raw_dataset[["day_of_week", "month", "hour", "trip_distance"]]
dnn_input = raw_dataset[["pickup_latitude","pickup_longitude","dropoff_latitude","dropoff_longitude","trip_distance"]]
y = raw_dataset[["fare_amount"]]
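
Before training, it can help to run a few quick sanity checks on the sample. The sketch below rebuilds the five rows displayed by `raw_dataset.head()` and checks them for missing values and implausible coordinates. In those rows, `dropoff_longitude` equals `dropoff_latitude`, which may point to an issue in an earlier preprocessing step. This is an observation from the displayed sample only, not a confirmed diagnosis, and the variable names below are illustrative.

```python
import pandas as pd

column_headers = ["day_of_week", "month", "hour", "pickup_latitude", "pickup_longitude",
                  "dropoff_latitude", "dropoff_longitude", "trip_distance", "fare_amount"]

# The five rows shown in the head() output of the notebook.
sample_rows = [
    [1, 1, 0, 40.575375, -73.96907, 40.726715, 40.726715, 20.8, 57.0],
    [1, 1, 0, 40.575977, -73.990845, 40.65662, 40.65662, 12.0, 41.0],
    [1, 1, 0, 40.576317, -73.981339, 40.583519, 40.583519, 0.59, 4.0],
    [1, 1, 0, 40.587311, -73.95401, 40.631557, 40.631557, 3.9, 15.0],
    [1, 1, 0, 40.587357, -73.954361, 40.595753, 40.595753, 1.2, 6.5],
]
sample = pd.DataFrame(sample_rows, columns=column_headers)

# Count missing values per column.
missing_per_column = sample.isna().sum()

# New York City longitudes are negative (roughly -74), so positive values are suspicious.
positive_longitude = (sample["dropoff_longitude"] > 0).all()

# Check whether the dropoff longitude simply duplicates the dropoff latitude.
same_as_latitude = (sample["dropoff_longitude"] == sample["dropoff_latitude"]).all()

print(missing_per_column.sum(), positive_longitude, same_as_latitude)
```

To run the same checks on the downloaded file, replace `sample` with `raw_dataset`.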

# Architecture Prototyping
![image](https://1.bp.blogspot.com/-Dw1mB9am1l8/V3MgtOzp3uI/AAAAAAAABGs/mP-3nZQCjWwdk6qCa5WraSpK8A7rSPj3ACLcB/s1600/image04.png)

https://ai.googleblog.com/2016/06/wide-deep-learning-better-together-with.html
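
The core idea in the figure is that a wide (linear) branch and a deep (feed-forward) branch each produce an output, and the two are added together to form the prediction. The NumPy sketch below illustrates only that forward pass, using random, untrained weights. The hidden layer sizes mirror the ones used in this notebook, the final unit is left linear as a simplifying assumption, and all variable names are illustrative rather than part of the Keras API.

```python
import numpy as np

rng = np.random.default_rng(0)


def elu(x):
    # ELU with alpha = 1: x for positive inputs, exp(x) - 1 otherwise.
    return np.where(x > 0, x, np.exp(np.minimum(x, 0)) - 1)


def dense(x, w, b):
    # A fully connected layer without an activation.
    return x @ w + b


batch, n_wide, n_deep = 4, 4, 5
x_wide = rng.normal(size=(batch, n_wide))  # stands in for linear_input
x_deep = rng.normal(size=(batch, n_deep))  # stands in for dnn_input

# Wide branch: a single linear layer mapping the wide features to one output.
wide_out = dense(x_wide, rng.normal(size=(n_wide, 1)), np.zeros(1))

# Deep branch: hidden layers of 128, 64, and 32 units with ELU, then one linear unit.
sizes = [n_deep, 128, 64, 32, 1]
hidden = x_deep
for i, (n_in, n_out) in enumerate(zip(sizes[:-1], sizes[1:])):
    hidden = dense(hidden, rng.normal(scale=0.1, size=(n_in, n_out)), np.zeros(n_out))
    if i < len(sizes) - 2:  # apply ELU on hidden layers only
        hidden = elu(hidden)
deep_out = hidden

# The combined prediction is the sum of the two branch outputs.
prediction = wide_out + deep_out
print(prediction.shape)  # (4, 1)
```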

In [4]:
import tensorflow as tf
from tensorflow.keras.experimental import LinearModel, WideDeepModel
from tensorflow import keras

In [6]:
linear_model = LinearModel()

dnn_model = keras.Sequential([
    keras.layers.Flatten(),
    keras.layers.Dense(128, activation='elu'),  
    keras.layers.Dense(64, activation='elu'), 
    keras.layers.Dense(32, activation='elu'), 
    keras.layers.Dense(1)  # linear output: fares are unbounded, and a sigmoid would limit this branch to (0, 1)
])

combined_model = WideDeepModel(linear_model, dnn_model)
combined_model.compile(optimizer="Adam", loss="mse", metrics=["mse"])
combined_model.fit([linear_input, dnn_input], y, epochs=5)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x7fbaa14ca290>

# TF Native File Reader
Once we have an acceptable model tested on our pandas dataset, we need to think about how the data will be read when we scale up to the entire dataset in a submitted SageMaker Training Job. Reading the files natively with TensorFlow is a notoriously tricky process, so we prototype it right here in our local notebook.

In [7]:
def pack(features, label):
    linear_features = [tf.cast(features['day_of_week'], tf.float32), tf.cast(features['month'], tf.float32),
                       tf.cast(features['hour'], tf.float32), features["trip_distance"]]
    
    dnn_features = [features['pickup_latitude'], features['pickup_longitude'], features['dropoff_latitude'],
                    features['dropoff_longitude'], features["trip_distance"]]
    
    return (tf.stack(linear_features, axis=-1), tf.stack(dnn_features, axis=-1)), label


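# Note: `header` defaults to True, so the first line of the file is skipped as a header row.
# That is why the first batch printed below matches row 1 of the pandas preview rather than row 0.
# Pass header=False if the file has no header row and the first record should be kept.
ds = tf.data.experimental.make_csv_dataset(csv_file,
                                           batch_size=1,
                                           column_names=column_headers,
                                           num_epochs=5,
                                           shuffle=False,
                                           label_name="fare_amount")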
ds = ds.map(pack)

In [8]:
iterator = iter(ds)
(x1, x2), y = next(iterator)

print(x1)
print(x2)
print(y)

tf.Tensor([[ 1.  1.  0. 12.]], shape=(1, 4), dtype=float32)
tf.Tensor([[ 40.575977 -73.990845  40.65662   40.65662   12.      ]], shape=(1, 5), dtype=float32)
tf.Tensor([41.], shape=(1,), dtype=float32)
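
`make_csv_dataset` yields each batch as a dictionary that maps column names to tensors of shape `(batch_size,)`, and `pack` stacks the selected columns along the last axis to produce `(batch_size, n_features)` inputs. The NumPy sketch below reproduces that stacking for the batch printed above, without requiring TensorFlow. The helper name `stack_columns` is hypothetical.

```python
import numpy as np

# One batch of size 1, keyed by column name, using the values printed above.
features = {
    "day_of_week": np.array([1]),
    "month": np.array([1]),
    "hour": np.array([0]),
    "pickup_latitude": np.array([40.575977], dtype=np.float32),
    "pickup_longitude": np.array([-73.990845], dtype=np.float32),
    "dropoff_latitude": np.array([40.65662], dtype=np.float32),
    "dropoff_longitude": np.array([40.65662], dtype=np.float32),
    "trip_distance": np.array([12.0], dtype=np.float32),
}

linear_columns = ["day_of_week", "month", "hour", "trip_distance"]
dnn_columns = ["pickup_latitude", "pickup_longitude", "dropoff_latitude",
               "dropoff_longitude", "trip_distance"]


def stack_columns(batch, columns):
    # Cast every column to float32 (as pack does with tf.cast), then stack on the last axis.
    return np.stack([batch[c].astype(np.float32) for c in columns], axis=-1)


x1 = stack_columns(features, linear_columns)
x2 = stack_columns(features, dnn_columns)
print(x1.shape, x2.shape)  # (1, 4) (1, 5)
```

The integer columns need the cast because all stacked arrays must share a dtype, which is the same reason `pack` casts `day_of_week`, `month`, and `hour` to `tf.float32`.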


## Build Regression Model

In [9]:
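# Increase the batch size. With num_epochs=1 the dataset is a single pass over the file,
# which fit() then iterates once per training epoch.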
ds = tf.data.experimental.make_csv_dataset(csv_file,
                                           batch_size=128,
                                           column_names=column_headers,
                                           num_epochs=1,
                                           shuffle=False,
                                           label_name="fare_amount")
ds = ds.map(pack)

In [10]:
linear_model = LinearModel()
dnn_model = keras.Sequential([
    keras.layers.Flatten(),
    keras.layers.Dense(128, activation='elu'),  
    keras.layers.Dense(64, activation='elu'), 
    keras.layers.Dense(32, activation='elu'), 
    keras.layers.Dense(1)  # linear output: fares are unbounded, and a sigmoid would limit this branch to (0, 1)
])
combined_model = WideDeepModel(linear_model, dnn_model)
combined_model.compile(optimizer="Adam", loss="mse", metrics=["mse"])
combined_model.fit(ds, epochs=5)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x7f4c8212f8d0>
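
Because the model is compiled with `loss="mse"`, the values Keras reports are in squared fare units. Taking the square root gives an error in the same unit as `fare_amount`, which is usually easier to interpret. The sketch below shows the arithmetic on the five fares from the sample preview. The predictions are made-up placeholders, not output from the model above.

```python
import numpy as np

# Fares from the five sample rows shown earlier in the notebook.
y_true = np.array([57.0, 41.0, 4.0, 15.0, 6.5])

# Hypothetical predictions, used only to illustrate the calculation.
y_pred = np.array([50.0, 45.0, 5.0, 14.0, 7.0])

# Mean squared error: the average of the squared differences.
mse = np.mean((y_true - y_pred) ** 2)

# Root mean squared error: same unit as fare_amount.
rmse = np.sqrt(mse)

print(f"MSE: {mse:.2f}, RMSE: {rmse:.2f}")
```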

# Let's Scale It Out in the Next Notebook