# TensorFlow Training at Scale

This notebook was tested with the `Data Science - Python 3 Kernel` on an `ml.t3.medium` instance. Make sure `Python 3 (Data Science)` is selected as the kernel in the top-right corner of the notebook.

## Loading stored variables
If you ran this notebook before, you may want to reuse the resources you already created with AWS. Run the cell below to load any previously created variables. You should see a printout of the existing variables. If nothing is printed, this is probably the first time you are running the notebook.

In [2]:
%store -r
%store

Stored variables and their in-db values:
data_bucket             -> 'sagemaker-us-west-1-176842773820/nyc-taxi/data/pr

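One way to confirm that `%store -r` restored what the later cells need is to check the notebook namespace before continuing. The sketch below is only illustrative: `missing_variables` is a hypothetical helper name, and `data_bucket` is the one stored variable visible in the output above.

```python
def missing_variables(required, namespace):
    """Return the names in `required` that are not defined in `namespace`."""
    return [name for name in required if name not in namespace]


# `data_bucket` is the stored variable that the later cells read.
missing = missing_variables(["data_bucket"], globals())
if missing:
    print(f"Not restored: {missing}. Define or re-create them before continuing.")
else:
    print("All required variables are available.")
```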

In [None]:
# Ensure updated SageMaker SDK version
%pip install -U -q sagemaker

In [22]:
%%writefile tf_train.py

import os
import argparse

import tensorflow as tf
from tensorflow.keras.experimental import LinearModel, WideDeepModel
from tensorflow import keras


def parse_args():

    parser = argparse.ArgumentParser()

    # hyperparameters sent by the client are passed as command-line arguments to the script
    parser.add_argument("--epochs", type=int, default=1)
    parser.add_argument("--batch_size", type=int, default=64)
    parser.add_argument("--learning_rate", type=float, default=0.1)

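    # data directories: SageMaker sets SM_CHANNEL_<NAME> for each input channel.
    # .get() avoids a KeyError when the script is run outside a training job.
    parser.add_argument("--training", type=str, default=os.environ.get("SM_CHANNEL_TRAINING"))
    parser.add_argument("--testing", type=str, default=os.environ.get("SM_CHANNEL_TESTING"))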

    # model directory: we will use the default set by SageMaker, /opt/ml/model
    parser.add_argument("--model_dir", type=str, default=os.environ.get("SM_MODEL_DIR"))

    return parser.parse_known_args()


def get_train_data(train_dir, batch_size):

    def pack(features, label):
        linear_features = [tf.cast(features['day_of_week'], tf.float32), tf.cast(features['month'], tf.float32),
                           tf.cast(features['hour'], tf.float32), features["trip_distance"]]
        
        dnn_features = [features['pickup_latitude'], features['pickup_longitude'], features['dropoff_latitude'], features['dropoff_longitude'], features["trip_distance"]]
        return (tf.stack(linear_features, axis=-1), tf.stack(dnn_features, axis=-1)), label

    
    column_headers = ["day_of_week","month","hour","pickup_latitude","pickup_longitude",
                      "dropoff_latitude","dropoff_longitude","trip_distance","fare_amount"]

    ds = tf.data.experimental.make_csv_dataset(tf.io.gfile.glob(train_dir + '/*.csv'),
                                               batch_size=batch_size,
                                               column_names=column_headers,
                                               num_epochs=1,
                                               shuffle=True,
                                               label_name="fare_amount")
    ds = ds.map(pack)
    return ds



if __name__ == "__main__":
    args, _ = parse_args()
    
    print(args)
    print(os.environ)


    batch_size = args.batch_size
    epochs = args.epochs
    learning_rate = args.learning_rate
    train_dir = args.training
    
    ds = get_train_data(train_dir, batch_size)
    
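    # Wide part: a linear model over the time features and trip distance.
    linear_model = LinearModel()
    # Deep part: a small feed-forward network over the coordinates and trip distance.
    dnn_model = keras.Sequential(
        [keras.layers.Dense(units=64), keras.layers.Dense(units=1)]
    )
    combined_model = WideDeepModel(linear_model, dnn_model)
    # NOTE: learning_rate is parsed above but is not passed to the optimizer;
    # the string "Adam" selects Adam with its default learning rate.
    combined_model.compile(optimizer="Adam", loss="mse", metrics=["mse"])
    combined_model.fit(ds, epochs=epochs)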



Overwriting tf_train.py
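
The comment in `parse_args` says that hyperparameters arrive as command-line arguments, while the data directories default to environment variables. The sketch below imitates that resolution locally with the standard library only. The `build_parser` helper, the `/tmp/taxi/train` path, and the `--extra_flag` argument are hypothetical and exist only for illustration; the real values are supplied by the training job.

```python
import argparse


def build_parser(env):
    """Build a reduced copy of the script's parser, reading defaults from `env`."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--epochs", type=int, default=1)
    parser.add_argument("--batch_size", type=int, default=64)
    parser.add_argument("--training", type=str, default=env.get("SM_CHANNEL_TRAINING"))
    return parser


# Hypothetical stand-in for the environment inside a training container.
fake_env = {"SM_CHANNEL_TRAINING": "/tmp/taxi/train"}

# Hypothetical argument list, shaped like hyperparameters passed as flags.
argv = ["--batch_size", "512", "--epochs", "1", "--extra_flag", "x"]

# parse_known_args returns unrecognised flags instead of raising an error.
args, unknown = build_parser(fake_env).parse_known_args(argv)
print(args.batch_size, args.epochs, args.training)
print(unknown)
```

Using `parse_known_args` rather than `parse_args` means that flags the script does not declare are collected in `unknown` and do not stop the run.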

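The `pack` function in the script splits each batch into two inputs: four columns for the linear (wide) model and five columns for the DNN (deep) model. The plain-Python sketch below mimics what `tf.stack(..., axis=-1)` does to one-dimensional columns, namely turning per-feature columns into one row per example. The helper `stack_last_axis` and the two sample trips are made up for illustration, and the sketch skips the `tf.cast` to `float32` that the script applies.

```python
def stack_last_axis(columns):
    """Combine equal-length feature columns into one row per example."""
    return [list(row) for row in zip(*columns)]


# Made-up batch of two trips, keyed by the script's column names.
features = {
    "day_of_week": [1, 5],
    "month": [3, 7],
    "hour": [8, 22],
    "pickup_latitude": [40.75, 40.71],
    "pickup_longitude": [-73.99, -74.01],
    "dropoff_latitude": [40.76, 40.65],
    "dropoff_longitude": [-73.98, -73.78],
    "trip_distance": [1.2, 12.4],
}

LINEAR_COLUMNS = ["day_of_week", "month", "hour", "trip_distance"]
DNN_COLUMNS = [
    "pickup_latitude",
    "pickup_longitude",
    "dropoff_latitude",
    "dropoff_longitude",
    "trip_distance",
]

linear_input = stack_last_axis([features[c] for c in LINEAR_COLUMNS])
dnn_input = stack_last_axis([features[c] for c in DNN_COLUMNS])

# Each input has one row per example and one column per feature.
print(len(linear_input), len(linear_input[0]))
print(len(dnn_input), len(dnn_input[0]))
```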

In [31]:
import sagemaker
from sagemaker.tensorflow import TensorFlow

sess = sagemaker.Session()
role = sagemaker.get_execution_role()
bucket = sess.default_bucket()

output_bucket = f"s3://{bucket}/nyc-taxi/model/"


tf_estimator = TensorFlow(
    entry_point="tf_train.py",
    base_job_name="tf2-taxi-wide-deep",
    role=role,
    framework_version="2.6.2",
    py_version="py38",
    input_mode="File",
    output_path=output_bucket,
    instance_count=1,
    instance_type="ml.c4.xlarge",
    hyperparameters={"batch_size": 512, "epochs": 1}
)

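The training script reads its data directories from `SM_CHANNEL_TRAINING` and `SM_CHANNEL_TESTING`. To our understanding of the SageMaker training toolkit convention, each channel name passed to `fit` is exposed as `SM_CHANNEL_<NAME>` and points to a directory under `/opt/ml/input/data`. The sketch below only reproduces that naming rule; `channel_environment` is a hypothetical helper, not part of the SageMaker SDK.

```python
def channel_environment(channel_names, data_root="/opt/ml/input/data"):
    """Map channel names to the environment variables the script expects."""
    return {
        f"SM_CHANNEL_{name.upper()}": f"{data_root}/{name}"
        for name in channel_names
    }


# The two channel names used by this notebook's training script.
env = channel_environment(["training", "testing"])
for key, value in env.items():
    print(f"{key}={value}")
```

Because the variable name is derived from the channel name, renaming a channel in the `fit` call requires the matching change in the script's argument defaults.
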
In [33]:
tf_estimator.fit({"training": f"s3://{data_bucket}/train/", "testing": f"s3://{data_bucket}/test/"}, logs=True)

2022-02-02 04:59:23 Starting - Starting the training job...
2022-02-02 04:59:40 Starting - Preparing the instances for trainingProfilerReport-1643777963: InProgress
......
2022-02-02 05:00:54 Downloading - Downloading input data......
2022-02-02 05:01:45 Training - Downloading the training image...
2022-02-02 05:02:25 Training - Training image download completed. Training in progress.
2022-02-02 05:02:13.717578: W tensorflow/core/profiler/internal/smprofiler_timeline.cc:460] Initializing the SageMaker Profiler.
2022-02-02 05:02:13.728798: W tensorflow/core/profiler/internal/smprofiler_timeline.cc:105] SageMaker Profiler is not enabled. The timeline writer thread will not be started, future recorded events will be dropped.
2022-02-02 05:02:13.915104: W tensorflow/core/profiler/internal/smprofiler_timeline.cc:460] Initializing the SageMaker Profiler.
2022-02-02 05:02:17,265 sagemaker-training-toolkit INFO     Imported framework sagemaker_tensorflow_containe