# Neural network content-based recommendation engine - Hyperparameter tuning

This notebook builds on the results and outputs of the previous neural network model "nn_model.ipynb".

The code below packages the model as a Python module that can be trained on a server, and then tunes its hyperparameters.

In [None]:
!pip3 install tensorflow_hub

In [None]:
%%bash
pip install --upgrade tensorflow

In [None]:
pip install tensorflow==1.13.2

Now restart the notebook's kernel so that the newly installed packages are picked up. Since we're no longer using Cloud Dataflow, we'll use the python3 kernel from here on, so change the kernel if it's still set to python2.

In [None]:
# Setup core libraries and packages
import os
import tensorflow as tf
import tensorflow_hub as hub

PROJECT = "astute-veld-253418" 
BUCKET = "masters-research" 
REGION = "us-central1" 

# do not change these
os.environ["PROJECT"] = PROJECT
os.environ["BUCKET"] = BUCKET
os.environ["REGION"] = REGION
os.environ["TFVERSION"] = "1.13"
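
Because the install cells above both upgrade and pin TensorFlow, it may be worth confirming after the kernel restart that the installed version agrees with `TFVERSION`. The following is only a minimal sketch: `matches_runtime` is a hypothetical helper that is not part of the original notebook, and it compares major and minor version numbers only.

```python
def matches_runtime(installed: str, runtime: str) -> bool:
    """Return True when both versions share the same major.minor prefix.

    Hypothetical helper for illustration; not part of the original notebook.
    """
    installed_parts = installed.split(".")[:2]
    runtime_parts = runtime.split(".")[:2]
    return installed_parts == runtime_parts


# Literal strings are used here so the example runs anywhere; in the notebook
# one would pass tf.__version__ and os.environ["TFVERSION"] instead.
print(matches_runtime("1.13.2", "1.13"))  # True
print(matches_runtime("2.3.0", "1.13"))   # False
```

If the check fails, the remote runtime and the local test run would be using different TensorFlow releases, which can make local results a poor predictor of the remote job.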

In [None]:
%%bash
gcloud config set project $PROJECT
gcloud config set compute/region $REGION

## Package up the module so it can run as a Python job

To tune and train the algorithm on a server, we need to package the model as a training job. We first run the packaged module locally to check that it works.

In [None]:
%%writefile requirements.txt
tensorflow_hub

In [None]:
%%bash
echo "bucket=${BUCKET}"
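# OUTDIR was not set in this cell; assuming it is the local directory cleared here.
OUTDIR=${PWD}/hybrid_recommendation_trained
rm -rf ${OUTDIR}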
export PYTHONPATH=${PYTHONPATH}:${PWD}/hybrid_recommendations_module
python -m trainer.task \
    --bucket=${BUCKET} \
    --train_data_paths=gs://${BUCKET}/hybrid_recommendation/preproc/features/train.csv* \
    --eval_data_paths=gs://${BUCKET}/hybrid_recommendation/preproc/features/eval.csv* \
    --output_dir=${OUTDIR} \
    --batch_size=128 \
    --learning_rate=0.1 \
    --hidden_units="256 128 64" \
    --top_k=3 \
    --train_steps=1000 \
    --start_delay_secs=30 \
    --throttle_secs=60
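
The notebook does not show the contents of `trainer/task.py`, so the following is only a sketch of how such an entry point might parse the flags passed in the command above. The flag names come from that command; the defaults, the `required` settings, and the `parse_args` function name are assumptions, and `my-bucket` is a placeholder.

```python
import argparse


def parse_args(argv=None):
    """Parse the flags used by the training commands in this notebook.

    Illustrative only: defaults and required flags are assumptions.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument("--bucket", required=True)
    parser.add_argument("--train_data_paths", required=True)
    parser.add_argument("--eval_data_paths", required=True)
    parser.add_argument("--output_dir", required=True)
    parser.add_argument("--batch_size", type=int, default=128)
    parser.add_argument("--learning_rate", type=float, default=0.1)
    # Layer sizes arrive as one space-separated string, e.g. "256 128 64".
    parser.add_argument("--hidden_units", default="256 128 64")
    parser.add_argument("--top_k", type=int, default=3)
    parser.add_argument("--train_steps", type=int, default=1000)
    parser.add_argument("--start_delay_secs", type=int, default=30)
    parser.add_argument("--throttle_secs", type=int, default=60)
    args = parser.parse_args(argv)
    # Convert the string into a list of ints, one per hidden layer.
    args.hidden_units = [int(n) for n in args.hidden_units.split()]
    return args


example = parse_args([
    "--bucket=my-bucket",  # placeholder bucket name
    "--train_data_paths=gs://my-bucket/train.csv*",
    "--eval_data_paths=gs://my-bucket/eval.csv*",
    "--output_dir=/tmp/out",
    "--hidden_units=256 128 64",
])
print(example.hidden_units)  # [256, 128, 64]
```

Passing `hidden_units` as a single string keeps the flag compatible with a categorical hyperparameter, whose values are plain strings.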

## Run model training on a remote server
We test the training job by running the model remotely on a cloud server.

In [None]:
%%bash
OUTDIR=gs://${BUCKET}/hybrid_recommendation/small_trained_model
JOBNAME=hybrid_recommendation_$(date -u +%y%m%d_%H%M%S)
echo $OUTDIR $REGION $JOBNAME
gsutil -m rm -rf $OUTDIR
gcloud ml-engine jobs submit training $JOBNAME \
    --region=$REGION \
    --module-name=trainer.task \
    --package-path=$(pwd)/hybrid_recommendations_module/trainer \
    --job-dir=$OUTDIR \
    --staging-bucket=gs://$BUCKET \
    --scale-tier=STANDARD_1 \
    --runtime-version=$TFVERSION \
    -- \
    --bucket=${BUCKET} \
    --train_data_paths=gs://${BUCKET}/hybrid_recommendation/preproc/features/train.csv* \
    --eval_data_paths=gs://${BUCKET}/hybrid_recommendation/preproc/features/eval.csv* \
    --output_dir=${OUTDIR} \
    --batch_size=128 \
    --learning_rate=0.1 \
    --hidden_units="256 128 64" \
    --top_k=3 \
    --train_steps=1000 \
    --start_delay_secs=30 \
    --throttle_secs=30

In [None]:
!gcloud ai-platform jobs stream-logs hybrid_recommendation_200711_204322
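
The job name passed to `stream-logs` above is hard-coded; it follows the `JOBNAME` pattern from the submission cell, which appends a UTC timestamp to a fixed prefix. As a small sketch of that naming scheme (the function names `make_job_name` and `job_submitted_at` are hypothetical), the name can be built and decoded like this:

```python
from datetime import datetime

JOB_PREFIX = "hybrid_recommendation_"
STAMP_FORMAT = "%y%m%d_%H%M%S"  # same format string given to `date -u` above


def make_job_name(now: datetime) -> str:
    """Build a job name from a (UTC) datetime."""
    return JOB_PREFIX + now.strftime(STAMP_FORMAT)


def job_submitted_at(job_name: str) -> datetime:
    """Recover the naive UTC submission time encoded in a job name."""
    return datetime.strptime(job_name[len(JOB_PREFIX):], STAMP_FORMAT)


print(job_submitted_at("hybrid_recommendation_200711_204322"))
# 2020-07-11 20:43:22
```

When rerunning the notebook, the name echoed by the submission cell would replace the hard-coded one.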

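We can now perform hyperparameter tuning. The configuration below asks the service to maximize the `accuracy` metric over 5 trials, run one at a time, searching over batch size, learning rate, and hidden-layer sizes.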

In [None]:
%%writefile hyperparam.yaml
trainingInput:
    hyperparameters:
        goal: MAXIMIZE
        maxTrials: 5
        maxParallelTrials: 1
        hyperparameterMetricTag: accuracy
        params:
            - parameterName: batch_size
              type: INTEGER
              minValue: 8
              maxValue: 64
              scaleType: UNIT_LINEAR_SCALE
            - parameterName: learning_rate
              type: DOUBLE
              minValue: 0.01
              maxValue: 0.1
              scaleType: UNIT_LINEAR_SCALE
            - parameterName: hidden_units
              type: CATEGORICAL
              categoricalValues: ["1024 512 256", "1024 512 128", "1024 256 128", "512 256 128", "1024 512 64", "1024 256 64", "512 256 64", "1024 128 64", "512 128 64", "256 128 64", "1024 512 32", "1024 256 32", "512 256 32", "1024 128 32", "512 128 32", "256 128 32", "1024 64 32", "512 64 32", "256 64 32", "128 64 32"]
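
The 20 `hidden_units` values above are exactly the three-layer combinations, in descending order, of the sizes 1024, 512, 256, 128, 64, and 32. Rather than typing them by hand, the list could be generated; the sketch below uses a hypothetical helper named `hidden_unit_choices`. The generated order differs from the order in the YAML, which should not matter for a `CATEGORICAL` parameter.

```python
from itertools import combinations

LAYER_SIZES = [1024, 512, 256, 128, 64, 32]  # already in descending order


def hidden_unit_choices(sizes, depth=3):
    """Return every descending `depth`-layer architecture as a string."""
    # combinations() preserves input order, so each tuple stays descending.
    return [" ".join(str(n) for n in combo) for combo in combinations(sizes, depth)]


choices = hidden_unit_choices(LAYER_SIZES)
print(len(choices))  # 20
print(choices[0])    # 1024 512 256
```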

In [None]:
%%bash
OUTDIR=gs://${BUCKET}/hybrid_recommendation/hypertuning
JOBNAME=hybrid_recommendation_$(date -u +%y%m%d_%H%M%S)
echo $OUTDIR $REGION $JOBNAME
gsutil -m rm -rf $OUTDIR
gcloud ml-engine jobs submit training $JOBNAME \
    --region=$REGION \
    --module-name=trainer.task \
    --package-path=$(pwd)/hybrid_recommendations_module/trainer \
    --job-dir=$OUTDIR \
    --staging-bucket=gs://$BUCKET \
    --scale-tier=STANDARD_1 \
    --runtime-version=$TFVERSION \
    --config=hyperparam.yaml \
    -- \
    --bucket=${BUCKET} \
    --train_data_paths=gs://${BUCKET}/hybrid_recommendation/preproc/features/train.csv* \
    --eval_data_paths=gs://${BUCKET}/hybrid_recommendation/preproc/features/eval.csv* \
    --output_dir=${OUTDIR} \
    --batch_size=128 \
    --learning_rate=0.1 \
    --hidden_units="256 128 64" \
    --top_k=3 \
    --train_steps=1000 \
    --start_delay_secs=30 \
    --throttle_secs=30
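
All trials in the tuning job above receive the same `--output_dir`. A common pattern in trainer code is to read the trial ID from the `TF_CONFIG` environment variable and write each trial to its own subdirectory so that checkpoints do not overwrite one another. Whether `trainer.task` does this is not shown in this notebook, so the following is a sketch under that assumption; `trial_output_dir` is a hypothetical name and `my-bucket` is a placeholder.

```python
import json
import os
import posixpath


def trial_output_dir(output_dir: str, environ=os.environ) -> str:
    """Append the trial ID from TF_CONFIG, if present, to output_dir."""
    tf_config = json.loads(environ.get("TF_CONFIG", "{}"))
    trial = tf_config.get("task", {}).get("trial", "")
    # Joining with an empty trial would add a trailing slash, so branch instead.
    return posixpath.join(output_dir, trial) if trial else output_dir


# Simulated environment for a tuning trial; the values are made up.
fake_env = {"TF_CONFIG": json.dumps({"task": {"type": "master", "index": 0, "trial": "2"}})}
print(trial_output_dir("gs://my-bucket/hypertuning", fake_env))
# gs://my-bucket/hypertuning/2
print(trial_output_dir("gs://my-bucket/hypertuning", {}))
# gs://my-bucket/hypertuning
```

Outside a tuning job the `trial` key is absent, so the function returns the directory unchanged and the same code path serves ordinary training runs.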

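Run another remote training job, this time for 10,000 steps and with explicit embedding dimensions for the content ID and author features.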

In [None]:
%%bash
OUTDIR=gs://${BUCKET}/hybrid_recommendation/big_trained_model
JOBNAME=hybrid_recommendation_$(date -u +%y%m%d_%H%M%S)
echo $OUTDIR $REGION $JOBNAME
gsutil -m rm -rf $OUTDIR
gcloud ml-engine jobs submit training $JOBNAME \
    --region=$REGION \
    --module-name=trainer.task \
    --package-path=$(pwd)/hybrid_recommendations_module/trainer \
    --job-dir=$OUTDIR \
    --staging-bucket=gs://$BUCKET \
    --scale-tier=STANDARD_1 \
    --runtime-version=$TFVERSION \
    -- \
    --bucket=${BUCKET} \
    --train_data_paths=gs://${BUCKET}/hybrid_recommendation/preproc/features/train.csv* \
    --eval_data_paths=gs://${BUCKET}/hybrid_recommendation/preproc/features/eval.csv* \
    --output_dir=${OUTDIR} \
    --batch_size=128 \
    --learning_rate=0.1 \
    --hidden_units="256 128 64" \
    --content_id_embedding_dimensions=10 \
    --author_embedding_dimensions=10 \
    --top_k=3 \
    --train_steps=10000 \
    --start_delay_secs=30 \
    --throttle_secs=30