# Recommendations on GCP with TensorFlow and WALS

This project deploys a recommendation service in a Docker container or on GCP App Engine, using the WALS algorithm in TensorFlow. Components include:

- Recommendation model code, and scripts to train and tune the model on ML Engine
- A REST endpoint served with Docker and an nginx server
- A web app that shows personalized and popular recommendations
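
For orientation, WALS (weighted alternating least squares) factorizes the ratings matrix into row (user) and column (item) factors by alternately solving regularized least-squares problems. The NumPy sketch below illustrates the idea under simplifying assumptions: dense arrays, a weight of 1 for observed ratings and 0 otherwise, and hypothetical names (`solve_rows`, `weighted_loss`, `reg`). It is not the TensorFlow implementation used by this project.

```python
import numpy as np

def solve_rows(ratings, weights, col_factors, reg=0.1):
    """Solve for row factors with column factors held fixed (one ALS half-step)."""
    n_rows, k = ratings.shape[0], col_factors.shape[1]
    row_factors = np.zeros((n_rows, k))
    for i in range(n_rows):
        w = weights[i]                                  # per-entry weights for row i
        weighted_cols = col_factors.T * w               # shape (k, n_cols)
        a = weighted_cols @ col_factors + reg * np.eye(k)
        b = weighted_cols @ ratings[i]
        row_factors[i] = np.linalg.solve(a, b)
    return row_factors

def weighted_loss(ratings, weights, rows, cols, reg=0.1):
    """Weighted squared error plus L2 regularization on both factor matrices."""
    err = weights * (ratings - rows @ cols.T) ** 2
    return float(err.sum() + reg * ((rows ** 2).sum() + (cols ** 2).sum()))

rng = np.random.default_rng(0)
ratings = rng.integers(0, 6, size=(6, 5)).astype(float)  # made-up data, 0 = unobserved
weights = (ratings > 0).astype(float)
rows = rng.normal(size=(6, 2))
cols = rng.normal(size=(5, 2))

loss_before = weighted_loss(ratings, weights, rows, cols)
for _ in range(5):
    rows = solve_rows(ratings, weights, cols)            # update user factors
    cols = solve_rows(ratings.T, weights.T, rows)        # update item factors
loss_after = weighted_loss(ratings, weights, rows, cols)
print(loss_before, loss_after)
```

Each half-step minimizes the loss exactly for one side while the other is fixed, so the loss should not increase from one sweep to the next.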


In [1]:
import tensorflow as tf
tf.__version__

'1.14.0'

In [2]:
import os
PROJECT = 'mlongcp' # REPLACE WITH YOUR PROJECT ID
REGION = 'us-central1' # REPLACE WITH YOUR REGION e.g. us-central1

# Set GCP variables
os.environ['PROJECT'] = PROJECT
os.environ['BUCKET'] = 'recserve_' + PROJECT
os.environ['REGION'] = REGION

# Do NOT expose or share the project's service account JSON file.
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = '../../sv_mlongcp.json' # Path to service account file
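
The `%%bash` cells that follow read these variables from the environment, so an unset value would surface only as a confusing shell error. A minimal sketch of a pre-flight check, assuming the four variable names set above; `missing_vars` is a hypothetical helper, not part of this project:

```python
import os

REQUIRED_VARS = ("PROJECT", "BUCKET", "REGION", "GOOGLE_APPLICATION_CREDENTIALS")

def missing_vars(environ, required=REQUIRED_VARS):
    """Return the required variable names that are unset or empty."""
    return [name for name in required if not environ.get(name)]

# Report anything missing in the current process environment.
print(missing_vars(os.environ))
```

An empty list means every variable is set; it does not verify that the credentials file exists or is valid.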

In [3]:
%%bash

gcloud config set project $PROJECT
gcloud config set compute/region $REGION

Updated property [core/project].
Updated property [compute/region].


In [4]:
%%bash

# create the GCS bucket recserve_<PROJECT> if it does not already exist
exists=$(gsutil ls -d | grep -w gs://${BUCKET}/)
if [ -n "$exists" ]; then
   echo "Not creating recserve_bucket since it already exists."
else
   echo "Creating recserve_bucket"
   gsutil mb -l ${REGION} gs://${BUCKET}
fi

Not creating recserve_bucket since it already exists.


### Copy data files from our bucket to the local data folder

In [25]:
%%bash

gsutil -m cp -r gs://${BUCKET}/data/* ./data

Copying gs://recserve_mlongcp/data/ga_sessions_sample.json.gz...
Copying gs://recserve_mlongcp/data/ga_sessions_sample_schema.json...
Copying gs://recserve_mlongcp/data/recommendation_events.csv...
Copying gs://recserve_mlongcp/data/u.data...

### Create empty BigQuery dataset and load sample JSON data
Note: Ingesting the 400K rows of sample data usually takes 5-7 minutes.

In [None]:
%%bash

# create BigQuery dataset if it doesn't already exist
exists=$(bq ls -d | grep -w GA360_test)
if [ -n "$exists" ]; then
   echo "Not creating GA360_test since it already exists."
else
   echo "Creating GA360_test dataset."
   bq --project_id=${PROJECT} mk GA360_test 
fi

# create the schema and load our sample Google Analytics session data
bq load --source_format=NEWLINE_DELIMITED_JSON \
 GA360_test.ga_sessions_sample \
 gs://${BUCKET}/data/ga_sessions_sample.json.gz \
 data/ga_sessions_sample_schema.json # can't load schema files from GCS
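
Before starting a load that takes several minutes, it can help to confirm that the local copy of the sample file is well-formed newline-delimited JSON. A minimal sketch, assuming the file was copied to `data/` in the earlier step; `count_json_rows` is a hypothetical helper, not part of this project:

```python
import gzip
import json

def count_json_rows(path):
    """Count non-empty lines in a gzipped newline-delimited JSON file, parsing each one."""
    rows = 0
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                json.loads(line)  # raises ValueError on a malformed row
                rows += 1
    return rows

# Example call (assumes the local copy exists):
# count_json_rows("data/ga_sessions_sample.json.gz")
```

The function reads the file line by line, so it does not need to hold the whole data set in memory.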

## Install WALS model training package and model data

### 1. Create a distributable package. Copy the package up to the code folder in the bucket you created previously.

In [91]:
%%bash

cd wals_ml_engine

echo "############## Creating distributable package ##############"
python setup.py sdist

echo "############## Copying ML package to bucket ##############"
gsutil cp dist/wals_ml_engine-0.1.tar.gz gs://${BUCKET}/code/

############## Creating distributable package ##############
running sdist
running egg_info
writing wals_ml_engine.egg-info/PKG-INFO
writing dependency_links to wals_ml_engine.egg-info/dependency_links.txt
writing requirements to wals_ml_engine.egg-info/requires.txt
writing top-level names to wals_ml_engine.egg-info/top_level.txt
reading manifest file 'wals_ml_engine.egg-info/SOURCES.txt'
writing manifest file 'wals_ml_engine.egg-info/SOURCES.txt'
running check
creating wals_ml_engine-0.1
creating wals_ml_engine-0.1/trainer
creating wals_ml_engine-0.1/wals_ml_engine.egg-info
copying files to wals_ml_engine-0.1...
copying README.txt -> wals_ml_engine-0.1
copying setup.py -> wals_ml_engine-0.1
copying trainer/__init__.py -> wals_ml_engine-0.1/trainer
copying trainer/model.py -> wals_ml_engine-0.1/trainer
copying trainer/task.py -> wals_ml_engine-0.1/trainer
copying trainer/util.py -> wals_ml_engine-0.1/trainer
copying trainer/wals.py -> wals_ml_engine-0.1/trainer
copying wals_ml_engine.e

Copying file://dist/wals_ml_engine-0.1.tar.gz [Content-Type=application/x-tar]...
Operation completed over 1 objects/7.9 KiB.                                      


### 2. Run the WALS model on the sample data set:

In [None]:
%%bash

# view the ML train local script before running
cat wals_ml_engine/mltrain.sh

In [4]:
%%bash

cd wals_ml_engine


# train with user item ratings data
./mltrain.sh local ../data/u.data
# alternative: train locally on web-view events with optimized hyperparams (--use-optimized)
#./mltrain.sh local ../data/recommendation_events.csv --data-type web_views --use-optimized

# train on ML Engine with optimized hyperparams
# ./mltrain.sh train ../data/recommendation_events.csv --data-type web_views --use-optimized

# tune hyperparams on ML Engine:
# ./mltrain.sh tune ../data/recommendation_events.csv --data-type web_views


Sun Jul 14 10:00:51 CDT 2019
Sun Jul 14 10:00:55 CDT 2019


INFO:tensorflow:Train Start: 2019-07-14 10:00:53
  frac = np.array(1.0/(data > 0.0).sum(axis))
Instructions for updating:
Colocations handled automatically by placer.
2019-07-14 10:00:54.635943: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
INFO:tensorflow:Train Finish: 2019-07-14 10:00:55
INFO:tensorflow:train RMSE = 0.89
INFO:tensorflow:test RMSE = 1.06
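
The two RMSE figures summarize how far predicted ratings are from the known ones on the training and test splits. The sketch below shows one way such a number can be computed, assuming that only observed (non-zero) ratings count toward the error; `rmse_observed` and the arrays are illustrative, not the trainer's own code.

```python
import numpy as np

def rmse_observed(ratings, predictions):
    """RMSE over entries where a rating exists (non-zero), ignoring the rest."""
    mask = ratings > 0
    diff = ratings[mask] - predictions[mask]
    return float(np.sqrt(np.mean(diff ** 2)))

# Made-up 2x2 example: three observed ratings with errors 1, 0 and 2.
ratings = np.array([[5.0, 0.0], [3.0, 4.0]])
predictions = np.array([[4.0, 2.0], [3.0, 2.0]])
example_rmse = rmse_observed(ratings, predictions)
print(example_rmse)
```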


Training can take a couple of minutes (the small u.data run above finished in a few seconds) and creates a job directory under wals_ml_engine/jobs, such as "wals_ml_local_20180102_012345/model", containing the model files saved as NumPy arrays.
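
Because the model is stored as plain `.npy` files, it can be inspected with NumPy alone. A minimal sketch that lists the shape of every array in a model directory; `describe_model_dir` is a hypothetical helper, and the demonstration uses a throwaway directory with made-up arrays rather than a real job directory:

```python
import os
import tempfile
import numpy as np

def describe_model_dir(model_dir):
    """Return {file name: array shape} for every .npy file in model_dir."""
    shapes = {}
    for name in sorted(os.listdir(model_dir)):
        if name.endswith(".npy"):
            shapes[name] = np.load(os.path.join(model_dir, name)).shape
    return shapes

# Demonstration on a temporary directory with made-up arrays.
with tempfile.TemporaryDirectory() as tmp:
    np.save(os.path.join(tmp, "row.npy"), np.zeros((4, 2)))
    np.save(os.path.join(tmp, "col.npy"), np.zeros((3, 2)))
    demo_shapes = describe_model_dir(tmp)
print(demo_shapes)
```

Pointing the function at a real `model` directory under `wals_ml_engine/jobs` should give a quick view of the factor dimensions.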

### View the locally trained model directory

In [30]:
ls wals_ml_engine/jobs

wals_ml_local_20190708_132650/


### 3. Copy the model files from this directory to the model folder in the project bucket:
If there are multiple models, take the most recent one (job names end in a timestamp, so sorting before tail -1 selects it).

In [31]:
%%bash
export JOB_MODEL=$(find wals_ml_engine/jobs -name "model" | sort | tail -1)
gsutil cp ${JOB_MODEL}/* gs://${BUCKET}/model/
  
echo "Recommendation model file numpy arrays in bucket:"  
gsutil ls gs://${BUCKET}/model/

Recommendation model file numpy arrays in bucket:
gs://recserve_mlongcp/model/col.npy
gs://recserve_mlongcp/model/item.npy
gs://recserve_mlongcp/model/row.npy
gs://recserve_mlongcp/model/user.npy


Copying file://wals_ml_engine/jobs/wals_ml_local_20190708_132650/model/col.npy [Content-Type=application/octet-stream]...
Copying file://wals_ml_engine/jobs/wals_ml_local_20190708_132650/model/item.npy [Content-Type=application/octet-stream]...
Copying file://wals_ml_engine/jobs/wals_ml_local_20190708_132650/model/row.npy [Content-Type=application/octet-stream]...
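
Selecting the newest job directory can also be done in Python. The sketch below assumes job directory names share a prefix and end in a `YYYYMMDD_HHMMSS` timestamp, so that lexicographic order matches chronological order; `latest_model_dir` is a hypothetical helper and the second directory name in the demonstration is made up.

```python
import os
import tempfile

def latest_model_dir(jobs_dir):
    """Return the model/ path of the job whose name sorts last, or None."""
    candidates = sorted(
        name for name in os.listdir(jobs_dir)
        if os.path.isdir(os.path.join(jobs_dir, name, "model"))
    )
    if not candidates:
        return None
    return os.path.join(jobs_dir, candidates[-1], "model")

# Demonstration with two empty job directories in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    for job in ("wals_ml_local_20190708_132650", "wals_ml_local_20190714_100051"):
        os.makedirs(os.path.join(tmp, job, "model"))
    picked = os.path.basename(os.path.dirname(latest_model_dir(tmp)))
print(picked)
```

Sorting by name avoids relying on the order in which the file system happens to list directories.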

In [77]:
# this is how your bucket will look after uploading code, data and model

!gsutil ls -R gs://${BUCKET}

gs://recserve_mlongcp/code/:
gs://recserve_mlongcp/code/wals_ml_engine-0.1.tar.gz

gs://recserve_mlongcp/data/:
gs://recserve_mlongcp/data/ga_sessions_sample.json.gz
gs://recserve_mlongcp/data/ga_sessions_sample_schema.json
gs://recserve_mlongcp/data/recommendation_events.csv
gs://recserve_mlongcp/data/u.data

gs://recserve_mlongcp/model/:
gs://recserve_mlongcp/model/col.npy
gs://recserve_mlongcp/model/item.npy
gs://recserve_mlongcp/model/row.npy
gs://recserve_mlongcp/model/user.npy
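
Judging by their names, `row.npy` and `col.npy` hold the user and item factors, while `user.npy` and `item.npy` map matrix indices back to IDs; that reading is an assumption, not something confirmed in this notebook. Under it, a recommendation is the set of items with the highest dot product against a user's factor vector. A minimal sketch with made-up arrays and a hypothetical `recommend` function:

```python
import numpy as np

def recommend(user_factor, item_factors, item_ids, k=5, seen=()):
    """Return the k item ids with the highest predicted score for one user."""
    scores = item_factors @ user_factor          # one score per item
    order = np.argsort(-scores)                  # indices, best score first
    seen = set(seen)
    picks = [item_ids[i] for i in order if item_ids[i] not in seen]
    return picks[:k]

# Made-up factors for four items and one user.
item_factors = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [2.0, 0.0]])
item_ids = [10, 20, 30, 40]
user_factor = np.array([1.0, 0.2])

top = recommend(user_factor, item_factors, item_ids, k=2, seen=[40])
print(top)
```

Passing the items a user has already rated as `seen` keeps them out of the result, which is a common choice but not necessarily what the deployed service does.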


## Dockerize the web app and launch it


In [10]:
%%bash
docker build -t luvneries/wals-api app/.

Sending build context to Docker daemon  3.772MB
Step 1/9 : FROM python:3.7
 ---> 42d620af35be
Step 2/9 : ENV APP /app
 ---> Using cache
 ---> 329dd7c28311
Step 3/9 : RUN mkdir $APP
 ---> Using cache
 ---> c1be1f4a2720
Step 4/9 : WORKDIR $APP
 ---> Using cache
 ---> 909f0e38041f
Step 5/9 : EXPOSE 8000
 ---> Using cache
 ---> cdb40acb6232
Step 6/9 : COPY requirements.txt .
 ---> Using cache
 ---> afe29c77db72
Step 7/9 : RUN pip install -r requirements.txt
 ---> Using cache
 ---> 207c57ae6a51
Step 8/9 : COPY . .
 ---> 57fe93ba5aa3
Step 9/9 : CMD ["gunicorn", "-b", "0.0.0.0:8000", "main:app"]
 ---> Running in 946c77d1c7ae
Removing intermediate container 946c77d1c7ae
 ---> 340e14aa7cea
Successfully built 340e14aa7cea
Successfully tagged luvneries/wals-api:latest


In [11]:
%%bash
docker run -d -p 8000:8000 luvneries/wals-api 

6c368596ba5c3329d78613ae6f575493e21211eeaa9f776e7499394f7cb2fbb0


### Query the API by user ID to get recommended items

In [12]:
%%bash
for user_id in 195 196 197 198
do 
    # quote the URL so the shell does not treat "?" as a glob character
    echo "$user_id $(curl -s "http://127.0.0.1:8000/recommendation?userId=${user_id}")"
done

195 {"items":["319","1142","1404","1021","1083"]}
196 {"items":["1149","1462","1191","1183","1188"]}
197 {"items":["168","63","317","49","97"]}
198 {"items":["898","301","913","18","271"]}
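
The same query can be issued from Python with the standard library. The sketch below assumes the container from the previous step is listening on port 8000 and that responses have the `{"items": [...]}` shape shown above; the helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://127.0.0.1:8000/recommendation"

def build_url(user_id, base_url=BASE_URL):
    """Build the query URL for one user."""
    return f"{base_url}?{urlencode({'userId': user_id})}"

def parse_items(body):
    """Extract the list of recommended item ids from a JSON response body."""
    return json.loads(body)["items"]

def fetch_recommendations(user_id, timeout=5):
    """Call the running service; needs the container started earlier."""
    with urlopen(build_url(user_id), timeout=timeout) as response:
        return parse_items(response.read().decode("utf-8"))

# Offline check using a response body copied from the output above.
print(build_url(195))
print(parse_items('{"items":["319","1142","1404","1021","1083"]}'))
```

`fetch_recommendations(195)` performs the actual HTTP call and raises an error if the service is not reachable.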
