## Content Based Filtering

This lab relies on files created in the **Create Datasets** lab. Be sure to run the code in lab_Create_Datasets.ipynb before completing this lab. 

This notebook should be run with a **python3** kernel. 

This lab illustrates:
1. how to build feature columns for a model using tf.feature_column
2. how to create custom evaluation metrics and add them to TensorBoard
3. how to train a model and make predictions with the saved model

TensorFlow Hub should already be installed. You can check using pip freeze.

In [68]:
%%bash
pip freeze | grep tensor

tensorboard==1.8.0
tensorflow==1.8.0
tensorflow-hub==0.1.1


If tensorflow-hub is not already installed, uncomment the cell below and execute the commands. After doing the pip install, click **"Reset Session"** on the notebook so that the Python environment picks up the new packages.

In [69]:
#%%bash
#pip install tensorflow-hub

In [70]:
import os
import tensorflow as tf
import numpy as np
import tensorflow_hub as hub
import shutil
PROJECT = 'munn-sandbox' # REPLACE WITH YOUR PROJECT ID
BUCKET = 'munn-sandbox-bucket' # REPLACE WITH YOUR BUCKET NAME
REGION = 'us-central1' # REPLACE WITH YOUR BUCKET REGION e.g. us-central1

# do not change these
os.environ['PROJECT'] = PROJECT
os.environ['BUCKET'] = BUCKET
os.environ['REGION'] = REGION
os.environ['TFVERSION'] = '1.8'

In [71]:
%%bash
gcloud config set project $PROJECT
gcloud config set compute/region $REGION

Updated property [core/project].
Updated property [compute/region].


### Build the feature columns for the model.

To start, we'll load the list of categories, authors and article ids we created in the previous **Create Datasets** notebook.

In [72]:
categories_list = open("categories.txt").read().splitlines()
authors_list = open("authors.txt").read().splitlines()
content_ids_list = open("content_ids.txt").read().splitlines()
mean_months_since_epoch = 523

In the cell below we'll define the feature columns to use in our model. If necessary, remind yourself of the [various feature columns](https://www.tensorflow.org/api_docs/python/tf/feature_column) that are available.  
For the embedded_title_column feature column, use a TensorFlow Hub module to create an embedding of the article title. Since the articles and titles are in German, you'll want to use a German language embedding module.  
Explore the text embedding TensorFlow Hub modules [available here](https://alpha.tfhub.dev/). Filter by setting the language to 'German'. The 50-dimensional embedding should be sufficient for our purposes. 

In [73]:
embedded_title_column = hub.text_embedding_column(
    key="title", 
    module_spec="https://tfhub.dev/google/nnlm-de-dim50/1",
    trainable=False)

content_id_column = tf.feature_column.categorical_column_with_hash_bucket(
    key="content_id",
    hash_bucket_size=len(content_ids_list))
embedded_content_column = tf.feature_column.embedding_column(
    categorical_column=content_id_column,
    dimension=10)

author_column = tf.feature_column.categorical_column_with_hash_bucket(
    key="author",
    hash_bucket_size=len(authors_list) + 1)
embedded_author_column = tf.feature_column.embedding_column(
    categorical_column=author_column,
    dimension=3)

category_column_categorical = tf.feature_column.categorical_column_with_vocabulary_list(
    key="category",
    vocabulary_list=categories_list,
    num_oov_buckets=1)
category_column = tf.feature_column.indicator_column(category_column_categorical)

months_since_epoch_boundaries = list(range(400,700,20))
months_since_epoch_column = tf.feature_column.numeric_column(
    key="months_since_epoch")
months_since_epoch_bucketized = tf.feature_column.bucketized_column(
    source_column = months_since_epoch_column,
    boundaries = months_since_epoch_boundaries)

crossed_months_since_category_column = tf.feature_column.indicator_column(
    tf.feature_column.crossed_column(
        keys=[category_column_categorical, months_since_epoch_bucketized],
        hash_bucket_size=len(months_since_epoch_boundaries) * (len(categories_list) + 1)))

feature_columns = [embedded_content_column,
                   embedded_author_column,
                   category_column,
                   embedded_title_column,
                   crossed_months_since_category_column] 
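
To make the bucketized and crossed columns more concrete, here is a minimal plain-Python sketch of the arithmetic behind them. It is not how TensorFlow implements these columns; `bucket_index` is a hypothetical helper and `num_categories = 3` is an assumed value used only for illustration (the real count is len(categories_list)). It relies on the documented behaviour that bucketized columns include the left boundary and exclude the right one.

```python
import bisect

# Same boundaries as months_since_epoch_boundaries in the cell above.
boundaries = list(range(400, 700, 20))  # 400, 420, ..., 680

def bucket_index(value, boundaries):
    """Return the index of the left-closed bucket that contains value.

    Bucket 0 is (-inf, boundaries[0]); the last bucket is
    [boundaries[-1], +inf).
    """
    return bisect.bisect_right(boundaries, value)

# N boundaries always produce N + 1 buckets.
num_buckets = len(boundaries) + 1

# Assumed category count, for illustration only.
num_categories = 3
# Same formula as the hash_bucket_size of the crossed column above.
crossed_hash_bucket_size = len(boundaries) * (num_categories + 1)

print(num_buckets)                    # 16
print(bucket_index(523, boundaries))  # 7, i.e. the bucket [520, 540)
print(bucket_index(574, boundaries))  # 9, i.e. the bucket [560, 580)
print(crossed_hash_bucket_size)       # 60
```

Note that the crossed column hashes each (category, bucket) pair into hash_bucket_size slots, so this size controls how often distinct pairs collide rather than enumerating the pairs exactly.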

### Create the input function.

Next we'll create the input function for our model. This input function reads the data from the CSV files we created in the previous lab. 

In [74]:
record_defaults = [["Unknown"], ["Unknown"],["Unknown"],["Unknown"],["Unknown"],[mean_months_since_epoch],["Unknown"]]
column_keys = ["visitor_id", "content_id", "category", "title", "author", "months_since_epoch", "next_content_id"]
label_key = "next_content_id"
def read_dataset(filename, mode, batch_size = 512):
  def _input_fn():
      #tf.tables_initializer()
      def decode_csv(value_column):
          columns = tf.decode_csv(value_column,record_defaults=record_defaults)
          features = dict(zip(column_keys, columns))          
          label = features.pop(label_key)         
          return features, label

      # Create list of files that match pattern
      file_list = tf.gfile.Glob(filename)

      # Create dataset from file list
      dataset = tf.data.TextLineDataset(file_list).map(decode_csv)

      if mode == tf.estimator.ModeKeys.TRAIN:
          num_epochs = None # indefinitely
          dataset = dataset.shuffle(buffer_size = 10 * batch_size)
      else:
          num_epochs = 1 # end-of-input after this

      dataset = dataset.repeat(num_epochs).batch(batch_size)
      return dataset.make_one_shot_iterator().get_next()
  return _input_fn
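
As a rough illustration of what decode_csv does to a single line, the sketch below mimics it with the standard csv module instead of TensorFlow. The helper name `decode_line` is hypothetical, and the behaviour shown (an empty field falls back to its default, and the default's type decides the parsed type) is a simplified model of tf.decode_csv, not its implementation. The sample line is one of the rows of training_set.csv shown later in this notebook.

```python
import csv

column_keys = ["visitor_id", "content_id", "category", "title",
               "author", "months_since_epoch", "next_content_id"]
# One default per column; 523 is mean_months_since_epoch from above.
record_defaults = ["Unknown", "Unknown", "Unknown", "Unknown",
                   "Unknown", 523, "Unknown"]
label_key = "next_content_id"

def decode_line(line):
    """Split one CSV line into a feature dict and a label."""
    fields = next(csv.reader([line]))
    columns = []
    for raw, default in zip(fields, record_defaults):
        if raw == "":
            # Missing value: use the column default.
            columns.append(default)
        else:
            # Parse using the type of the default (str or int).
            columns.append(type(default)(raw))
    features = dict(zip(column_keys, columns))
    # The label is removed from the features, as in decode_csv.
    label = features.pop(label_key)
    return features, label

# This row has an empty author field.
line = ("1004555043399129313,299836255,News,"
        "Blümel Kneissl &Co.: Das sind die Fixstarter,,574,299836841")
features, label = decode_line(line)
print(features["author"])              # Unknown
print(features["months_since_epoch"])  # 574
print(label)                           # 299836841
```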

### Create the model and train/evaluate


Next, we'll build our model which recommends an article for a visitor to the Kurier.at website. Look through the code below. We use tf.feature_column.input_layer to create the dense input layer to our network. The network is a simple feed-forward model: it adds one dense layer for each entry in the hidden_units parameter, so both the number of layers and the number of units per layer can be adjusted.

Currently, we compute the accuracy of our predicted 'next article' against the actual 'next article' read by the visitor. We'll also add top 10 accuracy as a second performance metric to assess our model. To accomplish this, we compute the top 10 accuracy metric, add it to the metrics dictionary below, and add it to tf.summary so that this value is reported to TensorBoard as well.

In [75]:
def model_fn(features, labels, mode, params):
  net = tf.feature_column.input_layer(features, params['feature_columns'])
  for units in params['hidden_units']:
        net = tf.layers.dense(net, units=units, activation=tf.nn.relu)
  # Compute logits (1 per class).
  logits = tf.layers.dense(net, params['n_classes'], activation=None) 

  predicted_classes = tf.argmax(logits, 1)
  from tensorflow.python.lib.io import file_io
    
  with file_io.FileIO('content_ids.txt', mode='r') as ifp:
    content = tf.constant([x.rstrip() for x in ifp])
  predicted_class_names = tf.gather(content, predicted_classes)
  if mode == tf.estimator.ModeKeys.PREDICT:
    predictions = {
        'class_ids': predicted_classes[:, tf.newaxis],
        'class_names' : predicted_class_names[:, tf.newaxis],
        'probabilities': tf.nn.softmax(logits),
        'logits': logits,
    }
    return tf.estimator.EstimatorSpec(mode, predictions=predictions)
  table = tf.contrib.lookup.index_table_from_file(vocabulary_file="content_ids.txt")
  labels = table.lookup(labels)
  # Compute loss.
  loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)

  # Compute evaluation metrics.
  accuracy = tf.metrics.accuracy(labels=labels,
                                 predictions=predicted_classes,
                                 name='acc_op')
  top_10_accuracy = tf.metrics.mean(tf.nn.in_top_k(predictions=logits, targets=labels, k=10))
  
  metrics = {
    'accuracy': accuracy,
    'top_10_accuracy' : top_10_accuracy}
  
  tf.summary.scalar('accuracy', accuracy[1])
  tf.summary.scalar('top_10_accuracy', top_10_accuracy[1])

  if mode == tf.estimator.ModeKeys.EVAL:
      return tf.estimator.EstimatorSpec(
          mode, loss=loss, eval_metric_ops=metrics)

  # Create training op.
  assert mode == tf.estimator.ModeKeys.TRAIN

  optimizer = tf.train.AdagradOptimizer(learning_rate=0.1)
  train_op = optimizer.minimize(loss, global_step=tf.train.get_global_step())
  return tf.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op)
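
The top 10 accuracy computed in model_fn can be illustrated without TensorFlow. The sketch below is a plain-Python approximation of the idea behind tf.nn.in_top_k followed by a mean: an example counts as a hit when fewer than k classes score strictly higher than the true class. The function names and the logits are made up for illustration.

```python
def in_top_k(scores, target, k):
    """True if fewer than k classes score strictly higher than the target."""
    higher = sum(1 for s in scores if s > scores[target])
    return higher < k

def top_k_accuracy(batch_scores, targets, k):
    """Fraction of examples whose true class is among the k best scores."""
    hits = [in_top_k(s, t, k) for s, t in zip(batch_scores, targets)]
    return sum(hits) / len(hits)

# Made-up logits for three examples over four classes.
batch_scores = [
    [2.0, 0.5, 1.0, -1.0],  # true class 0 ranks 1st
    [0.1, 0.2, 0.9, 0.4],   # true class 1 ranks 3rd
    [1.5, 1.2, 0.3, 2.5],   # true class 2 ranks 4th
]
targets = [0, 1, 2]

top_1 = top_k_accuracy(batch_scores, targets, k=1)
top_3 = top_k_accuracy(batch_scores, targets, k=3)
print(top_1)  # one hit out of three
print(top_3)  # two hits out of three
```

In model_fn the same quantity is accumulated over batches: each tf.metrics function returns a (value, update_op) pair, which is why the summaries above index the metric with [1].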

### Train and Evaluate

In [None]:
outdir = 'content_based_model_trained'
shutil.rmtree(outdir, ignore_errors = True)
estimator = tf.estimator.Estimator(
    model_fn=model_fn,
    model_dir = outdir,
    params={
     'feature_columns': feature_columns,
      'hidden_units': [200, 100, 50],
      'n_classes': len(content_ids_list)
    })

train_spec = tf.estimator.TrainSpec(
    input_fn = read_dataset("training_set.csv", tf.estimator.ModeKeys.TRAIN),
    max_steps = 2000)

eval_spec = tf.estimator.EvalSpec(
    input_fn = read_dataset("test_set.csv", tf.estimator.ModeKeys.EVAL),
    steps = None,
    start_delay_secs = 30,
    throttle_secs = 60)

tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_service': None, '_task_id': 0, '_log_step_count_steps': 100, '_tf_random_seed': None, '_num_worker_replicas': 1, '_keep_checkpoint_max': 5, '_train_distribute': None, '_master': '', '_num_ps_replicas': 0, '_save_summary_steps': 100, '_task_type': 'worker', '_save_checkpoints_secs': 600, '_keep_checkpoint_every_n_hours': 10000, '_session_config': None, '_is_chief': True, '_evaluation_master': '', '_global_id_in_cluster': 0, '_save_checkpoints_steps': None, '_model_dir': 'content_based_model_trained', '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f31f0291748>}
INFO:tensorflow:Running training and evaluation locally (non-distributed).
INFO:tensorflow:Start train and evaluate loop. The evaluate will happen after 60 secs (eval_spec.throttle_secs) or training is finished.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Initialize variable input_layer/title_hub_module_embedding/module/e

INFO:tensorflow:Restoring parameters from content_based_model_trained/model.ckpt-186
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Finished evaluation at 2018-09-28-18:30:21
INFO:tensorflow:Saving dict for global step 186: accuracy = 0.03093871, global_step = 186, loss = 5.297491, top_10_accuracy = 0.23536076
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Initialize variable input_layer/title_hub_module_embedding/module/embeddings/part_0:0 from checkpoint b'/tmp/tfhub_modules/e40ef097142ae1de637df7021ce148ffe836e262/variables/variables' with embeddings
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from content_based_model_trained/model.ckpt-186
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 187 into content_based_model_trained/model.ckpt.
INFO:tens

INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from content_based_model_trained/model.ckpt-380
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 381 into content_based_model_trained/model.ckpt.
INFO:tensorflow:loss = 5.168992, step = 381
INFO:tensorflow:Saving checkpoints for 404 into content_based_model_trained/model.ckpt.
INFO:tensorflow:Loss for final step: 5.055147.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Initialize variable input_layer/title_hub_module_embedding/module/embeddings/part_0:0 from checkpoint b'/tmp/tfhub_modules/e40ef097142ae1de637df7021ce148ffe836e262/variables/variables' with embeddings
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2018-09-28-18:41:36
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from content_based_mod

INFO:tensorflow:Saving checkpoints for 561 into content_based_model_trained/model.ckpt.
INFO:tensorflow:Loss for final step: 5.0821733.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Initialize variable input_layer/title_hub_module_embedding/module/embeddings/part_0:0 from checkpoint b'/tmp/tfhub_modules/e40ef097142ae1de637df7021ce148ffe836e262/variables/variables' with embeddings
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2018-09-28-18:52:32
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from content_based_model_trained/model.ckpt-561
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Finished evaluation at 2018-09-28-18:53:35
INFO:tensorflow:Saving dict for global step 561: accuracy = 0.038868707, global_step = 561, loss = 5.160089, top_10_accuracy = 0.2549709
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Initialize variable input_layer/title_hub_module_embedding/modul

INFO:tensorflow:Starting evaluation at 2018-09-28-19:04:20
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from content_based_model_trained/model.ckpt-680
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Finished evaluation at 2018-09-28-19:05:29
INFO:tensorflow:Saving dict for global step 680: accuracy = 0.04156412, global_step = 680, loss = 5.1384516, top_10_accuracy = 0.2675495
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Initialize variable input_layer/title_hub_module_embedding/module/embeddings/part_0:0 from checkpoint b'/tmp/tfhub_modules/e40ef097142ae1de637df7021ce148ffe836e262/variables/variables' with embeddings
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from content_based_model_trained/model.ckpt-680
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
I

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Initialize variable input_layer/title_hub_module_embedding/module/embeddings/part_0:0 from checkpoint b'/tmp/tfhub_modules/e40ef097142ae1de637df7021ce148ffe836e262/variables/variables' with embeddings
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from content_based_model_trained/model.ckpt-803
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 804 into content_based_model_trained/model.ckpt.
INFO:tensorflow:loss = 4.923583, step = 804
INFO:tensorflow:Saving checkpoints for 842 into content_based_model_trained/model.ckpt.
INFO:tensorflow:Loss for final step: 5.0419245.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Initialize variable input_layer/title_hub_module_embedding/module/embeddings/part_0:0 from checkpoint b'/tmp/tfhub_modules/e40ef097142ae1de637d

INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 955 into content_based_model_trained/model.ckpt.
INFO:tensorflow:loss = 4.870165, step = 955
INFO:tensorflow:Saving checkpoints for 978 into content_based_model_trained/model.ckpt.
INFO:tensorflow:Loss for final step: 4.842451.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Initialize variable input_layer/title_hub_module_embedding/module/embeddings/part_0:0 from checkpoint b'/tmp/tfhub_modules/e40ef097142ae1de637df7021ce148ffe836e262/variables/variables' with embeddings
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2018-09-28-19:30:00
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from content_based_model_trained/model.ckpt-978
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Finished evaluation at 2018-09-28-19:31:07
INFO:tensorflow:Saving dict for globa

INFO:tensorflow:Initialize variable input_layer/title_hub_module_embedding/module/embeddings/part_0:0 from checkpoint b'/tmp/tfhub_modules/e40ef097142ae1de637df7021ce148ffe836e262/variables/variables' with embeddings
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2018-09-28-19:41:38
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from content_based_model_trained/model.ckpt-1101
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Finished evaluation at 2018-09-28-19:42:46
INFO:tensorflow:Saving dict for global step 1101: accuracy = 0.04328294, global_step = 1101, loss = 5.0836353, top_10_accuracy = 0.29489434
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Initialize variable input_layer/title_hub_module_embedding/module/embeddings/part_0:0 from checkpoint b'/tmp/tfhub_modules/e40ef097142ae1de637df7021ce148ffe836e262/variables/variables' with embeddings
INFO:tensorflow:Done calling 

INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Finished evaluation at 2018-09-28-19:54:43
INFO:tensorflow:Saving dict for global step 1230: accuracy = 0.04453299, global_step = 1230, loss = 5.074096, top_10_accuracy = 0.2948162
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Initialize variable input_layer/title_hub_module_embedding/module/embeddings/part_0:0 from checkpoint b'/tmp/tfhub_modules/e40ef097142ae1de637df7021ce148ffe836e262/variables/variables' with embeddings
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from content_based_model_trained/model.ckpt-1230
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 1231 into content_based_model_trained/model.ckpt.
INFO:tensorflow:loss = 4.8149595, step = 1231
INFO:tensorflow:Saving checkpoints for 1255 i

INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from content_based_model_trained/model.ckpt-1355
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 1356 into content_based_model_trained/model.ckpt.
INFO:tensorflow:loss = 4.7464476, step = 1356
INFO:tensorflow:Saving checkpoints for 1378 into content_based_model_trained/model.ckpt.
INFO:tensorflow:Loss for final step: 4.751824.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Initialize variable input_layer/title_hub_module_embedding/module/embeddings/part_0:0 from checkpoint b'/tmp/tfhub_modules/e40ef097142ae1de637df7021ce148ffe836e262/variables/variables' with embeddings
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2018-09-28-20:07:53
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from content_base

INFO:tensorflow:loss = 4.6713066, step = 1476
INFO:tensorflow:Saving checkpoints for 1501 into content_based_model_trained/model.ckpt.
INFO:tensorflow:Loss for final step: 4.8481097.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Initialize variable input_layer/title_hub_module_embedding/module/embeddings/part_0:0 from checkpoint b'/tmp/tfhub_modules/e40ef097142ae1de637df7021ce148ffe836e262/variables/variables' with embeddings
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2018-09-28-20:20:07
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from content_based_model_trained/model.ckpt-1501
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Finished evaluation at 2018-09-28-20:21:06
INFO:tensorflow:Saving dict for global step 1501: accuracy = 0.049025353, global_step = 1501, loss = 5.0515175, top_10_accuracy = 0.30770734
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Initialize v

INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2018-09-28-20:31:28
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from content_based_model_trained/model.ckpt-1634
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Finished evaluation at 2018-09-28-20:32:28
INFO:tensorflow:Saving dict for global step 1634: accuracy = 0.04660338, global_step = 1634, loss = 5.049248, top_10_accuracy = 0.31020743
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Initialize variable input_layer/title_hub_module_embedding/module/embeddings/part_0:0 from checkpoint b'/tmp/tfhub_modules/e40ef097142ae1de637df7021ce148ffe836e262/variables/variables' with embeddings
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from content_based_model_trained/model.ckpt-1634
INFO:tensorflow:Running local_init_op.
IN

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Initialize variable input_layer/title_hub_module_embedding/module/embeddings/part_0:0 from checkpoint b'/tmp/tfhub_modules/e40ef097142ae1de637df7021ce148ffe836e262/variables/variables' with embeddings
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from content_based_model_trained/model.ckpt-1768
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 1769 into content_based_model_trained/model.ckpt.
INFO:tensorflow:loss = 4.635854, step = 1769
INFO:tensorflow:Saving checkpoints for 1793 into content_based_model_trained/model.ckpt.
INFO:tensorflow:Loss for final step: 4.7237873.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Initialize variable input_layer/title_hub_module_embedding/module/embeddings/part_0:0 from checkpoint b'/tmp/tfhub_modules/e40ef097142ae1de

This takes a while to complete. In the run logged above, the model ends up with about **30% top 10 accuracy** (0.31 at global step 1634).
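
If you want to follow how the evaluation metric evolves, one option is to pull the values out of the "Saving dict for global step" log lines. The sketch below does this with a regular expression on three lines copied from the output above; the variable names are illustrative, and in practice TensorBoard shows the same curve.

```python
import re

# Three evaluation lines copied verbatim from the training log above.
log_lines = [
    "INFO:tensorflow:Saving dict for global step 186: accuracy = 0.03093871, "
    "global_step = 186, loss = 5.297491, top_10_accuracy = 0.23536076",
    "INFO:tensorflow:Saving dict for global step 1101: accuracy = 0.04328294, "
    "global_step = 1101, loss = 5.0836353, top_10_accuracy = 0.29489434",
    "INFO:tensorflow:Saving dict for global step 1634: accuracy = 0.04660338, "
    "global_step = 1634, loss = 5.049248, top_10_accuracy = 0.31020743",
]

pattern = re.compile(
    r"Saving dict for global step (\d+): .*top_10_accuracy = ([0-9.]+)")

# Collect (global_step, top_10_accuracy) pairs in log order.
history = []
for line in log_lines:
    match = pattern.search(line)
    if match:
        history.append((int(match.group(1)), float(match.group(2))))

for step, top_10 in history:
    print("step {:>5}: top_10_accuracy = {:.3f}".format(step, top_10))
```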

### Make predictions with the trained model. 

With the model now trained, we can make predictions by calling the predict method on the estimator. Let's look at how our model predicts on the first five examples of the training set.  
To start, we'll create a new file 'first_5.csv' which contains the first five elements of our training set. We'll also save the content id of the article currently being read in each example (the second column) to a file 'first_5_content_ids' so we can compare it with our recommendations. 

In [12]:
%%bash
head -5 training_set.csv > first_5.csv
head first_5.csv
awk -F "\"*,\"*" '{print $2}' first_5.csv > first_5_content_ids

1004555043399129313,299833840,News,"Ehemalige Obdachlose: ""Für mich war die Gruft eine Familie""",Julia Schrenk,574,299837992
1004555043399129313,299837992,Stars & Kultur,Das erste TV-Interview von Prinz Harry & Meghan Markle ,Christina Michlits,574,299824032
1004555043399129313,299824032,Stars & Kultur,YouTube: Schwere Probleme mit verstörenden Kindervideos,Georg Leyrer,574,299836255
1004555043399129313,299836255,News,Blümel Kneissl &Co.: Das sind die Fixstarter,,574,299836841
1004555043399129313,299836841,News,"ÖVP will Studiengebühren FPÖ in Verhandlungen ""flexibel""",Raffaela Lindorfer,574,299899819


Recall that to make predictions with the trained model we pass a list of examples through the input function. Complete the code below to make predictions on the examples contained in the "first_5.csv" file we created above. 

In [13]:
output = list(estimator.predict(input_fn=read_dataset("first_5.csv", tf.estimator.ModeKeys.PREDICT)))

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Initialize variable input_layer/title_hub_module_embedding/module/embeddings/part_0:0 from checkpoint b'/tmp/tfhub_modules/e40ef097142ae1de637df7021ce148ffe836e262/variables/variables' with embeddings
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from content_based_model_trained/model.ckpt-500
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.


In [14]:
import numpy as np
recommended_content_ids = [np.asscalar(d["class_names"]).decode('UTF-8') for d in output]
content_ids = open("first_5_content_ids").read().splitlines()

Finally, we map the content id back to the article title. Let's compare our model's recommendation for the first example. This can be done in BigQuery. Look through the query below and make sure it is clear what is being returned.

In [17]:
import google.datalab.bigquery as bq
recommended_title_sql="""
#standardSQL
SELECT
(SELECT MAX(IF(index=6, value, NULL)) FROM UNNEST(hits.customDimensions)) AS title
FROM `cloud-training-demos.GA360_test.ga_sessions_sample`,   
  UNNEST(hits) AS hits
WHERE 
  # only include hits on pages
  hits.type = "PAGE"
  AND (SELECT MAX(IF(index=10, value, NULL)) FROM UNNEST(hits.customDimensions)) = \"{}\"
LIMIT 1""".format(recommended_content_ids[0])
current_title_sql="""
#standardSQL
SELECT
(SELECT MAX(IF(index=6, value, NULL)) FROM UNNEST(hits.customDimensions)) AS title
FROM `cloud-training-demos.GA360_test.ga_sessions_sample`,   
  UNNEST(hits) AS hits
WHERE 
  # only include hits on pages
  hits.type = "PAGE"
  AND (SELECT MAX(IF(index=10, value, NULL)) FROM UNNEST(hits.customDimensions)) = \"{}\"
LIMIT 1""".format(content_ids[0])
recommended_title = bq.Query(recommended_title_sql).execute().result().to_dataframe()['title'].tolist()[0]
current_title = bq.Query(current_title_sql).execute().result().to_dataframe()['title'].tolist()[0]
print("Current title: {} ".format(current_title))
print("Recommended title: {}".format(recommended_title))

Current title: Ehemalige Obdachlose: "Für mich war die Gruft eine Familie" 
Recommended title: Blümel, Kneissl &Co.: Das sind die Fixstarter
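
The two queries above differ only in the content id they filter on, so the SQL string could be built once from a template. The sketch below shows one way to do that; `TITLE_SQL_TEMPLATE` and `build_title_sql` are hypothetical names, and the function only builds the query text (it does not call BigQuery).

```python
# Query text taken from the cell above, with a named placeholder
# for the content id (custom dimension index 10).
TITLE_SQL_TEMPLATE = """
#standardSQL
SELECT
  (SELECT MAX(IF(index=6, value, NULL)) FROM UNNEST(hits.customDimensions)) AS title
FROM `cloud-training-demos.GA360_test.ga_sessions_sample`,
  UNNEST(hits) AS hits
WHERE
  # only include hits on pages
  hits.type = "PAGE"
  AND (SELECT MAX(IF(index=10, value, NULL)) FROM UNNEST(hits.customDimensions)) = "{content_id}"
LIMIT 1"""

def build_title_sql(content_id):
    """Return the title lookup query for a single content id."""
    return TITLE_SQL_TEMPLATE.format(content_id=content_id)

# Content id of the first training example shown earlier.
current_title_sql = build_title_sql("299833840")
print(current_title_sql)
```

With such a helper, the recommended and current titles would be fetched by passing recommended_content_ids[0] and content_ids[0] respectively and running each string through bq.Query as in the cell above.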


### TensorBoard

As usual, we can monitor the performance of our training job using TensorBoard. 

In [81]:
from google.datalab.ml import TensorBoard
TensorBoard().start('./content_based_model_trained')

In [None]:
for pid in TensorBoard.list()['pid']:
  TensorBoard().stop(pid)
  print("Stopped TensorBoard with pid {}".format(pid))