## Content-Based Filtering Using Neural Networks

This notebook relies on files created in the [content_based_preproc.ipynb](./content_based_preproc.ipynb) notebook. Be sure to run the code there before working through this notebook.
From here on we use the **python3** kernel, so if your kernel is still Python 2, remember to change it.

This lab covers how to:
1. build the feature columns for a model using `tf.feature_column`
2. create a custom evaluation metric and add it to TensorBoard
3. train a model and make predictions with the saved model

TensorFlow Hub should already be installed. You can check with `pip freeze`.

In [1]:
%%bash
pip freeze | grep tensor

tensorboard==2.7.0
tensorboard-data-server==0.6.1
tensorboard-plugin-wit==1.8.0
tensorflow @ file:///opt/conda/conda-bld/dlenv-tf-1-15-cpu_1637198803162/work/tensorflow-1.15.5-cp37-cp37m-linux_x86_64.whl
tensorflow-cloud==0.1.13
tensorflow-data-validation==0.23.1
tensorflow-datasets==1.2.0
tensorflow-estimator==1.15.1
tensorflow-hub==0.6.0
tensorflow-io==0.8.1
tensorflow-metadata==0.23.0
tensorflow-model-analysis==0.23.0
tensorflow-probability==0.8.0
tensorflow-serving-api==1.15.0
tensorflow-transform==0.23.0
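
The same check can be done programmatically. The following is a minimal sketch, not part of the original lab: `parse_freeze` is a hypothetical helper that turns `pip freeze`-style text into a dictionary, and the sample text (including the local wheel path) is made up for illustration.

```python
def parse_freeze(text):
    """Map package name -> version string from `pip freeze`-style lines."""
    versions = {}
    for line in text.splitlines():
        line = line.strip()
        if "==" in line:
            name, version = line.split("==", 1)
            versions[name.strip().lower()] = version.strip()
        elif " @ " in line:
            # Direct-reference installs ("name @ url") carry no plain version.
            name = line.split(" @ ", 1)[0]
            versions[name.strip().lower()] = None
    return versions


# Hypothetical sample in the same shape as the output above.
sample = "\n".join([
    "tensorflow @ file:///some/local/path/tensorflow.whl",
    "tensorflow-estimator==1.15.1",
    "tensorflow-hub==0.6.0",
])

versions = parse_freeze(sample)
needs_hub_upgrade = versions.get("tensorflow-hub") != "0.7.0"
print(versions, needs_hub_upgrade)
```

With the sample above, `needs_hub_upgrade` is true because 0.6.0 is installed while the lab pins 0.7.0.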


Let's make sure the required version of tensorflow-hub is installed. After running the pip install below, click **Restart the kernel** in the notebook so that the Python environment picks up the new packages.

In [2]:
!pip3 install tensorflow-hub==0.7.0
!pip3 install --upgrade tensorflow==1.15.3
!pip3 install google-cloud-bigquery==1.10

Collecting tensorflow-hub==0.7.0
  Downloading tensorflow_hub-0.7.0-py2.py3-none-any.whl (89 kB)
     |████████████████████████████████| 89 kB 5.4 MB/s             
Installing collected packages: tensorflow-hub
  Attempting uninstall: tensorflow-hub
    Found existing installation: tensorflow-hub 0.6.0
    Uninstalling tensorflow-hub-0.6.0:
      Successfully uninstalled tensorflow-hub-0.6.0
Successfully installed tensorflow-hub-0.7.0
Collecting tensorflow==1.15.3
  Downloading tensorflow-1.15.3-cp37-cp37m-manylinux2010_x86_64.whl (110.5 MB)
     |████████████████████████████████| 110.5 MB 24 kB/s              
Collecting tensorboard<1.16.0,>=1.15.0
  Downloading tensorboard-1.15.0-py3-none-any.whl (3.8 MB)
     |████████████████████████████████| 3.8 MB 37.3 MB/s            
Installing collected packages: tensorboard, tensorflow
  Attempting uninstall: tensorboard
    Found existing installation: tensorboard 2.7.0
    Uninstalling tensorboard-2.7.0:
      Successfully uninstalled tenso

#### **Note**: Ignore any incompatibility warnings and errors, then re-run the cell to view the installed TensorFlow version.

In [1]:
import os
import tensorflow as tf
import numpy as np
import tensorflow_hub as hub
import shutil

PROJECT = 'qwiklabs-gcp-00-c6695d766645' # Replace with your project ID
BUCKET = 'qwiklabs-gcp-00-c6695d766645' # Replace with your bucket name
REGION = 'us-central1' # Replace with your bucket region, e.g. us-central1

# Do not change these
os.environ['PROJECT'] = PROJECT
os.environ['BUCKET'] = BUCKET
os.environ['REGION'] = REGION
os.environ['TFVERSION'] = '1.15.3'

In [2]:
%%bash
gcloud config set project $PROJECT
gcloud config set compute/region $REGION

Updated property [core/project].
Updated property [compute/region].


### Create the feature columns for the model

**Creating the dataset.** First, load the lists of categories, authors, and article IDs created in the previous notebook.

In [4]:
categories_list = open("categories.txt").read().splitlines()
authors_list = open("authors.txt").read().splitlines()
content_ids_list = open("content_ids.txt").read().splitlines()
mean_months_since_epoch = 523

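In the cell below we define the feature columns used in the model. If necessary, remind yourself of the [various feature columns](https://www.tensorflow.org/api_docs/python/tf/feature_column) that are available.
For the `embedded_title_column` feature column, we use a TensorFlow Hub module to create an embedding of the article title. Since the articles and titles are in German, a German-language embedding module is the natural choice.
The available TensorFlow Hub text embedding modules are listed [here](https://alpha.tfhub.dev/). Filter by setting the language to 'German'. A 50-dimensional embedding is sufficient for our purposes.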

In [5]:
embedded_title_column = hub.text_embedding_column(
    key="title", 
    module_spec="https://tfhub.dev/google/nnlm-de-dim50/1",
    trainable=False)

content_id_column = tf.feature_column.categorical_column_with_hash_bucket(
    key="content_id",
    hash_bucket_size= len(content_ids_list) + 1)
embedded_content_column = tf.feature_column.embedding_column(
    categorical_column=content_id_column,
    dimension=10)

author_column = tf.feature_column.categorical_column_with_hash_bucket(key="author",
    hash_bucket_size=len(authors_list) + 1)
embedded_author_column = tf.feature_column.embedding_column(
    categorical_column=author_column,
    dimension=3)

category_column_categorical = tf.feature_column.categorical_column_with_vocabulary_list(
    key="category",
    vocabulary_list=categories_list,
    num_oov_buckets=1)
category_column = tf.feature_column.indicator_column(category_column_categorical)

months_since_epoch_boundaries = list(range(400,700,20))
months_since_epoch_column = tf.feature_column.numeric_column(
    key="months_since_epoch")
months_since_epoch_bucketized = tf.feature_column.bucketized_column(
    source_column = months_since_epoch_column,
    boundaries = months_since_epoch_boundaries)

crossed_months_since_category_column = tf.feature_column.indicator_column(tf.feature_column.crossed_column(
  keys = [category_column_categorical, months_since_epoch_bucketized], 
  hash_bucket_size = len(months_since_epoch_boundaries) * (len(categories_list) + 1)))

feature_columns = [embedded_content_column,
                   embedded_author_column,
                   category_column,
                   embedded_title_column,
                   crossed_months_since_category_column] 
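
To make the bucketized and crossed columns above more concrete, here is a plain-Python sketch of the arithmetic involved. It is an illustration only, not how TensorFlow implements these columns: `bucket_index` and `cross_bucket_count` are hypothetical helper names, and the category count of 3 is an assumption (the real value comes from `categories.txt`).

```python
import bisect

# Same boundaries as months_since_epoch_boundaries in the cell above.
boundaries = list(range(400, 700, 20))


def bucket_index(value, boundaries):
    """Bucket id for `value`; a value equal to a boundary falls in the upper bucket."""
    return bisect.bisect_right(boundaries, value)


def cross_bucket_count(num_categories, boundaries):
    """Hash bucket size used for the crossed column in the cell above."""
    return len(boundaries) * (num_categories + 1)


num_buckets = len(boundaries) + 1          # 15 boundaries give 16 buckets
mean_bucket = bucket_index(523, boundaries)  # bucket covering [520, 540)
cross_size = cross_bucket_count(3, boundaries)  # assuming 3 categories

print(num_buckets, mean_bucket, cross_size)
```

Note that 15 boundaries produce 16 buckets, so the number of distinct (category, bucket) pairs is slightly larger than the hash bucket size chosen above. Because the cross is hashed anyway, this only means a few combinations may share a bucket.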

### Create the input function

Next we create the input function for the model. This input function reads the data from the csv files we created in the previous lab.

In [6]:
record_defaults = [["Unknown"], ["Unknown"],["Unknown"],["Unknown"],["Unknown"],[mean_months_since_epoch],["Unknown"]]
column_keys = ["visitor_id", "content_id", "category", "title", "author", "months_since_epoch", "next_content_id"]
label_key = "next_content_id"
def read_dataset(filename, mode, batch_size = 512):
  def _input_fn():
      def decode_csv(value_column):
          columns = tf.decode_csv(value_column,record_defaults=record_defaults)
          features = dict(zip(column_keys, columns))          
          label = features.pop(label_key)         
          return features, label

      # Build the list of files that match the pattern
      file_list = tf.io.gfile.glob(filename)

      # Create a dataset from the file list
      dataset = tf.data.TextLineDataset(file_list).map(decode_csv)

      if mode == tf.estimator.ModeKeys.TRAIN:
          num_epochs = None # repeat indefinitely
          dataset = dataset.shuffle(buffer_size = 10 * batch_size)
      else:
          num_epochs = 1 # stop after a single pass over the input

      dataset = dataset.repeat(num_epochs).batch(batch_size)
      return dataset.make_one_shot_iterator().get_next()
  return _input_fn
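
The role of `record_defaults` in `decode_csv` can be sketched without TensorFlow. The code below is a simplified stand-in that uses the standard `csv` module: empty fields are replaced by the default for that column, and each value is cast to the type of its default. `decode_line` is a hypothetical name, and the second input row is invented to show missing values.

```python
import csv

COLUMN_KEYS = ["visitor_id", "content_id", "category", "title",
               "author", "months_since_epoch", "next_content_id"]
# Mirrors record_defaults above; 523 is mean_months_since_epoch.
DEFAULTS = ["Unknown", "Unknown", "Unknown", "Unknown", "Unknown", 523, "Unknown"]
LABEL_KEY = "next_content_id"


def decode_line(line):
    """Split one csv line into a (features, label) pair, filling in defaults."""
    fields = next(csv.reader([line]))
    values = []
    for raw, default in zip(fields, DEFAULTS):
        if raw == "":
            values.append(default)             # missing field: use the default
        else:
            values.append(type(default)(raw))  # cast to the default's type
    features = dict(zip(COLUMN_KEYS, values))
    label = features.pop(LABEL_KEY)
    return features, label


# First row of the training sample shown later in this notebook.
features, label = decode_line(
    "1000593816586876859,230814320,Stars & Kultur,"
    "Kritik an Meghan Markle immer lauter,Elisabeth Spitzer,562,299837992")

# Invented row with a missing author and a missing month value.
sparse_features, sparse_label = decode_line("v1,c1,News,Some title,,,c2")
print(features, label)
print(sparse_features, sparse_label)
```

As in the input function above, the label column is popped out of the feature dictionary so that the model never sees it as an input.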

### Create the model and train/evaluate


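Next, we build a model that recommends articles to visitors of the Kurier.at website. Review the code below. It uses `input_layer` with the feature columns to create the dense input layer of the network. The network itself is a simple stack of fully connected layers whose sizes are passed in through the `hidden_units` parameter, so they can be tuned.

Currently, we compute the accuracy between the predicted 'next article' and the actual 'next article' the visitor read. To evaluate the model we also add a top-10 accuracy metric. To do this, we compute the top-10 accuracy, add it to the metrics dictionary below, and add it to `tf.summary` so that the value is reported to TensorBoard as well.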

In [7]:
def model_fn(features, labels, mode, params):
  net = tf.feature_column.input_layer(features, params['feature_columns'])
  for units in params['hidden_units']:
    net = tf.layers.dense(net, units=units, activation=tf.nn.relu)
  # Compute logits (1 per class).
  logits = tf.layers.dense(net, params['n_classes'], activation=None) 

  predicted_classes = tf.argmax(logits, 1)
  from tensorflow.python.lib.io import file_io
    
  with file_io.FileIO('content_ids.txt', mode='r') as ifp:
    content = tf.constant([x.rstrip() for x in ifp])
  predicted_class_names = tf.gather(content, predicted_classes)
  if mode == tf.estimator.ModeKeys.PREDICT:
    predictions = {
        'class_ids': predicted_classes[:, tf.newaxis],
        'class_names' : predicted_class_names[:, tf.newaxis],
        'probabilities': tf.nn.softmax(logits),
        'logits': logits,
    }
    return tf.estimator.EstimatorSpec(mode, predictions=predictions)
  table = tf.contrib.lookup.index_table_from_file(vocabulary_file="content_ids.txt")
  labels = table.lookup(labels)
  # Compute the loss.
  loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)

  # Compute the evaluation metrics.
  accuracy = tf.metrics.accuracy(labels=labels,
                                 predictions=predicted_classes,
                                 name='acc_op')
  top_10_accuracy = tf.metrics.mean(tf.nn.in_top_k(predictions=logits, 
                                                   targets=labels, 
                                                   k=10))
  
  metrics = {
    'accuracy': accuracy,
    'top_10_accuracy' : top_10_accuracy}
  
  tf.summary.scalar('accuracy', accuracy[1])
  tf.summary.scalar('top_10_accuracy', top_10_accuracy[1])

  if mode == tf.estimator.ModeKeys.EVAL:
      return tf.estimator.EstimatorSpec(
          mode, loss=loss, eval_metric_ops=metrics)

  # Create the training op.
  assert mode == tf.estimator.ModeKeys.TRAIN

  optimizer = tf.train.AdagradOptimizer(learning_rate=0.1)
  train_op = optimizer.minimize(loss, global_step=tf.train.get_global_step())
  return tf.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op)
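
The top-10 accuracy metric above counts a prediction as correct when the true label is among the 10 highest-scoring classes. The following plain-Python sketch shows the same idea for a small `k`. The function names and the toy logits are illustrative assumptions; ties are treated as inside the top k, which is how `tf.nn.in_top_k` is documented to behave.

```python
def in_top_k(logits, target, k):
    """True if the logit of `target` is among the k largest (ties count as inside)."""
    target_score = logits[target]
    num_strictly_higher = sum(1 for score in logits if score > target_score)
    return num_strictly_higher < k


def top_k_accuracy(batch_logits, targets, k):
    """Fraction of examples whose target class is in the top k."""
    hits = [in_top_k(logits, t, k) for logits, t in zip(batch_logits, targets)]
    return sum(hits) / len(hits)


# Toy batch: 3 examples, 4 classes.
batch_logits = [
    [0.1, 2.0, 0.3, 1.5],
    [1.2, 0.4, 0.9, 0.1],
    [0.5, 0.6, 0.7, 0.8],
]
targets = [1, 2, 0]

top_1 = top_k_accuracy(batch_logits, targets, k=1)
top_2 = top_k_accuracy(batch_logits, targets, k=2)
print(top_1, top_2)
```

Top-k accuracy can never be lower than plain accuracy, which is why it is a more forgiving metric for a recommender that shows several candidates at once.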

### Train and evaluate

In [8]:
outdir = 'content_based_model_trained'
shutil.rmtree(outdir, ignore_errors = True) # start fresh each time
#tf.summary.FileWriterCache.clear() # ensure the file writer cache is clear for TensorBoard event files
estimator = tf.estimator.Estimator(
    model_fn=model_fn,
    model_dir = outdir,
    params={
     'feature_columns': feature_columns,
      'hidden_units': [200, 100, 50],
      'n_classes': len(content_ids_list)
    })

train_spec = tf.estimator.TrainSpec(
    input_fn = read_dataset("training_set.csv", tf.estimator.ModeKeys.TRAIN),
    max_steps = 2000)

eval_spec = tf.estimator.EvalSpec(
    input_fn = read_dataset("test_set.csv", tf.estimator.ModeKeys.EVAL),
    steps = None,
    start_delay_secs = 30,
    throttle_secs = 60)

tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_model_dir': 'content_based_model_trained', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7fe181a5ce50>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
INFO:tensorflow:Not using Distribute Coordinator.
INFO:tensorflow:Running training and evaluation locally (non-distributed).
INFO:tensorflow:Start train and evaluate loop. The evaluate will happen after every checkpoint. Checkpoint frequency is determined based on RunConfig arguments: save_checkpoints_steps None or save_checkpoints_secs 600.


Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.

Instructions for updating:
Use `for ... in dataset:` to iterate over a dataset. If using `tf.estimator`, return the `Dataset` object directly from your input function. As a last resort, you can use `tf.compat.v1.data.make_one_shot_iterator(dataset)`.

INFO:tensorflow:Calling model_fn.

Instructions for updating:
The old _FeatureColumn APIs are being deprecated. Please use the new FeatureColumn APIs instead.

Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where

2021-11-21 07:49:41.218227: W tensorflow/core/graph/graph_constructor.cc:1491] Importing a graph with a lower producer version 26 into an existing graph with producer version 134. Shape inference will have run different parts of the graph with different producer versions.

INFO:tensorflow:Saver not created because there are no variables in the graph to restore

Instructions for updating:
Use keras.layers.Dense instead.

Instructions for updating:
Please use `layer.__call__` method instead.

The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
  * https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.

Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor

INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
2021-11-21 07:49:41.849424: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2021-11-21 07:49:41.856490: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2200155000 Hz
2021-11-21 07:49:41.856824: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x564f02912020 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-11-21 07:49:41.856852: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 0 into content_based_model_trained/model.ckpt.
INFO:tensorflow:loss = 9.656897, step = 1
INFO:tensorflow:global_step/sec: 8.24184
INFO:tensorflow:loss = 6.1363354, step = 101 (12.135 sec)
INFO:tensorflow:global_step/sec: 8.27661
INFO:tensorflow:loss = 4.9411354, step = 201 (12.082 sec)
INFO:tensorflow:global_step/sec: 8.2732
INFO:tensorflow:loss = 4.850307, step = 301 (12.087 sec)
INFO:tensorflow:global_step/sec: 8.28069
INFO:tensorflow:loss = 4.5423856, step = 401 (12.080 sec)
INFO:tensorflow:global_step/sec: 8.55097
INFO:tensorflow:loss = 5.443879, step = 501 (11.691 sec)
INFO:tensorflow:global_step/sec: 8.47973
INFO:tensorflow:loss = 5.652931, step = 601 (11.793 sec)
INFO:tensorflow:global_step/sec: 8.31516
INFO:tensorflow:loss = 4.7115107, step = 701 (12.026 sec)
INFO:tensorflow:global_step/sec: 8.54508
INFO:tensorflow:loss = 4.6385307, step = 801 (11.703 sec)
INFO:tensorflow:global_step/sec: 8.40367
INFO:tensorflow:loss = 4.08313, step = 901 (11.899 sec)
INFO:tensorflow:global_step/sec: 7.9398
INFO:tensorflow:loss = 5.4661603, step = 1001 (12.599 sec)
INFO:tensorflow:global_step/sec: 8.32209
INFO:tensorflow:loss = 4.4864902, step = 1101 (12.015 sec)
INFO:tensorflow:global_step/sec: 8.37732
INFO:tensorflow:loss = 4.9270735, step = 1201 (11.934 sec)
INFO:tensorflow:global_step/sec: 8.30727
INFO:tensorflow:loss = 4.439883, step = 1301 (12.038 sec)
INFO:tensorflow:global_step/sec: 8.32406
INFO:tensorflow:loss = 5.42249, step = 1401 (12.013 sec)
INFO:tensorflow:global_step/sec: 8.47757
INFO:tensorflow:loss = 5.2785273, step = 1501 (11.796 sec)
INFO:tensorflow:global_step/sec: 8.52596
INFO:tensorflow:loss = 4.7937737, step = 1601 (11.729 sec)
INFO:tensorflow:global_step/sec: 8.32757
INFO:tensorflow:loss = 4.5518284, step = 1701 (12.008 sec)
INFO:tensorflow:global_step/sec: 8.29552
INFO:tensorflow:loss = 4.217218, step = 1801 (12.055 sec)
INFO:tensorflow:global_step/sec: 8.28686
INFO:tensorflow:loss = 5.479928, step = 1901 (12.067 sec)
INFO:tensorflow:Saving checkpoints for 2000 into content_based_model_trained/model.ckpt.


INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Saver not created because there are no variables in the graph to restore
2021-11-21 07:53:44.725098: W tensorflow/core/graph/graph_constructor.cc:1491] Importing a graph with a lower producer version 26 into an existing graph with producer version 134. Shape inference will have run different parts of the graph with different producer versions.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2021-11-21T07:53:44Z
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from content_based_model_trained/model.ckpt-2000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Finished evaluation at 2021-11-21-07:53:50
INFO:tensorflow:Saving dict for global step 2000: accuracy = 0.035782646, global_step = 2000, loss = 5.1037364, top_10_accuracy = 0.26395562
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 2000: content_based_model_trained/model.ckpt-2000
INFO:tensorflow:Loss for final step: 4.6135645.

({'accuracy': 0.035782646,
  'loss': 5.1037364,
  'top_10_accuracy': 0.26395562,
  'global_step': 2000},
 [])

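This takes a while to complete. In the end you should get a **top-10 accuracy of roughly 30%**; the run shown above reached about 26% (`top_10_accuracy = 0.26395562`).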
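
To help interpret the loss values in the training log: the model minimizes a sparse softmax cross-entropy, which for one example is the negative log-probability assigned to the true class. The sketch below computes it in plain Python for a single example. The function name is chosen to echo the TensorFlow op, but this is an illustrative re-implementation, and the class count of 1000 is a made-up value, not the number of articles in this dataset.

```python
import math


def sparse_softmax_cross_entropy(logits, label):
    """Negative log-probability of `label` under softmax(logits)."""
    max_logit = max(logits)  # subtract the max for numerical stability
    log_sum_exp = max_logit + math.log(
        sum(math.exp(x - max_logit) for x in logits))
    return log_sum_exp - logits[label]


num_classes = 1000  # hypothetical class count

# All logits equal: every class gets probability 1/num_classes.
uniform_loss = sparse_softmax_cross_entropy([0.0] * num_classes, label=0)

# One class clearly preferred, and it is the correct one.
confident_loss = sparse_softmax_cross_entropy(
    [10.0] + [0.0] * (num_classes - 1), label=0)

print(uniform_loss, math.log(num_classes), confident_loss)
```

A model that spreads its probability evenly over N classes has a loss of ln(N), so a loss near that value at step 1 suggests nearly uniform predictions, and the drop over the following steps reflects the model concentrating probability on likelier articles.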

### Make predictions with the trained model

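Now that the model is trained, we can make predictions by calling the predict method on the estimator. Let's look at how the model predicts on the first five examples of the training set.
To start, we create a new file 'first_5.csv' containing the first five rows of the training set. We also save the content IDs of those rows (the articles currently being read) to a file 'first_5_content_ids' so that we can compare them with the recommendations.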

In [8]:
%%bash
head -5 training_set.csv > first_5.csv
head first_5.csv
awk -F "\"*,\"*" '{print $2}' first_5.csv > first_5_content_ids

1000593816586876859,230814320,Stars & Kultur,Kritik an Meghan Markle immer lauter,Elisabeth Spitzer,562,299837992
1001769331926555188,299836255,News,Blümel Kneissl &Co.: Das sind die Fixstarter,,574,299826767
1001769331926555188,299826767,Lifestyle,Titanic-Regisseur: Darum musste Jack sterben,Elisabeth Mittendorfer,574,299921761
1001769331926555188,299912085,News,Erster ÖBB-Containerzug nach China unterwegs,Stefan Hofer,574,299836841
1001769331926555188,299836841,News,"ÖVP will Studiengebühren FPÖ in Verhandlungen ""flexibel""",Raffaela Lindorfer,574,299915880


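Recall that to make predictions with the trained model we pass a list of examples through the input function. The code below predicts on the examples contained in the 'first_5.csv' file created above.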

In [9]:
output = list(estimator.predict(input_fn=read_dataset("first_5.csv", tf.estimator.ModeKeys.PREDICT)))

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Saver not created because there are no variables in the graph to restore
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from content_based_model_trained/model.ckpt-2000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.


In [10]:
import numpy as np
# .item() extracts the single bytes value; np.asscalar is deprecated in newer NumPy releases.
recommended_content_ids = [d["class_names"].item().decode('UTF-8') for d in output]
content_ids = open("first_5_content_ids").read().splitlines()
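
Each prediction dictionary also contains a `probabilities` vector, so more than one article could be recommended per example. The following is a minimal sketch of that idea in plain Python. `top_k_recommendations`, `hypothetical_vocab`, and the probability values are all made up for illustration; in the notebook the vocabulary would be the lines of `content_ids.txt`, in the same order the model uses.

```python
def top_k_recommendations(probabilities, vocab, k=3):
    """Return the k (content_id, probability) pairs with the highest probability."""
    ranked = sorted(range(len(probabilities)),
                    key=lambda i: probabilities[i],
                    reverse=True)
    return [(vocab[i], probabilities[i]) for i in ranked[:k]]


# Hypothetical vocabulary and softmax output for one example.
hypothetical_vocab = ["299836255", "299826767", "299921761", "299912085"]
probabilities = [0.10, 0.55, 0.05, 0.30]

recs = top_k_recommendations(probabilities, hypothetical_vocab, k=2)
print(recs)
```

This lines up with the top-10 accuracy metric used during evaluation, which measures whether the article actually read next appears in such a shortlist.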

  


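Finally, we map the content IDs back to article titles. Let's compare the model's recommendation for the first example. This can be done in BigQuery. Look through the query below and make sure it is clear what is being returned.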

In [11]:
from google.cloud import bigquery
recommended_title_sql="""
#standardSQL
SELECT
(SELECT MAX(IF(index=6, value, NULL)) FROM UNNEST(hits.customDimensions)) AS title
FROM `cloud-training-demos.GA360_test.ga_sessions_sample`,   
  UNNEST(hits) AS hits
WHERE 
  # only include hits on pages
  hits.type = "PAGE"
  AND (SELECT MAX(IF(index=10, value, NULL)) FROM UNNEST(hits.customDimensions)) = \"{}\"
LIMIT 1""".format(recommended_content_ids[0])

current_title_sql="""
#standardSQL
SELECT
(SELECT MAX(IF(index=6, value, NULL)) FROM UNNEST(hits.customDimensions)) AS title
FROM `cloud-training-demos.GA360_test.ga_sessions_sample`,   
  UNNEST(hits) AS hits
WHERE 
  # only include hits on pages
  hits.type = "PAGE"
  AND (SELECT MAX(IF(index=10, value, NULL)) FROM UNNEST(hits.customDimensions)) = \"{}\"
LIMIT 1""".format(content_ids[0])
# In Python 3 the titles are already str, so no .encode('utf-8') is needed before printing.
recommended_title = bigquery.Client().query(recommended_title_sql).to_dataframe()['title'].tolist()[0].strip()
current_title = bigquery.Client().query(current_title_sql).to_dataframe()['title'].tolist()[0].strip()
print("Current title: {} ".format(current_title))
print("Recommended title: {}".format(recommended_title))

Current title: Kritik an Meghan Markle immer lauter 
Recommended title: Matthias Reim spricht über Absturz & Comeback
