<a href="https://colab.research.google.com/github/hai105178362/blogs/blob/main/%E2%80%9Cdeep_recommenders_ipynb%E2%80%9D%E7%9A%84%E5%89%AF%E6%9C%AC_2021_10_20.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

##### Copyright 2020 The TensorFlow Authors.

In [None]:
#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Building deep retrieval models

<table class="tfo-notebook-buttons" align="left">
  <td>
    <a target="_blank" href="https://www.tensorflow.org/recommenders/examples/deep_recommenders"><img src="https://www.tensorflow.org/images/tf_logo_32px.png" />View on TensorFlow.org</a>
  </td>
  <td>
    <a target="_blank" href="https://colab.research.google.com/github/tensorflow/recommenders/blob/main/docs/examples/deep_recommenders.ipynb"><img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab</a>
  </td>
  <td>
    <a target="_blank" href="https://github.com/tensorflow/recommenders/blob/main/docs/examples/deep_recommenders.ipynb"><img src="https://www.tensorflow.org/images/GitHub-Mark-32px.png" />View source on GitHub</a>
  </td>
  <td>
    <a href="https://storage.googleapis.com/tensorflow_docs/recommenders/docs/examples/deep_recommenders.ipynb"><img src="https://www.tensorflow.org/images/download_logo_32px.png" />Download notebook</a>
  </td>
</table>

In [the featurization tutorial](featurization) we incorporated multiple features into our models, but the models consist of only an embedding layer. We can add more dense layers to our models to increase their expressive power.

In general, deeper models are capable of learning more complex patterns than shallower models. For example, our [user model](featurization#user_model) incorporates user ids and timestamps to model user preferences at a point in time. A shallow model (say, a single embedding layer) may only be able to learn the simplest relationships between those features and movies: a given movie is most popular around the time of its release, and a given user generally prefers horror movies to comedies. To capture more complex relationships, such as user preferences evolving over time, we may need a deeper model with multiple stacked dense layers.

Of course, complex models also have their disadvantages. The first is computational cost, as larger models require both more memory and more computation to fit and serve. The second is the requirement for more data: in general, more training data is needed to take advantage of deeper models. With more parameters, deep models might overfit or even simply memorize the training examples instead of learning a function that can generalize. Finally, training deeper models may be harder, and more care needs to be taken in choosing settings like regularization and learning rate.

Finding a good architecture for a real-world recommender system is a complex art, requiring good intuition and careful [hyperparameter tuning](https://en.wikipedia.org/wiki/Hyperparameter_optimization). For example, factors such as the depth and width of the model, activation function, learning rate, and optimizer can radically change the performance of the model. Modelling choices are further complicated by the fact that good offline evaluation metrics may not correspond to good online performance, and that the choice of what to optimize for is often more critical than the choice of model itself.

Nevertheless, effort put into building and fine-tuning larger models often pays off. In this tutorial, we will illustrate how to build deep retrieval models using TensorFlow Recommenders. We'll do this by building progressively more complex models to see how this affects model performance.

## Preliminaries

We first import the necessary packages.

In [None]:
!pip install -q tensorflow-recommenders
!pip install -q --upgrade tensorflow-datasets

In [None]:
import os
import tempfile

%matplotlib inline
import matplotlib.pyplot as plt

import numpy as np
import tensorflow as tf
import tensorflow_datasets as tfds

import tensorflow_recommenders as tfrs

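# 'seaborn-whitegrid' was renamed 'seaborn-v0_8-whitegrid' in Matplotlib 3.6
# (the old name was later removed); fall back to it on older versions.
try:
  plt.style.use('seaborn-v0_8-whitegrid')
except OSError:
  plt.style.use('seaborn-whitegrid')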

In [None]:
from typing import Dict, Optional, Text, Tuple, Union
import contextlib
import tensorflow as tf

from tensorflow_recommenders import layers

def _check_candidates_with_identifiers(candidates: tf.data.Dataset) -> None:
  """Checks preconditions the dataset used for indexing."""

  spec = candidates.element_spec

  if isinstance(spec, tuple):
    if len(spec) != 2:
      raise ValueError(
          "The dataset must yield candidate embeddings or "
          "tuples of (candidate embeddings, candidate identifiers). "
          f"Got {spec} instead."
      )

    identifiers_spec, candidates_spec = spec

    if candidates_spec.shape[0] != identifiers_spec.shape[0]:
      raise ValueError(
          "Candidates and identifiers have to have the same batch dimension. "
          f"Got {candidates_spec.shape[0]} and {identifiers_spec.shape[0]}."
      )

@contextlib.contextmanager
def _wrap_batch_too_small_error(k: int):
  """Context manager that provides a more helpful error message."""

  try:
    yield
  except tf.errors.InvalidArgumentError as e:
    error_message = str(e)
    if "input must have at least k columns" in error_message:
      raise ValueError("Tried to retrieve k={k} top items, but the candidate "
                       "dataset batch size is too small. This may be because "
                       "your candidate batch size is too small or the last "
                       "batch of your dataset is too small. "
                       "To resolve this, increase your batch size, set the "
                       "drop_remainder argument to True when batching your "
                       "candidates, or set the handle_incomplete_batches "
                       "argument to True in the constructor. ".format(k=k))
    else:
      raise

class Streaming_Lzh(layers.factorized_top_k.TopK):
  """Retrieves K highest scoring items and their ids from a large dataset.
  Used to efficiently retrieve top K query-candidate scores from a dataset,
  along with the top scoring candidates' identifiers.
  """

  def __init__(self,
               query_model: Optional[tf.keras.Model] = None,
               k: int = 10,
               handle_incomplete_batches: bool = True,
               num_parallel_calls: int = tf.data.AUTOTUNE,
               sorted_order: bool = True) -> None:
    """Initializes the layer.
    Args:
      query_model: Optional Keras model for representing queries. If provided,
        will be used to transform raw features into query embeddings when
        querying the layer. If not provided, the layer will expect to be given
        query embeddings as inputs.
      k: Number of top scores to retrieve.
      handle_incomplete_batches: When True, candidate batches smaller than k
        will be correctly handled at the price of some performance. As an
        alternative, consider using the drop_remainder option when batching the
        candidate dataset.
      num_parallel_calls: Degree of parallelism when computing scores. Defaults
        to autotuning.
      sorted_order: If the resulting scores should be returned in sorted order.
        Setting this to False may result in a small increase in performance.
    Raises:
      ValueError if candidate elements are not tuples.
    """

    super().__init__(k=k)

    self.query_model = query_model
    self._candidates = None
    self._handle_incomplete_batches = handle_incomplete_batches
    self._num_parallel_calls = num_parallel_calls
    self._sorted = sorted_order

    self._counter = self.add_weight("counter", dtype=tf.int32, trainable=False)

  def index_from_dataset(
      self,
      candidates: tf.data.Dataset
  ) -> "TopK":

    _check_candidates_with_identifiers(candidates)

    self._candidates = candidates

    return self

  def index(self,
            candidates: tf.data.Dataset,
            identifiers: Optional[tf.data.Dataset] = None) -> "Streaming_Lzh":
    """Not implemented. Please call `index_from_dataset` instead."""

    raise NotImplementedError(
        "The streaming top k class only accepts datasets. "
        "Please call `index_from_dataset` instead."
    )

  def call(
      self,
      queries: Union[tf.Tensor, Dict[Text, tf.Tensor]],
      k: Optional[int] = None,
  ) -> Tuple[tf.Tensor, tf.Tensor]:

    k = k if k is not None else self._k

    if self._candidates is None:
      raise ValueError("The `index` method must be called first to "
                       "create the retrieval index.")

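    # Debug output: candidate dataset size and per-batch shapes. Note that this
    # loop makes an extra pass over all candidates on every call.
    tf.print("_candidates_cardinality:", tf.data.experimental.cardinality(self._candidates))
    for element in self._candidates:
      # map_structure also handles (identifiers, embeddings) tuples, which a
      # plain tf.shape call would fail on.
      tf.print("_candidates_element: ", tf.nest.map_structure(tf.shape, element))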
    if self.query_model is not None:
      queries = self.query_model(queries)

    # Reset the element counter.
    self._counter.assign(0)

    def top_scores(candidate_index: tf.Tensor,
                   candidate_batch: tf.Tensor) -> Tuple[tf.Tensor, tf.Tensor]:
      """Computes top scores and indices for a batch of candidates."""

      scores = self._compute_score(queries, candidate_batch)

      if self._handle_incomplete_batches:
        k_ = tf.math.minimum(k, tf.shape(scores)[1])
      else:
        k_ = k

      scores, indices = tf.math.top_k(scores, k=k_, sorted=self._sorted)

      return scores, tf.gather(candidate_index, indices)

    def top_k(state: Tuple[tf.Tensor, tf.Tensor],
              x: Tuple[tf.Tensor, tf.Tensor]) -> Tuple[tf.Tensor, tf.Tensor]:
      """Reduction function.
      Returns top K scores from a combination of existing top K scores and new
      candidate scores, as well as their corresponding indices.
      Args:
        state: tuple of [query_batch_size, k] tensor of highest scores so far
          and [query_batch_size, k] tensor of indices of highest scoring
          elements.
        x: tuple of [query_batch_size, k] tensor of new scores and
          [query_batch_size, k] tensor of new indices.
      Returns:
        Tuple of [query_batch_size, k] tensors of highest scores and indices
          from state and x.
      """
      state_scores, state_indices = state
      x_scores, x_indices = x

      joined_scores = tf.concat([state_scores, x_scores], axis=1)
      joined_indices = tf.concat([state_indices, x_indices], axis=1)

      if self._handle_incomplete_batches:
        k_ = tf.math.minimum(k, tf.shape(joined_scores)[1])
      else:
        k_ = k

      scores, indices = tf.math.top_k(joined_scores, k=k_, sorted=self._sorted)

      return scores, tf.gather(joined_indices, indices, batch_dims=1)

    def enumerate_rows(batch: tf.Tensor) -> Tuple[tf.Tensor, tf.Tensor]:
      """Enumerates rows in each batch using a total element counter."""

      starting_counter = self._counter.read_value()
      end_counter = self._counter.assign_add(tf.shape(batch)[0])

      return tf.range(starting_counter, end_counter), batch

    if not isinstance(self._candidates.element_spec, tuple):
      # We don't have identifiers.
      candidates = self._candidates.map(enumerate_rows)
      index_dtype = tf.int32
    else:
      candidates = self._candidates
      index_dtype = self._candidates.element_spec[0].dtype

    # Initialize the state with dummy scores and candidate indices.
    initial_state = (tf.zeros((tf.shape(queries)[0], 0), dtype=tf.float32),
                     tf.zeros((tf.shape(queries)[0], 0), dtype=index_dtype))
    
    with _wrap_batch_too_small_error(k):
      results = (
          candidates
          # Compute scores over all candidates, and select top k in each batch.
          # Each element is a ([query_batch_size, k] tensor,
          # [query_batch_size, k] tensor) of scores and indices (where query_
          # batch_size is the leading dimension of the input query embeddings).
          .map(top_scores, num_parallel_calls=self._num_parallel_calls)
          # Reduce into a single tuple of output tensors by keeping a running
          # tally of top k scores and indices.
          .reduce(initial_state, top_k))

    return results
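
The `top_scores`/`top_k` pair above implements a streaming top-k: each candidate batch is scored and cut to its own top k, then merged with the running state and cut back to k. The NumPy sketch below is an illustration only (`streaming_top_k` is a hypothetical helper, not part of TFRS, and uses plain dot-product scores); it checks that this batch-wise merge returns the same result as scoring every candidate at once.

```python
import numpy as np


def streaming_top_k(queries, candidates, k, batch_size):
    """Running top-k over candidate batches, mirroring the map/reduce above.

    Returns (scores, indices) of shape [num_queries, k], sorted by descending
    score. Hypothetical helper for illustration only.
    """
    num_queries = queries.shape[0]
    state_scores = np.zeros((num_queries, 0))
    state_indices = np.zeros((num_queries, 0), dtype=np.int64)
    for start in range(0, candidates.shape[0], batch_size):
        batch = candidates[start:start + batch_size]
        # Like top_scores: score this batch and keep its best k (or fewer).
        scores = queries @ batch.T
        k_batch = min(k, scores.shape[1])
        top = np.argsort(-scores, axis=1)[:, :k_batch]
        batch_scores = np.take_along_axis(scores, top, axis=1)
        batch_indices = start + top  # global row numbers, like enumerate_rows
        # Like top_k: merge with the running state and cut back to k.
        joined_scores = np.concatenate([state_scores, batch_scores], axis=1)
        joined_indices = np.concatenate([state_indices, batch_indices], axis=1)
        k_joined = min(k, joined_scores.shape[1])
        order = np.argsort(-joined_scores, axis=1)[:, :k_joined]
        state_scores = np.take_along_axis(joined_scores, order, axis=1)
        state_indices = np.take_along_axis(joined_indices, order, axis=1)
    return state_scores, state_indices


rng = np.random.default_rng(0)
queries = rng.normal(size=(3, 8))
candidates = rng.normal(size=(50, 8))  # 50 is not a multiple of 16: last batch has 2

stream_scores, stream_idx = streaming_top_k(queries, candidates, k=5, batch_size=16)

# Brute force: score all candidates at once and sort.
all_scores = queries @ candidates.T
brute_idx = np.argsort(-all_scores, axis=1)[:, :5]
brute_scores = np.take_along_axis(all_scores, brute_idx, axis=1)

print(np.allclose(stream_scores, brute_scores))  # True
print(np.array_equal(stream_idx, brute_idx))     # True
```

Because top-k is decomposable in this way, the layer never needs to hold the full `[num_queries, num_candidates]` score matrix in memory at once.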

In [None]:
from typing import List, Optional, Sequence, Text, Union

import tensorflow as tf

from tensorflow_recommenders import layers


class FactorizedTopK_LZH(tf.keras.layers.Layer):
  """Computes metrics across the top K candidates surfaced by a retrieval model.
  The default metric is top K categorical accuracy: how often the true candidate
  is in the top K candidates for a given query.
  """

  def __init__(
      self,
      candidates: Union[layers.factorized_top_k.TopK, tf.data.Dataset],
      metrics: Optional[Sequence[tf.keras.metrics.Metric]] = None,
      k: int = 100,
      name: Text = "factorized_top_k",
  ) -> None:
    """Initializes the metric.
    Args:
      candidates: A layer for retrieving top candidates in response
        to a query, or a dataset of candidate embeddings from which
        candidates should be retrieved.
      metrics: The metrics to compute. If not supplied, will compute top-K
        categorical accuracy metrics.
      k: The number of top scoring candidates to retrieve for metric evaluation.
      name: Optional name.
    """

    super().__init__(name=name)

    if metrics is None:
      metrics = [
          tf.keras.metrics.TopKCategoricalAccuracy(
              k=x, name=f"{self.name}/top_{x}_categorical_accuracy")
          for x in [1, 5, 10, 50, 100]
      ]

    if isinstance(candidates, tf.data.Dataset):
      candidates = Streaming_Lzh(k=k).index_from_dataset(
          candidates)
      
    
    self._k = k
    self._candidates = candidates
    self._top_k_metrics = metrics

  def update_state(self, query_embeddings: tf.Tensor,
                   true_candidate_embeddings: tf.Tensor) -> tf.Operation:
    """Updates the metrics.
    Args:
      query_embeddings: [num_queries, embedding_dim] tensor of query embeddings.
      true_candidate_embeddings: [num_queries, embedding_dim] tensor of
        embeddings for candidates that were selected for the query.
    Returns:
      Update op. Only used in graph mode.
    """
    
    tf.print("query_embeddings.shape",tf.shape(query_embeddings))
    tf.print("true_candidate_embeddings.shape",tf.shape(true_candidate_embeddings))

    tf.print("query_embeddings",(query_embeddings),summarize =100)
    tf.print("true_candidate_embeddings",(true_candidate_embeddings),summarize =100)
    positive_scores = tf.reduce_sum(
        query_embeddings * true_candidate_embeddings, axis=1, keepdims=True)
    tf.print("positive_scores.shape",tf.shape(positive_scores))
    top_k_predictions, _ = self._candidates(query_embeddings, k=self._k)
    tf.print("top_k_predictions",tf.shape(top_k_predictions))
    tf.print("positive_scores",positive_scores,summarize =100)
    # Mixed Negative Sampling (MNS), for reference: the MNS paper "uniformly
    # samples B' items from another data stream", called index data, "a set
    # composed of items from the entire corpus". Note that this metric does not
    # sample: it compares the positive score with the top-k scores retrieved
    # from the full candidate set.
    y_true = tf.concat(
        [tf.ones(tf.shape(positive_scores)),
         tf.zeros_like(top_k_predictions)],
        axis=1)
    y_pred = tf.concat([positive_scores, top_k_predictions], axis=1)
    tf.print("y_true",y_true,summarize =100)
    tf.print("y_pred",y_pred,summarize =100)
    update_ops = []

    for metric in self._top_k_metrics:
      update_ops.append(metric.update_state(y_true=y_true, y_pred=y_pred))

    return tf.group(update_ops)

  def reset_states(self) -> None:
    """Resets the metrics."""

    for metric in self.metrics:
      metric.reset_states()

  def result(self) -> List[tf.Tensor]:
    """Returns a list of metric results."""

    return [metric.result() for metric in self.metrics]
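
`update_state` puts the true candidate's score in column 0, appends the k retrieved scores, and hands the result to Keras top-K categorical accuracy. The NumPy sketch below is a hypothetical re-implementation for illustration; it assumes the tie handling of `tf.math.in_top_k` (the positive counts as "in the top K" when fewer than K scores are strictly larger). It also shows a pitfall: with only k retrieved scores, top-K accuracy for any K > k is trivially 1.0.

```python
import numpy as np


def factorized_top_k_accuracy(query_emb, true_cand_emb, retrieved_scores, K):
    """Fraction of queries whose true candidate ranks in the top K.

    query_emb, true_cand_emb: [num_queries, dim] arrays.
    retrieved_scores: [num_queries, k] top-k scores from the candidate index.
    """
    # Score of the true (positive) candidate for each query: [num_queries, 1].
    positive = np.sum(query_emb * true_cand_emb, axis=1, keepdims=True)
    # Column 0 is the positive; the remaining k columns are retrieved scores.
    y_pred = np.concatenate([positive, retrieved_scores], axis=1)
    # The positive is in the top K if fewer than K scores strictly beat it.
    num_better = np.sum(y_pred[:, 1:] > positive, axis=1)
    return float(np.mean(num_better < K))


query_emb = np.array([[1.0, 0.0], [0.0, 1.0]])
true_cand_emb = np.array([[0.5, 0.0], [0.0, 0.1]])  # positive scores 0.5 and 0.1
retrieved = np.array([[0.9, 0.4, 0.2],   # one retrieved score beats 0.5
                      [0.8, 0.7, 0.3]])  # three retrieved scores beat 0.1

print(factorized_top_k_accuracy(query_emb, true_cand_emb, retrieved, K=1))  # 0.0
print(factorized_top_k_accuracy(query_emb, true_cand_emb, retrieved, K=2))  # 0.5
# Only k = 3 scores were retrieved, so any K >= 4 is trivially satisfied.
print(factorized_top_k_accuracy(query_emb, true_cand_emb, retrieved, K=4))  # 1.0
```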

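In this tutorial we build on the models from [the featurization tutorial](featurization) to generate embeddings. We use the user id, timestamp, and movie title features from MovieLens, plus synthetic user sex, user age, and item price features added for illustration; the movie title doubles as the item id, name, and description.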

In [None]:
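ratings = tfds.load("movielens/100k-ratings", split="train")
movies = tfds.load("movielens/100k-movies", split="train")

# The synthetic features (user sex and age, item price) are derived from a hash
# of the user id or movie title. Python's `random` does not work here: inside
# `Dataset.map` it runs only once, when the function is traced, so every element
# would get the same value. Hashing gives each user and item its own value that
# stays the same across datasets and epochs.
def synthetic_price(title):
  # A price in [9999, 13999), matching the original range(9999, 13999).
  return tf.strings.to_hash_bucket_fast(title, 4000) + 9999

# Click-stream data: the 100,000 MovieLens 100K ratings.
ratings = ratings.map(lambda x: {
    "user_id": x["user_id"],
    "user_sex": tf.gather(tf.constant(["male", "female"]),
                          tf.strings.to_hash_bucket_fast(x["user_id"], 2)),
    # An age in [1, 98], matching the original range(1, 99).
    "user_age": tf.strings.to_hash_bucket_fast(x["user_id"], 98) + 1,
    "timestamp": x["timestamp"],
    "item_id": x["movie_title"],
    "item_name": x["movie_title"],
    "item_desc": x["movie_title"],
    "item_price": synthetic_price(x["movie_title"]),
})

# The click stream covers 1,682 candidate items (every movie in MovieLens 100K).
items = movies.map(lambda x: {
    "item_id": x["movie_title"],
    "item_name": x["movie_title"],
    "item_desc": x["movie_title"],
    "item_price": synthetic_price(x["movie_title"]),
})

# A larger candidate set, e.g. one that also contains newly released products.
# Here it simply holds the same 1,682 movies.
full_items = movies.map(lambda x: {
    "item_id": x["movie_title"],
    "item_name": x["movie_title"],
    "item_desc": x["movie_title"],
    "item_price": synthetic_price(x["movie_title"]),
})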

We also do some housekeeping to prepare feature vocabularies.

In [None]:
timestamps = np.concatenate(list(ratings.map(lambda x: x["timestamp"]).batch(100)))
item_prices = np.concatenate(list(items.map(lambda x: x["item_price"]).batch(100)))

max_timestamp = timestamps.max()
min_timestamp = timestamps.min()

timestamp_buckets = np.linspace(
    min_timestamp, max_timestamp, num=1000,
)


unique_item_ids = np.unique(np.concatenate(list(items.batch(1000).map(lambda x: x["item_id"]))))
unique_user_ids = np.unique(np.concatenate(list(ratings.batch(1_000).map(
    lambda x: x["user_id"]))))
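
`timestamp_buckets` holds 1,000 boundaries, which the `Discretization` layer in the user model turns into 1,001 bucket indices; that is why its embedding table has `len(timestamp_buckets) + 1` rows. Assuming `Discretization` follows `tf.raw_ops.Bucketize` semantics (a value equal to a boundary falls into the bucket on its right), `np.digitize` reproduces the mapping. The small boundary list below is illustrative.

```python
import numpy as np

# Illustrative boundaries: three boundaries give four buckets, 0..3.
boundaries = np.array([0.0, 10.0, 100.0])
values = np.array([-5, 0, 5, 10, 99, 100, 150])

# Bucket i holds values in [boundaries[i-1], boundaries[i]).
buckets = np.digitize(values, boundaries)
print(buckets)  # [0 1 1 2 2 3 3]

# An embedding table for these indices needs len(boundaries) + 1 rows.
num_rows = len(boundaries) + 1
print(num_rows)  # 4
```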

## Model definition

### Query model

We start with the user model from [the featurization tutorial](featurization), extended with the synthetic sex and age features, as the first layer of our model. It converts raw input examples into feature embeddings.

In [None]:
class UserModel(tf.keras.Model):
  
  def __init__(self):
    super().__init__()

    self.user_embedding = tf.keras.Sequential([
        tf.keras.layers.StringLookup(
            vocabulary=unique_user_ids, mask_token=None),
        tf.keras.layers.Embedding(len(unique_user_ids) + 1, 32),
    ])

    self.sex_embedding = tf.keras.Sequential([
        tf.keras.layers.StringLookup(
            vocabulary=["male","female"], mask_token=None),
        tf.keras.layers.Embedding(len(["male","female"]) + 1, 32),
    ])

    # Bucket ages into a few ranges.
    self.age_embedding = tf.keras.Sequential([
        tf.keras.layers.Discretization(bin_boundaries=[0., 3., 14., 18., 40., 55., 65.,130.]),
        tf.keras.layers.Embedding(len([0., 3., 14., 18., 40., 55., 65.,130.]) + 1, 32),
    ])

    self.timestamp_embedding = tf.keras.Sequential([
        tf.keras.layers.Discretization(timestamp_buckets.tolist()),
        tf.keras.layers.Embedding(len(timestamp_buckets) + 1, 32),
    ])

    self.normalized_timestamp = tf.keras.layers.Normalization(
        axis=None
    )

    self.normalized_timestamp.adapt(timestamps)

  def call(self, inputs):
    # Take the input dictionary, pass it through each input layer,
    # and concatenate the result.
    return tf.concat([
        self.user_embedding(inputs["user_id"]),
        self.sex_embedding(inputs["user_sex"]),
        self.age_embedding(inputs["user_age"]),
        self.timestamp_embedding(inputs["timestamp"]),
        tf.reshape(self.normalized_timestamp(inputs["timestamp"]), (-1, 1)),
    ], axis=1)
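
The `+ 1` in the user id and sex embedding sizes accounts for the out-of-vocabulary (OOV) index. With `mask_token=None` and the default single OOV index, `StringLookup` maps unknown strings to 0 and the i-th vocabulary entry to i + 1. The pure-Python sketch below mimics that mapping; `string_lookup` is a hypothetical stand-in, not the Keras layer.

```python
def string_lookup(vocabulary, values):
    """Mimics StringLookup(mask_token=None, num_oov_indices=1): OOV -> 0."""
    index = {token: i + 1 for i, token in enumerate(vocabulary)}
    return [index.get(v, 0) for v in values]


sex_vocab = ["male", "female"]
print(string_lookup(sex_vocab, ["female", "male", "unknown"]))  # [2, 1, 0]

# Indices run from 0 to len(vocabulary), so the embedding needs len + 1 rows.
num_rows = len(sex_vocab) + 1
print(num_rows)  # 3
```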

Defining deeper models will require us to stack more layers on top of this first input. A progressively narrower stack of layers, separated by an activation function, is a common pattern:

```
                            +----------------------+
                            |      128 x 64        |
                            +----------------------+
                                       | relu
                          +--------------------------+
                          |        256 x 128         |
                          +--------------------------+
                                       | relu
                        +------------------------------+
                        |          ... x 256           |
                        +------------------------------+
```
Since the expressive power of deep linear models is no greater than that of shallow linear models, we use ReLU activations for all but the last hidden layer. The final hidden layer does not use any activation function: using an activation function would limit the output space of the final embeddings and might negatively impact the performance of the model. For instance, if ReLUs are used in the projection layer, all components in the output embedding would be non-negative.

We're going to try something similar here. To make experimentation with different depths easy, let's define a model whose depth (and width) is defined by a set of constructor parameters. 

In [None]:
class QueryModel(tf.keras.Model):
  """Model for encoding user queries."""

  def __init__(self, layer_sizes):
    """Model for encoding user queries.

    Args:
      layer_sizes:
        A list of integers where the i-th entry represents the number of units
        the i-th layer contains.
    """
    super().__init__()

    # We first use the user model for generating embeddings.
    self.embedding_model = UserModel()

    # Then construct the layers.
    self.dense_layers = tf.keras.Sequential()

    # Use the ReLU activation for all but the last layer.
    for layer_size in layer_sizes[:-1]:
      self.dense_layers.add(tf.keras.layers.Dense(layer_size, activation="relu"))

    # No activation for the last layer.
    for layer_size in layer_sizes[-1:]:
      self.dense_layers.add(tf.keras.layers.Dense(layer_size))
    
  def call(self, inputs):
    feature_embedding = self.embedding_model(inputs)
    return self.dense_layers(feature_embedding)
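
The last `Dense` layer above is deliberately left without an activation. A quick NumPy check shows why: with a ReLU on the final layer, every embedding component is non-negative, so no query-candidate dot product can ever be negative and the model cannot express a negative affinity. The random pre-activation vectors below are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(42)
# Illustrative pre-activation outputs of a final dense layer.
query_pre = rng.normal(size=(1000, 32))
candidate_pre = rng.normal(size=(1000, 32))


def relu(x):
    return np.maximum(x, 0.0)


# Linear final layer: dot-product scores take both signs.
linear_scores = np.sum(query_pre * candidate_pre, axis=1)

# ReLU final layer: all components >= 0, hence all scores >= 0.
relu_scores = np.sum(relu(query_pre) * relu(candidate_pre), axis=1)

print(linear_scores.min() < 0)  # True
print(relu_scores.min() >= 0)   # True
```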

The `layer_sizes` parameter gives us the depth and width of the model. We can vary it to experiment with shallower or deeper models.
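
Depth and width translate directly into parameter count, which drives the memory and compute costs discussed at the start. A dense layer with `n_in` inputs and `n_out` units has `n_in * n_out` weights plus `n_out` biases. The sketch below (`dense_tower_params` is a hypothetical helper) counts only the dense-tower parameters of the query model for the configurations trained later, assuming the 129-wide feature vector produced by `UserModel` (four 32-dimensional embeddings plus one normalized timestamp); embedding tables are not included.

```python
def dense_tower_params(input_dim, layer_sizes):
    """Number of weights and biases in a stack of Dense layers."""
    total = 0
    n_in = input_dim
    for n_out in layer_sizes:
        total += n_in * n_out + n_out  # kernel + bias
        n_in = n_out
    return total


user_feature_width = 4 * 32 + 1  # 129, see UserModel.call
for sizes in ([8], [64, 32], [128, 64, 32]):
    print(sizes, dense_tower_params(user_feature_width, sizes))
# [8] 1040
# [64, 32] 10400
# [128, 64, 32] 26976
```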

### Candidate model

We adopt the same approach for the candidate model. Again, we start from the `MovieModel` of the [featurization](featurization) tutorial, adapted here as `ItemModel` with item id, name, description, and price features:

In [None]:
class ItemModel(tf.keras.Model):
  
  def __init__(self):
    super().__init__()

    max_tokens = 10_000

    self.id_embedding = tf.keras.Sequential([
      tf.keras.layers.StringLookup(
          vocabulary=unique_item_ids,mask_token=None),
      tf.keras.layers.Embedding(len(unique_item_ids) + 1, 32)
    ])

    item_name_lookup = tf.keras.layers.StringLookup()
    item_name_lookup.adapt(ratings.map(lambda x: x["item_name"]))

    self.name_embedding = tf.keras.Sequential([
      item_name_lookup, 
      tf.keras.layers.Embedding(item_name_lookup.vocabulary_size(), 32)])

    self.desc_vectorizer = tf.keras.layers.TextVectorization(
        max_tokens=max_tokens)

    self.desc_text_embedding = tf.keras.Sequential([
      self.desc_vectorizer,
      tf.keras.layers.Embedding(max_tokens, 32, mask_zero=True),
      tf.keras.layers.GlobalAveragePooling1D(),
    ])

    self.desc_vectorizer.adapt(items.map(lambda x: x["item_desc"]))

    self.normalized_price = tf.keras.layers.Normalization(
        axis=None
    )

    self.normalized_price.adapt(item_prices)

  def call(self, inputs):
    return tf.concat([
        self.id_embedding(inputs["item_id"]),
        self.name_embedding(inputs["item_name"]),
        self.desc_text_embedding(inputs["item_desc"]),
        tf.reshape(self.normalized_price(inputs["item_price"]), (-1, 1)),
    ], axis=1)

And expand it with hidden layers:

In [None]:
class CandidateModel(tf.keras.Model):
  """Model for encoding items."""

  def __init__(self, layer_sizes):
    """Model for encoding items.

    Args:
      layer_sizes:
        A list of integers where the i-th entry represents the number of units
        the i-th layer contains.
    """
    super().__init__()

    self.embedding_model = ItemModel()

    # Then construct the layers.
    self.dense_layers = tf.keras.Sequential()

    # Use the ReLU activation for all but the last layer.
    for layer_size in layer_sizes[:-1]:
      self.dense_layers.add(tf.keras.layers.Dense(layer_size, activation="relu"))

    # No activation for the last layer.
    for layer_size in layer_sizes[-1:]:
      self.dense_layers.add(tf.keras.layers.Dense(layer_size))
    
  def call(self, inputs):
    feature_embedding = self.embedding_model(inputs)
    return self.dense_layers(feature_embedding)

### Combined model

With both `QueryModel` and `CandidateModel` defined, we can put together a combined model and implement our loss and metrics logic. To make things simple, we'll enforce that the model structure is the same across the query and candidate models.

In [None]:
class MovielensModel(tfrs.models.Model):

  def __init__(self, layer_sizes):
    super().__init__()
    self.query_model = QueryModel(layer_sizes)
    self.candidate_model = CandidateModel(layer_sizes)
    self.task = tfrs.tasks.Retrieval(
        metrics=FactorizedTopK_LZH(
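            candidates=items.batch(128).map(lambda x: self.candidate_model(x)),
            # Retrieve 100 candidates so that the top-100 accuracy read out
            # below is meaningful: with k retrieved candidates, top-K accuracy
            # for any K > k is trivially 1.0.
            k=100,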
        ),
    )

  def compute_loss(self, features, training=False):
    # We only pass the user features (id, sex, age, and timestamp) into the
    # query model. This ensures that the training inputs have the same keys as
    # the query inputs; otherwise the discrepancy in input structure would cause
    # an error when loading the query model after saving it.
    query_embeddings = self.query_model({
        "user_id": features["user_id"],
        "user_sex": features["user_sex"],
        "user_age": features["user_age"],
        "timestamp": features["timestamp"],
    })
    item_embeddings = self.candidate_model({
        "item_id": features["item_id"],
        "item_name": features["item_name"],
        "item_desc": features["item_desc"],
        "item_price": features["item_price"]
        })

    return self.task(
        query_embeddings, item_embeddings, compute_metrics=not training)
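
`tfrs.tasks.Retrieval` trains the two towers with an in-batch softmax: each query is scored against every candidate in the batch, and the candidate in the same row is treated as the positive. The NumPy sketch below is our simplified reading of its default loss (categorical cross-entropy over the `[batch, batch]` score matrix with identity labels, summed over the batch); it omits optional features such as temperature scaling and sampling-probability correction, so treat it as an illustration rather than the library's code.

```python
import numpy as np


def in_batch_softmax_loss(query_emb, candidate_emb):
    """Summed softmax cross-entropy with the diagonal as positives."""
    # scores[i, j] = <query_i, candidate_j>: [batch, batch].
    scores = query_emb @ candidate_emb.T
    # Numerically stable log-softmax over each row.
    scores = scores - scores.max(axis=1, keepdims=True)
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    # Row i's positive is candidate i, so sum the diagonal.
    return float(-np.trace(log_probs))


# Orthogonal embeddings where each query matches its own candidate: tiny loss.
q = np.eye(4) * 5.0
print(in_batch_softmax_loss(q, q))

# All-zero embeddings score every candidate equally: loss = 4 * log(4).
z = np.zeros((4, 3))
print(in_batch_softmax_loss(z, z))
```

The other candidates in the batch act as negatives, which is why larger batches generally provide more negatives per query.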

## Training the model

### Prepare the data

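We first split the data into a training set and a testing set. To keep runs short, this notebook uses only 80 training and 20 test examples (the original TensorFlow Recommenders tutorial uses an 80,000/20,000 split), so the metrics below will be noisy.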

In [None]:
tf.random.set_seed(42)
shuffled = ratings.shuffle(100_000, seed=42, reshuffle_each_iteration=False)

train = shuffled.take(80)
test = shuffled.skip(80).take(20)

cached_train = train.shuffle(100).batch(4)
cached_test = test.batch(8).cache()

In [None]:
# No label column specified
#dataset_csv = tf.data.experimental.make_csv_dataset("/content/sample_data/clickstream_demo.csv", batch_size=2)
#dataset_csv_iterator = dataset_csv.as_numpy_iterator()
#print(dict(next(dataset_csv_iterator)))


### Shallow model

We're ready to try out our first, shallow, model!

In [None]:
num_epochs = 10

model = MovielensModel([8])
model.compile(optimizer=tf.keras.optimizers.Adagrad(0.1))

one_layer_history = model.fit(
    cached_train,
    validation_data=cached_test,
    validation_freq=5,
    epochs=num_epochs,
    verbose=0)

accuracy = one_layer_history.history["val_factorized_top_k/top_100_categorical_accuracy"][-1]
print(f"Top-100 accuracy: {accuracy:.2f}.")

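On the full 80,000-example training split used by the original TensorFlow Recommenders tutorial, this gives a top-100 accuracy of around 0.27, which serves as a reference point for evaluating deeper models. With the small subset used in this notebook, expect different and much noisier numbers.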



In [None]:
brute_force = tfrs.layers.factorized_top_k.BruteForce(model.query_model)
# Index the offline candidate set.
brute_force.index_from_dataset(
     items.batch(128).map(lambda x: (x["item_id"], model.candidate_model(x)))
)

# Online prediction.
_, items_predict = brute_force({
        "user_id": np.array(["42"]),
        "user_sex": np.array(["female"]),
        "user_age": np.array([29]),
        "timestamp": np.array([1198090049]),
    }, k=3)
print(f"Top recommendations: {items_predict[0]}")

In [None]:
brute_force_lots = tfrs.layers.factorized_top_k.BruteForce(model.query_model)
brute_force_lots.index_from_dataset(
     tf.data.Dataset.zip((full_items.batch(128).map(lambda x: x["item_id"]), full_items.batch(128).map(model.candidate_model)))
)

# Online prediction.
_, items_predict = brute_force_lots({
        "user_id": np.array(["42"]),
        "user_sex": np.array(["female"]),
        "user_age": np.array([29]),
        "timestamp": np.array([1198090049]),
    }, k=3)
print(f"Top recommendations: {items_predict[0]}")

In [None]:
# Export the retrieval index (the query model together with the indexed
# candidate embeddings). It is written to /content/sample_data, a Colab path,
# so that the exported model persists after this cell runs.
path = os.path.join("/content/sample_data", "model")

# Save the index.
tf.saved_model.save(brute_force_lots, path)

# Load it back; this can also be done in TensorFlow Serving.
loaded = tf.saved_model.load(path)

# Pass the user features in and get the top predicted movie titles back.
scores, titles = loaded({
    "user_id": np.array(["42"]),
    "user_sex": np.array(["female"]),
    "user_age": np.array([29]),
    "timestamp": np.array([1198090049]),
})

print(f"Recommendations: {titles[0][:3]}")

### Deeper model

What about a deeper model with two layers?

In [None]:
model = MovielensModel([64, 32])
model.compile(optimizer=tf.keras.optimizers.Adagrad(0.1))

two_layer_history = model.fit(
    cached_train,
    validation_data=cached_test,
    validation_freq=5,
    epochs=num_epochs,
    verbose=0)

accuracy = two_layer_history.history["val_factorized_top_k/top_100_categorical_accuracy"][-1]
print(f"Top-100 accuracy: {accuracy:.2f}.")

In the original TensorFlow Recommenders tutorial, the two-layer model reaches an accuracy of about 0.29, quite a bit better than the shallow model.

We can plot the validation accuracy curves to illustrate this:

In [None]:
num_validation_runs = len(one_layer_history.history["val_factorized_top_k/top_100_categorical_accuracy"])
epochs = [(x + 1)* 5 for x in range(num_validation_runs)]

plt.plot(epochs, one_layer_history.history["val_factorized_top_k/top_100_categorical_accuracy"], label="1 layer")
plt.plot(epochs, two_layer_history.history["val_factorized_top_k/top_100_categorical_accuracy"], label="2 layers")
plt.title("Accuracy vs epoch")
plt.xlabel("epoch")
plt.ylabel("Top-100 accuracy");
plt.legend()

In the original TensorFlow Recommenders tutorial, the larger model has a clear and stable lead over the shallow model even early in training, suggesting that adding depth helps the model capture more nuanced relationships in the data.

However, even deeper models are not necessarily better. The following model extends the depth to three layers:

In [None]:
model = MovielensModel([128, 64, 32])
model.compile(optimizer=tf.keras.optimizers.Adagrad(0.1))

three_layer_history = model.fit(
    cached_train,
    validation_data=cached_test,
    validation_freq=5,
    epochs=num_epochs,
    verbose=0)

accuracy = three_layer_history.history["val_factorized_top_k/top_100_categorical_accuracy"][-1]
print(f"Top-100 accuracy: {accuracy:.2f}.")


In fact, we don't see improvement over the shallow model:

In [None]:
plt.plot(epochs, one_layer_history.history["val_factorized_top_k/top_100_categorical_accuracy"], label="1 layer")
plt.plot(epochs, two_layer_history.history["val_factorized_top_k/top_100_categorical_accuracy"], label="2 layers")
plt.plot(epochs, three_layer_history.history["val_factorized_top_k/top_100_categorical_accuracy"], label="3 layers")
plt.title("Accuracy vs epoch")
plt.xlabel("epoch")
plt.ylabel("Top-100 accuracy");
plt.legend()

This is a good illustration of the fact that deeper and larger models, while capable of superior performance, often require very careful tuning. For example, throughout this tutorial we used a single, fixed learning rate. Alternative choices may give very different results and are worth exploring. 
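
One simple way to explore these choices is a small grid over depth/width and learning rate, keeping the configuration with the best validation metric. The sketch below assumes a hypothetical `train_and_evaluate(layer_sizes, learning_rate)` callable; in this notebook it would build `MovielensModel(layer_sizes)`, compile it with `tf.keras.optimizers.Adagrad(learning_rate)`, fit it, and return the last validation top-100 accuracy. The toy scorer in the example is made up purely to exercise the search.

```python
import itertools
from typing import Callable, Optional, Sequence, Tuple


def grid_search(
    train_and_evaluate: Callable[[Sequence[int], float], float],
    layer_options: Sequence[Sequence[int]],
    learning_rates: Sequence[float],
) -> Tuple[Optional[Tuple[Sequence[int], float]], float]:
    """Returns the (layer_sizes, learning_rate) pair with the highest score."""
    best_config, best_score = None, float("-inf")
    for layer_sizes, lr in itertools.product(layer_options, learning_rates):
        score = train_and_evaluate(layer_sizes, lr)
        print(f"layers={list(layer_sizes)} lr={lr}: {score:.3f}")
        if score > best_score:
            best_config, best_score = (layer_sizes, lr), score
    return best_config, best_score


# Toy stand-in for a real training run, for illustration only.
def toy_score(layer_sizes, lr):
    return 0.3 - abs(lr - 0.05) - 0.01 * len(layer_sizes)


best, score = grid_search(toy_score, [[32], [64, 32], [128, 64, 32]], [0.01, 0.05, 0.1])
print(best, round(score, 3))
```

Each real evaluation is a full training run, so in practice the grid is usually kept small or replaced by a dedicated tuning library.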

With appropriate tuning and sufficient data, the effort put into building larger and deeper models is in many cases well worth it: larger models can lead to substantial improvements in prediction accuracy.



## Next Steps

In this tutorial we expanded our retrieval model with dense layers and activation functions. To see how to create a model that can perform not only retrieval tasks but also rating tasks, take a look at [the multitask tutorial](multitask).