# Notebook 7
### In this notebook we train and evaluate one agent (a Full Slate Q agent) in our LTS environment, which uses our LTS User Model and LTS Document Sampler. This environment creates the Kale/Chocolate problem discussed in the Google RecSim paper. In this notebook we use slate size = 4.

### Table of Contents
- Section 1 : Create Document Sampler, User Model, and LTS Environment
- Section 2 : Create and Train Agent
- Section 3 : Evaluate Agent Performance with Tensorboard

### Imports

In [1]:
# Install my cloned GitHub repository
!pip install git+https://github.com/jgy4/recsim

Collecting git+https://github.com/jgy4/recsim
  Cloning https://github.com/jgy4/recsim to /private/var/folders/63/s86bv36d4c7968bfvh7fb4w40000gn/T/pip-req-build-fs_zmtl0
  Running command git clone -q https://github.com/jgy4/recsim /private/var/folders/63/s86bv36d4c7968bfvh7fb4w40000gn/T/pip-req-build-fs_zmtl0
Collecting dopamine-rl>=2.0.5
  Downloading dopamine_rl-4.0.2-py3-none-any.whl (164 kB)
Collecting gin-config
  Downloading gin_config-0.5.0-py3-none-any.whl (61 kB)
Collecting jax>=0.1.72
  Downloading jax-0.3.7.tar.gz (944 kB)
Collecting tf-slim>=1.0
  Downloading tf_slim-1.1.0-py2.py3-none-any.whl (352 kB)
Collecting tensorflow-probability>=0.13.0
  Downloading tensorflow_probability-

In [2]:
# Load Libraries
import numpy as np
import tensorflow as tf
from recsim.environments import interest_evolution
from recsim.agents import full_slate_q_agent
from recsim.agent import AbstractEpisodicRecommenderAgent
from recsim.simulator import runner_lib
from gym import spaces
import matplotlib.pyplot as plt
from scipy import stats

from recsim import document
from recsim import user
from recsim.choice_model import MultinomialLogitChoiceModel
from recsim.simulator import environment
from recsim.simulator import recsim_gym
# Load the TensorBoard notebook extension
%load_ext tensorboard

### Section 1 : Create Document Sampler, User Model, and LTS Environment

#### Section 1a: Create document class and document sampler

In [3]:
#Create our document class

class LTSDocument(document.AbstractDocument):
    def __init__(self, doc_id, kaleness):
        self.kaleness = kaleness
        # doc_id is an integer representing the unique ID of this document
        super(LTSDocument, self).__init__(doc_id)

    def create_observation(self):
        return np.array([self.kaleness])

    @staticmethod
    def observation_space():
        return spaces.Box(shape=(1,), dtype=np.float32, low=0.0, high=1.0)
  
    def __str__(self):
        return "Document {} with kaleness {}.".format(self._doc_id, self.kaleness)

In [4]:
#Create our document sampler

class LTSDocumentSampler(document.AbstractDocumentSampler):
    def __init__(self, doc_ctor=LTSDocument, **kwargs):
        super(LTSDocumentSampler, self).__init__(doc_ctor, **kwargs)
        self._doc_count = 0

    def sample_document(self):
        doc_features = {}
        doc_features['doc_id'] = self._doc_count
        doc_features['kaleness'] = self._rng.random_sample()
        self._doc_count += 1
        return self._doc_ctor(**doc_features)

The following example shows how to sample documents:

In [5]:
sampler = LTSDocumentSampler()
for i in range(5): print(sampler.sample_document())
d = sampler.sample_document()
print("Documents have observation space:", d.observation_space(), "\n"
      "An example realization is: ", d.create_observation())

Document 0 with kaleness 0.5488135039273248.
Document 1 with kaleness 0.7151893663724195.
Document 2 with kaleness 0.6027633760716439.
Document 3 with kaleness 0.5448831829968969.
Document 4 with kaleness 0.4236547993389047.
Documents have observation space: Box([0.], [1.], (1,), float32) 
An example realization is:  [0.64589411]


#### Section 1b: Create a user state, a user sampler, a user state transition model, and a user response model

In [6]:
#Create a user state class

class LTSUserState(user.AbstractUserState):
  def __init__(self, memory_discount, sensitivity, innovation_stddev,
               choc_mean, choc_stddev, kale_mean, kale_stddev,
               net_kaleness_exposure, time_budget, observation_noise_stddev=0.1
              ):
    ## Transition model parameters
    ##############################
    self.memory_discount = memory_discount
    self.sensitivity = sensitivity
    self.innovation_stddev = innovation_stddev

    ## Engagement parameters
    self.choc_mean = choc_mean
    self.choc_stddev = choc_stddev
    self.kale_mean = kale_mean
    self.kale_stddev = kale_stddev

    ## State variables
    ##############################
    self.net_kaleness_exposure = net_kaleness_exposure
    self.satisfaction = 1 / (1 + np.exp(-sensitivity * net_kaleness_exposure))
    self.time_budget = time_budget

    # Noise
    self._observation_noise = observation_noise_stddev

  def create_observation(self):
    """Returns a noisy observation of the user's satisfaction."""
    clip_low, clip_high = (-1.0 / (1.0 * self._observation_noise),
                           1.0 / (1.0 * self._observation_noise))
    noise = stats.truncnorm(
        clip_low, clip_high, loc=0.0, scale=self._observation_noise).rvs()
    noisy_sat = self.satisfaction + noise
    return np.array([noisy_sat,])

  @staticmethod
  def observation_space():
    return spaces.Box(shape=(1,), dtype=np.float32, low=-2.0, high=2.0)
  
  # scoring function: the user is more likely to click on more chocolatey content.
  def score_document(self, doc_obs):
    return 1 - doc_obs
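
The satisfaction formula in `LTSUserState.__init__` is a logistic function of `net_kaleness_exposure`. The sketch below evaluates it on its own, using only the standard library so it runs without RecSim installed. The helper name `satisfaction` and the three exposure values are illustrative assumptions; the `sensitivity=0.01` default is taken from the user sampler defined in this notebook.

```python
import math


def satisfaction(net_kaleness_exposure, sensitivity=0.01):
    """Logistic map from net kaleness exposure to a satisfaction in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-sensitivity * net_kaleness_exposure))


# Illustrative exposures only; they are not taken from a simulation run.
for nke in (-5.0, 0.0, 5.0):
    print(f"net_kaleness_exposure={nke:+.1f} -> satisfaction={satisfaction(nke):.4f}")
```

With a sensitivity this small, satisfaction stays close to 0.5 across this range, so changes in engagement driven by satisfaction are gradual.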

In [7]:
#Create a user sampler

class LTSStaticUserSampler(user.AbstractUserSampler):
  _state_parameters = None

  def __init__(self,
               user_ctor=LTSUserState,
               memory_discount=0.9,
               sensitivity=0.01,
               innovation_stddev=0.05,
               choc_mean=5.0,
               choc_stddev=1.0,
               kale_mean=4.0,
               kale_stddev=1.0,
               time_budget=60,
               **kwargs):
    self._state_parameters = {'memory_discount': memory_discount,
                              'sensitivity': sensitivity,
                              'innovation_stddev': innovation_stddev,
                              'choc_mean': choc_mean,
                              'choc_stddev': choc_stddev,
                              'kale_mean': kale_mean,
                              'kale_stddev': kale_stddev,
                              'time_budget': time_budget
                             }
    super(LTSStaticUserSampler, self).__init__(user_ctor, **kwargs)

  def sample_user(self):
    starting_nke = ((self._rng.random_sample() - .5) *
                    (1 / (1.0 - self._state_parameters['memory_discount'])))
    self._state_parameters['net_kaleness_exposure'] = starting_nke
    return self._user_ctor(**self._state_parameters)
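
The starting exposure in `sample_user` is a uniform draw shifted to be centred on zero and scaled by `1 / (1 - memory_discount)`. A minimal sketch of that arithmetic follows, assuming the default `memory_discount=0.9`. It uses Python's `random.Random` in place of the sampler's `self._rng`, so the individual draws will differ from those in the notebook; the helper name `sample_starting_nke` and the seed are hypothetical.

```python
import random


def sample_starting_nke(rng, memory_discount=0.9):
    """Mirror the starting_nke arithmetic from LTSStaticUserSampler.sample_user."""
    return (rng.random() - 0.5) * (1.0 / (1.0 - memory_discount))


rng = random.Random(0)  # hypothetical seed, not the notebook's RNG
draws = [sample_starting_nke(rng) for _ in range(10_000)]
print(f"min={min(draws):.3f} max={max(draws):.3f}")
```

With `memory_discount=0.9` the scale factor is about 10, so starting exposures fall roughly in the interval from -5 to 5.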

In [8]:
#Create a user response model

class LTSResponse(user.AbstractResponse):
  # The maximum degree of engagement.
  MAX_ENGAGEMENT_MAGNITUDE = 100.0

  def __init__(self, clicked=False, engagement=0.0):
    self.clicked = clicked
    self.engagement = engagement

  def create_observation(self):
    return {'click': int(self.clicked), 'engagement': np.array(self.engagement)}

  @classmethod
  def response_space(cls):
    return spaces.Dict({
        'click':
            spaces.Discrete(2),
        'engagement':
            spaces.Box(
                low=0.0,
                high=cls.MAX_ENGAGEMENT_MAGNITUDE,
                shape=tuple(),
                dtype=np.float32)
    })

In [9]:
#Create functions that maintain user state, evolve user state as a result of recommendations, and generate a response to a slate of recommendations

def user_init(self,
              slate_size,
              seed=0):

  super(LTSUserModel,
        self).__init__(LTSResponse,
                       LTSStaticUserSampler(LTSUserState,
                                            seed=seed), slate_size)
  self.choice_model = MultinomialLogitChoiceModel({})

def simulate_response(self, slate_documents):
  # List of empty responses
  responses = [self._response_model_ctor() for _ in slate_documents]
  # Get click from the choice model.
  self.choice_model.score_documents(
    self._user_state, [doc.create_observation() for doc in slate_documents])
  scores = self.choice_model.scores
  selected_index = self.choice_model.choose_item()
  # Populate clicked item.
  self._generate_response(slate_documents[selected_index],
                          responses[selected_index])
  return responses

def generate_response(self, doc, response):
  response.clicked = True
  # linear interpolation between choc and kale.
  engagement_loc = (doc.kaleness * self._user_state.choc_mean
                    + (1 - doc.kaleness) * self._user_state.kale_mean)
  engagement_loc *= self._user_state.satisfaction
  engagement_scale = (doc.kaleness * self._user_state.choc_stddev
                      + ((1 - doc.kaleness)
                          * self._user_state.kale_stddev))
  log_engagement = np.random.normal(loc=engagement_loc,
                                    scale=engagement_scale)
  response.engagement = np.exp(log_engagement)

def update_state(self, slate_documents, responses):
  for doc, response in zip(slate_documents, responses):
    if response.clicked:
      innovation = np.random.normal(scale=self._user_state.innovation_stddev)
      net_kaleness_exposure = (self._user_state.memory_discount
                                * self._user_state.net_kaleness_exposure
                                - 2.0 * (doc.kaleness - 0.5)
                                + innovation
                              )
      self._user_state.net_kaleness_exposure = net_kaleness_exposure
      satisfaction = 1 / (1.0 + np.exp(-self._user_state.sensitivity
                                        * net_kaleness_exposure)
                          )
      self._user_state.satisfaction = satisfaction
      self._user_state.time_budget -= 1
      return

def is_terminal(self):
  """Returns a boolean indicating if the session is over."""
  return self._user_state.time_budget <= 0
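
To see where the recursion in `update_state` leads, the sketch below iterates the same update with the innovation noise set to zero and the clicked document's kaleness held fixed. The function names `next_exposure` and `run`, the step count, and the zero starting exposure are assumptions made for illustration. Setting the update equal to its input gives a fixed point of `-2 * (kaleness - 0.5) / (1 - memory_discount)`.

```python
def next_exposure(nke, kaleness, memory_discount=0.9, innovation=0.0):
    """One step of the net_kaleness_exposure update from update_state."""
    return memory_discount * nke - 2.0 * (kaleness - 0.5) + innovation


def run(kaleness, steps=200, start=0.0):
    """Repeat the update with a fixed kaleness and no noise."""
    nke = start
    for _ in range(steps):
        nke = next_exposure(nke, kaleness)
    return nke


always_high = run(kaleness=1.0)  # every clicked document has kaleness 1
always_low = run(kaleness=0.0)   # every clicked document has kaleness 0
always_mid = run(kaleness=0.5)   # the kaleness term cancels exactly

print(f"kaleness=1.0 -> {always_high:.4f}")
print(f"kaleness=0.5 -> {always_mid:.4f}")
print(f"kaleness=0.0 -> {always_low:.4f}")
```

Under the update as written in this notebook, a constant kaleness of 1 drives the exposure toward -10 and a constant kaleness of 0 drives it toward +10. A 60-step session (the default `time_budget`) is long enough to get close to these limits, since 0.9 raised to the 60th power is already below 0.002.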

In [10]:
#Put everything together in a User Model

LTSUserModel = type("LTSUserModel", (user.AbstractUserModel,),
                    {"__init__": user_init,
                     "is_terminal": is_terminal,
                     "update_state": update_state,
                     "simulate_response": simulate_response,
                     "_generate_response": generate_response})
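
The user model picks the clicked document through `MultinomialLogitChoiceModel`, scoring each document with `score_document` (that is, `1 - kaleness`). The sketch below is a stand-alone approximation of that choice step. It assumes the choice model turns scores into click probabilities with a softmax and that no probability mass goes to a no-click option; both are assumptions here, not details read from the RecSim source. The function name `click_probabilities` and the slate values are hypothetical.

```python
import math


def click_probabilities(kaleness_values):
    """Softmax over scores of 1 - kaleness (assumed multinomial logit form)."""
    scores = [1.0 - k for k in kaleness_values]
    max_score = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - max_score) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


slate = [0.1, 0.4, 0.7, 0.9]  # hypothetical kaleness values for a slate of 4
probs = click_probabilities(slate)
for kaleness, p in zip(slate, probs):
    print(f"kaleness={kaleness:.1f} -> click probability={p:.3f}")
```

Because scores only range over [0, 1], the resulting probabilities are fairly flat: lower-kaleness documents are favoured, but every document in the slate keeps a substantial chance of being clicked.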

#### Section 1c: Finally, put all the components together in an LTS environment

In [11]:
slate_size = 4
num_candidates = 10
ltsenv = environment.Environment(
    LTSUserModel(slate_size),
    LTSDocumentSampler(),
    num_candidates,
    slate_size,
    resample_documents=True)

### Section 2 : Create and Train Agent

#### Section 2a: Define LTS Gym Environment

In [12]:
# We'll need a reward function to create our final lts environment

def clicked_engagement_reward(responses):
  reward = 0.0
  for response in responses:
    if response.clicked:
      reward += response.engagement
  return reward
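
A quick way to check the reward function without building the environment is to call it on stand-in response objects. The sketch below repeats the function so it is self-contained and uses `types.SimpleNamespace` objects as hypothetical substitutes for `LTSResponse`; the engagement values are made up for illustration.

```python
from types import SimpleNamespace


def clicked_engagement_reward(responses):
    """Sum the engagement of every clicked response in the slate."""
    reward = 0.0
    for response in responses:
        if response.clicked:
            reward += response.engagement
    return reward


# Hypothetical slate of 4 responses in which only the second one was clicked.
slate_responses = [
    SimpleNamespace(clicked=False, engagement=0.0),
    SimpleNamespace(clicked=True, engagement=7.5),
    SimpleNamespace(clicked=False, engagement=0.0),
    SimpleNamespace(clicked=False, engagement=0.0),
]
print(clicked_engagement_reward(slate_responses))
```

Since `simulate_response` marks at most one document per slate as clicked, the reward for a step is simply the engagement of that one document.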

In [13]:
#Use the OpenAI Gym Wrapper to create an LTS Gym Environment

lts_gym_env = recsim_gym.RecSimGymEnv(ltsenv, clicked_engagement_reward)

#### Section 2b: Create Agent

In [14]:
#Creating a Full Slate Q Agent

def create_agent(sess, environment, eval_mode, summary_writer=None):
  kwargs = {
      'observation_space': environment.observation_space,
      'action_space': environment.action_space,
      'summary_writer': summary_writer,
      'eval_mode': eval_mode,
  }
  return full_slate_q_agent.FullSlateQAgent(sess, **kwargs)

#### Section 2c: Train Agent

In [15]:
# Set Seed and Environment Configurations

seed = 0
np.random.seed(seed)
env_config = {
  'num_candidates': 10,
  'slate_size': 4,
  'resample_documents': True,
  'seed': seed,
  }
tmp_base_dir = '/tmp/recsim/'

In [16]:
# Train Full Slate Q Agent on LTS Gym Environment

runner = runner_lib.TrainRunner(
    base_dir=tmp_base_dir,
    create_agent_fn=create_agent,
    env=lts_gym_env,
    episode_log_file="",
    max_training_steps=50,
    num_iterations=25)
runner.run_experiment()

INFO:tensorflow:max_training_steps = 50, number_iterations = 25,checkpoint frequency = 1 iterations.
INFO:tensorflow:max_steps_per_episode = 27000

Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor

INFO:tensorflow:Beginning training...
INFO:tensorflow:Starting iteration 0
INFO:tensorflow:Starting iteration 1
INFO:tensorflow:Starting iteration 2
INFO:tensorflow:Starting iteration 3
INFO:tensorflow:Starting iteration 4

Instructions for updating:
Use standard file APIs to delete files with this prefix.

INFO:tensorflow:Starting iteration 5
INFO:tensorflow:Starting iteration 6
INFO:tensorflow:Starting iteration 7
INFO:tensorflow:Starting iteration 8
INFO:tensorflow:Starting iteration 9
INFO:tensorflow:Starting iteration 10
INFO:tensorflow:Starting iteration 11
INFO:tensorflow:Starting iteration 12
INFO:tensorflow:Starting iteration 13
INFO:tensorflow:Starting iteration 14
INFO:tensorflow:Starting iteration 15
INFO:tensorflow:Starting iteration 16
INFO:tensorflow:Starting iteration 17
INFO:tensorflow:Starting iteration 18
INFO:tensorflow:Starting iteration 19
INFO:tensorflow:Starting iteration 20
INFO:tensorflow:Starting iteration 21
INFO:tensorflow:Starting iteration 22
INFO:tensorflow:Starting iteration 23
INFO:tensorflow:Starting iteration 24

### Section 3: Evaluate Agent Performance with Tensorboard

In [17]:
# Evaluate Full Slate Q Agent

runner = runner_lib.EvalRunner(
    base_dir=tmp_base_dir,
    create_agent_fn=create_agent,
    env=lts_gym_env,
    max_eval_episodes=50,
    test_mode=True)

runner.run_experiment()

INFO:tensorflow:max_eval_episodes = 50
INFO:tensorflow:max_steps_per_episode = 27000
INFO:tensorflow:Beginning evaluation...
INFO:tensorflow:Restoring parameters from /tmp/recsim/train/checkpoints/tf_ckpt-24
INFO:tensorflow:eval_file: /tmp/recsim/eval_50/returns_1500


In [18]:
!pip install tensorboard

ERROR! Session/line number was not unique in database. History logging moved to new session 631
The folder you are executing pip from can no longer be found.


In [None]:
#View Results on Tensorboard
%tensorboard --logdir=/tmp/recsim/

Launching TensorBoard...

ERROR:root:Internal Python error in the inspect module.
Below is the traceback from this internal error.



Traceback (most recent call last):
  File "/Users/jasmineyoung/opt/anaconda3/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3343, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-19-9f838b8dabc7>", line 2, in <module>
    get_ipython().run_line_magic('tensorboard', '--logdir=/tmp/recsim/')
  File "/Users/jasmineyoung/opt/anaconda3/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 2326, in run_line_magic
    result = fn(*args, **kwargs)
  File "/Users/jasmineyoung/opt/anaconda3/lib/python3.8/site-packages/tensorboard/notebook.py", line 117, in _start_magic
    return start(line)
  File "/Users/jasmineyoung/opt/anaconda3/lib/python3.8/site-packages/tensorboard/notebook.py", line 152, in start
    start_result = manager.start(parsed_args)
  File "/Users/jasmineyoung/opt/anaconda3/lib/python3.8/site-packages/tensorboard/manager.py", line 401, in start
    working_directory=os.getcwd(),
FileNotFoundError: [Errno 

#### Results & Conclusions