# Complete guide

## Introduction

This Notebook contains an overview of the basic functionality of the simulator. It introduces the simplest ways to get started and then moves on to more advanced concepts that illustrate the flexibility of the system. By the end of this guide, you should be able to run the pre-loaded simulations with custom parameters.

## Main components
A simulation needs the following components:

- **Users**: agents who interact with each other and with items.
- **Model**: agent that defines the behavior of the sociotechnical system. The model mediates the interactions among users and between users and the system.
- **Items**: passive components that are served to the users by the model.
- **Measurements**: modules built into the models which automatically compute information about the system.

## Dynamics
The following steps are at the heart of all simulations:
1. The **model** presents the **users** with some recommended **items**. In general, the items are chosen to maximize the probability of user engagement. This probability is based on the model's _prediction_ of user preferences.
2. The **users** view the items presented by the **model** and interact with some **items** according to their _actual_ preferences.
3. The **model** updates its system state (such as the prediction of user preferences) based on the interactions of **users** with **items**, and it takes some **measurements**.
3. The **model** updates its system state (such as the prediction of user preferences) based on the interactions of **users** with **items**, and it takes some **measurements**.

We will see that this framework is flexible enough to generalize many classic models.
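The three steps above can be sketched as a plain NumPy loop. This is a toy illustration under stated assumptions, not the framework's implementation. It assumes that scores are dot products of profiles and item attributes, that each user picks the recommended item with the highest _actual_ score, and that the model updates its predicted profiles by counting the attributes of the chosen items. The names `k`, `actual_profiles` and `predicted_profiles` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
num_users, num_items, num_attributes = 4, 8, 3
k = 2          # hypothetical: items recommended to each user per step
timesteps = 5

# Hypothetical ground truth: the users' actual preferences, hidden from the model
actual_profiles = rng.random((num_users, num_attributes))
item_attributes = rng.integers(0, 2, size=(num_attributes, num_items))
actual_scores = actual_profiles @ item_attributes

# The model starts with no knowledge of the users (all-zero profiles)
predicted_profiles = np.zeros((num_users, num_attributes))
users = np.arange(num_users)

for t in range(timesteps):
    # 1. Recommend the k items with the highest *predicted* score for each user
    predicted_scores = predicted_profiles @ item_attributes
    recommended = np.argsort(-predicted_scores, axis=1)[:, :k]
    # 2. Each user interacts with the recommended item they *actually* like most
    best = np.argmax(actual_scores[users[:, None], recommended], axis=1)
    chosen = recommended[users, best]
    # 3. Update the prediction by adding the attributes of the chosen items
    predicted_profiles += item_attributes[:, chosen].T

print(predicted_profiles)
```

In this sketch, the model never sees `actual_profiles`. It learns only from which items users chose, which is the separation between _predicted_ and _actual_ preferences described in the steps above.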

## Quick start: instantiate a model and run
The fastest way to get started is to choose a model, instantiate it with no parameters, and run it for some time steps. Here we run a simple content filtering recommendation system.

Content filters infer information about the _attributes_ of users based on their past interactions and recommend items with similar attributes to those of users.

```python
from rec.models import ContentFiltering

# Create a ContentFiltering instance without arguments
default_filtering = ContentFiltering()
# Run for 5 time steps
default_filtering.run(timesteps=5)
```

```
100%|██████████| 5/5 [00:00<00:00, 18.94it/s]
```


```python
import pandas as pd

# Collect the results of the measurements
results = default_filtering.get_measurements()
print("Results of the simulation:")
pd.DataFrame(results)
```

```
Results of the simulation:
```

|   | MSE | Timesteps |
|---|---|---|
| 0 | NaN | 0 |
| 1 | 162.99763100693693 | 1 |
| 2 | 162.761 | 2 |
| 3 | 162.761 | 3 |
| 4 | 162.76 | 4 |
| 5 | 162.759 | 5 |
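The table suggests that the measurements come back as one list per metric, with one entry per time step. The sketch below assumes `get_measurements()` returns a dict with that layout. It rebuilds the same values by hand so it runs without the framework, and shows a few convenient pandas operations on the result.

```python
import pandas as pd

# Hypothetical dict with the same layout as the table above: one list per
# metric, one entry per time step, and None before the simulation starts
results = {
    "MSE": [None, 162.99763100693693, 162.761, 162.761, 162.76, 162.759],
    "Timesteps": [0, 1, 2, 3, 4, 5],
}
df = pd.DataFrame(results).set_index("Timesteps")

# None becomes NaN, which pandas skips in aggregations such as min()
print(df["MSE"].isna().sum())  # 1
print(df["MSE"].min())         # 162.759
# Change in MSE from one time step to the next
print(df["MSE"].diff())
```

Using `Timesteps` as the index makes later joins or plots line up by time step rather than by row position.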


In what follows, we expand on this minimal example to explore what happens under the hood. We offer suggestions on how to exploit the power of the framework and build on the pre-loaded models.

## Models

As in the Quick start section, the only thing you need to run a simulation is the model you want to run. A number of pre-loaded models work out of the box. We continue to use a generic content filtering recommendation system.

Recall that content filters infer information about the _attributes_ of users based on their past interactions and recommend items with similar attributes to those of users.

```python
# Again, we instantiate the model with no parameters
default_filtering = ContentFiltering()
```

In the cell above, we instantiated a content filtering recommender system with default parameters. We print below the default number of users and items in the system.

```python
print("Number of users in system: %d" % default_filtering.num_users)
print("Number of items in system: %d" % default_filtering.num_items)
```

```
Number of users in system: 100
Number of items in system: 1250
```


The model also creates a representation for both users and items.

```python
print("In content filtering, the default parameters are given by:")
print("- An all-zeros matrix of users of size %s." % str(default_filtering.user_profiles.shape))
print("- A randomly generated matrix of items of size %s." % str(default_filtering.item_attributes.shape))
```

```
In content filtering, the default parameters are given by:
- An all-zeros matrix of users of size (100, 1000).
- A randomly generated matrix of items of size (1000, 1250).
```


Formally, content filtering requires user profiles of size `num_users x num_attributes` and item attributes of size `num_attributes x num_items`.
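The shared `num_attributes` dimension is what makes the two matrices compatible. Their product has one row per user and one column per item. The sketch below assumes a plain dot-product score and a top-2 recommendation, both chosen only for illustration. The actual scoring in `ContentFiltering` may differ, for example by normalizing the scores.

```python
import numpy as np

rng = np.random.default_rng(1)
num_users, num_attributes, num_items = 3, 4, 5

# user_profiles: num_users x num_attributes (interaction counts)
user_profiles = rng.integers(0, 4, size=(num_users, num_attributes))
# item_attributes: num_attributes x num_items (binary attributes)
item_attributes = rng.integers(0, 2, size=(num_attributes, num_items))

# (num_users x num_attributes) @ (num_attributes x num_items) -> num_users x num_items
scores = user_profiles @ item_attributes
# For each user, the indices of the 2 highest-scoring items
top_items = np.argsort(-scores, axis=1)[:, :2]
print(scores.shape, top_items.shape)  # (3, 5) (3, 2)
```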

### Set number of users or items
We can customize the number of users in the system:

```python
# Instantiate the content filter with a different number of users
number_of_users = 500
filtering = ContentFiltering(num_users=number_of_users)
print("The number of users in the system is now %d." % filtering.num_users)
print("The number of items in the system is still %d." % filtering.num_items)
```

```
The number of users in the system is now 500.
The number of items in the system is still 1250.
```


Or the number of items:

```python
# Instantiate with a different number of items
number_of_items = 5000
filtering = ContentFiltering(num_items=number_of_items)
print("The number of items in the system is now %d." % filtering.num_items)
print("The number of users in the system is back to %d." % filtering.num_users)
```

```
The number of items in the system is now 5000.
The number of users in the system is back to 100.
```


Or both:

```python
# Instantiate with a different number of items and users
number_of_items = 5000
number_of_users = 500
filtering = ContentFiltering(num_items=number_of_items, num_users=number_of_users)
print("The number of items in the system is now %d." % filtering.num_items)
print("The number of users in the system is now %d." % filtering.num_users)
```

```
The number of items in the system is now 5000.
The number of users in the system is now 500.
```


Note that the representations of items and users are set accordingly:

```python
print("The size of item_attributes is %s." % str(filtering.item_attributes.shape))
print("The size of user_profiles is %s." % str(filtering.user_profiles.shape))
```

```
The size of item_attributes is (1000, 5000).
The size of user_profiles is (500, 1000).
```


## Users and Items
We might also want to define our own representation of users and items. We can do so by defining matrices that satisfy the constraints of the model. The constraints for ContentFiltering (some of which have been mentioned above) are:

- User profiles must be of size `num_users x num_attributes`.
- User profiles are integer matrices that count interactions: `user_profiles[i, j]` is the number of interactions user `i` had with items that have attribute `j`.
- Item attributes must be of size `num_attributes x num_items`.
- The model imposes no other constraint on item attributes. If `item_attributes` is binary, its `[i, j]`th element is 1 if item `j` has attribute `i` and 0 otherwise. Item attributes can also be real-valued, representing the probability that each attribute describes each item.
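With binary item attributes, the definition of `user_profiles[i, j]` implies a simple update rule. Each interaction of user `i` with item `j` adds that item's attribute column to row `i` of the profile. The sketch below illustrates this rule together with the shape constraint. Whether `ContentFiltering` updates profiles in exactly this way is an assumption. `check_shapes` and `record_interaction` are hypothetical helpers, not part of the framework.

```python
import numpy as np

# Binary item attributes: item_attributes[i, j] == 1 if item j has attribute i
item_attributes = np.array([[1, 0, 1],
                            [0, 1, 1]])  # 2 attributes x 3 items
user_profiles = np.zeros((2, item_attributes.shape[0]), dtype=int)  # 2 users

def check_shapes(user_profiles, item_attributes):
    """Hypothetical check: the attribute dimensions must agree."""
    if user_profiles.shape[1] != item_attributes.shape[0]:
        raise ValueError("user_profiles has %d attributes, item_attributes has %d"
                         % (user_profiles.shape[1], item_attributes.shape[0]))

def record_interaction(user_profiles, item_attributes, user, item):
    """Hypothetical helper: add one interaction of `user` with `item` in place."""
    user_profiles[user] += item_attributes[:, item]

check_shapes(user_profiles, item_attributes)
record_interaction(user_profiles, item_attributes, user=0, item=2)
record_interaction(user_profiles, item_attributes, user=0, item=0)
print(user_profiles)
# [[2 1]
#  [0 0]]
```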

If you're already familiar with NumPy: the model is compatible with `ndarray`s and other _array_like_ data structures.

If you're not familiar with NumPy: the framework provides a random number generator, a thin wrapper around `numpy.random.Generator`, that lets you draw from several distributions. Please refer to the NumPy documentation for a [list of distributions](https://numpy.org/doc/stable/reference/random/generator.html?highlight=generator#distributions).
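If you need reproducible draws, one option is to seed a NumPy `Generator` directly. The sketch below uses `numpy.random.default_rng` rather than the framework's `rec.random.Generator`. This guide does not say whether that wrapper accepts a seed, so check its documentation before relying on it. The sketch also shows real-valued attributes, which the constraints above allow.

```python
import numpy as np

# A seeded NumPy Generator makes the draws reproducible
rng = np.random.default_rng(seed=42)
# Interaction counts in [0, 4) for 5 users and 10 attributes
user_representation = rng.integers(0, 4, size=(5, 10))
# Binary item attributes: a binomial draw with n=1 is either 0 or 1
binary_items = rng.binomial(n=1, p=0.5, size=(10, 15))
# Real-valued item attributes in [0, 1), e.g. probabilities
real_items = rng.random(size=(10, 15))

# Re-creating the generator with the same seed repeats the first draw
repeat = np.random.default_rng(seed=42).integers(0, 4, size=(5, 10))
print(np.array_equal(user_representation, repeat))  # True
```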

```python
import numpy as np
from rec.random import Generator

# Keep the dimensions small for ease of visualization
number_of_users = 5
number_of_attributes = 10
number_of_items = 15
# Define user_representation with NumPy's standard integer generator.
# Each entry is a number of interactions with an attribute, drawn from [0, 4).
user_representation = np.random.randint(4, size=(number_of_users, number_of_attributes))

# Define item_representation with the Generator that comes with the framework.
# A binomial draw with n=1 gives a binary matrix (each entry is 1 with probability .5).
item_representation = Generator().binomial(n=1, p=.5,
                                           size=(number_of_attributes, number_of_items))
# Note that this is equivalent to:
# item_representation = np.random.Generator(np.random.MT19937()).binomial(n=1, p=.5, size=(...))

print("User representation:\n%s\n" % (str(user_representation)))
print("Item representation:\n%s" % (str(item_representation)))
```

```
User representation:
[[3 3 3 1 2 1 2 0 1 0]
 [1 0 1 1 2 1 2 2 2 2]
 [2 0 2 2 1 2 3 1 0 2]
 [1 1 3 3 0 3 1 1 2 0]
 [0 3 3 2 2 1 2 1 3 2]]

Item representation:
[[0 0 0 1 1 0 0 1 0 1 1 1 0 1 0]
 [1 0 1 1 1 0 0 1 0 0 0 1 0 1 1]
 [1 1 0 1 0 1 1 0 0 0 0 0 0 0 1]
 [1 0 0 1 0 0 0 0 1 0 1 0 1 0 1]
 [0 1 1 1 0 0 1 1 1 0 1 1 0 1 1]
 [1 0 0 0 1 0 0 1 1 1 1 0 1 1 0]
 [0 1 1 1 0 1 1 1 0 0 0 0 1 1 0]
 [1 1 0 1 0 1 1 0 0 1 1 0 0 0 0]
 [0 1 0 1 1 1 1 1 0 0 0 0 0 0 0]
 [0 0 1 1 1 0 1 1 0 1 0 1 0 0 0]]
```


```python
# Initialize with custom representations
filtering = ContentFiltering(user_representation=user_representation,
                            item_representation=item_representation)

# Check if they're equivalent
is_user_equivalent = "yes" if np.array_equal(user_representation, filtering.user_profiles) else "no"
is_item_equivalent = "yes" if np.array_equal(item_representation, filtering.item_attributes) else "no"
print("Is user_profiles equivalent to user_representation? %s." % is_user_equivalent)
print("Is item_attributes equivalent to item_representation? %s." % is_item_equivalent)
```

```
Is user_profiles equivalent to user_representation? yes.
Is item_attributes equivalent to item_representation? yes.
```


You can also initialize a model with only one of `user_representation` and `item_representation`. In this case, the representation you did not provide is generated with a matching number of attributes. Its other dimension (number of items or number of users) takes the default value.

```python
# Let's only initialize user_profiles
filtering = ContentFiltering(user_representation=user_representation)
print("After initializing user_profiles, the size of item_attributes adapts automatically to it.")
print("Size of user_profiles, as defined above: %s." % str(filtering.user_profiles.shape))
print("Size of item_attributes: %s.\n" % str(filtering.item_attributes.shape))

# The same happens by only initializing item_attributes
filtering = ContentFiltering(item_representation=item_representation)
print("After initializing item_attributes, the size of user_profiles adapts automatically to it.")
print("Size of item_attributes, as defined above: %s." % str(filtering.item_attributes.shape))
print("Size of user_profiles: %s." % str(filtering.user_profiles.shape))
```

```
After initializing user_profiles, the size of item_attributes adapts automatically to it.
Size of user_profiles, as defined above: (5, 10).
Size of item_attributes: (10, 1250).

After initializing item_attributes, the size of user_profiles adapts automatically to it.
Size of item_attributes, as defined above: (10, 15).
Size of user_profiles: (100, 10).
```
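The sizes printed above follow a simple rule. The representation you pass fixes `num_attributes`, and the missing one combines that `num_attributes` with the default number of users (100) or items (1250) shown earlier in this guide. The hypothetical helper `infer_shapes` below only reproduces this sizing rule. It is not the framework's code.

```python
# Defaults printed earlier in this guide for ContentFiltering()
DEFAULT_NUM_USERS = 100
DEFAULT_NUM_ITEMS = 1250

def infer_shapes(user_shape=None, item_shape=None):
    """Hypothetical helper mimicking the sizes shown above.

    The given representation fixes num_attributes; the missing one takes
    that num_attributes plus the default number of users or items.
    """
    if user_shape is not None and item_shape is None:
        num_users, num_attributes = user_shape
        return (num_users, num_attributes), (num_attributes, DEFAULT_NUM_ITEMS)
    if item_shape is not None and user_shape is None:
        num_attributes, num_items = item_shape
        return (DEFAULT_NUM_USERS, num_attributes), (num_attributes, num_items)
    raise ValueError("pass exactly one of user_shape and item_shape")

print(infer_shapes(user_shape=(5, 10)))   # ((5, 10), (10, 1250))
print(infer_shapes(item_shape=(10, 15)))  # ((100, 10), (10, 15))
```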


## Run a simulation
We can run a simulation for the predefined number of time steps (50), or define our own duration.

```python
# Initialize a model with both user_representation and item_representation defined above
filtering = ContentFiltering(user_representation=user_representation,
                            item_representation=item_representation)
# Run the model for the predefined number of time steps
filtering.run()
```

```
100%|██████████| 50/50 [00:00<00:00, 340.09it/s]
```


At the end of the simulation, we can examine the results of the measurements. For example:

```python
import pandas as pd

# Get the measurements for all time steps up to 50
measurements = filtering.get_measurements()

# Measurements can be easily converted to pandas DataFrame objects
pd.DataFrame(measurements).head()
```

|   | MSE | homogeneity | Timesteps |
|---|---|---|---|
| 0 | NaN | NaN | 0 |
| 1 | 32.091746147208134 | -98.0 | 1 |
| 2 | 31.8499 | -0.5 | 2 |
| 3 | 31.8495 | 0.5 | 3 |
| 4 | 31.8497 | 0.5 | 4 |


## Measurements
At each time step of the simulation, measurements calculate a quantity based on the system state. One example is the mean squared error (MSE) between the predicted user profiles and the actual user profiles. It quantifies how close the model is to predicting the users' real preferences.
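For reference, the textbook mean squared error averages the squared differences over all entries of two equally shaped arrays. This guide does not specify whether `MSEMeasurement` normalizes its inputs first, so the function below is only a sketch of the formula.

```python
import numpy as np

def mse(predicted, actual):
    """Mean squared error over all entries of two equally shaped arrays."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return float(np.mean((predicted - actual) ** 2))

# Differences are 0, 2, 0, 4 -> squares 0, 4, 0, 16 -> mean 5.0
print(mse([[1, 2], [3, 4]], [[1, 0], [3, 0]]))  # 5.0
```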

It's easy to define new metrics, but in this guide we will use one of the pre-loaded metrics to get a better sense of how they work. For a list of pre-loaded metrics and their full descriptions, see the docs [LINK].
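This guide does not show the interface for custom metrics, so the class below is a hypothetical, framework-independent sketch of the idea. A metric keeps one value per time step and records `None` for time step 0, when nothing has been measured yet. That matches the empty first row in the tables of this guide. `MeanPredictedScore` and its `measure` method are illustrative names, not part of the `rec` API.

```python
import numpy as np

class MeanPredictedScore:
    """Hypothetical custom metric, NOT the framework's measurement base class.

    Records one value per time step; time step 0 is None because nothing
    has been predicted before the simulation starts.
    """

    def __init__(self, name="mean_predicted_score"):
        self.name = name
        self.history = [None]  # entry for time step 0

    def measure(self, predicted_scores):
        # Average predicted score across all users and items
        self.history.append(float(np.mean(predicted_scores)))

metric = MeanPredictedScore()
metric.measure(np.array([[0.0, 0.5], [1.0, 0.5]]))
metric.measure(np.array([[1.0, 1.0], [1.0, 1.0]]))
print(metric.history)  # [None, 0.5, 1.0]
```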

### View metrics

First, note that the content filtering recommender system, with its default settings, tracks only one metric: the mean squared error for user profiles.

```python
# The metrics tracked by each model can be examined by printing the `metrics` list
print(filtering.metrics)
```

```
[<rec.metrics.measurement.MSEMeasurement object at 0x10a2344d0>]
```


### Add metrics
To **maintain compatibility with pandas**, we suggest **adding metrics only to instances of models that have not been run yet**. Otherwise, different measurements would start at different time steps, producing arrays of different lengths.
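Here is what goes wrong otherwise. pandas refuses to build a DataFrame from a dict of lists with unequal lengths. The measurement values below are hypothetical. They stand in for a metric that was added partway through a run and therefore has fewer entries than the others.

```python
import pandas as pd

# Hypothetical measurements: "homogeneity" was added partway through the run,
# so it has fewer entries than "MSE"
measurements = {
    "MSE": [None, 0.8, 0.3, 0.3],
    "homogeneity": [0.0, 0.0],
}

error = None
try:
    pd.DataFrame(measurements)
except ValueError as err:
    error = err
    print("pandas refused:", err)
```

Wrapping each list in a `pd.Series` avoids the error, but it does not solve the problem. pandas aligns each Series on positions starting from 0 and pads the end with NaN. The late metric's values would therefore appear at time steps 0 and 1 instead of the steps at which they were measured.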

We can instantiate a new model, add a new metric, and then run the model.

We will add the `HomogeneityMeasurement`, which provides a measure of the homogeneity of user interactions in the system as a whole.

```python
from rec.metrics import HomogeneityMeasurement

filtering = ContentFiltering(num_users=5, num_items=10, num_attributes=10)
# This method accepts a variable number of metrics
filtering.add_metrics(HomogeneityMeasurement())
print(filtering.metrics)
```

```
[<rec.metrics.measurement.MSEMeasurement object at 0x11c8b8550>, <rec.metrics.measurement.HomogeneityMeasurement object at 0x11c590fd0>]
```


```python
# Now we run the model
filtering.run(timesteps=5)
measurements = filtering.get_measurements()
pd.DataFrame(measurements)
```

```
100%|██████████| 5/5 [00:00<00:00, 77.41it/s]
```

|   | MSE | homogeneity | Timesteps |
|---|---|---|---|
| 0 | NaN | NaN | 0 |
| 1 | 0.7524791781120291 | -4.5 | 1 |
| 2 | 0.27224 | 0.0 | 2 |
| 3 | 0.274983 | 0.0 | 3 |
| 4 | 0.273922 | 0.0 | 4 |
| 5 | 0.269191 | 0.0 | 5 |


Measurements at time step 0 can be undefined (`None`, `NaN`, etc.) because time step 0 denotes the state before the simulation starts. MSE is undefined at the beginning because the system has not yet made any predictions of the user profiles. Similarly, homogeneity is meaningless before the simulation begins because there are no user interactions to measure.

## System state


Some applications might require storing the system's internal state for future processing. This is useful, for example, to study the evolution of predicted user profiles.

```python
system_state = filtering.get_system_state()
# The system state is not always easy to convert into a pandas DataFrame,
# so we start by listing its keys
print(system_state.keys())
```

```
dict_keys(['Actual user scores', 'Items', 'Predicted scores', 'Timesteps'])
```


```python
# Predicted scores as stored in the system state
stored_predicted_scores = system_state['Predicted scores']
# The model's current predicted scores (only this last expression is displayed)
filtering.predicted_scores
```

```
array([[0.65217391, 0.52173913, 0.60869565, 0.82608696, 0.34782609,
        0.39130435, 0.86956522, 0.82608696, 0.47826087, 0.73913043],
       [0.26470588, 0.79411765, 1.        , 0.20588235, 0.55882353,
        0.88235294, 0.41176471, 0.79411765, 0.35294118, 0.64705882],
       [0.4       , 0.8       , 1.        , 0.2       , 1.        ,
        0.6       , 0.4       , 0.8       , 0.2       , 0.6       ],
       [0.54545455, 0.31818182, 0.72727273, 0.54545455, 0.27272727,
        0.45454545, 0.81818182, 0.72727273, 0.86363636, 0.90909091],
       [0.28571429, 1.        , 0.85714286, 0.28571429, 0.57142857,
        0.71428571, 0.42857143, 0.57142857, 0.14285714, 0.57142857]])
```

### Other parameters: general list
We can also modify other parameters, specifically:
- `verbose` (default: `False`): if `True`, logs the main events in the system.
- `num_items_per_iter`
- `num_new_items`

### Specific to ContentFiltering