# Create your own models
This Notebook illustrates how to create your own models using the framework. By the end of this guide, you'll be able to run simulations with your own models. For a guide on how to create new metrics, please see [advanced-metrics](advanced-metrics.ipynb). In what follows, we assume you are familiar with the main concepts of the framework shown in [complete-guide](complete-guide.ipynb).

## Dynamics
Recall that the dynamics of the framework are expressed by the following steps:
> 1. The **model** presents the **users** with some recommended **items**. In general, the items are chosen such that they maximize the probability of user engagement. This probability is based on the model's _prediction_ of user preferences.
> 2. The **users** view the items presented by the **model**, and interact with some **items** according to some _actual_ preferences.
> 3. The **model** updates its system state (such as the prediction of user preferences) based on the interactions of **users** with **items**, and it takes some **measurements**.
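To make the three steps concrete, here is a toy, framework-independent sketch in plain NumPy. Every name in it (`actual_prefs`, the top-`k` selection, the profile-update rule) is a hypothetical illustration, not part of the `rec` API; the update rule in particular is arbitrary and only marks where step 3 happens.

```python
import numpy as np

# Toy simulation loop; none of these names belong to the rec framework.
rng = np.random.default_rng(seed=0)
num_users, num_items, num_attrs, k = 5, 8, 3, 2

actual_prefs = rng.random((num_users, num_attrs))    # real (hidden) user preferences
user_profiles = rng.random((num_users, num_attrs))   # the model's prediction of them
item_attributes = rng.random((num_attrs, num_items))

for t in range(3):
    # 1. The model scores every item for every user and recommends the top-k items.
    predicted_scores = user_profiles @ item_attributes             # num_users x num_items
    recommended = np.argsort(-predicted_scores, axis=1)[:, :k]     # num_users x k

    # 2. Each user interacts with the recommended item they actually like most.
    actual_scores = actual_prefs @ item_attributes
    best = np.argmax(np.take_along_axis(actual_scores, recommended, axis=1), axis=1)
    chosen = recommended[np.arange(num_users), best]               # one item per user

    # 3. The model updates its state from the interactions (arbitrary rule here)
    #    and takes a measurement: the MSE between predicted and actual scores.
    user_profiles += 0.1 * item_attributes[:, chosen].T
    mse = np.mean((predicted_scores - actual_scores) ** 2)
    print(f"step {t}: mse={mse:.3f}")
```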

## Skeleton code
The following code is the skeleton of a new model, `NewModel`:
```python
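from rec.models import BaseRecommender

class NewModel(BaseRecommender):
    def __init__(self, *args, **kwargs):
        # Initialize the attributes that BaseRecommender does not handle,
        # then call the parent constructor
        BaseRecommender.__init__(self, *args, **kwargs)

    def _update_internal_state(self, interactions):
        # Update the system state in place based on the users' interactions
        pass

    def train(self, user_profiles=None, item_attributes=None, normalize=False):
        # Compute a num_users x num_items matrix of predicted scores
        predicted_scores = ...
        return predicted_scores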
```

### `__init__`

1. The constructor must initialize a number of data structures to pass to the parent constructor. Importantly, it must initialize:
    - An array of user preferences as predicted by the system (`user_representation`)
    - An array of item attributes in the system (`item_representation`)
    - (Not yet supported) A representation of the actual user preferences, which the system does not use to make predictions (`actual_user_representation`)
    - The number of users in the system (`num_users`)
    - The number of items in the system (`num_items`)
    - The number of items presented to each user at each time step (`num_items_per_iter`)
    - (Optional) The metrics that the system should monitor (`measurements`)
    - (Optional) A list of system state components to monitor in addition to the "Observables" already present in the system (`system_state`; see [advanced-metrics](advanced-metrics.ipynb) for more details)
    - (Optional) A bool to toggle a log of the main events in the system (`verbose`)
    - (Optional) A seed for random number generators (`seed`)
 
2. The constructor must also initialize any class attribute that is _not_ already initialized by `BaseRecommender`. `BaseRecommender` initializes the following attributes (see the [docs](https://elucherini.github.io/algo-segregation/reference/models.html#module-models.recommender) and the source code for more details):
    - `user_profiles`, from `user_representation`
    - `item_attributes`, from `item_representation`
    - `predicted_scores`, calculated internally
    - `actual_users`, currently generated randomly internally; in the future, initialized from `actual_user_representation`
    - `num_users`, from `num_users`
    - `num_items`, from `num_items`
    - `num_items_per_iter`, from `num_items_per_iter`
    - `random_state`, using `seed`
    - `measurements`, monitoring all metrics in `measurements`, and, by default, the MSE between user profile predictions and actual user scores
    - `_system_state`, monitoring all "Observables" in `system_state`, and, by default: `user_profiles`, `item_attributes`, `predicted_scores`, and `actual_users`.
    - `verbose`, from `verbose`
    
   Any other class attribute (e.g., [infection state](https://elucherini.github.io/algo-segregation/reference/models.html#models.bass.InfectionState)) must be initialized by `NewModel`.
   
3. A current requirement is that `user_representation` and `item_representation` have shapes compatible with matrix multiplication (e.g., `num_users x 1` and `1 x num_items`, respectively). This requirement is necessary only if you use the default `train()` function in `BaseRecommender`, which can easily be overridden.
    
4. The constructor must then call the `BaseRecommender` constructor respecting its signature.
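Requirement 3 can be checked before calling the parent constructor. The helper below is a hypothetical sketch (`check_representations` is not part of `rec`) that verifies the inner dimensions match, which is what the default dot product in `train()` needs, and returns the shape of the resulting score matrix.

```python
import numpy as np

def check_representations(user_representation, item_representation):
    """Hypothetical helper: check that the default train() dot product is defined.

    Returns the shape of the resulting predicted-score matrix.
    """
    user_representation = np.asarray(user_representation)
    item_representation = np.asarray(item_representation)
    if user_representation.ndim != 2 or item_representation.ndim != 2:
        raise ValueError("both representations must be 2-D arrays")
    if user_representation.shape[1] != item_representation.shape[0]:
        raise ValueError(
            f"incompatible shapes {user_representation.shape} and "
            f"{item_representation.shape}: inner dimensions must match"
        )
    return (user_representation.shape[0], item_representation.shape[1])

# num_users x 1 and 1 x num_items, as in requirement 3
print(check_representations(np.ones((100, 1)), np.ones((1, 20))))  # (100, 20)
```

A model that overrides `train()` does not need this check; the Bass model below, for instance, uses a `num_users x num_users` adjacency matrix with `1 x num_items` item attributes, which would fail it.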

#### Concrete example
A possible implementation of the constructor, adapted from [BassModel](https://elucherini.github.io/algo-segregation/reference/models.html#module-models.bass) ([link to paper](https://5harad.com/papers/twiral.pdf)), is:

```python
# We define default values in the signature so we can call the constructor with no arguments
def __init__(self, num_users=100, num_items=1, infection_state=None,
    item_representation=None, user_representation=None, infection_thresholds=None,
    actual_user_scores=None, verbose=False, num_items_per_iter=1,
    seed=None):
    import numpy as np
    # We use the internal random number generator to generate the data structures
    # that have not been defined by the user.
    # For simplicity, we do not seed this generator, and we leave
    # actual_user_scores and system_state at their defaults.
    from rec.random import Generator
    generator = Generator()

    if item_representation is None:
        # in this model, items are represented by their probability of infection
        item_representation = generator.uniform(size=(1, num_items))
    if user_representation is None:
        # users are represented by a social network adjacency matrix
        from rec.random import SocialGraphGenerator
        user_representation = SocialGraphGenerator.generate_random_graph(n=num_users,
                                                                         p=0.3)
    # we consider a binary infection_state matrix: if element [u, i] is 1, then user u has been infected by item i
    if infection_state is None:
        # We start with one random user being infected
        infection_state = np.zeros((num_users, num_items))
        infected_users = generator.integers(num_users)
        infectious_items = generator.integers(num_items)
        infection_state[infected_users, infectious_items] = 1
    # this class attribute is not initialized in BaseRecommender, so we must do it here (notice the "self.")
    self.infection_state = infection_state

    if infection_thresholds is None:
        # these thresholds determine how likely each user is to become infected
        infection_thresholds = abs(generator.uniform(size=(1, num_users)))
    # another class attribute that is not initialized in BaseRecommender
    self.infection_thresholds = infection_thresholds
    # we want the model to monitor its "Structural Virality" (see paper cited above or dedicated Notebook)
    from rec.metrics import StructuralVirality
    measurements = [StructuralVirality(np.copy(infection_state))]
    # Initialize base class; system_state is not a parameter of this constructor, so we pass None
    BaseRecommender.__init__(self, user_representation, item_representation,
                             actual_user_scores, num_users, num_items,
                             num_items_per_iter,
                             measurements=measurements, system_state=None,
                             verbose=verbose, seed=seed)
```
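To see what the four data structures in this constructor look like, the sketch below builds stand-ins with plain NumPy instead of `rec.random`. This is an approximation under our own assumptions: the Erdős-Rényi `G(n, p)` construction is our choice for a random social graph and may not match what `SocialGraphGenerator.generate_random_graph` actually produces; only the shapes and meanings follow the constructor above.

```python
import numpy as np

# NumPy-only stand-ins; the notebook itself uses rec.random.Generator and
# SocialGraphGenerator, whose exact output may differ.
rng = np.random.default_rng(seed=42)
num_users, num_items, p = 6, 2, 0.3

# Items: probability of infection, shape 1 x num_items
item_representation = rng.uniform(size=(1, num_items))

# Users: symmetric 0/1 adjacency matrix of a G(n, p) random graph, no self-loops
upper = np.triu(rng.uniform(size=(num_users, num_users)) < p, k=1)
user_representation = (upper | upper.T).astype(float)

# infection_state[u, i] == 1 means user u has been infected by item i
infection_state = np.zeros((num_users, num_items))
infection_state[rng.integers(num_users), rng.integers(num_items)] = 1

# One threshold per user, shape 1 x num_users
infection_thresholds = rng.uniform(size=(1, num_users))

for name, arr in [("item_representation", item_representation),
                  ("user_representation", user_representation),
                  ("infection_state", infection_state),
                  ("infection_thresholds", infection_thresholds)]:
    print(name, arr.shape)
```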

### `train`
This function computes the predicted user scores: a `num_users x num_items` matrix that the system uses to decide which items to present to each user.

**Overriding it is not a required step**: `BaseRecommender` provides a default implementation that, in short, computes the dot product between `user_profiles` and `item_attributes`:

```python
predicted_scores = numpy.dot(user_profiles, item_attributes)
```
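A small worked example of this default behavior, with made-up numbers: three users described by two attributes, and four items described by the same two attributes. Each entry `[u, i]` of the result is the score the system predicts for user `u` and item `i`.

```python
import numpy as np

user_profiles = np.array([[1.0, 0.0],
                          [0.5, 0.5],
                          [0.0, 2.0]])                 # 3 users x 2 attributes
item_attributes = np.array([[1.0, 0.0, 3.0, 1.0],
                            [0.0, 1.0, 1.0, 2.0]])     # 2 attributes x 4 items

predicted_scores = np.dot(user_profiles, item_attributes)   # 3 users x 4 items
print(predicted_scores)

# Highest-scoring item for each user
print(predicted_scores.argmax(axis=1))  # [2 2 3]
```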

If you need different behavior, we recommend overriding `train()`. To do so:

1. Use this signature: `train(self, user_profiles, item_attributes, normalize)`
2. Return a `num_users x num_items` matrix

#### Concrete example

We override train in [BassModel](https://elucherini.github.io/algo-segregation/reference/models.html#module-models.bass):

```python
def train(self, user_profiles=None, item_attributes=None, normalize=False):
    # normalizing the user profiles is meaningless here, so `normalize` is ignored
    # This formula comes from Goel et al., The Structural Virality of Online Diffusion
    if user_profiles is None:
        user_profiles = self.user_profiles
    if item_attributes is None:
        item_attributes = self.item_attributes
    # here we could technically call BaseRecommender.train() on the transformed item attributes
    dot_product = np.dot(user_profiles,
                         self.infection_state * np.log(1 - item_attributes))
    # Probability of being infected at the current iteration
    predicted_scores = 1 - np.exp(dot_product)
    return predicted_scores
```
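The vectorized formula computes, for each user, one minus the probability of escaping infection from every infected neighbor. For a user with `k` infected neighbors and an item with infection probability `p`, the score is `1 - (1 - p)**k`, since `exp(k * log(1 - p)) == (1 - p)**k`. The sketch below checks this on a made-up three-user graph.

```python
import numpy as np

adjacency = np.array([[0, 1, 1],
                      [1, 0, 0],
                      [1, 0, 0]], dtype=float)   # user 0 is linked to users 1 and 2
infection_state = np.array([[0.0],
                            [1.0],
                            [1.0]])              # users 1 and 2 are infected by the single item
item_attributes = np.array([[0.5]])              # the item infects with probability 0.5

# Same computation as in train() above
dot_product = np.dot(adjacency, infection_state * np.log(1 - item_attributes))
predicted_scores = 1 - np.exp(dot_product)

# User 0 has two infected neighbors: 1 - (1 - 0.5) * (1 - 0.5) = 0.75.
# Users 1 and 2 have no infected neighbors, so their score is 0.
print(predicted_scores.ravel())
```

One consequence of the `log`: an item with infection probability exactly 1 yields `log(0) = -inf`, and multiplying that by a zero entry of `infection_state` gives `NaN` in NumPy, so this formula assumes item probabilities strictly below 1.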

### `_update_internal_state`

This function is called at each timestep, right after the system has collected the interactions from users. In this step, we update the internal state of the system based on the user interactions. `interactions` is an array of size `num_users` in which element `u` is the index of the item that user `u` has interacted with.

The requirements are:
1. The signature must be `_update_internal_state(self, interactions)`
2. It should not return anything; all necessary updates must be in the body of the function.
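The `interactions` format pairs naturally with NumPy's integer-array indexing: indexing rows with `0..num_users-1` and columns with `interactions` picks, for every user, the score of the item they interacted with. The arrays below are made up; in the Bass example that follows, `self.actual_users._user_vector` appears to play the role of `user_indices` here, which is our assumption.

```python
import numpy as np

# Made-up example: 3 users, 4 items
predicted_scores = np.array([[0.1, 0.9, 0.3, 0.2],
                             [0.4, 0.1, 0.8, 0.6],
                             [0.7, 0.2, 0.5, 0.3]])
# interactions[u] is the index of the item that user u interacted with
interactions = np.array([1, 3, 0])

# Pair row u with column interactions[u]
user_indices = np.arange(len(interactions))
scores_of_chosen = predicted_scores[user_indices, interactions]
print(scores_of_chosen)  # [0.9 0.6 0.7]
```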

#### Concrete example
Still following [BassModel](https://elucherini.github.io/algo-segregation/reference/models.html#module-models.bass), this is a possible implementation of `_update_internal_state`:

```python
# In the Bass model, we update infection_state if there has been a new infection
def _update_internal_state(self, interactions):
    # Get infection probabilities of the items that the users have interacted with
    # These predictions are assumed to be stored in predicted_scores by train()
    infection_probabilities = self.predicted_scores[self.actual_users._user_vector,
                                                    interactions]
    # find new infections based on infection_thresholds
    newly_infected = np.where(infection_probabilities > self.infection_thresholds)
    # update infection_state based on new infections
    if newly_infected[0].shape[0] > 0:
        self.infection_state[newly_infected[1], interactions[newly_infected[1]]] = 1
```
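One subtle point in this implementation: `infection_probabilities` has shape `(num_users,)` while `infection_thresholds` has shape `(1, num_users)`, so the comparison broadcasts to a `(1, num_users)` boolean array. `np.where` then returns a pair `(row_indices, column_indices)` whose rows are all 0, which is why the code uses `newly_infected[1]` as the user indices. The made-up numbers below show this.

```python
import numpy as np

infection_probabilities = np.array([0.2, 0.9, 0.6, 0.1])   # shape (num_users,)
infection_thresholds = np.array([[0.5, 0.5, 0.7, 0.05]])   # shape (1, num_users)
interactions = np.array([0, 1, 1, 0])                       # item chosen by each user
infection_state = np.zeros((4, 2))

mask = infection_probabilities > infection_thresholds       # broadcasts to (1, 4)
newly_infected = np.where(mask)                             # (row indices, column indices)

# Row indices are all 0; the column indices are the newly infected users
users = newly_infected[1]
print(users)  # [1 3]

# Mark each newly infected user as infected by the item they interacted with
infection_state[users, interactions[users]] = 1
print(infection_state)
```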

## Putting everything together

```python
import numpy as np
from rec.metrics import StructuralVirality
from rec.models import BaseRecommender
from rec.random import Generator, SocialGraphGenerator


class NewModel(BaseRecommender):
    def __init__(self, num_users=100, num_items=1, infection_state=None,
        item_representation=None, user_representation=None, infection_thresholds=None,
        actual_user_scores=None, verbose=False, num_items_per_iter=1,
        seed=None):

        generator = Generator()

        if item_representation is None:
            item_representation = generator.uniform(size=(1, num_items))
        if user_representation is None:
            user_representation = SocialGraphGenerator.generate_random_graph(n=num_users,
                                                                             p=0.0001)
        if infection_state is None:
            infection_state = np.zeros((num_users, num_items))
            infected_users = generator.integers(num_users)
            infectious_items = generator.integers(num_items)
            infection_state[infected_users, infectious_items] = 1
        self.infection_state = infection_state

        if infection_thresholds is None:
            infection_thresholds = abs(generator.uniform(size=(1, num_users)))
        self.infection_thresholds = infection_thresholds
        measurements = [StructuralVirality(np.copy(infection_state))]
        BaseRecommender.__init__(self, user_representation, item_representation,
                                 actual_user_scores, num_users, num_items,
                                 num_items_per_iter,
                                 measurements=measurements, system_state=None,
                                 verbose=verbose, seed=seed)

    def train(self, user_profiles=None, item_attributes=None, normalize=False):
        if user_profiles is None:
            user_profiles = self.user_profiles
        if item_attributes is None:
            item_attributes = self.item_attributes
        dot_product = np.dot(user_profiles,
                             self.infection_state * np.log(1 - item_attributes))
        predicted_scores = 1 - np.exp(dot_product)
        return predicted_scores

    def _update_internal_state(self, interactions):
        infection_probabilities = self.predicted_scores[self.actual_users._user_vector,
                                                        interactions]
        newly_infected = np.where(infection_probabilities > self.infection_thresholds)
        if newly_infected[0].shape[0] > 0:
            self.infection_state[newly_infected[1], interactions[newly_infected[1]]] = 1
```

## And now let's use it to run a simulation:

```python
model = NewModel(num_users=1500)
model.run(timesteps=10)
```



```python
measurements = model.get_measurements()
# With the current default random parameters, the infection spreads unrealistically
# fast; the random generators will be changed to produce more realistic scenarios.
measurements['num_infected']
```

```
[1, 63, 1404, 1500, 1500, 1500, 1500, 1500, 1500, 1500, 1500]
```