# Non-marketplace simulations

### A. Introduction

In this notebook you will see how to run the four classes of non-marketplace simulations. What differentiates them is the number of Rho and herding parameters each one considers. Before going further, here is a brief reminder of what these parameters mean:

- __Rho (ρ)__: Represents the cost of posting a rating measured in terms of the difference between the expected product experience and the actual product experience. The higher Rho is, the higher the difference between the expected and the actual experience will have to be for the user to leave a review about the product. 



- __Herding parameter__: A product-specific measurement of the probability of exhibiting herding behavior (i.e. being influenced by previous users' reviews at the time of posting a review). The higher the herding parameter is, the more likely it is that a user's review will be influenced by the previously posted reviews.


As mentioned, there are four different classes of non-marketplace simulations of increasing complexity defined by the number of Rho and herding parameters considered.

1. __Single Rho Simulation__: It is represented by the SingleRhoSimulator class. This configuration uses a single value for the rho parameter irrespective of whether the actual product experience was better or worse than expected. This implies that the cost of rating is the same irrespective of whether the sentiment behind such rating is negative or positive.  


2. __Double Rho Simulation__: It is represented by the DoubleRhoSimulator class, which is a child class of the SingleRhoSimulator class. This kind of simulation allows for two different rho parameters. One applies when the difference between the actual and the expected product experience is positive (denoted as ρ+), while the other applies when that difference is negative (denoted as ρ-). This lets you consider different costs for leaving a review depending on the sentiment (positive/negative) behind it.


3. __Herding Simulation__: It is represented by the HerdingSimulator class, which is a child class of the DoubleRhoSimulator class. It adds a herding parameter to the Double Rho Simulation. When a user decides to leave a review (i.e. the difference between the actual and the expected product experience is larger than the corresponding rho parameter), that review is subject to a given probability of exhibiting herding behaviour.


4. __Double Herding Simulation__: It is represented by the DoubleHerdingSimulator class, which is a child class of the HerdingSimulator class. It adds a second product-specific herding parameter. One herding parameter applies when the visitor's intended rating is above a chosen metric of the previously existing ratings (the mean and the mode are currently implemented), while the other applies when the intended rating is below that metric.

If you want to look at these simulator classes in more detail, see the "simulator_class.py" file located in "snpe/snpe/simulations/" inside the repo, which contains the code behind them.
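
As a rough illustration of how the parameters described above act, the sketch below decides whether a consumer posts a rating and which herding parameter would apply. The function names `exceeds_rho` and `herding_probability` are hypothetical and this is not the code in `simulator_class.py`; it only mirrors the descriptions in this section, with `delta` standing for the actual minus the expected experience. The handling of ties is an assumption of the sketch.

```python
def exceeds_rho(delta: float, rho_pos: float, rho_neg: float) -> bool:
    """Return True when the experience gap is large enough to trigger a rating.

    delta   : actual minus expected product experience (illustrative definition)
    rho_pos : cost of rating after a better-than-expected experience (rho+)
    rho_neg : cost of rating after a worse-than-expected experience (rho-)
    """
    if delta >= 0:
        return delta > rho_pos
    return -delta > rho_neg


def herding_probability(
    intended_rating: float, reference: float, h_above: float, h_below: float
) -> float:
    """Pick the herding parameter in a double-herding setup.

    Assumption: an intended rating equal to the reference uses h_below.
    """
    return h_above if intended_rating > reference else h_below


deltas = (-1.5, -0.5, 0.5, 1.5)

# Single rho: one cost applies to both positive and negative gaps.
single = [exceeds_rho(d, 1.0, 1.0) for d in deltas]

# Double rho: here negative experiences are cheaper to report than positive ones.
double = [exceeds_rho(d, 1.0, 0.25) for d in deltas]

print(single)
print(double)

# Double herding: the parameter depends on which side of the reference
# (e.g. the mean of previous ratings) the intended rating falls.
print(herding_probability(5, 3.4, h_above=0.8, h_below=0.2))
print(herding_probability(2, 3.4, h_above=0.8, h_below=0.2))
```

Passing the same value for `rho_pos` and `rho_neg` reproduces the single-rho case, which is why the simpler simulation can be seen as a special case of the more complex one.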


### B. Preparing the simulation

After having reviewed the differences between the four classes of non-marketplace simulations, let's start by executing the following cell containing the relevant imports to run the simulations.

In [3]:
%load_ext autoreload
%autoreload 2

### Just a formatting related plugin
%load_ext nb_black

%matplotlib inline
import matplotlib.pyplot as plt

import sys

sys.path.append("../")

import multiprocessing as mp

from collections import deque
from pathlib import Path
from typing import Dict, Optional

import arviz

import numpy as np
import pandas as pd
import pyreadr
import sbi
import sbi.utils as sbi_utils
import seaborn as sns
import statsmodels.formula.api as smf

import torch

from joblib import Parallel, delayed
from matplotlib.lines import Line2D
from scipy.stats import ttest_ind
from snpe.inference import inference_class
from snpe.simulations import simulator_class
from snpe.utils.statistics import review_histogram_correlation
from snpe.utils.tqdm_utils import tqdm_joblib
from tqdm import tqdm

### Set plotting parameters
sns.set(style="white", context="talk", font_scale=2.5)
sns.set_color_codes(palette="colorblind")
sns.set_style("ticks", {"axes.linewidth": 2.0})


The cell below defines the path where the output of the simulation will be stored. Modify it to match your storage preferences.

In [4]:
ARTIFACT_PATH = Path("../../../")


Another option to adjust is the number of CPUs involved in the simulation. The cell below adjusts it to employ all the CPUs available.

In [5]:
print(f"The number of available CPUs is: {mp.cpu_count()}")
torch.set_num_threads(mp.cpu_count())
print(f"The number of CPUs to be employed will be {torch.get_num_threads()}")

The number of available CPUs is: 8
The number of CPUs to be employed will be 8



### C. The simulation and its arguments

The `generate_and_save_simulations` function defined below is in charge of instantiating the desired simulator class and passing it the arguments needed to run the simulation. Let's briefly review the arguments involved.

- __num_simulations__: Number of consumers (integer) to simulate. Each simulated consumer goes through an entire journey consisting of:
    1. Reviewing the prior reviews of a product and forming an expectation of the product.
    2. Purchasing and experiencing the product.
    3. Deciding whether to leave a rating based on the difference between the expected experience and the actual experience. If the consumer decides to rate, the journey continues to points 4 and 5; otherwise it ends here.
    4. "Deciding" whether to exhibit herding behavior in the rating being posted.
    5. Posting the rating.
    

- __review_prior__: Set of initial ratings of the product that are pre-loaded before the simulation starts, taking the shape of an array of five integer values. By default, this is set as an array of five ones. This implies that by the time the first consumer of the simulation reviews the product 5 prior reviews will be observed, each of them assigned to one of the five values composing the rating scale [1 - 5]. 


- __tendency_to_rate__: Underlying tendency to rate for all consumers, taking float values in the interval [0,1]. In other words, this is the proportion of consumers that will post a rating regardless of the value of the rho parameter(s) and the difference between their actual and expected product experience. If set at the default value of 0.05, 5% of all consumers will post a rating independently of the other factors at play in the simulation. This is necessary to address the "cold start" problem: by random chance, some products might have values of rho high enough that no visitor would ever leave a rating.


- __simulation_type__: Type of simulation output to produce. Accepts the strings "timeseries" and "histogram" as inputs. With "timeseries", the simulation returns the time series of the simulated ratings in a cumulative histogram format, so the order of rating accumulation is preserved. With "histogram", it returns only the final histogram of ratings and discards the order of rating accumulation.


- __previous_rating_measure__: Measure of previous ratings that will be taken as a reference when experiencing herding behavior. It can be either the mean, the mode or the latest review posted. For example, if a consumer leaves a rating being subject to herding and this parameter is set as mode, it will herd towards the mode of all previous reviews. This argument is specific to the Herding and Double herding simulations and takes the strings "mode", "mean" and "latest" as valid inputs.


- __min_reviews_for_herding__: Minimum number of pre-existing reviews required before a consumer can be subject to herding behavior. It has to be an integer value larger than 0. This argument is specific to the Herding and Double herding simulations.


- __herding_differentiating_measure__: Measure of the already posted ratings used as the threshold around which the two herding parameters apply. This measure can be the mean or the mode of the previous ratings. For example, if this argument is set to "mean", one herding parameter applies when the consumer's intended rating is above the mean, while a different one applies when it is below. This argument is specific to the Double herding simulation and takes the strings "mode" and "mean" as valid inputs.


- __simulation_class__: Which of the four simulation classes to run. It takes any of the strings below as a valid input:

    - "singlerho" for Single rho simulation
    - "doublerho" for Double rho simulation
    - "herding" for Herding simulation
    - "doubleherding" for  Double herding simulation
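
To make the histogram-related arguments more concrete, here is a small NumPy sketch, not taken from the snpe code, of how a cumulative time series relates to the final histogram and how the three `previous_rating_measure` options could be read off it. The helper name `rating_measure` and the example ratings are made up for illustration, and the actual simulator may compute these quantities differently (for instance in how ties in the mode are broken).

```python
import numpy as np

STARS = np.arange(1, 6)  # the rating scale 1-5


def rating_measure(timeseries: np.ndarray, measure: str) -> float:
    """Summarise the ratings accumulated so far (hypothetical helper).

    timeseries: cumulative histograms, one row per posted rating, with row 0
                holding the review prior. "latest" needs at least two rows.
    """
    histogram = timeseries[-1]
    if measure == "mean":
        return float((STARS * histogram).sum() / histogram.sum())
    if measure == "mode":
        # np.argmax returns the lowest star value in case of a tie.
        return float(STARS[np.argmax(histogram)])
    if measure == "latest":
        # The latest rating is the bin that grew between the last two rows.
        return float(STARS[np.argmax(timeseries[-1] - timeseries[-2])])
    raise ValueError(f"Unknown measure: {measure}")


review_prior = np.ones(5)  # one pre-loaded rating per star value
posted = [5, 4, 5, 2]  # made-up ratings, in posting order

# "timeseries" output: cumulative histograms that preserve the posting order.
rows = [review_prior.copy()]
for rating in posted:
    row = rows[-1].copy()
    row[rating - 1] += 1
    rows.append(row)
timeseries = np.array(rows)

# "histogram" output: only the final state, the posting order is lost.
histogram = timeseries[-1]

print(histogram)
print({m: rating_measure(timeseries, m) for m in ("mean", "mode", "latest")})
```

Note that "latest" can only be recovered from the time series, since the final histogram alone no longer records which rating arrived last.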

In [6]:
def generate_and_save_simulations(
    num_simulations: Optional[int] = None,
    review_prior: Optional[np.ndarray] = None,
    tendency_to_rate: Optional[float] = None,
    simulation_type: Optional[str] = None,
    previous_rating_measure: str = "mean",
    min_reviews_for_herding: int = 5,
    herding_differentiating_measure: str = "mean",
    simulation_class: str = "singlerho",
):
    assert simulation_class in [
        "singlerho",
        "doublerho",
        "herding",
        "doubleherding",
    ], " Can only use singlerho/doublerho/herding/doubleherding as simulation classes, please enter a valid class."
    params = {
        "review_prior": review_prior,
        "tendency_to_rate": tendency_to_rate,
        "simulation_type": simulation_type,
        "previous_rating_measure": previous_rating_measure,
    }

    params_double = params.copy()
    params_herding = params.copy()
    params_herding.update(
        {
            "previous_rating_measure": previous_rating_measure,
            "min_reviews_for_herding": min_reviews_for_herding,
        }
    )

    params_double_herding = params_herding.copy()
    params_double_herding.update(
        {"herding_differentiating_measure": herding_differentiating_measure}
    )

    simulation_classes = {
        "singlerho": simulator_class.SingleRhoSimulator(params),
        "doublerho": simulator_class.DoubleRhoSimulator(params_double),
        "herding": simulator_class.HerdingSimulator(params_herding),
        "doubleherding": simulator_class.DoubleHerdingSimulator(params_double_herding),
    }

    simulator = simulation_classes.get(simulation_class)
    print("Simulation type" + str(simulation_classes.get(simulation_class)))
    simulator.simulate(num_simulations=num_simulations)
    simulator.save_simulations(ARTIFACT_PATH)


### D. Running the simulations

Finally, we can run the different simulations by calling the `generate_and_save_simulations` function with the appropriate arguments. The examples below show how to run the four classes of simulations with a set of example arguments such that:

1. Ten consumers will be generated.
2. Five ratings (one for each rating value) are pre-loaded.
3. The tendency to rate is set at 5%.
4. It will return a time series of the simulated ratings.

In the case of the Herding simulation, the two additional arguments required are provided as follows:
 
5. The mode (of all previous ratings) is taken as the reference metric for herding.
6. At least five previous ratings are required for herding to start happening.

For the Double herding simulation, its one additional argument specifies that:

7. The threshold for the Double herding is determined by the mean of all previous ratings.
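
Because the example calls pass most arguments positionally, it can help to see which parameter each value lands on. The sketch below uses a stand-in function with the same parameter order as `generate_and_save_simulations` (the body is a placeholder, not the real simulation) and the standard-library `inspect` module to spell out the mapping for the Double herding call.

```python
import inspect

import numpy as np


def generate_and_save_simulations_stub(
    num_simulations=None,
    review_prior=None,
    tendency_to_rate=None,
    simulation_type=None,
    previous_rating_measure="mean",
    min_reviews_for_herding=5,
    herding_differentiating_measure="mean",
    simulation_class="singlerho",
):
    """Placeholder with the same parameter order as the notebook function."""


signature = inspect.signature(generate_and_save_simulations_stub)

# Same arguments as the Double herding example call.
bound = signature.bind(
    10,
    np.ones(5),
    0.05,
    "timeseries",
    "mode",
    5,
    "mean",
    simulation_class="doubleherding",
)

for name, value in bound.arguments.items():
    print(f"{name} = {value!r}")
```

Passing these values as keyword arguments (for example `previous_rating_measure="mode"`) would make the calls self-documenting and less sensitive to parameter order.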


#### D.1 Single Rho simulation

In [7]:
generate_and_save_simulations(
    10, np.ones(5), 0.05, "timeseries", simulation_class="singlerho"
)

Simulation type<snpe.simulations.simulator_class.SingleRhoSimulator object at 0x7fc12e9b32b0>


Simulations: 100% 10/10 [00:13<00:00,  1.33s/it]



#### D.2 Double Rho simulation

In [8]:
generate_and_save_simulations(
    10, np.ones(5), 0.05, "timeseries", simulation_class="doublerho"
)

Simulation type<snpe.simulations.simulator_class.DoubleRhoSimulator object at 0x7fc12e901580>


Simulations: 100% 10/10 [00:02<00:00,  3.78it/s]



#### D.3 Herding simulation

In [9]:
generate_and_save_simulations(
    10, np.ones(5), 0.05, "timeseries", "mode", 5, simulation_class="herding"
)

Simulation type<snpe.simulations.simulator_class.HerdingSimulator object at 0x7fc12e7c2ca0>


Simulations: 100% 10/10 [00:12<00:00,  1.24s/it]



#### D.4 Double herding simulation

In [10]:
generate_and_save_simulations(
    10,
    np.ones(5),
    0.05,
    "timeseries",
    "mode",
    5,
    "mean",
    simulation_class="doubleherding",
)

Simulation type<snpe.simulations.simulator_class.DoubleHerdingSimulator object at 0x7fc12e9017f0>


Simulations: 100% 10/10 [00:09<00:00,  1.00it/s]



### Appendix 1: Rating Scale simulation

Since this tutorial was first written, a new kind of non-marketplace simulator, the Rating Scale simulation, has been implemented. It is represented by the RatingScaleSimulator class, which is a child class of the HerdingSimulator class. Its main addition is 5 new parameters, four of which create a rating scale that determines the value of the review each consumer will post (1 to 5 stars) depending on the difference between their expected and actual experience with the product (delta). This rating scale is delimited by the following two elements:

- __five_star_highest_limit__: Upper bound for delta that will result in a 5-star rating. The actual limit will lie between `five_star_highest_limit` and `five_star_highest_limit` * 0.5. If delta is larger than this limit, the user's review will be a 5-star one. 


- __one_star_lowest_limit__:  Lower bound for delta that will result in a 1-star rating. The actual limit will lie between `one_star_lowest_limit` and `one_star_lowest_limit` * 0.5. If delta is lower than this limit, the user's review will be a 1-star one. 

Considering these two limits the rating scale is governed by the following four parameters introduced by the Rating Scale simulator:

1. __P5__: taking values between 0.5 and 1, it determines the limit to which delta is compared to get a 5-star rating. Such limit will be equivalent to: five_star_highest_limit * p5. Thus, for example, if p5 = 1 then the limit will be five_star_highest_limit.


2. __P4__: taking values between 0.25 and 0.75, it determines the limit to which delta is compared to get a 4-star rating. Such limit will be equivalent to: five_star_highest_limit * p5 * p4.


3. __P1__: taking values between 0.5 and 1, it determines the limit to which delta is compared to get a 1-star rating. Such limit will be equivalent to: one_star_lowest_limit * p1. Thus, for example, if p1 = 1 then the limit will be one_star_lowest_limit.


4. __P2__: taking values between 0.25 and 0.75, it determines the limit to which delta is compared to get a 2-star rating. Such limit will be equivalent to: one_star_lowest_limit * p1 * p2.


With this in mind, 3-star ratings correspond to values of delta between one_star_lowest_limit * p1 * p2 and five_star_highest_limit * p5 * p4, a band that contains zero.

Lastly, the fifth parameter introduced by the rating scale simulator is:

5. __bias_5_star__: Probability that a user is biased towards five stars and leaves a 5-star rating irrespective of their product experience.
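
The thresholds described in this appendix can be sketched as a small function. This is an illustration under stated assumptions, not the RatingScaleSimulator code: the function names are hypothetical, `delta` is taken as actual minus expected experience, `one_star_lowest_limit` is assumed to be negative, and a delta exactly on a limit is assigned to the rating closer to 3 stars.

```python
import numpy as np


def star_rating(
    delta: float,
    *,
    five_star_highest_limit: float,
    one_star_lowest_limit: float,
    p5: float,
    p4: float,
    p1: float,
    p2: float,
) -> int:
    """Map delta to a 1-5 star rating using the limits described in the text."""
    limit_5 = five_star_highest_limit * p5  # above this: 5 stars
    limit_4 = limit_5 * p4  # above this (up to limit_5): 4 stars
    limit_1 = one_star_lowest_limit * p1  # below this: 1 star
    limit_2 = limit_1 * p2  # below this (down to limit_1): 2 stars

    if delta > limit_5:
        return 5
    if delta > limit_4:
        return 4
    if delta < limit_1:
        return 1
    if delta < limit_2:
        return 2
    return 3  # delta lies in the band around zero


def biased_star_rating(delta: float, bias_5_star: float, rng, **scale) -> int:
    """With probability bias_5_star return 5 stars regardless of delta."""
    if rng.random() < bias_5_star:
        return 5
    return star_rating(delta, **scale)


# Made-up parameter values, chosen inside the ranges given in the text.
scale = dict(
    five_star_highest_limit=2.0,
    one_star_lowest_limit=-2.0,
    p5=0.75,
    p4=0.5,
    p1=0.75,
    p2=0.5,
)

ratings = [star_rating(d, **scale) for d in (-2.0, -1.0, 0.0, 1.0, 2.0)]
print(ratings)

rng = np.random.default_rng(0)
print(biased_star_rating(-2.0, bias_5_star=1.0, rng=rng, **scale))
```

With these example values the limits are 1.5 and 0.75 on the positive side and -1.5 and -0.75 on the negative side, so the five sample deltas each fall into a different star band.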