# Inference

Inference on the simulation parameters is carried out with Sequential Neural Posterior Estimation (SNPE), as implemented in the sbi package.

### A. Imports and preliminary steps

The cell below carries out the relevant imports.

In [1]:
%load_ext autoreload
%autoreload 2

# Just a formatting related plugin
%load_ext nb_black

%matplotlib inline
import matplotlib.pyplot as plt

import sys

sys.path.append("../")

import multiprocessing as mp

from collections import deque
from pathlib import Path
from typing import Dict, Optional

import arviz
import pickle

import numpy as np
import pandas as pd
import pyreadr
import sbi
import sbi.utils as sbi_utils
import seaborn as sns
import statsmodels.formula.api as smf
import torch

from joblib import Parallel, delayed
from matplotlib.lines import Line2D
from scipy.stats import ttest_ind
from snpe.inference import inference_class
from snpe.simulations import simulator_class, marketplace_simulator_class
from snpe.embeddings.embeddings_to_ratings import EmbeddingRatingPredictor
from snpe.utils.statistics import review_histogram_correlation
from snpe.utils.tqdm_utils import tqdm_joblib
from tqdm import tqdm

# Set plotting parameters
sns.set(style="white", context="talk", font_scale=2.5)
sns.set_color_codes(palette="colorblind")
sns.set_style("ticks", {"axes.linewidth": 2.0})


Next, define the path where the inference output will be stored.

In [2]:
ARTIFACT_PATH = Path("../../..")

# The paths below were used previously; the one above is the one you set up so that it works on your machine
# ARTIFACT_PATH = Path("../../../gcs_mount/artifacts/rating_spacing_simulator")
# ARTIFACT_PATH = Path("/data/reputation-systems/snpe/artifacts/rating_spacing_simulator")
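
Because the results are only written to disk after training finishes, it can be worth confirming beforehand that the output directory exists. The sketch below is one way to do that; `ensure_artifact_dir` is a hypothetical helper that is not part of the `snpe` package, and it is demonstrated on a temporary directory rather than on the real `ARTIFACT_PATH`.

```python
import tempfile
from pathlib import Path


def ensure_artifact_dir(path: Path) -> Path:
    """Create the directory if it is missing and return its absolute path.

    Hypothetical helper for illustration; not part of the snpe package.
    """
    resolved = path.expanduser().resolve()
    resolved.mkdir(parents=True, exist_ok=True)
    return resolved


# Demonstration on a throwaway directory instead of the real ARTIFACT_PATH
with tempfile.TemporaryDirectory() as tmp:
    demo_dir = ensure_artifact_dir(Path(tmp) / "artifacts" / "demo")
    demo_dir_exists = demo_dir.is_dir()
```

In the notebook, the same call could be applied to `ARTIFACT_PATH` right after it is defined.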


### B. Main function and its arguments

The function in the code cell below carries out inference on time series simulation output. First, a uniform prior over the parameters is initialized by defining the lower and upper bounds of each parameter. Then the inferrer is instantiated (see inference_class.py) and the simulator is loaded, together with the inputs for the inference model. Lastly, the inference model is trained, the posterior is built, and the results are stored in the previously specified path as a .pkl file that contains the posterior alongside other relevant parameters.

The arguments of the function that will have to be provided are the following:

- `device`: device where the inference model will be trained. Must be str 'cuda' or 'cpu'.
- `simulator_type`: type of the simulator to be loaded (e.g. double_herding, marketplace...).
- `simulation_type`: type of simulation output that is fed to the inference model. Must be str 'timeseries' or 'histogram'.
- `params`: dictionary of parameters to configure inference model.
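
To make the box-shaped uniform prior concrete, here is a minimal sketch that draws samples inside the same bounds and evaluates the prior log-density. It uses only the standard library as a stand-in and is not how `sbi_utils.BoxUniform` is implemented; `sample_box_uniform` and `box_log_prob` are hypothetical names introduced for illustration.

```python
import math
import random

# Bounds copied from the prior used in this notebook (8 parameters)
LOW = [0.0, 0.0, 0.0, 0.0, 0.5, 0.25, 0.25, 0.5]
HIGH = [4.0, 4.0, 1.0, 1.0, 1.0, 0.75, 0.75, 1.0]


def sample_box_uniform(low, high, num_samples, seed=0):
    """Draw independent uniform samples inside the box [low, high]."""
    rng = random.Random(seed)
    return [
        [rng.uniform(lo, hi) for lo, hi in zip(low, high)]
        for _ in range(num_samples)
    ]


def box_log_prob(theta, low, high):
    """Log-density of the box prior: constant inside the box, -inf outside."""
    inside = all(lo <= t <= hi for t, lo, hi in zip(theta, low, high))
    if not inside:
        return -math.inf
    # The density is 1 / volume, so the log-density is minus the log-volume
    return -sum(math.log(hi - lo) for lo, hi in zip(low, high))


prior_draws = sample_box_uniform(LOW, HIGH, num_samples=1000)
log_prob_inside = box_log_prob(prior_draws[0], LOW, HIGH)
log_prob_outside = box_log_prob([5.0] + prior_draws[0][1:], LOW, HIGH)
```

Any parameter vector outside the box has zero prior density, so the posterior built from this prior cannot place mass there either.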

In [3]:
def infer_and_save_posterior(
    device: str, simulator_type: str, simulation_type: str, params: Dict
) -> None:
    parameter_prior = sbi_utils.BoxUniform(
        low=torch.tensor([0.0, 0.0, 0.0, 0.0, 0.5, 0.25, 0.25, 0.5]).type(
            torch.FloatTensor
        ),
        high=torch.tensor([4.0, 4.0, 1.0, 1.0, 1.0, 0.75, 0.75, 1.0]).type(
            torch.FloatTensor
        ),
        device=device,
    )
    inferrer = inference_class.TimeSeriesInference(
        parameter_prior=parameter_prior, device=device
    )
    inferrer.load_simulator(
        dirname=ARTIFACT_PATH,
        simulator_type=simulator_type,
        simulation_type=simulation_type,
    )
    # Work on a copy so that the pops below do not empty the caller's dictionary
    params = dict(params)
    batch_size = params.pop("batch_size")
    learning_rate = params.pop("learning_rate")
    hidden_features = params.pop("hidden_features")
    num_transforms = params.pop("num_transforms")
    inferrer.infer_snpe_posterior(
        embedding_net_conf=params,
        batch_size=batch_size,
        learning_rate=learning_rate,
        hidden_features=hidden_features,
        num_transforms=num_transforms,
    )
    inferrer.save_inference(ARTIFACT_PATH)


The dictionary of parameters that has to be passed to `infer_and_save_posterior` is defined below. The values were obtained from a hyperparameter optimization process.

In [4]:
# Dictionary of parameters required to call the function

inference_params = {
    "batch_size": 64,
    "learning_rate": 3.1e-4,
    "hidden_features": 70,
    "num_transforms": 8,
    "num_conv_layers": 2,
    "num_channels": 9,
    "conv_kernel_size": 17,
    "maxpool_kernel_size": 11,
    "num_dense_layers": 2,
}
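
`infer_and_save_posterior` separates four training hyperparameters from this dictionary and passes the remaining entries as `embedding_net_conf`. The sketch below shows that split as a standalone step that leaves the input dictionary untouched, so the same dictionary can be reused across calls. `split_inference_params` and `TRAINING_KEYS` are hypothetical names, not part of the `snpe` package.

```python
from typing import Dict, Tuple

# Keys consumed as training hyperparameters; everything else configures the embedding net
TRAINING_KEYS = ("batch_size", "learning_rate", "hidden_features", "num_transforms")


def split_inference_params(params: Dict) -> Tuple[Dict, Dict]:
    """Return (training_kwargs, embedding_net_conf) without mutating params."""
    training = {key: params[key] for key in TRAINING_KEYS}
    embedding = {k: v for k, v in params.items() if k not in TRAINING_KEYS}
    return training, embedding


example_params = {
    "batch_size": 64,
    "learning_rate": 3.1e-4,
    "hidden_features": 70,
    "num_transforms": 8,
    "num_conv_layers": 2,
    "num_channels": 9,
    "conv_kernel_size": 17,
    "maxpool_kernel_size": 11,
    "num_dense_layers": 2,
}
training_kwargs, embedding_net_conf = split_inference_params(example_params)
```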


### C. Training inference model and obtaining posterior distributions

Now that the main function `infer_and_save_posterior` is defined and all the inputs are ready, we can call it.

In [5]:
infer_and_save_posterior("cuda", "marketplace", "timeseries", inference_params)

  "GPU was selected as a device for training the neural network. "


Embedding net created: 
 Sequential(
  (0): Conv1d(5, 9, kernel_size=(17,), stride=(1,), padding=(8,))
  (1): LeakyReLU(negative_slope=0.01)
  (2): Conv1d(9, 9, kernel_size=(17,), stride=(1,), padding=(16,), dilation=(2,))
  (3): MaxPool1d(kernel_size=11, stride=11, padding=0, dilation=1, ceil_mode=False)
  (4): Flatten(start_dim=1, end_dim=-1)
  (5): LeakyReLU(negative_slope=0.01)
  (6): Linear(in_features=27, out_features=64, bias=True)
  (7): LeakyReLU(negative_slope=0.01)
  (8): Linear(in_features=64, out_features=32, bias=True)
)


  f"Parameters theta has device '{theta.device}'. "


 Neural network successfully converged after 122 epochs.
        -------------------------
        ||||| ROUND 1 STATS |||||:
        -------------------------
        Epochs trained: 122
        Best validation performance: -2.1384
        -------------------------
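
The `in_features=27` of the first `Linear` layer in the printed embedding net follows from the layers before it. The sketch below traces a sequence length through the two convolutions and the max-pooling layer using the output-length formula documented for PyTorch's `Conv1d` and `MaxPool1d`. The input length is not stated in this notebook, so the code searches for the lengths that are consistent with 27 flattened features; the helper names are hypothetical.

```python
def conv1d_out_len(length, kernel_size, stride=1, padding=0, dilation=1):
    """Output length of a Conv1d or MaxPool1d layer (PyTorch convention)."""
    return (length + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


def flattened_features(length, num_channels=9):
    """Trace a sequence of the given length through the printed layers."""
    # Conv1d(5, 9, kernel_size=17, padding=8): length is preserved
    length = conv1d_out_len(length, kernel_size=17, padding=8)
    # Conv1d(9, 9, kernel_size=17, padding=16, dilation=2): length is preserved
    length = conv1d_out_len(length, kernel_size=17, padding=16, dilation=2)
    # MaxPool1d(kernel_size=11, stride=11): length is divided by 11, rounded down
    length = conv1d_out_len(length, kernel_size=11, stride=11)
    # Flatten: channels times remaining length
    return num_channels * length


# Input lengths that would produce the 27 input features of the first Linear layer
matching_lengths = [n for n in range(11, 200) if flattened_features(n) == 27]
```

Under this formula, input lengths from 33 to 43 time steps all yield 27 flattened features, because the pooling layer discards any remainder after dividing by 11.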
        



### (Provisional/Draft) Z. Other uncommented functions and snippets to potentially incorporate into the final notebook

Include the following functions in the notebook?

In [None]:
def sample_posterior_with_observed(
    device: str,
    observations: np.ndarray,
    num_samples: int,
    simulator_type: str,
    simulation_type: str,
) -> np.ndarray:
    # The parameter prior doesn't matter here as it will be overridden by that of the loaded inference object
    parameter_prior = sbi.utils.BoxUniform(
        low=torch.tensor([0.0, 0.0, 0.0, 0.0, 0.5, 0.25, 0.25, 0.5]).type(
            torch.FloatTensor
        ),
        high=torch.tensor([4.0, 4.0, 1.0, 1.0, 1.0, 0.75, 0.75, 1.0]).type(
            torch.FloatTensor
        ),
        device=device,
    )
    inferrer = inference_class.TimeSeriesInference(
        parameter_prior=parameter_prior, device=device
    )
    inferrer.load_simulator(
        dirname=ARTIFACT_PATH,
        simulator_type=simulator_type,
        simulation_type=simulation_type,
    )
    inferrer.load_inference(dirname=ARTIFACT_PATH)
    posterior_samples = inferrer.get_posterior_samples(
        observations, num_samples=num_samples
    )
    return posterior_samples

In [None]:
# In marketplace simulations, we cannot just supply a set of rho params, then simulate and infer on these simulations
# to test if the inference can recover the initially provided params
# So we instead sample from the posterior of a separate set of marketplace simulations not used in training and see
# if parameters are recovered on this new set
def sample_posterior_on_simulations(
    device: str,
    num_samples: int,
    simulator_type: str,
    simulation_type: str,
    max_inference_length: int,
) -> tuple:  # (posterior_samples, simulation_params)
    # We load the larger simulation (over 64 marketplaces) as the separate simulation for the inference to be tested on
    params = {
        "review_prior": np.ones(5),
        "tendency_to_rate": 0.05,
        "simulation_type": simulation_type,
        "previous_rating_measure": "mode",
        "min_reviews_for_herding": 5,
        "num_products": 1400,
        # "num_products": 100,
        "num_total_marketplace_reviews": 300_000,
        # "num_total_marketplace_reviews": 5_000,
        "consideration_set_size": 5,
    }
    simulator = marketplace_simulator_class.MarketplaceSimulator(params)
    simulator.load_simulator(ARTIFACT_PATH / "large_simulation")
    # We pick all simulations from a single marketplace as the observations on which we wish to obtain
    # posterior samples
    # These are the observations for the posterior sampling function defined above
    observations = simulator.simulations[0]
    # Also pick the simulation parameters corresponding to these simulations
    simulation_params = simulator.simulation_parameters.copy()
    simulation_params["rho"] = simulation_params["rho"][: len(observations), :]
    simulation_params["h_p"] = simulation_params["h_p"][: len(observations)]
    # Cut the observations to the max length seen during SNPE training
    observations = np.array(
        [obs[:max_inference_length, :] for obs in observations], dtype="object"
    )
    posterior_samples = sample_posterior_with_observed(
        device, observations, num_samples, simulator_type, simulation_type
    )
    return posterior_samples, simulation_params
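
One simple way to judge whether parameters are recovered on the held-out simulations is to count how often the true value falls inside a central credible interval of its posterior samples. The sketch below does this for a single parameter on synthetic stand-in data. It assumes one list of posterior draws per product, which may not match the array layout returned by `get_posterior_samples`; `central_interval` and `coverage` are hypothetical helpers.

```python
import random


def central_interval(samples, level=0.95):
    """Empirical central credible interval of a 1-D list of posterior draws."""
    ordered = sorted(samples)
    tail = (1.0 - level) / 2.0
    last = len(ordered) - 1
    lower = ordered[round(tail * last)]
    upper = ordered[round((1.0 - tail) * last)]
    return lower, upper


def coverage(true_values, posterior_draws, level=0.95):
    """Fraction of true values that lie inside their central interval."""
    hits = 0
    for truth, draws in zip(true_values, posterior_draws):
        lower, upper = central_interval(draws, level)
        hits += lower <= truth <= upper
    return hits / len(true_values)


# Synthetic stand-in data: 50 products, 500 posterior draws centred on each true value
rng = random.Random(0)
true_values = [rng.uniform(0.0, 4.0) for _ in range(50)]
posterior_draws = [[rng.gauss(t, 0.1) for _ in range(500)] for t in true_values]
recovered_fraction = coverage(true_values, posterior_draws)
```

With real posterior samples, a coverage far below the nominal level would suggest that the posterior is overconfident or biased for that parameter.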

#### Main function - histogram (Include?)

In [7]:
#def infer_and_save_posterior(
#    device: str, simulator_type: str, simulation_type: str, params: Dict
#) -> None:
#    parameter_prior = sbi_utils.BoxUniform(
#        low=torch.tensor([0.0, 0.0, 0.0, 0.0, 0.5, 0.25, 0.25, 0.5]).type(torch.FloatTensor),
#        high=torch.tensor([4.0, 4.0, 1.0, 1.0, 1.0, 0.75, 0.75, 1.0]).type(torch.FloatTensor),
#        device=device,
#    )
#    inferrer = inference_class.HistogramInference(
#        parameter_prior=parameter_prior, device=device
#    )
#    inferrer.load_simulator(
#        dirname=ARTIFACT_PATH,
#        simulator_type=simulator_type,
#        simulation_type=simulation_type,
#    )
#    batch_size = params.pop("batch_size")
#    learning_rate = params.pop("learning_rate")
#    hidden_features = params.pop("hidden_features")
#    num_transforms = params.pop("num_transforms")
#    inferrer.infer_snpe_posterior(
#        embedding_net_conf=params,
#        batch_size=batch_size,
#        learning_rate=learning_rate,
#        hidden_features=hidden_features,
#        num_transforms=num_transforms,
#    )
#    inferrer.save_inference(ARTIFACT_PATH)


In [9]:
#infer_and_save_posterior("cuda", "marketplace", "histogram", inference_params)

  "GPU was selected as a device for training the neural network. "


TypeError: can't convert np.ndarray of type numpy.object_. The only supported types are: float64, float32, float16, complex64, complex128, int64, int32, int16, int8, uint8, and bool.
