# Likelihood-Free Inference - Data Generation

Ali Al Kadhim and Harrison B. Prosper<br>
Department of Physics, Florida State University<br>
Date: 22 April 2022

## Description

In this notebook we generate data comprising the triplets $(Z_i, \theta_i, D_i)$, whose components are sampled as

$
\begin{align}
\theta & \sim \textrm{uniform}(0, 20), \\
N & \sim \textrm{poisson}(\theta),\\
D & \sim \textrm{randint}(0, 10), \textrm{ and } \\
Z & = I[ N \leq D ],
\end{align}
$

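where $I$ is the indicator function, equal to 1 when its argument is true and 0 otherwise. As in `scipy.stats.randint`, the upper bound of $\textrm{randint}(0, 10)$ is exclusive, so $D \in \{0, \ldots, 9\}$.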

These data are used in __LFI_train.ipynb__ to fit a model that approximates $E(Z | \theta, D)$, which can be used to compute upper limits, with exact coverage, for the Poisson parameter $\theta$.
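
Because $Z$ is binary, its conditional mean is a probability: $E(Z | \theta, D) = P(N \leq D | \theta) = \sum_{k=0}^{D} e^{-\theta} \theta^k / k!$, the Poisson cumulative distribution function evaluated at $D$. The sketch below illustrates this identity by comparing `st.poisson.cdf` with a Monte Carlo average of $Z$ at a single point. The values `theta0` and `D0`, the seed, and the sample size are illustrative assumptions, not taken from the notebook.

```python
import numpy as np
import scipy.stats as st

rng = np.random.default_rng(1)   # fixed seed (arbitrary choice) for repeatability

theta0, D0 = 4.0, 6              # illustrative point, not from the notebook

# exact value of E(Z | theta0, D0) = P(N <= D0 | theta0)
exact = st.poisson.cdf(D0, theta0)

# Monte Carlo estimate: average of Z = I[N <= D0] over simulated counts
N = rng.poisson(theta0, size=200_000)
estimate = np.mean(N <= D0)

print(f"exact = {exact:.4f}, estimate = {estimate:.4f}")
```

The two numbers should agree to within the Monte Carlo error, which is the sense in which a model fitted to the triplets can approximate $E(Z | \theta, D)$.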

In [1]:
import os, sys

# the standard module for array manipulation
import numpy as np

# the standard module for tabular data
import pandas as pd

# standard scientific python module
import scipy as sp
import scipy.stats as st

# a module for saving and loading Python objects
import joblib as jb

### Generate data

$
\begin{align}
\theta & \sim \textrm{uniform}(0, 20), \\
N & \sim \textrm{poisson}(\theta),\\
D & \sim \textrm{randint}(0, 10), \textrm{ and } \\
Z & = I[ N \leq D ],
\end{align}
$

In [2]:
Ndata    = 510000
thetaMin =  0
thetaMax = 20
Dmin     =  0
Dmax     = 10

filename = 'data.db'
print(filename)

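# scipy's uniform takes (loc, scale): samples lie in [loc, loc + scale]
theta = st.uniform.rvs(thetaMin, thetaMax - thetaMin, size=Ndata)
# one Poisson count per sampled theta
N     = st.poisson.rvs(theta)
# randint's upper bound is exclusive: D takes the values Dmin, ..., Dmax - 1
D     = st.randint.rvs(Dmin, Dmax, size=Ndata)
Z     = (N <= D).astype(np.int32)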

# save in a pandas dataframe
data = pd.DataFrame({'Z': Z, 'theta': theta, 'D': D})
jb.dump(data, filename)

data.db


['data.db']
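
The `['data.db']` in the output above is the return value of `jb.dump`, which lists the files it wrote. A file written this way can be read back with `jb.load`. The following is a minimal sketch of that round trip, assuming a small stand-in table with the same column names and a temporary directory, both of which are illustrative rather than part of the notebook.

```python
import os
import tempfile

import joblib as jb
import numpy as np
import pandas as pd

# small stand-in table with the notebook's column names (values are illustrative)
small = pd.DataFrame({'Z': np.array([1, 0, 1], dtype=np.int32),
                      'theta': [0.5, 12.0, 3.2],
                      'D': [2, 4, 9]})

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'data.db')
    jb.dump(small, path)         # returns the list of files written
    restored = jb.load(path)     # read the DataFrame back

# the restored table should have the same columns and values as the original
columns_ok = list(restored.columns) == ['Z', 'theta', 'D']
values_ok = restored.equals(small)
print(columns_ok, values_ok)
```

Writing to a temporary directory keeps the sketch from overwriting the real `data.db`.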