In [None]:
import tensorflow

We follow the setup of Predjer et al. (2010).

* There are $N$ workers on the platform over the period of the study, with arrival and departure rates that vary over time.
* We have $K$ time periods.
* We assume memoryless arrivals and departures, with $\lambda_k$ the arrival rate at time $k$ and $\mu_k$ the departure rate at time $k$.
* Worker $i$ arrives (birth) at time $b_i$ and departs at time $d_i$ (both unobserved)
* Each worker has a capture history $C_i = [0 \ldots 0 1 \ldots 0 1 0 \ldots 0 0 0]$, where $C_i$ is a vector with $K$ binary values, and $c_{ik}=1$ indicates that worker $i$ appeared in the survey during the $k$-th time period.
* We use $u_i$ to denote the number of times that we captured the worker. 
* We use $f_i$ to denote the first time period that we see worker $i$.
* We use $l_i$ to denote the last time period that we see worker $i$.
* We use $a_i$ for the propensity of worker $i$ to participate in the task. We assume that $a_i$ is distributed according to a Beta distribution $P(a_i) \propto B(\alpha, \beta)$.
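As a small illustration of the notation above, the summary statistics $u_i$, $f_i$, and $l_i$ can be read directly off a capture history. This is only a sketch: the helper name `summarize_history` and the example history are hypothetical, and time periods are numbered $1 \ldots K$ as in the text.

```python
def summarize_history(c):
    """Return (u, f, l) for a binary capture history c of length K.

    Periods are 1-indexed, so c[0] corresponds to period 1.
    A worker who was never captured gets (0, None, None).
    """
    captured = [k for k, c_k in enumerate(c, start=1) if c_k == 1]
    if not captured:
        return 0, None, None
    return len(captured), captured[0], captured[-1]


# Hypothetical history over K = 8 periods.
history = [0, 0, 1, 0, 1, 1, 0, 0]
u, f, l = summarize_history(history)
print(u, f, l)  # number of captures, first capture, last capture
```

For this history the worker is captured in periods 3, 5, and 6, so $u_i = 3$, $f_i = 3$, and $l_i = 6$.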

The likelihood function for a worker that participated at least once is:

### $L_i = \sum_{s=1}^{f_i} \lambda_s \cdot \left[ \sum_{q=l_i}^K \left( \prod_{v=f_i}^{q-1} (1 - \mu_v) \right) \cdot \mu_q \cdot R_i \right]$

where the sum index $s$ ranges over the possible (unobserved) arrival times, and $q$ over the possible (unobserved) departure times. (So the part above gives the likelihood of each pair of arrival and departure times.) Given $s$ and $q$, we then define

### $R_i = \int_0^1 \left( \prod_{k=s}^{q} a_i^{c_{ik}} \cdot (1-a_i)^{1-c_{ik}} \right) dP(a_i) $

where $R_i$ captures the likelihood of observing the capture history of a worker with propensity $a_i$, integrated over all possible propensity values.

By expanding $P(a_i) \propto B(\alpha, \beta)$, and using the same process as when deriving the Beta-Binomial distribution, we can simplify the above expression as: 

### $R_i = \prod_{k=s}^{q} \frac{B(c_{ik} + \alpha, 1-c_{ik} + \beta)}{B(\alpha, \beta)}$.

So, if the worker $i$ has been captured $u_i$ times, between their arrival at time $s$ and their departure at $q$, we have:

### $R_i = B(1 + \alpha, \beta)^{u_i} \cdot B(\alpha, 1 + \beta)^{q-s-u_i} \cdot B(\alpha, \beta)^{-(q-s)}$

and by using the properties of the Beta function $B(x+1,y) = B(x, y) \cdot \dfrac{x}{x+y}$ and $B(x,y+1) = B(x, y) \cdot \dfrac{y}{x+y}$, we simplify further:

### $R_i = \alpha^{u_i} \cdot \beta^{q-s-u_i} / (\alpha+\beta)^{q-s}$

### $R_i = \left(\frac{\alpha}{\beta}\right)^{u_i} \cdot \left( \frac{\beta}{\alpha+\beta} \right)^{q-s}$
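The last simplification step can be checked numerically. The sketch below compares the Beta-function form of $R_i$ with the closed form, using only the standard library; the function names are hypothetical, and `n` stands for the exponent $q-s$ used in the text.

```python
from math import exp, lgamma


def beta_fn(x, y):
    """Beta function B(x, y), computed via log-gamma for stability."""
    return exp(lgamma(x) + lgamma(y) - lgamma(x + y))


def r_beta_form(alpha, beta, u, n):
    """R_i written with Beta functions, where n plays the role of q - s."""
    return (
        beta_fn(1 + alpha, beta) ** u
        * beta_fn(alpha, 1 + beta) ** (n - u)
        * beta_fn(alpha, beta) ** (-n)
    )


def r_closed_form(alpha, beta, u, n):
    """R_i after applying B(x+1, y) = B(x, y) x/(x+y) and its mirror image."""
    return (alpha / beta) ** u * (beta / (alpha + beta)) ** n


# Arbitrary illustrative values, not taken from any data.
print(r_beta_form(2.0, 5.0, 3, 7))
print(r_closed_form(2.0, 5.0, 3, 7))
```

The two printed values should agree up to floating-point error, which is a quick sanity check on the algebra rather than a proof.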

So, the overall likelihood function for a worker with $u_i$ captures between times $f_i$ and $l_i$ becomes:

### $L_i = \left(\frac{\alpha}{\beta}\right)^{u_i} \sum_{s=1}^{f_i} \lambda_s \cdot \left[ \sum_{q=l_i}^K \left( \prod_{v=f_i}^{q-1} (1 - \mu_v) \right) \cdot \mu_q \cdot \left( \frac{\beta}{\alpha+\beta} \right)^{q-s} \right]$
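A minimal sketch of how $L_i$ could be evaluated in plain Python is given here. It transcribes the formula exactly as written above (product starting at $f_i$, exponent $q-s$); the function names `survival` and `worker_likelihood` are hypothetical, and `lam` and `mu` are assumed to be lists of length $K$ with `lam[k-1]` holding $\lambda_k$.

```python
from math import prod


def survival(mu, start, stop):
    """prod_{v=start}^{stop} (1 - mu_v) with 1-indexed periods.

    An empty range (stop < start) gives 1.0.
    """
    return prod(1.0 - mu[v - 1] for v in range(start, stop + 1))


def worker_likelihood(u, f, l, lam, mu, alpha, beta):
    """L_i for a worker captured u times, first in period f and last in period l."""
    K = len(lam)
    ratio = beta / (alpha + beta)
    total = 0.0
    for s in range(1, f + 1):  # candidate arrival periods
        inner = sum(
            survival(mu, f, q - 1) * mu[q - 1] * ratio ** (q - s)
            for q in range(l, K + 1)  # candidate departure periods
        )
        total += lam[s - 1] * inner
    return (alpha / beta) ** u * total


# Arbitrary illustrative rates for K = 2 periods.
print(worker_likelihood(u=2, f=1, l=2, lam=[0.5, 0.3], mu=[0.2, 0.4],
                        alpha=1.0, beta=3.0))
```

The double loop costs $O(K^2)$ per worker, which is fine for a sanity check; a vectorized version would be preferable for fitting.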

and for the never observed workers:

### $L_0 = \sum_{s=1}^{K} \lambda_s \cdot \left[ \sum_{q=s}^K \left( \prod_{v=s}^{q-1} (1 - \mu_v) \right) \cdot \mu_q \cdot \left( \frac{\beta}{\alpha+\beta} \right)^{q-s} \right]$
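The expression for $L_0$ can be sketched the same way. The function name `unobserved_likelihood` is hypothetical, and `lam` and `mu` are again assumed to be lists of length $K$ indexed so that `lam[k-1]` is $\lambda_k$. The survival product is accumulated incrementally instead of being recomputed for every $q$.

```python
def unobserved_likelihood(lam, mu, alpha, beta):
    """L_0: likelihood contribution of a worker who is never captured."""
    K = len(lam)
    ratio = beta / (alpha + beta)
    total = 0.0
    for s in range(1, K + 1):  # candidate arrival periods
        survive = 1.0  # running value of prod_{v=s}^{q-1} (1 - mu_v)
        inner = 0.0
        for q in range(s, K + 1):  # candidate departure periods
            inner += survive * mu[q - 1] * ratio ** (q - s)
            survive *= 1.0 - mu[q - 1]
        total += lam[s - 1] * inner
    return total


# Arbitrary illustrative rates for K = 2 periods.
print(unobserved_likelihood([0.5, 0.3], [0.2, 0.4], alpha=1.0, beta=3.0))
```

With these illustrative inputs the terms are $0.5 \cdot (0.2 + 0.8 \cdot 0.4 \cdot 0.75) + 0.3 \cdot 0.4 = 0.34$.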


In [None]:
# F, I-dimensional vector with the first appearance of each worker (1..K)
# L, I-dimensional vector with the last appearance of each worker (1..K)
# U, I-dimensional vector with the number of appearances of each worker (1..K)
# Lambda, K-dimensional vector with the arrival rates for each time period 1..K
#   (named Lambda rather than L to avoid clashing with the last-appearance vector)
# M, K-dimensional vector with the departure rates for each time period 1..K
