## Problem Notation


### Supervised Scenario
* Consider an input pattern $x \in \mathbb{X}$ observed with probability distribution $p(x)$ and a ground-truth label $z \in \mathbb{Z}$ observed with conditional probability distribution $p(z|x)$.
* Given a finite sample $S=\{\left(x^{(1)},z^{(1)}\right), \ldots,\left(x^{(N)},z^{(N)}\right)\}$, where $\left(x^{(i)},z^{(i)}\right) \sim p(x,z)=p(z|x)p(x) \, \ \forall i \in [N]$. 
* Objective: estimate a predictive model $f(x)$ that maps $x \rightarrow z$ or learn statistics of $p(z|x)$.

<img src="https://miro.medium.com/max/1204/0*qf-O7Jm1mmZrXYqA" width="50%" />
    


### Crowdsourcing Scenario
* Same objective as in the supervised scenario, but the ground-truth labels $z^{(i)}$ corresponding to the input patterns $x^{(i)}$ are not directly observed.
* Consider labels $y \in \mathbb{Z}$ that do not follow the ground-truth distribution $p(z|x)$. Instead, they are generated by an unknown process $p(y^{(\ell)}|x,z)$ that represents the **ability** of annotator $\ell$ to detect the ground truth.

> #### Individual
* Consider multiple noisy labels $\mathcal{L}_i = \{y_i^{(1)},\ldots, y_i^{(T_i)}\}$ given by $T_i$ annotators.
* These annotations come from a subset $\mathcal{A}_i$ of the set $\mathcal{A}$ of all annotators participating in the labelling process, with $T = |\mathcal{A}|$ and $T_i = |\mathcal{A}_i|$.
* The annotator identity can be defined as a variable $a_{i}^{(\ell)} \in \mathcal{A}$, with $\mathcal{A} = \{ 1, \ldots, T\}$.
    * Then $p(y|x,z, a=\ell) = p(y^{(\ell)}|x,z)$
* Given a sample $\{(x_i, \mathcal{L}_i )\}_{i=1}^N$ or $\{(x_i, (\mathcal{L}_i, \mathcal{A}_i) )\}_{i=1}^N$
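
The individual setting above can be illustrated with a small simulation. This is only a sketch under stated assumptions: labels are class indices `0..K-1`, each annotator's noise process $p(y^{(\ell)}|x,z)$ is simplified to a per-annotator confusion matrix that ignores $x$, and annotators are indexed from 0 rather than 1. The function name `simulate_annotations` and its arguments are hypothetical, not part of any library.

```python
import random

def simulate_annotations(z, confusion, n_per_item, seed=0):
    """Draw noisy labels y ~ p(y | z, a) for every item.

    z          : list of ground-truth class indices (hidden in practice)
    confusion  : confusion[a][k][j] = p(y = j | z = k, a), one K x K matrix per annotator
    n_per_item : number of annotators T_i sampled for each item (assumed constant here)
    Returns two parallel lists: labels[i] plays the role of L_i, annotators[i] of A_i.
    """
    rng = random.Random(seed)
    T = len(confusion)        # total number of annotators
    K = len(confusion[0])     # number of classes
    labels, annotators = [], []
    for z_i in z:
        # Pick the subset A_i of annotators who label this item.
        A_i = sorted(rng.sample(range(T), n_per_item))
        # Each chosen annotator draws a label from their row for the true class.
        L_i = [rng.choices(range(K), weights=confusion[a][z_i])[0] for a in A_i]
        annotators.append(A_i)
        labels.append(L_i)
    return labels, annotators

# Three hypothetical annotators for a binary problem (K = 2).
confusion = [
    [[0.9, 0.1], [0.2, 0.8]],  # annotator 0: fairly reliable
    [[0.6, 0.4], [0.4, 0.6]],  # annotator 1: noisy
    [[1.0, 0.0], [0.0, 1.0]],  # annotator 2: always correct
]
z_true = [0, 1, 1, 0]
L, A = simulate_annotations(z_true, confusion, n_per_item=2)
```

Here `L[i]` and `A[i]` together correspond to the pair $(\mathcal{L}_i, \mathcal{A}_i)$; dropping `A` gives the sample without annotator identities.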

> #### Global
* Consider that we do not know, or do not care, which annotators provided the labels: we know $|\mathcal{A}_i|$ but not $\mathcal{A}_i$.
* Consider the number of times that the annotators assigned each possible label $j$ to item $i$: $r_{ij} \in \{0,1,\ldots,T_i\}$.
* Given a sample $\{ (x_i,r_i) \}_{i=1}^N$.
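
As a minimal sketch of how the global representation relates to the individual one, the counts $r_{ij}$ can be obtained by tallying each label set $\mathcal{L}_i$. The helper name `count_labels` is hypothetical, and labels are assumed to be class indices `0..K-1`.

```python
def count_labels(labels, K):
    """Return r with r[i][j] = number of annotations of item i equal to class j."""
    r = [[0] * K for _ in labels]
    for i, L_i in enumerate(labels):
        for y in L_i:
            r[i][y] += 1
    return r

# Three items with T_i = 3, 1 and 2 annotations respectively, K = 3 classes.
labels = [[0, 0, 1], [2], [1, 1]]
r = count_labels(labels, K=3)
# r == [[2, 1, 0], [0, 0, 1], [0, 2, 0]]
```

Each row of `r` sums to $T_i$, which is why the global setting still knows $|\mathcal{A}_i|$ even though the annotator identities are discarded.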




#### Focus
In this implementation, we study the pattern recognition case, that is, we let $\mathbb{Z}$ be a small set of $K$ categories or classes $\{c_1,c_2,\ldots,c_K\}$. 

---

One can also define two scenarios based on the annotation density and the associated assumptions:
* **Dense**: 
    * All annotators label every data point: $\mathcal{A}_i = \mathcal{A}$
    * The implementation is simpler since fixed size matrices are assumed.
* **Sparse**: 
    * The number of labels collected per data point and per annotator varies: $|\mathcal{A}_i| \neq |\mathcal{A}_j| < |\mathcal{A}| = T$
    * An appropriate implementation leads to computational efficiency.
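
The difference between the two scenarios can be sketched with two storage layouts. This is an illustrative assumption, not the layout used by any particular implementation: the dense form is a fixed-size $N \times T$ matrix, and the sparse form is a list of `(item, annotator, label)` triplets. The sentinel `MISSING` and both helper names are hypothetical.

```python
MISSING = -1  # hypothetical sentinel: annotator did not label this item

def dense_to_triplets(Y, missing=MISSING):
    """Convert an N x T label matrix into (item, annotator, label) triplets."""
    return [(i, a, y)
            for i, row in enumerate(Y)
            for a, y in enumerate(row)
            if y != missing]

def triplets_to_dense(triplets, N, T, missing=MISSING):
    """Rebuild the fixed-size N x T matrix, filling unlabeled cells with the sentinel."""
    Y = [[missing] * T for _ in range(N)]
    for i, a, y in triplets:
        Y[i][a] = y
    return Y

# N = 3 items, T = 4 annotators; most cells are empty in the sparse scenario.
Y = [
    [0, MISSING, 1, MISSING],
    [MISSING, MISSING, 2, MISSING],
    [1, 1, MISSING, MISSING],
]
triplets = dense_to_triplets(Y)
# 5 stored triplets instead of 12 matrix cells
```

In the dense scenario every cell is filled, so the matrix wastes nothing. In the sparse scenario the triplet list grows with the number of annotations actually collected rather than with $N \cdot T$.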

