# Generative Models for Discrete Data 
Book by: Susan Holmes and Wolfgang Huber

**probability model** - predicts the occurrence of some event <br>
HIV genome size = $10^4$ nucleotides <br>
The expected number of mutations per replication cycle is 5, and the number of mutations follows a Poisson distribution with rate $\lambda = 5$: <br>
$P(X = x) = \frac{e^{-\lambda}\lambda^{x}}{x!}$ <br>
The Poisson distribution describes counts of rare events (the probability of success is very small, but the number of trials is large).
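The "small probability, many trials" description can be checked numerically. The sketch below assumes that each of the $10^4$ nucleotides mutates independently with probability `p = 5e-4` per cycle (a value chosen here so that `n * p` equals the stated rate of 5) and compares the resulting binomial probabilities with the Poisson ones.

```r
# ASSUMPTION: each of the n = 10^4 nucleotides mutates independently with
# probability p = 5e-4 per cycle, so that n * p = 5 (the stated Poisson rate).
n = 1e4
p = 5e-4
lambda = n * p

x = 0:12
exact  = dbinom(x, size = n, prob = p)   # binomial probabilities
approx = dpois(x, lambda = lambda)       # Poisson approximation

# Largest absolute difference between the two sets of probabilities
max_diff = max(abs(exact - approx))
print(round(cbind(x, exact, approx), 4))
print(max_diff)
```

The two columns should agree closely, which is why the simpler Poisson model is a reasonable stand-in here.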

How often do we see 3 mutations when the expected number of mutations after
one cycle is 5?

In [17]:
D = dpois(x = 3, lambda = 5)
print(D)

[1] 0.1403739


This means that the probability of seeing **exactly** three mutations after one replication cycle is about 14%.
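As a quick check, the same value can be reproduced by evaluating the Poisson formula directly. The sketch below also contrasts "exactly 3" with "at most 3", which is a different (cumulative) quantity; the variable names are illustrative only.

```r
lambda = 5
x = 3

# Poisson probability mass function written out: e^(-lambda) * lambda^x / x!
by_hand  = exp(-lambda) * lambda^x / factorial(x)
built_in = dpois(x, lambda = lambda)
print(c(by_hand = by_hand, built_in = built_in))

# "Exactly 3" differs from "at most 3", which sums the pmf over 0, 1, 2, 3
at_most_3 = ppois(x, lambda = lambda)
print(at_most_3)
```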

In [18]:
genotype = c("AA", "AO", "BB", "AO", "OO", "AA", "BO", "BO", "AO", "BB",
             "AO", "BO", "AB", "OO", "AB", "BB", "AO", "AO", "AO")
table(genotype)

genotype
AA AB AO BB BO OO 
 2  2  7  3  3  2 

## A generative model for epitope detection:
**Epitopes** - molecular sites responsible for allergic reactions. <br>
"A specific portion of a macromolecular antigen to which an antibody binds."

**ELISA** (enzyme-linked immunosorbent assay) is used to detect specific epitopes. <br>
- the baseline noise level per position (false positive rate) is 1% <br>
- the protein is tested at 100 different positions, assumed independent
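The two assumptions above are enough to simulate data from this model. The sketch below is a minimal illustration: the number of patients (`n_patients = 50`) is not stated in these notes and is assumed here only because it gives a per-position rate of 0.5, matching the rate used in the `ppois` calls in the next cell.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

n_positions = 100    # positions tested along the protein
fp_rate     = 0.01   # false positive rate per position
n_patients  = 50     # ASSUMPTION: not stated in these notes

# One patient: a 0/1 vector with one Bernoulli draw per position
one_patient = rbinom(n_positions, size = 1, prob = fp_rate)

# Summing 0/1 hits over patients gives a Binomial(n_patients, fp_rate) count
# per position, which is close to Poisson with rate n_patients * fp_rate
counts = rbinom(n_positions, size = n_patients, prob = fp_rate)
rate   = n_patients * fp_rate

print(sum(one_patient))
print(table(counts))
print(rate)
```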

In [58]:
data_patient = c(0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                 0,0,0,0,0,0,0,0,0,0)
length(data_patient)
# P(X >= 7) for X ~ Poisson(0.5), computed in two equivalent ways
1 - ppois(6, 0.5)
ppois(6, 0.5, lower.tail = FALSE)
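Both calls above compute the upper tail $P(X \ge 7) = 1 - P(X \le 6)$. As a sketch, the same quantity can be written as an explicit sum over the probability mass function; for such a small tail probability, `lower.tail = FALSE` is generally the safer form because it avoids subtracting a number very close to 1 from 1.

```r
lambda = 0.5

# P(X >= 7) = 1 - P(X <= 6), written as an explicit sum over the pmf
by_sum = 1 - sum(dpois(0:6, lambda))

# Same quantity, computed directly from the upper tail
upper = ppois(6, lambda, lower.tail = FALSE)

print(c(by_sum = by_sum, upper = upper))
```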

**Computing probabilities by simulation:** <br>
**Monte Carlo method** <br>
Simulation based on a generative model, used to estimate the probabilities of events of interest.

In [62]:
# Generate 100,000 instances of picking the maximum from 100
# Poisson-distributed numbers.
maxes = replicate(100000, {
    max(rpois(100, 0.5))
})
table(maxes)

maxes
    1     2     3     4     5     6     7     8     9 
   12 23367 60580 14286  1616   126    11     1     1 

In [65]:
mean(maxes >= 7) # estimated probability that the maximum is 7 or higher
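For this particular model the Monte Carlo estimate can be compared with an exact value, since the maximum of 100 independent counts is below 7 only if every count is at most 6. The sketch below reruns the simulation under its own (arbitrary) seed, so the estimate will not match the table above exactly.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# Exact: the maximum is >= 7 unless all 100 independent counts are <= 6
exact = 1 - ppois(6, 0.5)^100

# Monte Carlo estimate, same scheme as the cell above
maxes_sim = replicate(100000, max(rpois(100, 0.5)))
estimate  = mean(maxes_sim >= 7)

print(c(exact = exact, estimate = estimate))
```

Because the event is rare, only a handful of the 100,000 replicates hit it, so the estimate is noisy; more replicates would narrow the gap.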

## Multinomial distributions: the case of DNA

Multinomial distribution - used when there are more than two possible outcomes, i.e. the levels of a discrete categorical variable. <br>
EXAMPLE: the four nucleotides of DNA

In [68]:
dmultinom(c(4,2,0,0), prob=rep(1/4,4))
pvec = rep(1/4, 4)
t(rmultinom(1, prob = pvec, size = 8))


     [,1] [,2] [,3] [,4]
[1,]    5    1    1    1


     [,1]
[1,]    1
[2,]    1
[3,]    0
[4,]    6
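To connect `dmultinom` back to the multinomial formula, the sketch below evaluates $\frac{n!}{x_1!\,x_2!\,x_3!\,x_4!}\,p_1^{x_1} p_2^{x_2} p_3^{x_3} p_4^{x_4}$ by hand for the count vector used above and checks that a simulated draw sums to its `size`. Variable names other than `pvec` are illustrative.

```r
counts = c(4, 2, 0, 0)   # counts for the four nucleotide classes (6 in total)
pvec   = rep(1/4, 4)     # equal probabilities, as in the cell above

# Multinomial pmf: n! / (x1! x2! x3! x4!) * p1^x1 * p2^x2 * p3^x3 * p4^x4
n = sum(counts)
by_hand  = factorial(n) / prod(factorial(counts)) * prod(pvec^counts)
built_in = dmultinom(counts, prob = pvec)
print(c(by_hand = by_hand, built_in = built_in))

# Each simulated draw is a column of four counts that sums to `size`
draw = rmultinom(1, size = 8, prob = pvec)
print(sum(draw))
```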
