# Probability

**Introduction**

* probability of an event $A$ (all outcomes equally likely) $P(A) = \frac{|events_{A}|}{|events_{all}|}$
* probability of the union of events ($A$ or $B$) $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
* probability of independent events $P(A \cap B) = P(A) P(B)$
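* Birthday problem - chain the probabilities of each person not sharing a birthday with the previous ones (the numerator decreases: $\frac{365}{365} \cdot \frac{364}{365} \cdot \frac{363}{365} \cdots$), then subtract the product from 1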
* probability of dependent events $P(A \cap B) = P(A) P(B|A)$
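
The chaining in the birthday problem can be sketched as below. This is a minimal illustration that assumes 365 equally likely birthdays and ignores leap years; the function name is made up for this note.

```python
def birthday_collision_probability(people: int, days: int = 365) -> float:
    """Probability that at least two of `people` share a birthday."""
    p_all_distinct = 1.0
    for i in range(people):
        # Person i must avoid the i birthdays already taken,
        # so the numerator shrinks by one with every person.
        p_all_distinct *= (days - i) / days
    return 1.0 - p_all_distinct


print(round(birthday_collision_probability(23), 4))  # about 0.5073
```

With 23 people the probability of a shared birthday already exceeds one half.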

**Conditional probability applications**


* Bayes theorem - $P(A|B) = \frac{P(B|A) P(A)}{P(B)}$
* Prior - original probability $P(A)$
* Posterior - probability after an event $E$, $P(A|E)$, better estimation than the prior!
* Naive Bayes - assumes the multiple events $E_i$ are conditionally independent given $A$, so we can simply multiply the likelihoods $P(E_i|A)$
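
A small sketch of a Bayes update, assuming a binary hypothesis ($A$ or not $A$) and pieces of evidence that are conditionally independent given the hypothesis (the usual reading of the Naive Bayes assumption). The numbers (1% prior, 99% true-positive rate, 5% false-positive rate) are made up for illustration.

```python
import math


def posterior(prior, likelihoods_if_a, likelihoods_if_not_a):
    """P(A | E_1..E_k) for a binary hypothesis A.

    likelihoods_if_a[i]     = P(E_i | A)
    likelihoods_if_not_a[i] = P(E_i | not A)
    Independence given the hypothesis lets the likelihoods multiply.
    """
    joint_a = prior * math.prod(likelihoods_if_a)
    joint_not_a = (1.0 - prior) * math.prod(likelihoods_if_not_a)
    # The denominator is P(E), expanded over A and not A.
    return joint_a / (joint_a + joint_not_a)


one_test = posterior(0.01, [0.99], [0.05])
two_tests = posterior(0.01, [0.99, 0.99], [0.05, 0.05])
print(round(one_test, 4))  # 0.1667
print(two_tests > one_test)  # True
```

A single positive result moves the 1% prior to about 17%; a second independent positive result raises the posterior further.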

# Probability distributions

* random variables allow us to model the whole experiment at once

### Discrete variables

Binomial distribution

* discrete distribution, probability mass function (PMF) - histogram of the probabilities of the possible results of an event
* based on the binomial coefficient $n\choose k$, which counts all the combinations for landing $k$ heads in $n$ coin tosses, combined with the probability rules

Example:  
* coin toss > $P(H) = p$
* event > $X = x: x$ heads in 5 tosses
* all possible orders > $5 \choose x$
* probability of seeing $x$ heads > $p^x$
* probability of seeing $5-x$ tails > $(1-p)^{5-x}$
* all together > $P(X=x) = {5\choose x} p^x (1-p)^{5-x}$
* $X$ follows a binomial distribution $X \sim Binomial(5, p)$

Generalized form:  
* general PMF for $X$: number of heads in $n$ coin tosses
* $P(H) = p$
* Event $X=x:x$ heads in $n$ tosses
* $p_X(x) = {n\choose x}p^x(1-p)^{n-x}, x=0,1,2,\dots,n$
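
The generalized PMF translates almost directly into code. A minimal sketch using only the standard library (`math.comb` needs Python 3.8+); the function name is chosen for this note.

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for X ~ Binomial(n, p)."""
    # comb(n, x) counts the orders; p**x and (1-p)**(n-x) are the
    # probabilities of the x heads and the n-x tails.
    return comb(n, x) * p**x * (1 - p) ** (n - x)


# Full PMF of the number of heads in 5 fair coin tosses.
pmf = [binomial_pmf(x, 5, 0.5) for x in range(6)]
print(pmf)  # [0.03125, 0.15625, 0.3125, 0.3125, 0.15625, 0.03125]
```

The values sum to 1, as any PMF must.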

Binomial coefficient
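* choosing $k$ items from a set of $n$ options, where each pick reduces the number of options available for the next pick: $\frac{n!}{(n-k)!}$ ordered picks
* dividing by $k!$ compensates for counting every unordered set once per ordering, giving ${n\choose k} = \frac{n!}{(n-k)!k!}$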
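
The two steps (count ordered picks, then divide out the orderings) can be checked against the standard library. A sketch with a hypothetical helper name:

```python
from math import comb, factorial


def binomial_coefficient(n: int, k: int) -> int:
    # Ordered picks: n * (n-1) * ... * (n-k+1) = n! / (n-k)!
    ordered_picks = factorial(n) // factorial(n - k)
    # Each unordered set of k items was counted k! times.
    return ordered_picks // factorial(k)


print(binomial_coefficient(5, 2))  # 10
print(binomial_coefficient(5, 2) == comb(5, 2))  # True
```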

Bernoulli distribution
* a single trial with probability of a successful event $p$ (binomial distribution with $n=1$)
* $X \sim Bernoulli(p)$

### Continuous variables

* we are interested in the probability of an interval, not of a single discrete value

PDF
* $P(a<X<b) = \int_a^b f_X(x)\,dx$, the area under the curve (AUC) of $f_X$ between $a$ and $b$
* rate of accumulation of probability around each point
* defined for continuous variables (defined on $\mathbb{R}$)
* non-negative, the total AUC is 1, the probability at any single point is 0

CDF
* CDF $F_X(x) = P(X \leq x)$ allows for simple calculation of the AUCs through differencing: $P(a<X<b) = F_X(b) - F_X(a)$
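
Differencing the CDF can be compared with integrating the PDF numerically. The sketch below uses the standard normal distribution as the example (an assumption for illustration; any continuous distribution with a known CDF would do) and a simple trapezoid rule.

```python
from math import erf, exp, pi, sqrt


def pdf(x: float) -> float:
    """Standard normal PDF."""
    return exp(-0.5 * x * x) / sqrt(2.0 * pi)


def cdf(x: float) -> float:
    """Standard normal CDF, expressed through the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def area_under_pdf(a: float, b: float, steps: int = 10_000) -> float:
    """Trapezoid-rule approximation of the area under the PDF on [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (pdf(a) + pdf(b))
    for i in range(1, steps):
        total += pdf(a + i * h)
    return total * h


a, b = -1.0, 1.0
by_integration = area_under_pdf(a, b)
by_differencing = cdf(b) - cdf(a)
print(round(by_integration, 4), round(by_differencing, 4))  # both about 0.6827
```

Both routes give the same probability; the CDF route needs only two function evaluations.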


Normal distribution
* bell-shaped PDF, i.e. the limiting shape of the PMF of $n$ coin tosses as $n$ becomes very large
* $f_X(x) = \frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{1}{2}(\frac{x-\mu}{\sigma})^2} \rightarrow X \sim N(\mu,\sigma^2)$
* z-score $z = \frac{x-\mu}{\sigma}$ for transforming any normal distribution to the standard one, $N(0,1)$
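
A short check that standardizing with the z-score preserves probabilities, using `statistics.NormalDist` from the standard library (Python 3.8+). The parameters $\mu = 100$, $\sigma = 15$ are arbitrary illustration values.

```python
from math import isclose
from statistics import NormalDist


def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma


mu, sigma, x = 100.0, 15.0, 130.0
z = z_score(x, mu, sigma)
p_original = NormalDist(mu, sigma).cdf(x)  # P(X <= 130) for X ~ N(100, 15^2)
p_standard = NormalDist().cdf(z)           # P(Z <= 2) for Z ~ N(0, 1)
print(z)  # 2.0
print(isclose(p_original, p_standard))  # True
```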

Chi-squared distribution
* used to model noise in communication, based on normal distribution
* $W = Z^2$, where $Z \sim N(0,1)$
* $F_W(w) = P(W\leq w) = P(Z^2\leq w) = P(|Z|\leq \sqrt w) = P(-\sqrt w \leq Z \leq \sqrt w)$ (CDF)
* $f_W(w) = F'_W(w)$ (PDF)
* noise accumulation over $k$ transmissions $W_k = \sum_{i=1}^{k}{Z^2_i}$ ($k$ degrees of freedom; the CDF flattens with more degrees of freedom)
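
The construction can be simulated by summing squared standard normal draws. This sketch compares the empirical CDF for $k=1$ with $P(-\sqrt w \leq Z \leq \sqrt w) = \operatorname{erf}(\sqrt{w/2})$ and checks that the sample mean for $k=3$ is close to $k$ (since $E[Z^2] = 1$). The seed and sample size are arbitrary.

```python
import random
from math import erf, sqrt


def sample_chi_squared(k: int, rng: random.Random) -> float:
    """One draw of W_k: the sum of k squared standard normal values."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))


rng = random.Random(0)  # fixed seed so the run is reproducible
n = 20_000

# k = 1: empirical P(W <= w) versus the closed form from the normal CDF.
w = 1.0
samples_1 = [sample_chi_squared(1, rng) for _ in range(n)]
empirical_cdf = sum(s <= w for s in samples_1) / n
exact_cdf = erf(sqrt(w / 2.0))  # equals P(-sqrt(w) <= Z <= sqrt(w))

# k = 3: the mean of W_k should be close to k.
samples_3 = [sample_chi_squared(3, rng) for _ in range(n)]
mean_3 = sum(samples_3) / n

print(round(exact_cdf, 4))  # 0.6827
print(abs(empirical_cdf - exact_cdf) < 0.02, abs(mean_3 - 3.0) < 0.1)
```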

Sampling from a distribution
* obtaining synthetic data
* sampling can be done directly with the CDF (sample uniformly on the y axis, read the results off the x axis, i.e. apply the inverse CDF)
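
Sampling through the CDF is often called inverse transform sampling. A minimal sketch, assuming a normal target distribution and using `statistics.NormalDist.inv_cdf` as the inverse CDF; the helper name, seed, and parameters are chosen for illustration.

```python
import random
from statistics import NormalDist, mean, stdev


def sample_via_inverse_cdf(dist: NormalDist, n: int, rng: random.Random) -> list:
    """Draw n samples by pushing uniform values through the inverse CDF."""
    samples = []
    while len(samples) < n:
        u = rng.random()  # uniform point on the y axis, in [0, 1)
        if u == 0.0:
            continue  # inv_cdf is only defined for 0 < u < 1
        samples.append(dist.inv_cdf(u))  # read the matching x value
    return samples


rng = random.Random(42)  # fixed seed so the run is reproducible
target = NormalDist(mu=10.0, sigma=2.0)
samples = sample_via_inverse_cdf(target, 20_000, rng)
print(round(mean(samples), 1), round(stdev(samples), 1))
```

The sample mean and standard deviation should land close to the target's 10 and 2.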