# Probability for Computational Biologists
> A comprehensive outline of pertinent concepts in probability.

- toc: true 
- badges: true
- comments: true
- categories: [jupyter]


# Introduction

This will be a comprehensive outline of topics in probability, including random variables, distributions, and rules of probability. My hope is that this will serve as a picture of (almost) everything you may need to know regarding probability.

Fields such as statistical learning and deep learning assume these concepts as background knowledge. It is therefore vital to have at least a working knowledge of what is to come.

# Counting



## Multiplication Rule

If an experiment proceeds in steps, where step $i$ has $n_i$ possible outcomes no matter what happened in earlier steps, the total number of possible outcomes is the product of the number of outcomes at each step:

$n_{total} = \prod_i n_i$
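As a quick check from biology, the number of distinct DNA codons follows from the multiplication rule: three positions, each with four possible bases. The sketch below enumerates them with the standard library's `itertools.product` and compares the count against $\prod_i n_i$.

```python
from itertools import product
from math import prod

bases = "ACGT"

# Each of the three codon positions can independently be any of the four bases.
codons = ["".join(c) for c in product(bases, repeat=3)]

n_per_step = [len(bases)] * 3  # n_1 = n_2 = n_3 = 4
print(len(codons), prod(n_per_step))  # 64 64
```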

## Sampling

The number of ways to sample $k$ items from a population of size $n$ depends on whether we sample with replacement and whether order matters:

- Replacement (place each of the $k$ samples back into the population after choosing!)
    - Order-Preserving (we count each unique order in which we select the $k$ samples)
        - $n^k$
    - Order Doesn't Matter (we only care about which items were chosen and how many times, i.e., the class membership of the $k$ samples)
        - $\binom{n+k-1}{k}$

- No Replacement
    - Order-Preserving
        - $\frac{n!}{(n-k)!}$
        - A *permutations* problem.
    - Order Doesn't Matter
        - $\binom{n}{k} = \frac{n!}{k!(n-k)!}$
        - A *combinations* problem; use the binomial coefficient
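The four cases can be checked by brute force. This sketch assumes an illustrative population of the four DNA bases ($n=4$) and $k=2$ draws, enumerates each case with `itertools`, and compares the count to the formula above (`math.comb` and `math.perm` require Python 3.8+).

```python
from itertools import product, permutations, combinations, combinations_with_replacement
from math import comb, perm

items = ["A", "C", "G", "T"]  # population of size n (illustrative choice)
n, k = len(items), 2

counts = {
    # with replacement, order matters: n^k
    "replacement, ordered": (len(list(product(items, repeat=k))), n ** k),
    # with replacement, order doesn't matter: C(n+k-1, k)
    "replacement, unordered": (len(list(combinations_with_replacement(items, k))), comb(n + k - 1, k)),
    # without replacement, order matters: n!/(n-k)!
    "no replacement, ordered": (len(list(permutations(items, k))), perm(n, k)),
    # without replacement, order doesn't matter: C(n, k)
    "no replacement, unordered": (len(list(combinations(items, k))), comb(n, k)),
}

for case, (enumerated, formula) in counts.items():
    print(f"{case:26s} enumerated={enumerated:2d} formula={formula:2d}")
```

For $n=4$, $k=2$ the four counts are 16, 10, 12, and 6, in the order listed.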

## Naïve Probability

- If all outcomes in the sample space are equally likely, $P_{naive}(X=x_i) = \frac{\text{number of outcomes favorable to } x_i}{\text{total number of outcomes}}$

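- This is intuitive. Toss a strictly fair coin twice (each toss is $X\sim Bernoulli(p=0.5)$). The sample space $\{HH, HT, TH, TT\}$ has four equally likely outcomes, two of which contain exactly one head, so $P(\text{exactly one head}) = \frac{2}{4} = \frac{1}{2}$.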
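The naive definition translates directly into code: enumerate the sample space, count the favorable outcomes, and divide. A minimal sketch for two fair coin tosses; `naive_probability` is a hypothetical helper name, and `Fraction` keeps the answers exact.

```python
from fractions import Fraction
from itertools import product

# Sample space for two tosses of a fair coin: all four outcomes equally likely.
sample_space = list(product("HT", repeat=2))

def naive_probability(event, space):
    """Favorable outcomes / total outcomes (valid only if outcomes are equally likely)."""
    favorable = [outcome for outcome in space if event(outcome)]
    return Fraction(len(favorable), len(space))

p_one_head = naive_probability(lambda o: o.count("H") == 1, sample_space)
p_at_least_one = naive_probability(lambda o: "H" in o, sample_space)
print(p_one_head, p_at_least_one)  # 1/2 3/4
```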

# Conditional Probability

## Independence

### Independent Events

Two events $A,B$ are independent if the outcome of one event has no bearing on the other. In other words, knowing the outcome of $B$ gives no information about the potential outcome of $A$:

- For independent events $A,B$,
    - $P(A,B) = P(A)P(B)$
    - $P(A|B) = P(A)$
    - $P(B|A) = P(B)$
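Independence can be checked exactly by enumeration. This sketch uses two fair six-sided dice (an illustrative choice): "first die is even" and "second die is 5 or 6" involve different dice and satisfy $P(A,B)=P(A)P(B)$, while "first die is even" and "sum is at least 10" do not.

```python
from fractions import Fraction
from itertools import product

# Two fair six-sided dice: 36 equally likely outcomes.
space = list(product(range(1, 7), repeat=2))

def P(event):
    return Fraction(sum(1 for o in space if event(o)), len(space))

def A(o):
    return o[0] % 2 == 0          # first die is even

def B(o):
    return o[1] > 4               # second die is 5 or 6

def C(o):
    return o[0] + o[1] >= 10      # sum is at least 10 (depends on both dice)

p_ab = P(lambda o: A(o) and B(o))
print(p_ab, P(A) * P(B))          # 1/6 1/6: A and B are independent

p_ac = P(lambda o: A(o) and C(o))
print(p_ac, P(A) * P(C))          # 1/9 1/12: A and C are dependent
```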

### Conditional Independence

Two events $A,B$ are *conditionally* independent given another event $C$ if: $$P(A,B|C) = P(A|C)P(B|C)$$ That is, once we know that $C$ occurred, learning whether $B$ occurred tells us nothing more about $A$. Note that conditional independence given $C$ neither implies nor is implied by (unconditional) independence.

As an example, take three genetic mutations, $i$, $j$, and $k$. Suppose $i$ and $j$ are strongly correlated in a dataset, and assume the probability distribution of each event is known. At first glance, we might think we could never model the two mutations independently. We might even assume $i$ causes $j$ in some fashion, or vice versa.

However, suppose we discovered that $k$ is definitely a mutation in an upstream promoter region that affects both the sites corresponding to $i$ and $j$. We might then find that $i$ and $j$ are conditionally independent given $k$: their correlation is explained by the shared upstream cause. Suddenly, our assumptions change, and we may be more inclined to target $k$ as the event worthy of attention.
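The promoter scenario can be made concrete with a small common-cause model. In this sketch every probability is hypothetical and chosen only for illustration: $k$ is drawn first, then $i$ and $j$ are drawn independently given $k$, so the joint factors as $P(i,j,k) = P(k)P(i|k)P(j|k)$. The check shows $i$ and $j$ are dependent marginally but conditionally independent given $k$.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical common-cause model (all numbers illustrative):
# k = upstream promoter mutation, which raises the chance of both i and j.
p_k = {1: F(1, 5), 0: F(4, 5)}
p_i_given_k = {1: F(9, 10), 0: F(1, 10)}   # P(i = 1 | k)
p_j_given_k = {1: F(8, 10), 0: F(2, 10)}   # P(j = 1 | k)

def bern(p, x):
    return p if x == 1 else 1 - p

# Joint P(i, j, k) = P(k) P(i | k) P(j | k)
joint = {(i, j, k): p_k[k] * bern(p_i_given_k[k], i) * bern(p_j_given_k[k], j)
         for i, j, k in product((0, 1), repeat=3)}

def prob(pred):
    return sum(p for outcome, p in joint.items() if pred(*outcome))

# Marginally, i and j are dependent ...
p_ij = prob(lambda i, j, k: i == 1 and j == 1)
p_i = prob(lambda i, j, k: i == 1)
p_j = prob(lambda i, j, k: j == 1)
print(p_ij, p_i * p_j)             # 4/25 52/625

# ... but conditionally independent given k = 1.
pk1 = prob(lambda i, j, k: k == 1)
lhs = prob(lambda i, j, k: i == 1 and j == 1 and k == 1) / pk1
rhs = (prob(lambda i, j, k: i == 1 and k == 1) / pk1) * (prob(lambda i, j, k: j == 1 and k == 1) / pk1)
print(lhs, rhs)                    # 18/25 18/25
```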

## Unions, Intersects, Complements

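Set theory is the natural language of probability: events are subsets of the sample space, so union, intersection, and complement correspond to OR, AND, and NOT. At the end of the day, we use probabilistic models to try to understand real events, and this holds for both discrete and continuous outcomes.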

### De Morgan's Laws

De Morgan's Laws are useful in logical reasoning, proofs, set theory, and beyond. In probability, OR ($\lor$) corresponds to the union of events and AND ($\land$) to their intersection, so the laws apply directly to events:

- $\lnot(A \lor B) = \lnot A \land \lnot B$, i.e., $(A \cup B)^c = A^c \cap B^c$
- $\lnot(A \land B) = \lnot A \lor \lnot B$, i.e., $(A \cap B)^c = A^c \cup B^c$
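Python's built-in `set` operators (`|` for union, `&` for intersection, `-` for difference) make De Morgan's Laws easy to verify. A minimal sketch on one roll of a fair die, with two illustrative events, followed by the common trick of computing $P(A \cup B)$ as $1 - P(A^c \cap B^c)$.

```python
from fractions import Fraction

omega = set(range(1, 7))   # sample space: one roll of a fair die
A = {2, 4, 6}              # "roll is even"
B = {4, 5, 6}              # "roll is greater than 3"

def complement(event):
    return omega - event

def P(event):
    # naive probability: every face is equally likely
    return Fraction(len(event), len(omega))

law1 = complement(A | B) == (complement(A) & complement(B))   # not(A or B) = not A and not B
law2 = complement(A & B) == (complement(A) | complement(B))   # not(A and B) = not A or not B
print(law1, law2)  # True True

# A common use: P(A or B) = 1 - P(not A and not B)
print(P(A | B), 1 - P(complement(A) & complement(B)))  # 2/3 2/3
```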

## Joint, Marginal, and Conditional Probability


### Joint Probability

- $P(A,B)$
    - Note, $P(A,B) = P(A)P(B|A)$
    - That is, the joint probability is the probability of $A$ times the probability of $B$ within the part of the sample space where $A$ has occurred.

    - Note also that $P(A,B)=P(B,A) = P(B)P(A|B)$. Which event we condition on is just a matter of labeling and convenience, as long as we are consistent.

### Marginal (Unconditional / Prior) Probability
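- $P(A)$: the probability of $A$ on its own, without conditioning on any other event. It can be recovered from joint probabilities by summing out the other event, e.g., $P(A) = P(A,B) + P(A,B^c)$.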

### Conditional Probability

- $P(A|B) = \frac{P(A,B)}{P(B)}$, defined when $P(B) > 0$
- *The Probability of A given B is equal to the Probability of A and B over the (prior) Probability of B*

- Note: We can easily see how Bayes' Rule follows. Given that we can "flip" the order of the joint probability expression, what is the right side equivalent to?
    - $P(A|B) = \frac{P(A,B)}{P(B)} \to P(A|B)P(B) = P(A,B) = P(B|A)P(A)$
    - $\implies P(A|B) = \frac{P(B|A)P(A)}{P(B)}$, or Bayes' Rule!
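The definition of conditional probability and the Bayes' Rule rearrangement can be checked by enumeration. A minimal sketch with two fair dice, using the illustrative events $A$ = "sum is at least 10" and $B$ = "at least one six":

```python
from fractions import Fraction
from itertools import product

space = list(product(range(1, 7), repeat=2))   # two fair dice, 36 outcomes

def P(event):
    return Fraction(sum(1 for o in space if event(o)), len(space))

def A(o):
    return sum(o) >= 10       # sum is at least 10

def B(o):
    return 6 in o             # at least one six

def AB(o):
    return A(o) and B(o)

p_a_given_b = P(AB) / P(B)            # definition: P(A|B) = P(A,B) / P(B)
p_b_given_a = P(AB) / P(A)            # definition: P(B|A) = P(A,B) / P(A)
bayes = p_b_given_a * P(A) / P(B)     # Bayes' Rule: P(B|A) P(A) / P(B)

print(p_a_given_b, bayes)             # 5/11 5/11
```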


### Conditional Probability *is* Probability

- $P(A|B)$ is a probability function like any other for a fixed $B$. Any theorem applicable to probability is relevant for conditional probability.

## Chain Rule for Probability

- We can disentangle a joint probability with the "chain" rule, which applies the two-event identity $P(A,B) = P(A|B)P(B)$ repeatedly.
- $P(A,B,C) = P(A|B,C)P(B,C) = P(A|B,C)P(B|C)P(C)$.
- This is exactly the same as letting $D$ denote the joint event $(B,C)$, writing $P(A,D) = P(A|D)P(D)$, and then expanding $P(D) = P(B|C)P(C)$.
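A numerical check of the three-event chain rule, assuming NumPy: build an arbitrary joint distribution over three binary events as a `(2, 2, 2)` array indexed `joint[a, b, c]`, compute each conditional from it by marginalizing, and confirm the product reproduces the joint for every combination of outcomes.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
joint = rng.random((2, 2, 2))     # joint[a, b, c] = P(A=a, B=b, C=c), arbitrary values
joint /= joint.sum()              # normalize so the table sums to 1

def chain_rule_holds(a, b, c):
    p_abc = joint[a, b, c]
    p_bc = joint[:, b, c].sum()   # P(B=b, C=c): marginalize out A
    p_c = joint[:, :, c].sum()    # P(C=c): marginalize out A and B
    p_a_given_bc = p_abc / p_bc   # P(A=a | B=b, C=c)
    p_b_given_c = p_bc / p_c      # P(B=b | C=c)
    return np.isclose(p_abc, p_a_given_bc * p_b_given_c * p_c)

holds = all(chain_rule_holds(a, b, c) for a, b, c in itertools.product((0, 1), repeat=3))
print(holds)  # True
```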

## Law of Total Probability

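For an event $A$ and a partition $B_1,\dots,B_n$ of the sample space (disjoint events whose union is the entire sample space), we can compute $P(A)$ by conditioning on which $B_i$ occurs and summing over the partition:

- $P(A) = P(A|B_1)P(B_1) + \dots + P(A|B_n)P(B_n) = \sum_{i=1}^{n} P(A|B_i)P(B_i)$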
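A worked example of the law of total probability. Here the partition is a population split by genotype at one locus, and we want the overall probability of a disease. All frequencies and risks below are hypothetical, chosen only to illustrate the calculation.

```python
from fractions import Fraction as F

# Hypothetical genotype frequencies (a partition B_1, B_2, B_3 of the population)
# and hypothetical disease risk within each genotype; numbers are illustrative only.
p_genotype = {"AA": F(49, 100), "Aa": F(42, 100), "aa": F(9, 100)}
p_disease_given = {"AA": F(1, 100), "Aa": F(5, 100), "aa": F(50, 100)}

# The B_i must partition the sample space, so their probabilities sum to 1.
assert sum(p_genotype.values()) == 1

# Law of total probability: P(D) = sum_i P(D | B_i) P(B_i)
p_disease = sum(p_disease_given[g] * p_genotype[g] for g in p_genotype)
print(p_disease, float(p_disease))  # 709/10000 0.0709
```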

## Bayes' Rule

Combining the definitions of conditional probability $P(A|B) = \frac{P(A,B)}{P(B)}$ and joint probability $P(A,B) = P(A|B)P(B) = P(B|A)P(A)$, we can describe Bayes' Rule:

$$P(A|B) = \frac{P(B|A)P(A)}{P(B)}$$
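A classic use of Bayes' Rule is interpreting a positive screening test. In this sketch, the prevalence, sensitivity, and specificity are hypothetical numbers; the denominator $P(+)$ comes from the law of total probability.

```python
from fractions import Fraction as F

# Hypothetical screening test; all numbers are illustrative.
prevalence = F(1, 100)     # P(D)
sensitivity = F(99, 100)   # P(+ | D)
specificity = F(95, 100)   # P(- | not D)

# P(+) = P(+ | D) P(D) + P(+ | not D) P(not D)
p_pos = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)

# Bayes' Rule: P(D | +) = P(+ | D) P(D) / P(+)
p_d_given_pos = sensitivity * prevalence / p_pos
print(p_d_given_pos)  # 1/6
```

Even with a highly sensitive test, a low prior probability keeps the posterior modest (here $1/6$, about 0.17).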

For 3 events $A,B,C$, we can write $$P(A|B,C)  = \frac{P(A,B,C)}{P(B,C)} = \frac{P(B,C|A)P(A)}{P(B,C)}$$

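The chain rule also gives a version of Bayes' Rule in which every term is conditioned on $C$. Expanding the numerator as $P(A,B,C) = P(B|A,C)P(A|C)P(C)$ and the denominator as $P(B,C) = P(B|C)P(C)$:

$$P(A|B,C) = \frac{P(A,B,C)}{P(B,C)} = \frac{P(B|A,C)P(A|C)P(C)}{P(B|C)P(C)} = \frac{P(B|A,C)P(A|C)}{P(B|C)}$$

Which events we place in the conditioning set can be chosen to suit the problem at hand.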
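The conditioned form of Bayes' Rule can be verified numerically, assuming NumPy, on an arbitrary joint distribution over three binary events stored as `joint[a, b, c]`. The sketch computes $P(A|B,C)$ directly from the definition and again as $P(B|A,C)P(A|C)/P(B|C)$.

```python
import numpy as np

rng = np.random.default_rng(1)
joint = rng.random((2, 2, 2))     # joint[a, b, c] = P(A=a, B=b, C=c), arbitrary values
joint /= joint.sum()

a, b, c = 1, 1, 1                 # check the identity at A=1, B=1, C=1
p_abc = joint[a, b, c]
p_ac = joint[a, :, c].sum()       # P(A, C)
p_bc = joint[:, b, c].sum()       # P(B, C)
p_c = joint[:, :, c].sum()        # P(C)

direct = p_abc / p_bc             # P(A | B, C) from the definition
p_b_given_ac = p_abc / p_ac       # P(B | A, C)
p_a_given_c = p_ac / p_c          # P(A | C)
p_b_given_c = p_bc / p_c          # P(B | C)
bayes_given_c = p_b_given_ac * p_a_given_c / p_b_given_c

print(np.isclose(direct, bayes_given_c))  # True
```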

# Random Variables and their Distributions

A random variable is a function that assigns a real number to each outcome of an experiment (formally, a map from the sample space to $\mathbb{R}$). Its distribution describes how probability is spread over the values it can take.
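To make "a random variable is a function" concrete, this sketch defines $X$ as the number of G or C bases in a 3-base DNA sequence. It assumes, for illustration, that all 64 sequences are equally likely, then derives the PMF of $X$ by grouping outcomes that map to the same value.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space: all 3-base sequences, assumed equally likely (illustrative assumption).
omega = ["".join(s) for s in product("ACGT", repeat=3)]

def gc_count(seq):
    """The random variable X: maps each outcome (a sequence) to a real number."""
    return sum(base in "GC" for base in seq)

# PMF of X: P(X = x) = |{s in omega : X(s) = x}| / |omega|
counts = Counter(gc_count(s) for s in omega)
pmf = {x: Fraction(n, len(omega)) for x, n in sorted(counts.items())}
print(pmf)
```

Under this assumption the PMF is $1/8, 3/8, 3/8, 1/8$ for $x = 0, 1, 2, 3$.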

## Probability Mass Functions (PMF) and Probability Density Functions (PDF)

## Cumulative Distribution Functions (CDF)

## Survival Functions

## Independence of Random Variables

# Expected Values and Indicators 

## Expected Values and Linearity

### Expected Values

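The expected value (also called the mean, expectation, or first moment) of a discrete random variable $X$ is the probability-weighted average of its values: $E(X) = \sum_x x\,P(X=x)$.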
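For a discrete random variable, the expectation is the probability-weighted average $E(X)=\sum_x x\,P(X=x)$. A minimal sketch for a fair six-sided die, with the PMF stored as a dictionary:

```python
from fractions import Fraction

# PMF of a fair six-sided die
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

# E(X) = sum over the support of x * P(X = x)
expected = sum(x * p for x, p in pmf.items())
print(expected)  # 7/2
```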

### Linearity

### $\text{Distribution} \implies \text{Mean}$

### Conditional Expected Value

## Indicator Random Variables

### Indicator RVs

### Distribution of an Indicator RV 

### Fundamental Bridge of an Indicator RV

## Variance and Standard Deviation (w.r.t. Expectation)


# Continuous Random Variables, Law of the Unconscious Statistician (LOTUS), and the Universality of Uniform (UoU)

## Continuous Random Variables

### Probability of a CRV in a Given Interval

### The Probability Density Function of a CRV

### Expected Values of CRVs versus DRVs

###  Law of the Unconscious Statistician (LOTUS)

### $g(RV_i) = RV_j$: A function of a random variable is itself a random variable.

- I.e., one need only know the PMF/PDF of $X$ to find the PMF/PDF of $g(X)$.
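A discrete sketch of both ideas. The PMF of $Y = g(X)$ is built by adding up $P(X=x)$ over every $x$ that maps to the same value, and LOTUS, $E[g(X)] = \sum_x g(x)\,P(X=x)$, gives the same expectation without ever constructing that PMF. The die and the function `g` are illustrative choices; `g` is deliberately not one-to-one.

```python
from collections import defaultdict
from fractions import Fraction

pmf_x = {x: Fraction(1, 6) for x in range(1, 7)}   # fair six-sided die

def g(x):
    return (x - 3) ** 2    # illustrative, non-invertible function

# PMF of Y = g(X): sum P(X = x) over all x with g(x) = y
pmf_y = defaultdict(Fraction)
for x, p in pmf_x.items():
    pmf_y[g(x)] += p

# E[g(X)] two ways: from the PMF of Y, and via LOTUS using only the PMF of X
e_from_y = sum(y * p for y, p in pmf_y.items())
e_lotus = sum(g(x) * p for x, p in pmf_x.items())
print(dict(pmf_y), e_from_y, e_lotus)
```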

## Universality of Uniform / Probability Integral Transform

- If $X$ is a continuous random variable with CDF $F_X(x) = P(X\leq x)$, then plugging $X$ into its own CDF yields a standard uniform: $F_X(X) \sim U(0,1)$.
- Proof sketch (assuming $F_X$ is strictly increasing, so $F_X^{-1}$ exists): let $Y=F_X(X)$. For $0 < y < 1$, $F_Y(y) = P(Y \leq y) = P(F_X(X) \leq y) = P(X\leq F_X^{-1}(y)) = F_X(F_X^{-1}(y)) = y$, which is the CDF of $U(0,1)$.

    - Note that $F_X(x) = \int_{-\infty}^{x}f_X(t)\,dt$ is a fixed function; the claim is that evaluating it at the random variable $X$ itself produces $F_X(X) \sim U(0,1)$.

Now, in Python.
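A minimal sketch, assuming NumPy and using an Exponential distribution with an arbitrary rate, because its CDF $F(x) = 1 - e^{-\lambda x}$ has a closed-form inverse. The first half applies the CDF to exponential draws and checks that the result looks uniform. The second half runs the standard converse, inverse transform sampling, where $F^{-1}(U)$ with $U \sim U(0,1)$ recovers the exponential.

```python
import numpy as np

rng = np.random.default_rng(42)
lam = 2.0          # rate of an Exponential(lam) distribution, chosen arbitrarily
n = 100_000

# Direction 1: X ~ Expo(lam) has CDF F(x) = 1 - exp(-lam * x); F(X) should be U(0, 1).
x = rng.exponential(scale=1 / lam, size=n)
u = 1 - np.exp(-lam * x)

# Direction 2 (inverse transform sampling): F^{-1}(U) = -log(1 - U) / lam should be Expo(lam).
u2 = rng.uniform(size=n)
x2 = -np.log(1 - u2) / lam

# U(0, 1) has mean 1/2 and variance 1/12; Expo(lam) has mean 1/lam.
print(u.mean(), u.var(), x2.mean())

# Each tenth of [0, 1] should hold roughly 10% of the transformed samples.
print(np.histogram(u, bins=10, range=(0, 1))[0] / n)
```

Inverse transform sampling is the practical payoff: any distribution with a computable inverse CDF can be simulated from uniform random numbers alone.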

# Moments and Moment-Generating Functions

## Moments

## Moment-Generating Functions

# Joint Probability Density Functions (PDFs) and Cumulative Distribution Functions (CDFs)


## Joint Distributions

## Conditional Distributions

### Bayes' Rule for Discrete RVs

### Bayes' Rule for Continuous RVs

### Marginal Distributions
- Discrete Case: Marginal PMF from Joint PMF
- Continuous Case: Marginal PDF from Joint PDF

## Independence of Random Variables


## Multivariate LOTUS

#  Further Topics

We will now take a look at topics that are more practical and/or obscure, such as relevant distributions and Markov chains.

# Covariance and Transformations

## Covariance and Correlation

## Covariance and Independence

## Covariance and Variance

## Properties of Covariance

## Correlation: Location- and Scale-Invariant

## Transformations

- Single Variable

- Two Variables


## Convolutions

## Convolution Integral

## Relevance to "Convolutional" Neural Networks

# Poisson Processes

# Law of Large Numbers

# Central Limit Theorem

# Markov Chains

### Markov Property

### States

### Transition Matrix

### Chain Properties

### Stationary Distributions

# Some Continuous Distributions

## Normal

## Exponential

## Gamma

## Beta

## Chi-Square

# Some Discrete Distributions

## Sampling: Varying Number of Trials and Replacement

## Bernoulli

## Binomial

## Geometric

## First-Success

## Negative Binomial

## Hypergeometric

## Poisson

# Some Multivariate Distributions

## Multinomial

## Multivariate Uniform

## Multivariate Normal

## A Note on Mixture Models

### EM and Mixture Models

# In Case You Missed It: Special Cases of Distributions


# Important Inequalities

## Cauchy-Schwarz

## Markov

## Chebyshev

## Jensen

# Background: Formulas

## Geometric Series

## Exponential Function

## Gamma and Beta Integrals

## Euler's Approximation for a Harmonic Sum

## Stirling's Approximation for Factorials