# Table of Contents

* [Probability and Linear Algebra](#Probability-and-Linear-Algebra)
    * [Random Variables](#Random-Variables)
    * [Axioms/Theorems of Probability/Set Theory](#Axioms/Theorems-of-Probability/Set-Theory)
    * [Discrete Random Variables](#Discrete-Random-Variables)
    * [Continuous Random Variables](#Continuous-Random-Variables)
    * [Expectation and Variance](#Expectation-and-Variance)
    * [Conditional Probability and Bayes Theorem](#Conditional-Probability-and-Bayes-Theorem)
    * [Vectors and Matrices](#Vectors-and-Matrices)
    * [Matrix Products](#Matrix-Products)
    * [Matrix Properties](#Matrix-Properties)
* [Linear Regression](#Linear-Regression)

### Probability and Linear Algebra

#### Random Variables

In these notes, a random variable $A$ is Boolean: it represents an event that may or may not take place.

*Example*

* $A$ = I have a headache
* $A$ = Sally will be president in 2020

$P(A)$ is the probability $A$ will be true.

#### Axioms/Theorems of Probability/Set Theory
$P(A \lor B) = P(A) + P(B) - P(A \land B)$

$P(A \lor B) \leq P(A) + P(B)$

$P(\lnot A) = 1 - P(A)$

$P(A) = P(A \land B) + P(A \land \lnot B)$
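The identities above can be checked on a small finite sample space. The following is a minimal sketch, assuming a single roll of a fair six-sided die; the events `A` and `B` and the helper `prob` are illustrative names, not part of the original notes.

```python
from fractions import Fraction

# Assumed sample space: one roll of a fair six-sided die.
omega = set(range(1, 7))

def prob(event):
    """Probability of an event (a subset of omega) when all outcomes are equally likely."""
    return Fraction(len(event & omega), len(omega))

A = {2, 4, 6}        # "the roll is even"
B = {4, 5, 6}        # "the roll is greater than 3"
not_A = omega - A
not_B = omega - B

# P(A or B) = P(A) + P(B) - P(A and B)
assert prob(A | B) == prob(A) + prob(B) - prob(A & B)
# P(A or B) <= P(A) + P(B)
assert prob(A | B) <= prob(A) + prob(B)
# P(not A) = 1 - P(A)
assert prob(not_A) == 1 - prob(A)
# P(A) = P(A and B) + P(A and not B)
assert prob(A) == prob(A & B) + prob(A & not_B)
```

Using `Fraction` keeps the arithmetic exact, so the equalities hold without floating-point tolerance.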

#### Discrete Random Variables

Discrete random variables (DRV) take on a finite or countably infinite number of values. Discrete uniform random variables are DRV in which each of a finite number of possibilities has an equal probability.

Bernoulli random variables are DRV where there are 2 possible outcomes.

Binomial random variables are DRV that count how many times the positive case of a Bernoulli random variable comes up in $n$ independent trials. To find the probability that it comes up exactly $k$ times, use the following formula (assume $p$ is the probability of the positive case):

$P(X = k) = {n\choose k}p^{k}(1 - p)^{n - k}$
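As a quick sanity check, the binomial formula can be computed directly. This is a minimal sketch; the function name `binomial_pmf` and the coin-flip numbers are assumptions chosen for illustration.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k positive outcomes in n independent trials,
    each with probability p of the positive case."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Exactly 3 heads in 5 flips of a fair coin: C(5, 3) * 0.5^5 = 10 / 32
print(binomial_pmf(3, 5, 0.5))  # 0.3125

# Summing over every possible k should give (approximately) 1.
total = sum(binomial_pmf(k, 10, 0.3) for k in range(11))
```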

#### Continuous Random Variables

$X$ is a continuous random variable (CRV) if $X$ can take on any value in a continuous range, so its possible values are uncountably infinite.

The **cumulative distribution function CDF** $F$ for $X$ is defined for every value $x$ by:

$F(x) = Pr(X \leq x)$

The **probability density function PDF** $f(x)$ for $X$ is the derivative of the CDF:

$f(x) = \frac{dF(x)}{dx}$

Think of the PDF as the probability density at a point (a density, not a probability, so it can exceed 1), and the CDF as the probability that the variable is at most that point.
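The relationship $f(x) = \frac{dF(x)}{dx}$ can be checked numerically. The sketch below assumes an exponential distribution with a hypothetical rate of 2, picked only because its CDF and PDF have simple closed forms; it is not a distribution discussed in these notes.

```python
import math

RATE = 2.0  # hypothetical rate parameter, chosen for illustration

def cdf(x):
    """F(x) = Pr(X <= x) for an exponential distribution."""
    return 1.0 - math.exp(-RATE * x) if x >= 0 else 0.0

def pdf(x):
    """f(x) = dF(x)/dx for the same distribution."""
    return RATE * math.exp(-RATE * x) if x >= 0 else 0.0

def numeric_derivative(func, x, h=1e-6):
    """Central-difference approximation of the derivative of func at x."""
    return (func(x + h) - func(x - h)) / (2 * h)

x = 0.5
# The slope of the CDF at x should closely match the PDF at x.
slope = numeric_derivative(cdf, x)
density = pdf(x)

# Probability of landing in an interval is a difference of CDF values.
prob_between = cdf(1.0) - cdf(0.5)
```

Note that `pdf(0.0)` evaluates to 2.0 here, which shows that a density value is not itself a probability.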

#### Expectation and Variance

Expectation: The average value of a random variable, with each possible value weighted by its probability.

*Properties*:
* $E[ag(X)] = aE[g(X)] \text{ (a is constant)}$
* $E[f(X) + g(X)] = E[f(X)] + E[g(X)]$

Variance: The average squared distance from the mean value, $Var(X) = E[(X - E[X])^{2}]$. It can be calculated as follows:

$Var(X) = E[X^{2}] - E[X]^{2}$
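For a discrete random variable, both quantities reduce to probability-weighted sums. Here is a minimal sketch, assuming a fair six-sided die as the random variable; the helper `expectation` is an illustrative name.

```python
from fractions import Fraction

# Assumed distribution: a fair six-sided die, each face with probability 1/6.
pmf = {face: Fraction(1, 6) for face in range(1, 7)}

def expectation(pmf, g=lambda x: x):
    """E[g(X)]: sum of g(x) weighted by the probability of x."""
    return sum(g(x) * p for x, p in pmf.items())

mean = expectation(pmf)                           # E[X]
second_moment = expectation(pmf, lambda x: x**2)  # E[X^2]
variance = second_moment - mean**2                # E[X^2] - E[X]^2

# The shortcut agrees with the definition E[(X - E[X])^2].
variance_by_definition = expectation(pmf, lambda x: (x - mean) ** 2)

print(mean, variance)  # 7/2 35/12
```

The same helper illustrates the linearity properties: `expectation(pmf, lambda x: 3 * x)` equals `3 * mean`.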

Here is a good video for finding these values for CRV: https://youtu.be/Ro7dayHU5DQ

#### Conditional Probability and Bayes Theorem


$P(A|B) = \frac{P(B|A)P(A)}{P(B)}$
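Bayes' theorem follows from the usual definition of conditional probability, $P(A|B) = \frac{P(A \land B)}{P(B)}$, applied in both directions. A small worked sketch, with made-up numbers for a headache/flu scenario (the probabilities below are assumptions for illustration, not real statistics):

```python
from fractions import Fraction

def bayes(p_b_given_a, p_a, p_b):
    """P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b

# Hypothetical numbers: A = "I have the flu", B = "I have a headache".
p_flu = Fraction(1, 40)
p_headache = Fraction(1, 10)
p_headache_given_flu = Fraction(1, 2)

p_flu_given_headache = bayes(p_headache_given_flu, p_flu, p_headache)
print(p_flu_given_headache)  # 1/8
```

Even though half of flu cases come with a headache in this toy setup, a headache only raises the probability of flu to 1/8, because the flu itself is rare.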

#### Vectors and Matrices

Vectors are ordered lists of numbers. Row vectors are of dimensions $1\times n$, column vectors are of dimensions $n\times 1$.

Norms are a measure of the "length" of a vector.

* L1 norm: $||x||_{1} = \sum_{i=1}^{n}|x_{i}|$
* L2 norm: $||x||_{2} = \sqrt{\sum_{i=1}^{n}x_{i}^{2}}$
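Both norms are short to compute by hand. A minimal sketch in plain Python, with function names chosen here for illustration:

```python
import math

def l1_norm(x):
    """Sum of the absolute values of the components."""
    return sum(abs(xi) for xi in x)

def l2_norm(x):
    """Square root of the sum of squared components (Euclidean length)."""
    return math.sqrt(sum(xi**2 for xi in x))

v = [3, -4]
print(l1_norm(v))  # 7
print(l2_norm(v))  # 5.0
```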

#### Matrix Products

Vector dot (inner) product: let $r$ be a row vector and $c$ be a column vector, both of length $n$. Then $rc = \sum_{i=1}^{n}r_{i}c_{i}$

In order to multiply matrices, their inner dimensions must match up: if $A \in \mathbb{R}^{m\times n}$ and $B \in \mathbb{R}^{n\times p}$, then $AB \in \mathbb{R}^{m\times p}$.
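The dimension rule is easiest to see in code: each entry of the product is the dot product of a row of the first matrix with a column of the second. Below is a minimal sketch using nested lists (a simplification; a real project would more likely use a numerical library), with `matmul` as an illustrative name.

```python
def matmul(A, B):
    """Multiply an m x n matrix A by an n x p matrix B, both given as lists of rows."""
    m, n = len(A), len(A[0])
    if len(B) != n:
        raise ValueError("inner dimensions must match: A is m x n, B must be n x p")
    p = len(B[0])
    # Entry (i, j) is the dot product of row i of A with column j of B.
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(m)]

A = [[1, 2, 3],
     [4, 5, 6]]        # 2 x 3
B = [[7, 8],
     [9, 10],
     [11, 12]]         # 3 x 2

print(matmul(A, B))    # [[58, 64], [139, 154]], a 2 x 2 result

# A row vector times a column vector is the dot product, as a 1 x 1 matrix.
row = [[1, 2, 3]]
col = [[4], [5], [6]]
print(matmul(row, col))  # [[32]]
```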

#### Matrix Properties

Associative: $(AB)C = A(BC)$

Distributive: $A(B + C) = AB + AC$

**NOT** Commutative: in general, $AB \neq BA$

Transpose: Think of it as flipping a matrix across its diagonal, so that rows become columns. The transpose of a row vector is a column vector. $m\times n$ &rarr; $n \times m$
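These properties can be spot-checked on small examples. The sketch below uses nested lists and hypothetical helper names (`matmul`, `matadd`, `transpose`); the specific matrices are arbitrary, and checking one example illustrates the properties rather than proving them.

```python
def matmul(A, B):
    """Multiply matrices given as lists of rows (inner dimensions must match)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def matadd(A, B):
    """Add two matrices of the same shape, entry by entry."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def transpose(A):
    """Swap rows and columns: an m x n matrix becomes n x m."""
    return [list(col) for col in zip(*A)]

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
C = [[2, 0], [1, 3]]

# Associative: (AB)C == A(BC)
assert matmul(matmul(A, B), C) == matmul(A, matmul(B, C))
# Distributive: A(B + C) == AB + AC
assert matmul(A, matadd(B, C)) == matadd(matmul(A, B), matmul(A, C))
# Not commutative: for this pair, AB and BA differ.
assert matmul(A, B) != matmul(B, A)
# Transpose turns a 1 x 3 row vector into a 3 x 1 column vector.
assert transpose([[1, 2, 3]]) == [[1], [2], [3]]
```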

### Linear Regression

