### Random variables

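- Continuous random variables: can take any value in an interval, so there are infinitely (uncountably) many possible values.
	- Described by a probability density function (PDF).

- Discrete random variables: countable set of possible values (finite, or countably infinite as for a Poisson RV).
	- Described by a probability mass function (PMF).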

### Simulation basics

- Framework for modeling real-world events
	- Characterized by repeated random sampling.
	- Gives us an approximate solution.
	- Can help solve complex problems that are hard or impossible to solve analytically.

#### Simulation steps

1. Define possible outcomes for random variables.
2. Assign probabilities.
3. Define relationships between random variables.
4. Get multiple outcomes by repeated random sampling.
5. Analyze sample outcomes.
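The five steps can be sketched in Python with NumPy. The toy problem is chosen only for illustration and is not from these notes: estimate the probability that two fair dice sum to 7, whose exact value is 1/6. The function name `simulate_two_dice` and the seed are hypothetical.

```python
import numpy as np

def simulate_two_dice(n_sims=100_000, seed=42):
    """Estimate P(die1 + die2 == 7) for two fair dice by repeated random sampling."""
    rng = np.random.default_rng(seed)
    # Step 1: possible outcomes of each die
    faces = np.arange(1, 7)
    # Step 2: assign probabilities (fair die, so uniform)
    probs = np.full(6, 1 / 6)
    # Step 4: repeated random sampling, one pair of dice per trial
    die1 = rng.choice(faces, size=n_sims, p=probs)
    die2 = rng.choice(faces, size=n_sims, p=probs)
    # Step 3: relationship between the random variables (their sum)
    total = die1 + die2
    # Step 5: analyze the sampled outcomes
    return np.mean(total == 7)
```

With 100,000 trials the estimate should land close to 1/6 ≈ 0.167.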


### Probability basics

- Sample space $S$: Set of all possible outcomes.
- Probability $P(A)$: Likelihood of event $A$.
	- $0 \leq P(A) \leq 1$
	- $P(S) = 1$
- For **mutually exclusive events**:
	- $P(A \cap B) = 0$
	- $P(A \cup B) = P(A) + P(B)$
- In general (addition rule):
	- $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
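These rules can be checked exactly on a single fair die with Python's `fractions` module. The events `A` (even roll), `B` (roll greater than 3) and `C` (roll of 1) are illustrative choices, not from the notes.

```python
from fractions import Fraction

S = frozenset(range(1, 7))  # sample space: one roll of a fair die

def prob(event):
    """P(event) for equally likely outcomes: |event| / |S|."""
    return Fraction(len(event & S), len(S))

A = {2, 4, 6}  # even roll
B = {4, 5, 6}  # roll greater than 3
C = {1}        # roll of 1, mutually exclusive with A

# General addition rule
assert prob(A | B) == prob(A) + prob(B) - prob(A & B)
# Mutually exclusive events: empty intersection, probabilities simply add
assert prob(A & C) == 0
assert prob(A | C) == prob(A) + prob(C)
```

Here `prob(A | B)` is `Fraction(2, 3)`, because the union is {2, 4, 5, 6}.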

#### Steps for estimating probability

1. Construct sample space or population.
2. Determine how to simulate one outcome.
3. Determine the rule for success.
4. Sample repeatedly and count successes.
5. Calculate frequency of successes as an estimate of probability.
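Here the steps are applied to an illustrative question that is not from the notes: the probability of at least one six in four rolls of a fair die. The exact answer is $1 - (5/6)^4 \approx 0.518$, so the estimate can be checked. The function name and its defaults are assumptions.

```python
import numpy as np

def estimate_at_least_one_six(n_rolls=4, n_sims=100_000, seed=0):
    rng = np.random.default_rng(seed)
    # 1. Sample space: faces of a fair die
    faces = np.arange(1, 7)
    # 2. One simulated outcome = n_rolls independent rolls (one row per trial)
    rolls = rng.choice(faces, size=(n_sims, n_rolls))
    # 3. Rule for success: at least one six in the row
    success = (rolls == 6).any(axis=1)
    # 4-5. Count successes and return their frequency
    return success.sum() / n_sims
```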

#### Conditional Probability

$P(A | B) = \displaystyle\frac{P(A \cap B)}{P(B)}$

and 

$P(B | A) = \displaystyle\frac{P(B \cap A)}{P(A)}$

The intersection is symmetric:

$P(A \cap B) = P(B \cap A)$

- If $P(A) \neq 0$ and $P(B) \neq 0$:
	- **Bayes' rule**: $P(A|B) = \displaystyle\frac{P(B|A)\ P(A)}{P(B)}$
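The following sketch compares Bayes' rule with a direct simulation in a hypothetical diagnostic-test setting: $A$ = has a condition, $B$ = tests positive. The prevalence (1%), sensitivity (95%) and false-positive rate (5%) are made-up numbers. $P(B)$ comes from the marginal (total) probability over $A$ and $\neg A$.

```python
import numpy as np

def bayes_exact(p_a, p_b_given_a, p_b_given_not_a):
    # P(B) = P(B|A) P(A) + P(B|not A) P(not A)
    p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)
    # Bayes' rule: P(A|B) = P(B|A) P(A) / P(B)
    return p_b_given_a * p_a / p_b

def bayes_simulated(p_a, p_b_given_a, p_b_given_not_a, n_sims=1_000_000, seed=1):
    rng = np.random.default_rng(seed)
    a = rng.random(n_sims) < p_a
    # The probability of B depends on whether A occurred in that trial
    b = rng.random(n_sims) < np.where(a, p_b_given_a, p_b_given_not_a)
    # P(A|B) = P(A and B) / P(B)
    return np.mean(a & b) / np.mean(b)
```

With these numbers `bayes_exact(0.01, 0.95, 0.05)` is about 0.161. Most positives come from the 99% without the condition, so only about 16% of positives actually have it.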

#### Independent events

- $P(A \cap B) = P(A) \cdot P(B)$
- Conditional probability: $P(A|B) = \displaystyle\frac{P(A \cap B)}{P(B)} = \displaystyle\frac{P(A)\ P(B)}{P(B)} = P(A)$
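Both properties can be checked empirically with two independently rolled dice. The events are illustrative: $A$ = the first die is even, $B$ = the second die shows 6. The function name and seed are hypothetical.

```python
import numpy as np

def independence_check(n_sims=200_000, seed=7):
    rng = np.random.default_rng(seed)
    die1 = rng.integers(1, 7, size=n_sims)  # upper bound is exclusive
    die2 = rng.integers(1, 7, size=n_sims)
    a = die1 % 2 == 0   # event A
    b = die2 == 6       # event B
    p_a, p_b, p_ab = a.mean(), b.mean(), (a & b).mean()
    # P(A|B): frequency of A among trials where B occurred
    p_a_given_b = a[b].mean()
    return p_a, p_b, p_ab, p_a_given_b
```

Expect `p_ab` to be close to `p_a * p_b` (1/12) and `p_a_given_b` to be close to `p_a` (1/2).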

#### Marginal probability

- $P(A) = P(A \cap B) + P(A \cap \neg B)$
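An exact check by enumerating all 36 outcomes of two fair dice. The events are illustrative: $A$ = the dice sum to 7, $B$ = the first die shows 1.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely (die1, die2) outcomes
S = list(product(range(1, 7), repeat=2))

def prob(event):
    """P(event) = (# outcomes satisfying the predicate) / |S|."""
    return Fraction(sum(1 for s in S if event(s)), len(S))

def A(s):
    return s[0] + s[1] == 7  # the dice sum to 7

def B(s):
    return s[0] == 1         # the first die shows 1

p_a = prob(A)
p_a_and_b = prob(lambda s: A(s) and B(s))
p_a_and_not_b = prob(lambda s: A(s) and not B(s))
```

Here $P(A \cap B) = 1/36$ (only (1, 6)) and $P(A \cap \neg B) = 5/36$. They add up to the marginal $P(A) = 1/6$.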


### eCommerce Simulation

#### Funnel

- Ad Impression -> Click -> Signup -> Purchase

#### Signup flow

- Ad impressions: Poisson RV whose rate $\lambda$ is itself drawn from a Normal RV.
- Clicks: Binomial RV with $n$ = impressions and $p$ = clickthrough rate.
- Signups: Binomial RV with $n$ = clicks and $p$ = signup rate.

#### Purchase flow

- Signups: Binomial RV with $n$ = clicks and $p$ = signup rate.
- Purchases: Binomial RV with $n$ = signups and $p$ = purchase rate.
- Purchase value: Exponential RV with mean equal to the average purchase value.
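A minimal sketch of the whole funnel, simulated one day per row. Every parameter value below (mean impressions, spread, rates, average purchase value) is a made-up placeholder, and clipping $\lambda$ at 0 is an added assumption so the Poisson rate stays valid.

```python
import numpy as np

def simulate_funnel(n_days=1000, lam_mean=3000, lam_sd=500, ctr=0.03,
                    signup_rate=0.2, purchase_rate=0.1, avg_purchase=50.0, seed=3):
    rng = np.random.default_rng(seed)
    # Ad impressions: Poisson RV whose rate lambda is drawn from a Normal RV
    lam = np.clip(rng.normal(lam_mean, lam_sd, size=n_days), 0, None)
    impressions = rng.poisson(lam)
    # Each stage is a Binomial RV on the count from the previous stage
    clicks = rng.binomial(impressions, ctr)
    signups = rng.binomial(clicks, signup_rate)
    purchases = rng.binomial(signups, purchase_rate)
    # Daily revenue: sum of one Exponential purchase value per purchase
    revenue = np.array([rng.exponential(avg_purchase, size=k).sum() for k in purchases])
    return impressions, clicks, signups, purchases, revenue
```

With these placeholders the expected number of purchases per day is 3000 × 0.03 × 0.2 × 0.1 = 1.8. Replacing a rate and re-running gives a quick what-if comparison of revenue.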

### Resampling methods

#### Why resample?

- Advantages:
	- simple implementation procedure.
	- applicable to complex estimators.
	- no strict assumptions.
- Drawbacks:
	- computationally expensive.

#### Types of resampling methods

- Bootstrapping: sampling with replacement
- Jackknife: leave out one or more data points
- Permutation testing: label switching

#### Bootstrapping

- Run at least 5-10k iterations
- Expect an approximate answer
- Consider bias correction
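A sketch of a bootstrap for the sample mean. It returns a percentile confidence interval and a bootstrap bias estimate, using 10,000 resamples in line with the advice above. The percentile interval is one of several bootstrap CI methods, chosen here for simplicity, and the function name is hypothetical.

```python
import numpy as np

def bootstrap_mean_ci(data, n_boot=10_000, alpha=0.05, seed=0):
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    n = len(data)
    # Sampling with replacement: each row of indices is one bootstrap sample of size n
    idx = rng.integers(0, n, size=(n_boot, n))
    boot_means = data[idx].mean(axis=1)
    # Percentile confidence interval from the bootstrap distribution
    lower, upper = np.percentile(boot_means, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    # Bias estimate: mean of bootstrap statistics minus the original statistic
    bias = boot_means.mean() - data.mean()
    return lower, upper, bias
```

A bias-corrected estimate is `data.mean() - bias`. For the mean the bias is essentially zero, so the correction matters more for biased statistics such as ratios or variances.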

#### Jackknife

- Jackknife estimate:

$\hat{\theta}_{\textrm{jackknife}} = \displaystyle\frac{1}{n} \displaystyle\sum^{n}_{i=1} \hat{\theta}_{i}$

$\hat{\theta}_{\textrm{jackknife}}$: Jackknife estimate.

$\hat{\theta}_{i}$: estimate computed on the $i$-th Jackknife sample, i.e. the data with observation $i$ left out.

- Variance of Jackknife estimate:

$\textrm{Var}(\hat{\theta}_{\textrm{jackknife}}) = \displaystyle\frac{n-1}{n} \displaystyle\sum^{n}_{i=1} \big( \hat{\theta}_{i} - \hat{\theta}_{\textrm{jackknife}} \big)^2$
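Both formulas translate directly into a leave-one-out sketch. The function name and the default statistic `np.mean` are illustrative choices.

```python
import numpy as np

def jackknife(data, stat=np.mean):
    data = np.asarray(data, dtype=float)
    n = len(data)
    # theta_i: the statistic computed with observation i left out
    theta_i = np.array([stat(np.delete(data, i)) for i in range(n)])
    # Jackknife estimate: average of the n leave-one-out estimates
    theta_jack = theta_i.mean()
    # Jackknife variance: (n - 1) / n * sum of squared deviations
    var_jack = (n - 1) / n * np.sum((theta_i - theta_jack) ** 2)
    return theta_jack, var_jack
```

For the mean, the jackknife estimate equals the sample mean and the variance equals $s^2 / n$ (sample variance with $n - 1$ in the denominator, divided by $n$). This makes it a handy sanity check before using other statistics.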



#### Permutation testing

1. Determine test statistic
2. Pool observations and generate a new data set for every possible permutation of labels (in practice, often a large random sample of permutations).
3. Calculate the difference in means for each data set
4. Check to see where the test statistic falls in the distribution of differences in means
	- If the observed statistic falls inside the central interval (e.g. the middle 95%) of the permutation distribution, there is no evidence of a real difference between groups.
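A sketch of these steps for a difference in means. It uses a random sample of `n_perm` permutations rather than all of them. It returns the central interval used in step 4 plus a two-sided p-value. The function name, defaults and the p-value convention are assumptions.

```python
import numpy as np

def permutation_test(x, y, n_perm=10_000, alpha=0.05, seed=0):
    rng = np.random.default_rng(seed)
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    # 1. Test statistic: observed difference in means
    observed = x.mean() - y.mean()
    # 2. Pool observations and randomly reassign the group labels
    pooled = np.concatenate([x, y])
    diffs = np.empty(n_perm)
    for k in range(n_perm):
        perm = rng.permutation(pooled)
        # 3. Difference in means for the relabelled data set
        diffs[k] = perm[:len(x)].mean() - perm[len(x):].mean()
    # 4. Where does the observed statistic fall in the permutation distribution?
    lower, upper = np.percentile(diffs, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    p_value = np.mean(np.abs(diffs) >= abs(observed))
    return observed, (lower, upper), p_value
```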

- Advantages:
	- Very flexible
	- No strict assumptions
	- Widely applicable
- Drawbacks:
	- Computationally expensive
	- Custom coding required

### Monte Carlo integration

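- Calculate the overall area of a bounding box that encloses the curve over the integration interval
- Randomly sample points uniformly in the box
- Multiply the fraction of the points below the curve by the box area to estimate the integral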
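A hit-or-miss sketch of this recipe. It assumes the function is non-negative and bounded by `y_max` on `[a, b]`. The example integrand $x^2$ on $[0, 2]$, with exact integral $8/3$, is illustrative.

```python
import numpy as np

def mc_integrate(f, a, b, y_max, n_points=200_000, seed=0):
    """Estimate the integral of f on [a, b], assuming 0 <= f(x) <= y_max there."""
    rng = np.random.default_rng(seed)
    # Overall area of the bounding box [a, b] x [0, y_max]
    box_area = (b - a) * y_max
    # Random points sampled uniformly in the box
    x = rng.uniform(a, b, n_points)
    y = rng.uniform(0, y_max, n_points)
    # Fraction of points below the curve, scaled by the box area
    return np.mean(y <= f(x)) * box_area
```

For example, `mc_integrate(lambda x: x**2, 0, 2, 4)` should be close to 8/3 ≈ 2.667. The error shrinks roughly like $1/\sqrt{n}$ in the number of points.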

### Power Analysis

- power = $P(\textrm{rejecting Null}|\textrm{true alternative})$
- Probability of detecting an effect if it exists
- Depends on sample size, $\alpha$ and effect size
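Power can be estimated by simulation: generate many experiments in which the alternative is true and count how often the test rejects the null. This sketch makes simplifying assumptions. It uses a two-sided z-test with known $\sigma$ (a t-test is more common in practice), and it expresses effect size in units of $\sigma$. The function name and defaults are hypothetical.

```python
import numpy as np
from statistics import NormalDist

def simulated_power(n_per_group, effect_size, alpha=0.05, sigma=1.0, n_sims=5_000, seed=0):
    """Fraction of simulated experiments (true effect present) in which H0 is rejected."""
    rng = np.random.default_rng(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    control = rng.normal(0.0, sigma, size=(n_sims, n_per_group))
    treatment = rng.normal(effect_size * sigma, sigma, size=(n_sims, n_per_group))
    diff = treatment.mean(axis=1) - control.mean(axis=1)
    # Standard error of the difference in means when sigma is known
    se = sigma * np.sqrt(2 / n_per_group)
    return np.mean(np.abs(diff / se) > z_crit)
```

Increasing `n_per_group`, `effect_size` or `alpha` raises the estimated power. With `effect_size=0` the rejection rate falls back to roughly `alpha`.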
