# Crash Course on Statistics and Hypothesis Testing
## Created by Jim Shepich
### 21 November 2021

# Table of Contents
- [Introduction](#Introduction)
  - [Example: Coin Flip](#ex-coinflip)
  - [Example: Die Roll](#ex-dieroll)
- [Probability](#Probability)
- [Random Variables](#variables)

# Introduction

We'll start out with some statistics terminology:

- **Experiment** - any procedure that can be repeated infinitely many times and has a well-defined set of outcomes
- **Sample Space** - the set of all possible outcomes of an experiment
- **Event** - a collection of outcomes, i.e. a subset of the sample space

### Example: Coin Flip <a class="anchor" id="ex-coinflip"></a>

Let's say we're going to perform an **experiment** where we flip three coins. The **sample space** of this experiment is the set:

{TTT, TTH, THT, THH, HTT, HTH, HHT, HHH}

The **event** "there are more heads than tails" is the subset:

{THH, HTH, HHT, HHH}

The **event** "exactly two coins are tails" is the subset:

{TTH, THT, HTT}
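As a minimal sketch, the sample space and both events above can be enumerated in Python. The variable names (`sample_space`, `more_heads_than_tails`, `exactly_two_tails`) are chosen here for illustration only.

```python
from itertools import product

# Sample space: every ordered sequence of three flips, each either T or H.
sample_space = {"".join(flips) for flips in product("TH", repeat=3)}

# An event is a subset of the sample space, selected by some condition.
more_heads_than_tails = {o for o in sample_space if o.count("H") > o.count("T")}
exactly_two_tails = {o for o in sample_space if o.count("T") == 2}

print(sorted(sample_space))
print(sorted(more_heads_than_tails))
print(sorted(exactly_two_tails))
```

Using sets mirrors the definitions: the sample space is a set of outcomes, and each event is a subset of it.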

### Example: Die Roll <a class="anchor" id="ex-dieroll"></a>

Let's say we do another **experiment** where we roll two six-sided dice. The **sample space** of the experiment is the set:

### {⚀⚀, ⚀⚁, ⚀⚂, ⚀⚃, ⚀⚄, ⚀⚅, ⚁⚀, ⚁⚁, ⚁⚂, ⚁⚃, ⚁⚄, ⚁⚅, ⚂⚀, ⚂⚁, ⚂⚂, ⚂⚃, ⚂⚄, ⚂⚅, ⚃⚀, ⚃⚁, ⚃⚂, ⚃⚃, ⚃⚄, ⚃⚅, ⚄⚀, ⚄⚁, ⚄⚂, ⚄⚃, ⚄⚄, ⚄⚅, ⚅⚀, ⚅⚁, ⚅⚂, ⚅⚃, ⚅⚄, ⚅⚅}

The **event** "both dice have the same number showing" is the subset:

### {⚀⚀, ⚁⚁, ⚂⚂, ⚃⚃, ⚄⚄, ⚅⚅}

And the **event** "the sum of the two numbers showing is equal to 5" is the subset:

### {⚀⚃, ⚃⚀, ⚁⚂, ⚂⚁}
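The same idea can be sketched for the dice. Here each outcome is represented as a pair of integers `(first_die, second_die)` instead of die-face symbols; that representation and the variable names are assumptions made for this illustration.

```python
from itertools import product

# Sample space: all ordered pairs of faces from two six-sided dice.
faces = range(1, 7)
sample_space = set(product(faces, repeat=2))

# Event: both dice show the same number.
same_number = {(a, b) for (a, b) in sample_space if a == b}

# Event: the two numbers showing add up to 5.
sum_is_five = {(a, b) for (a, b) in sample_space if a + b == 5}

print(len(sample_space))    # 36
print(sorted(same_number))
print(sorted(sum_is_five))
```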

# Probability

The **probability** of an outcome is a number between 0 and 1 that indicates how often we expect to observe that outcome when we repeat the experiment many times. In our two examples (Coin Flip and Die Roll), each outcome has the same probability, but that is not true for every experiment.

The **probability** of an event is equal to the sum of the probabilities of the outcomes in the event. Let's go back to our examples and look at some probabilities.

We can think of probability as a function that maps an outcome or event to a real number in the interval $[0,1]$. Common ways to denote the probability of an event $E$ are: $P(E)$, $\Pr(E)$, or $\mathbb{P}(E)$.
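To illustrate probability as a function, and the case where outcomes are not equally likely, here is a small sketch. The loaded die and its outcome probabilities are hypothetical, made up purely for this illustration, and the function name `prob` is likewise an assumption.

```python
from fractions import Fraction

# Hypothetical loaded die (illustrative numbers only):
# faces 1-5 each have probability 1/10, and face 6 has probability 1/2.
outcome_prob = {face: Fraction(1, 10) for face in range(1, 6)}
outcome_prob[6] = Fraction(1, 2)

def prob(event):
    """Probability of an event: the sum of the probabilities of its outcomes."""
    return sum((outcome_prob[o] for o in event), Fraction(0))

roll_is_even = {2, 4, 6}
print(prob(roll_is_even))        # 7/10
print(prob(set(outcome_prob)))   # 1, the whole sample space
```

The second call shows that the probabilities of all outcomes in the sample space sum to 1.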

## Coin Flip

As we said, each outcome has an equal probability. There are 8 outcomes in the sample space, so each outcome has a probability of $\frac{1}{8}$. 
- The event "there are more heads than tails" has four outcomes each with probability $\frac{1}{8}$, so the probability of that event is $\frac{4}{8}=\frac{1}{2}$.
- The event "exactly two coins are tails" has three outcomes each with probability $\frac{1}{8}$, so the probability of that event is $\frac{3}{8}$.
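The two results above can be checked with a short sketch. Because every outcome is equally likely here, the probability of an event reduces to the number of outcomes in the event divided by the size of the sample space. The helper name `prob` is an assumption for this example.

```python
from fractions import Fraction
from itertools import product

sample_space = {"".join(flips) for flips in product("TH", repeat=3)}

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(sample_space))

more_heads_than_tails = {o for o in sample_space if o.count("H") > o.count("T")}
exactly_two_tails = {o for o in sample_space if o.count("T") == 2}

print(prob(more_heads_than_tails))  # 1/2
print(prob(exactly_two_tails))      # 3/8
```

`Fraction` keeps the results exact and reduces them automatically, so $\frac{4}{8}$ is printed as `1/2`.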

## Die Roll

As we said, each outcome has an equal probability. There are 36 outcomes in the sample space, so each outcome has a probability of $\frac{1}{36}$. 
- The event "both dice have the same number showing" has 6 outcomes each with probability $\frac{1}{36}$, so the probability of that event is $\frac{6}{36}=\frac{1}{6}$.
- The event "the sum of the two numbers showing is equal to 5" has 4 outcomes each with probability $\frac{1}{36}$, so the probability of that event is $\frac{4}{36}=\frac{1}{9}$.
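The dice probabilities can be verified the same way. This sketch adds up one $\frac{1}{36}$ per outcome in the event, following the definition of an event's probability as a sum over its outcomes. Outcomes are modeled as integer pairs, and the name `prob` is an assumption.

```python
from fractions import Fraction
from itertools import product

sample_space = set(product(range(1, 7), repeat=2))

# Each of the 36 outcomes is equally likely.
outcome_prob = {outcome: Fraction(1, len(sample_space)) for outcome in sample_space}

def prob(event):
    """Sum the probabilities of the outcomes that make up the event."""
    return sum((outcome_prob[o] for o in event), Fraction(0))

same_number = {(a, b) for (a, b) in sample_space if a == b}
sum_is_five = {(a, b) for (a, b) in sample_space if a + b == 5}

print(prob(same_number))  # 1/6
print(prob(sum_is_five))  # 1/9
```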




# Random Variables <a class="anchor" id="variables"></a>

A **random variable** is a variable whose value is determined by the outcome of a statistical experiment. Some examples of random variables and their possible values:

- The number of coins that come up heads if three coins are flipped. Can be 0, 1, 2, or 3.
- The sum of the showing numbers of two rolled (six-sided) dice. Can be any whole number from 2 to 12, inclusive.
- The number of students who show up to school on a given day. Can be as few as zero or as many as all the students enrolled at the school.
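One way to picture the first two examples in code is as functions that take an outcome of the experiment and return a number. This is only a sketch; the function names `number_of_heads` and `dice_sum` are made up for illustration.

```python
from itertools import product

coin_outcomes = ["".join(flips) for flips in product("TH", repeat=3)]
dice_outcomes = list(product(range(1, 7), repeat=2))

def number_of_heads(outcome):
    """Random variable: number of heads in a three-coin outcome such as 'HTH'."""
    return outcome.count("H")

def dice_sum(outcome):
    """Random variable: sum of the two numbers showing in a dice outcome."""
    return outcome[0] + outcome[1]

# The possible values of each random variable, collected over all outcomes.
heads_values = sorted({number_of_heads(o) for o in coin_outcomes})
sum_values = sorted({dice_sum(o) for o in dice_outcomes})

print(heads_values)  # [0, 1, 2, 3]
print(sum_values)    # [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
```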


Traditionally, random variables must be real-valued. A non-numeric analogue of a random variable is a **random quantity** (source: https://stats.stackexchange.com/questions/236765/does-a-random-variable-needs-to-be-numeric). An example of a random quantity is the UV index on a given day, which can take on values of low, moderate, high, and very high. Many random quantities can be represented with random variables by mapping the outcomes to real numbers (e.g. replacing low, moderate, high, very high with 0, 1, 2, 3).


## Probability of Random Variables

When you are studying a random variable, you will often want to know "what is the probability that this random variable, when measured, is equal to some number?" If $X$ is our random variable and $k$ is the value in question, then we write the probability as $\mathbb{P}(X=k)$.
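As a rough sketch of this notation, take $X$ to be the number of heads when three coins are flipped, with all eight outcomes equally likely as in the earlier coin example. Then $\mathbb{P}(X=k)$ is the number of outcomes with exactly $k$ heads divided by 8. The function name `prob_x_equals` is an assumption for this illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

sample_space = ["".join(flips) for flips in product("TH", repeat=3)]

# Count how many outcomes give each value of X (the number of heads).
counts = Counter(outcome.count("H") for outcome in sample_space)

def prob_x_equals(k):
    """P(X = k) when all outcomes are equally likely."""
    return Fraction(counts[k], len(sample_space))

for k in range(4):
    print(f"P(X={k}) = {prob_x_equals(k)}")
```

Values of $k$ that $X$ can never take, such as 5, get probability 0, because no outcome maps to them.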

### Example: 