```
From: https://github.com/ksatola
Version: 0.0.1

TODOs
1. Distributions: 
    - https://mathworld.wolfram.com/PoissonDistribution.html
    - https://mathworld.wolfram.com/BinomialDistribution.html
    - https://mathworld.wolfram.com/UniformDistribution.html

```

# Statistics - Basics

## Correlation vs. Regression
Correlation and regression are far from the same concept. So, let’s see what the relationship is between correlation analysis and regression analysis.

There is a single expression that sums it up nicely: **correlation does not imply causation**!

### The Relationship between Variables
**Correlation** measures the degree of relationship between two variables. **Regression analysis** is about how one variable affects another or what changes it triggers in the other.

### Causality
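**Correlation** doesn’t capture causality, only the degree of interrelation between the two variables. **Regression** is built around a presumed direction of influence: it models how a dependent variable responds to an independent one rather than how strongly the two are connected. A fitted regression does not by itself prove cause and effect; the causal reading comes from how the study was designed.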

### Are X and Y Interchangeable?
A property of **correlation** is that the correlation between x and y is the same as between y and x. You can easily spot that from the formula, which is symmetrical. **Regressions** of y on x and x on y yield different results. Think about income and education. Predicting income based on education makes sense, but the opposite does not.
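
The symmetry of correlation and the asymmetry of regression can be checked numerically. The sketch below is only an illustration: the education and income figures are made up, and the helper functions `pearson_r` and `ols_slope` are hypothetical names written out by hand rather than taken from any library.

```python
from math import sqrt


def mean(values):
    """Arithmetic mean of a sequence of numbers."""
    return sum(values) / len(values)


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def ols_slope(x, y):
    """Slope of the least-squares line that predicts y from x."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    return cov / var_x


# Made-up data: years of education and income in thousands of dollars.
education = [10, 12, 12, 14, 16, 16, 18, 20]
income = [28, 35, 31, 42, 50, 47, 61, 66]

r_xy = pearson_r(education, income)
r_yx = pearson_r(income, education)
slope_income_on_education = ols_slope(education, income)
slope_education_on_income = ols_slope(income, education)

print(f"corr(x, y) = {r_xy:.4f}, corr(y, x) = {r_yx:.4f}")
print(f"slope of income on education: {slope_income_on_education:.4f}")
print(f"slope of education on income: {slope_education_on_income:.4f}")
```

Swapping the arguments leaves the correlation unchanged, while the two regression slopes differ; their product equals the squared correlation.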

### Graphical Representation of Correlation and Regression Analysis
The two methods have very different graphical representations. **Linear regression analysis** is known for the best-fitting line that goes through the data points and minimizes the distance between the line and the points. **Correlation**, by contrast, is a single number (the correlation coefficient).

<img src="images/stats_correlation_and_regression.jpg" alt="corr" style="width: 600px;"/>

## Probability Distribution

When we use the term distribution in statistics, we usually mean a `probability distribution`. Good examples are the Normal distribution, the Binomial distribution, and the Uniform distribution. **A distribution in statistics is a function that shows the possible values for a variable and how often they occur**.

<img src="images/stats_probability_distributions.png" alt="" style="width: 600px;"/>

The distribution of an event consists not only of the input values that can be observed, but is made up of all possible values.

<img src="images/stats_probability_rolling_a_die.png" alt="" style="width: 600px;"/>

So, the distribution of the event – rolling a die – will be given by the following table. The probability of getting 1 is about 0.17 (1/6), the probability of getting 2 is about 0.17, and so on… You can be sure that you have exhausted all possible values when the probabilities sum to 1, or 100%. For all other values, the probability of occurrence is 0.

<img src="images/stats_probability_rolling_a_die2.png" alt="" style="width: 600px;"/>

Each probability distribution is associated with a graph describing the likelihood of occurrence of every event. Here’s the graph for our example. This type of distribution is called a `uniform distribution`.

<img src="images/stats_probability_rolling_a_die3.png" alt="" style="width: 600px;"/>

It is crucial to understand that the distribution in statistics is defined by the underlying probabilities and not the graph. The graph is just a visual representation. 

Now think about rolling two dice. What are the possibilities? One and one, two and one, one and two, and so on. 

<img src="images/stats_probability_rolling_a_die4.png" alt="" style="width: 600px;"/>

Here’s a table with all the possible combinations. 

<img src="images/stats_probability_rolling_a_die5.png" alt="" style="width: 600px;"/>

We are interested in the sum of the two dice. So, what’s the probability of getting a sum of 1? It’s 0, as this event is impossible. What’s the probability of getting a sum of 2? There is only one combination that would give us a sum of 2 – when both dice are equal to 1. So, 1 out of 36 total outcomes, or about 0.03. Similarly, the probability of getting a sum of 3 is given by the number of combinations that give a sum of three divided by 36. Therefore, 2 divided by 36, or about 0.06. We continue this way until we have the full probability distribution.

Let’s see the graph associated with it. So, looking at it we understand that when rolling two dice, the probability of getting a 7 is the highest. We can also compare different outcomes such as: the probability of getting a 10 and the probability of getting a 5. It’s evident that it’s less likely that we’ll get a 10.

<img src="images/stats_probability_rolling_a_die6.png" alt="" style="width: 600px;"/>
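
The full two-dice distribution can be rebuilt by enumerating the 36 outcomes. This is a minimal sketch that assumes two fair six-sided dice; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate all 36 equally likely outcomes of rolling two six-sided dice.
outcomes = list(product(range(1, 7), repeat=2))

# Count how many outcomes produce each sum.
counts = Counter(a + b for a, b in outcomes)

# Probability of each sum = favourable outcomes / total outcomes.
distribution = {
    total: Fraction(n, len(outcomes)) for total, n in sorted(counts.items())
}

for total, prob in distribution.items():
    print(f"sum={total:2d}  p={prob}  (about {float(prob):.3f})")

# The sum with the highest probability.
most_likely_sum = max(distribution, key=distribution.get)
print("most likely sum:", most_likely_sum)
```

Using `Fraction` keeps the probabilities exact, so they add up to exactly 1; a sum of 1 never appears in the table, which corresponds to a probability of 0.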

### The Normal Distribution
The normal distribution is essential when it comes to statistics. Not only does it approximate a wide variety of variables, but decisions based on its insights have a great track record. Also, distributions of sample means with large enough sample sizes can be approximated by a normal distribution (even if the original distributions from which the samples were drawn are not normal).

The statistical term for it is **Gaussian distribution**, though many people call it the **Bell Curve**, as it is shaped like a bell. It is symmetrical and its mean, median and mode are equal. It has no skew. It is perfectly centred around its mean.

<img src="images/stats_normal_distribution.jpg" alt="" style="width: 600px;"/>

On the plane, you can notice that the highest point is located at the mean. This is because it coincides with the mode. The spread of the graph is determined by the standard deviation, as it is shown below.

<img src="images/stats_normal_distribution2.jpg" alt="" style="width: 600px;"/>

## Hypothesis Testing

There are four hypothesis testing steps in data-driven decision-making:
1. Formulate a hypothesis.
2. Find the right test for your hypothesis.
3. Execute the test.
4. Make a decision based on the result.

### A Hypothesis
A hypothesis is an idea that can be tested (compared with something else).

So, if I tell you that apples in New York are expensive, this is an idea, or a statement, but it is not testable until I have something to compare it with. For instance, if I define expensive as any price higher than $1.75 per pound, then it immediately becomes a hypothesis.

### An Example

#### Two-sided or two-tailed test

Here’s a simple topic that can be tested.

According to Glassdoor (the popular salary information website), the mean data scientist salary in the US is 113,000 dollars. So, we want to test if their estimate is correct.

There are two hypotheses that are made: the null hypothesis, denoted H zero, and the alternative hypothesis, denoted H one or H A. The null hypothesis is the one to be tested and the alternative is everything else. In our example,

The null hypothesis would be: The mean data scientist salary is 113,000 dollars,

While the alternative: The mean data scientist salary is not 113,000 dollars.

Now, you would want to check if 113,000 is close enough to the true mean, predicted by our sample. In case it is, you would accept the null hypothesis. Otherwise, you would reject the null hypothesis.

The concept of the null hypothesis is similar to: innocent until proven guilty. We assume that the mean salary is 113,000 dollars and we try to prove otherwise.

#### One-sided or one-tailed test

This was an example of a two-sided or two-tailed test. You can also form one-sided or one-tailed tests. Say your friend, Paul, told you that he thinks data scientists earn more than 125,000 dollars per year. You doubt him, so you design a test to see who’s right.

The null hypothesis of this test would be: The mean data scientist salary is more than 125,000 dollars.

The alternative will cover everything else, thus: The mean data scientist salary is less than or equal to 125,000 dollars.

It is important to note that outcomes of tests refer to the population parameter rather than the sample statistic! As such, the result that we get is for the population.

Another crucial consideration is that, generally, the researcher is trying to reject the null hypothesis. Think about the null hypothesis as the status quo and the alternative as the change or innovation that challenges that status quo. In our example, Paul was representing the status quo, which we were challenging.

Alright. We showed you the four hypothesis testing steps.

## Type I and Type II Errors
In general, we can have two types of errors – `type I error` and `type II error`.

**Type I error** is when you reject a true null hypothesis and is the more serious error. It is also called `a false positive`. The probability of making this error is `alpha` – the `level of significance`. Since you, the researcher, choose the alpha, the responsibility for making this error lies solely on you.

**Type II error** is when you accept a false null hypothesis. The probability of making this error is denoted by `beta`. Beta depends mainly on sample size and population variance. So, if your topic is difficult to test due to hard sampling or has high variability, you are more likely to make this type of error. As you can imagine, if the data set is hard to test, it is not your fault, so Type II error is considered a smaller problem.

We should also mention that the `probability of rejecting a false null hypothesis` is equal to 1 minus beta. This is the researcher’s goal – to reject a false null hypothesis. Therefore, 1 minus beta is called `the power of the test`. Generally, researchers increase the power of a test by increasing the sample size.

### An Example

You are in love with this girl from the other class, but are unsure if she likes you.

There are two errors you can make.
First, if she likes you back and you don’t invite her out, you are making the type I error.

The null hypothesis in this situation is: she likes you back. It turns out that she really did like you back. Unfortunately, you did not invite her out, because after testing the situation, you wrongly thought the null hypothesis was false. In other words, you made a type I error – you rejected a true null hypothesis and lost your chance. It is a very serious problem, because you could have been made for each other, but you didn’t even try.

Now imagine another situation. She doesn’t like you back, but you go and invite her out. The null hypothesis is still: she likes you back, but this time it is false. In reality she doesn’t really like you back, that is. However, after testing, you accept the null hypothesis and wrongly go and invite her out. She tells you she has a boyfriend that is much older, smarter and better at statistics than you and turns her back.

You made a type II error – accepted a false null hypothesis. However, it is no big deal, as you go back to your normal life without her and soon forget about this awkward situation.

# Statistics 1

## Mean, Median, Mode, Weighted Mean, Harmonic Mean

### Example 1
```
Provide the mean and median for this data set: 4, 6, 8, 10, 17
mean = 45/5 = 9
median = 8
```
### Example 2
```
Provide the mode for the dataset: 3,5,7,10,3,3,9,2,5,10,9.
3
```


## Min, Max, Range, Variance, Standard Deviation

## z-score
How many standard deviations does a data point lie from the mean?

<img src="images/z-score.png" alt="" style="width: 200px;"/>

Simply put, a `z-score` (also called a `standard score`) gives you an idea of how far from the mean a data point is. But more technically it’s a measure of how many standard deviations below or above the population mean a raw score is.

A z-score can be placed on a normal distribution curve. Z-scores typically range from -3 standard deviations (which would fall to the far left of the normal distribution curve) up to +3 standard deviations (which would fall to the far right of the normal distribution curve); values beyond that are possible but rare. In order to use a z-score, you need to know the population mean μ and the population standard deviation σ.

Z-scores are a way to compare results to a “normal” population. Results from tests or surveys have thousands of possible results and units; those results can often seem meaningless. For example, knowing that someone’s weight is 150 pounds might be good information, but if you want to compare it to the “average” person’s weight, looking at a vast table of data can be overwhelming (especially if some weights are recorded in kilograms). A z-score can tell you where that person’s weight is compared to the average population’s mean weight.

see: https://www.statisticshowto.com/probability-and-statistics/z-score/

z-score, standard score example:

<img src="images/z-score-iq-example.png" alt="" style="width: 400px;"/>


<img src="images/z-score-empirical-rule.png" alt="" style="width: 400px;"/>

### Example 1
```
For a certain data set we have a mean of 100, a median of 95, and a standard deviation of 25. What is the z-score for the data point 138?
Answer: z-score = (138-100)/25 = 1.52
```
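
The calculation in Example 1 can be written as a small function. This is a minimal sketch; `z_score` is an illustrative name, not a library function. Note that the median given in the exercise plays no part in the formula.

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations that `value` lies above (+) or below (-) the mean."""
    if std_dev <= 0:
        raise ValueError("std_dev must be positive")
    return (value - mean) / std_dev


# Data from Example 1: mean 100, standard deviation 25, data point 138.
z = z_score(138, mean=100, std_dev=25)
print(round(z, 2))
```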

## Empirical Rule (68-95-99.7) - Three Sigma Rule
When your data follow a normal distribution (aka Gaussian distribution):

- About 68% of values fall within one standard deviation of the mean.
- About 95% of the values fall within two standard deviations from the mean.
- Almost all of the values—about 99.7%—fall within three standard deviations from the mean.

These facts are the `68 95 99.7 rule`. It is sometimes called the `Empirical Rule` because the rule originally came from observations (empirical means “based on observation”).

The Normal/Gaussian distribution is the most common type of data distribution. All of the measurements are computed as distances from the mean and are reported in standard deviations.

The Gaussian curve is a symmetric distribution, so the middle 68.2% can be divided in two. Zero to 1 standard deviations from the mean has 34.1% of the data. The opposite side is the same (0 to -1 standard deviations). Together, this area adds up to about 68% of the data.

<img src="images/Empirical-rule-FINAL.jpg" alt="" style="width: 600px;"/>
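
The three percentages can be reproduced from the cumulative distribution function of the standard normal distribution. A minimal sketch using `statistics.NormalDist` from the Python standard library (Python 3.8+); `share_within` is an illustrative helper name.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Share of a normal distribution lying within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


shares = {k: share_within(k) for k in (1, 2, 3)}
for k, share in shares.items():
    print(f"within {k} standard deviation(s): {share:.1%}")

# One side only: from the mean up to +1 standard deviation.
one_side = standard_normal.cdf(1) - standard_normal.cdf(0)
print(f"mean to +1 standard deviation: {one_side:.1%}")
```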

## Percentile Rank

<img src="images/percentile_rank.png" alt="" style="width: 600px;"/>

Example of PR calculation for score of 85

<img src="images/percentile_rank_85.png" alt="" style="width: 600px;"/>

<img src="images/PR_and_NCE.gif" alt="" style="width: 600px;"/>



See: https://en.wikipedia.org/wiki/Percentile_rank

### Example 1
```
Six students earn the following test scores: 60, 70, 80, 90, 95, 100. The student that scored 95 is in the _____ percentile.
Answer: (4 + 0.5) / 6 * 100 = 75
```
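
The formula used in the answer counts the scores below the score of interest plus half of the scores equal to it. A minimal sketch of that calculation; `percentile_rank` is an illustrative name.

```python
def percentile_rank(scores, score):
    """Percentile rank: (count below + 0.5 * count equal) / total * 100."""
    below = sum(1 for s in scores if s < score)
    equal = sum(1 for s in scores if s == score)
    return (below + 0.5 * equal) / len(scores) * 100


scores = [60, 70, 80, 90, 95, 100]
pr_95 = percentile_rank(scores, 95)
print(pr_95)
```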

## Probability
Always consider which type of probability you encounter and ask how it was calculated or inferred, to assess how reliable it might be.
- **Objective** probabilities (based on calculations)
    - **Classical** (coin flip, roll a die) - we know all possible outcomes and they are equally likely to occur
        - How: `number of wins / all possible outcomes = % of probability`
    - **Empirical** (number of successful shots by a football player)
        - possible outcomes are not equally likely to occur, each attempt is different and can be influenced by many factors.
        - based on past data - we can only use historical data to infer, the more data we have, the more trust we can place in the probability
        - not perfect but can be done if we have some data in repeating situations
        - gives a nice idea of what to expect
        - How: `number of successful shots / all shots by the player = % of probability (ratio)`
- **Subjective** (based on experience)
    - Uses people's opinions and experience and perhaps some related data that influence the statement about probability
    - A guess
    
### Example 1
```
When people attend a fundraiser, 40% of them donate money. If three people attend this month’s fundraiser -- Jane, Kate, and Liza -- what are the exact chances that just one of them will donate money?
Answer:
Probability of donating = 40/100 = 0.4
Probability of not donating = 1 - 0.4 = 0.6
Three possibilities
- Jane donates but Kate & Liza do not donate
- Kate donates but Jane & Liza do not donate
- Liza donates but Jane & Kate do not donate

p = (0.4)(0.6)(0.6) + (0.4)(0.6)(0.6) + (0.4)(0.6)(0.6) = 0.432 = 43.2%
```
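
The same answer can be reached in two ways: by walking through every donate / not-donate pattern, or with the binomial formula. The sketch below assumes the three attendees decide independently, which the worked answer above also relies on.

```python
from itertools import product
from math import comb

p_donate = 0.4
attendees = ["Jane", "Kate", "Liza"]

# Brute force: add up the probability of every pattern with exactly one donor.
p_exactly_one = 0.0
for pattern in product([True, False], repeat=len(attendees)):
    if sum(pattern) == 1:
        prob = 1.0
        for donates in pattern:
            prob *= p_donate if donates else 1 - p_donate
        p_exactly_one += prob

# Binomial formula: C(n, k) * p^k * (1 - p)^(n - k) with n = 3, k = 1.
n, k = len(attendees), 1
p_binomial = comb(n, k) * p_donate**k * (1 - p_donate) ** (n - k)

print(f"enumeration: {p_exactly_one:.3f}")
print(f"binomial:    {p_binomial:.3f}")
```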
### Example 2
```
On an exam, 60 of 100 people got a passing score. 50 of 100 people studied for the exam. 45 of the people that studied got a passing score. Studying for the exam and passing the exam are _____.
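Answer: dependent events
Reason: if the events were independent, P(studied and passed) would equal P(studied) x P(passed) = 0.5 x 0.6 = 0.30, but the observed value is 45/100 = 0.45.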
```

## Permutations: the order of things
The number of ways in which objects can be arranged (order matters)
- `n!` (n factorial)
- for 5 objects: 5! = 5x4x3x2x1 = 120

How many permutations do we have when selecting x objects out of n?
- `n! / (n-x)!`
- if we have 8 players, how many permutations can we have on the podium (3 places)?
- 8! / (8-3)! = 8x7x6 = 336

### Example 1
```
There are 5 total candidates available for 2 different jobs. How many permutations are there for those two jobs from the pool of 5 candidates?
Answer: 5! / (5-2)! = 5x4 = 20
```

## Combinations: permutations without regard for order
The number of ways in which objects can be chosen (order not important)
- `n! / [(n-x)! * x!]` where n is total number of objects and x is number of objects chosen at one time
- if we have 10 students in a class, how many different 4-person teams could we randomly choose?
- 10! / [(10-4)! * 4!] = 10x9x8x7 / 4x3x2x1 = 5040 / 24 = 210 possible teams of four

- what is the probability that 2 specific students (Tom and Kate) end up in the same team?
- in how many ways can other students fill the last free spots (the first 2 spots are already taken by our 2 students, there were 10 in total, so we have 8 other students left)?
- 8! / [(8-2)! * 2!] = 28

- so, we have 210 outcomes and 28 desired outcomes
- the probability that 2 specific students end up in the same team is 28/210 ≈ 13.3%

Eight adults are carpooling to an event. At random 2 will be chosen as the drivers for the rest of the group. How many combinations of 2 drivers are there among this group of 8?
- To solve you divide 8! by the product of 6!*2! This would be (8*7*6*5*4*3*2*1) / [(6*5*4*3*2*1)*(2*1)] This reduces to (8*7)/2 which equals 28.
- 8! / [(8-2)! * 2!] = 28

You have 10 employees chosen at random to be placed in a team of 4 people. You want Li and Raoul to be on the team. What will the formula 8!/[(8-2)! 2!] provide you?
- The number of combinations for the eight employees other than Li and Raoul to be on the team.
- Dividing this number by the total number of possible teams (210) gives the probability of both Li and Raoul being on the team.
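
The permutation and combination counts from these two sections can be checked with the standard library. A minimal sketch using `math.factorial`, `math.perm` and `math.comb` (the last two need Python 3.8+); the variable names are illustrative.

```python
from math import comb, factorial, perm

# Permutations: order matters.
arrangements_of_5 = factorial(5)   # 5!
podiums = perm(8, 3)               # 8! / (8-3)!
job_assignments = perm(5, 2)       # 5! / (5-2)!

# Combinations: order does not matter.
teams_of_4 = comb(10, 4)           # 10! / [(10-4)! * 4!]
teams_with_both = comb(8, 2)       # ways to fill the 2 remaining spots from 8 others
driver_pairs = comb(8, 2)          # 2 drivers out of 8 adults

# Probability that two specific people are both on the team of four.
p_both_on_team = teams_with_both / teams_of_4

print(arrangements_of_5, podiums, job_assignments)
print(teams_of_4, teams_with_both, driver_pairs)
print(f"{p_both_on_team:.1%}")
```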


## Random Experiment and Random Variable
`Random experiments` are opportunities to observe the outcome of a chance event. If we were rolling dice, the `random experiment` is observing and recording the outcome, which brings us to a random variable. A `random variable` is the numerical outcome of a random experiment. If we rolled a two and a three, our `random variable` would be five. 

## Random Variables
Because the outcome is unknown in advance (random), we call the result of an experiment a random variable.
- Discrete - results are often whole numbers (no decimals) and there is a limited number of possible outcomes
- Continuous - there is an infinite number of possible outcomes

The distinction between the two is important because you will calculate probabilities differently in each type of situation.

### Discrete Probability Distribution

Probability distribution of drinks orders during a party:

<img src="images/probability_distribution_discrete.png" alt="" style="width: 400px;"/>

<img src="images/probability_distribution_discrete2.png" alt="" style="width: 400px;"/>

Relative frequency:

<img src="images/probability_distribution_discrete_relative_frequency.png" alt="" style="width: 500px;"/>

Mean of discrete probability distribution:

<img src="images/probability_distribution_discrete_relative_frequency_mean.png" alt="" style="width: 600px;"/>

An average consumer ordered 1.46 drinks during the party.

Standard deviation of discrete probability distribution:

<img src="images/probability_distribution_discrete_relative_frequency_std.png" alt="" style="width: 650px;"/>

Variance = 3.02 (sum of the weighted squared values) - 2.13 (mean squared) = 0.89, and the standard deviation (sigma) is the square root of the variance: √0.89 ≈ 0.94.

#### Expected Value
Total of the weighted payoffs associated with the decision. `Expected monetary value (EMV)` is a variation of the mean for a discrete probability distribution that includes subtracting the cost of the investment.
```
You hold a lottery ticket. There is a 10% chance you win $500, a 40% chance you win $25, and a 50% chance you win $0. What is the expected monetary value of your lottery ticket?
Answer: For each outcome multiply the possible winnings times the percentage, then add up all three products. (0.1*500)+(0.4*25)+(0.5*0) = $60
```
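
The lottery calculation can be sketched as a probability-weighted sum. The function name `expected_monetary_value` and its optional `cost` parameter are assumptions made for illustration; the exercise itself states no ticket price, so the cost defaults to 0.

```python
def expected_monetary_value(outcomes, cost=0.0):
    """Sum of probability * payoff over all outcomes, minus the cost of the investment.

    `outcomes` is a list of (probability, payoff) pairs whose probabilities sum to 1.
    """
    total_probability = sum(p for p, _ in outcomes)
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * payoff for p, payoff in outcomes) - cost


# Lottery ticket from the example: (probability, payoff in dollars).
lottery = [(0.10, 500), (0.40, 25), (0.50, 0)]
emv = expected_monetary_value(lottery)
print(f"expected value: ${emv:.2f}")
```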

#### Binomial random variable
An experiment that has only two possible outcomes. With binomial random variables, you can use n, the number of trials, and p, the chance of success, to predict a result.

Binomial vs. Normal distribution

<img src="images/CompareBinomialAndNormalDistribution.png" alt="" style="width: 400px;"/>

### Example 1
```
10 customers ordered pizza. Five ordered 1 pizza, two ordered 2 pizzas, two ordered 3 pizzas, and one ordered 4 pizzas. What's the mean number of pizzas ordered for this discrete distribution?

    pizzas ordered    frequency    relative freq.    weight
          1               5         5/10 = 0.5       1 * 0.5 = 0.5
          2               2         2/10 = 0.2       2 * 0.2 = 0.4
          3               2         2/10 = 0.2       3 * 0.2 = 0.6
          4               1         1/10 = 0.1       4 * 0.1 = 0.4

                                    Mean (sum of weights) = 1.9

```
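
The mean of the pizza distribution, and its standard deviation computed with the same "mean of squares minus squared mean" approach used for the drinks example, can be sketched as follows. The standard deviation is not part of the original exercise; it is added here only to show the method on known numbers.

```python
from math import sqrt

# Pizzas ordered -> number of customers who ordered that many.
frequencies = {1: 5, 2: 2, 3: 2, 4: 1}
total_customers = sum(frequencies.values())

# Relative frequency of each value.
relative_freq = {x: count / total_customers for x, count in frequencies.items()}

# Mean: sum of value * relative frequency.
mean = sum(x * p for x, p in relative_freq.items())

# Variance: weighted mean of the squared values minus the squared mean.
mean_of_squares = sum(x**2 * p for x, p in relative_freq.items())
variance = mean_of_squares - mean**2
std_dev = sqrt(variance)

print(f"mean = {mean:.2f}, variance = {variance:.2f}, std dev = {std_dev:.2f}")
```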

### Continuous Probability Distribution

<img src="images/probability_distribution_continuous.png" alt="" style="width: 400px;"/>

Probability density curves can be used to show the distribution of outcomes. The area under the curve represents the probability of outcomes. The probability of A is X, and the probability of B is Y. In reality, the probability at any single point is equal to 0, so we usually look at the probability over a range of values of the random variable. The total area under the curve is equal to 1 (or 100%).

<img src="images/probability_distribution_continuous_density_distribution_curve.png" alt="" style="width: 400px;"/>

<img src="images/probability_distribution_continuous_density_distribution_curve2.png" alt="" style="width: 400px;"/>



### Normal distribution & Central Limit Theorem

The `"fuzzy" central limit theorem` says that data which are influenced by many small and unrelated random effects are approximately normally distributed.

<img src="images/probability_distribution_continuous_normal.png" alt="" style="width: 800px;"/>

## Z-transformations

### Example 1
What percentage of men weigh more than 211 pounds? Weight is normally distributed with a mean of 150 pounds and a standard deviation of 25 pounds.

First, calculate the `z-score`

<img src="images/z-score-example.png" alt="" style="width: 600px;"/>

It would appear here far to the right

<img src="images/z-score-example1b.png" alt="" style="width: 600px;"/>

Then, find the percentage of men who weigh more than 211 pounds by looking up the probability value for a z-score of 2.44 in the `standard normal distribution / z-score table`.

<img src="images/z-score-example1c.png" alt="" style="width: 600px;"/>

According to our mean and standard deviation, 99.27% of all men weigh 211 pounds or less, which means that the percentage of men who weigh more than 211 pounds is 1 - 0.9927 = 0.0073, or 0.73%.
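
Instead of a printed z-table, the lookup can be done with the cumulative distribution function. A minimal sketch using `statistics.NormalDist` (Python 3.8+), with the mean and standard deviation taken from the example.

```python
from statistics import NormalDist

# Weight of men: mean 150 pounds, standard deviation 25 pounds.
weights = NormalDist(mu=150, sigma=25)

# Step 1: the z-score of 211 pounds.
z = (211 - weights.mean) / weights.stdev

# Step 2: share of men at or below 211 pounds (what the z-table reports).
p_at_most_211 = weights.cdf(211)

# Step 3: share of men above 211 pounds.
p_more_than_211 = 1 - p_at_most_211

print(f"z = {z:.2f}")
print(f"P(weight <= 211) = {p_at_most_211:.4f}")
print(f"P(weight > 211)  = {p_more_than_211:.4f}")
```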

### Example 2
What is the probability that a man weighs between 140 and 170 pounds?

First, we need two z-scores: one for 140, another for 170. Next, check the standard normal distribution table (chart value) values for the scores. 

<img src="images/z-score-example2b.png" alt="" style="width: 600px;"/>

How do we find the value for a negative z-score? Because the bell curve is symmetrical, we can subtract the number found for the positive z-score from 1.0.

<img src="images/z-score-example2d.png" alt="" style="width: 400px;"/>

<img src="images/z-score-example2e.png" alt="" style="width: 400px;"/>

So, the result is:

<img src="images/z-score-example2f.png" alt="" style="width: 400px;"/>
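
The steps of this example, including the symmetry trick for the negative z-score, can be sketched with `statistics.NormalDist` (Python 3.8+). The mean of 150 and standard deviation of 25 are carried over from Example 1.

```python
from statistics import NormalDist

weights = NormalDist(mu=150, sigma=25)
standard_normal = NormalDist()  # mean 0, standard deviation 1

# z-scores for both ends of the range.
z_low = (140 - weights.mean) / weights.stdev
z_high = (170 - weights.mean) / weights.stdev

# A z-table lists positive z-scores only, so use symmetry for the negative one:
# P(Z <= -a) = 1 - P(Z <= a).
p_below_low = 1 - standard_normal.cdf(-z_low)
p_below_high = standard_normal.cdf(z_high)

# Probability of falling between the two weights.
p_between = p_below_high - p_below_low

print(f"z-scores: {z_low:.2f} and {z_high:.2f}")
print(f"P(140 <= weight <= 170) = {p_between:.4f}")
```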


### Example 3
Comparing distributions

<img src="images/example_ztransform.png" alt="" style="width: 600px;"/>

See: http://www.statistics4u.info/fundstat_eng/ee_ztransform.html

### Example 4
```
A student scores 1510 on a standardized test, for a z-score of 2.17. The z-score table shows that the value for a z-score of 2.17 is 0.9850. Therefore, the probability someone scored >1510 on this test is 0.015 or 1.5%.
Answer: 0.9850 indicates that this student scored equal to or higher than 98.50% of the other students. Thus 1.5% are likely to score higher.
```

### Example 5
```
The average height for men is 5'9". Michael's height is 6'7". If the data is normally distributed and you calculated the Z-score, how can you find the percentage of men who are 6'7" or taller?
Answer: The Standard Normal Distribution Table gives you the percentage of men at or below 6'7". The table associates Z-scores with the percentage of values at or below that Z-score, so subtracting the table value from 100% leaves the percentage of men who are taller than 6'7".
```

# Statistics 2

## Inferential Statistics
Inferential statistics draws conclusions about a whole population based on samples of data.

## Sampling
The challenge is getting the right answers, especially when the world, even your small slice of it, is very big.

Measuring everything is just way 
- too expensive, 
- too time consuming and 
- in some cases, it's just impossible. 

Political operatives can't poll every voter. Cell phone companies can't measure the quality level of every single item they produce. A farmer can't measure the actual size of every tomato grown. Scientists, they can't track the health of every single person in the country. Instead of measuring everything, they just measure a small group or subset of the total population. That small subset of measurements is a `sample`. And under the right circumstances, this `sample can act as a representative of the entire population`. The best samples are chosen at random.

### Random Sample
The most dependable type of data comes from what we call a `simple random sample`. This means that 
- the sample is chosen such that each individual in the population has the same probability of being chosen at any stage during the sampling process. 
- And each subset of k individuals has the same probability of being chosen for the sample as any other subset of k individuals.

The simple random sample can be rather elusive. Eliminating bias and maintaining data independence is quite challenging. As a result, `alternatives to the simple random sample` are sometimes utilized (but the simple random sample is still the only way to get dependable statistical outcomes). These alternative methods are simpler to organize, easier to carry out, and often, they seem both logical and sound:
- **Systematic sample** - Choose one unit and then every k-th unit thereafter. So if we're measuring customer satisfaction at a store, perhaps you might ask the first person to come out of the store for their opinion, then you might ask every tenth customer after them for their opinion.
- **Sources of bias** - The sampling time and sampling location as well as the presence of a sampler, may introduce bias or inhibit independence.
- **Opportunity sample** - the sampler simply takes the first n number of units that come along.
- **Stratified sample** - Is one where the total population is broken up into homogeneous groups. Let's say, we're trying to figure out the average amount of sugar in a single cookie, regardless of the type of cookie. We could break up the population into so many different cookie types. Chocolate chip, peanut butter, oatmeal, sugar, ginger, snickerdoodle, oatmeal raisin. From there, we might take a sample of 30 cookies from each category. Perhaps chocolate chip cookies make up 50% of all cookies and ginger cookies make up only 3% of all the cookies. Our very fair-looking system might actually be biased against the most popular cookies.
- **Cluster sample** - is similar to stratified samples in that we are breaking things up into groups. What's the difference? In stratified groups, all the members of each group were the same. In clusters, the groups are likely to have a mix of characteristics. They're heterogeneous. Suppose we are testing a new product. We might ask for samples of people in 20 major cities, what they think about the new product. While the people in a single sample might all be from the same city, each sample might contain men and women, people of different races, politics and socio-economic backgrounds.

The `simple random sample` will always be the gold standard, but these `alternative sampling methods` should not be completely dismissed.

### Sample Size
A `sample` is a group of units drawn from a population, and the `sample size` is the number of units drawn and measured for that particular sample. The total population itself may be very large or, perhaps, immeasurable, so a `sample` is just looking at a slice of the population in the hopes of providing us a representative picture of the entire population. The larger the sample size, the more accurate our measurement or, at least, the more confidence we have that our sample is actually providing us a glimpse of the whole population. 

<img src="images/sample_size.png" alt="" style="width: 600px;"/>

And this is where sample size becomes important. Why? Well, let's take a look at the formula for standard deviation, where `n` is our sample size. `P` and the quantity `one minus p`, they won't change, but sample size will. So, `the bigger the sample size, the smaller our standard deviation`. 

<img src="images/sample_size4.png" alt="" style="width: 600px;"/>

For this sample, if n is equal to five, then one standard deviation would be 13.4% from 90% in either direction, which means that, with a sample size of five, we would expect 68% of all of our samples to have between 76.6% and 100% good forks, since we can't have more than 100% good forks in any sample. 

<img src="images/sample_size2.png" alt="" style="width: 600px;"/>

If n equals 25, one standard deviation would be 6%. At n equals 100, one standard deviation would be 3%. And at n equals 400, one standard deviation would be 1.5%, which means that, with a sample size of 400, we would expect about 68% of all of our samples to have between 88.5% and 91.5% good forks. As you can see, a larger sample size can really make us feel so much more comfortable with our results. It gives us more confidence when we apply the sample results to our larger population.

<img src="images/sample_size3.png" alt="" style="width: 600px;"/>
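
The figures quoted for n = 5, 25, 100 and 400 follow from the standard deviation of a sample proportion, sqrt(p * (1 - p) / n), with p = 0.90 good forks. A minimal sketch; `proportion_std_dev` is an illustrative name.

```python
from math import sqrt


def proportion_std_dev(p, n):
    """Standard deviation of a sample proportion for proportion p and sample size n."""
    return sqrt(p * (1 - p) / n)


p_good = 0.90  # share of good forks in the example
spreads = {n: proportion_std_dev(p_good, n) for n in (5, 25, 100, 400)}

for n, sd in spreads.items():
    # A proportion cannot leave the range 0% to 100%.
    low = max(0.0, p_good - sd)
    high = min(1.0, p_good + sd)
    print(f"n={n:3d}  std dev={sd:.1%}  one-sigma range: {low:.1%} to {high:.1%}")
```

Quadrupling the sample size halves the standard deviation, which is why the spread drops from 6% to 3% to 1.5% as n goes from 25 to 100 to 400.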

### The central limit theorem
The central limit theorem. Let's start simple. A distribution of discrete numbers. We start on the left, where we have five values of five. We move right along our distribution. Two units of 10. Four units of 15. Six units of 20. And on the right of our distribution, we have three values of 25. 20 different readings in our entire population. If we average out the values of our 20 different readings, we get an average of 15.0. 

<img src="images/central_limit_theorem.png" alt="" style="width: 600px;"/>

Now suppose we didn't want to tally up all 20 values, but we still wanted to find the average of the data set. Could we use samples to direct us to the population mean? Let's try it. Let's take samples of four units every day. Here's our first sample. Sample one, 10, 15, 20, 25. Our sample mean for this sample is 17.5. Sample two, five, 15, 20, and 20. Our sample mean for this sample is 15. Sample three, five, five, 15, and 20. Our sample mean here is 11.25. We have three samples, thus we have three sample means, 17.5, 15, and 11.25. If we average those, we get the mean of our means, 14.58. Not too far off from our actual population mean of 15.0. 

<img src="images/central_limit_theorem2.png" alt="" style="width: 600px;"/>
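
The arithmetic in this example is easy to check. The sketch below rebuilds the 20-value population and the three samples exactly as described in the text; the variable names are illustrative choices.

```python
from statistics import fmean

# The 20 readings described above: five 5s, two 10s, four 15s, six 20s, three 25s.
population = [5] * 5 + [10] * 2 + [15] * 4 + [20] * 6 + [25] * 3

# The three samples of four units from the text.
samples = [
    [10, 15, 20, 25],
    [5, 15, 20, 20],
    [5, 5, 15, 20],
]

population_mean = fmean(population)
sample_means = [fmean(sample) for sample in samples]
mean_of_means = fmean(sample_means)

print(f"population mean: {population_mean}")
print(f"sample means:    {sample_means}")
print(f"mean of means:   {mean_of_means:.2f}")
```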

`The central limit theorem tells us the more samples we take, the closer the mean of our sample means will get to the population mean`. Actually, it's even more interesting than that: as we take many more samples (dozens, hundreds, even thousands), the sampling distribution of our sample means, plotted as a histogram, starts to look like a normal distribution. 

<img src="images/central_limit_theorem3.png" alt="" style="width: 600px;"/>
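
One way to see this behavior is to simulate it. The following is only a sketch under stated assumptions: it reuses the 20-value population from the example, draws samples of four with replacement (the text does not say which sampling scheme was used), and fixes the random seed so that runs are repeatable. The function name `draw_sample_means` is hypothetical.

```python
import random
from statistics import fmean, pstdev

random.seed(0)  # fixed seed so repeated runs give the same numbers

# Same 20-value population as in the example above.
population = [5] * 5 + [10] * 2 + [15] * 4 + [20] * 6 + [25] * 3


def draw_sample_means(sample_size, num_samples):
    """Means of `num_samples` random samples, each drawn with replacement."""
    return [
        fmean(random.choices(population, k=sample_size))
        for _ in range(num_samples)
    ]


for num_samples in (3, 30, 300, 3000):
    means = draw_sample_means(sample_size=4, num_samples=num_samples)
    print(f"{num_samples:>4} samples: mean of sample means = {fmean(means):.2f}")

# Crude text histogram of 3000 sample means, one row per bin of width 2.5.
means = draw_sample_means(sample_size=4, num_samples=3000)
print(f"spread of the sample means: {pstdev(means):.2f}")
for left in [5 + 2.5 * i for i in range(9)]:
    count = sum(left <= m < left + 2.5 for m in means)
    print(f"{left:5.1f} to {left + 2.5:5.1f} | {'#' * (count // 25)}")
```

With only a handful of samples the mean of means can wander, but with thousands it settles near 15, and the histogram rows bulge in the middle even though the population itself is lopsided.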

`Not only does the central limit theorem work with our example with a tiny population and discrete values, it works with massive populations and continuous values`. So no matter if you're interested in learning about the average test scores of a small school, the average weight of watermelons grown in North America, or the political preferences of voters in the United States, the central limit theorem is there to provide you the guidance to understand the overall population with the assistance of some simple random samples.

### Standard Error (for proportions)
We've already begun to see the impact of sample size. In general, `the larger the sample size, the more confidence we have in our results`. Now let's shift our attention to the standard error. In short, `the standard error` is the standard deviation of our proportion distribution. 

Through an example, let's take a look at how we calculate the standard error and what that calculated number means to us. In the cell phone industry, companies struggle to keep their clients happy. Suppose a reputable national poll finds that 60% of adults are satisfied with their cell phone provider. Let's take that as our population proportion, `p`=0.60. We'd like to see if cell phone service in our city reflects what is being seen nationally. To measure this, we'll take simple random samples of 100 cell phone users in our city. Now, we know that 100 people can't possibly reflect the satisfaction levels of everyone in our city, therefore we can assume that each sample will carry with it some level of `standard error`. 

So, how big is this standard error? The answer depends on, you guessed it, sample size. `The standard error is ultimately related to the standard deviation`. Our formula for the standard deviation is shown here: `p` is the population proportion, in this case 0.60, and `n` is the sample size, in this case 100.

<img src="images/standard_error_proportions.png" alt="" style="width: 600px;"/>

So we can see that for n=100, our standard deviation is approximately 0.05 or 5%. And remember, if we assume a normal distribution, 68% of all the samples taken should fall within one standard deviation of the population proportion. So in this situation, for simple random samples with 100 ratings, we would expect 68% of the samples to provide `sample proportions` or `p-hats` between 55% and 65%. 55% is our lower limit. 65% is our upper limit and 60% our population proportion that would be at the center. 

<img src="images/standard_error_proportions2.png" alt="" style="width: 600px;"/>

So if tomorrow we gathered a simple random sample of 100 cell phone customers and 57% of those customers were satisfied, we can say that our city was likely on par with the national proportion of 60% because we were within the 5% standard error. 

Now suppose we could afford to take simple random samples of 1000 cell phone customers. Notice what happens: since n is now 1000, our standard deviation drops to about 0.015 or 1.5%. So if n=1000, we would expect 68% of those larger samples to have p-hats between 58.5% and 61.5%. 

<img src="images/standard_error_proportions3.png" alt="" style="width: 600px;"/>
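
The two cases above can be reproduced with a few lines of Python. This is a minimal sketch, assuming the national figure of 60% is the true population proportion; the function names `standard_error` and `within_one_standard_error` are illustrative.

```python
import math


def standard_error(p, n):
    """Standard error of a sample proportion: sqrt(p * (1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)


def within_one_standard_error(p_hat, p, n):
    """True if a sample proportion lies within one standard error of p."""
    return abs(p_hat - p) <= standard_error(p, n)


p = 0.60  # national satisfaction rate from the example

for n in (100, 1000):
    se = standard_error(p, n)
    print(f"n={n}: standard error = {se:.3f}, "
          f"limits = {p - se:.3f} to {p + se:.3f}")

# A city sample of 100 customers with 57% satisfied.
print(within_one_standard_error(0.57, p, 100))
```

The unrounded standard error for n=100 is about 0.049, which the text rounds to 5%. Note that the same 57% result would fall outside one standard error if it came from a sample of 1000.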

So let's recap, `the standard error in situations where we are looking at proportions is the standard deviation`. This is the formula for the standard deviation of a sample proportion, `p-hat`. The bigger our sample size, the smaller our standard deviation. This standard deviation is our standard error. `The standard error allows us to set up a range around the population proportion that extends the equivalent of one standard deviation in both the positive and negative direction`. The formula for our upper limit is p plus the standard deviation, and for the lower limit, p minus the standard deviation. Once the range is established, if we assume the probability distribution is nearly normal, then we would expect 68% of the simple random samples gathered in the upcoming weeks to fall within one standard deviation, that is, within one standard error, of the population proportion. 

What happens when 68% of our samples are not falling within our calculated upper and lower limits? 
- Perhaps it signals that something in our city is different from the overall nation. Customers, companies, or a combination of the two might create a unique environment in our city. 
- Perhaps there is a flaw in the reported national average of 60%, maybe their data gathering techniques were flawed. 
- Perhaps, the market has changed since that number was first reported or perhaps our sampling method was biased. 

Don't forget: while standard errors help us judge and analyze future samples, samples that fall beyond the standard error should be analyzed, not necessarily judged as failures.

### Sampling distribution of the mean
Suppose we know that the average player in a men's college basketball league weighs 180 pounds. Let's also say that the median player weighs about 190 pounds, so that means quite a few of the smaller players in the league are bringing down that average. This league has over 4,000 players. 

Would we have to weigh everyone of those 4,000 plus players to know the average weight of a player in the league? 

Well, if you remember, `the Central Limit Theorem tells us that by taking some simple random samples, we can get a very good approximation of the true population average`. 

If we take five random samples, with a sample size of only four, we might find that those five tiny samples will have sample means that average to perhaps 182 pounds. Now, if we take five random samples, but increase the sample size to 25, we would likely see the mean of the sample means closer to 180.5 pounds. 

<img src="images/sampling_distribution_of_the_mean.png" alt="" style="width: 600px;"/>

Only five simple random samples with sample sizes as low as four can get us very close to our true population mean of 180 pounds. Now, this basketball league has the capability of tracking all 4,000 plus players, so we knew that the average weight of the players in this league was actually 180 pounds, but hopefully the example has helped convince you that even when we have a massive population, the Central Limit Theorem tells us we can trust our simple random samples to point us in the direction of the true population mean. 

Let's say instead of 4,000 well-tracked male college basketball players, we wanted to know the average weight of 18 to 24 year old men in United States colleges. There are millions of young males in college, and these men are not tracked nearly as well as the basketball players, but that shouldn't be a concern. The Central Limit Theorem tells us that if we are diligent in collecting simple random samples, we should trust our sample means in this scenario, with a population of over three million students, to be as accurate as the example of five samples collected from the pool of 4,000 college basketball players. And it works the other way too. If you have a school of only 50 students, a very small population, you can approximate the population mean for those 50 students by simply taking a few simple random samples.

### Standard Error (for means)
Through the use of the central limit theorem, we've seen how taking just a few random samples can guide us in the direction of the population mean. Of course when we use only a few samples to try and figure out a population mean, we understand that `the average of our sample means comes with a standard error`. 

So how do we figure out the standard error for our simple random samples? Let's say we're trying to figure out how long it takes to get our coffee drinks from our local cafe between 7 a.m. and 8 a.m. on a Monday morning. We take samples on four different Mondays. Our sample size for each of these samples is five. Here is our data set for those days, times are in minutes. 

<img src="images/standard_error_means.png" alt="" style="width: 600px;"/>

So, for sample A you can see that the time it took our five customers to get coffee ranged from 0.6 minutes to 2.4 minutes. The average for sample A was 1.58 minutes. If we take the average of the sample means, we will find that the average time to get a coffee drink was about 1.52 minutes. And the standard deviation of those four sample means is 0.25 minutes. 

`The standard deviation of our sample means is our estimated standard error`. What's interesting is that `there's a relationship between the standard error of our sample means and the standard deviation of the population`. Take a look at this formula: Sigma Xbar is equal to Sigma over the square root of our sample size n. Sigma Xbar is our standard error, so here it is the standard deviation of our four sample means. On the other side of the equation, we have another Sigma; this Sigma is the standard deviation of the entire population.

<img src="images/standard_error_means2.png" alt="" style="width: 600px;"/>

So we plug in our calculated standard error from our cafe example, which was 0.25 minutes, and then plug in 5 for n, our sample size. We can then solve for Sigma, our population's standard deviation. Based on these four samples with a sample size of five, we can estimate that the population's standard deviation is 0.56 minutes. Again, we can see how sample size can have a huge impact when working with samples to find out information about the entire population. In essence, what the formula tells us is that `if we use larger sample sizes, our standard error gets smaller`. This is also important as we collect samples in the future. Why? When we collect a sample of drink service times from our cafe tomorrow and for the rest of the week, we will know that 68% of our samples should have sample means that fall within 0.25 minutes of 1.52 minutes, our average. Our upper limit would then be 1.77 minutes and our lower limit 1.27 minutes. The standard error formula is very simple, but still very informative. By understanding the simple relationship between sample size, the standard deviation of our population, and the standard deviation of our sample means, we can better understand our population as well as the samples we take going forward.
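
As a quick check of the cafe numbers, the relationship Sigma Xbar = Sigma / sqrt(n) can be rearranged to solve for Sigma. The sketch below takes the three figures quoted in the text as given; the variable names are illustrative.

```python
import math

standard_error = 0.25  # standard deviation of the four sample means, in minutes
sample_size = 5        # customers timed in each sample
mean_of_means = 1.52   # average of the four sample means, in minutes

# sigma_xbar = sigma / sqrt(n)  =>  sigma = sigma_xbar * sqrt(n)
population_sd = standard_error * math.sqrt(sample_size)

# One standard error on either side of the mean of the sample means.
lower_limit = mean_of_means - standard_error
upper_limit = mean_of_means + standard_error

print(f"estimated population standard deviation: {population_sd:.2f} minutes")
print(f"68% of sample means expected between "
      f"{lower_limit:.2f} and {upper_limit:.2f} minutes")
```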

### Example 1
80% of customers pay with a debit or credit card. 25 customers are chosen at random each day. 68% of the samples would have p-hats between:
```
Sigma p-hat = sqrt((0.80*0.20)/25) = sqrt(0.0064) = 0.08, so the standard deviation (standard error) is 0.08, or 8%. The p-hats would be between 80% - 8% = 72% and 80% + 8% = 88%.
```
### Example 2
Three of the choices are true of the Central Limit Theorem. Which is not?
- the larger the sample size, the taller and more narrow our distribution
- the central limit theorem works for both small and large population sizes
- the mean of our random sample means directs us to the population mean
- (wrong) the greater the size of our samples, the greater the standard deviation

### Example 3
A school has 70% female students. 25 random students win a prize each day. We would expect 68% of the daily pools of winners to have between _____ females.



## Confidence Intervals
At this point, you should hopefully feel comfortable with the concepts of sample size and the central limit theorem. `The central limit theorem tells us that if we take enough simple random samples, we can get an excellent approximation of our population means`. In other words, rather than measure everything in the population, we can take some random samples. Those random samples will provide us with the measurements that will be nearly normal in our distribution, and will direct us to the population mean. 

And you also, hopefully, remember that `the larger the sample size of those random samples, the smaller the standard deviation of our distributions`, so the more certain we are about our resulting population mean. Lots of samples make us feel confident about our population numbers.

In this section, in which we will cover `confidence intervals`, we're going to go in the opposite direction. `What happens when we have only one sample?` If we have only one sample, how confident are we that this single sample mean is near our actual population mean? 

In this section, you'll often see results that look like this. We are 95% confident that the average adult in the United States drinks between two and three liters of beverages per day. As you can see, one random sample will allow us to calculate our range and attach to it a level of confidence. Think about how incredibly powerful this is, the efficient use of resources, the overall savings, and the ability to be 95% confident, or perhaps even more confident than that. 

But before we move on, let's take a moment to discuss what a 95% confidence level means. It means that if, instead of taking a poll only once, we took a similar poll 20 times, then for about 19 of those polls the resulting ranges would capture the population mean. But of course, about one of the 20 times, the reported range would not include our population mean. Remember that the next time a pre-election poll predicts the wrong candidate to win. So, let's get started with that exact example. Let's see how they create confidence intervals for those pre-election polls.

### What exactly is a confidence interval?
Let's create a 95% confidence interval for an election poll where the voters have two choices: Candidate A and Candidate B. As you may have guessed, we'll be working with proportions. Before we start creating a 95% confidence interval for this scenario, let's recap a few things. 

- First, if we took a lot of voter samples, the distribution would be approximately normal. 
- Second, the larger the voter sample size, the smaller the variation, and thus, the smaller the standard deviation of the resulting distribution. 
- Third, a 95% interval is one where we are 95% certain that our interval, which will be centered at the sample's proportion, will contain the actual population proportion. 

Now what we're going to do is take a single sample. This single random sample might include 50 eligible voters, but the resulting sample proportion of eligible voters that favor Candidate A is a single number we can call `p-hat`. This single sample proportion is a single dot on this distribution. 

<img src="images/confidence_intervals.png" alt="" style="width: 600px;"/>

Now the question is: is it a dot that is close to the population's true proportion, or is it a dot that is very far away from the true proportion? Remember, we don't actually know the true population proportion. So while our single sample is here, we don't know if the true population proportion is here, or here, or here. So, what are we actually doing? Well, let's imagine that this is the true population proportion distribution. We're going to take a sample and build an interval around that sample, and we're going to hope that the true population proportion falls within that interval. What we really want to create is an interval of a certain length, which we will call `y`. This interval will have a lower limit and an upper limit, and it will be centered on our sample proportion, `p-hat`. The interval needs to be big enough that, if we took 20 samples, about 19 of the 20 intervals would contain the true population proportion. In other words, 95% of our samples would produce an interval that contains `p`. 

So let's recap. When we gather a sample with a certain sample proportion `p-hat` and when we create an interval around that p-hat, we are 95% certain that the real population proportion is somewhere between the lower and upper limit of that interval.
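
The "19 out of 20" reading can be checked by simulation. The sketch below is illustrative only: it assumes a true proportion of 0.55 and polls of 100 voters (both made-up inputs), and it builds each interval as p-hat plus or minus 1.96 times sqrt(p-hat(1 - p-hat) / n), the formula developed in the next section. The name `poll_interval` is hypothetical.

```python
import math
import random

random.seed(1)  # fixed seed so repeated runs give the same numbers

TRUE_P = 0.55  # assumed true population proportion (unknown in real life)
N = 100        # voters per poll
POLLS = 2000   # number of simulated polls
Z = 1.96       # z-score for a 95% interval


def poll_interval(true_p, n):
    """Simulate one poll and return its (lower, upper) 95% interval."""
    votes = sum(random.random() < true_p for _ in range(n))
    p_hat = votes / n
    margin = Z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


hits = 0
for _ in range(POLLS):
    lower, upper = poll_interval(TRUE_P, N)
    if lower <= TRUE_P <= upper:
        hits += 1

coverage = hits / POLLS
print(f"{coverage:.1%} of {POLLS} intervals captured the true proportion")
```

Each individual interval either contains the true proportion or it does not; the 95% describes how often the procedure succeeds over many repeated polls.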

### 95% confidence intervals for population proportions
So there's an upcoming election for mayor of a large city between two candidates. Let's simply call the candidates Candidate A and Candidate B. You work for Candidate A. Candidate A's campaign team wants to know where Candidate A stands so they ask you to conduct a poll. You gather a random sample of 100 voters and ask if they will be voting for Candidate A or Candidate B. 55% of the 100 voters polled said they planned on voting for Candidate A. Anything over 50% in the real election would result in a win for your candidate. So far, based on the results of the poll, things look promising for your candidate but remember, this was just one sample with a sample size of 100. 

<img src="images/confidence_intervals2.png" alt="" style="width: 600px;"/>

Now, look, I understand that my small pre-election poll likely didn't provide the actual percentage of votes Candidate A will get on the actual day of the election but maybe we're close. So let's create a 95% confidence interval. In other words, let's use our sample result to create an interval that very likely includes the actual percentage of votes Candidate A will get on election day. 

Let's take a look at a normal distribution curve. If we want a 95% confidence level, that would mean we'd want to capture 95% of the area under the curve between two points equidistant from the sample proportion which means 2.5% of the area under the curve on the right side of the curve would not be included and 2.5% of the area under the curve on the left side of the curve would not be included. So how do we find these two points which establish our interval? We'll have to take a look at `z-scores`. 

<img src="images/confidence_intervals3.png" alt="" style="width: 600px;"/>

`Z-scores tell us how many standard deviations away from the mean we would need to be to capture a certain percentage of the total distribution`. So we pull up a z-score table. 

<img src="images/confidence_intervals4.png" alt="" style="width: 600px;"/>

We find 0.975. This means 97.5% of the data points are to the left of this point and thus, 2.5% of the data points would fall to the right of this point. As you can see, our z-score is 1.96. This means if we go 1.96 standard deviations in the positive direction and 1.96 standard deviations in the negative direction, 95% of the area under our distribution will fall between these two limits. 

<img src="images/confidence_intervals5.png" alt="" style="width: 600px;"/>
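
The table lookup can also be done in code. This sketch uses `NormalDist` from Python's standard `statistics` module (available since Python 3.8) in place of a printed z-score table; the variable names are illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

# Area to the left of z = 1.96, as read from the table.
area_left = standard_normal.cdf(1.96)

# Going the other way: which z leaves 97.5% of the area to its left?
z_95 = standard_normal.inv_cdf(0.975)

# Area between -1.96 and +1.96 standard deviations.
area_between = standard_normal.cdf(1.96) - standard_normal.cdf(-1.96)

print(f"area left of 1.96:         {area_left:.4f}")
print(f"z-score for 0.975:         {z_95:.2f}")
print(f"area between -1.96, +1.96: {area_between:.4f}")
```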

So let's see where we are so far. `Our sample proportion is 0.55`. `This will be the center of our interval`. Therefore, the upper limit will be 0.55 plus 1.96 times our standard deviation and our lower limit will be 0.55 minus 1.96 times our standard deviation. So of course now, we need to find our standard deviation. 

In the absence of the population proportion and the population's standard deviation, we can use the formula for the standard error. It's basically the formula for the standard deviation, but we use the sample proportion, p-hat, in place of p. Since we use the sample proportion, p-hat, instead of the population proportion, p, we can't call it the standard error. When you use the sample proportion, it's called `the sampling error`. So we put p-hat into our formula and, in our particular example, we polled 100 voters, so n is equal to 100. As you can see, our standard deviation or sampling error in this case is 0.05. 

<img src="images/confidence_intervals6.png" alt="" style="width: 300px;"/>

So when we plug that standard deviation into our formula, we get an interval that has a lower limit of 0.452 and an upper limit of 0.648. 

<img src="images/confidence_intervals7.png" alt="" style="width: 300px;"/>

Uh oh, our interval goes all the way down to 0.452 which means that our margin of error tells us that losing is still possible. Then again, a large proportion of our interval is over 50%. Still, your candidate is probably a bit nervous but how about if we took a bigger sample? 

How about if the campaign is willing to fund a poll of one thousand voters? The numbers on this poll are a bit lower for your candidate. This poll tells us 54% of the voters are for Candidate A. Let's calculate our sampling error with our new sample proportion and sample size of one thousand. 

<img src="images/confidence_intervals8.png" alt="" style="width: 300px;"/>

Our sampling error is now 0.016. So let's calculate our interval limits. 

<img src="images/confidence_intervals9.png" alt="" style="width: 300px;"/>

Our new interval stretches from 0.509 to 0.571. If the election were to have a result identical to anything within our 95% confidence interval, Candidate A would win. Remember, according to this sample, `we are now 95% confident that on election day, Candidate A will receive between 50.9% and 57.1% of the vote`. Only about 5% of intervals built this way would miss the election day result. And don't forget, a miss might mean a result that is even better than 57.1% of the vote for Candidate A. No matter how you slice it, that should make Candidate A's team feel pretty good, right? Still, there's always that one person on the team who asks, can we get a 96% interval, or what about 98%? So for those people, next, we'll create confidence intervals that are greater than 95%. 
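
Both polls in this section can be reproduced with one small helper. This is a sketch that assumes the interval is p-hat plus or minus z times sqrt(p-hat(1 - p-hat) / n), as described above; the function name `proportion_interval` is illustrative.

```python
import math


def proportion_interval(p_hat, n, z=1.96):
    """Return (sampling_error, lower, upper) for p_hat +/- z * sampling error."""
    sampling_error = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = z * sampling_error
    return sampling_error, p_hat - margin, p_hat + margin


# (sample proportion, sample size) for the two polls in the text.
for p_hat, n in ((0.55, 100), (0.54, 1000)):
    error, lower, upper = proportion_interval(p_hat, n)
    print(f"p-hat={p_hat}, n={n}: sampling error = {error:.3f}, "
          f"95% interval = {lower:.3f} to {upper:.3f}")
```

The larger poll has a lower sample proportion, yet its whole interval sits above 0.50, because a tenfold increase in sample size shrinks the sampling error by a factor of about three.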

### Do you want to be more than 95% confident?
For some people, 95% just isn't good enough. So what happens if someone demands a 98% confidence interval? Well, let's remember a 95% confidence interval stretches in equal distances in opposite directions from our sample proportion. How far? Enough to include 95% of the probability distribution, which means that a 98% confidence interval would have to stretch a little bit farther so our interval would include 98% of the probability distribution. Notice my numbers didn't really get any better. It's more like saying, "I'm 75% sure my lost car keys are in my living room, but I'm 99% sure my lost car keys are somewhere in this house." I simply enlarged the area where my keys might be, and that increased the likelihood that the area contains my keys. 

So, when someone demands that we provide a 98% confidence interval instead of a confidence interval of 95%, it's important that they understand what the difference is between the two intervals. With that in mind, let's go ahead and figure out how to calculate the limits of this expanded interval. Remember, to find the limits of our 95% confidence interval, we used these formulas. 

<img src="images/confidence_intervals10.png" alt="" style="width: 300px;"/>

Our sample proportion, p-hat plus or minus our sampling error, which is really just our sample's standard deviation, times 1.96. Why 1.96? Because that was the appropriate z-score for 95%. So really, the only thing we will change to adjust our interval is the z-score of 1.96. But how do we find the right z-score for, let's say, 98%? Don't get fooled, you do not want the z-score for 0.98. You actually need the z-score for 0.99. Why? Well, let's take a look at our distribution. We want to set limits where 98% of the data is under the curve between the limits, so 2% of the distribution falls outside of the limits. 1% of the distribution on the right end of the curve and 1% on the left end of the curve. So we need to find the z-score for 0.99. Here's a z-score table. 

<img src="images/confidence_intervals11.png" alt="" style="width: 600px;"/>

Within the table, we are looking for 0.9900 or the closest number that is greater than that. In this case, the appropriate z-score is 2.33, and since the interval will stretch in equal distances in opposite directions, we will use 2.33 for both our upper and lower limit. 

<img src="images/confidence_intervals12.png" alt="" style="width: 300px;"/>

So, if we poll 1,000 people and 540 of those people favor Candidate A instead of Candidate B, then we know we have a p-hat of 0.54 and we know that n is equal to 1,000.

<img src="images/confidence_intervals13.png" alt="" style="width: 300px;"/>

Using this formula, we can calculate our sampling error. 

<img src="images/confidence_intervals14.png" alt="" style="width: 300px;"/>

Our sampling error, therefore, is approximately 0.016. 

We now have what we need to calculate our 98% confidence interval.

<img src="images/confidence_intervals15.png" alt="" style="width: 300px;"/>

Look at that. If an election victory requires at least 50% of the vote, victory is still within our margin of error. I know, I know there's always the person that wants more. How about 99%? Using the same logic as with 98%, we realize we need to find the z-score for 0.995, which is 2.58. Let's calculate our 99% interval. 

<img src="images/confidence_intervals16.png" alt="" style="width: 300px;"/>

Here we see that the chance for a narrow election loss is within our margin of error. Nonetheless, it looks like, based on our simple random sample, we can feel fairly confident that an election win is likely. Still, even with this strong statistical evidence, the candidate can lose. If the candidate lost after we reported that a win seemed rather likely, how might that loss be explained? Let's try and figure that out next.
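
The effect of raising the confidence level can be sketched in Python. The example below assumes the same poll (p-hat = 0.54, n = 1000) and looks up each z-score with `NormalDist.inv_cdf` from the standard library instead of a printed table; the helper name `z_for_confidence` is illustrative.

```python
import math
from statistics import NormalDist


def z_for_confidence(confidence):
    """z-score that leaves (1 - confidence) / 2 of the area in each tail."""
    tail = (1 - confidence) / 2
    return NormalDist().inv_cdf(1 - tail)


p_hat, n = 0.54, 1000
sampling_error = math.sqrt(p_hat * (1 - p_hat) / n)

intervals = {}
for confidence in (0.95, 0.98, 0.99):
    z = z_for_confidence(confidence)
    lower = p_hat - z * sampling_error
    upper = p_hat + z * sampling_error
    intervals[confidence] = (lower, upper)
    print(f"{confidence:.0%}: z = {z:.2f}, interval = {lower:.3f} to {upper:.3f}")
```

The sample itself never changes; only the width of the interval does. At 98% the lower limit still sits just above 0.50, while at 99% it dips just below.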

### Explaining unexpected outcomes
Suppose before an election, a polling organization reports a 95% confidence interval for Candidate A. This confidence interval stretches from 0.51 to 0.54. In other words, the poll suggests that Candidate A will get between 51% and 54% of the vote. Then election day comes around and Candidate A loses. Candidate A would probably be furious. Before the election, they were very confident of a win, and now they realize that they actually lost. 

How could this have happened? Well this is where it helps to be a well rounded statistician. `Beyond having a knowledge of the numbers and formulas, you need to understand the real environment that surrounds the poll`. In this case, it would be helpful if we understood how political polls are done and also the nature of the actual election. What might go wrong during the actual poll? 
- Lying:
    - Respondents might want to throw off the polls, so they may just lie. 
    - Or perhaps they're embarrassed to tell a pollster their true opinions, and thus they would rather give an answer that would please the pollster. 
- Maybe they didn't lie. Perhaps the respondent just changed their mind between the time of the poll and the actual date of the election. It's possible some people were unsure who they wanted to vote for on the day of the poll, but they chose a candidate for the poll just to please the pollster. 
- Perhaps there were issues in gathering a random sample: the location of the poll, the people chosen, how they were chosen, or the incentives used to entice more participants. 
- It takes a very experienced organization to gather a truly random sample. Sometimes, in an effort to influence voters that are still uncertain about who they will vote for, some politically biased polling organizations might actually seek biased polling results that they can use in the media to show that their candidate is popular among likely voters. These organizations may have had poor sample selection, poorly worded questions, or other questionable if not deceitful practices. 

As you can see, the polling process itself is filled with challenges. So let's move on to election day. What might go wrong on Election Day? 
- Bad weather. 
- A health epidemic. 
- Unsafe travel conditions. 
- Car trouble. 
- Work or family commitments. 

This is just a short list of reasons people might not be able to get to the voting booth on Election day. Perhaps between the time of the pre election poll, and election day, something changed. 
- An event occurred that changed the way voters made their decisions. 
- Perhaps a scandal involving one of the candidates was uncovered. 

Sometimes, voters just choose to stay home. Why? 
- They don't really care that much. 
- Maybe they just forgot. 
- Perhaps they heard the lines were long. 
- Maybe they had been watching the news, and analysts made it seem as though the outcomes were certain. 
- Perhaps the voter thought the election was a done deal: "My vote isn't going to make a difference at this point." 

Hopefully you aren't growing distrustful of statistics. If you investigate confidence intervals that have been reported over the last few decades, you'll see that very often, they are very accurate. Nonetheless, when the confidence interval misses the mark, it's important to know where the poll might have gone wrong. Actually, it's best to know these things before the study is even performed. `If you're looking to use confidence intervals to make important decisions, be sure you investigate how the study was done and which assumptions and simplifications were included in the development of the study. As I said, a good statistician has to know more than numbers and formulas. They need to really understand the environment that they're looking to measure`.

### 95% confidence intervals for population means
Political campaigns rely heavily on developing confidence intervals for voter preference. In many cases, these types of confidence intervals are for proportions, but how about if we wanted to develop confidence intervals for these types of situations? 
- What's the average salary of a cardiologist? 
- What's the weight of the average grapefruit grown in the state of Texas? 
- How long does it take the average female, 20 to 29 years old, to run a mile? 

In these cases, we're not looking at proportions. Instead, `we're looking for means`. So, how do we create a confidence interval for a population mean? Well, it's not really that much different than the reasoning we use to develop confidence intervals for proportions. Here are the `formulas used to develop a 95% confidence interval for proportions`. 

<img src="images/confidence_intervals17.png" alt="" style="width: 300px;"/>

The sample proportion plus or minus 1.96, times the sample's sampling error. Remember, `1.96` is the appropriate `Z-Score for a 95% interval`. So, if we wanted a different confidence interval, we would just find the appropriate Z-Score. Now let's move from proportions to means. Here are the `formulas for developing a 95% confidence interval for means`. 

<img src="images/confidence_intervals18.png" alt="" style="width: 300px;"/>

Pretty much the same thing, except we substitute in the sample mean where we had the sample proportion. 

So, suppose we wanted to create a 95% confidence interval for the average time it takes females, in their twenties, to run a mile. We could take a simple random sample of women in their twenties. We would then time their miles. Suppose these are the one mile run times in minutes for a simple random sample of nine women in their twenties. 

<img src="images/confidence_intervals19.png" alt="" style="width: 100px;"/>

So, all we need to do is find the sample mean. In this case, it's 8.96 minutes. The standard deviation for these nine times is 1.36. We then use this standard deviation, 1.36, to compute our sampling error. 

<img src="images/confidence_intervals20.png" alt="" style="width: 600px;"/>

If we plug these numbers into our formulas for the 95% confidence interval, we will get an interval from 8.08 minutes to 9.84 minutes. 

<img src="images/confidence_intervals21.png" alt="" style="width: 300px;"/>

According to this simple random sample, `we are 95% confident that the population mean for the one mile run time for females in their twenties is somewhere between 8.08 minutes and 9.84 minutes`. Just as with our proportion intervals, by adjusting our sample size, which influences our sampling error, we can impact the size of the interval. And, of course, when we ask for an interval with a different confidence level, we can just adjust our Z-Score, which will also influence our confidence interval.
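
The run-time interval can be worked through in a few lines. This sketch assumes the sampling error for a mean is the sample standard deviation divided by the square root of n, in line with the standard error formula for means given earlier, and it starts from the summary figures quoted in the text because the nine individual times appear only in the image.

```python
import math

sample_mean = 8.96  # mean one-mile time in minutes, from the text
sample_sd = 1.36    # standard deviation of the nine times, from the text
n = 9               # sample size
z = 1.96            # z-score for a 95% interval

# Sampling error for a mean: standard deviation over the square root of n.
sampling_error = sample_sd / math.sqrt(n)

lower = sample_mean - z * sampling_error
upper = sample_mean + z * sampling_error

print(f"sampling error: {sampling_error:.3f} minutes")
print(f"95% interval:   {lower:.2f} to {upper:.2f} minutes")
```

Carrying full precision gives limits that differ from the quoted 8.08 and 9.84 by about 0.01 minutes; the quoted figures appear to follow from rounding the sampling error to 0.45 before multiplying by 1.96.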

You are now ready to develop confidence intervals for situations that use proportions as well as situations that require means.

### Example 1
Which of the following is NOT an explanation of why poll results may differ from an election outcome?
- (correct) the group polled was a random sample
- the respondents were still unsure at time of poll
- a major event occurred between the poll date and the election date
- the polling organization was biased

### Example 2
```
When looking for a 95% confidence interval we need the z-score for ___.
0.975
In order to have 0.95 within the interval, 0.05 must be outside the interval, 0.025 on each end of the interval. Thus the z-score at a cumulative probability of 0.975 (z = 1.96) accounts for all but the upper 0.025.
```

### Example 3
```
A 95% confidence interval means that if a similar random sample were taken 20 times, we would expect the population mean to fall within the resulting range ____ times.
19
20 samples * 0.95 = 19 samples within the range.
```

## Hypothesis Testing

Have you ever come upon situations, outcomes, or events that just seem odd? In a city made up of 51% women, where jury pools are said to be chosen at random, a certain jury pool of 50 people contains only eight women. A national restaurant chain provides a game piece for every drink a customer buys. There are 10 prizes worth over $100,000. Two of those prizes are won by relatives of restaurant employees. Three employees from a particular chemical factory with 400 employees are diagnosed with brain cancer in a two-year period.

When you hear things like this, they make you think. It doesn't seem right. Is that even possible? And if it is possible, how likely is it that it could have happened at random? Sometimes these questions and the related answers could impact our careers and companies. They may help us make decisions. They might influence our superiors to act.

Perhaps you work at a healthcare company. Your company has developed a drug to treat the common cold. It's reported that the average adult with the common cold will experience cold symptoms for about 8.5 days. When testing this new medicine on a random sample of 250 people with the common cold, it's found that these patients recovered about 1.2 days sooner than those who did not take this drug. Is this significant? Could this sample just be the result of chance, or did this drug have an impact? Should the drug be tested further? Does this mean this new drug should be approved for use? This is where hypothesis testing comes in. 

**Hypothesis testing** is an extremely popular method for exploring outcomes. In general, statisticians will: 
- make an assumption about a population. 
- collect a random sample from the population. 
- measure the sample. 
- see whether or not the sample measurement supports their assumption. 

It can be complex, but when done properly, hypothesis testing can be extremely powerful. It really requires you to use all of your statistical muscles, but once you have hypothesis testing at your disposal, you'll be able to provide valuable input in almost any career: science, medicine, business, education, public policy, and even sports and entertainment. No matter your field, make sure you understand the most basic elements of hypothesis testing.

### How to test a hypothesis in four steps

<img src="images/hypothesis_testing_01.png" alt="" style="width: 300px;"/>

The adult residents of a large town with an adult population of 35,000 are half male and half female. Each week, 50 adults are chosen at random to participate in jury duty. Women have complained that they are getting called to jury duty more often than the men. Jury administrators contend the system is random and fair. A committee is set up to investigate. They use the next jury pool as a sample. They find that in that pool of 50 potential jurors, 14 are men and 36 are women. The jury administration contends this happened by chance. A lobbying group disagrees. What we have here is an excellent opportunity to utilize hypothesis testing. 

**Hypothesis testing** is really a process. In our case, a four-step process. 

<img src="images/hypothesis_testing_02.png" alt="" style="width: 600px;"/>

In our first step, we need to set up our hypotheses. There will typically be two hypotheses. H sub-zero, or H naught, is our `null hypothesis`. We might refer to this as what we consider to be the status quo. In our case, this basically accepts that these jury numbers did happen by chance. In this hypothesis test, the null hypothesis states that everything's okay, and thus the odds of a woman being picked for jury duty were at most 50%. So, our null hypothesis is p is less than or equal to 0.50. In other words, women had a 50% chance, or perhaps even less than a 50% chance, of being chosen for jury duty. 

Let's move to our `alternative hypothesis`, H sub-a. This would be the opposite of the null hypothesis. This one would say that women did not have a 50% chance of being chosen for jury duty. In fact, the chance of a woman being chosen for the jury is greater than 50%. So here, our alternative hypothesis is p is greater than 0.50. 

<img src="images/hypothesis_testing_03.png" alt="" style="width: 600px;"/>

With our two hypotheses now stated, we'll then also want to state a `significance level`. Essentially, this sets a threshold for our test. In other words, suppose through our test we find that 36 or more women might end up on a 50 person jury pool by chance 30% of the time or 20% of the time. Or, what if it's only 10% of the time? Would you still believe that this actually happened by chance? Or, if the probability is below some `significance level alpha`, would you conclude that something was wrong? 

So, let's set our significance level at 5%. If 36 or more women ending up on a jury has less than a 5% chance of occurring at random, then, we will reject our null hypothesis. 

<img src="images/hypothesis_testing_04.png" alt="" style="width: 600px;"/> 

In our second step, we look to find a `statistic` that will assess the validity of our null hypothesis. How could we see if this outcome, 36 women and 14 men, or an outcome that is even more extreme, could happen at random under these circumstances? Here we will use binomial probabilities. Our `p` here is equal to 0.50. That's the probability of a woman being chosen for the jury panel. And, we will have 50 trials since that's how many seats there are on the jury panel. Finally, the number of successful trials will be 36. We have our null hypothesis. We also have our `test statistic`. Now, we find the `p value for that test statistic`. 

The `p value` is the probability that this outcome, 36 women and 14 men, or an outcome even more extreme, could occur by chance. So, we're looking for the probability that the number of women chosen for the jury panel would be 36 or more. What do we find? Whether you did the calculation the hard way, with a binomial table, with a spreadsheet, or perhaps with an online binomial calculator, you would find the probability of 36 or more women being chosen would be 0.0013 or 0.13%. 

<img src="images/hypothesis_testing_05.png" alt="" style="width: 600px;"/>

Those are some pretty long odds. So, our final step, time to compare our p value to our fixed significance level. 

<img src="images/hypothesis_testing_06.png" alt="" style="width: 600px;"/>

What we found was that, assuming a man and a woman were equally likely to be chosen for a jury, there was only a 0.13% chance that at random 36 or more women would be chosen for a panel of 50 potential jurors. Our fixed significance level, alpha, was 0.05 or 5%. Clearly, the p value was below our significance level and thus, we must reject the null hypothesis. This means, we believe that something is making it much more likely for a woman to be chosen versus a man. 

Now, it doesn't prove that the cause is evil or intentional, nor does it prove that the cause is unintentional and innocent. It simply means, we reject the null hypothesis. Let's briefly talk about that too. Our only outcomes possible for this test would have been to reject the null hypothesis, which is what we just saw, or to not reject the null hypothesis. Notice, we did not say accept the null hypothesis. We were looking to contradict the hypothesis. It's sort of like saying, a person on trial is guilty or not guilty. Guilty means the evidence is there to convict. Not guilty means there was a lack of evidence. Not guilty does not necessarily mean the jury believed the person was innocent, they just lacked the evidence to prove guilt. 

### One-tailed vs. two-tailed tests
Let's consider three different statements. 
- First, a recent national study found that the average American between the ages of 18 and 24 checks their phone 74 times per day. A mobile service provider questions these results. 
- Second, the average amount of time it takes an adult to recover from the common cold is 8.5 days. A new medicine was tested on a sample of adults suffering from the common cold. The average recovery time for the people in this group was 7.3 days. The company that developed this medicine thinks the drug should be considered for federal approval. 
- Finally, consider the national average for the college entrance exam, 1000 points. The Regent Test Prep Academy claims that their students consistently beat that national average. 

These are all situations where `hypothesis testing` would be useful. But each of these situations would require a different type of hypothesis test. Let's look at each situation individually. 

#### Scenario 1
In our first situation, we had a claim that said people between the ages of 18 and 24 checked their phones 74 times per day. Some folks doubted that claim though. Notice, this group did not say the number was too high, nor did they contend the number was too low. They just expressed doubt in the stated average of 74 times per day. In this case, our hypotheses look like this. 

<img src="images/hypothesis_testing_07.png" alt="" style="width: 300px;"/>

Our null hypothesis, H sub zero, or H naught, is mu is equal to 74.0. Our alternative hypothesis, H sub A, is mu is not equal to 74.0. If we look at our normal distribution, 74.0 is the mean under our null hypothesis. Let's say that we thought that anything more than 1.7 standard deviations from the mean, in either direction, would mean we could reject our null hypothesis.

<img src="images/hypothesis_testing_08.png" alt="" style="width: 300px;"/>

On the other hand, anything that was less than 1.7 standard deviations from the null hypothesis mean would tell us that we could not reject the null hypothesis. 

<img src="images/hypothesis_testing_09.png" alt="" style="width: 300px;"/>

As you can see, we have two rejection areas here: one in the positive direction, greater than the mean, and the other in the negative direction, less than the mean. This is considered a `two-tailed test` because the null hypothesis is tested in both directions. 

#### Scenario 2
On the other hand, in our example where the average person recovers from the cold in 8.5 days, our test group recovers in 7.3 days. This is a `one tail test`. Why? Well in this case, our hypotheses look like this. 

<img src="images/hypothesis_testing_10.png" alt="" style="width: 600px;"/>

Our null hypothesis H sub zero, Mu is greater than or equal to 8.5 days. Our alternative hypothesis H sub a, Mu is less than 8.5 days. So our null hypothesis is saying that patients do not recover faster with the drug and perhaps they may even take longer to recover. Both of these situations would indicate the medicine was not helpful. The alternative hypothesis is indicating that the medicine does in fact have an impact. 

<img src="images/hypothesis_testing_11.png" alt="" style="width: 300px;"/>

On our normal distribution graph, we have 8.5 as our null hypothesis mean. We can set two areas, 1.7 standard deviations from the mean. The difference here is that the drug can only be considered helpful if the patients actually get better faster. Which means that this area to the left, this small single tail, represents the area where we would reject the null hypothesis. 

<img src="images/hypothesis_testing_12.png" alt="" style="width: 300px;"/>

The large area to the right would indicate that we could not reject the null hypothesis. That would be bad news for the drug company. 

#### Scenario 3
So let's take a look at our test prep school. This example is very similar to the cold medicine example. The difference here is that we are looking for an increase in the test scores. Here are our hypotheses for this situation. 

<img src="images/hypothesis_testing_13.png" alt="" style="width: 600px;"/>

Our null hypothesis, H sub zero or H naught, is mu is less than or equal to 1000 points. Our alternative hypothesis, H sub A, is mu is greater than 1000 points. Our null hypothesis is saying students of the Regent School do not see increased test scores. Instead they see average scores or perhaps even below average scores. The alternative hypothesis says that the Regent students do score over the national average. 

<img src="images/hypothesis_testing_14.png" alt="" style="width: 300px;"/>

What do we see on our normal distribution? Again, we see one tail. If we land in this area on the right, we would reject the null hypothesis. 

<img src="images/hypothesis_testing_15.png" alt="" style="width: 300px;"/>

If we land anywhere else in the large area to the left, we would not reject the null hypothesis. 

As you start to look for opportunities to utilize hypothesis testing, be sure you consider whether your hypothesis test is a one-tailed test or a two-tailed test.
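To make the difference concrete, here is a minimal Python sketch that computes how much of a standard normal distribution falls in the rejection area for the illustrative 1.7 standard deviation cutoff used in the scenarios. It assumes the test statistic is a z-score, and the helper name `tail_area` is just a placeholder for this example.

```python
from statistics import NormalDist

def tail_area(z_cutoff, tails):
    """Share of the standard normal distribution inside the rejection area(s)."""
    one_tail = 1 - NormalDist().cdf(abs(z_cutoff))
    return tails * one_tail

one_tailed = tail_area(1.7, tails=1)  # e.g. the cold medicine or test prep scenario
two_tailed = tail_area(1.7, tails=2)  # e.g. the phone checking scenario
print(f"one-tailed: {one_tailed:.4f}, two-tailed: {two_tailed:.4f}")
```

With the same cutoff, the two-tailed test puts twice as much probability in the rejection areas (roughly 9% versus 4.5% here), so the choice between one and two tails changes how strict the test effectively is.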

### Significance test for proportions
A candidate's campaign finds that in a random sample of 500 eligible voters, 54% of those polled said they planned on voting for this candidate. The candidate needs over 50% of the vote to win the election. This candidate would like to test the hypothesis that he will win this election. Let's go through our four step process. 

<img src="images/hypothesis_testing_16.png" alt="" style="width: 300px;"/>

Step one, we're going to develop the hypotheses and state the significance level. 

<img src="images/hypothesis_testing_17.png" alt="" style="width: 600px;"/>

H sub zero, our null hypothesis, will be p is less than or equal to 0.50. This hypothesis states that the candidate would get 50% or less of the votes, and thus not have enough of the votes to win the election. H sub a, our alternative hypothesis, the candidate wins. This would be the opposite of the null hypothesis. This one would say that this candidate would get a majority of the vote and thus win the election. Our alternative hypothesis is p is greater than 0.5. 

<img src="images/hypothesis_testing_18.png" alt="" style="width: 600px;"/>

Our significance level for this test will be 5%. If this has less than a 5% chance of occurring, then we reject our null hypothesis. We're looking at a one-tailed test, where the rejection region is on the right-hand side of our distribution. If our proportion does not fall in the rejection region, we'll not have enough evidence to reject the null hypothesis. 

<img src="images/hypothesis_testing_19.png" alt="" style="width: 300px;"/>

So let's go to step two, we're going to identify the test statistic. In this situation, our test statistic is a `z-score`. We will call it `z sub p`. This `z-score` will establish the point on our distribution which divides the do not reject area from the rejection area. `p hat` is the sample proportion. `p sub zero` is the proportion from our null hypothesis. `n` is our sample size. 

<img src="images/hypothesis_testing_20.png" alt="" style="width: 600px;"/>

So in our case, p hat is equal to 0.54. p sub zero is equal to 0.50. And n our sample size is 500. And if we use these numbers, we get a z-score of 1.79. 

So now we move on to step three, our `p-value`. 

<img src="images/hypothesis_testing_21.png" alt="" style="width: 300px;"/>

So our test statistic z sub p is 1.79. If we look at this number on our z-score chart, you'll find that 1.79 leads you to 0.9633. So our p-value is 1 minus 0.9633 which gives us a p-value of 0.0367. 

Step four, so now we're going to compare our p-value to our fixed significance level. 

<img src="images/hypothesis_testing_22.png" alt="" style="width: 300px;"/>

Our fixed significance level, alpha, was 5% or 0.05. Our p-value was 0.0367. That's smaller than our 0.05 significance level, thus `we can reject the null hypothesis`. 
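The four steps for this example can be sketched in Python. This is a rough illustration that assumes the usual one-sample z-test formula for a proportion, where the standard error is computed from the null hypothesis proportion; the function name `proportion_z_test` is invented for this example.

```python
import math
from statistics import NormalDist

def proportion_z_test(p_hat, p_0, n):
    """Right-tailed z-test for a proportion; returns (z, p_value)."""
    standard_error = math.sqrt(p_0 * (1 - p_0) / n)
    z = (p_hat - p_0) / standard_error
    p_value = 1 - NormalDist().cdf(z)  # area to the right of z
    return z, p_value

alpha = 0.05
z, p_value = proportion_z_test(p_hat=0.54, p_0=0.50, n=500)
print(f"z = {z:.2f}, p-value = {p_value:.4f}, reject H0: {p_value < alpha}")
```

The z-score comes out at about 1.79. The p-value computed from the unrounded z-score is very slightly different from the 0.0367 read off a z-score chart, but the conclusion is the same.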

Graphically, we can look at this a few ways. Here's our distribution. 

<img src="images/hypothesis_testing_27.png" alt="" style="width: 300px;"/>

We established that the left side of the distribution is the do not reject the null hypothesis area. The right part of the distribution is the reject the null hypothesis area. Our alpha was 0.05, which means that 95% of the distribution was on the left side of the distribution, and 5% was to the right. 

We can also look at this by comparing z-scores. The z-score for 0.05 on a one-tailed test is 1.65. That would be 1.65 standard deviations from the null hypothesis population proportion, which was 0.50. 

<img src="images/hypothesis_testing_23.png" alt="" style="width: 600px;"/>

Our calculated z sub p though, was 1.79. 1.79 standard deviations from the population proportion. 

<img src="images/hypothesis_testing_24.png" alt="" style="width: 600px;"/>

Again, we land in this region on the right. So we reject the null hypothesis. The politician can breathe easy. Unless they demand `a hypothesis test with a 2% significance level`. So here is where the 2% significance level would be on our distribution. 

<img src="images/hypothesis_testing_25.png" alt="" style="width: 300px;"/>

Here is where our p-value of 0.0367 would put us. 

If we wanted to use our z-scores, the z-score for 0.02 on a one-tailed test is 2.06. That would be 2.06 standard deviations from the null hypothesis population proportion which was 0.50. Our calculated z sub p though was 1.79. 

<img src="images/hypothesis_testing_26.png" alt="" style="width: 600px;"/>

No matter the method, we are now in the do not reject the null hypothesis area. In this case, the hypothesis test tells us `we cannot reject the hypothesis` that the candidate will get 50% or less of the vote. As you can see, the outcome of this hypothesis test hinged on the significance level.
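If you don't have a z-score chart at hand, the cutoff for a given significance level can be computed directly. The sketch below uses Python's standard library, and the helper name `right_tail_critical_z` is only an illustrative choice. The exact values are about 1.645 and 2.054, so rounded chart values may differ in the last digit.

```python
from statistics import NormalDist

def right_tail_critical_z(alpha):
    """z-score that leaves an area of alpha in the right tail."""
    return NormalDist().inv_cdf(1 - alpha)

z_p = 1.79  # test statistic from the candidate example
for alpha in (0.05, 0.02):
    critical = right_tail_critical_z(alpha)
    print(f"alpha = {alpha}: critical z = {critical:.3f}, reject H0: {z_p > critical}")
```

The same test statistic leads to rejection at the 5% level but not at the 2% level.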

### Significance test for means (acceptance sampling)
K-Nosh is a national gourmet dog food company. They sell thousands of bags of dog food each day. They sell dog food in eight, 20, and 40-pound bags. And the 20-pound bag is by far the most popular size. K-Nosh's high-end customers demand outstanding products and excellent service. Customers don't want a bag with less than 20 pounds. So while the bag is labeled as 20 pounds, K-Nosh sets the desired weight of each bag at 20.15 pounds to ensure customers get at least 20 pounds in each bag. 

<img src="images/hypothesis_testing_28.png" alt="" style="width: 300px;"/>

Each day, K-Nosh employees pull a random sample of 100 bags out of the thousands they ship. Based on the 100-bag sample, they will either send out the shipment or they will reject the shipment for that day. Today's sample had an average weight of 20.10 pounds, and the population standard deviation is 0.26 pounds. So let's start our four-step process. 

Step one, develop the hypotheses and state the significance level. So, let's develop our hypotheses. 

<img src="images/hypothesis_testing_29.png" alt="" style="width: 300px;"/>

Our null hypothesis, H sub zero or H-naught, mu is greater than or equal to 20.15 pounds. This hypothesis states that the bags of dog food are equal to or greater than 20.15 pounds. It's what we would consider the standard state. Our alternative hypothesis, H sub a, this one says the bags of dog food weigh less than 20.15 pounds. This would be the opposite of the null hypothesis. Our alternative hypothesis is mu is less than 20.15 pounds. As usual, we will see whether or not we will reject the null hypothesis. If we reject the null hypothesis, that would mean K-Nosh would not make any shipments of 20-pound bags on that date. 

<img src="images/hypothesis_testing_30.png" alt="" style="width: 300px;"/>

Our significance level for this test will be five percent. If this has less than a five percent chance of occurring, then we reject our null hypothesis. We're looking at a one-tail test where the rejection region is on the left-hand side of our distribution. If our sample falls in the rejection region, we will reject the entire shipment. 

So now we move on to step two, identify the test statistic. 

<img src="images/hypothesis_testing_31.png" alt="" style="width: 600px;"/>

In this situation, our test statistic is a z-score. This `z-score` will establish the point on our distribution which divides the "do not reject" area from the rejection area. `x-bar` is the sample mean. `Mu` is the mean from our null hypothesis. `n` is our sample size. And `sigma` is the population standard deviation. In our case, x-bar was equal to 20.10 pounds. Mu was 20.15 pounds. n, our sample size, is 100. And sigma, the population standard deviation, is 0.26 pounds. So if we use these numbers, we get a z-score of -1.92. 

<img src="images/hypothesis_testing_32.png" alt="" style="width: 300px;"/>

Step three, our p-value. Our test statistic z is -1.92. If we look this up on our z-score chart, you will find that -1.92 leads you to 0.0274 or 2.74%. That is our p-value, 0.0274. 

<img src="images/hypothesis_testing_33.png" alt="" style="width: 300px;"/>

Step four, we're going to compare our p-value to our fixed significance level. Our fixed significance level alpha was five percent or 0.05. Our p-value was 0.0274. That's smaller than our 0.05 significance level. Thus, we have to reject the null hypothesis. 
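Here is a minimal Python sketch of the same acceptance sampling check. It assumes the standard z-test formula for a mean with a known population standard deviation, and the names `mean_z_test_left` and `ship` are made up for illustration.

```python
import math
from statistics import NormalDist

def mean_z_test_left(sample_mean, mu_0, sigma, n):
    """Left-tailed z-test for a mean with known sigma; returns (z, p_value)."""
    z = (sample_mean - mu_0) / (sigma / math.sqrt(n))
    p_value = NormalDist().cdf(z)  # area to the left of z
    return z, p_value

alpha = 0.05
z, p_value = mean_z_test_left(sample_mean=20.10, mu_0=20.15, sigma=0.26, n=100)
ship = p_value >= alpha  # ship only if we cannot reject the null hypothesis
print(f"z = {z:.2f}, p-value = {p_value:.4f}, ship today: {ship}")
```

The z-score is about -1.92 and the p-value is close to the 0.0274 from the chart, which is below 0.05, so the shipment is rejected.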

<img src="images/hypothesis_testing_35.png" alt="" style="width: 300px;"/>

Our alpha was 0.05 which means that 95% of the distribution was on the right side of the alpha and five percent was to the left. Remember, we want to be close to our goal of 20.15 pounds. If we're too far to the left of 20.15 pounds, the bags are likely too light. If we compare z-scores, the z-score for 0.05 on a one-tail test is -1.65. That would be 1.65 standard deviations from the null hypothesis mean, 20.15. Our calculated z though was -1.92, 1.92 standard deviations from the mean. No matter how you look at it, `we must reject the null hypothesis`. The bags are too light. And so, we must reject the entire shipment. I'm guessing some of you might see this as harsh. But believe it or not, this quality control technique which is called acceptance sampling was very popular in the past and is still used in some industries today.

### Type I and type II errors
In our hypothesis tests, we've always set up a null hypothesis and an alternative hypothesis. The null hypothesis typically assumes that the status quo prevails. The null hypothesis might state that the system works, it might tell us that nothing has changed in our system. Our alternative hypothesis assumes the opposite. The alternative hypothesis might tell us that the system is broken. It might tell us that things have changed. 

Let's use a special type of cancer screening test as an example. This fictional screening would provide a reading based on your blood. The average reading is 100. People that get a reading over 125 get a positive result. This would indicate they have cancer. If we were going to equate this to a hypothesis test, we would say the cancer screening had two hypotheses. 

<img src="images/hypothesis_testing_36.png" alt="" style="width: 300px;"/>

The null hypothesis would be that everything is okay. The person being tested does not have cancer. The alternative hypothesis would state that the person being tested does, in fact, have cancer. Let's say that the readings from this screening are normally distributed. 

<img src="images/hypothesis_testing_37.png" alt="" style="width: 300px;"/>

So, if we were going to look at this on a normal distribution, we might say that 100 is the mean. Anything to the right of 125 would be considered a positive result for cancer. So, left of 125, we do not reject the null hypothesis, but to the right, we would reject the null hypothesis. 

Up until now, we've assumed that if the reading is beyond 125, the patient has cancer. But remember, even if 125 represented an alpha of 0.02 or 2%, that would only mean it is unlikely for a cancer-free person to get a reading over 125. It's unlikely, but with an alpha of 2%, it's not impossible. Just as political polls sometimes predict the wrong candidate to win, cancer screening tests also make mistakes. But there are two types of mistakes or errors. 

Let's look at this small grid. 

<img src="images/hypothesis_testing_38.png" alt="" style="width: 300px;"/>

At the top, we see the true state of the system. The patient does not have cancer, which agrees with our null hypothesis. And the patient has cancer, this would agree with our alternative hypothesis. Along the side, we have the two possible outcomes of the test. The test comes back positive, which means that according to the test, they have cancer. This is the equivalent of rejecting our null hypothesis. 

How about the second outcome for our screening test? The test comes back negative, which means that according to the test, they do not have cancer. This is the equivalent of not rejecting our null hypothesis. 

Now, let's look at the possible results. If we get a negative test and the patient does not have cancer, the hypothesis test worked. If we get a positive test and the patient actually has cancer, the hypothesis test worked. But how about these other two quadrants? It's possible a person might get a positive test but not actually have cancer. This is what we would call a `Type One Error`. Typically, we refer to this as a `false positive`. This is the same as a person getting a reading over 125, but not actually having cancer. If we start to see lots of Type One Errors, perhaps our screening test is too sensitive and flags too many healthy people. You might start to question if there are better ways of testing the null hypothesis. 

The opposite is also possible. A person might get a negative test, even though they do have cancer. This is what is called a `Type Two Error`. This might also be referred to as a `false negative`. This would be the same as a person getting a reading under 125, even though they have cancer. If we start to see lots of Type Two Errors, that may mean our screening test is not sensitive enough. Again, we may need to question how we are testing our null hypothesis. 

Hypothesis tests, even when they are done the right way, can be flawed. So, it's important to understand that a hypothesis test might make a mistake. Knowing the different types of errors, Type One and Type Two, can help you in developing and interpreting your hypothesis tests and the subsequent results.
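The trade-off between the two error types can be illustrated with a small Python sketch. Only the average reading of 100 and the cutoff of 125 come from the fictional screening example. The standard deviation of 12 and the average reading of 140 for patients with cancer are purely hypothetical numbers chosen for illustration.

```python
from statistics import NormalDist

# Hypothetical reading distributions (sigma and the cancer mean are assumptions)
healthy = NormalDist(mu=100, sigma=12)
cancer = NormalDist(mu=140, sigma=12)

def error_rates(cutoff):
    """Return (Type One rate, Type Two rate) for a given positive-result cutoff."""
    type_one = 1 - healthy.cdf(cutoff)  # healthy person, reading over the cutoff
    type_two = cancer.cdf(cutoff)       # person with cancer, reading under the cutoff
    return type_one, type_two

for cutoff in (120, 125, 130):
    type_one, type_two = error_rates(cutoff)
    print(f"cutoff {cutoff}: Type One = {type_one:.3f}, Type Two = {type_two:.3f}")
```

Under these assumptions, raising the cutoff produces fewer false positives but more false negatives, and lowering it does the opposite.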

### Example 1
```
What is the probability that AT LEAST 36 of 50 randomly chosen people will be women, where the probability of choosing a woman is p=0.5? (Assuming the choosing of each person is independent of the others)
This is a binomial (discrete) distribution problem.

1. How many DIFFERENT combinations of choosing X women out of the 50 people are there? 
This would be C(50, X) = 50! / (X!*(50-X)!)

2. What is the probability of getting ONE of these choices? 
This would be p^X*(1-p)^(50-X), which would be in our case 0.5^X*0.5^(50-X)=0.5^50

So what is the probability of getting ANY such choice? 
This would be C(50, X)*0.5^50

What is the probability of choosing 36 women and 14 men? C(50, 36)*0.5^50
What is the probability of choosing 37 women and 13 men? C(50, 37)*0.5^50
.....
What is the probability of choosing 49 women and one man? C(50, 49)*0.5^50
What is the probability of choosing 50 women and zero men? C(50, 50)*0.5^50=0.5^50 (because C(50, 50)=1, since 0!=1) 

The sum of these, is the probability of choosing AT LEAST 36 women (choosing 36 OR MORE women) which is ~0.13%
```
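The sum described above is easy to check with a few lines of Python. This is a small sketch using only the standard library, and the function name `binomial_upper_tail` is chosen just for this example.

```python
from math import comb

def binomial_upper_tail(n, k_min, p):
    """P(X >= k_min) for a binomial variable with n trials and success probability p."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_min, n + 1))

p_value = binomial_upper_tail(n=50, k_min=36, p=0.5)
print(f"P(at least 36 women) = {p_value:.4f}")
```

The result is about 0.0013, or 0.13%, which matches the p value used in the jury example.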

### Example 2
A smoke detector fails to beep even though there was a fire. This is an example of a:
- true negative
- (correct - see the errors table above) type II error
- false positive
- type I error

A type II error is when we have a fire but the alarm does not sound.

### Example 3
If our significance level is 5% and our p-value is calculated as 0.016 we should _____.

reject the null hypothesis

If the p-value is below our stated significance level, you must reject the null hypothesis.

### Example 4
What is the first step in a hypothesis test?

The first step is to develop the hypotheses which will be tested and state the significance level.

### Example 5
```
You measure a value of 1.73 for a random variable with a mean of 2.20 and a standard deviation of 0.22. 
What is the z-value of the measurement?
z = (1.73-2.20)/0.22 ≈ -2.14
```

## Small Sample Sizes

### T-statistic vs. z-statistic
Up until now, we've used the `z-score` to help us identify how many `standard deviations` a data point might lie from the population mean. It's also been very helpful in developing `confidence intervals`. Now remember, the `z-score` requires that our data be normally distributed. It also requires that we know the `standard deviation` of the population. The central limit theorem tells us that, given a large enough sample size, the mean of our sample will be approximately normally distributed. But often, the population `standard deviation` is unknown. So how can we create confidence intervals when the population standard deviation is unknown? 

<img src="images/t-statistic_01.png" alt="" style="width: 300px;"/>

Believe it or not, you can use the standard deviation of a single sample. But if you have only one sample with a sample size under 30, a relatively small sample size, you can probably guess that your confidence interval will suffer, and this is why the `z-score` is not valid in this situation. 

If you're creating a confidence interval when the population variance is unknown, you must instead use something called the `t-distribution`. Before we discuss the differences between the z- and t-distributions, let's first discuss how they are similar. 

<img src="images/t-statistic_02.png" alt="" style="width: 300px;"/>

Both are symmetrical, bell-shaped distributions. Both require data with a normal distribution. And in both cases, the area under the curve is equal to 1.0. So how is the t-test different from our z-test? Well, the z-test is mostly used to compare the mean of a sample to its larger population when the population standard deviation is known. The `t-test` is used when that standard deviation has to be estimated from the sample itself. It is also commonly used to compare two completely independent samples, which don't have to come from the same population. 

So because of these differences, and also because of the small sample size, the `t-distribution` isn't one curve but rather a series of curves. Each curve is representative of the distribution for different sample sizes. The smaller the sample size, the flatter the curve. The larger the sample size, the closer it gets to the `z-distribution`, which we use for the standard normal curve. 

<img src="images/t-statistic_03.png" alt="" style="width: 600px;"/>

Since all of the t-distribution curves are flatter than the z-distribution curve, `the critical scores for t-distributions are higher than those for z-distributions`. You might remember that the appropriate z-score for a 95% confidence interval was 1.96. That's 1.96 standard deviations. How does that compare to t-scores? Well, it depends on the sample size. 

<img src="images/t-statistic_04.png" alt="" style="width: 300px;"/>

For a sample size of three, the t-score is 4.303. For a sample size of 10, the t-score is 2.262. For a sample size of 20, the t-score is 2.093. And by the time our sample size is equal to a hundred, our t-score goes to 1.98. As you can see, `the larger our sample size, the closer the t-score is to the z-score of 1.96`. So where do we get t-scores for all of the different possible sample sizes? Let's take a look at that next.
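For the curious, these t-scores can be reproduced numerically. The sketch below uses only Python's standard library: it integrates the t-distribution's density with Simpson's rule and searches for the cutoff by bisection. All function names are made up for this example. In practice you would more likely use a t-table or a statistics library (for instance SciPy's `scipy.stats.t.ppf`, if it is available).

```python
import math

def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=1000):
    """P(T <= x) for x >= 0: half the area plus Simpson's rule on [0, x]."""
    h = x / steps
    total = t_pdf(0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_critical(confidence, df):
    """Two-sided critical t-score, found by bisection on the CDF."""
    target = 1 - (1 - confidence) / 2
    lo, hi = 0.0, 100.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for n in (3, 10, 20, 100):
    # A sample of size n is looked up with n - 1 degrees of freedom
    print(f"sample size {n}: t-score = {t_critical(0.95, n - 1):.3f}")
```

The printed values should line up with the t-scores quoted above and shrink toward 1.96 as the sample size grows.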

### T-score tables and degrees of freedom
Since `T-distributions rely on the standard deviation of a sample`, instead of the standard deviation of the population, there is a greater level of uncertainty when creating confidence intervals. As a result, the `z-scores we gather from a z-distribution chart are not sufficient`. Instead, we need to utilize `t-distribution charts`. There's not one single t-distribution chart, but rather multiple charts. Remember, the curve associated with a t-distribution is dependent on the sample size. The smaller the sample size, the flatter the distribution curve, and the greater the uncertainty. 

<img src="images/t-statistic_05.png" alt="" style="width: 300px;"/>

The larger the sample size, the closer the curve gets to the normal distribution. 

<img src="images/t-statistic_06.png" alt="" style="width: 300px;"/>

That is why for each sample size, we need a different t-score distribution table. Just in case you forgot, here's a snapshot of just one part of a z-score table. 

<img src="images/t-statistic_07.png" alt="" style="width: 600px;"/>

So imagine having a different table for each unique sample size. Yep, that would be a lot of really big tables. Most of the time, however, we're looking for the most common confidence levels: 90%, 95%, and 99%. That's why you're more likely to see t-distribution tables that look like this.

<img src="images/t-statistic_08.png" alt="" style="width: 600px;"/>

As you can see, along the top is the confidence level, which is also given as the equivalent of a one- or two-tailed test. Along the left side, you have a column labeled `df`. This stands for `degrees of freedom`. What are degrees of freedom? The easy answer is that `degrees of freedom` is just our sample size minus one, also referred to as `n-1`, where `n` is the number of data points in our sample. 

<img src="images/t-statistic_09.png" alt="" style="width: 300px;"/>

So, for a sample size of five, we have four degrees of freedom. For a sample size of 10, we have nine degrees of freedom. There's a more complex answer, too, but let's leave that for another day. 

Let's go back to our t-distribution table. `Once we have our sample size, we have our degrees of freedom`. Let's suppose we want a 95% confidence interval, and that our sample size was four, which means we have three degrees of freedom. We isolate the column for 95%, we find the row for three degrees of freedom, and the intersection of that row and column brings us to our critical `t-score`, 3.182. 

<img src="images/t-statistic_10.png" alt="" style="width: 600px;"/>

How about if our sample size is 10? Now we look at the row for nine degrees of freedom. Our critical t-score is now 2.262. As you can see, `as our sample size gets larger, our critical t-score gets smaller, because the larger sample size is associated with the curve that is closer to our normal z-distribution`. So, now that you can find t-scores, let's use this to create some confidence intervals.

<img src="images/t-statistic_11.png" alt="" style="width: 600px;"/>
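The table lookups above can also be reproduced numerically. The following is a minimal sketch in plain Python, assuming only the standard library: it integrates the Student's t density with Simpson's rule and then uses bisection to find the two-tailed critical value. The function names (`t_pdf`, `t_cdf`, `t_critical`), the step count, and the search range are illustrative choices, not part of the original material. In practice a statistics library would normally be used instead.

```python
import math


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(t, df, steps=1000):
    """P(T <= t) for t >= 0, via Simpson's rule on [0, t] (steps must be even)."""
    if t == 0:
        return 0.5
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_critical(confidence, df):
    """Two-tailed critical t-score, found by bisection on the CDF."""
    target = 0.5 + confidence / 2  # e.g. 0.975 for a 95% interval
    low, high = 0.0, 100.0         # assumed search range for the critical value
    for _ in range(50):
        mid = (low + high) / 2
        if t_cdf(mid, df) < target:
            low = mid
        else:
            high = mid
    return (low + high) / 2


# Sample sizes mentioned in the text; degrees of freedom = n - 1
for n in (3, 4, 10, 20, 100):
    df = n - 1
    print(f"n = {n:3d}, df = {df:2d}, t(95%) = {t_critical(0.95, df):.3f}")
```

Running the loop shows the critical t-score shrinking toward the z-score of 1.96 as the sample size grows, which is the pattern the table illustrates.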

### Calculating confidence intervals using t-scores
Let's develop a confidence interval using t scores. Remember that this confidence interval gives us a range of values for an estimated population parameter. 

Imagine that a national testing organization has made some major changes to the standardized exam that most aspiring college students take. The exam scores range between 50 points and 200 points. The old exam typically had an average score of 130 points. They would like to see how the average score for the updated exam compares to the old version of the exam. Further, this testing organization wants to create a **98% confidence interval** for the updated exam's mean score. 

In order to do so, they gave the updated test to a random sample of 10 aspiring college students. The scores for these 10 students are as follows. 

<img src="images/t-statistic_12.png" alt="" style="width: 300px;"/>

Our sample mean is 126. While we don't know the standard deviation of the exam scores for the entire population of aspiring college students, we can calculate the standard deviation for this sample: 29.51. We also know that since our sample size N was 10, our degrees of freedom is N minus one. In this case, that would be nine. 

We'd like to create a 98% confidence interval. So, when we go to our T distribution table, we find that the critical t score for a 98% confidence interval with nine degrees of freedom is 2.821. 

<img src="images/t-statistic_13.png" alt="" style="width: 600px;"/>

So, to calculate our confidence interval, we use these formulas. 

<img src="images/t-statistic_14.png" alt="" style="width: 300px;"/>

We have our sample mean, 126. We found our t score, 2.821. So now, we need to find our standard error. For this, we'll use this formula. 

<img src="images/t-statistic_15.png" alt="" style="width: 300px;"/>

Remember, our standard deviation of our sample was 29.51. Our sample size N is 10. Therefore, we can calculate our standard error, 9.33. 

Now, we have everything we need to calculate our 98% confidence interval. 

<img src="images/t-statistic_16.png" alt="" style="width: 300px;"/>

Our upper and lower confidence limits will be 126 plus and minus 2.821 times 9.33, which means our 98% confidence interval for the mean of the exam scores for the updated standardized exam stretches from about 99.7 to 152.3. Again, that means that we're 98% confident that the population mean lies between those two values. 

<img src="images/t-statistic_17.png" alt="" style="width: 300px;"/>

That's a rather big spread for an exam where scores can only be as low as 50 and where they can only be as high as 200. 

Suppose we were content with a **95% confidence interval**. We can go back to our t distribution table, where we find that the critical t score for a 95% confidence interval with nine degrees of freedom is 2.262. We plug this new value into our confidence interval limit formulas, and our upper and lower limits become 126 plus or minus our critical t score, 2.262, times 9.33. Our 95% confidence interval would be from about 105 to 147. 

<img src="images/t-statistic_18.png" alt="" style="width: 300px;"/>

Still pretty big but nonetheless, a bit smaller. You're probably thinking, that is still a huge confidence interval. Well, I think `the big lesson here is that without the availability of the population standard deviation, a much larger sample size is needed to provide us with a more meaningful confidence interval`. So, perhaps, this testing organization should go back and administer the updated exam to a much larger random sample.
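The arithmetic of this section can be checked with a few lines of Python. This is only a sketch: the helper name `t_confidence_interval` is made up for illustration, the summary statistics (mean 126, standard deviation 29.51, n = 10) are the ones quoted in the text, and the critical t-scores are the table values quoted in the text rather than computed.

```python
import math


def t_confidence_interval(sample_mean, sample_sd, n, t_crit):
    """Return (lower, upper, standard_error) for a mean, given a critical t-score."""
    standard_error = sample_sd / math.sqrt(n)
    margin = t_crit * standard_error
    return sample_mean - margin, sample_mean + margin, standard_error


# Critical t-scores for df = 9, copied from the t-distribution table in the text
T_CRITICAL_DF9 = {0.98: 2.821, 0.95: 2.262}

for level, t_crit in T_CRITICAL_DF9.items():
    lower, upper, se = t_confidence_interval(126, 29.51, 10, t_crit)
    print(f"{level:.0%} CI: {lower:.1f} to {upper:.1f} (SE = {se:.2f})")
```

Because the standard error divides by the square root of n, quadrupling the sample size would only halve the margin of error, which is why a much larger sample is needed to tighten the interval noticeably.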

### Example 1
```
What is the standard error for a sample size of 10 and a standard deviation of 3.64?
The standard error is the standard deviation divided by the square root of the sample size; it reflects uncertainty in the estimate of the mean.
SE = 3.64 / sqrt(10) = 1.15
```

## Comparing Two Populations (Proportions)

### Explanation of two populations
Confidence intervals and hypothesis testing. You've now been introduced to both of these important statistical foundations. `Confidence intervals` allow us to take a single sample and create an interval, which we're fairly confident contains the population proportion. `Hypothesis testing` allows us to see if this one sample was likely the result of chance, or if an external force may have impacted the sample data. 

We're now moving on to `comparing two populations`. We'll look to answer questions like: 
- Does taking aspirin reduce the chance of a heart attack? 
- Are young male drivers more likely to get into car accidents than young female drivers? 
- Are people in Los Angeles more likely to be victims of violent crime than people in New York City? 
- Are male high school teachers more likely to have higher salaries than female high school teachers? 

<img src="images/two_populations_prop_01.png" alt="" style="width: 600px;"/>

Notice, we keep using the wording **"more likely"**. Even with our comparisons, we can't be sure, but we can create confidence intervals. But, `what makes all of these questions similar is that each situation can be analyzed by comparing two independent random samples`. One from each population: an `experimental population` and a `control population`. 

<img src="images/two_populations_prop_02.png" alt="" style="width: 600px;"/>

- Those that take aspirin versus those that take a placebo. The placebo is the control group. 
- A sample of young male drivers versus a sample of young female drivers. In this case, either gender can be used as the control. 
- A sample of citizens of Los Angeles versus a sample of New Yorkers. In this case, either city could be used as a control. 
- And of course, a sample of male high school teacher salaries versus a sample of female high school teacher salaries. Again, either gender can be used as the control. 

In this first section, we will look at the comparison of two proportions of two independent populations. We'll use our knowledge of basic proportions. We'll work to create a confidence interval for the difference between these two population proportions, and finally, we will use hypothesis testing to compare the difference between the proportions for each independent sample. Yeah, I know, that sounds like a whole lot of work, but rather than just talk about it, let's walk through a problem.

### Set up a comparison
So, let's compare two independent populations in an effort to figure out if a new drug is effective in reducing the chance of a heart attack. In real life, this testing would take many years. There are several different phases for drug testing, but let's ignore that for now. Imagine that a drug company gathers a large number of subjects. The subjects are randomly placed into two groups. One group of subjects is given this new drug. The other group of subjects is given a placebo, a pill with no medicinal value.

<img src="images/two_populations_prop_03.png" alt="" style="width: 600px;"/>

Suppose, these were the results of this long-term study. 

<img src="images/two_populations_prop_04.png" alt="" style="width: 600px;"/>

For a new drug group, we had a sample size of 2,219. 26 of those had a heart attack. So, our p-hat here is 26 divided by 2,219. For our placebo group, we had a sample size of 2,035. 46 of those people had a heart attack. So, in this case, our p-hat is 46 divided by 2,035. The sample group that took the drug had heart attacks at a rate of 0.0117 or 1.17%. The sample group that took the placebo had heart attacks at a rate of 0.0226 or 2.26%. 

`Remember, these are just samples. So, while the central limit theorem tells us these rather large samples are likely to be representative of the population, they are still only samples`. What we found was that the difference in the proportions of the samples was p-hat one minus p-hat two. 

<img src="images/two_populations_prop_05.png" alt="" style="width: 600px;"/>

In this case, that's 0.0226 minus 0.0117 which gives us 0.0109. A 1.09% difference between the two sample proportions. What we'd like to know is, what is the true difference between the rate of heart attacks for the population when they take the new drug versus when they take the placebo. Since the drug company does not likely have the means to do this type of test, they would have to create a confidence interval. The confidence interval formula looks just like the formulas we've already been using for confidence intervals. 

<img src="images/two_populations_prop_06.png" alt="" style="width: 600px;"/>

Let's start filling in our values. Suppose we want a 95% confidence interval, we go to a Z distribution chart and find a critical Z score of 1.96. 

<img src="images/two_populations_prop_07.png" alt="" style="width: 600px;"/>

That number probably looks familiar by now. We also know the observed difference from our samples. We calculated this to be 0.0109. So now, all we need is our standard error. 

<img src="images/two_populations_prop_08.png" alt="" style="width: 600px;"/>

We have everything we need, p-hat one is the sample proportion for our placebo group, 0.0226, n1 is the sample size for this group, 2,035, p-hat two is the sample proportion for our new drug group, 0.0117, and n2 is the sample size for this group, 2,219. When we plug in all of our numbers, we find that our standard error is 0.0040. 

<img src="images/two_populations_prop_09.png" alt="" style="width: 600px;"/>

And now, we can calculate the limits of our confidence interval. Our upper limit and lower limit are just 0.0109 plus or minus our critical value, 1.96, times 0.004. Therefore, our upper limit is 0.0188. Our lower limit is 0.0030. 

<img src="images/two_populations_prop_10.png" alt="" style="width: 300px;"/>

What do these numbers mean? `It means that we are 95% confident that the new drug reduces the population's chance of having a heart attack by somewhere between 0.3 and 1.88 percentage points`. Because the whole interval lies above zero, we are 95% confident this new drug is more effective than the placebo.
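The interval above can be recomputed directly from the raw counts. The sketch below assumes the unpooled standard error described in this section and takes the critical z-score from Python's standard-library `statistics.NormalDist`. The function name `two_proportion_ci` is illustrative only.

```python
import math
from statistics import NormalDist


def two_proportion_ci(x1, n1, x2, n2, confidence=0.95):
    """Confidence interval for p1 - p2 using the unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    z_crit = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z_crit * se, diff + z_crit * se, se


# Group 1 = placebo (46 of 2,035), group 2 = new drug (26 of 2,219)
lower, upper, se = two_proportion_ci(46, 2035, 26, 2219)
print(f"SE = {se:.4f}, 95% CI = ({lower:.4f}, {upper:.4f})")
```

Working from unrounded proportions, the upper limit comes out near 0.0187 rather than 0.0188. The small gap is a rounding effect from using 0.0109 and 0.004 in the hand calculation.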

### Hypothesis testing
Before we prepare our hypothesis test, let's briefly recap our example. A company is trying to figure out if a new drug is effective in reducing the chance of a heart attack. The company gathers a large number of subjects. The subjects are randomly placed into two groups. One group of subjects is given this new drug. The other group of subjects is given a placebo. The people in the study are not to be told if they are getting the new drug or the placebo. The results of the study were as follows. 

<img src="images/two_populations_prop_04.png" alt="" style="width: 600px;"/>

For the new drug, we had a sample size of 2,219. 26 of those people had a heart attack, so our p-hat was 26 divided by 2,219. For our placebo group, we had a sample size of 2,035. In this case, 46 people had a heart attack, so our p-hat here was 46 divided by 2,035. The results and the resulting 95% confidence interval both provide evidence that the new drug did help reduce the rate of heart attacks.

<img src="images/two_populations_prop_11.png" alt="" style="width: 600px;"/>

The question we have is, what's the probability that our results happened by chance? In other words, we had 4,254 people in the study. 72 of those people suffered a heart attack. `What's the probability that even without the drug or placebo these same people would have suffered heart attacks?` Perhaps these 4,254 people were randomly put into two groups and one group just happened to get a lot more of the people that would end up getting heart attacks. 

<img src="images/two_populations_prop_12.png" alt="" style="width: 600px;"/>

This is extremely important to think about, especially for `random sampling` and `random assignment`. So let's go ahead and perform a hypothesis test. 

<img src="images/two_populations_prop_13.png" alt="" style="width: 300px;"/>

A hypothesis test evaluates two mutually exclusive statements about a population to determine which statement is best supported by the sample data. 

<img src="images/two_populations_prop_14.png" alt="" style="width: 300px;"/>

So for step one, let's develop the hypotheses and state the significance level. Our null hypothesis, H-naught, population proportion of the placebo group minus the population proportion of our new drug group is equal to zero. In other words, the new drug had no effect. Both proportions are identical. Our alternative hypothesis, H sub one, here, our population proportion for the placebo group minus the population proportion for our new drug group is not equal to zero. This means that the proportion of heart attacks for people that took the new drug was different than that of those that took the placebo. This would indicate the new drug had some effect. Let's set our `significance level at 5%`. If there is less than a 5% chance the results of our study could have happened by chance, then we will reject our null hypothesis. 

<img src="images/two_populations_prop_14.png" alt="" style="width: 300px;"/>

So let's go to step two. Let's identify the test statistic. As in previous hypothesis tests the test statistic will be a Z-statistic. In this case, we are using the `Z-statistic for a hypothesis test for the difference between two proportions`. 

<img src="images/two_populations_prop_15.png" alt="" style="width: 300px;"/>

<img src="images/two_populations_prop_16.png" alt="" style="width: 300px;"/>

Yes, a very ugly formula, so let's start plugging in the numbers. The numerator of this formula is looking for the difference between our sample proportions and the true population proportions. Remember, we're testing the null hypothesis. The null hypothesis assumes that the difference between the two populations should be zero. So we can eliminate the second half of our numerator. 

<img src="images/two_populations_prop_17.png" alt="" style="width: 300px;"/>

The first part of the numerator is just the difference between our two samples. Proportion of heart attacks for the placebo was 0.0226. The proportion of heart attacks for the new drug group was 0.0117. Therefore, our numerator is 0.0109. 

<img src="images/two_populations_prop_18.png" alt="" style="width: 300px;"/>

How about that scary denominator with the square root? This is actually our standard error. Let's fill in our two sample sizes first, 2,219 for the new drug group, 2,035 for the placebo group. That leaves us with p-hat for the placebo and new drug, which again, are 0.0226 and 0.0117 respectively. When we calculate our denominator, which is the standard error, we get 0.004. Our numerator is 0.0109. Therefore, our Z-statistic is 2.725. 

<img src="images/two_populations_prop_19.png" alt="" style="width: 100px;"/>

So let's see what this looks like. We have our normal distribution. If the null hypothesis were true, the difference between the two populations would be zero. In our samples, we found that the new drug group had a lower proportion of people that suffered heart attacks. The difference between the two groups was 0.0109. 

<img src="images/two_populations_prop_20.png" alt="" style="width: 300px;"/>

This is a two-tailed test with a 5% significance level. The critical Z-score for this would be 1.96. This means that if our actual outcome were more than 1.96 standard deviations from the expected outcome, then we must reject our null hypothesis. Our result was 2.725 standard deviations from the expected outcome. In fact, by looking at a Z distribution chart, a Z-score of 2.725 leaves only about 0.3% of the distribution in the tail beyond it (roughly 0.6% when both tails are counted). 

<img src="images/two_populations_prop_21.png" alt="" style="width: 600px;"/>

We have to reject our null hypothesis. In other words, we can feel fairly confident that the positive results exhibited by the group that took the new drug did not occur by chance.
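The test statistic and its p-value can be sketched in Python as follows. The function name `two_proportion_z_test` and the `pooled` flag are assumptions made for illustration. By default the code uses the same unpooled standard error as the walkthrough above. Many textbooks instead use a pooled proportion under the null hypothesis, which is included here as an option for comparison rather than as the method this section follows.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2, pooled=False):
    """Return (z, two_tailed_p_value) for H0: p1 - p2 = 0."""
    p1, p2 = x1 / n1, x2 / n2
    if pooled:
        # Common alternative: one shared proportion, since H0 says p1 == p2
        p = (x1 + x2) / (n1 + n2)
        se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    else:
        # Standard error as used in the walkthrough above
        se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Group 1 = placebo (46 of 2,035), group 2 = new drug (26 of 2,219)
z, p_value = two_proportion_z_test(46, 2035, 26, 2219)
print(f"z = {z:.3f}, two-tailed p = {p_value:.4f}")
print("reject H0 at the 5% level" if p_value < 0.05 else "fail to reject H0")
```

With unrounded inputs the statistic is slightly below the hand-computed 2.725, but it is still well beyond the 1.96 cutoff, so the conclusion is unchanged under either standard error.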

### Example 1
The two hypotheses to be tested _____.
- must be logically equivalent
- (correct) cannot both be true
- must have differing probabilities of truth
- must both be true

The hypotheses must be mutually exclusive.

### Example 2
How should populations differ when we wish to investigate an effect?
- They should differ by at least five known variables.
- They should differ in size.
- (correct) They should differ by one or more known variables.
- They should differ by country of origin.

We should control at least one variable.

## Comparing Two Populations (Means)

### Basics of comparing two population means
Consider these situations: 
- A large national corporation has a hundred senior executives. About 40 of those senior executives are women. The other 60 are men. It is found that the average salary for male senior executives is about $15,000 per year greater than the salaries of the female senior executives. Why are male senior executives at this company paid higher salaries? Does the gender of the senior executives play a role? 

<img src="images/two_populations_mean_01.png" alt="" style="width: 400px;"/>

- A hundred obese males in their 20s are randomly assigned to two groups for a period of three months. One group of males exercise two hours a day but are allowed to eat whatever they please. The other group of males are not required to exercise, but they must adhere to a very strict diet. The males on the strict diet lose an average of four pounds more during the three month period versus the individuals that are required to exercise two hours per day. Is diet a better mechanism for influencing weight loss among young obese males versus daily exercise? 

<img src="images/two_populations_mean_02.png" alt="" style="width: 600px;"/>

- A group of 1,000 high school students are randomly assigned to two groups. One group is required to take a daily multivitamin pill. The other group is given a placebo, a pill with no medicinal value. The group that took the daily multivitamin scored an average of 3% higher on their math exams during their academic year. Does taking this daily vitamin improve a student's ability to perform well on math exams? 

<img src="images/two_populations_mean_03.png" alt="" style="width: 600px;"/>

In each scenario discussed, we had a group of people. They were assigned to two groups, either by their sex or by the stimuli to which they were exposed. In each case, the results of one group differed from the other group. One group had a higher mean salary. One group experienced a higher mean weight loss. And another group had higher mean exam scores on their math tests. 

The question is, did these measurable changes occur because of the stated differences in their groups or by chance? In other words, maybe the gender of the senior executives did not play a role in the salaries. Perhaps this group of female executives didn't hit their sales numbers that would warrant higher salaries. Maybe the weight loss program was not the differentiator in weight loss. It's possible that one group of obese men just happened to be assigned more of the men that were prone to lose weight under any program. And how would a vitamin help someone do better on a math exam? Perhaps the students that were better at math were randomly selected to be in the group that got the daily multivitamin. 

In this section, we'll look at different ways `to figure out whether a difference between the means of two populations reflects an actual effect or is due to chance`. Using data, charts, and randomizations, we'll look at different ways to figure out whether stimuli or chance influenced our statistical outcomes.

### Visualization (re-randomizing)
A certain school has 200 students. The students are randomly assigned to two different groups of 100 students. Each of the 100 students in the first group is given a math textbook to learn a certain math concept. We'll call this Group A. The second group of 100 students is asked to watch an online video to learn the same math concept. We'll call this Group B. Each group is given 30 minutes to learn the math concept. After the 30 minutes are up each of the 200 students takes an exam with 20 questions. 

<img src="images/two_populations_mean_04.png" alt="" style="width: 600px;"/>

The students that learned from the online video, Group B, had a median test score that was five questions higher than the students that learned from the textbook, Group A. Students that watched the video had a median score of 17 out of 20. Students that learned the concept from the textbook had a median score of 12 out of 20. 

Is the video a more effective teaching tool or did this outcome just happen by chance? In other words, did most of the students that were better at math just happen to be assigned to the group that had access to the online video? 

One way to visualize the likelihood that this might've happened by chance is by taking all 200 test scores and then randomly assigning them to two different groups of 100. But if we randomize the 200 test scores into two groups we might find that one group of test scores has a median of 15 and the other a median of 14. What happens if we randomize the 200 test scores again? Maybe we'll get one group with a median score of 13 and the other group with a median score of 15. And what happens if we randomize these math test scores a total of a hundred times? 

<img src="images/two_populations_mean_05.png" alt="" style="width: 400px;"/>

For each randomization we record the difference between the median of Group A, with its 100 test scores, and the median of Group B, with its 100 test scores. This allows us to visualize the difference between the medians of two randomized groups of these test scores. Suppose that this was the resulting distribution, where the X axis measures the difference between the medians of the two random groups. There are 100 dots on our distribution. Each dot represents the difference between the two group medians for a different randomization of the scores. 

<img src="images/two_populations_mean_06.png" alt="" style="width: 600px;"/>

The experiment found that the group that used the online video had a median score five questions higher than the group that used the textbook. From the distribution chart for our 100 test score randomizations we can see that only on three occasions did the median test score for the online video group, Group B, exceed the median test score for Group A by five questions or more. If our significance level were 5%, we can see that our experiment results were significant. According to this distribution chart this outcome is less than 5% likely to have occurred by chance. In fact, it's 3% likely to have occurred by chance. Granted, this was only 100 rerandomizations of the data. And this was a rather limited statistical exercise. But this simple example allowed us to both understand and visualize how statisticians test whether an outcome was potentially meaningful or if it may have occurred by chance.
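The re-randomization procedure described above can be sketched in a few lines of Python. The individual test scores are not given in the text, so the two score lists below are made-up placeholders chosen only so that the group medians are 12 and 17. The function name `rerandomization_p_value`, the number of shuffles, and the seed are likewise illustrative assumptions.

```python
import random
import statistics


def rerandomization_p_value(group_a, group_b, n_shuffles=1000, seed=0):
    """Share of random regroupings whose median gap (B - A) is at least
    as large as the observed one. Returns (observed, diffs, p_value)."""
    rng = random.Random(seed)
    observed = statistics.median(group_b) - statistics.median(group_a)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    diffs = []
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # randomly reassign every score to a group
        new_a, new_b = pooled[:n_a], pooled[n_a:]
        diffs.append(statistics.median(new_b) - statistics.median(new_a))
    extreme = sum(d >= observed for d in diffs)
    return observed, diffs, extreme / n_shuffles


# Hypothetical scores (NOT the data behind the chart in the text)
textbook_scores = [10, 11, 12, 13, 14] * 20  # Group A, median 12
video_scores = [15, 16, 17, 18, 19] * 20     # Group B, median 17

observed, diffs, p_value = rerandomization_p_value(textbook_scores, video_scores)
print(f"observed gap = {observed}, share of shuffles at least as large = {p_value}")
```

Each entry in `diffs` corresponds to one dot on the distribution chart. Comparing the resulting share with the chosen significance level mirrors the "three dots out of 100" reasoning above.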

### Set up a confidence interval
In the previous section, we discussed a situation where 200 students were randomly placed into two groups. These students were going to be given a test on a math concept. One group of students was allowed to use an online video to prepare for the exam. The second group used a traditional textbook to prepare for the same exam. The exam had 20 questions. 

For this section, let's provide some updated data. The group that used the online video to prepare ended up with 120 students. This group averaged 16.2 correct questions on the exam. The standard deviation of these test scores was 2.5. The group that used the traditional textbook to prepare for the exam had 80 students. The textbook group averaged 14.1 correct questions on the exam. The standard deviation of their scores was 3.6. As we can see, the average score for the online video group was 2.1 correct questions higher than that of the textbook group. 

<img src="images/two_populations_mean_07.png" alt="" style="width: 600px;"/>

This of course is only the difference for these two random groups of students from this one school. If the online video was available to the entire population of students that took this course across the country, what would be the difference in the average exam scores? Well, since we don't have data for every student, `we can use these two samples to create a confidence interval. A confidence interval that would contain the true difference between the population mean score` of students that prepared using the online video versus the population mean score of students that prepared using the textbook. Again, we go back to confidence intervals, and again, we see a familiar formula. Let's begin to fill in the numbers for our math test example. 

<img src="images/two_populations_mean_08.png" alt="" style="width: 600px;"/>

First, we know that the difference in mean scores for these two samples is 2.1. Next, our critical value. Let's say we want to build the typical 95% confidence interval, an interval that excludes 2.5% of the outcomes on either end of the distribution. We go to the Z distribution chart and `we find the appropriate critical score, the one that coincides with 0.9750`. 

<img src="images/two_populations_mean_09.png" alt="" style="width: 600px;"/>

Thus, our critical Z score is the very familiar 1.96. So we have 2.1, we have 1.96, we're just missing our standard error. Since our sample sizes are 80 and 120, `we can utilize the standard deviations of our samples as reasonable estimates for the population standard deviations`. So as you can see here, the standard error for this situation is the standard deviation of the video sample squared, divided by the sample size of the video sample plus the standard deviation of the textbook sample squared, divided by the sample size of the textbook sample. And once we have the sum, we take the square root. 

<img src="images/two_populations_mean_10.png" alt="" style="width: 300px;"/>

In this case, the standard deviation of the video sample is 2.5, and this sample size is 120, and the standard deviation of the textbook sample is 3.6 and this sample size is 80. Once we plug in our numbers, we can find the standard error for this problem. We get a standard error of about 0.462. 

<img src="images/two_populations_mean_11.png" alt="" style="width: 300px;"/>

<img src="images/two_populations_mean_12.png" alt="" style="width: 200px;"/>

And now we can calculate the limits of our confidence interval. Our upper and lower limits are 2.1 plus or minus our critical value, 1.96, times our standard error, 0.462. So we get an upper limit of 3.01. We get a lower limit of 1.19. 

<img src="images/two_populations_mean_13.png" alt="" style="width: 400px;"/>

So what does this mean? Well it means that we are 95% confident that the online video helps improve the exam scores for this particular test by at least 1.19 questions, and perhaps as high as 3.01 questions. Remember, we're only 95% confident, but since our lower limit is 1.19, we're pretty confident that the online video will improve the average test score of the population by at least one question versus the average score of this population if the students instead use the textbook to study.
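As a check on the numbers in this section, here is a minimal Python sketch of the same calculation. It assumes, as the text does, that the sample standard deviations are reasonable stand-ins for the population values because both samples are large. The function name `two_mean_ci` is illustrative only.

```python
import math
from statistics import NormalDist


def two_mean_ci(mean1, sd1, n1, mean2, sd2, n2, confidence=0.95):
    """Confidence interval for mean1 - mean2 using a z critical value."""
    z_crit = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    diff = mean1 - mean2
    return diff - z_crit * se, diff + z_crit * se, se


# Sample 1 = online video group, sample 2 = textbook group
lower, upper, se = two_mean_ci(16.2, 2.5, 120, 14.1, 3.6, 80)
print(f"SE = {se:.3f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

Note that the smaller textbook group, with its larger standard deviation, contributes most of the standard error, so adding students to that group would narrow the interval the most.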
