# Power and Effect Size Warmup

![](viz/power.gif)

## Conceptual Writing

![](viz/writing.jpg)

#### 1) What is effect size, and what is its relationship to p-values and significance?  

#### 2) What are three elements that affect power, and how do they do so?

In [1]:
'''
1)
Effect size is the magnitude of the difference between two distributions.  While 
p-values highlight significant evidence towards whether there is a difference, 
effect size determines what the magnitude of that significance actually is.

THEIR SOLUTION:
Effect size quantifies the difference between two values under scrutiny.  We want to have a measure of this separate from
p-value, because, for any fixed nonzero difference, the p-value shrinks as sample size increases.  

In other words, with a large enough sample, even a trivially small difference can reach "significance".  We want to be able to quantify
the size of a difference between two values even when it is deemed "significant" in order to help mitigate this effect.

2)
Power is affected by the chosen alpha value (prob of type I error), the effect size
and the sample size.  Since alpha and beta (prob of type II error) are closely related,
and beta determines power (since power = 1 - beta), reducing alpha will also reduce
power.  Sample size also has an effect on power - a small sample size will result in
a smaller power and vice versa.  Effect size influences power by..... I'm not sure 
actually...

THEIR SOLUTION:
alpha - the threshold at which we deem a test statistic sufficiently different from chance that it provides evidence against
the null hypothesis.  This relates to power in that power is the rate at which we reject the null hypothesis when it is false,
and alpha sets the cutoff for rejection: a larger alpha makes rejection easier (more power), a smaller alpha makes it harder (less power).

sample size - the larger the sample size, the more power a given test has.  With more observations, a test is less likely
to erroneously fail to reject the null when it should be rejected.

effect size - as a measurement of how "different" two samples are, this reflects the underlying reality of whether or not 
a null hypothesis should be rejected.  Since power is a measure of how frequently the null hypothesis is rejected when it should
be rejected, a smaller effect size means lower power at a given sample size and alpha, so a larger sample is needed to reject null hypotheses at a given rate.

'''
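
The contrast between effect size and p-value can be illustrated with a small simulation. This is a minimal sketch on synthetic data, not part of the original exercise: the helper `cohens_d`, the true difference of 0.1 standard deviations, and the sample sizes are all assumptions chosen for illustration. Cohen's d stays small as the sample grows, while the p-value generally shrinks.

```python
import numpy as np
from scipy import stats


def cohens_d(a, b):
    """Cohen's d using a pooled standard deviation (one common effect-size measure)."""
    n_a, n_b = len(a), len(b)
    pooled_var = (
        (n_a - 1) * np.var(a, ddof=1) + (n_b - 1) * np.var(b, ddof=1)
    ) / (n_a + n_b - 2)
    return (np.mean(a) - np.mean(b)) / np.sqrt(pooled_var)


rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable
results = {}

# Same small true difference (0.1 standard deviations) at three sample sizes
for n in (20, 200, 20000):
    a = rng.normal(loc=0.1, scale=1.0, size=n)  # synthetic group A
    b = rng.normal(loc=0.0, scale=1.0, size=n)  # synthetic group B
    t_stat, p_value = stats.ttest_ind(a, b)
    results[n] = (cohens_d(a, b), p_value)
    print(f"n={n:>6}  d={results[n][0]:+.3f}  p={p_value:.4g}")
```

At the largest sample size the difference is "significant" even though the effect itself remains small, which is why the two measures are reported separately.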


## Calculations 

![](viz/nine_thousand.gif)

# Note:

This is an exercise in demonstrating how power affects the success of a test, and does not represent "best practices"

See more at the [end of the exercise](#A-note-on-best-practices)

## A tiff has broken out in the media between city construction workers and human services departments

A federal construction agency believes there's too much bloat in city human services departments.  They point to generally more "junior positions" in construction than in human services, even though there are about the same number of positions total.  

Your task, as a scrappy young member of the Seattle `Human Services` department, is to de-fang that argument.

You consider that a good counter-argument would be that, even though there are more "junior" positions in construction than human services, the construction jobs pay more.

### Imports

In [None]:
#Run cell as-is

#data manip
import numpy as np 
import pandas as pd

#stats
from scipy import stats
from statsmodels.stats.power import TTestIndPower

### Import data from data folder

You know what data this is

In [None]:
#your code here

#### First: is the argument accurate about Seattle?  How does the % of "junior" positions in `Construction & Inspections` compare to that in the `Human Services Department`?

- Find all the jobs that are "senior" by selecting those that have "Sr" as the last two characters in `job_title`

- Create a dataframe of jobs in `Construction & Inspections` that are not "senior" jobs

- Create a dataframe of jobs in `Human Services Department` that are not "senior" jobs

- Calculate the %age of "junior" jobs in `Construction & Inspections` to see if it's numerically larger than in the `Human Services Department`

In [None]:
#Your code here
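
One way the steps above might look, sketched on a toy dataframe rather than the real data. The column name `department`, the placeholder department labels, and the job titles are assumptions for illustration; only `job_title` and the "Sr" suffix rule come from the exercise.

```python
import pandas as pd

# Toy stand-in for the real data; `department` and these titles are hypothetical
toy = pd.DataFrame(
    {
        "department": ["Dept A"] * 4 + ["Dept B"] * 4,
        "job_title": [
            "Inspector", "Inspector,Sr", "Planner", "Clerk",
            "Counselor,Sr", "Manager,Sr", "Counselor", "Analyst,Sr",
        ],
    }
)

# "Senior" means the last two characters of job_title are "Sr"
is_senior = toy["job_title"].str.strip().str.endswith("Sr")

# Junior dataframes per department (everything that is not senior)
junior_a = toy[(toy["department"] == "Dept A") & ~is_senior]
junior_b = toy[(toy["department"] == "Dept B") & ~is_senior]

# Percentage of junior jobs within each department
junior_share = (~is_senior).groupby(toy["department"]).mean() * 100
print(junior_share)
```

On the real data the same pattern would apply after swapping in the actual department column and labels.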

#### So Seattle would make a good test case to see if human services "junior" jobs pay worse on average than construction jobs

#### What is our null and alternative hypothesis?

In [216]:
'''
Write hypotheses here
'''

#### To find evidence against the idea that there are no differences, find the sample size needed for an independent t-test w/ power = .8 and $\alpha$ = .05

In [189]:
#Your code here
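
A minimal sketch of solving for sample size with `TTestIndPower.solve_power`. The exercise does not state an effect size, so `assumed_effect_size = 0.5` (a "medium" Cohen's d) is purely an assumption here; in practice it would be estimated from the data or from prior knowledge.

```python
import math

from statsmodels.stats.power import TTestIndPower

assumed_effect_size = 0.5  # assumption: not given in the exercise

analysis = TTestIndPower()

# Leave nobs1 unspecified so solve_power solves for it
n_per_group = float(
    analysis.solve_power(
        effect_size=assumed_effect_size,
        alpha=0.05,
        power=0.8,
        ratio=1.0,  # equal group sizes
        alternative="two-sided",
    )
)

# Round up, since a fraction of an observation cannot be sampled
n_rounded = math.ceil(n_per_group)
print(f"observations needed per group: {n_rounded}")
```

The result is the number of observations per group, so both departments need at least that many junior employees.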

#### If there are enough obs, sample that sample size from each of the `Construction & Inspections` and `Human Services Department` junior employees

use `random_state=33` so we all get the same employees

In [229]:
#Your code here
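
Drawing a reproducible sample could look like the sketch below. The dataframe `juniors`, the column `hourly_rate`, and `n_needed = 64` are hypothetical stand-ins; only `random_state=33` comes from the exercise.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for one department's junior employees (hypothetical column name)
rng = np.random.default_rng(1)
juniors = pd.DataFrame({"hourly_rate": rng.normal(40, 8, size=500)})

n_needed = 64  # assumption: per-group size from a prior power calculation

# Only sample if the department has enough observations
if len(juniors) >= n_needed:
    sample = juniors.sample(n=n_needed, random_state=33)
    print(sample["hourly_rate"].describe())
else:
    print("not enough observations to reach the required sample size")
```

Because `random_state` is fixed, repeating the call returns the same rows each time.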

#### Calculate a statistical test to determine whether to reject or fail to reject the null hypothesis

- Determine what kind of test is most appropriate

- Calculate

- Reject or fail to reject the null?

In [231]:
#Your code here
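
As one possible approach, the sketch below runs Welch's t-test (`equal_var=False`), which does not assume the two groups share a variance. That choice is an assumption, not the exercise's stated answer, and the two groups here are synthetic.

```python
import numpy as np
from scipy import stats

# Synthetic stand-ins for the two sampled groups (values are made up)
rng = np.random.default_rng(2)
group_a = rng.normal(45, 10, size=64)
group_b = rng.normal(40, 6, size=64)

# Welch's t-test: independent samples, unequal variances allowed
t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=False)

alpha = 0.05
reject_null = bool(p_value < alpha)

print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
print("reject the null" if reject_null else "fail to reject the null")
```

If the p-value is at or above alpha, the conclusion is "fail to reject" rather than "accept": the test has not shown the null to be true.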

#### Hm, you think

Maybe it is accurate to reject the null hypothesis, and our test lacks sufficient power to pick it up

#### Calculate, using 100 different samples (w/ `random_state`$\in$[0:99]), what %age of t-tests fail to reject the null **when we should, in fact, reject it**

In [None]:
#Your code here
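
The repeated-sampling idea can be sketched as a loop over seeds. Everything below is synthetic: the two populations, their true difference of half a standard deviation, and `n_per_group = 64` are assumptions for illustration. Since the null really is false in this setup, every failure to reject is a type II error.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic populations with a real difference in means (assumed values)
rng = np.random.default_rng(3)
pop_a = pd.Series(rng.normal(45, 10, size=2000))
pop_b = pd.Series(rng.normal(40, 10, size=2000))

n_per_group = 64  # assumption: size from a power = .8 calculation
alpha = 0.05
misses = 0

for seed in range(100):  # random_state 0 through 99
    sample_a = pop_a.sample(n=n_per_group, random_state=seed)
    sample_b = pop_b.sample(n=n_per_group, random_state=seed)
    _, p_value = stats.ttest_ind(sample_a, sample_b, equal_var=False)
    # Count tests that fail to reject even though the null is false
    misses += int(p_value >= alpha)

miss_rate = misses / 100
print(f"share of tests failing to reject: {miss_rate:.0%}")
```

With a design powered at .8, a miss rate somewhere near 20% would be expected, though any single run of 100 samples will vary.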

#### Ah, we chose a power level admitting 20% of tests which incorrectly fail to reject the null, and our sample was part of the "unlucky" 20%

#### What sample size do we need for a power level of .99?  ($\alpha$ remains at .05)

In [None]:
#your code here
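
To see how much the required sample grows with power, the sketch below solves for the per-group size at both .8 and .99. As before, the effect size of 0.5 is an assumption rather than a value from the exercise.

```python
from statsmodels.stats.power import TTestIndPower

assumed_effect_size = 0.5  # assumption: not given in the exercise
analysis = TTestIndPower()

# Solve for the per-group sample size at each target power
n_by_power = {
    target: float(
        analysis.solve_power(
            effect_size=assumed_effect_size,
            alpha=0.05,
            power=target,
            ratio=1.0,
            alternative="two-sided",
        )
    )
    for target in (0.8, 0.99)
}

for target, n in n_by_power.items():
    print(f"power={target}: about {n:.1f} observations per group")
```

Raising power from .8 to .99 more than doubles the required sample under these assumptions, which is the cost of shrinking the type II error rate.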

#### Sample that number and re-calculate a test (`random_state`==33)

In [None]:
#Your code here

#### With a test of sufficient power, what evidence have we found that allows us to counter the argument that, since there is a higher %age of "junior" workers in `Construction & Inspections` compared to `Human Services Department`, `Construction & Inspections` workers are underpaid?

In [None]:
'''
Your answer here
'''

# A note on best practices

It is not often that we are able to continually sample in order to figure out a test of sufficient power.  Often, that decision must be made *in order to sample*, and you get one shot.

Additionally: adjusting a sample size to get a test of sufficient power without making other adjustments can be a form of [p-hacking](https://www.textbook.ds100.org/ch/18/hyp_phacking.html).