In [1]:
import saspy
my_session = saspy.SASsession()

Using SAS Config named: oda
Pandas module not available. Setting results to HTML
SAS Connection established. Subprocess id is 286211



# Set 6
**1. What properties did we discuss that make an estimator good? What does each of those properties mean?**

"Ideally, an estimator would have a few properties"
- Be **unbiased**: on average, the estimator gives the (true) population value
- Be **consistent**: as the sample size grows, the estimator should generally get closer and closer to the (true) population value
- Have low variance/a small standard error: the estimator should have as little variation from dataset to dataset as possible
- Ideally, we can determine (at least approximately) the pattern (distribution) in which we observe the estimator
  - $\bar{Y}$ is a common estimator of the population mean, $\mu$
  - The Central Limit Theorem tells us that we often observe $\bar{Y}$ in a _normal_ distribution pattern
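
A small simulation can make the first three properties concrete. The sketch below is illustrative only: the population values (`MU`, `SIGMA`), the sample sizes, and the helper names are assumptions, not part of the notes. It draws many datasets, computes $\bar{Y}$ for each, and reports the average and the spread of those estimates.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

MU, SIGMA = 10.0, 2.0  # assumed "true" population values for the simulation


def sample_mean(n):
    """Draw one random sample of size n and return its sample mean."""
    return statistics.fmean(random.gauss(MU, SIGMA) for _ in range(n))


def simulate(n, reps=1000):
    """Return (average of the estimates, standard deviation of the estimates)."""
    estimates = [sample_mean(n) for _ in range(reps)]
    return statistics.fmean(estimates), statistics.stdev(estimates)


for n in (5, 50, 500):
    avg, spread = simulate(n)
    print(f"n={n:4d}  average estimate={avg:.3f}  spread={spread:.3f}")
```

If $\bar{Y}$ is unbiased, the average estimate should sit near `MU` at every sample size. The spread (an estimate of the standard error) should shrink roughly like $\sigma/\sqrt{n}$ as $n$ grows, which is the behaviour we want from a consistent, low-variance estimator.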

**2. What is the idea of method of moments estimation?**

Given a random sample, method of moments equates sample averages to population averages (expected values, known as "moments") and solves for the parameters to create estimators.
- Usually easy to find
- Often are consistent
- Distribution of the created estimator may be difficult to determine though

Recap
- Provides a straightforward way to create an estimator
- Equate "sample" moments to "population" moments and solve
- Generally, good properties (but not as good as MLEs)

For a Binomial/Bernoulli MOM, set the sample average equal to the population average.
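
As a minimal sketch of the Bernoulli case: the population moment is $E[Y] = p$, the sample moment is the sample average, and setting them equal gives $\hat{p} = \bar{Y}$. The value `TRUE_P` and the simulated data are assumptions used only to have something to estimate.

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible

TRUE_P = 0.3  # assumed population proportion, used only to generate data
data = [1 if random.random() < TRUE_P else 0 for _ in range(200)]

# Population moment: E[Y] = p.  Sample moment: the sample average.
# Equating the two and solving for p gives p_hat = sample average.
p_hat_mom = sum(data) / len(data)

print(f"method of moments estimate of p: {p_hat_mom:.3f}")
```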

**3. What is the idea of maximum likelihood estimation?**

Maximum likelihood uses the assumed distribution (curve) to find the parameter values that make the data we see "most likely".
- Mathematically more difficult
- ML estimators are generally consistent
- Distribution of ML estimators can often be approximated with a normal curve
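
One way to see the idea without calculus is a brute-force search: evaluate the log-likelihood over a grid of candidate parameter values and keep the one that scores highest. The sketch below does this for Bernoulli data. The data values and the grid resolution are hypothetical choices, and a grid search is only a stand-in for the usual analytic derivation.

```python
import math

# Hypothetical Bernoulli data: 1 = success, 0 = failure.
data = [1, 0, 0, 1, 1, 0, 1, 0, 0, 0]


def log_likelihood(p, ys):
    """Log-likelihood of a Bernoulli(p) model for the observed 0/1 values ys."""
    return sum(math.log(p) if y == 1 else math.log(1 - p) for y in ys)


# Candidate values strictly inside (0, 1), so log() is always defined.
grid = [i / 1000 for i in range(1, 1000)]
p_hat_mle = max(grid, key=lambda p: log_likelihood(p, data))

print(f"grid-search MLE: {p_hat_mle:.3f}")
print(f"sample average:  {sum(data) / len(data):.3f}")
```

For Bernoulli data the maximizer coincides with the sample average, so here maximum likelihood and method of moments agree.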

**4. If we assume our data is a random sample from a normal population, what is the estimator of the mean? What is the estimator of the standard deviation? Is this the estimator you expect?**

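$\bar{Y}$ is a common estimator of the population mean, $\mu$. The Central Limit Theorem tells us that we often observe $\bar{Y}$ in a normal distribution pattern.

For the standard deviation, maximum likelihood under the normal model gives $\hat{\sigma} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(Y_i - \bar{Y})^2}$. This divides by $n$ rather than the $n - 1$ used in the usual sample standard deviation $S$, so it is not quite the estimator we might expect.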
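
A quick numerical comparison of the divide-by-$n$ and divide-by-$(n-1)$ standard deviation estimators, using made-up data (the values in `y` are hypothetical). Under the normal model, maximum likelihood leads to the divide-by-$n$ form, which `statistics.pstdev` computes; `statistics.stdev` gives the usual sample standard deviation.

```python
import statistics

# Hypothetical measurements, assumed to be a random sample from a normal population.
y = [4.1, 5.3, 3.8, 6.0, 5.1, 4.7, 5.5, 4.9]

mean_hat = statistics.fmean(y)           # estimator of mu: the sample mean
sd_div_n = statistics.pstdev(y)          # divides the sum of squares by n
sd_div_n_minus_1 = statistics.stdev(y)   # divides the sum of squares by n - 1

print(f"mean estimate:      {mean_hat:.4f}")
print(f"SD (divide by n):   {sd_div_n:.4f}")
print(f"SD (divide by n-1): {sd_div_n_minus_1:.4f}")
```

The divide-by-$n$ value is always a little smaller, and the gap closes as $n$ grows.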


**5. What is the central limit theorem and why is it useful?**

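The CLT states that, for a random sample (iid) of size $n$ from a population with mean $\mu$ and variance $\sigma^2$, a good approximation to the distribution of the sample mean in "large" samples is

$$
\bar{Y} \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right) \\
\text{with } Z = \frac{\bar{Y} - \mu}{\frac{\sigma}{\sqrt{n}}} \sim N(0, 1)
$$

Here the second argument of $N(\cdot, \cdot)$ is the standard deviation (the standard error of $\bar{Y}$), not the variance. The CLT is useful because it lets us build confidence intervals and tests for $\mu$ without knowing the shape of the population distribution.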
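
The approximation can be checked by simulation. The sketch below uses a clearly non-normal (exponential) population, which is an assumption made for illustration, along with the rate, sample size, and number of repetitions. It standardizes each sample mean with the $Z$ formula above and counts how often $|Z| \le 1.96$. If the normal approximation is good, that share should be close to 0.95.

```python
import math
import random
import statistics

random.seed(2)  # fixed seed so the sketch is reproducible

RATE = 0.5                      # assumed rate of the exponential population
MU, SIGMA = 1 / RATE, 1 / RATE  # an exponential population has mean = sd = 1 / rate
N, REPS = 40, 5000              # sample size and number of simulated samples


def standardized_mean():
    """Draw one sample of size N and return Z = (ybar - mu) / (sigma / sqrt(N))."""
    ybar = statistics.fmean(random.expovariate(RATE) for _ in range(N))
    return (ybar - MU) / (SIGMA / math.sqrt(N))


z_values = [standardized_mean() for _ in range(REPS)]
share_within = sum(abs(z) <= 1.96 for z in z_values) / REPS

print(f"share of |Z| <= 1.96: {share_within:.3f}")
```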

**6. What is the goal of a confidence interval?**

CI = range of values that we are confident contains the true parameter (e.g. the mean)
- Often use 95% confidence
  - Implies that if we took 100 samples, each of the appropriate sample size, about 95 of the intervals created would contain the population mean
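
The "about 95 out of 100" reading can be checked directly. This is a sketch under simplifying assumptions: the population is normal with made-up values `MU` and `SIGMA`, and $\sigma$ is treated as known so that a Z-based interval $\bar{Y} \pm z \cdot \sigma/\sqrt{n}$ can be used.

```python
import math
import random
import statistics

random.seed(3)  # fixed seed so the sketch is reproducible

MU, SIGMA, N = 50.0, 8.0, 30                # assumed population values and sample size
Z = statistics.NormalDist().inv_cdf(0.975)  # critical value for 95% confidence


def interval_covers_mu():
    """Build one 95% interval from a fresh sample and report whether it contains MU."""
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    ybar = statistics.fmean(sample)
    moe = Z * SIGMA / math.sqrt(N)  # sigma treated as known to keep the sketch simple
    return ybar - moe <= MU <= ybar + moe


REPS = 2000
coverage = sum(interval_covers_mu() for _ in range(REPS)) / REPS

print(f"share of intervals containing mu: {coverage:.3f}")
```

Any single interval either contains $\mu$ or it does not; the 95% describes the long-run share of intervals that do.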

**7. Suppose we find a random sample of 24 businesses and find a 95% confidence interval for the average number of employees to be 24.4 to 42.6. We would say we are 95% confident the average number of employees for our population of businesses is between 24.4 and 42.6. What is meant by confidence?**

In repeated samples, 95% of intervals created this way would capture the true average number of employees for our population of businesses.
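
If we assume this interval has the symmetric form point estimate +/- margin of error (an assumption; the question does not say how the interval was built), the two pieces can be recovered with a little arithmetic:

```python
lower, upper = 24.4, 42.6  # interval reported in the question

point_estimate = (lower + upper) / 2   # midpoint of the interval
margin_of_error = (upper - lower) / 2  # half of the interval width

print(f"point estimate = {point_estimate:.1f}, margin of error = {margin_of_error:.1f}")
```

Under that assumption the sample average is about 33.5 employees with a margin of error of about 9.1.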

**8. Is the confidence interval form point estimate +/- MOE always appropriate?**


**9. What is a confidence interval that can be used for the population mean in large samples? What interval can be used for small samples and what assumptions are needed?**


**10. How do we check an assumption of normality?**


**11. What is meant by the term two-sample t-interval? What are the two versions of this and which should we use generally?**

Approximate confidence interval for $\mu_{1}-\mu_{2}$.

- The samples must be independent of one another.
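- This is an exact interval (sort of): the equal-variance version is exact when both populations are normal with equal variances, while the unequal-variance version is approximate.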
- If the sample size is large, you can replace the t distribution value with a Z value.
- The degrees of freedom are estimated by software for the unequal variance case.
- This might be referred to as a “two-sample t-interval.”

Two versions: with and without the equal-variance assumption. Generally, use the unequal-variance version.

<center><img src="two_sample_t_interval.png" style="width:800px"/></center>
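
The ingredients of both versions can be computed by hand. The sketch below assumes the standard textbook forms (the Satterthwaite approximation for the unequal-variance degrees of freedom, and a pooled variance for the equal-variance case); the figure above may use different notation, and the data are hypothetical. The standard library has no t distribution, so the code stops short of the critical value, which would come from a t table or from software.

```python
import math
import statistics

# Hypothetical independent samples from two populations.
group1 = [23.1, 25.4, 22.8, 26.0, 24.3, 25.1, 23.7]
group2 = [20.2, 21.9, 19.8, 22.5, 21.0, 20.7]

n1, n2 = len(group1), len(group2)
mean1, mean2 = statistics.fmean(group1), statistics.fmean(group2)
var1, var2 = statistics.variance(group1), statistics.variance(group2)  # n - 1 divisor

diff = mean1 - mean2  # point estimate of mu_1 - mu_2

# Unequal-variance version: standard error and Satterthwaite degrees of freedom.
se_unequal = math.sqrt(var1 / n1 + var2 / n2)
df = (var1 / n1 + var2 / n2) ** 2 / (
    (var1 / n1) ** 2 / (n1 - 1) + (var2 / n2) ** 2 / (n2 - 1)
)

# Equal-variance version: pooled variance, with n1 + n2 - 2 degrees of freedom.
pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
se_pooled = math.sqrt(pooled_var * (1 / n1 + 1 / n2))

print(f"difference in means: {diff:.3f}")
print(f"unequal variances: SE = {se_unequal:.3f}, df = {df:.2f}")
print(f"equal variances:   SE = {se_pooled:.3f}, df = {n1 + n2 - 2}")
```

Each interval is then `diff` plus or minus the t critical value (for the matching degrees of freedom) times the matching standard error.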

# Set 8
**1. What is the goal of a hypothesis test?**

We set up competing hypotheses (statements or claims about population parameters) to find evidence from our data to refute one of our hypotheses. 

Make claims about our population and see if our data refutes them. 

**2. What are the two conclusions we can make when doing a hypothesis test?**

We can reject the null in favor of the alternative or fail to reject the null.

Null hypothesis = "status quo", usually the one that says there is no difference, no new thing going on above and beyond what we already know. 

Alternative hypothesis = the claim about the parameter that we would like to support or conclude. We assume that the null hypothesis is in effect and look at the data we obtained in an attempt to find evidence against that null hypothesis.

**3. What are the two types of errors we can make? What symbols do we use for their probabilities?**
1. We can reject the null, $H_{0}$, when the null is actually true. This is a Type I error. The probability of making a Type I error is called $\alpha$. 
2. We can fail to reject the null when the alternative is actually true. This is a Type II error. The probability of making a Type II error is called $\beta$.

**4. What is meant by ‘controlling alpha’ in a hypothesis test? How does this relate to not being able to accept the null hypothesis?**

A Type I error is considered the more serious of the two, so we set the probability of making a Type I error (i.e. control $\alpha$) to something very small. By doing so, we make sure that this type of error happens only rarely.

We set up $\alpha$ _assuming the null hypothesis is true_, so we're assuming the null is true when we're doing our calculations. Therefore, there is no way we can _accept_ the null hypothesis; we assumed it was true and did our calculations. Just because we got something that makes sense with what we assumed does not mean that our assumption was true (it just means we do not have evidence against it).  
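
To see what controlling $\alpha$ buys us, we can simulate many datasets for which the null hypothesis really is true and count how often a test at level $\alpha$ rejects anyway. This is a sketch under assumptions: a two-sided Z-test for a mean with $\sigma$ treated as known, and made-up values for `MU_0`, `SIGMA`, and `N`.

```python
import math
import random
import statistics

random.seed(4)  # fixed seed so the sketch is reproducible

ALPHA = 0.05
MU_0, SIGMA, N = 100.0, 15.0, 25  # assumed null value, known sd, and sample size
Z_CRIT = statistics.NormalDist().inv_cdf(1 - ALPHA / 2)  # two-sided critical value


def rejects_null(true_mu):
    """Run one two-sided Z-test of H0: mu = MU_0 on data generated with mean true_mu."""
    ybar = statistics.fmean(random.gauss(true_mu, SIGMA) for _ in range(N))
    z = (ybar - MU_0) / (SIGMA / math.sqrt(N))
    return abs(z) > Z_CRIT


REPS = 4000
# Generate the data with the null true, so every rejection is a Type I error.
type_1_rate = sum(rejects_null(MU_0) for _ in range(REPS)) / REPS

print(f"observed Type I error rate: {type_1_rate:.3f} (target {ALPHA})")
```

The observed rate should land near `ALPHA`. Note that the many datasets where we did not reject tell us nothing about whether the null is true, since it was true by construction in every run.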

**5. What is a p-value? How can we use it to make a decision about our hypotheses?**

p-values are one of the ways we make our decisions and one of the ways we give evidence against our null hypothesis. By definition, the p-value is the probability of seeing a sample result as extreme or more extreme than what we actually got _if the null hypothesis is true_. Everything we do with a p-value assumes the null is true. 

If we get a very, very small p-value, it means getting something like what we got (or something more extreme) is very, very unlikely if the null is true. Very, very small p-values give evidence against the null hypothesis assumption that we made. Thus, we can make a rule that we will only reject the null hypothesis if the p-value is smaller than $\alpha$.
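
A worked example of the decision rule, using a two-sided Z-test for a mean. All of the numbers are hypothetical, and $\sigma$ is assumed known so that the standard normal distribution applies.

```python
import math
from statistics import NormalDist

# Hypothetical summary numbers for H0: mu = 100 versus HA: mu != 100.
mu_0, sigma, n, ybar = 100.0, 15.0, 36, 105.5
alpha = 0.05

# Test statistic: how many standard errors the sample mean is from the null value.
z = (ybar - mu_0) / (sigma / math.sqrt(n))

# Two-sided p-value: probability of a result at least this extreme if H0 is true.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

reject = p_value < alpha
print(f"z = {z:.2f}, p-value = {p_value:.4f}, reject H0: {reject}")
```

Here $z = 2.2$ and the p-value is about 0.028, which is below 0.05, so we would reject the null at the 5% level.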

**6. What is power in a hypothesis test?**

This is related to the Type II error rate. Power is the probability that we reject $H_{0}$ when $H_{A}$ is true (i.e. the probability that we reject the null when we should). We want to have a low $\alpha$, low $\beta$, and high power.

$$
\text{Power} = 1 - \beta
$$

We care most about controlling $\alpha$ because a Type I error is the most serious. To manipulate $\beta$ (or power), we usually change the sample size. As we increase the sample size, we should have a larger and larger power. 
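
The effect of sample size on power can be computed exactly in a simple case. The sketch below assumes a one-sided Z-test of $H_0: \mu = \mu_0$ against $H_A: \mu > \mu_0$ with $\sigma$ known. The function name `power` and all default values are hypothetical.

```python
import math
from statistics import NormalDist


def power(n, mu_0=100.0, mu_a=105.0, sigma=15.0, alpha=0.05):
    """Power of a one-sided (upper-tail) Z-test when the true mean is mu_a."""
    z_crit = NormalDist().inv_cdf(1 - alpha)          # reject when Z > z_crit
    shift = (mu_a - mu_0) / (sigma / math.sqrt(n))    # mean of Z when mu = mu_a
    return 1 - NormalDist().cdf(z_crit - shift)       # P(Z > z_crit | mu = mu_a)


for n in (10, 40, 160):
    pw = power(n)
    print(f"n={n:3d}  power={pw:.3f}  beta={1 - pw:.3f}")
```

With $\alpha$ held fixed, a larger $n$ shrinks the standard error, so the same true difference sits more standard errors away from the null value and is detected more often.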

**7. What is meant by the term two-sample t-test? Paired t-test? Z-test?**

A two-sample t-test means we have 2 populations (or 2 samples) that we want to compare, and we're using a t-based test (i.e. something that relies on a t-distribution) to do the comparison and to make conclusions about them. With a two-sample t-test, we _look at the difference in means_ and we use a test statistic that follows a t-distribution. The assumptions are
- the two samples are independent of one another
- each sample comes from a normally distributed population

A paired t-test is the case where we're making the inference for means, but _we have paired data_. We still have 2 samples, but both are measured on the same units, so they are no longer independent samples. We look at column 1 minus column 2 and we get a column of differences. We then do a one-sample t-test (something that uses a t-statistic) on that column to make an inference. The inferential objective is the mean of the differences. The assumption we need is
- the differences are normally distributed
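
The "column of differences" recipe can be sketched as follows. The before/after measurements are hypothetical, and the code stops at the t-statistic and its degrees of freedom because the standard library has no t distribution for the p-value.

```python
import math
import statistics

# Hypothetical paired measurements on the same 8 units.
before = [142, 138, 150, 145, 160, 155, 148, 152]
after = [138, 136, 147, 146, 154, 150, 147, 149]

# Column 1 minus column 2: one difference per unit.
diffs = [b - a for b, a in zip(before, after)]

n = len(diffs)
d_bar = statistics.fmean(diffs)  # average difference
s_d = statistics.stdev(diffs)    # sample sd of the differences (n - 1 divisor)

# One-sample t-statistic for H0: mean difference = 0.
t_stat = d_bar / (s_d / math.sqrt(n))
df = n - 1

print(f"mean difference = {d_bar:.3f}, t = {t_stat:.3f}, df = {df}")
```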

A Z-test is broader than the two above and very, very common. It corresponds to any test whose test statistic relies on a normal distribution (e.g. large-sample tests for a proportion or a mean).
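
As one example of a Z-test, here is a sketch of a large-sample test for a proportion. The counts and the null value `p_0` are hypothetical, and the standard error is computed under the null hypothesis, which is the usual convention for this test.

```python
import math
from statistics import NormalDist

# Hypothetical data: 58 successes in 100 trials; test H0: p = 0.5 versus HA: p != 0.5.
x, n, p_0 = 58, 100, 0.5

p_hat = x / n
se_null = math.sqrt(p_0 * (1 - p_0) / n)  # standard error assuming H0 is true
z = (p_hat - p_0) / se_null

# Two-sided p-value from the standard normal distribution.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"p_hat = {p_hat:.2f}, z = {z:.2f}, p-value = {p_value:.3f}")
```

Here $z = 1.6$ and the p-value is about 0.11, so at $\alpha = 0.05$ we would fail to reject the null.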