
Commit

typos
rudeboybert committed Nov 29, 2018
1 parent 7dc1bd4 commit 6376e15
Showing 2 changed files with 3 additions and 3 deletions.
4 changes: 2 additions & 2 deletions 08-sampling.Rmd
@@ -109,7 +109,7 @@ Let's now define some concepts and terminology important to understand sampling,
+ Above Ex: Is $\widehat{p}$ a "good guess" of $p$?
+ In other words, can we *infer* about the true proportion of the balls in the bowl that are red, based on the results of our sample of $n=50$ balls?
1. **Bias**: In a statistical sense, we say *bias* occurs if certain observations in a population have a higher chance of being sampled than others. We say a sampling procedure is *unbiased* if every observation in a population has an equal chance of being sampled.
+ Above Ex: Did each ball, irrespective of color, have an equal chance of being sampled, meaning the sampling was unbiased? We feel since the balls are all of the same size, there isn't any bias in the sampling. If, say, the red balls had a much larger diameter than the red ones then you might have have a higher or lower probability of now sampling red balls.
+ Above Ex: Did each ball, irrespective of color, have an equal chance of being sampled, meaning the sampling was unbiased? We feel since the balls are all of the same size, there isn't any bias in the sampling. If, say, the red balls had a much larger diameter than the white ones, then you might have a higher or lower probability of sampling red balls.
1. **Random sampling**: We say a sampling procedure is *random* if we sample randomly from the population in an unbiased fashion.
+ Above Ex: As long as you mixed the bowl sufficiently before sampling, your samples of size $n=50$ balls would be random (a simulation sketch of this sampling follows this list).
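
To make the estimation and bias ideas above concrete, here is a minimal R sketch of drawing one unbiased random sample from a bowl and computing $\widehat{p}$. The bowl composition of 900 red balls out of 2400 is purely hypothetical for illustration; in the book's setup the true number of red balls is unknown.

```r
library(dplyr)

# Hypothetical bowl: 2400 equally sized balls, 900 of them red (illustrative count only)
bowl <- tibble(color = c(rep("red", 900), rep("white", 1500)))

# One random, unbiased sample of n = 50 balls: every ball is equally likely to be drawn
set.seed(76)
my_sample <- bowl %>% sample_n(size = 50)

# Sample proportion red: the estimate p-hat of the unknown population proportion p
my_sample %>% summarize(p_hat = mean(color == "red"))
```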

@@ -253,7 +253,7 @@ summary_stats %>%
Finally, it's important to keep in mind:

1. If the sampling is done in an unbiased and random fashion, in other words we made sure to stir the bowl before we sampled, then the sampling distribution will be guaranteed to be centered at the true unknown population proportion red $p$, that is, the true proportion of the 2400 balls in the bowl that are red.
1. The spread of this histogram, as quantified by the standard deviation of `r summary_stats %>% pull(sd) %>% round(3)`, is called the **standard error**. It quantifies the variability of our estimates for $\widehat{p}$.
1. The spread of this histogram, as quantified by the standard deviation of `r summary_stats %>% pull(sd) %>% round(3)`, is called the **standard error**. It quantifies the uncertainty of our estimates of $p$, which, recall, are called $\widehat{p}$ (see the simulation sketch after this list).
+ **Note**: A common source of confusion: all standard errors are a form of standard deviation, but not all standard deviations are standard errors.
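
As a companion to the standard error point above, here is a minimal simulation sketch, again using the hypothetical 900-red-out-of-2400 bowl (and base R plus purrr rather than the book's own helper functions). It shows that the estimates $\widehat{p}$ center near $p$ and that their standard deviation is the standard error.

```r
library(purrr)

# Hypothetical bowl as before: 2400 balls, 900 assumed red for illustration
bowl_colors <- c(rep("red", 900), rep("white", 1500))

# Draw 1000 random samples of n = 50 balls and record each sample proportion red
set.seed(76)
p_hats <- map_dbl(1:1000, ~ mean(sample(bowl_colors, size = 50) == "red"))

# The p-hats center near the true p (here 900/2400 = 0.375) ...
mean(p_hats)
# ... and their standard deviation is the standard error of p-hat
sd(p_hats)
```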


2 changes: 1 addition & 1 deletion 09-confidence-intervals.Rmd
@@ -79,7 +79,7 @@ We'll cover the first four scenarios in this chapter on confidence intervals and

* Scenario 2 about means. Ex: the average age of pennies.
* Scenario 3 about differences in proportions between two groups. Ex: the difference in high school completion rates for Canadians vs non-Canadians. We call this a situation of *two-sample* inference.
* Scenario 4 is similar to 3, but its about the means of two groups. Ex: the difference in average test scores for the morning section of a class versus the afternoon section of a class. This another situation of *two-sample* inference.
* Scenario 4 is similar to 3, but it's about the means of two groups. Ex: the difference in average test scores for the morning section of a class versus the afternoon section of a class. This is another situation of *two-sample* inference.

In Chapter \@ref(inference-for-regression) on inference for regression, we'll cover Scenarios 5 & 6 about the regression line. In particular, we'll see that the fitted regression line from Chapter \@ref(regression) on basic regression, $\widehat{y} = b_0 + b_1 \cdot x$, is in fact an estimate of some true population regression line $y = \beta_0 + \beta_1 \cdot x$ based on a sample of $n$ pairs of points $(x, y)$. Ex: Recall our sample of $n=463$ instructors at UT Austin from the `evals` data set in Chapter \@ref(regression). Based on the results of the fitted regression model of teaching score with beauty score as an explanatory/predictor variable, what can we say about this relationship for *all* instructors, not just those at UT Austin?
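
As a small illustration of Scenarios 5 & 6, here is a sketch of fitting the regression of teaching score on beauty score for the $n=463$ sampled instructors. It assumes the `evals` data set comes from the `moderndive` package and that `score` and `bty_avg` are the teaching-score and beauty-score variables.

```r
library(moderndive)  # assumed home of the `evals` data set used in the book

# Fitted regression line y-hat = b0 + b1 * x for the sample of n = 463 instructors
score_model <- lm(score ~ bty_avg, data = evals)

# b0 and b1 are sample estimates of the population intercept beta_0 and slope beta_1
coef(score_model)
```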

