# Chapter 26 - Comparing Counts

## Goodness-of-Fit

* How closely does a set of observed counts across groups fit a "null" model of the expected counts across those groups?
  * e.g., how closely do the counts fit a proposed model in which they are uniformly distributed across the groups?
* **goodness-of-fit** test: a hypothesis test that addresses this question

## Assumptions and Conditions

* **Counted Data Condition**: data must be _counts_ for the categories of a categorical variable

### Independence Assumption

* the counts in the cells should be independent of each other
* **Randomization Condition**: the individuals who have been counted should be a random sample from the population of interest

### Sample Size Assumption

* **Expected Cell Frequency Condition**: we should expect to see at least 5 individuals in each cell

## Calculations

The test statistic, called the **chi-square statistic**, is the sum, over all cells, of the squared deviation between the observed and expected counts divided by the expected count:

\begin{equation}
\chi^2 = \sum_{\text{all cells}}{\frac{  (Obs - Exp)^2  }{  Exp }}
\end{equation}

* $\chi^2$ refers to a family of sampling distribution models, **chi-square models**
* Models in the family differ in their number of degrees of freedom
  * For goodness-of-fit, $df = n - 1$, where $n$ is the number of cells (categories), not the sample size

## Chi-Square P-Values
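For a chi-square test, the P-value is the area under the chi-square model (with the appropriate df) to the right of the observed $\chi^2$; only large values of $\chi^2$ count as evidence against the null model. A table or software normally supplies this area. As a minimal sketch, assuming a whole-number df (as in all the tests in this chapter), the upper-tail area has a closed form that needs only Python's standard `math` module. The function name `chi2_sf` is hypothetical.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail area P(X >= x) for a chi-square model with integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term = 1.0
        total = 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite correction series
    tail = math.erfc(math.sqrt(half))
    if df == 1:
        return tail
    # k-th term: sqrt(2x/pi) * exp(-x/2) * x^(k-1) / (1 * 3 * ... * (2k - 1))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)  # k = 1
    total = term
    for k in range(2, (df - 1) // 2 + 1):
        term *= x / (2 * k - 1)
        total += term
    return tail + total
```

For example, `chi2_sf(3.841, 1)` is about 0.05, matching the familiar 5% critical value for 1 df.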

## Step-by-Step Example: A Chi-Square Test for Goodness-of-Fit

* Plan: 
  * State what you want to know.
  * Identify the variables and check the W's
* Hypotheses:
  * State the null and alternative hypotheses.  
  * For $\chi^2$ tests, it's usually easier to do that in words than in symbols.
* Model:
  * Make a picture; a bar chart is a good display
  * Think about the assumptions and check the conditions
  * Specify the sampling distribution model
  * Name the test you will use
* Mechanics:
  * Each cell contributes an $\frac{(Obs - Exp)^2}{Exp}$ value to the chi-square sum.
  * Add up these components
  * Determine the P-value
* Conclusion:
  * Link the P-value to your decision.
  * State your conclusion in terms of what the data mean

## The Chi-Square Calculation

1. Find the expected value
2. Compute the residuals
3. Square the residuals
4. Compute the components
5. Find the sum of components
6. Find the degrees of freedom
7. Test the hypothesis
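The seven steps can be traced in a short Python sketch. The die counts below are invented for illustration, not data from the text; the function name `goodness_of_fit` is hypothetical, and step 7 compares $\chi^2$ with standard 5% critical values from a chi-square table rather than computing an exact P-value. Every expected count is 10, so the Expected Cell Frequency Condition holds.

```python
# Hypothetical data: a six-sided die rolled 60 times (invented for illustration).
# Null model: the die is fair, so counts are uniform across the six faces.

# 5% critical values of the chi-square model for df = 1..10 (standard table values)
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070,
               6: 12.592, 7: 14.067, 8: 15.507, 9: 16.919, 10: 18.307}


def goodness_of_fit(observed, null_proportions):
    n = sum(observed)
    # 1. Expected count for each cell under the null model
    expected = [n * p for p in null_proportions]
    # 2. Residuals: observed minus expected
    residuals = [o - e for o, e in zip(observed, expected)]
    # 3. Squared residuals
    squared = [r ** 2 for r in residuals]
    # 4. Components: squared residual divided by expected count
    components = [s / e for s, e in zip(squared, expected)]
    # 5. Chi-square statistic: sum of the components
    chi_sq = sum(components)
    # 6. Degrees of freedom: number of cells minus 1
    df = len(observed) - 1
    return expected, components, chi_sq, df


observed = [8, 12, 9, 11, 6, 14]
expected, components, chi_sq, df = goodness_of_fit(observed, [1 / 6] * 6)
# 7. Test: reject the null at the 5% level if chi_sq exceeds the critical value
reject = chi_sq > CRITICAL_05[df]
print(f"chi-square = {chi_sq:.2f}, df = {df}, reject at 5%? {reject}")
```

Here $\chi^2 = 4.2$ on 5 df, well below the 5% critical value of 11.070, so these counts give no evidence against a fair die; that is not proof the die is fair.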

## But I Believe the Model...

* We can never confirm that a theory (i.e. the null hypothesis) is in fact true.

## Comparing Observed Distributions

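* **two-way table**: counts classified by two categorical variables, one in the rows and one in the columns (here, one variable identifies the groups being compared)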
* The z-test for two proportions generalizes to a **chi-square test of homogeneity**

## Assumptions and Conditions

* **Counted Data Condition**: data must be counts
* we often aren't interested in generalizing to a larger population, so we don't need to check the **Randomization Condition** or the **10% Condition**
* **Expected Cell Frequency Condition**: expected count in each cell must be at least 5

## Calculations

* degrees of freedom: $df = (R - 1)(C - 1)$, where $R$ is the number of rows and $C$ the number of columns in the table
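For a two-way table, the expected count in each cell comes from the row and column totals: $Exp = \frac{\text{row total} \times \text{column total}}{\text{table total}}$. A minimal Python sketch, assuming the table is given as a list of rows of counts; the function name `chi_square_two_way` and the 2×2 counts are hypothetical.

```python
def chi_square_two_way(table):
    """Return (expected counts, chi-square statistic, df) for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    # Expected count for each cell: row total * column total / table total
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

    # Sum (Obs - Exp)^2 / Exp over every cell
    chi_sq = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )

    # Degrees of freedom: (rows - 1)(columns - 1)
    df = (len(table) - 1) * (len(table[0]) - 1)
    return expected, chi_sq, df


# Hypothetical counts: two groups (rows) by two response categories (columns)
observed = [[20, 30],
            [30, 20]]
expected, chi_sq, df = chi_square_two_way(observed)
print(expected)    # [[25.0, 25.0], [25.0, 25.0]]
print(chi_sq, df)  # 4.0 1
```

The same arithmetic applies to a contingency table in a test for independence; only the question being asked and the conditions to check differ.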

## Step-by-Step Example: A Chi-Square Test for Homogeneity

* Plan: 
  * State what you want to know.
  * Identify the variables and check the W's
* Hypotheses:
  * State the null and alternative hypotheses.  
* Model:
  * Make a picture; a side-by-side or split bar chart is a good display
  * Think about the assumptions and check the conditions
  * Specify the sampling distribution model
  * Name the test you will use
* Mechanics:
  * Show the expected counts for each cell of the table
  * Put both observed and expected counts in each cell of the table (or in separate tables)
  * Calculate $\chi^2$
* Conclusion:
  * State your conclusion in terms of what the data mean
  * Specifically talk about whether the distributions for the groups appear to be different

## Examining the Residuals

* To standardize a cell's residual, we just divide by the square root of its expected value:

\begin{equation}
c = \frac{ (Obs - Exp)  }{ \sqrt{Exp}  }
\end{equation}

* Notice that each cell's **component** is the square of its **standardized residual**, so a standardized residual is the signed square root of the component; its sign shows whether we observed more, or fewer, cases than we expected.
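A small Python sketch of the standardized residual formula, assuming the observed and expected counts for the table's cells are supplied as matching flat lists; the counts and the function name `standardized_residuals` are hypothetical. It also checks that squaring each standardized residual gives back that cell's chi-square component.

```python
import math


def standardized_residuals(observed, expected):
    """c = (Obs - Exp) / sqrt(Exp) for each cell, given matching flat lists."""
    return [(o - e) / math.sqrt(e) for o, e in zip(observed, expected)]


# Hypothetical cell counts with their expected counts
observed = [30, 20, 50]
expected = [25, 25, 50]
c = standardized_residuals(observed, expected)
print(c)  # [1.0, -1.0, 0.0]

# Squaring each standardized residual recovers that cell's chi-square component,
# so the squares add up to the chi-square statistic.
components = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
print(all(math.isclose(ci ** 2, comp) for ci, comp in zip(c, components)))  # True
print(sum(ci ** 2 for ci in c))  # 2.0
```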

## Independence

* looks at data from a single group, with each subject categorized on two categorical variables
* **contingency tables** categorize counts on two (or more) variables
* calls for a **chi-square test for independence**

## Assumptions and Conditions

* expected counts must be at least 5 in each cell
* check that the data are representative:
  * random sample
  * sample is less than 10% of the population
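Whether the sample is random and under 10% of the population can't be read from the counts, but the Expected Cell Frequency Condition can be checked mechanically. A minimal sketch, assuming a contingency table stored as a list of rows; the function name `low_expected_cells` and the table are hypothetical.

```python
def low_expected_cells(table, minimum=5):
    """List (row, column, expected) for cells whose expected count is below `minimum`."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    flagged = []
    for i, row_total in enumerate(row_totals):
        for j, col_total in enumerate(col_totals):
            # Expected count under independence: row total * column total / table total
            exp = row_total * col_total / grand_total
            if exp < minimum:
                flagged.append((i, j, exp))
    return flagged


# Hypothetical contingency table: a rare third category produces small expected counts
table = [[40, 35, 3],
         [30, 45, 2]]
for i, j, exp in low_expected_cells(table):
    print(f"row {i}, column {j}: expected {exp:.2f} < 5")
```

Any flagged cell is a reason to be wary of proceeding; a common remedy is to combine sparse categories before testing.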

## Step-by-Step Example: A Chi-Square Test for Independence

* Plan:
  * State what you want to know.
  * Identify the variables and check the W's
* Hypotheses:
  * State the null and alternative hypotheses.
* Model:
  * Make a picture: simple bar chart
  * Think about the assumptions and check the conditions
  * Warning: Be wary of proceeding when there are small expected counts.
  * Specify the model
  * Name the test you will use
* Mechanics:
  * Calculate $\chi^2$
* Conclusion:
  * Link the P-value to your decision
  * State your conclusion about the independence of the variables

## Examine the Residuals

## Chi-Square and Causation

## What Can Go Wrong?

* Don't use chi-square methods unless you have counts
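* Beware large samples: with enough data, even a trivially small departure from the null model produces a large $\chi^2$ and a tiny P-value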
* Don't say that one variable "depends" on the other just because they're not independent
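To see why large samples call for caution, hold the observed proportions fixed and multiply every count by 10: each component $(Obs - Exp)^2/Exp$ grows tenfold, so $\chi^2$ does too, while df stays the same. A small sketch with invented counts; the helper name `chi_square` is hypothetical.

```python
def chi_square(observed, expected):
    """Chi-square statistic: sum of (Obs - Exp)^2 / Exp over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical 55/45 split tested against a 50/50 null model
small = chi_square([55, 45], [50, 50])
# Same proportions, ten times as many individuals
large = chi_square([550, 450], [500, 500])

print(small, large)  # 1.0 10.0
```

On 1 df, $\chi^2 = 1.0$ is unremarkable, while $\chi^2 = 10.0$ exceeds the 5% critical value of 3.841, even though the 55/45 split is identical in both cases.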

## What Have We Learned?

* [p. 653-5 P]