# Contingency Tables and Chi-Square Test for Independence

<div class="alert alert-success">Learning Goals:</div>

1. Understand the concept and applications of contingency tables.
2. Learn how to create and interpret contingency tables.
3. Explore the chi-square test for independence.
4. Utilize Python code examples to perform analysis on contingency tables.

## Introduction
Contingency tables, also known as cross-tabulations or crosstabs, summarize the relationship between two categorical variables. They display the joint frequency distribution of the two variables, that is, how many observations fall into each combination of categories.

## Applications of Contingency Tables
Contingency tables are commonly used in various scenarios, including:
- Analyzing survey data.
- Investigating associations between categorical variables.
- Assessing the distribution of data across categories.

## Creating and Interpreting Contingency Tables
A contingency table organizes data into rows and columns, representing the categories of the two categorical variables being studied. The cells of the table contain the frequency or count of observations that fall into each combination of categories.

Contingency tables allow us to observe patterns and relationships between the categorical variables. We can examine the distribution of frequencies across the cells to identify any associations or dependencies.
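As a minimal sketch of how such a table can be built in Python, the example below uses `pandas.crosstab`. The data frame, its column names (`gender`, `preference`), and its values are hypothetical and serve only as an illustration.

```python
import pandas as pd

# Hypothetical survey responses (illustrative data only)
df = pd.DataFrame({
    "gender": ["F", "M", "F", "M", "F", "M", "F", "M"],
    "preference": ["tea", "coffee", "tea", "tea",
                   "coffee", "coffee", "tea", "coffee"],
})

# Rows are gender, columns are preference, cells are counts
table = pd.crosstab(df["gender"], df["preference"])
print(table)

# margins=True appends row and column totals under the label "All"
table_with_totals = pd.crosstab(df["gender"], df["preference"], margins=True)
print(table_with_totals)

# normalize="index" turns each row into proportions that sum to 1
row_props = pd.crosstab(df["gender"], df["preference"], normalize="index")
print(row_props)
```

Row proportions are often easier to compare than raw counts when the row totals differ, because they show the distribution of one variable within each category of the other.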

## Chi-Square Test for Independence
The chi-square test for independence is used to determine whether there is a statistically significant association between two categorical variables. It compares the observed frequencies in a contingency table with the frequencies that would be expected if the variables were independent.
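To make the comparison concrete, the sketch below computes the expected counts and the Pearson chi-square statistic by hand for the 2x2 table of counts used in this section. Under independence, the expected count of a cell is (row total × column total) / grand total, and the statistic sums (observed − expected)² / expected over all cells. The variable names are illustrative, and the SciPy call is included only as a cross-check.

```python
import numpy as np
from scipy import stats

observed = np.array([[15, 5],
                     [3, 16]])

# Marginal totals, kept 2-D so they broadcast into a full table
row_totals = observed.sum(axis=1, keepdims=True)  # shape (2, 1)
col_totals = observed.sum(axis=0, keepdims=True)  # shape (1, 2)
n = observed.sum()

# Expected counts if the two variables were independent
expected_manual = row_totals * col_totals / n

# Pearson chi-square statistic (no continuity correction)
chi2_manual = ((observed - expected_manual) ** 2 / expected_manual).sum()

# Degrees of freedom: (rows - 1) * (columns - 1)
dof_manual = (observed.shape[0] - 1) * (observed.shape[1] - 1)

# p-value: upper tail of the chi-square distribution
p_manual = stats.chi2.sf(chi2_manual, dof_manual)

# Cross-check against SciPy with the continuity correction switched off
chi2_ref, p_ref, dof_ref, expected_ref = stats.chi2_contingency(
    observed, correction=False
)
print(chi2_manual, chi2_ref)
```

By default, `scipy.stats.chi2_contingency` applies Yates' continuity correction to 2x2 tables, which gives a somewhat smaller statistic than this hand calculation. Passing `correction=False` reproduces the uncorrected value.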

```python
import numpy as np
import scipy.stats as stats

# Example data: a 2x2 table of observed counts
observed = np.array([[15, 5],
                     [3, 16]])

# Perform chi-square test for independence.
# For 2x2 tables, chi2_contingency applies Yates' continuity correction by default.
chi2, p_value, dof, expected = stats.chi2_contingency(observed)
print("Chi-Square Test Results:")
print("Chi-square statistic:", round(chi2, 2))
print("p-value:", round(p_value, 4))  # two decimals would display a misleading 0.0
print("Degrees of Freedom:", dof)
print("Expected Frequencies:")
print(expected)
```

```
Chi-Square Test Results:
Chi-square statistic: 11.47
p-value: 0.0007
Degrees of Freedom: 1
Expected Frequencies:
[[ 9.23076923 10.76923077]
 [ 8.76923077 10.23076923]]
```


## Interpreting Chi-Square Statistic and p-value
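- Chi-square statistic: Measures how far the observed frequencies deviate from the frequencies expected under independence. Larger values indicate a greater departure from independence.
- p-value: The probability of obtaining a chi-square statistic at least as large as the one observed if the variables were truly independent.
- If the p-value is below a predetermined significance level (e.g., 0.05), we reject the null hypothesis of independence and conclude that there is evidence of an association. Otherwise, we fail to reject the null hypothesis, which is not the same as proving independence. In the example above, the p-value is well below 0.05, so the data provide evidence of an association between the two variables.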
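
The decision rule above can be sketched in code as follows. The significance level `alpha = 0.05` is the conventional example value, and the variable name `reject_independence` is illustrative.

```python
import numpy as np
from scipy import stats

alpha = 0.05  # significance level, chosen before looking at the data

observed = np.array([[15, 5],
                     [3, 16]])
chi2, p_value, dof, expected = stats.chi2_contingency(observed)

# Reject the null hypothesis of independence when p_value < alpha
reject_independence = bool(p_value < alpha)

if reject_independence:
    print(f"p = {p_value:.4f} < {alpha}: evidence of an association")
else:
    print(f"p = {p_value:.4f} >= {alpha}: no evidence against independence")
```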