# Variables
- **Explanatory Variable** → Independent  
- **Response Variable** → Dependent  

# Measurement Scale
- **Nominal**: Categorical, no order  
- **Ordinal**: Categorical, with order  
- **Interval**: Meaningful differences, but no true zero  
- **Ratio**: Both meaningful differences and a true zero  



# Tests
- **Chi-Square Test**: Tests categorical variable independence  
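- **Linear Regression**: What is tested depends on the question (slope for a relationship, ρ for correlation)  
  - Regression slope (β): tests whether the slope of the regression equation differs from 0  
    - **H₀**: β = 0 (no linear relationship)
    - **Hₐ**: β ≠ 0 (a linear relationship exists)
    - **Exam notation**: $H_0: \beta = 0$, $H_a: \beta \neq 0$ (two-sided) or, for a one-sided test, e.g. $H_a: \beta < 0$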
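  - Correlation (ρ): Measures the strength and direction of a linear relationship  
    - **ρ = 1**: Perfect positive correlation
    - **ρ = -1**: Perfect negative correlation
    - **ρ = 0**: No linear correlation
    - **Exam notation**: $H_0: \rho = 0$, $H_a: \rho \neq 0$ (two-sided) or, for a one-sided test, e.g. $H_a: \rho < 0$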
- **One-sample t-test**: Compare sample mean to a value  
- **Z-test**: Compare the proportions of two groups: $H_0: \pi_1 = \pi_2$, $H_a: \pi_1 \neq \pi_2$
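
The slope and correlation mentioned above can be computed by hand for a small sample. The sketch below is illustrative only: the function names and the data are made up, and the sample values it returns are estimates of β and ρ, not the population parameters themselves.

```python
import math

def least_squares_slope(xs, ys):
    """Sample slope b of the least-squares line (estimate of beta)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

def pearson_r(xs, ys):
    """Sample Pearson correlation r (estimate of rho)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up data: y rises (or falls) perfectly with x
xs = [1, 2, 3, 4, 5]
ys_up = [2, 4, 6, 8, 10]
ys_down = [10, 8, 6, 4, 2]

print(least_squares_slope(xs, ys_up), pearson_r(xs, ys_up))
print(least_squares_slope(xs, ys_down), pearson_r(xs, ys_down))
```

The sign of the slope and the sign of the correlation always agree, which is why the two pairs of hypotheses point in the same direction.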

# Hypothesis
- **Null Hypothesis (H₀)**: Assume no effect or difference  
- **Alternative Hypothesis (Hₐ)**: The competing claim that an effect or difference exists; it contradicts H₀  

## Regression Hypothesis
- **H₀**: The coefficient (β) of the variable being tested is zero  
- **Hₐ**: β ≠ 0  

- **t-test formula for regression coefficient**:
  - $$
    t = \frac{b \text{ (estimated coefficient)}}{SE \text{ (Standard Error)}}
    $$

- **Determine Critical Value**:
  - Find degrees of freedom  
    - $$
      df = n - (p + 1)
      $$
    - where:
      - *n* = number of observations  
      - *p* = number of explanatory variables, i.e. the $x$ terms in $\hat{y} = b_0 + b_1x_1 + b_2x_2 + \dots + b_px_p$
  - Find critical *t*-value  

- **Compare |t| with t-critical**:
  - If $|t| > t_c$:
    - Reject H₀  
  - If $|t| \leq t_c$:
    - Fail to reject H₀ (this does not prove that H₀ is true)  

- **P-value**:
  - If **p < α**:
    - Reject H₀  
  - If **p ≥ α**:
    - Fail to reject H₀  
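
The steps above can be sketched in a few lines. This is a minimal illustration, not a full regression routine: the function name and all input numbers are hypothetical, and the critical value is assumed to be looked up in a t-table (2.086 is the two-sided value for α = 0.05 with 20 degrees of freedom).

```python
def regression_t_test(b, se, n, p, t_crit):
    """Test H0: beta = 0 for one coefficient.

    b      : estimated coefficient
    se     : standard error of b
    n      : number of observations
    p      : number of explanatory variables in the model
    t_crit : critical t-value from a table for the chosen alpha and df
    """
    df = n - (p + 1)            # degrees of freedom
    t = b / se                  # test statistic
    reject = abs(t) > t_crit    # two-sided decision rule
    return t, df, reject

# Hypothetical example: b = 2.5, SE = 1.0, 23 observations, 2 explanatory variables
t, df, reject = regression_t_test(b=2.5, se=1.0, n=23, p=2, t_crit=2.086)
print(t, df, reject)
```

With these made-up inputs, df = 20 and |t| = 2.5 exceeds 2.086, so H₀ would be rejected at α = 0.05.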


# Running Another Model on Regression
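- **F-test** (nested models, where the reduced model uses a subset of the full model's variables): Compare the **full model** with a reduced model (fewer variables)
- **Non-nested models** (neither model's variables are a subset of the other's): Compare with AIC, BIC, or MDL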
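
The notes do not give a formula for the F-test, so the sketch below assumes the standard partial F statistic based on residual sums of squares (RSS). The function name and all numbers are hypothetical; the resulting F still has to be compared with a critical value from an F-table with (q, df_full) degrees of freedom.

```python
def partial_f(rss_reduced, rss_full, n, p_full, p_reduced):
    """Partial F statistic for comparing a reduced model with a full model.

    rss_reduced, rss_full : residual sums of squares of the two models
    n                     : number of observations
    p_full, p_reduced     : number of explanatory variables in each model
    """
    q = p_full - p_reduced          # number of variables dropped
    df_full = n - (p_full + 1)      # residual df of the full model
    f = ((rss_reduced - rss_full) / q) / (rss_full / df_full)
    return f, q, df_full

# Hypothetical example: dropping 2 of 3 variables raises RSS from 100 to 120
f, q, df_full = partial_f(rss_reduced=120.0, rss_full=100.0, n=30, p_full=3, p_reduced=1)
print(f, q, df_full)
```

A large F means the dropped variables explained a meaningful share of the variation, which favors the full model.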

## P-value Interpretation
Probability of observing data at least as extreme as the sample if the null hypothesis is true
- **Low (p ≤ 0.05)** → Reject H₀  
- **High (p > 0.05)** → Fail to reject H₀  

## Hypothesis Test Type
- **False Positive (Type I error)**: H₀ is true, but the test rejects H₀  
- Example: If α = 0.01, then there is a **0.01 probability** of a false positive when H₀ is true  
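
The link between α and the false positive rate can be checked with a small simulation. This is a sketch under stated assumptions: the test statistic is taken to be a standard normal z when H₀ is true, 2.576 is the approximate two-sided critical value for α = 0.01, and the function name is made up.

```python
import random

def false_positive_rate(n_trials, z_crit, seed=0):
    """Share of simulated tests that reject H0 although H0 is true."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_trials):
        z = rng.gauss(0.0, 1.0)   # z statistic drawn under H0
        if abs(z) > z_crit:       # two-sided rejection rule
            rejections += 1
    return rejections / n_trials

rate = false_positive_rate(n_trials=100_000, z_crit=2.576)
print(rate)  # should land close to alpha = 0.01
```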


# Showing Randomness
**Z-score Formula:**
$$
Z = \frac{\hat{\pi} - \pi}{\sigma_{\hat{\pi}}}
$$

where:  
- $\hat{\pi}$ = observed sample proportion (occurrences / $n$)  
- $\pi$ = proportion expected under H₀  
- $n$ = sample size  
- $\sigma_{\hat{\pi}} = \sqrt{\frac{\pi(1 - \pi)}{n}}$  
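
A worked example of the Z-score formula, as a minimal sketch: the function name and the counts are hypothetical, and the two-sided p-value is derived from the standard normal distribution via `math.erfc`.

```python
import math

def proportion_z(successes, n, pi_0):
    """Z-score and two-sided p-value for H0: pi = pi_0."""
    pi_hat = successes / n                      # observed proportion
    se = math.sqrt(pi_0 * (1 - pi_0) / n)       # standard error under H0
    z = (pi_hat - pi_0) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # P(|Z| >= |z|) for standard normal Z
    return z, p_value

# Hypothetical example: 60 occurrences in 100 trials, chance level 0.5
z, p_value = proportion_z(successes=60, n=100, pi_0=0.5)
print(round(z, 3), round(p_value, 4))
```

Here the observed proportion is 0.6, the standard error is 0.05, and z is about 2, so the result would be judged unlikely to be pure chance at α = 0.05 but not at α = 0.01.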

# Publication Bias
- Studies with **non-significant p-values** are **less likely** to be submitted  

# Standard Residual Formula
$$
\text{Standard Residual} = \frac{O - E}{SE}
$$

where:  
- **O** = observed value  
- **E** = (row total × column total) / grand total  
- **Standard Error (SE)** = $\sqrt{E \times ( 1 - \frac{\text{row total}}{\text{grand total}}) \times (1 - \frac{\text{column total}}{\text{grand total}})}$
- **Large absolute residuals** (roughly beyond ±2) indicate cells that **deviate significantly** from what independence predicts  
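
The formula can be applied cell by cell to a contingency table. The sketch below uses a made-up 2×2 table and a hypothetical function name; it follows the definitions of O, E, and SE given above.

```python
import math

def standard_residuals(table):
    """Standard residual (O - E) / SE for every cell of a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    residuals = []
    for i, row in enumerate(table):
        out_row = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            se = math.sqrt(
                expected
                * (1 - row_totals[i] / grand)
                * (1 - col_totals[j] / grand)
            )
            out_row.append((observed - expected) / se)
        residuals.append(out_row)
    return residuals

# Made-up counts: rows are groups, columns are outcomes
table = [[30, 10],
         [20, 40]]
residuals = standard_residuals(table)
for row in residuals:
    print([round(r, 3) for r in row])
```

For the top-left cell, E = 40 × 50 / 100 = 20 and SE = √(20 × 0.6 × 0.5) = √6, so the residual is 10 / √6, about 4.08.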

# Odds Ratio (OR)
- Measures **association between two categorical variables**  
$$
\text{Odds} = \frac{P(\text{event happens})}{P(\text{event does not happen})}
$$
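- $\text{OR} = \frac{\text{Odds Group 1}}{\text{Odds Group 2}}$ 
  - **OR = 1**: No association
  - **OR > 1**: The odds of the event in Group 1 are OR times the odds in Group 2  
  - **OR < 1**: The odds of the event are lower in Group 1 than in Group 2  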
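
A short numeric example of odds and the odds ratio. The probabilities are made up and the function names are hypothetical.

```python
def odds(p):
    """Odds of an event with probability p."""
    return p / (1 - p)

def odds_ratio(p1, p2):
    """Odds of the event in Group 1 divided by the odds in Group 2."""
    return odds(p1) / odds(p2)

# Hypothetical event probabilities in the two groups
p1, p2 = 0.5, 0.2
print(odds(p1), odds(p2))   # odds per group
print(odds_ratio(p1, p2))   # ratio of odds
print(p1 / p2)              # ratio of probabilities, for comparison
```

Note that the odds ratio compares odds, not probabilities: with these numbers the ratio of probabilities is 2.5, while the odds ratio is about 4.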

# Simpson’s Paradox
- **Association between two variables (X & Y) reverses direction when a third variable (Z) is considered**  
- **Best visualized using a scatter plot** with the points grouped (e.g. colored) by Z
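
The reversal can be reproduced with a few counts. The numbers below are invented purely for illustration: X is a treatment (A or B), Y is success, and Z is a grouping variable with two levels.

```python
# Made-up counts: (successes, trials) per treatment within each level of Z
data = {
    "small": {"A": (9, 10), "B": (80, 100)},
    "large": {"A": (30, 100), "B": (2, 10)},
}

def rate_within(data, z_level, treatment):
    """Success rate of one treatment inside one level of Z."""
    successes, trials = data[z_level][treatment]
    return successes / trials

def rate_pooled(data, treatment):
    """Success rate of one treatment when Z is ignored."""
    successes = sum(group[treatment][0] for group in data.values())
    trials = sum(group[treatment][1] for group in data.values())
    return successes / trials

for z_level in data:
    print(z_level, rate_within(data, z_level, "A"), rate_within(data, z_level, "B"))
print("pooled", rate_pooled(data, "A"), rate_pooled(data, "B"))
```

Within each level of Z, treatment A has the higher success rate (0.9 vs 0.8 and 0.3 vs 0.2), yet pooled over Z, treatment B looks better (39/110 vs 82/110), because A was mostly applied in the harder group.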

### Observational vs Experimental studies

In observational studies, the researcher only collects data, without changing
anything about the situation. In experimental studies, the researcher intervenes
by exposing different groups to different conditions. Because the groups are formed
at random, they differ systematically only in the condition they receive, so the
effects of the conditions can be compared. Causal relationships can therefore be
demonstrated only with experimental studies, not with observational studies.

- **Sampling frame**: The list or database of individuals from which the sample is actually drawn; ideally it covers the entire population.
