
## 1. **Descriptive vs Inferential Statistics**

* **Descriptive statistics** summarize data (mean, median, standard deviation, graphs).
* **Inferential statistics** allow us to make **predictions or decisions** about a population from a sample using probability theory.


---

## 2. **Sampling & Central Limit Theorem (CLT)**

* **Sampling distributions**: If we take many random samples and calculate a statistic (e.g., sample mean), it forms its own distribution.
* **Central Limit Theorem**: For a population with finite variance, whatever its shape, the distribution of the sample mean approaches a **normal distribution** as the sample size increases (n ≥ 30 is a common rule of thumb; strongly skewed populations may need more).

📌 This is the foundation for hypothesis testing and confidence intervals.
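A small simulation can make the theorem concrete. The sketch below assumes an exponential population with mean 2 (chosen here purely for illustration), draws many samples of size 30, and compares the spread of the sample means with σ/√n. The function name `simulate_sample_means` and all parameter values are hypothetical.

```python
import numpy as np

def simulate_sample_means(n=30, n_samples=10_000, scale=2.0, seed=0):
    """Draw n_samples samples of size n from a skewed (exponential)
    population and return the mean of each sample."""
    rng = np.random.default_rng(seed)
    samples = rng.exponential(scale=scale, size=(n_samples, n))
    return samples.mean(axis=1)

scale = 2.0  # exponential population: mean = scale, std = scale
n = 30
means = simulate_sample_means(n=n, scale=scale)

# The sample means should centre on the population mean,
# with a spread close to sigma / sqrt(n).
print("Mean of sample means:", means.mean())
print("Std of sample means: ", means.std(ddof=1))
print("sigma / sqrt(n):     ", scale / np.sqrt(n))
```

A histogram of `means` would look roughly bell-shaped even though the underlying exponential population is strongly right-skewed.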

---

## 3. **Confidence Intervals (CI)**

* A **confidence interval** gives a range of plausible values for the true population parameter, built so that a stated share of such intervals (e.g., 95%) would contain it across repeated samples.
* Example: “We are 95% confident that the average weight of students lies between 58 and 62 kg.”

Formula (for the mean when σ is known):
$$
CI = \bar{x} \pm Z \times \frac{\sigma}{\sqrt{n}}
$$
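As a rough illustration of the formula, the sketch below computes a two-sided interval with `scipy.stats.norm.ppf` supplying the critical value Z (about 1.96 for 95% confidence). The helper name `z_confidence_interval` and the student-weight numbers are assumptions made up for this example.

```python
import numpy as np
from scipy import stats

def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Two-sided CI for the mean when the population sigma is known."""
    # Critical value of the standard normal, e.g. about 1.96 for 95%
    z = stats.norm.ppf(1 - (1 - confidence) / 2)
    margin = z * sigma / np.sqrt(n)
    return sample_mean - margin, sample_mean + margin

# Hypothetical numbers: 100 students, sample mean 60 kg, known sigma 10 kg
low, high = z_confidence_interval(sample_mean=60.0, sigma=10.0, n=100)
print(f"95% CI: ({low:.2f}, {high:.2f})")  # 95% CI: (58.04, 61.96)
```

Raising `confidence` or lowering `n` widens the interval, since both increase the margin Z·σ/√n.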

---

## 4. **Hypothesis Testing**

Used to make decisions about a population using sample data.

* **Steps**:

  1. State hypotheses:

     * Null hypothesis $(H_0)$: no effect, no difference.
     * Alternative hypothesis $(H_1)$: effect exists.
  2. Choose significance level (commonly α = 0.05).
  3. Compute test statistic (t, z, χ², F).
  4. Compare p-value with α.
  5. Make decision (reject $H_0$ if p < α; otherwise fail to reject $H_0$).

* **Examples**:

  * Z-test (large samples, known variance).
  * T-test (small samples, unknown variance).
  * Paired vs independent samples t-tests.
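The five steps can be traced by hand for a one-sample t-test. This is a minimal sketch, assuming a two-sided alternative and a small illustrative data set; the function name `one_sample_t_test` is hypothetical, and `scipy.stats.ttest_1samp` is used only as a cross-check.

```python
import numpy as np
from scipy import stats

def one_sample_t_test(data, mu0, alpha=0.05):
    """Steps 1-5 for H0: mean == mu0 versus H1: mean != mu0."""
    x = np.asarray(data, dtype=float)
    n = x.size
    # Step 3: test statistic t = (x_bar - mu0) / (s / sqrt(n))
    t_stat = (x.mean() - mu0) / (x.std(ddof=1) / np.sqrt(n))
    # Step 4: two-sided p-value from the t distribution with n - 1 df
    p_value = 2 * stats.t.sf(abs(t_stat), df=n - 1)
    # Step 5: reject H0 only if the p-value falls below alpha
    reject = bool(p_value < alpha)
    return t_stat, p_value, reject

scores = [12, 15, 14, 10, 13, 17, 14]  # illustrative values
t_stat, p_value, reject = one_sample_t_test(scores, mu0=12)
print("t =", round(t_stat, 3), "p =", round(p_value, 3))
print("Reject H0" if reject else "Fail to reject H0")

# Cross-check against SciPy's built-in test
print(stats.ttest_1samp(scores, 12))
```

For paired samples, the same logic applies to the per-pair differences (`scipy.stats.ttest_rel`), while independent groups use `scipy.stats.ttest_ind`.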

---

## 5. **Types of Statistical Tests**

* **Parametric tests** (assume normality, equal variances):

  * t-test (compare means).
  * ANOVA (compare means of 3+ groups).
  * Pearson correlation (linear relationship).
* **Non-parametric tests** (no strong assumptions):

  * Mann-Whitney U test.
  * Kruskal-Wallis test.
  * Spearman correlation.
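To show how each parametric test pairs with a rank-based counterpart, the sketch below runs both on hypothetical log-normal data, where the normality assumption is doubtful. The group sizes, distribution parameters, and seed are assumptions chosen for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Hypothetical skewed data for three groups
a = rng.lognormal(mean=0.0, sigma=1.0, size=25)
b = rng.lognormal(mean=0.5, sigma=1.0, size=25)
c = rng.lognormal(mean=1.0, sigma=1.0, size=25)

# Two groups: t-test versus Mann-Whitney U
t_res = stats.ttest_ind(a, b)
u_res = stats.mannwhitneyu(a, b, alternative="two-sided")
print("t-test p:", t_res.pvalue, "| Mann-Whitney p:", u_res.pvalue)

# Three groups: one-way ANOVA versus Kruskal-Wallis
f_res = stats.f_oneway(a, b, c)
h_res = stats.kruskal(a, b, c)
print("ANOVA p:", f_res.pvalue, "| Kruskal-Wallis p:", h_res.pvalue)

# Association: Pearson (linear) versus Spearman (monotonic, rank-based)
x = np.arange(1, 21)
y = np.exp(x / 4)  # perfectly monotonic, but not linear
r_pearson, _ = stats.pearsonr(x, y)
r_spearman, _ = stats.spearmanr(x, y)
print("Pearson r:", r_pearson, "| Spearman rho:", r_spearman)
```

Spearman's coefficient is 1 for any strictly increasing relationship, while Pearson's falls below 1 as soon as the relationship curves.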

---

## 6. **Analysis of Variance (ANOVA)**

* Used to test differences among **three or more groups**.
* Example: “Does average exam score differ by teaching method?”
* Variability is split into:

  * **Between-group variance** (differences due to treatment).
  * **Within-group variance** (random error).
* F-ratio = Between variance ÷ Within variance.
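The variance split can be computed by hand and checked against `scipy.stats.f_oneway`. In this sketch the "variances" are mean squares: each sum of squares divided by its degrees of freedom. The function name `f_ratio` and the exam scores are made up for illustration.

```python
import numpy as np
from scipy import stats

def f_ratio(*groups):
    """One-way ANOVA F = between-group mean square / within-group mean square."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    k = len(groups)
    n_total = sum(g.size for g in groups)
    grand_mean = np.concatenate(groups).mean()
    # Between-group: how far each group mean sits from the grand mean
    ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
    # Within-group: spread of observations around their own group mean
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within

# Hypothetical exam scores for three teaching methods
lecture = [70, 72, 68, 75, 71]
online = [65, 67, 70, 64, 66]
hybrid = [78, 80, 76, 82, 79]

f_manual = f_ratio(lecture, online, hybrid)
f_scipy = stats.f_oneway(lecture, online, hybrid)
print("Manual F:", f_manual)
print("SciPy F: ", f_scipy.statistic, "p-value:", f_scipy.pvalue)
```

A large F means the group means differ by more than the within-group noise would explain.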

---

## 7. **Chi-Square Tests**

* For categorical data.
* **Goodness of fit test**: Does observed frequency match expected?
* **Test of independence**: Are two categorical variables independent?

Example: “Is gender independent of preferred social media platform?”
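A goodness-of-fit test can be sketched with `scipy.stats.chisquare`. The die-roll counts below are hypothetical, and the fair-die expectation of 20 per face is an assumption of the example.

```python
import numpy as np
from scipy import stats

# Hypothetical die rolls: 120 throws, counts for faces 1 to 6
observed = np.array([18, 22, 16, 25, 19, 20])
expected = np.full(6, observed.sum() / 6)  # fair die: 20 per face

# Chi-square statistic by hand: sum of (O - E)^2 / E
chi2_manual = ((observed - expected) ** 2 / expected).sum()

# Same statistic plus a p-value from SciPy (df = 6 - 1 = 5)
chi2, p_value = stats.chisquare(observed, f_exp=expected)
print("Chi-square:", chi2, "p-value:", p_value)  # Chi-square: 2.5
```

A large p-value here means the observed counts are consistent with a fair die. For a test of independence on a contingency table, `scipy.stats.chi2_contingency` is the analogous call.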

---

## 8. **Correlation & Regression**

* **Correlation**: Strength & direction of relationship between two variables.

  * Pearson (linear, continuous).
  * Spearman (monotonic, ranked).
* **Simple Linear Regression**: Predict $Y$ from $X$.
  $$
  Y = \beta_0 + \beta_1 X + \epsilon
  $$
* **Multiple Regression**: Multiple predictors.
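A simple linear regression can be fitted with `scipy.stats.linregress`, which also reports the Pearson correlation. The study-hours data below are simulated under assumed values (β₀ = 2.0, β₁ = 0.05, noise standard deviation 0.2) chosen only for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Hypothetical data: GPA rises about 0.05 per weekly study hour, plus noise
hours = rng.uniform(0, 30, size=50)
gpa = 2.0 + 0.05 * hours + rng.normal(0, 0.2, size=50)

fit = stats.linregress(hours, gpa)
print("Intercept (beta_0):", fit.intercept)
print("Slope (beta_1):    ", fit.slope)
print("Pearson r:", fit.rvalue, "R^2:", fit.rvalue ** 2)
print("p-value for slope: ", fit.pvalue)

# Predicted GPA for a student studying 20 hours per week
predicted = fit.intercept + fit.slope * 20
print("Predicted GPA at 20 h:", predicted)
```

The fitted slope and intercept should land near the assumed 0.05 and 2.0. With several predictors, a least-squares fit on a design matrix (for example `numpy.linalg.lstsq`) plays the same role.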

---

## 9. **Effect Size & Power**

* **Effect size**: Magnitude of difference or relationship (Cohen’s d, R²).
* **Statistical power**: Probability of rejecting $H_0$ when it is false (1 − β). Higher with larger sample size, bigger effect, lower noise.
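Both ideas can be sketched numerically. The example below computes Cohen's d with a pooled standard deviation and the power of a two-sided independent-samples t-test from the noncentral t distribution (`scipy.stats.nct`). The helper names `cohens_d` and `t_test_power` are hypothetical, and equal group sizes are assumed.

```python
import numpy as np
from scipy import stats

def cohens_d(x, y):
    """Cohen's d for two independent samples, using the pooled SD."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    nx, ny = x.size, y.size
    pooled_var = ((nx - 1) * x.var(ddof=1) + (ny - 1) * y.var(ddof=1)) / (nx + ny - 2)
    return (x.mean() - y.mean()) / np.sqrt(pooled_var)

def t_test_power(d, n_per_group, alpha=0.05):
    """Power of a two-sided t-test with equal group sizes and true effect d."""
    df = 2 * n_per_group - 2
    nc = d * np.sqrt(n_per_group / 2)  # noncentrality parameter
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    # P(|T| > t_crit) when T follows a noncentral t distribution
    return stats.nct.sf(t_crit, df, nc) + stats.nct.cdf(-t_crit, df, nc)

print("Cohen's d:", cohens_d([1, 2, 3], [2, 3, 4]))  # -1.0

# Power grows with sample size for a fixed medium effect (d = 0.5)
for n in (20, 64, 100):
    print(f"n per group = {n:3d}, power = {t_test_power(0.5, n):.3f}")
```

Under these assumptions, about 64 participants per group give roughly 80% power to detect d = 0.5 at α = 0.05.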

---

## 10. **Intermediate Applications**

* Comparing two medicines (t-test).
* Comparing exam scores across schools (ANOVA).
* Relationship between study hours and GPA (regression).
* Association between gender and job type (chi-square).

---

# 🐍 Python Examples



In [1]:
import numpy as np
import scipy.stats as stats

In [2]:
# Example 1: One-sample t-test
data = [12, 15, 14, 10, 13, 17, 14]
t_stat, p_val = stats.ttest_1samp(data, 12)
print("T-statistic:", t_stat, "P-value:", p_val)

T-statistic: 1.8682571063385824 P-value: 0.1109465553113496


In [3]:
# Example 2: Independent samples t-test
# No random seed is set, so the printed values change from run to run
group1 = np.random.normal(50, 10, 30)
group2 = np.random.normal(55, 10, 30)
t_stat, p_val = stats.ttest_ind(group1, group2)
print("T-test between groups:", t_stat, p_val)

T-test between groups: -3.2838906026099455 0.001738585145597337


In [4]:
# Example 3: Chi-square test
# For a 2x2 table, chi2_contingency applies Yates' continuity correction by default
observed = np.array([[10, 20], [30, 40]])
chi2, p, dof, expected = stats.chi2_contingency(observed)
print("Chi-square:", chi2, "p-value:", p)

Chi-square: 0.4464285714285714 p-value: 0.5040358664525046
