# Introduction to Statistics

Data Science is not just about coding; it's about **backing up claims with evidence**. Statistics provides the mathematical tools to judge whether a result is likely to be real or could plausibly be explained by random luck.

## Learning Objectives
- **Descriptive vs. Inferential**: Knowing the difference between summarizing the data you *have* (descriptive) and using a sample to draw conclusions about a larger population you *don't have* (inferential).
- **The T-Test**: A widely used test of whether the means of two groups differ by more than random chance would explain (e.g., "Does the new medicine work better than the old one?").
- **Correlation**: Measuring how strongly two things are related (e.g., "Do taller people weigh more?").

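```python
import numpy as np
from scipy import stats

# Fix the random seed so the results are the same on every run
rng = np.random.default_rng(42)

# Generate two groups of 30 observations each, both with standard deviation 10
group_a = rng.normal(100, 10, 30)  # population mean 100
group_b = rng.normal(110, 10, 30)  # population mean 110
```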
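The first learning objective has no code of its own, so here is a minimal sketch of the distinction. The sample is simulated (the seed, the values, and the variable names are illustrative assumptions): the descriptive statistics summarize the 30 values we actually have, while the 95% confidence interval is an inferential statement about a population mean we never observe directly.

```python
import numpy as np
from scipy import stats

# Hypothetical sample: 30 scores drawn from a population with mean 100, SD 10
rng = np.random.default_rng(0)
sample = rng.normal(100, 10, 30)

# Descriptive statistics: summarize the data we actually have
mean = sample.mean()
median = np.median(sample)
sd = sample.std(ddof=1)  # sample standard deviation (divides by n - 1)
print(f"Mean: {mean:.2f}  Median: {median:.2f}  SD: {sd:.2f}")

# Inferential statistics: estimate the unknown population mean
# with a 95% confidence interval based on the t-distribution
sem = stats.sem(sample)  # standard error of the mean = sd / sqrt(n)
low, high = stats.t.interval(0.95, len(sample) - 1, loc=mean, scale=sem)
print(f"95% CI for the population mean: ({low:.2f}, {high:.2f})")
```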

## 2. The T-Test (Hypothesis Testing)

Imagine you have two groups of students. Group A used a new textbook, and Group B used the old one. Group A got slightly higher scores. Is the new book actually better, or was it just luck?

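The **T-Test** answers this. It gives you a **P-Value**: the probability of seeing a difference at least as large as the one observed (in either direction), *assuming* the two groups really have the same mean, i.e., that any difference is pure luck. The conventional cutoff is 0.05:
*   **P-Value < 0.05**: A difference this large would rarely happen by luck alone, so we call it **statistically significant**. This is strong evidence, not proof: even when two groups are truly identical, about 1 test in 20 lands here.
*   **P-Value ≥ 0.05**: The difference could plausibly be luck. We cannot conclude that the groups differ, though this does not prove they are the same.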
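To see what the 0.05 cutoff means in practice, a small simulation can count how often the t-test reports p < 0.05 when there is *no* real difference. The setup is an assumption chosen for illustration: 2,000 imaginary experiments, each comparing two groups of 30 drawn from the same population. The share of "significant" results should come out near 5%, which is exactly the false-positive rate the threshold accepts.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_experiments, n_per_group = 2000, 30

# Both groups come from the SAME population (mean 100, SD 10),
# so every "significant" result here is a false positive caused by luck
a = rng.normal(100, 10, size=(n_experiments, n_per_group))
b = rng.normal(100, 10, size=(n_experiments, n_per_group))

# Run one t-test per row (one experiment per row) in a single vectorized call
_, p_values = stats.ttest_ind(a, b, axis=1)

false_positive_rate = np.mean(p_values < 0.05)
print(f"Share of experiments with p < 0.05: {false_positive_rate:.3f}")  # expected near 0.05
```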

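```python
# Independent two-sample t-test. By default this is Student's t-test, which
# assumes equal variances; pass equal_var=False for Welch's t-test instead.
t_stat, p_val = stats.ttest_ind(group_a, group_b)

print(f"t-statistic: {t_stat:.3f}")
print(f"P-Value: {p_val:.5f}")
if p_val < 0.05:
    print("Result: The difference in means is statistically significant at the 5% level.")
else:
    print("Result: Not enough evidence to say the means differ.")
```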
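It can help to see what `ttest_ind` computes under the hood. With its default `equal_var=True`, it runs Student's two-sample t-test with a pooled variance. The sketch below recomputes the t-statistic and the two-sided p-value by hand (regenerating the two groups with an assumed seed so the block runs on its own) and checks them against SciPy.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
group_a = rng.normal(100, 10, 30)
group_b = rng.normal(110, 10, 30)

n1, n2 = len(group_a), len(group_b)
m1, m2 = group_a.mean(), group_b.mean()
v1, v2 = group_a.var(ddof=1), group_b.var(ddof=1)  # sample variances

# Pooled variance: a weighted average of the two sample variances
pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)

# t = difference in means divided by the standard error of that difference
t_manual = (m1 - m2) / np.sqrt(pooled_var * (1 / n1 + 1 / n2))

# Two-sided p-value from the t-distribution with n1 + n2 - 2 degrees of freedom
df = n1 + n2 - 2
p_manual = 2 * stats.t.sf(abs(t_manual), df)

t_scipy, p_scipy = stats.ttest_ind(group_a, group_b)
print(f"manual: t={t_manual:.4f}, p={p_manual:.6f}")
print(f"scipy:  t={t_scipy:.4f}, p={p_scipy:.6f}")
```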

## 3. Correlation

Correlation (here, Pearson's correlation coefficient *r*) measures the strength and direction of the *linear* relationship between two variables. It is a number between **-1** and **1**.

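*   **1.0 (Perfect Positive)**: All points lie exactly on an upward-sloping line; as X goes up, Y goes up. Real data only approach this (e.g., Height and Weight are positively correlated, but far from perfectly).
*   **0.0 (No Correlation)**: No linear pattern (e.g., Shoe size and IQ).
*   **-1.0 (Perfect Negative)**: All points lie exactly on a downward-sloping line; as X goes up, Y goes down. Real data only approach this (e.g., for a fixed distance, higher Speed means shorter Travel Time, a strong but not perfectly linear negative relationship).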
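The definition behind these numbers is short: Pearson's r is the covariance of X and Y divided by the product of their standard deviations. The sketch below implements it in a helper, `pearson_r` (a name chosen here for illustration, not a library function), and uses it to show two points from the list: a perfect straight line gives exactly -1, while a perfect *curved* relationship can give 0, because r only detects linear patterns.

```python
import numpy as np
from scipy import stats

def pearson_r(x, y):
    """Pearson's r: covariance of x and y divided by the product of their SDs."""
    dx, dy = x - x.mean(), y - y.mean()
    return np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2))

x = np.arange(-5, 6).astype(float)

# A perfect straight line with a downward slope gives r = -1
r_line = pearson_r(x, -3 * x + 7)

# A perfect but curved relationship: y is fully determined by x, yet r = 0
r_curve = pearson_r(x, x**2)

print(f"straight line: r = {r_line:.2f}")
print(f"parabola:      r = {r_curve:.2f}")

# Cross-check the hand-written formula against SciPy on noisy data
rng = np.random.default_rng(0)
y_noisy = 2 * x + rng.normal(0, 3, x.size)
r_scipy, _ = stats.pearsonr(x, y_noisy)
print("matches scipy:", np.isclose(pearson_r(x, y_noisy), r_scipy))
```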

**Warning:** "Correlation does not imply Causation." Ice cream sales and shark attacks both go up in summer, but ice cream does not cause shark attacks: a third factor (hot weather, which brings more people to the beach) drives both.
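A toy simulation makes the shark example concrete. Every number here is invented for illustration: temperature drives both ice cream sales and shark attacks, and there is no direct link between the two. They still correlate strongly, but once the effect of temperature is removed from each (a partial correlation, computed here from the residuals of a straight-line fit on temperature), the correlation drops to roughly zero.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n_days = 365

# Hypothetical toy model: temperature drives BOTH variables; neither causes the other
temperature = rng.uniform(10, 35, n_days)                # degrees Celsius
ice_cream = 5 * temperature + rng.normal(0, 10, n_days)  # made-up daily sales
sharks = 0.2 * temperature + rng.normal(0, 1, n_days)    # made-up attack index

raw_r, _ = stats.pearsonr(ice_cream, sharks)

def residuals(y, x):
    """What is left of y after removing a straight-line fit on x."""
    slope, intercept = np.polyfit(x, y, 1)
    return y - (slope * x + intercept)

# Partial correlation: correlate what remains after accounting for temperature
partial_r, _ = stats.pearsonr(residuals(ice_cream, temperature),
                              residuals(sharks, temperature))

print(f"Ice cream vs. sharks:              r = {raw_r:.2f}")
print(f"Same, controlling for temperature: r = {partial_r:.2f}")
```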

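```python
# x is 0..9; y is roughly 2x plus a little Gaussian noise
rng = np.random.default_rng(0)
x = np.arange(10)
y = x * 2 + rng.normal(0, 1, 10)

# pearsonr returns the correlation coefficient and a p-value (ignored here)
corr, _ = stats.pearsonr(x, y)
print(f"Correlation: {corr:.2f}")
```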
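Because Pearson's r only measures *linear* association, a relationship that always moves in one direction but bends can score well below 1. Spearman's rank correlation (`scipy.stats.spearmanr`, a different coefficient from the one used above) correlates the ranks of the values instead, so any perfectly monotonic relationship scores exactly 1. A minimal comparison on an exponential curve, chosen here purely for illustration:

```python
import numpy as np
from scipy import stats

x = np.arange(1, 11)
y = np.exp(x)  # always increasing, but strongly curved

r_pearson, _ = stats.pearsonr(x, y)     # linear association
rho_spearman, _ = stats.spearmanr(x, y)  # monotonic association (based on ranks)

print(f"Pearson r:    {r_pearson:.2f}")
print(f"Spearman rho: {rho_spearman:.2f}")
```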