# 🧪 Statistics with Python: Comprehensive Guide

This tutorial covers three building blocks of data science and statistics in Python: basic probability, the array and DataFrame data structures (NumPy and Pandas), and descriptive statistics.

---

## 1. Probability Theory & Definitions

**Probability** is a measure, between 0 and 1, of how likely an event is to occur.

### Core Concepts:
- **Experiment**: A process that leads to one of several possible outcomes (e.g., rolling a die).
- **Sample Space (S)**: The set of all possible outcomes. For a die, $S = \{1, 2, 3, 4, 5, 6\}$.
- **Event (A)**: A subset of the sample space, i.e. a single outcome or a set of outcomes (e.g., rolling an even number).

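### Formula:
When all outcomes in $S$ are equally likely:
$$P(A) = \frac{n(A)}{n(S)}$$
where $n(A)$ is the number of outcomes in $A$ and $n(S)$ is the number of outcomes in $S$.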

### Example: Probability of rolling an even number
- Favorable outcomes (A): $\{2, 4, 6\} \rightarrow n(A) = 3$
- Total outcomes (S): $\{1, 2, 3, 4, 5, 6\} \rightarrow n(S) = 6$
- $P(\text{Even}) = 3/6 = 0.5$
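The worked example above can also be checked by enumerating the sample space directly. This is a minimal sketch that assumes a fair die (all outcomes equally likely); the helper name `probability` is illustrative and not part of any library.

```python
from fractions import Fraction

def probability(event, sample_space):
    """Classical probability n(A) / n(S); assumes equally likely outcomes."""
    sample_space = set(sample_space)
    event = set(event) & sample_space  # ignore outcomes that are not in S
    return Fraction(len(event), len(sample_space))

sample_space = {1, 2, 3, 4, 5, 6}
even = {outcome for outcome in sample_space if outcome % 2 == 0}

p_even = probability(even, sample_space)
print(p_even, float(p_even))  # prints: 1/2 0.5
```

Using `Fraction` keeps the result exact, which makes it easy to compare against the hand calculation before converting to a decimal.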

In [ ]:
import numpy as np

# Simulate rolling a fair six-sided die and estimate P(even)
trials = 100000
rng = np.random.default_rng()
rolls = rng.integers(1, 7, size=trials)  # upper bound 7 is exclusive, so values are 1-6
evens = np.count_nonzero(rolls % 2 == 0)

print(f"Total Trials: {trials}")
print(f"Even Rolls: {evens}")
print(f"Probability Estimate: {evens/trials:.4f}")
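The estimate from a simulation like the one above depends on the number of trials. The sketch below repeats the experiment at a few sample sizes to show how the estimate tends to settle near the theoretical value of 0.5. The fixed seed and the chosen trial counts are arbitrary assumptions made so the run is repeatable.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # arbitrary fixed seed for repeatable output

estimates = {}
for trials in (100, 10_000, 1_000_000):
    rolls = rng.integers(1, 7, size=trials)  # upper bound 7 is exclusive
    estimates[trials] = np.count_nonzero(rolls % 2 == 0) / trials
    print(f"{trials:>9,} trials: estimate = {estimates[trials]:.4f}")
```

Small samples can land noticeably away from 0.5, while larger ones are typically much closer.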

## 2. Arrays & Data Structures

In statistics, we use **Arrays** (NumPy) and **DataFrames** (Pandas) to store and manipulate datasets.

### NumPy Arrays:
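NumPy arrays store elements of a single data type and support vectorized operations, which apply an operation to every element without an explicit Python loop. This makes them efficient for numerical computation.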

In [ ]:
import numpy as np

# Creating a 1D Array (Vector)
scores = np.array([85, 90, 78, 92, 88, 76, 95])
print("Scores Array:", scores)

# Vectorized Operation: Adding 5 bonus points to everyone
bonus_scores = scores + 5
print("Bonus Scores:", bonus_scores)
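Vectorization goes beyond adding a constant. As a rough illustration using the same `scores` data, the sketch below shows element-wise arithmetic against a computed value, boolean masking, and a plain Python loop that produces the same bonus scores for comparison. The variable names are illustrative.

```python
import numpy as np

scores = np.array([85, 90, 78, 92, 88, 76, 95])

# Element-wise arithmetic: each score's deviation from the mean
deviations = scores - scores.mean()

# Boolean mask: keep only the scores of 90 or above
top_scores = scores[scores >= 90]

# Equivalent pure-Python loop for the bonus-point example, for comparison
bonus_loop = [score + 5 for score in scores.tolist()]

print("Deviations:", deviations)
print("Top scores:", top_scores)
print("Loop matches vectorized result:", bonus_loop == (scores + 5).tolist())
```

The loop and the vectorized expression give the same values; the vectorized form is shorter and usually faster on large arrays.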

### Pandas DataFrames:
Well suited to labeled, tabular data such as spreadsheets (Excel) or CSV files.

In [ ]:
import pandas as pd

data = {
    'Student': ['Alice', 'Bob', 'Charlie', 'David'],
    'Math': [85, 70, 95, 80],
    'Science': [90, 88, 75, 82]
}

df = pd.DataFrame(data)
print("Student Records:")
print(df)
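Once the data is in a DataFrame, columns can be combined and rows filtered by condition. The following sketch rebuilds the same student table so it runs on its own; the `Average` column name and the 80-point threshold are illustrative assumptions.

```python
import pandas as pd

df = pd.DataFrame({
    'Student': ['Alice', 'Bob', 'Charlie', 'David'],
    'Math': [85, 70, 95, 80],
    'Science': [90, 88, 75, 82],
})

# Derived column: row-wise mean of the two subject columns
# ('Average' is an illustrative name, not part of the original data)
df['Average'] = df[['Math', 'Science']].mean(axis=1)

# Boolean filtering: students who scored at least 80 in Math
strong_math = df[df['Math'] >= 80]

print(df)
print(strong_math['Student'].tolist())
```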

## 3. Basic Descriptive Statistics

Descriptive statistics are numbers that summarize the main features of a dataset, such as its center and its spread.

### Measures of Central Tendency:
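1. **Mean**: The sum of the values divided by the number of values.
2. **Median**: The middle value of the sorted data (the average of the two middle values when the count is even).
3. **Mode**: The most frequent value.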

### Measures of Dispersion:
1. **Variance**: The average of the squared deviations from the mean.
2. **Standard Deviation**: The square root of the variance, which expresses the spread in the same units as the data.
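To make these definitions concrete, here is a minimal sketch that computes them step by step on the Math scores from the example DataFrame, using only the standard library. It shows both the population variance (dividing by $n$) and the sample variance (dividing by $n - 1$), since the two conventions give different numbers.

```python
import math
import statistics

math_scores = [85, 70, 95, 80]  # the Math column from the example DataFrame
n = len(math_scores)

mean = sum(math_scores) / n
median = statistics.median(math_scores)
squared_deviations = [(x - mean) ** 2 for x in math_scores]

population_variance = sum(squared_deviations) / n    # divide by n
sample_variance = sum(squared_deviations) / (n - 1)  # divide by n - 1
sample_std = math.sqrt(sample_variance)

print(f"Mean: {mean}")                                 # 82.5
print(f"Median: {median}")                             # 82.5
print(f"Population variance: {population_variance}")   # 81.25
print(f"Sample variance: {sample_variance:.2f}")       # 108.33
print(f"Sample standard deviation: {sample_std:.2f}")  # 10.41
```

Pandas' `Series.std()` and `Series.var()` use the sample convention (`ddof=1`) by default, so they match the sample values here rather than the population ones.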

In [ ]:
# Statistical analysis with Pandas (uses `df` from the DataFrame cell above)
math_scores = df['Math']

print(f"Mean Score: {math_scores.mean()}")
print(f"Median Score: {math_scores.median()}")
# Series.std() returns the sample standard deviation (ddof=1) by default
print(f"Standard Deviation: {math_scores.std():.2f}")
print(f"Max Score: {math_scores.max()}")

print("\nFull Summary Statistics:")
print(df.describe())