# Task 4 — Statistics & Practice (Khan Academy / StatQuest)
This notebook implements common statistical functions and demonstrates them with examples and tests.

**Functions implemented**
- `mean(numbers)`
- `median(numbers)`
- `mode(numbers)`
- `variance(numbers)` (population variance)
- `std_dev(numbers)` (population standard deviation)
- `euclidean_distance(x, y)` — supports scalars and lists (vectors)
- `sigmoid(x)`

Run each code cell (Shift+Enter) to see outputs and tests.
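
The list above notes that `variance` and `std_dev` are the population versions. As a rough cross-check, the sketch below uses Python's standard `statistics` module to show how the population and sample formulas differ on the same data. The variable names are chosen for this example only and are not part of the notebook's functions.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # mean is 5, squared deviations sum to 32

# Population statistics divide by N, like variance() and std_dev() in this notebook.
pop_var = statistics.pvariance(data)   # 32 / 8
pop_std = statistics.pstdev(data)      # square root of the population variance

# Sample statistics divide by N - 1 (Bessel's correction), so the result is larger.
sample_var = statistics.variance(data)  # 32 / 7

print("population variance:", pop_var)
print("population std dev:", pop_std)
print("sample variance:", sample_var)
```

Use the population form when the list is the whole group of interest, and the sample form when it is a sample used to estimate a larger population.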


In [None]:
import math
from collections import Counter
# Video links (kept as comments so the cell runs):
# https://youtube.com/shorts/MSFALhzwaKs?si=aU6RlK7JnLep6A-S
# https://youtube.com/shorts/wiXPhTviQE8?si=TOUzNli7Zi0zWTV-

def mean(numbers):
    """Return the arithmetic mean of a non-empty list of numbers."""
    if not numbers:
        raise ValueError("numbers must be non-empty")
    return sum(numbers) / len(numbers)

def median(numbers):
    """Return the median value. For even-length lists returns the average of the two middle values."""
    nums = sorted(numbers)
    n = len(nums)
    if n == 0:
        raise ValueError("numbers must be non-empty")
    mid = n // 2
    if n % 2 == 0:
        return (nums[mid - 1] + nums[mid]) / 2
    return nums[mid]

def mode(numbers):
    """Return a list of mode(s). If two or more distinct values all appear equally often, return None (no mode)."""
    if not numbers:
        raise ValueError("numbers must be non-empty")
    freq = Counter(numbers)
    max_count = max(freq.values())
    modes = [k for k, v in freq.items() if v == max_count]
    # If several distinct values all occur equally often, we consider there to be no mode.
    # A single distinct value (e.g. [4, 4, 4]) is still its own mode.
    if len(freq) > 1 and len(modes) == len(freq):
        return None
    return modes

def variance(numbers):
    """Population variance (divide by N)."""
    if not numbers:
        raise ValueError("numbers must be non-empty")
    mu = mean(numbers)
    return sum((x - mu) ** 2 for x in numbers) / len(numbers)

def std_dev(numbers):
    """Population standard deviation."""
    return math.sqrt(variance(numbers))

def euclidean_distance(x, y):
    """If x and y are numbers, return absolute difference.
    If x and y are lists of equal length, return Euclidean distance between vectors."""
    if isinstance(x, (int, float)) and isinstance(y, (int, float)):
        return abs(x - y)
    if isinstance(x, list) and isinstance(y, list):
        if len(x) != len(y):
            raise ValueError("Lists must have the same length")
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    raise ValueError("Inputs must be both numbers or lists of the same length")

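def sigmoid(x):
    """Return the sigmoid of x (1 / (1 + e^-x)), avoiding overflow for large negative x."""
    if x >= 0:
        return 1 / (1 + math.exp(-x))
    # For x < 0, exp(-x) can overflow; use the equivalent form e^x / (1 + e^x).
    z = math.exp(x)
    return z / (1 + z)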


In [None]:
# --- Example usage and tests ---
numbers = [2, 4, 4, 4, 5, 5, 7, 9]

print("numbers:", numbers)
print("Mean:", mean(numbers))
print("Median:", median(numbers))
print("Mode:", mode(numbers))
print("Variance:", variance(numbers))
print("Standard Deviation:", std_dev(numbers))
print("Euclidean Distance (scalars 5,2):", euclidean_distance(5, 2))
print("Euclidean Distance (lists [1,2,3],[4,5,6]):", euclidean_distance([1,2,3], [4,5,6]))
print("Sigmoid(0):", sigmoid(0))
print("Sigmoid(2):", sigmoid(2))

# --- Simple assertions (basic checks) ---
assert abs(mean(numbers) - (sum(numbers) / len(numbers))) < 1e-9
assert median([1,2,3]) == 2
assert mode([1,1,2,3]) == [1]
assert round(variance([1,2,3]), 10) == round(((1-2)**2 + (2-2)**2 + (3-2)**2)/3, 10)
assert abs(std_dev([1,2,3]) - math.sqrt(variance([1,2,3]))) < 1e-9
assert euclidean_distance(3, 7) == 4
assert abs(euclidean_distance([0,0], [3,4]) - 5) < 1e-9
assert abs(sigmoid(0) - 0.5) < 1e-9

print("\nAll basic assertions passed.")
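
The Euclidean distance checks above can also be compared with the standard library. The following is a minimal sketch, assuming Python 3.8 or later, where `math.dist` is available. The `points` and `query` values are hypothetical and only illustrate a nearest-point lookup.

```python
import math

# math.dist (Python 3.8+) returns the Euclidean distance between two points.
d_3d = math.dist([1, 2, 3], [4, 5, 6])  # sqrt(3**2 + 3**2 + 3**2) = sqrt(27)
d_2d = math.dist([0, 0], [3, 4])        # the 3-4-5 right triangle

# Hypothetical use: pick the point closest to a query point.
points = [[1, 1], [4, 5], [9, 0]]
query = [3, 4]
nearest = min(points, key=lambda p: math.dist(p, query))

print("3D distance:", d_3d)
print("2D distance:", d_2d)
print("nearest point to", query, "is", nearest)
```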


### What did you learn from these formulas?

- **Mean**: the arithmetic average — sensitive to outliers.
- **Median**: the middle value — robust to outliers, useful with skewed data.
- **Mode**: most frequent value(s) — useful for categorical data and to find repeated values.
- **Variance**: average squared deviation from the mean — measures spread but in squared units.
- **Standard Deviation**: square root of variance — spread in original units, easier to interpret.
- **Euclidean Distance**: geometric distance in 1D or nD — widely used in clustering and similarity.
- **Sigmoid**: maps real numbers to (0,1) — helpful for probabilities and binary classification.
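
To make the first two points concrete, here is a small sketch of how one extreme value moves the mean far more than the median. The numbers are made up for illustration, and the example uses the standard `statistics` module so it runs on its own.

```python
import statistics

values = [30, 32, 35, 38, 40]   # hypothetical data
with_outlier = values + [400]   # the same data plus one extreme value

mean_before = statistics.mean(values)            # 175 / 5
mean_after = statistics.mean(with_outlier)       # 575 / 6
median_before = statistics.median(values)        # middle value of five
median_after = statistics.median(with_outlier)   # average of 35 and 38

print("mean:  ", mean_before, "->", mean_after)
print("median:", median_before, "->", median_after)
```

The mean rises from 35 to nearly 96, while the median only moves from 35 to 36.5.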

These building blocks are fundamental for statistics, exploratory data analysis, and for many machine learning algorithms.
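
As one illustration of the machine learning use mentioned above, the sketch below turns raw scores into values in (0, 1) with a sigmoid and applies a cutoff. The function name `to_probability`, the scores, and the 0.5 threshold are assumptions made for this example.

```python
import math

def to_probability(score):
    """Map a real-valued score into (0, 1) with the sigmoid function."""
    return 1 / (1 + math.exp(-score))

scores = [-4.0, -1.0, 0.0, 1.0, 4.0]  # hypothetical model scores
probs = [to_probability(s) for s in scores]

# Label as positive (1) when the value is at least 0.5, an assumed threshold.
labels = [1 if p >= 0.5 else 0 for p in probs]

for s, p, label in zip(scores, probs, labels):
    print(f"score={s:+.1f}  sigmoid={p:.4f}  label={label}")
```

Because the sigmoid is symmetric around 0, a score of 0 maps to exactly 0.5, and scores of equal size with opposite signs map to values that sum to 1.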
