# **Problem Statement**  
## **2. Manually calculate mean, median, mode, variance, and standard deviation.**

### Problem Statement
Manually calculate the following statistical measures for a given numerical dataset:
- Mean
- Median
- Mode
- Variance
- Standard Deviation

The goal is to understand how these metrics are computed from scratch, without relying on built-in statistics libraries.

### Constraints & Example Inputs/Outputs

### Constraints
- Input is a list of numerical values
- Dataset may be unsorted
- Dataset may contain duplicates
- Dataset size ≥ 1

### Example Input:
```python
data = [2, 4, 4, 4, 5, 5, 7, 9]
```

### Expected Output:
```text
Mean = 5.0
Median = 4.5
Mode = 4
Variance = 4.0
Standard Deviation = 2.0
```

### Solution Approach

**Step 1: Mean**

The arithmetic average: the sum of all values divided by the number of values $n$.

$$
\text{Mean} = \mu = \frac{1}{n}\sum_{i=1}^{n} x_i
$$

**Step 2: Median**

Middle value after sorting:
- Odd length → middle element
- Even length → average of two middle elements
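
The index arithmetic behind both cases can be sketched as follows. This is a minimal illustration only; the helper name `median_of` is hypothetical and not part of the solution code.

```python
def median_of(values):
    # Hypothetical helper for illustration: sort a copy, then pick the middle.
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2  # index of the upper-middle element
    if n % 2 == 1:
        return ordered[mid]  # odd length: a single middle element
    # Even length: average the two elements around the centre.
    return (ordered[mid - 1] + ordered[mid]) / 2

even_result = median_of([2, 4, 4, 4, 5, 5, 7, 9])  # middle pair is (4, 5)
odd_result = median_of([9, 1, 5, 3, 7])            # sorted middle is 5
print(even_result, odd_result)
```

For the example dataset, `n = 8`, so the two middle elements sit at indices 3 and 4 of the sorted list.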

**Step 3: Mode**

Value that appears most frequently.
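
Counting occurrences needs nothing more than a plain dictionary. The sketch below assumes a hypothetical helper name, `find_modes`, and always returns a list so that ties are reported in full; the solution code in this section makes a different choice and returns a bare value when the mode is unique.

```python
def find_modes(values):
    # Hypothetical helper: tally occurrences with a plain dict.
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    highest = max(counts.values())
    # Keep every value that reaches the highest count, in first-seen order.
    return [v for v, c in counts.items() if c == highest]

single = find_modes([2, 4, 4, 4, 5, 5, 7, 9])  # one clear winner
tied = find_modes([1, 1, 2, 2, 3])             # two values share the top count
print(single, tied)
```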

**Step 4: Variance**

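Measures how far the values spread around the mean. The formula below is the population variance, which divides by $n$; the sample variance divides by $n - 1$ instead. Here $\mu$ is the mean from Step 1.

$$
\text{Variance} = \frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)^2
$$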
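
The choice of divisor matters when comparing results against other tools. The following sketch, using the example dataset and illustrative variable names, computes both the population form (divisor `n`, as in the formula above) and the sample form (divisor `n - 1`, which requires at least two values).

```python
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)
mean = sum(data) / n
squared_deviations = sum((x - mean) ** 2 for x in data)

population_variance = squared_deviations / n    # divisor n
sample_variance = squared_deviations / (n - 1)  # divisor n - 1 (Bessel's correction)
print(population_variance, sample_variance)
```

The expected output in this section (`Variance = 4.0`) corresponds to the population form.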

**Step 5: Standard Deviation**

The square root of the variance, which expresses the spread in the same units as the data.

$$
\text{Std Dev} = \sqrt{\text{Variance}}
$$
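
As a check on the formulas, the example dataset $[2, 4, 4, 4, 5, 5, 7, 9]$ can be worked by hand:

$$
\begin{aligned}
\mu &= \frac{2+4+4+4+5+5+7+9}{8} = \frac{40}{8} = 5 \\
\sum_{i=1}^{8} (x_i-\mu)^2 &= 9+1+1+1+0+0+4+16 = 32 \\
\text{Variance} &= \frac{32}{8} = 4 \\
\text{Std Dev} &= \sqrt{4} = 2
\end{aligned}
$$

These agree with the expected output listed for the example input.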

### Solution Code

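```python
# Approach 1: Brute-force implementation

import math
from collections import Counter

def manual_statistics(data):
    """Return (mean, median, mode, variance, std_dev) for a non-empty list of numbers."""
    n = len(data)
    if n == 0:
        raise ValueError("data must contain at least one value")

    # Mean: sum of the values divided by their count
    mean = sum(data) / n

    # Median: middle of the sorted values (average of the two middle values if n is even)
    sorted_data = sorted(data)
    mid = n // 2
    if n % 2 == 0:
        median = (sorted_data[mid - 1] + sorted_data[mid]) / 2
    else:
        median = sorted_data[mid]

    # Mode: a single value when it is unique, otherwise a list of all tied values
    freq = Counter(data)
    max_freq = max(freq.values())
    modes = [k for k, v in freq.items() if v == max_freq]
    mode = modes[0] if len(modes) == 1 else modes

    # Variance: population variance (divides by n, not n - 1)
    variance = sum((x - mean) ** 2 for x in data) / n

    # Standard deviation: square root of the variance
    std_dev = math.sqrt(variance)

    return mean, median, mode, variance, std_dev
```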


### Alternative Solution

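```python
# Approach 2: Optimized (using NumPy)

import numpy as np

def optimized_statistics(data):
    data = np.array(data)

    mean = np.mean(data)
    median = np.median(data)

    # np.unique returns the sorted distinct values and their counts.
    # np.argmax picks the first maximum, so ties resolve to the smallest value.
    values, counts = np.unique(data, return_counts=True)
    mode = values[np.argmax(counts)]

    # np.var and np.std default to ddof=0, i.e. the population formulas.
    variance = np.var(data)
    std_dev = np.std(data)

    return mean, median, mode, variance, std_dev
```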
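
One difference from the manual version is worth illustrating: `np.argmax` stops at the first maximum, so the NumPy approach reports a single mode even when several values tie. A minimal sketch of recovering every tied value with a boolean mask (the variable names are illustrative) could look like this:

```python
import numpy as np

data = np.array([1, 1, 2, 2, 3])
values, counts = np.unique(data, return_counts=True)

first_mode = values[np.argmax(counts)]      # argmax stops at the first maximum
all_modes = values[counts == counts.max()]  # boolean mask keeps every tie
print(first_mode, all_modes.tolist())
```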


### Alternative Approaches

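**Brute Force**
- Best for learning and interviews
- Demonstrates understanding of the formulas
- Slower on large datasets, because every loop runs in the Python interpreter

**Optimized**
- Uses vectorized operations that run in compiled code
- Reduces floating-point rounding error in many cases (NumPy sums pairwise rather than strictly left to right)
- Industry-standard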

### Test Case

```python
# Test Case 1: Standard Dataset
data = [2, 4, 4, 4, 5, 5, 7, 9]
manual_statistics(data)
```

```text
(5.0, 4.5, 4, 4.0, 2.0)
```

```python
# Test Case 2: Optimized Version
optimized_statistics(data)
```

```text
(np.float64(5.0),
 np.float64(4.5),
 np.int64(4),
 np.float64(4.0),
 np.float64(2.0))
```

```python
# Test Case 3: Odd-Length Dataset
data = [1, 3, 5, 7, 9]
manual_statistics(data)
```

```text
(5.0, 5, [1, 3, 5, 7, 9], 8.0, 2.8284271247461903)
```

```python
# Test Case 4: Single Element
data = [5]
manual_statistics(data)
```

```text
(5.0, 5, 5, 0.0, 0.0)
```

```python
# Test Case 5: Multiple Modes
data = [1, 1, 2, 2, 3]
manual_statistics(data)
```

```text
(1.8, 2, [1, 2], 0.56, 0.7483314773547883)
```

### Expected Outputs
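- Correct statistical values
- Manual and NumPy results agree on the mean, median, variance, and standard deviation; when several values tie for the mode, the NumPy version returns only the smallest one
- Handles:
    - Even/odd length
    - Single element
    - Multiple modes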
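
One way to confirm the manual formulas independently is to compare them against Python's standard `statistics` module. The sketch below uses that module only as a reference for verification, not as part of the solution; the variable names are illustrative.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)
mean = sum(data) / n
variance = sum((x - mean) ** 2 for x in data) / n
std_dev = math.sqrt(variance)

# pvariance and pstdev are the population forms (divisor n),
# matching the manual formulas used in this section.
assert math.isclose(mean, statistics.mean(data))
assert math.isclose(variance, statistics.pvariance(data))
assert math.isclose(std_dev, statistics.pstdev(data))
assert statistics.median(data) == 4.5
assert statistics.multimode(data) == [4]
checks_passed = True
```

Note that `statistics.variance` and `statistics.stdev` (without the `p` prefix) use the sample formulas and would not match.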

## Complexity Analysis

Let n be the size of the dataset.

### Time Complexity
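| Operation          | Complexity                     |
| ------------------ | ------------------------------ |
| Mean               | O(n)                           |
| Median (sorting)   | O(n log n)                     |
| Mode               | O(n)                           |
| Variance           | O(n)                           |
| Standard Deviation | O(1) once the variance is known |

Overall, the manual implementation runs in O(n log n), dominated by the sort used for the median.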

### Space Complexity

O(n): the sorted copy and the frequency map each hold up to n entries.

