
# Assignment 9.2

> Replace all TODOs with your code. Do not change any other code.

In [1]:
# Do not edit this cell

from typing import List

## Descriptive statistics

In this assignment, we will write functions that calculate basic descriptive statistics from scratch, without using NumPy.

### Task 1

Let's start simple: write a function `mean` that calculates the average of a list.

$$\mu = \frac{{\sum_{i=1}^n x_i}}{{n}}$$

In [5]:
def mean(li: List[float]) -> float:
    if len(li) == 0:
        raise ValueError("The list can't be empty!")
    return sum(li) / len(li)


assert mean([1., 2., 3.]) == 2.
assert mean([1., 1., 2., 0.]) == 1.

### Task 2

Now let's calculate the variance (dispersion). You may use the `mean` function implemented above.

$$V = \frac{{\sum_{i=1}^n (x_i - \mu)^2}}{{n}}$$

In [7]:
def variance(li: List[float]) -> float:
    if len(li) == 0:
        raise ValueError("The list can't be empty!")

    mu = mean(li)

    squared_differences = [(x - mu) ** 2 for x in li]

    return sum(squared_differences) / len(li)


assert variance([1., 1., 1.]) == 0.
assert variance([1., 2., 3., 4.]) == 1.25
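
The formula above divides by $n$, which makes it the population variance; this is also what `np.var` computes by default (`ddof=0`). As a minimal sketch for comparison only (the assignment itself asks for a from-scratch implementation), the standard-library `statistics` module exposes both conventions. The variable names here are illustrative.

```python
import statistics

data = [1.0, 2.0, 3.0, 4.0]

# Population variance: divide the sum of squared deviations by n.
population = statistics.pvariance(data)

# Sample variance: divide by n - 1 instead (Bessel's correction).
sample = statistics.variance(data)

print(f"population variance: {population}")
print(f"sample variance:     {sample}")
```

The population value agrees with the `1.25` expected by the assertion above; the sample value is larger because it divides by 3 rather than 4.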

### Task 3

The standard deviation is easy once you get the variance:

$$\sigma = \sqrt{V}$$

In [8]:
def std(li: List[float]) -> float:
    return variance(li) ** 0.5


assert std([1., 1., 1.]) == 0.
assert std([1., 2., 3., 4.]) == 1.25**0.5
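
As a worked check of the three formulas on the list `[1, 2, 3, 4]` used in the assertions:

$$\mu = \frac{1 + 2 + 3 + 4}{4} = 2.5$$

$$V = \frac{(1 - 2.5)^2 + (2 - 2.5)^2 + (3 - 2.5)^2 + (4 - 2.5)^2}{4} = \frac{2.25 + 0.25 + 0.25 + 2.25}{4} = 1.25$$

$$\sigma = \sqrt{1.25} \approx 1.118$$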

### Task 4

**Median**

The median is the middle value in a sorted dataset. If the dataset has an odd number of values, the median is the value at the center. If the dataset has an even number of values, the median is the average of the two middle values.

In [9]:
def median(li: List[float]) -> float:
    if len(li) == 0:
        raise ValueError("List cannot be empty!")

    sorted_li = sorted(li)
    n = len(sorted_li)

    # Odd length: the single middle element is the median.
    if n % 2 == 1:
        return sorted_li[n // 2]

    # Even length: average the two middle elements.
    mid1 = sorted_li[n // 2 - 1]
    mid2 = sorted_li[n // 2]
    return (mid1 + mid2) / 2


assert median([1., 1., 1.]) == 1.
assert median([1., 4., 3., 2.]) == 2.5
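
As an optional sanity check, not part of the assignment, the standard-library `statistics` module offers a reference implementation of the median. The sketch below reuses the inputs from the assertions above; the variable names are illustrative.

```python
import statistics

odd = [1.0, 1.0, 1.0]
even = [1.0, 4.0, 3.0, 2.0]

# Odd length: the middle element of the sorted data.
median_odd = statistics.median(odd)

# Even length: the mean of the two middle elements, here (2 + 3) / 2.
median_even = statistics.median(even)

# An empty input raises statistics.StatisticsError, a subclass of ValueError.
try:
    statistics.median([])
    empty_raises = False
except ValueError:
    empty_raises = True

print(median_odd, median_even, empty_raises)
```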

## Measure performance

Beyond theoretical algorithmic complexity, it is often useful to compare the runtime of two algorithms empirically, i.e., to run the code many times and time it.

Python's standard library provides the [timeit](https://docs.python.org/3/library/timeit.html) module, which does exactly that.
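
A minimal sketch of the basic workflow, using a hypothetical function `sum_of_squares` that is not part of the assignment: `timeit.repeat` runs a statement `number` times per trial and returns one total time per trial.

```python
import timeit


def sum_of_squares(n: int) -> int:
    # Hypothetical workload, used only to have something to time.
    return sum(i * i for i in range(n))


number = 100  # calls per trial
trials = timeit.repeat(lambda: sum_of_squares(1_000), repeat=5, number=number)

# Each entry is the total time for `number` calls; divide to get per-call time.
best_per_call = min(trials) / number
print(f"best of {len(trials)} trials: {best_per_call:.2e} seconds per call")
```

Taking the minimum over several trials is a common way to reduce noise from other processes running on the machine.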

Let's compare the runtime of your implementations with NumPy's. Use the provided setup code:

In [14]:
import timeit
setup = '''
import random
import numpy as np


arr = np.random.rand(10_000) * 100  # NumPy array
li = [random.random() * 100 for _ in range(10_000)]  # Python list
from __main__ import mean, variance, std, median
'''

# Functions to compare
funcs = {
    'mean': 'mean(li)',
    'variance': 'variance(li)',
    'std': 'std(li)',
    'median': 'median(li)',
    'numpy_mean': 'np.mean(arr)',
    'numpy_var': 'np.var(arr)',
    'numpy_std': 'np.std(arr)',
    'numpy_median': 'np.median(arr)',
}

# Measure execution time
print("Performance Comparison:")
for name, stmt in funcs.items():
    elapsed = timeit.timeit(stmt, setup=setup, number=10)
    print(f"{name:15}: {elapsed:.6f} seconds")

Performance Comparison:
mean           : 0.000509 seconds
variance       : 0.015363 seconds
std            : 0.015712 seconds
median         : 0.021615 seconds
numpy_mean     : 0.001297 seconds
numpy_var      : 0.003516 seconds
numpy_std      : 0.000754 seconds
numpy_median   : 0.009925 seconds


### Task 5

Complete the Python statements that compare your functions to NumPy. Use `li` for your functions and `arr` for the NumPy functions.

In [16]:
stmt_mean_custom = 'mean(li)'
stmt_mean_np = 'np.mean(arr)'

stmt_var_custom = 'variance(li)'
stmt_var_np = 'np.var(arr)'

stmt_std_custom = 'std(li)'
stmt_std_np = 'np.std(arr)'

stmt_median_custom = 'median(li)'
stmt_median_np = 'np.median(arr)'


### Task 6

Measure the execution time of your statements with the `timeit` module. As your submission, fill out the table with the results (total seconds per 10,000 executions, rounded to 2 decimal places).

In [18]:
import timeit

# Reuse the statements defined in Task 5 instead of retyping them.
custom_mean_time = timeit.timeit(stmt_mean_custom, setup=setup, number=10_000)
numpy_mean_time = timeit.timeit(stmt_mean_np, setup=setup, number=10_000)

custom_var_time = timeit.timeit(stmt_var_custom, setup=setup, number=10_000)
numpy_var_time = timeit.timeit(stmt_var_np, setup=setup, number=10_000)

custom_std_time = timeit.timeit(stmt_std_custom, setup=setup, number=10_000)
numpy_std_time = timeit.timeit(stmt_std_np, setup=setup, number=10_000)

custom_median_time = timeit.timeit(stmt_median_custom, setup=setup, number=10_000)
numpy_median_time = timeit.timeit(stmt_median_np, setup=setup, number=10_000)

print("Time per 10,000 executions, secs")
print("Func\tCustom\tNumpy")
print(f"mean\t{round(custom_mean_time, 2)}\t{round(numpy_mean_time, 2)}")
print(f"var\t{round(custom_var_time, 2)}\t{round(numpy_var_time, 2)}")
print(f"std\t{round(custom_std_time, 2)}\t{round(numpy_std_time, 2)}")
print(f"median\t{round(custom_median_time, 2)}\t{round(numpy_median_time, 2)}")


Time per 10,000 executions, secs
Func	Custom	Numpy
mean	0.92	0.19
var	17.07	0.39
std	16.84	0.42
median	18.12	1.32
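
The totals above cover 10,000 executions each, so dividing by 10,000 gives the average time per call. A minimal sketch, with the measured values copied by hand from the output above (they will differ from machine to machine and from run to run); the names `totals`, `per_call_us`, and `speedup` are illustrative:

```python
NUMBER = 10_000  # executions per measurement, as in the cell above

# (custom, numpy) total seconds, copied from the printed results.
totals = {
    "mean": (0.92, 0.19),
    "var": (17.07, 0.39),
    "std": (16.84, 0.42),
    "median": (18.12, 1.32),
}

per_call_us = {}
speedup = {}
for name, (custom, numpy_time) in totals.items():
    # Average time per call, in microseconds.
    per_call_us[name] = (custom / NUMBER * 1e6, numpy_time / NUMBER * 1e6)
    # How many times faster NumPy was than the pure-Python version.
    speedup[name] = custom / numpy_time
    custom_us, numpy_us = per_call_us[name]
    print(f"{name:8}{custom_us:10.1f} us{numpy_us:10.1f} us{speedup[name]:8.1f}x")
```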


Time per 10,000 executions, in seconds

| Func       | Custom | NumPy |
| ---------- | ------ | ----- |
| mean       | 0.92   | 0.19  |
| var        | 17.07  | 0.39  |
| std        | 16.84  | 0.42  |
| median     | 18.12  | 1.32  |