
# Assignment 9.2

> Replace all TODOs with your code. Do not change any other code.

In [5]:
# Do not edit this cell

from typing import List

## Descriptive statistics

In this assignment, we will write functions that calculate basic descriptive statistics from scratch, without using NumPy.

### Task 1

Let's start simple: write a function `mean` that calculates the average of a list.

$$\mu = \frac{{\sum_{i=1}^n x_i}}{{n}}$$

In [6]:
def mean(li: List[float]) -> float:
    sum_of_values = sum(li)
    number_of_elements = len(li)
    return sum_of_values / number_of_elements


assert mean([1., 2., 3.]) == 2.
assert mean([1., 1., 2., 0.]) == 1.

### Task 2

Now let's calculate variance (dispersion). You may use the `mean` function implemented before.

$$V = \frac{{\sum_{i=1}^n (x_i - \mu)^2}}{{n}}$$

In [7]:
def variance(li: List[float]) -> float:
    avg = mean(li)
    squared_deviations = [(x - avg)**2 for x in li]
    return sum(squared_deviations) / len(li)



assert variance([1., 1., 1.]) == 0.
assert variance([1., 2., 3., 4.]) == 1.25
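
The second assertion can be checked by hand. As a worked example for the list $[1, 2, 3, 4]$, using the same formulas as above:

$$\mu = \frac{1 + 2 + 3 + 4}{4} = 2.5$$

$$V = \frac{(1 - 2.5)^2 + (2 - 2.5)^2 + (3 - 2.5)^2 + (4 - 2.5)^2}{4} = \frac{2.25 + 0.25 + 0.25 + 2.25}{4} = \frac{5}{4} = 1.25$$

Because the sum is divided by $n$ rather than $n - 1$, this is the population variance, which is also what `np.var` computes with its default `ddof=0`.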

### Task 3

The standard deviation is easy once you get the variance:

$$\sigma = \sqrt{V}$$

In [8]:
def std(li: List[float]) -> float:
    my_variance = variance(li)
    return my_variance**0.5


assert std([1., 1., 1.]) == 0.
assert std([1., 2., 3., 4.]) == 1.25**0.5
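
The assertions above compare floats with `==`, which works for these inputs but can fail for values that are not exactly representable in binary. A minimal sketch of a tolerance-based check with `math.isclose` follows; the sample `data` and the `rel_tol` value are illustrative assumptions, not part of the assignment.

```python
import math
from typing import List


def std(li: List[float]) -> float:
    # Population standard deviation, same formula as in Task 3.
    mu = sum(li) / len(li)
    return math.sqrt(sum((x - mu) ** 2 for x in li) / len(li))


data = [0.1, 0.2, 0.3]           # hypothetical sample, not from the assignment
result = std(data)
expected = math.sqrt(0.02 / 3)   # squared deviations sum to about 0.02 for this sample

# Compare within a relative tolerance instead of requiring bit-for-bit equality.
assert math.isclose(result, expected, rel_tol=1e-9)
```

The tolerance is a judgment call; `1e-9` is also the default `rel_tol` of `math.isclose`.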

### Task 4

**Median**

The median is the middle value in a sorted dataset. If the dataset has an odd number of values, the median is the value at the center. If the dataset has an even number of values, the median is the average of the two middle values.

In [9]:
def median(li: List[float]) -> float:
    sorted_list = sorted(li)

    length = len(sorted_list)

    if length % 2 == 0:
        # even length: average the two middle values
        middle_num = length // 2 - 1
        middle_num2 = length // 2
        median_value = (sorted_list[middle_num] + sorted_list[middle_num2]) / 2
    else:
        # odd length: take the single middle value
        middle_num = length // 2
        median_value = sorted_list[middle_num]

    return median_value


assert median([1., 1., 1.]) == 1.
assert median([1., 4., 3., 2.]) == 2.5
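
Two assertions cover only one odd and one even case. As a rough sanity check, the same odd/even rule can be compared against `statistics.median` from the standard library on random inputs. The seed and the list lengths below are arbitrary choices made for this sketch.

```python
import random
import statistics
from typing import List


def median(li: List[float]) -> float:
    # Same odd/even rule as described in Task 4.
    s = sorted(li)
    n = len(s)
    mid = n // 2
    if n % 2 == 0:
        return (s[mid - 1] + s[mid]) / 2
    return s[mid]


rng = random.Random(0)  # fixed seed so the check is reproducible
for n in (1, 2, 5, 10, 101):
    sample = [rng.random() for _ in range(n)]
    # The reference implementation should agree for both odd and even lengths.
    assert median(sample) == statistics.median(sample)
```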

## Measure performance

Beyond theoretical algorithmic complexity, it is often useful to compare the runtime of two algorithms empirically, i.e., to run the code many times and time it.

Python's standard library provides the [timeit](https://docs.python.org/3/library/timeit.html) module, which does exactly that.

Let's compare the runtime of your implementations with NumPy's. Use the provided setup code:
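
As a minimal sketch of how the module is typically used: `timeit.timeit` returns the total time for `number` executions (1,000,000 by default), and `timeit.repeat` repeats that measurement several times so the least noisy run can be picked. The names `setup_demo` and `stmt_demo` are hypothetical and exist only for this illustration.

```python
import timeit

setup_demo = "data = list(range(1000))"  # hypothetical setup, executed once per measurement
stmt_demo = "sum(data) / len(data)"      # the statement being timed

# Total seconds for 10,000 executions of the statement.
total = timeit.timeit(stmt=stmt_demo, setup=setup_demo, number=10_000)

# Five independent measurements; the minimum is usually the least disturbed one.
runs = timeit.repeat(stmt=stmt_demo, setup=setup_demo, number=10_000, repeat=5)
best = min(runs)

print(f"total for 10,000 runs: {total:.4f} s")
print(f"best of 5 repeats:     {best:.4f} s")
```

The absolute numbers depend on the machine, so only comparisons made on the same machine are meaningful.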

In [10]:
import timeit
import numpy as np
import random

# generate data for tests
setup = '''
import random
import numpy as np

def mean(li) -> float:
    sum_of_values = sum(li)
    number_of_elements = len(li)
    return sum_of_values / number_of_elements

def variance(li) -> float:
    avg = mean(li)
    squared_deviations = [(x - avg)**2 for x in li]
    return sum(squared_deviations) / len(li)

def std(li) -> float:
    my_variance = variance(li)
    return my_variance**0.5

def median(li) -> float:
    sorted_list = sorted(li)
    length = len(sorted_list)

    if length % 2 == 0:
        middle_num = length // 2 - 1
        middle_num2 = length // 2
        median_value = (sorted_list[middle_num] + sorted_list[middle_num2]) / 2
    else:
        middle_num = length // 2
        median_value = sorted_list[middle_num]

    return median_value

# test data: `arr` for the NumPy functions, `li` for the custom ones
arr = np.random.rand(10) * 100
li = [random.random() * 10 for _ in range(100)]
'''
# functions to time; the keys are used as the callable names in each statement
funcs = {
    'mean': mean,
    'variance': variance,
    'std': std,
    'median': median,
    'np.mean': np.mean,
    'np.var': np.var,
    'np.std': np.std,
    'np.median': np.median,
}

number = 1_000_000  # executions per measurement (timeit's default)

timing_results = {}
for name in funcs:
    stmt = f"{name}(arr.copy())"  # every function is timed on a fresh copy of `arr`
    timer = timeit.Timer(stmt, setup=setup)
    timing_results[name] = timer.timeit(number=number)  # total seconds for `number` runs

print(f"Timing Results (seconds per {number:,} runs):")
for name, seconds in timing_results.items():
    print(f"{name}: {seconds:.6f}")


Timing Results (seconds per 1,000,000 runs):
mean: 3.106380
variance: 9.840023
std: 8.700641
median: 4.303879
np.mean: 7.521307
np.var: 27.551230
np.std: 30.260074
np.median: 25.567060


### Task 5

Complete the Python statements that compare your functions to NumPy's. Use `li` for your functions and `arr` for the NumPy functions.

In [11]:
stmt_mean_custom = 'mean(li)'
stmt_mean_np = 'np.mean(arr)'

stmt_var_custom = 'variance(li)'
stmt_var_np = 'np.var(arr)'

stmt_std_custom = 'std(li)'
stmt_std_np = 'np.std(arr)'

stmt_median_custom = 'median(li)'
stmt_median_np = 'np.median(arr)'

### Task 6

Measure the average execution time of your statements with the `timeit` module. For your submission, fill in the table with the results (rounded to 2 decimal places).

In [12]:
import timeit

timeit.timeit(stmt=stmt_mean_custom, setup=setup, globals=funcs, number=10_000)

0.008831013000076382
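
Rather than timing each statement by hand, the measurements could be collected in a loop. The sketch below assumes a hypothetical helper `time_statements` and uses a small stand-in setup (`demo_setup`, `demo_statements`) that relies only on the standard library, so it runs on its own; none of these names come from the assignment.

```python
import timeit
from typing import Dict


def time_statements(statements: Dict[str, str], setup: str,
                    number: int = 10_000) -> Dict[str, float]:
    """Return the total seconds for `number` executions of each statement."""
    return {
        label: timeit.timeit(stmt=stmt, setup=setup, number=number)
        for label, stmt in statements.items()
    }


# Stand-in setup and statements, for illustration only.
demo_setup = '''
import random
import statistics
li = [random.random() * 10 for _ in range(100)]
'''
demo_statements = {
    "sum/len": "sum(li) / len(li)",
    "statistics.fmean": "statistics.fmean(li)",
}

results = time_statements(demo_statements, demo_setup)
for label, seconds in results.items():
    # One Markdown table row per statement, rounded to 2 decimal places.
    print(f"| {label} | {seconds:.2f} |")
```

Passing the notebook's `setup` string and the eight statements from Task 5 to the same helper would yield the values needed for the table.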

Time per 10,000 executions, in seconds

| Func       | Custom | Numpy |
| ---------- | ------ | ----- |
| mean       |        |       |
| var        |        |       |
| std        |        |       |
| median     |        |       |