# TECH 2 mandatory assignment - Part B
Solution

First, I import my functions from Part A, together with NumPy and `timeit`.

In [355]:
import numpy as np
from part_A import std_builtin, std_loops
import timeit


Next, I read the data from the `data.csv` file.
I create three lists, `small_list`, `medium_list` and `large_list`, and fill them from the first, second and third column of the file respectively.
The lists hold 100, 1,000 and 10,000 values.

In [356]:
import csv
large_list = []
medium_list = []
small_list = []

# Read the CSV file: column 0 -> small, column 1 -> medium, column 2 -> large
with open('data.csv', mode='r', newline='') as file:
    csv_reader = csv.reader(file)
    for row in csv_reader:
        # Append only non-empty values (the columns have different lengths)
        if len(row) > 0 and row[0].strip():
            small_list.append(float(row[0].strip()))
        if len(row) > 1 and row[1].strip():
            medium_list.append(float(row[1].strip()))
        if len(row) > 2 and row[2].strip():
            large_list.append(float(row[2].strip()))
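The column-wise parsing above can be tried on a tiny in-memory file before running it on `data.csv`. This is only a minimal sketch: it assumes the file has no header row and that shorter columns leave empty cells, and `read_columns` and the sample text are hypothetical, not part of the assignment.

```python
import csv
import io


def read_columns(file_obj, n_columns=3):
    """Collect the non-empty cells of each column as floats (hypothetical helper)."""
    columns = [[] for _ in range(n_columns)]
    for row in csv.reader(file_obj):
        # zip stops at the shorter sequence, so rows with fewer cells are safe
        for column, cell in zip(columns, row):
            if cell.strip():
                column.append(float(cell))
    return columns


# Made-up sample: column 0 has 1 value, column 1 has 2, column 2 has 3
sample = io.StringIO("0.1,0.2,0.3\n,0.5,0.6\n,,0.9\n")
small, medium, large = read_columns(sample)
print(len(small), len(medium), len(large))
```

Looping over the columns with `zip` replaces the three near-identical `if` statements, at the cost of being slightly less explicit about which column feeds which list.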

In [357]:
assert len(large_list) == 10000, 'Large list does not contain 10 000 elements'
assert len(medium_list) == 1000, 'Medium list does not contain 1 000 elements'
assert len(small_list) == 100, 'Small list does not contain 100 elements'

Next, I calculate the standard deviation of each list created above using NumPy, builtin functions and loops.
I store the results in a matrix where the functions correspond to rows and the list sizes to columns.
Each element of the matrix is

 $a_{ij} = f_i(\text{list}_j)$


In [358]:
functions = [np.std, std_builtin, std_loops]
function_names = ['Numpy std', 'Builtin std', 'Loops std']
data_samples = {
    'small': small_list,
    'medium': medium_list,
    'large': large_list,
}

std_result_matrix = {size: [func(data) for func in functions] for size, data in data_samples.items()}
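The code of `part_A` is not shown in this notebook, so the following is only a sketch of what the two hand-written functions might look like, assuming they compute the population standard deviation $\sigma = \sqrt{\frac{1}{N}\sum_{i}(x_i - \bar{x})^2}$, which is also what `np.std` computes by default. The names `std_builtin_sketch` and `std_loops_sketch` and the sample data are hypothetical; `statistics.pstdev` from the standard library serves as a reference.

```python
import math
import statistics


def std_builtin_sketch(values):
    """Population standard deviation using the builtins sum() and len()."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((x - mean) ** 2 for x in values) / len(values))


def std_loops_sketch(values):
    """Population standard deviation using explicit loops only."""
    n = 0
    total = 0.0
    for x in values:
        total += x
        n += 1
    mean = total / n

    squared_deviations = 0.0
    for x in values:
        squared_deviations += (x - mean) ** 2
    return (squared_deviations / n) ** 0.5


sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up data
print(std_builtin_sketch(sample), std_loops_sketch(sample), statistics.pstdev(sample))
```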


I proceed in the same way to measure the running time of each function on each list, and I store the results in a matrix as well.
Functions are rows and list sizes are columns,
so each element is the runtime of one function on one list:

$t_{ij} = \text{runtime}\big(f_i(\text{list}_j)\big)$


In [359]:
def get_time_matrix(functions, samples, number=100):
    """Time every function on every sample; return {size: [total seconds per function]}."""
    return {
        size: [timeit.timeit(lambda: func(data), number=number) for func in functions]
        for size, data in samples.items()
    }

In [360]:
time_result_matrix = get_time_matrix(functions, data_samples)
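A single `timeit.timeit` call gives one total for all repetitions, which can be noisy. A common alternative is `timeit.repeat`, which returns several totals; taking the minimum and dividing by the number of calls gives a per-call estimate. The sketch below is an illustration only: `time_per_call`, the sample lists and the use of the builtin `sum` as the timed function are hypothetical stand-ins.

```python
import timeit


def time_per_call(func, data, number=100, repeat=5):
    """Return the best observed time per call in seconds (hypothetical helper)."""
    totals = timeit.repeat(lambda: func(data), number=number, repeat=repeat)
    # Each entry of totals covers `number` calls, so divide to get a single call
    return min(totals) / number


# Made-up data; the builtin sum stands in for the functions being compared
sample_sizes = {'small': list(range(100)), 'medium': list(range(1000))}
timings = {size: time_per_call(sum, data) for size, data in sample_sizes.items()}

for size, seconds in timings.items():
    print(f'{size:<8}{seconds * 1e6:10.2f} microseconds per call')
```

The minimum is used rather than the mean because slower runs are usually caused by other activity on the machine, not by the function itself.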

I want to inspect the result matrices, so I create a function that prints a matrix as a table.

In [361]:
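def print_matrix_pretty(matrix, header):
    """Print a {size: [value per function]} matrix as a table with one row per function."""
    print(header)
    print('-' * len(header))

    # Relies on the notebook-level function_names and data_samples
    for i, func_name in enumerate(function_names):
        row = f'{func_name:<15}'
        for size in data_samples:
            tij = round(matrix[size][i], 7)
            row += f'{tij:<15}'
        print(row)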

In [362]:
header = f"{'FUNCTION':<15} {'SMALL':<15} {'MEDIUM':<15} {'LARGE':<15}"

Here are the calculated standard deviations as a table.

In [363]:
print_matrix_pretty(std_result_matrix, header=header)

FUNCTION        SMALL           MEDIUM          LARGE          
---------------------------------------------------------------
Numpy std      0.2823721      0.2846744      0.2854045      
Builtin std    0.2823721      0.2846744      0.2854045      
Loops std      0.2823721      0.2846744      0.2854045      
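In the output above the header columns are 16 characters wide (15 plus a separating space) while the row cells are 15 wide, so the numbers drift to the left of their headings. One possible way to avoid this is to build the header and the rows from the same width. The sketch below assumes the same `{size: [values]}` layout as the matrices in this notebook; `format_table` is a hypothetical helper.

```python
def format_table(matrix, row_names, width=15):
    """Return the table as a string, with header and rows built from one width."""
    columns = list(matrix)
    header = f"{'FUNCTION':<{width}}" + ''.join(
        f'{name.upper():<{width}}' for name in columns
    )
    lines = [header, '-' * len(header)]
    for i, row_name in enumerate(row_names):
        cells = ''.join(f'{matrix[name][i]:<{width}.7f}' for name in columns)
        lines.append(f'{row_name:<{width}}' + cells)
    return '\n'.join(lines)


# Values copied from the standard deviation table above
std_values = {
    'small': [0.2823721] * 3,
    'medium': [0.2846744] * 3,
    'large': [0.2854045] * 3,
}
table = format_table(std_values, ['Numpy std', 'Builtin std', 'Loops std'])
print(table)
```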


The running times of the functions on the different list sizes, shown as a table.
Each value is the total time in seconds for 100 calls, since `timeit.timeit` was run with `number=100`.
The table already shows which function performs better on each list size.


In [364]:
print_matrix_pretty(time_result_matrix, header)

FUNCTION        SMALL           MEDIUM          LARGE          
---------------------------------------------------------------
Numpy std      0.0041403      0.0578373      0.0697238      
Builtin std    0.0043608      0.0531817      0.4206002      
Loops std      0.006523       0.0485195      0.4787263      


While looking for a nicer way to present the data, I discovered pandas `DataFrame` styling. Let's try it.

In [365]:
import pandas as pd
df = pd.DataFrame(time_result_matrix, index=function_names)

(
    df.style
    .format(precision=7, thousands=",", decimal=".")
    .format_index(str.upper, axis=1)
    .relabel_index(function_names, axis=0)
    .background_gradient(subset=pd.IndexSlice[:, :], cmap='YlOrRd')
)

               SMALL          MEDIUM         LARGE
Numpy std      0.0041403      0.0578373      0.0697238
Builtin std    0.0043608      0.0531817      0.4206002
Loops std      0.006523       0.0485195      0.4787263



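The table above shows which function performs better for each list size: on the small and medium lists the three functions are within a factor of about 1.6 of each other, while on the large list `np.std` is roughly six to seven times faster than the two pure-Python versions.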
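To make the comparison explicit, each timing can be divided by the NumPy timing for the same list size. The sketch below uses the values from the timing table above (total seconds for 100 calls); the variable names `timings` and `relative` are chosen here for illustration.

```python
# Total seconds for 100 calls, copied from the timing table above
timings = {
    'small': {'Numpy std': 0.0041403, 'Builtin std': 0.0043608, 'Loops std': 0.006523},
    'medium': {'Numpy std': 0.0578373, 'Builtin std': 0.0531817, 'Loops std': 0.0485195},
    'large': {'Numpy std': 0.0697238, 'Builtin std': 0.4206002, 'Loops std': 0.4787263},
}

# Runtime of each function relative to NumPy on the same list size
relative = {
    size: {name: seconds / row['Numpy std'] for name, seconds in row.items()}
    for size, row in timings.items()
}

for size, row in relative.items():
    cells = '  '.join(f'{name}: {ratio:.2f}x' for name, ratio in row.items())
    print(f'{size:<8}{cells}')
```

A ratio above 1 means the function was slower than `np.std` in this particular run, a ratio below 1 means it was faster.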


Runtimes measured with `%timeit`. It reports the time per call (mean ± standard deviation over 7 runs), which is more precise but somewhat harder to compare at a glance.

In [351]:
%timeit np.std(large_list)
%timeit np.std(medium_list)
%timeit np.std(small_list)

757 μs ± 36.6 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
172 μs ± 77.9 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
50.5 μs ± 13.2 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)


In [352]:
%timeit std_loops(large_list)
%timeit std_loops(medium_list)
%timeit std_loops(small_list)

5.58 ms ± 896 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
524 μs ± 89.2 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
63.7 μs ± 19.4 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)


In [353]:
%timeit std_builtin(large_list)
%timeit std_builtin(medium_list)
%timeit std_builtin(small_list)

5.04 ms ± 1.1 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
452 μs ± 77.8 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
74.9 μs ± 24.8 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
