## Q3: Are you faster than numpy?

Numpy of course has a standard deviation function, `np.std()`, but here we'll write our own that works on a 1-d array (vector).  The standard
deviation is a measure of the "width" of the distribution of numbers
in the vector.

Given an array, $a$, and an average $\bar{a}$, the standard deviation
is:
$$
\sigma = \left [ \frac{1}{N} \sum_{i=1}^N (a_i - \bar{a})^2 \right ]^{1/2}
$$

Write a function to calculate the standard deviation for an input array, `a`:

  * First compute the average of the elements in `a` to define $\bar{a}$
  * Next compute the sum over the squares of $a - \bar{a}$
  * Then divide the sum by the number of elements in the array
  * Finally take the square root (you can use `np.sqrt()`)
  
Test your function on a random array, and compare to the built-in `np.std()`. Check the runtime as well.

In [13]:
import numpy as np
import random
import time

def generate_norm_data(n):
    return np.random.randn(n)

data = generate_norm_data(int(1e6)) # 1 mega sample


def compute_std_manual(data):
    sum = 0
    N = len(data)
    for el in data:
        sum += el
    mean = sum / N

    for el in data:
        sum += (el - mean) ** 2
    
    std = np.sqrt(sum / N)

    return std

def compute_std_numpy(data):
    return np.std(data)

t0 = time.time()
print(compute_std_manual(data))
t1 = time.time()
t_manual = t1 - t0
print("time elapsed with custom written function: {} s".format(t_manual))

t0 = time.time()
print(compute_std_numpy(data))
t1 = time.time()
t_numpy = t1 - t0
print("time elapsed with numpy: {} s".format(t_numpy))

print(f"gain: {t_manual / t_numpy}")


0.9991289345905285
time elapsed with custom written function: 0.2362077236175537 s
0.9988160910617235
time elapsed with numpy: 0.0017621517181396484 s
gain: 134.04505479637396


## Q2: Histograms

Here we will read in columns of numbers from a file and create a histogram, using NumPy routines.  Make sure you have the data file
"`sample.txt`" in the same directory as this notebook. You download it from  https://raw.githubusercontent.com/sbu-python-summer/python-tutorial/master/day-3/sample.txt (and use python to download a file!)

  * Use `np.loadtxt()` to read this file in.  

  * Next, use `np.histogram()` to create a histogram array.  The output returns both the count and an array of edges.
  
  * Finally, loop over the bins and print out the bin center (averaging the left and right edges of the bin) and the count for that bin.

In [40]:
# Download the file sample.txt from the given URL
import requests
r = requests.get("https://raw.githubusercontent.com/sbu-python-summer/python-tutorial/master/day-3/sample.txt")
with open("sample.txt", "wb") as f:
    f.write(r.content)


# load the txt
data = np.loadtxt("sample.txt")

# create a histo udsinng np.histogram
hist, bins = np.histogram(data, bins=15)


#print bin centers
for n, lo, hi in zip(hist, bins[:-1], bins[1:]):
    # as an additional exercise, I print a number of * proportional to the number of elements in the bin
    print(f"{lo + (hi - lo) / 2 :+.4f}"[:5].ljust(5, '0'), f"{"*" * n}")
    



-26.2 ****
-17.6 ***
-8.99 **********************
-0.35 ***********************************
+8.28 ********************************
+16.9 **********************
+25.5 ***********
+34.2 *********
+42.8 **********
+51.4 ********
+60.1 *********
+68.7 *********
+77.4 ********
+86.0 *********
+94.6 *********
