# NumPy/SciPy

## Math Functions
Create a program that does the following:  
a. Create an ndarray containing 1,000,000 random numbers.  
b. Calculate the mean, median, mode, and standard deviation of the array. Compare the time spent running mean and standard deviation, vs. the previously implemented versions (from level 1). Is there a significant speedup? Why?  
c. Calculate the 10,20,30,…100 quantiles of the array.  

In [1]:
# a. Create an ndarray containing 1,000,000 random numbers
import numpy as np

arr = np.random.random(1000000)
arr.size

1000000
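
The `np.random.random` call above draws different numbers on every run, so the statistics printed later will not match exactly when the notebook is re-executed. A minimal sketch of a reproducible alternative, assuming NumPy's `Generator` API is available (NumPy 1.17+); the seed value `42` and the name `seeded_arr` are arbitrary choices for illustration, not part of the original notebook:

```python
import numpy as np

# Assumption: any fixed integer seed works; 42 is arbitrary.
rng = np.random.default_rng(42)

# Draw 1,000,000 floats uniformly from the half-open interval [0, 1).
seeded_arr = rng.random(1_000_000)

print(seeded_arr.size)
```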

In [11]:
# b.
from IPython.core.interactiveshell import InteractiveShell

InteractiveShell.ast_node_interactivity = "all"

# Mean
print(f'Mean = {np.mean(arr)}')

# Median
print(f'Median = {np.median(arr)}')

# Standard Deviation
print(f'Stdev = {np.std(arr)}')

# Timer Comparison
from utils.timer import timer
import random
import statistics
import math

num = [random.random() for _ in range(1000000)]  # Generate a list of 1,000,000 random numbers

# Return the mean of a list of numbers
def mean(values):
    return sum(values) / len(values)

# Return the population variance of a list of numbers (same convention as np.std's default)
def variance(values):
    mean_value = mean(values)  # Calculate the mean once
    squared_diffs = [(x - mean_value) ** 2 for x in values]  # (x - mean)**2 for each element
    return sum(squared_diffs) / len(values)


# Benchmarking mean method with NumPy and regular method
with timer('Numpy Mean timer') as t:
    np.mean(arr)

with timer('Regular mean timer') as t:
    mean(num)
    
# Benchmarking median method with NumPy and regular method
with timer('Numpy Median timer') as t:
    np.median(arr)

with timer('Regular median timer') as t:
    statistics.median(num)

# Benchmarking standard deviation method with NumPy and regular method
with timer('Numpy Std timer') as t:
    np.std(arr)

with timer('Regular Std timer') as t:
    math.sqrt(variance(num))

Mean = 0.4999289634736707
Median = 0.49981662173658836
Stdev = 0.28869552741184323


0.4999289634736707

Numpy Mean timer: 0.005999565124511719 seconds.


0.49997562020379577

Regular mean timer: 0.03200125694274902 seconds.


0.49981662173658836

Numpy Median timer: 0.023998260498046875 seconds.


0.4996467030304417

Regular median timer: 0.42100024223327637 seconds.


0.28869552741184323

Numpy Std timer: 0.01299905776977539 seconds.


0.2887329351669634

Regular Std timer: 0.404987096786499 seconds.
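
The `timer` helper imported from `utils.timer` is a local module whose source is not shown in this notebook. A hypothetical stand-in, assuming it only measures wall-clock time and prints it in the `label: N seconds.` format seen in the output above, could be sketched with the standard library:

```python
import time
from contextlib import contextmanager

@contextmanager
def timer(label):
    """Hypothetical replacement for utils.timer: print elapsed wall-clock time."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        print(f"{label}: {elapsed} seconds.")

# Usage: time an arbitrary piece of work.
with timer("Sum timer"):
    total = sum(range(1_000_000))
```

A single run like this is noisy. For more stable numbers, the `%timeit` magic in IPython repeats the measurement several times.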

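
Task (b) also asks for the mode, which the cell above does not compute. For continuous random floats almost every value occurs exactly once, so an exact mode is not informative. The sketch below therefore rounds the values first; the rounding step, the `decimals` parameter, and the function name `array_mode` are assumptions introduced here, not part of the original notebook:

```python
import numpy as np

def array_mode(values, decimals=2):
    """Return the most frequent value after rounding to `decimals` places."""
    rounded = np.round(values, decimals)
    uniques, counts = np.unique(rounded, return_counts=True)
    # argmax returns the first maximum, so ties resolve to the smallest value.
    return uniques[np.argmax(counts)]

rng = np.random.default_rng(0)  # arbitrary seed, for repeatability only
sample = rng.random(1_000_000)
print(f"Mode (rounded to 2 decimals) = {array_mode(sample)}")
```

Because the samples are uniform, every rounded bin is roughly equally likely, and the reported mode mostly reflects sampling noise.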

Remarks for b.

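There is a significant speedup with NumPy compared to the plain Python functions: in the timings above, roughly 5x for the mean, 17x for the median, and 31x for the standard deviation. NumPy arrays are homogeneous: every element has the same type and is stored in one contiguous block of memory, so functions such as `np.mean` and `np.std` run as compiled loops over the raw values. Python lists are flexible and can hold objects of different types, but each element is a separate Python object reached through a pointer, and the interpreter processes them one at a time, at the cost of both speed and memory.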
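
The storage difference can be made concrete with a small sketch. It compares the bytes used by a `float64` array against a list holding the same values; the variable names are illustrative, and the list figure depends on the CPython build, so only the array size is stated exactly:

```python
import sys
import numpy as np

n = 1000
small_arr = np.arange(n, dtype=np.float64)
small_list = [float(i) for i in range(n)]

# The array stores raw 8-byte floats in one contiguous buffer.
array_bytes = small_arr.nbytes

# sys.getsizeof(list) covers only the pointer table,
# so the size of each float object is added separately.
list_bytes = sys.getsizeof(small_list) + sum(sys.getsizeof(x) for x in small_list)

print(f"ndarray: {array_bytes} bytes, dtype={small_arr.dtype}")
print(f"list:    {list_bytes} bytes")
```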

In [13]:
# c. Calculate the 10th, 20th, 30th, …, 100th percentiles of the array.

# np.percentile takes the percentile itself (0-100) as its second argument.
# For uniform samples in [0, 1), the q-th percentile should be close to q / 100.
for q in range(10, 101, 10):
    print(f'The {q}th percentile of the array is {np.percentile(arr, q)}')
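
`np.percentile` also accepts a sequence of percentiles, so all ten values can be computed in a single call. Its second argument is the percentile itself, on a 0 to 100 scale. A minimal sketch, using a freshly seeded sample (the seed and the names `sample`, `levels`, and `deciles` are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for repeatability only
sample = rng.random(1_000_000)

# Percentile levels 10, 20, ..., 100 (on a 0-100 scale).
levels = np.arange(10, 101, 10)
deciles = np.percentile(sample, levels)

for level, value in zip(levels, deciles):
    print(f"{level}th percentile: {value}")
```

For uniform samples in [0, 1), each value should land close to `level / 100`, and the 100th percentile equals the maximum of the array.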
