# Pure Python vs. Numpy - Lab

## Introduction 

The NumPy, SciPy, and Pandas libraries offer a significant gain in computational efficiency for complex mathematical operations compared to Python's built-in arithmetic operations. In this lab, you will measure and compare the time required to calculate a dot product using basic arithmetic operations in pure Python and using NumPy's `.dot()` method.

## Objectives
You will be able to:
* Compare the performance of high-dimensional matrix operations in NumPy vs. pure Python

In [1]:
import numpy as np
import itertools as it

def array_ind(array):
    """Return a sorted list of every valid index tuple of `array`."""
    # One range per axis; their Cartesian product enumerates all index
    # tuples in lexicographic order for any number of dimensions.
    return list(it.product(*(range(n) for n in array.shape)))

def create_random_tensor(tensor, range_min, range_max):
    # Fill a copy of `tensor` with random integers drawn from
    # [range_min, range_max); the upper bound is exclusive.
    result = tensor.copy()
    for ind in array_ind(tensor):
        result[ind] = np.random.randint(range_min, range_max)
    return result
    
def create_random_matrix(zero_matrix, range_min, range_max):
    result = zero_matrix.copy()
    sh = result.shape
    for i in range(sh[0]):
        for j in range(sh[1]):
            result[i,j] += np.random.randint(range_min, range_max)
    return result

## Problem
> **Write a routine to calculate the dot product between two 200 x 200 dimensional matrices using:**

> **a) Pure Python**

> **b) Numpy's `.dot()`**


### Create two 200 x 200 matrices and fill them with random integers using `np.random.randint()`

In [None]:
# Compare 200x200 matrix-matrix multiplication speed
import numpy as np

# Set up the variables
A = create_random_tensor(np.zeros([200, 200]), 1, 10)
B = create_random_tensor(np.zeros([200, 200]), 1, 10)

In [None]:
# A = create_random_matrix(np.zeros([200, 200]), 1, 20)
# B = create_random_matrix(np.zeros([200, 200]), 1, 20)

print(A, B)
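
The helper functions above fill the matrices one element at a time. As a sketch of an alternative, `np.random.randint()` accepts a `size` argument and can generate a whole matrix in one call. The names `A_fast` and `B_fast` are hypothetical, and the cast to `float` is an assumption made only to match the dtype produced by `np.zeros`.

```python
import numpy as np

# Draw all 200 x 200 integers in [1, 10) at once; the upper bound is exclusive
A_fast = np.random.randint(1, 10, size=(200, 200)).astype(float)
B_fast = np.random.randint(1, 10, size=(200, 200)).astype(float)

print(A_fast.shape, A_fast.dtype)
```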

### Pure Python

* Initialize a zero-filled NumPy matrix with the necessary rows and columns to store the result.
* In pure Python, calculate the dot product using the formula
![](formula.png)
* Use Python's `timeit` module to measure the processing time.
* [Visit this link](https://www.pythoncentral.io/time-a-python-function/) for an in-depth explanation of how to time a function or routine in Python.

**Hint**: Use nested `for` loops to access, calculate, and store each scalar value in the result matrix.
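
Assuming `formula.png` shows the standard definition of the matrix product, each entry of the result $C = AB$ for two $n \times n$ matrices can be written as:

$$
C_{ik} = \sum_{j=1}^{n} A_{ij} B_{jk}
$$

Here $n = 200$, and the indices $i$, $j$, and $k$ each correspond to one level of the nested loops.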

In [25]:
import timeit
SIZE=200
results = np.zeros([SIZE, SIZE])
# Start the timer
start = timeit.default_timer()

# Matrix multiplication in pure Python
for i in range(SIZE):
    for j in range(SIZE):
        for k in range(SIZE):
            results[i,k] += A[i,j]*B[j, k]

time_spent = timeit.default_timer() - start

print('Pure Python Time:', time_spent, 'sec.')

Pure Python Time: 5.423821089090779 sec.
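
The loop above indexes NumPy arrays element by element, so part of the measured time may be array-indexing overhead rather than Python arithmetic alone. A minimal sketch of the same triple loop on nested Python lists follows; `matmul_lists` is a hypothetical helper and not part of the lab.

```python
def matmul_lists(X, Y):
    """Multiply two matrices stored as nested Python lists."""
    n, m, p = len(X), len(Y), len(Y[0])
    out = [[0] * p for _ in range(n)]
    for i in range(n):
        for j in range(m):
            x_ij = X[i][j]  # reuse this value across the inner loop
            for k in range(p):
                out[i][k] += x_ij * Y[j][k]
    return out

print(matmul_lists([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
# → [[19, 22], [43, 50]]
```

In the lab, `A.tolist()` and `B.tolist()` could be passed to this function. The resulting timing will vary from machine to machine.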


### NumPy
Set the timer and measure the time taken by the `.dot()` method to multiply A and B.


In [19]:
# start the timer
start = timeit.default_timer()

# Matrix multiplication in numpy
A.dot(B)

time_spent = timeit.default_timer() - start
print('Numpy Time:', time_spent, 'sec.')

Numpy Time: 0.033634579041972756 sec.
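
A single measurement of a call this short can be noisy. As a sketch of a more stable approach, `timeit.repeat()` runs the statement several times and returns one total per round. The matrices `A_demo` and `B_demo` are hypothetical stand-ins so that the snippet runs on its own, and the repeat counts are arbitrary choices.

```python
import timeit
import numpy as np

# Stand-in matrices (assumption: same shape and value range as the lab)
A_demo = np.random.randint(1, 10, size=(200, 200)).astype(float)
B_demo = np.random.randint(1, 10, size=(200, 200)).astype(float)

# Take 5 measurements, each covering 10 calls to .dot()
timings = timeit.repeat(lambda: A_demo.dot(B_demo), repeat=5, number=10)

# The minimum is usually least affected by other processes on the machine
best_per_call = min(timings) / 10
print('Numpy best time per call:', best_per_call, 'sec.')
```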


In [27]:
results

array([[5111., 5543., 5290., ..., 5210., 5128., 5253.],
       [5499., 5575., 5411., ..., 5473., 5395., 5404.],
       [4982., 5262., 5000., ..., 5083., 4902., 5138.],
       ...,
       [5193., 5772., 5329., ..., 5194., 5271., 5456.],
       [4855., 5111., 4702., ..., 4847., 4908., 4786.],
       [4605., 5239., 4773., ..., 4881., 4844., 4770.]])

In [24]:
A.dot(B)

array([[5111., 5543., 5290., ..., 5210., 5128., 5253.],
       [5499., 5575., 5411., ..., 5473., 5395., 5404.],
       [4982., 5262., 5000., ..., 5083., 4902., 5138.],
       ...,
       [5193., 5772., 5329., ..., 5194., 5271., 5456.],
       [4855., 5111., 4702., ..., 4847., 4908., 4786.],
       [4605., 5239., 4773., ..., 4881., 4844., 4770.]])

In [28]:
results == A.dot(B)

array([[ True,  True,  True, ...,  True,  True,  True],
       [ True,  True,  True, ...,  True,  True,  True],
       [ True,  True,  True, ...,  True,  True,  True],
       ...,
       [ True,  True,  True, ...,  True,  True,  True],
       [ True,  True,  True, ...,  True,  True,  True],
       [ True,  True,  True, ...,  True,  True,  True]])
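
The truncated boolean array does not show whether every entry matches. A minimal sketch of a single-value check with `np.array_equal()` and `np.allclose()` follows, using small hypothetical matrices `X` and `Y` in place of `A` and `B`.

```python
import numpy as np

X = np.array([[1.0, 2.0], [3.0, 4.0]])
Y = np.array([[5.0, 6.0], [7.0, 8.0]])

# Triple loop with the same structure as the pure Python cell
loop_result = np.zeros([2, 2])
for i in range(2):
    for j in range(2):
        for k in range(2):
            loop_result[i, k] += X[i, j] * Y[j, k]

# One boolean for exact element-wise equality
exact_match = np.array_equal(loop_result, X.dot(Y))
# One boolean for equality within a floating-point tolerance
close_match = np.allclose(loop_result, X.dot(Y))
print(exact_match, close_match)
```

The matrices in this lab hold small integers, so exact equality is expected. With non-integer floats, `np.allclose()` is usually the safer check, because the two methods may add the terms in a different order.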

In [29]:
5.423821089090779/0.033634579041972756

161.2572906687004

### Your comments 

```
NumPy was about 160 times faster than the pure Python loop (5.42 sec. vs. 0.034 sec.).
```

NumPy provides support for large multidimensional arrays and matrices along with a collection of mathematical functions to operate on these elements. 

NumPy relies on well-known libraries implemented in compiled languages (such as BLAS and LAPACK, written in Fortran and C) to perform efficient computations, giving the user both the expressiveness of Python and performance similar to MATLAB or Fortran.

## Summary

In this lab, we performed a quick comparison between calculating a dot product with NumPy and with nested loops in pure Python. We saw that NumPy is computationally much more efficient than the Python loop, because it hands the work to optimized compiled code. You are encouraged to run such tests yourself to fully appreciate what an additional library brings to Python.