# NumPy

NumPy and SciPy are the twin pillars of data science in Python. Early in Python's history, it became clear that its built-in lists weren't ideal for heavy-duty number crunching on vectors and matrices.

NumPy was created to solve that problem by introducing an array data structure to Python.

References
- [Python Data Science Handbook, Ch 02.1](https://jakevdp.github.io/PythonDataScienceHandbook/02.01-understanding-data-types.html)
- [Python Data Science Handbook, Ch 02.2](https://jakevdp.github.io/PythonDataScienceHandbook/02.02-the-basics-of-numpy-arrays.html)

## 1 - Creating NumPy Arrays

In [None]:
# here is the import

import numpy as np
np.__version__

---

Notice that we have to pass in a single list of numbers rather than the numbers as separate arguments:

```python
np.array(1,2,3)  ## ERROR: won't work

np.array([1,2,3]) ## Correct
```

In [None]:
# integer array:
a1 = np.array([1, 4, 2, 5, 3])
print (a1)
print (type(a1))

---

In [None]:
## Create a float array

a2 = np.array([1, 2, 3, 4], dtype='float32')
print (a2)
print (type(a2))
print (a2.dtype)

---

Let's create a sequence of numbers with `np.arange`.

In [None]:
a3 = np.arange(100)
a3
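
`np.arange` also accepts a start, stop, and step, much like Python's built-in `range`. The sketch below is illustrative only: the variable names are made up for this example, and `np.linspace` is shown as an alternative for when a fixed number of evenly spaced points is wanted.

```python
import numpy as np

# arange(start, stop, step): stop is excluded, like Python's range
evens = np.arange(0, 10, 2)
print(evens)    # the even numbers 0, 2, 4, 6, 8

# linspace(start, stop, num): stop is included by default
points = np.linspace(0.0, 1.0, 5)
print(points)   # five evenly spaced values from 0.0 to 1.0
```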

---

## 2 - Accessing NumPy Arrays

NumPy arrays use zero-based indexing: the first element is at index 0.

In [None]:
a = np.arange(10)
a

In [None]:
# first element
a[0]

In [None]:
# last element
a[9]

In [None]:
# this raises an IndexError: the valid non-negative indexes are 0 through 9
a[10]

In [None]:
# Using negative indexes to walk backwards!
# This is very cool, try it out

a[-1]  # last element

In [None]:
a[-2]  # second to last element
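
As a rough sketch of how the two indexing directions relate, the example below checks that `a[-k]` names the same element as `a[len(a) - k]` and catches the out-of-bounds error instead of letting it stop the notebook. The `message` variable is just a name chosen for this example.

```python
import numpy as np

a = np.arange(10)

# a[-k] refers to the same element as a[len(a) - k]
assert a[-1] == a[len(a) - 1]
assert a[-2] == a[len(a) - 2]

# Indexing past the end raises IndexError
try:
    a[10]
except IndexError as err:
    message = str(err)
    print("IndexError:", message)
```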

---

## 3 - Manipulating NumPy Arrays

In [None]:
# Try to multiply sequence by a scalar
a = np.arange(10)
a

In [None]:
a * np.pi
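
To see why this matters, it may help to compare the same `*` operator on a plain Python list. This is a small sketch with made-up variable names: on a list, `*` repeats the sequence, whereas on an array it multiplies every element.

```python
import numpy as np

py_list = [0, 1, 2]
np_arr = np.array([0, 1, 2])

# On a list, * repeats the sequence
repeated = py_list * 3

# On an array, * multiplies every element
scaled = np_arr * 3

# The list equivalent needs an explicit loop or comprehension
scaled_list = [x * 3 for x in py_list]

print(repeated)
print(scaled)
print(scaled_list)
```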

---

## 4 - Multidimensional Arrays

We can make multi-dimensional arrays from one-dimensional arrays with `reshape`.

In [None]:
a = np.array([1,2,3,4,5,6])
a

In [None]:
b = a.reshape(2,3)
b

In [None]:
c = a.reshape(3,2)
c
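
Two details of `reshape` are worth a quick sketch: one dimension can be given as `-1` so that NumPy infers it, and for a contiguous array like this one the result is a view onto the same data rather than a copy. The example also shows what happens when the requested shape does not match the number of elements.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6])

# -1 asks NumPy to infer that dimension from the array's size
b = a.reshape(2, -1)
print(b.shape)          # (2, 3)

# For a contiguous array like this one, reshape returns a view,
# so writing through b also changes a
b[0, 0] = 100
print(a[0])             # 100

# The total number of elements must stay the same
try:
    a.reshape(4, 2)
except ValueError as err:
    print("ValueError:", err)
```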

---

We can access elements of the 2-D array using the subscript operator.

Prefer **this**:

```python
b[1,2]
```

over **this**:

```python
b[1][2]  # works, but builds a temporary row array first; avoid
```

In [None]:
b[1,2]
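
The comma form also extends to whole rows and columns. The following is a minimal sketch using a small 2x3 array built just for this example; `second_row` and `last_column` are illustrative names.

```python
import numpy as np

b = np.arange(1, 7).reshape(2, 3)   # rows: [1 2 3] and [4 5 6]

# Both forms return the same element here
assert b[1, 2] == b[1][2] == 6

# The comma form extends naturally to slices
second_row = b[1, :]       # every column of row 1
last_column = b[:, 2]      # every row of column 2

# By contrast, b[:][2] asks for row 2 of the whole array, not column 2
print(second_row, last_column)
```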

---

## 5 - Handy NP Arrays

We can create arrays pre-filled with useful values:
- an array of zeros
- an array of ones
- an array of random numbers

In [None]:
## Zero
a = np.zeros(10)
a

In [None]:
## Ones
b = np.ones(10, dtype=int)
b

In [None]:
b = np.ones(10, dtype=float)
b

In [None]:
# Random floats in the half-open interval [0, 1)
np.random.random(10)

In [None]:
# Create an array of 10 random integers in the interval [0, 100)
np.random.randint(0, 100, 10)

In [None]:
# Create a 3x3 array of random integers in the interval [0, 10)
np.random.randint(0, 10, (3, 3))
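
The calls above give different numbers on every run. When repeatable results are wanted, one option is a seeded `Generator` from `np.random.default_rng`, available in NumPy 1.17 and later. This sketch mirrors the three cells above; the seed value 42 is arbitrary.

```python
import numpy as np

# A seeded Generator produces the same sequence each time it is created
rng = np.random.default_rng(seed=42)

floats = rng.random(10)                    # floats in [0, 1)
ints = rng.integers(0, 100, size=10)       # integers in [0, 100)
grid = rng.integers(0, 10, size=(3, 3))    # 3x3 integers in [0, 10)

print(floats)
print(ints)
print(grid)
```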

---

## 6 - NP Datatypes

![](../assets/images/np-data-types.png)
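
An array's `dtype` fixes how many bytes each element occupies and how values are converted. The example below is a small sketch of that idea using a few common types; it does not reproduce the full table in the image.

```python
import numpy as np

small = np.array([1, 2, 3], dtype=np.int8)
wide = np.array([1, 2, 3], dtype=np.float64)

# itemsize is the number of bytes used per element
print(small.dtype, small.itemsize)   # int8 1
print(wide.dtype, wide.itemsize)     # float64 8

# astype returns a converted copy; float -> int truncates toward zero
truncated = np.array([1.7, -1.7]).astype(np.int32)
print(truncated)                     # the values 1 and -1

# Mixing ints and floats in one array upcasts everything to float
mixed = np.array([1, 2, 3.5])
print(mixed.dtype)                   # float64
```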

---

## 7 - Indexing and Slicing Arrays

It is important to know how to **slice** arrays. A slice `a[start:stop]` includes `start` and excludes `stop`; once that convention is clear, the rest follows easily.

In [8]:
import numpy as np

a = np.arange(0,10)
a

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [9]:
# Print first element
a[0]

0

In [10]:
# print last element
a[-1]

9

In [13]:
# print everything from index 5 onward
a[5:]

array([5, 6, 7, 8, 9])

In [14]:
# print everything up to, but not including, index 7
a[:7]

array([0, 1, 2, 3, 4, 5, 6])

In [15]:
# print elements 3 - 8
a[3:8]

array([3, 4, 5, 6, 7])

In [16]:
# or 
a[3 : 8+1]
# which is correct ? :-) 

array([3, 4, 5, 6, 7, 8])

In [17]:
# print everything from index 5 up to, but not including, the last element
a[5:-1]

array([5, 6, 7, 8])

In [18]:
# print everything from index 5 up to, but not including, the second-to-last element
a[5:-2]

array([5, 6, 7])
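
Slices also take an optional step, and a slice of a NumPy array is a view onto the original data rather than a copy. The sketch below illustrates both points; the variable names are chosen only for this example.

```python
import numpy as np

a = np.arange(0, 10)

# a[start:stop:step], with stop excluded
every_other = a[::2]      # elements at indexes 0, 2, 4, 6, 8
reversed_a = a[::-1]      # the whole array in reverse order

# For in-range, non-negative indexes, len(a[start:stop]) is stop - start
assert len(a[3:8]) == 8 - 3

# A slice is a view: modifying it modifies the original array
window = a[3:8]
window[0] = 99
print(a[3])               # 99

# Use .copy() when an independent array is needed
independent = a[3:8].copy()
independent[0] = -1
print(a[3])               # still 99
```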

---

## FUN: Benchmark NP arrays vs Python lists

Let's run a quick benchmark comparing the performance of NumPy arrays and standard Python lists.

### `%timeit`

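`timeit` is a handy utility for measuring a code snippet's performance. It runs the snippet many times and reports the average time taken; the `%timeit` magic chooses the number of loops automatically, while the `timeit.timeit` function defaults to one million executions.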

References:
- [timeit: Measure execution time of small code snippets](https://docs.python.org/3/library/timeit.html)
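
Outside a notebook the `%timeit` magic is not available, but the standard-library `timeit` module can be called directly. The following is a minimal sketch under assumed settings: the array size and `number=100` are arbitrary choices made to keep the run short.

```python
import timeit

import numpy as np

size = 100_000
py_list = list(range(size))
np_arr = np.arange(size)

# number=100 runs each callable 100 times and returns the total seconds
list_seconds = timeit.timeit(lambda: sum(py_list), number=100)
numpy_seconds = timeit.timeit(lambda: np.sum(np_arr), number=100)

print(f"list sum:  {list_seconds / 100:.6f} s per call")
print(f"numpy sum: {numpy_seconds / 100:.6f} s per call")
```

The absolute numbers depend on the machine, so only the relative difference is meaningful.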

### Small sized arrays

Let's start with an array size of 10

In [None]:
import random

py_list = list(range(0,10))  # materialize a real list; range() alone is a lazy sequence
np_arr = np.arange(0,10)

In [None]:
# python small list
%timeit  random.choices(py_list, k=1)

In [None]:
# np small array
%timeit np.random.choice(np_arr, size=1)

### Large Sized Arrays

In [None]:
size = 1000000
py_list = list(range(0,size))  # materialize a real list; range() alone is a lazy sequence
np_arr = np.arange(0,size)

In [None]:
# python large list
%timeit  random.choices(py_list, k=1000)

In [None]:
# np large array
%timeit np.random.choice(np_arr, size=1000)

### Sum Operation on large array

In [None]:
size = 1000000
py_list = list(range(0,size))  # materialize a real list; range() alone is a lazy sequence
np_arr = np.arange(0,size)

In [None]:
%timeit sum(py_list)

In [None]:
%timeit np.sum(np_arr)

### Comparing Memory footprint

This is an approximate measurement of memory footprint.  For accurate measurements we should use memory profilers like heapy.

In [None]:
import sys

size = 1000000
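py_list = list(range(0,size))
np_arr = np.arange(0,size)

# list footprint = the list object itself (an array of pointers) + every int object it refers to
py_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)
# nbytes = size * itemsize, the array's data buffer (excludes the small array header)
np_bytes = np_arr.nbytes

print("python list of {:,} elements takes up {:,} bytes".format(size, py_bytes))
print("numpy array of {:,} elements takes up {:,} bytes".format(size, np_bytes))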
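
The array's footprint also depends on its `dtype`. As a rough sketch, the example below compares the default integer type, which varies by platform, with an explicit `np.int32`; both variable names are illustrative.

```python
import numpy as np

size = 1_000_000
default_arr = np.arange(size)                    # default integer dtype (platform dependent)
compact_arr = np.arange(size, dtype=np.int32)    # 4 bytes per element

# nbytes is size * itemsize
assert default_arr.nbytes == default_arr.size * default_arr.itemsize

print(f"{default_arr.dtype}: {default_arr.nbytes:,} bytes")
print(f"{compact_arr.dtype}: {compact_arr.nbytes:,} bytes")
```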

### Discussion

Discuss your findings.

References
- [Speed comparison. numpy vs python standard](https://stackoverflow.com/questions/52603487/speed-comparison-numpy-vs-python-standard)
- [What are the advantages of NumPy over regular Python lists?](https://stackoverflow.com/questions/993984/what-are-the-advantages-of-numpy-over-regular-python-lists)
- [Difference between list and NumPy array memory size](https://stackoverflow.com/questions/67549486/difference-between-list-and-numpy-array-memory-size)

---

## Bonus Exercises

- [guided-machine-learning/python-data-analysis/np-1__numpy-intro.md](https://github.com/elephantscale/guided-machine-learning/blob/master/python-data-analysis/np-1__numpy-intro.md)