# A hands-on tutorial on NumPy

### Task 1: Let's add a few numbers

Creating an array/list in vanilla Python

In [1]:
num_list = list(range(5000))

In [2]:
num_list[:5]

[0, 1, 2, 3, 4]

#### C-like indexing

In [3]:
len_list = len(num_list)
sum_list = 0
for i in range(len_list):
    sum_list+= num_list[i]

#### Let's make it a function

In [4]:
def c_like_sum(l):
    # Use the parameter l, not the global num_list
    len_list = len(l)
    sum_list = 0
    for i in range(len_list):
        sum_list += l[i]
    return sum_list


In [5]:
c_like = %timeit -o c_like_sum(num_list)

336 µs ± 6.91 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


In [6]:
c_like

#### Pythonic iteration to find sum

In [7]:
def python_like_sum(l):
    sum_list = 0
    for x in l:
        sum_list+= x
    return sum_list

In [8]:
%timeit python_like_sum(num_list)

168 µs ± 2.36 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)


#### Well, just use `sum`

In [9]:
%timeit sum(num_list)

28.1 µs ± 184 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)


In [10]:
import numpy as np

In [11]:
y = np.array(num_list)

### What does a numpy array look like?

In [12]:
y

array([   0,    1,    2, ..., 4997, 4998, 4999])

In [13]:
%timeit np.sum(y)

6.29 µs ± 171 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


### Some attributes of numpy arrays

In [14]:
type(y)

numpy.ndarray

In [15]:
len(y)

5000

In [16]:
y.shape

(5000,)

In [17]:
y.size

5000

In [18]:
y.dtype

dtype('int64')

In [19]:
y

array([   0,    1,    2, ..., 4997, 4998, 4999])

In [20]:
y_32 = np.array(num_list, dtype='int32')
y_32

array([   0,    1,    2, ..., 4997, 4998, 4999], dtype=int32)

In [21]:
y_32_as_type = y.astype('int32')
y_32_as_type

array([   0,    1,    2, ..., 4997, 4998, 4999], dtype=int32)
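Why would you pick `int32` over `int64`? A minimal sketch: the dtype fixes how many bytes each element takes, which `itemsize` and `nbytes` report. The dtypes are set explicitly here because the default integer type can differ between platforms. The last two lines show the catch: casting to a smaller type does not check whether the values fit.

```python
import numpy as np

num_list = list(range(5000))

# Explicit dtypes, because the default integer type is platform dependent
y_64 = np.array(num_list, dtype='int64')
y_32 = y_64.astype('int32')

# itemsize is bytes per element; nbytes is itemsize * size
print(y_64.itemsize, y_64.nbytes)  # 8 40000
print(y_32.itemsize, y_32.nbytes)  # 4 20000

# astype does not range-check: 40000 does not fit in int16 and wraps around
wrapped = np.array([40000]).astype('int16')
print(wrapped)  # [-25536]
```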

### Comparing two numpy arrays

#### Plain check for equals?

In [22]:
y_32_as_type == y_32

array([ True,  True,  True, ...,  True,  True,  True])

This gives one boolean per position, telling us whether the two arrays match at that position. But we need a single answer!

In [23]:
(y_32_as_type == y_32).astype('int')

array([1, 1, 1, ..., 1, 1, 1])

In [24]:
np.sum((y_32_as_type == y_32).astype('int'))==y_32.size

True

#### Use the built-in checker!

In [25]:
np.array_equal(y_32, y_32_as_type)

True

### Find the sum of the first `n` natural numbers

In [26]:
n = 10
y.cumsum()[n]

55

What does `cumsum` do?

In [27]:
np.array([0, 1, 2, 4]).cumsum()

array([0, 1, 3, 7])
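In other words, `cumsum` is a running total. This sketch spells out the equivalent plain-Python loop and checks the earlier cell against the closed form $n(n+1)/2$. Since `y` starts at 0, `y.cumsum()[n]` is $0 + 1 + \dots + n$.

```python
import numpy as np

values = np.array([0, 1, 2, 4])

# Plain-Python running total: element i is values[0] + ... + values[i]
running = []
total = 0
for v in values.tolist():
    total += v
    running.append(total)

print(running)                   # [0, 1, 3, 7]
print(values.cumsum().tolist())  # [0, 1, 3, 7]

# y = 0, 1, 2, ..., so cumsum()[n] is 0 + 1 + ... + n = n * (n + 1) / 2
n = 10
y = np.arange(5000)
print(y.cumsum()[n], n * (n + 1) // 2)  # 55 55
```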

### Exercise 1: Find n! using numpy in a single line of code

#### Hint: Similar to `cumsum`, there is a `cumprod` function

In [28]:
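# Slice to 1..n first: a cumprod over all 4999 numbers would silently overflow int64
# The last element of the result is n!
y[1:n+1].cumprod()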

array([      1,       2,       6,      24,     120,     720,    5040,
         40320,  362880, 3628800])

### Exercise 2: Given the starting number, the ratio and the length of a sequence, use cumprod to generate a geometric progression

In [29]:
a = 3
r = 2
np.array([a]+[r]*(n-1)).cumprod()

array([   3,    6,   12,   24,   48,   96,  192,  384,  768, 1536])
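As a cross-check, a geometric progression also has the closed form $a r^k$ for $k = 0, \dots, n-1$. This sketch compares it with the `cumprod` version. The variable names mirror the cell above.

```python
import numpy as np

a, r, n = 3, 2, 10

# cumprod approach: multiply [a, r, r, ..., r] cumulatively
gp_cumprod = np.array([a] + [r] * (n - 1)).cumprod()

# Closed form: term k (starting from 0) is a * r**k
gp_power = a * r ** np.arange(n)

print(np.array_equal(gp_cumprod, gp_power))  # True
```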

## Initialisation 

In [30]:
np.zeros(100)

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

In [31]:
np.ones(100)

array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])

In [32]:
np.ones(100).dtype

dtype('float64')

### Exercise 3: Initialise an array of ones of length 100 with int16 datatype

In [33]:
np.ones(100, dtype='int')

array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])

#### Initialising multi-dimensional arrays

In [34]:
multi_dim_ones = np.ones((5,2))
multi_dim_ones

array([[1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.]])

#### Let us re-run some of the previous attributes for a numpy array

In [35]:
multi_dim_ones.shape

(5, 2)

In [36]:
multi_dim_ones.size

10

#### Initialising an empty array

In [37]:
np.empty((10, 2))

array([[6.90568555e-310, 1.76603409e-316],
       [0.00000000e+000, 0.00000000e+000],
       [0.00000000e+000, 6.01433264e+175],
       [6.93885958e+218, 5.56218858e+180],
       [5.30276956e+180, 5.05117710e-038],
       [6.98071760e-076, 4.06333704e-086],
       [3.35559918e-143, 3.94356143e+180],
       [2.21649068e-056, 1.65636677e-047],
       [1.27778276e+161, 1.20941043e+161],
       [4.41197019e-143, 1.50008929e+248]])

But this is not empty! What does `np.empty` actually do, and how is it different from `np.zeros`?

From the documentation:

> Array of uninitialized (arbitrary) data of the given shape, dtype, and order. Object arrays will be initialized to None.

#### Let's compare the relative speed of initialisation!

In [38]:
%timeit np.zeros((1000, 1000))

522 µs ± 58.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


In [39]:
%timeit np.empty((1000, 1000))

333 ns ± 29.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


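`np.empty` is far quicker! It only allocates the memory and skips setting the values to zero, so the array holds whatever bytes were already there (hence the arbitrary numbers above). Use it only when every element will be written before it is read.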
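A minimal sketch of the safe pattern: allocate with `np.empty`, then overwrite every element before using the array. When all elements should hold the same value, `np.full` says so directly. The array sizes here are arbitrary.

```python
import numpy as np

n = 5
# np.empty only allocates: the contents are arbitrary until written
squares = np.empty(n, dtype=np.int64)
for i in range(n):
    squares[i] = i * i  # every element is written before anything reads it

print(squares)  # [ 0  1  4  9 16]

# For a constant fill, np.full states the intent and initialises in one call
sevens = np.full((2, 3), 7)
```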

#### Initialising using `ones_like`

In [40]:
np.ones_like(y)

array([1, 1, 1, ..., 1, 1, 1])

#### Initialising using evenly spaced numbers over a range

In [42]:
np.linspace(0, 10, 11, dtype=int)
# start, stop, number of elements

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

#### Initialising similar to `range` for Python lists

In [43]:
np.arange(0, 11, 1, dtype=int)

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10])
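One difference worth knowing: `arange` excludes its stop value, just like `range`, while `linspace` includes it by default. For fractional steps, the NumPy documentation recommends `linspace`, because floating-point rounding makes the length of `arange` hard to predict. The arrays below are only illustrative.

```python
import numpy as np

# arange stops before `stop`; linspace includes `stop` by default
a = np.arange(0, 10, 1)
b = np.linspace(0, 10, 11)
print(a[-1], b[-1])  # 9 10.0

# linspace takes the number of points, so its length is always exact
c = np.linspace(0, 1, 5)
print(c)  # [0.   0.25 0.5  0.75 1.  ]
```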

### Exercise 4: Using both linspace and arange, create a numpy array of all positive numbers up to 30 that are divisible by 3

In [44]:
np.linspace(3, 30, (30//3), dtype=int)

array([ 3,  6,  9, 12, 15, 18, 21, 24, 27, 30])

### Reshaping!

In [45]:
a = np.array([[1, 3], [4, 5]])
a

array([[1, 3],
       [4, 5]])

In [46]:
a.shape

(2, 2)

In [47]:
a.reshape(4, 1)

array([[1],
       [3],
       [4],
       [5]])

### Exercise 5: Using reshape, generate the following array

    [[1],
     [4],
     [3],
     [5]]

In [48]:
a.reshape(4, 1, order='F')

array([[1],
       [4],
       [3],
       [5]])

In [49]:
a.T.reshape(4, 1)

array([[1],
       [4],
       [3],
       [5]])
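Both answers rely on the same idea. `order='C'` (the default) reads elements row by row, `order='F'` reads them column by column, and reading the transpose row by row amounts to reading the original column by column. A small sketch using `ravel` to make the reading order visible:

```python
import numpy as np

a = np.array([[1, 3], [4, 5]])

# C order: row by row; F (Fortran) order: column by column
print(a.ravel(order='C'))  # [1 3 4 5]
print(a.ravel(order='F'))  # [1 4 3 5]

# Column-by-column reading of a equals row-by-row reading of a.T
print(np.array_equal(a.reshape(4, 1, order='F'), a.T.reshape(4, 1)))  # True
```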

### Exercise 6: Arrange the numbers 1 to 100 with five numbers in each row

In [50]:
np.linspace(1, 100, 100, dtype=int).reshape(20, 5)

array([[  1,   2,   3,   4,   5],
       [  6,   7,   8,   9,  10],
       [ 11,  12,  13,  14,  15],
       [ 16,  17,  18,  19,  20],
       [ 21,  22,  23,  24,  25],
       [ 26,  27,  28,  29,  30],
       [ 31,  32,  33,  34,  35],
       [ 36,  37,  38,  39,  40],
       [ 41,  42,  43,  44,  45],
       [ 46,  47,  48,  49,  50],
       [ 51,  52,  53,  54,  55],
       [ 56,  57,  58,  59,  60],
       [ 61,  62,  63,  64,  65],
       [ 66,  67,  68,  69,  70],
       [ 71,  72,  73,  74,  75],
       [ 76,  77,  78,  79,  80],
       [ 81,  82,  83,  84,  85],
       [ 86,  87,  88,  89,  90],
       [ 91,  92,  93,  94,  95],
       [ 96,  97,  98,  99, 100]])

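#### Without using the `20` in the reshape

Passing `-1` for one dimension tells NumPy to infer its size from the total number of elements (here 100 / 5 = 20).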

In [51]:
np.linspace(1, 100, 100, dtype=int).reshape(-1, 5)

array([[  1,   2,   3,   4,   5],
       [  6,   7,   8,   9,  10],
       [ 11,  12,  13,  14,  15],
       [ 16,  17,  18,  19,  20],
       [ 21,  22,  23,  24,  25],
       [ 26,  27,  28,  29,  30],
       [ 31,  32,  33,  34,  35],
       [ 36,  37,  38,  39,  40],
       [ 41,  42,  43,  44,  45],
       [ 46,  47,  48,  49,  50],
       [ 51,  52,  53,  54,  55],
       [ 56,  57,  58,  59,  60],
       [ 61,  62,  63,  64,  65],
       [ 66,  67,  68,  69,  70],
       [ 71,  72,  73,  74,  75],
       [ 76,  77,  78,  79,  80],
       [ 81,  82,  83,  84,  85],
       [ 86,  87,  88,  89,  90],
       [ 91,  92,  93,  94,  95],
       [ 96,  97,  98,  99, 100]])
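The inferred dimension can sit anywhere, but only one dimension may be `-1`, and the known sizes must still divide the total evenly. A short sketch (`np.arange(1, 101)` produces the same numbers as the `linspace` call above):

```python
import numpy as np

numbers = np.arange(1, 101)

# -1 means "whatever size makes the element count match"
print(numbers.reshape(-1, 5).shape)     # (20, 5)
print(numbers.reshape(4, -1).shape)     # (4, 25)
print(numbers.reshape(5, -1, 5).shape)  # (5, 4, 5)

# 100 is not a multiple of 7, so no size for -1 works
try:
    numbers.reshape(-1, 7)
    reshape_failed = False
except ValueError:
    reshape_failed = True
print(reshape_failed)  # True
```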

### Exercise 7: Reshape the array to the shape `(5, 4, 5)`

In [52]:
np.linspace(1, 100, 100, dtype=int).reshape(5,4,5)

array([[[  1,   2,   3,   4,   5],
        [  6,   7,   8,   9,  10],
        [ 11,  12,  13,  14,  15],
        [ 16,  17,  18,  19,  20]],

       [[ 21,  22,  23,  24,  25],
        [ 26,  27,  28,  29,  30],
        [ 31,  32,  33,  34,  35],
        [ 36,  37,  38,  39,  40]],

       [[ 41,  42,  43,  44,  45],
        [ 46,  47,  48,  49,  50],
        [ 51,  52,  53,  54,  55],
        [ 56,  57,  58,  59,  60]],

       [[ 61,  62,  63,  64,  65],
        [ 66,  67,  68,  69,  70],
        [ 71,  72,  73,  74,  75],
        [ 76,  77,  78,  79,  80]],

       [[ 81,  82,  83,  84,  85],
        [ 86,  87,  88,  89,  90],
        [ 91,  92,  93,  94,  95],
        [ 96,  97,  98,  99, 100]]])

#### Exercise 4 with `arange` (note that the stop value is exclusive)

In [53]:
np.arange(3, 33, 3)

array([ 3,  6,  9, 12, 15, 18, 21, 24, 27, 30])

## Arithmetic operations

### Adding two vectors

In [54]:
y1 = y.copy()


In [55]:
y1

array([   0,    1,    2, ..., 4997, 4998, 4999])

In [56]:
y1+y

array([   0,    2,    4, ..., 9994, 9996, 9998])

In [57]:
y1.shape==y.shape

True

In [58]:
ones_a = np.ones((2,2))
ones_b = np.ones((2,2))

In [59]:
ones_a+ones_b

array([[2., 2.],
       [2., 2.]])

In [60]:
np.add(ones_a, ones_b)

array([[2., 2.],
       [2., 2.]])

### Exercise 8: Find the additive inverse of an array 

> x + add_inv(x) = O, where O is the array of zeros with the same shape as x

In [61]:
def additive_inverse(x):
    zeroes = np.zeros_like(x)
    return zeroes - x

In [62]:
additive_inverse((np.ones((3, 3))))

array([[-1., -1., -1.],
       [-1., -1., -1.],
       [-1., -1., -1.]])

### Elementwise multiplication

In [63]:
a1 = np.array([[3, 4], [5, 1]])
a2 = np.array([[4, 3], [5, 7]])

In [64]:
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

ImportError: No module named 'seaborn'

In [None]:
sns.heatmap(a1, annot=True, cmap='Blues')
plt.title("A1")

In [None]:
sns.heatmap(a2, annot=True, cmap='Blues')
plt.title("A2")

In [None]:
a1_elem_mul_a2 = np.multiply(a1, a2)

In [None]:
a1_elem_mul_a2

In [None]:
sns.heatmap(a1_elem_mul_a2, annot=True, cmap='Blues')
plt.title("A1 Element Wise Multiplication with A2")
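The heatmaps need seaborn, but the comparison that matters can be printed with NumPy alone. This sketch contrasts elementwise multiplication (`*`, `np.multiply`) with the matrix product (`@`, also available as `np.matmul`, or `np.dot` for 2-d arrays), using the same `a1` and `a2`.

```python
import numpy as np

a1 = np.array([[3, 4], [5, 1]])
a2 = np.array([[4, 3], [5, 7]])

# * multiplies entries at matching positions
print(a1 * a2)
# [[12 12]
#  [25  7]]

# @ is the matrix product: each row of a1 dotted with each column of a2
print(a1 @ a2)
# [[32 37]
#  [25 22]]
```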

#### Square root

In [None]:
sns.heatmap(np.sqrt(a1), annot=True, cmap='Blues')
plt.title("SQRT(A1)")

#### Find the $n^{th}$ number in the Fibonacci sequence using matrix math

![](https://www.geeksforgeeks.org/wp-content/uploads/fibonaccimatrix.png)
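In case the image does not load, it presumably shows the standard matrix identity for Fibonacci numbers. For $n \ge 1$, with $F_0 = 0$ and $F_1 = 1$:

$$
\begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}^{n} =
\begin{pmatrix} F_{n+1} & F_{n} \\ F_{n} & F_{n-1} \end{pmatrix}
$$

So $F_n$ is the top-right entry of the $n$-th matrix power.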

In [None]:
base_array = np.array([[1, 1], [1,0]])
base_array

In [None]:
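n = 10
# np.power is elementwise, so it would leave [[1, 1], [1, 0]] unchanged here;
# np.linalg.matrix_power computes the actual matrix power
nth_power = np.linalg.matrix_power(base_array, n)
nth_power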

In [None]:
np.dot(base_array, base_array)

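#### Another method is to use `np.matrix`, where `**` means matrix power

Note that NumPy no longer recommends the `np.matrix` class, even for linear algebra; regular arrays with `@` and `np.linalg.matrix_power` cover the same ground.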

In [None]:
base_matrix = np.matrix([[1, 1], [1,0]])
base_matrix

In [None]:
base_matrix ** 2
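Putting the identity to work: a minimal sketch of a Fibonacci function built on `np.linalg.matrix_power` and plain arrays. The function name `fibonacci` is our own. Because the entries are `int64`, results are only exact up to roughly $F_{92}$.

```python
import numpy as np

def fibonacci(n):
    """Return F(n), with F(0) = 0 and F(1) = 1, from [[1, 1], [1, 0]]**n."""
    base = np.array([[1, 1], [1, 0]], dtype=np.int64)
    # matrix_power does real matrix multiplication (by repeated squaring);
    # for n = 0 it returns the identity, whose top-right entry is F(0) = 0
    power = np.linalg.matrix_power(base, n)
    # The top-right entry of the n-th power is F(n)
    return int(power[0, 1])

print([fibonacci(k) for k in range(11)])  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
```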

### Broadcasting

In [None]:
one_d = np.arange(1, 10, 1)
one_d

#### Add 1 elementwise to each number in the array

In [None]:
np.add(one_d, np.ones_like(one_d))

#### Let's do something different: add the scalar `1` directly!

In [None]:
one_d + 1

In [None]:
np.tile(1, one_d.shape)
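The `np.tile` cell shows what broadcasting does conceptually: the scalar behaves as if it were stretched to the array's shape. This sketch checks that equivalence. It also uses `np.broadcast_to` to show that NumPy can build that stretched array as a view with stride 0, so no copy of the data is needed.

```python
import numpy as np

one_d = np.arange(1, 10, 1)

# Adding a scalar gives the same result as adding an explicitly tiled array
stretched = np.tile(1, one_d.shape)
print(np.array_equal(one_d + 1, one_d + stretched))  # True

# broadcast_to returns a read-only view that reuses the single value
view = np.broadcast_to(1, one_d.shape)
print(view.shape, view.strides)  # (9,) (0,)
```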

#### Exercise: Double every element, then add 2

In [None]:
one_d*2

In [None]:
(one_d*2)+2

In [None]:
two_d = np.arange(1, 11, 1).reshape(-1, 2)
two_d

In [None]:
two_d + 1

In [None]:
two_d + np.array([1, 2])

In [None]:
# Works with a list too!
two_d + [1, 2]

In [None]:
two_d + np.array([[1], [1], [1], [1], [1]])
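The cells above follow NumPy's broadcasting rule. Shapes are compared from the trailing dimension backwards, and each pair of sizes must either be equal or contain a 1; a missing dimension counts as 1. This sketch shows two shapes that work against `(5, 2)` and one that does not.

```python
import numpy as np

two_d = np.arange(1, 11, 1).reshape(-1, 2)  # shape (5, 2)

# (5, 2) with (2,): trailing sizes match, the missing axis counts as 1
print((two_d + np.array([1, 2])).shape)            # (5, 2)
# (5, 2) with (5, 1): the size-1 axis is stretched across the columns
print((two_d + np.ones((5, 1), dtype=int)).shape)  # (5, 2)

# (5, 2) with (5,): trailing sizes 2 and 5 differ and neither is 1
try:
    two_d + np.array([1, 1, 1, 1, 1])
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
print(mismatch_raised)  # True
```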

### Exercise 9: Find the mean of each row of a matrix and subtract the mean from the corresponding row

In [None]:
two_d = np.array([[1,2,3],[4,5,6],[7,8,9],[1,4,9]])
two_d

In [None]:
two_d - np.mean(two_d, axis = 1).reshape(-1,1)
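An equivalent that skips the `reshape`: `keepdims=True` keeps the averaged axis with size 1. The row means then have shape `(4, 1)` and broadcast across each row. Without either step, the `(4,)` vector of means would not broadcast against `(4, 3)`. A minimal sketch:

```python
import numpy as np

two_d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [1, 4, 9]])

# keepdims=True gives shape (4, 1) instead of (4,)
row_means = two_d.mean(axis=1, keepdims=True)
centred = two_d - row_means

print(row_means.shape)                       # (4, 1)
print(np.allclose(centred.mean(axis=1), 0))  # True
```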

### Exercise 10: Find the softmax of an array

If $x$ is an array of length $n$ and $p$ is its softmax, then $p_i = \frac{e^{x_i}}{\sum_{t=1}^{n} e^{x_t}}$.

In [None]:
x = np.array([1,2,3,4,5,6,7,8])
x

In [None]:
np.exp(x) / np.sum(np.exp(x))
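The direct formula overflows for large inputs: `np.exp(1000)` is `inf`, and `inf / inf` is `nan`. A common fix subtracts $m = \max_t x_t$ first. Since $e^{x_i - m} / \sum_t e^{x_t - m} = e^{x_i} / \sum_t e^{x_t}$, the result is unchanged. A minimal sketch of that numerically stable variant (the function name `softmax` is our own):

```python
import numpy as np

def softmax(x):
    """Softmax along the last axis, shifted by the maximum for stability."""
    x = np.asarray(x, dtype=float)
    # The shift cancels in the ratio but keeps np.exp from overflowing
    shifted = x - x.max(axis=-1, keepdims=True)
    exps = np.exp(shifted)
    return exps / exps.sum(axis=-1, keepdims=True)

x = np.array([1, 2, 3, 4, 5, 6, 7, 8])
naive = np.exp(x) / np.sum(np.exp(x))
print(np.allclose(softmax(x), naive))  # True

# Here the direct formula would give nan; the shifted version stays finite
large = softmax(np.array([1000.0, 1001.0]))
print(large)
```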

### Indexing

In [None]:
test_array = np.arange(1, 11, 1).reshape(5, 2)
test_array

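#### Syntax

`arr[outer selector, inner selector, ...]`

The outer (first) selector indexes axis 0, i.e. the rows of a 2-d array; the inner (second) selector indexes axis 1, the columns.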

#### Get a row

In [None]:

test_array[0, :]

#### Exercise: Print the last column

In [None]:
test_array[:, 1]

We knew that there are 2 columns, so we used the index `[:, 1]`. Could we do it without knowing the number of columns?

In [None]:
test_array[:, -1]

#### Exercise: Find the element in the last row and last column

In [None]:
test_array[-1, -1]

#### Exercise: Find the first column of the first two rows

In [None]:
test_array[:2, 0]

Wait, this is a lower-rank (1-d) array. What if we want to keep the same rank (2-d)?

In [None]:
test_array[:2, 0:1]
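The rule behind the last two cells: an integer index removes that axis, while a slice keeps it (even a slice of length 1). A quick sketch, which also shows `np.newaxis` (an alias for `None`) restoring a dropped axis afterwards:

```python
import numpy as np

test_array = np.arange(1, 11, 1).reshape(5, 2)

# Integer index on axis 1 drops it; a slice keeps it
print(test_array[:2, 0].shape)    # (2,)
print(test_array[:2, 0:1].shape)  # (2, 1)

# np.newaxis inserts a new axis of length 1
print(test_array[:2, 0][:, np.newaxis].shape)  # (2, 1)
```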

### Simple condition checking and indexing

In [None]:
test_array%4==0

In [None]:
test_array[test_array%4==0]

#### Exercise: Print all numbers greater than 5

In [None]:
test_array[test_array>5]
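Conditions can be combined with `&` (and), `|` (or) and `~` (not). The parentheses are required because these operators bind more tightly than comparisons such as `>` and `==`. Boolean indexing always returns a flat 1-d array; `np.where` keeps the original shape instead. A small sketch with an illustrative condition:

```python
import numpy as np

test_array = np.arange(1, 11, 1).reshape(5, 2)

# Numbers greater than 3 that are also even
mask = (test_array > 3) & (test_array % 2 == 0)
print(test_array[mask])        # [ 4  6  8 10]
print(test_array[mask].shape)  # (4,)

# np.where picks, per element, the value from test_array or 0
kept = np.where(mask, test_array, 0)
print(kept.shape)              # (5, 2)
```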

### Indexing and assignment

#### Set the last column to be all 1s

In [65]:
test_array[:, 1] = 1
test_array

array([[1, 1],
       [3, 1],
       [5, 1],
       [7, 1],
       [9, 1]])

#### Exercise: Set the first column of the last two rows to -1

In [66]:
test_array[-2:, 0] = -1
test_array

array([[ 1,  1],
       [ 3,  1],
       [ 5,  1],
       [-1,  1],
       [-1,  1]])

#### Exercise: Set all negative numbers to 0

In [67]:
test_array[test_array<0] = 0
test_array

array([[1, 1],
       [3, 1],
       [5, 1],
       [0, 1],
       [0, 1]])
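Assignment such as `test_array[:, 1] = 1` works in place because it is a single indexing-assignment call. Storing an indexed result in a variable first behaves differently depending on the kind of index. Basic slices give a view, so writing through one changes the original. Boolean (and integer-array) indexing gives a copy, so writes to it do not propagate. A sketch of the difference:

```python
import numpy as np

test_array = np.arange(1, 11, 1).reshape(5, 2)

# Basic slicing returns a view: writing through it changes test_array
last_col = test_array[:, 1]
last_col[:] = 1
print(test_array[:, 1])  # [1 1 1 1 1]

# Boolean indexing returns a copy: writing to it leaves test_array alone
selected = test_array[test_array > 5]
selected[:] = 0
print(test_array[:, 0])  # [1 3 5 7 9]

# Call .copy() on a slice when an independent array is needed
col_copy = test_array[:, 0].copy()
```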

### Interfacing with files - Persistence and Loading

#### Saving data to disk in binary format

In [68]:
np.save('test-arr.npy', test_array)


In [69]:
!ls -lah test-arr.npy

-rw-rw-r-- 1 atishay atishay 208 Oct 24 15:24 test-arr.npy


In [70]:
!cat test-arr.npy

�NUMPY v {'descr': '<i8', 'fortran_order': False, 'shape': (5, 2), }                                                          
                                                                        

#### Loading binary data from disk

In [71]:
test_loaded = np.load('test-arr.npy')

In [72]:
test_loaded

array([[1, 1],
       [3, 1],
       [5, 1],
       [0, 1],
       [0, 1]])

In [73]:
np.array_equal(test_loaded, test_array)

True

#### Saving data to disk in text format

In [74]:
# np.random.random_integers is deprecated; randint excludes the upper bound, hence 101
big_array = np.random.randint(1, 101, size=(10, 10))
big_array

array([[90,  4, 76, 61, 50, 86, 41, 10, 51, 23],
       [49, 23, 76, 48, 60, 70, 16, 57, 71, 94],
       [79, 72, 51, 56, 41, 26, 97, 49, 25, 87],
       [69, 74, 97, 48, 18, 28, 54, 74, 38, 14],
       [84, 39, 79, 53, 13, 40, 87, 22, 62, 92],
       [76, 61, 88, 22, 22, 63, 70, 73, 56, 71],
       [56, 91, 51, 43, 18, 41, 30, 12, 99,  6],
       [63, 90, 69, 78, 40, 74, 19,  4, 76, 95],
       [52, 95, 39, 23,  5, 40, 75, 33, 22, 19],
       [15,  4, 60, 82, 34, 41, 29, 62, 68, 43]])

In [75]:
np.savetxt('big-array.csv', big_array, delimiter=',', fmt='%d')

In [76]:
!head big-array.csv

90,4,76,61,50,86,41,10,51,23
49,23,76,48,60,70,16,57,71,94
79,72,51,56,41,26,97,49,25,87
69,74,97,48,18,28,54,74,38,14
84,39,79,53,13,40,87,22,62,92
76,61,88,22,22,63,70,73,56,71
56,91,51,43,18,41,30,12,99,6
63,90,69,78,40,74,19,4,76,95
52,95,39,23,5,40,75,33,22,19
15,4,60,82,34,41,29,62,68,43


#### Loading text data from disk

In [77]:
np.loadtxt('big-array.csv', delimiter=',')

array([[90.,  4., 76., 61., 50., 86., 41., 10., 51., 23.],
       [49., 23., 76., 48., 60., 70., 16., 57., 71., 94.],
       [79., 72., 51., 56., 41., 26., 97., 49., 25., 87.],
       [69., 74., 97., 48., 18., 28., 54., 74., 38., 14.],
       [84., 39., 79., 53., 13., 40., 87., 22., 62., 92.],
       [76., 61., 88., 22., 22., 63., 70., 73., 56., 71.],
       [56., 91., 51., 43., 18., 41., 30., 12., 99.,  6.],
       [63., 90., 69., 78., 40., 74., 19.,  4., 76., 95.],
       [52., 95., 39., 23.,  5., 40., 75., 33., 22., 19.],
       [15.,  4., 60., 82., 34., 41., 29., 62., 68., 43.]])
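Note that `np.loadtxt` returns floats by default, as the `.` in the output shows. Passing `dtype=int` gets integers back. A minimal round-trip sketch: it writes to an in-memory `io.StringIO` buffer instead of a file, and the small seeded random array is purely illustrative.

```python
import io

import numpy as np

rng = np.random.default_rng(0)  # seeded so the example is reproducible
small_array = rng.integers(1, 101, size=(3, 4))  # upper bound is exclusive

# Save as comma-separated integers into an in-memory text buffer
buffer = io.StringIO()
np.savetxt(buffer, small_array, delimiter=',', fmt='%d')
buffer.seek(0)

# dtype=int parses the values as integers instead of floats
loaded = np.loadtxt(buffer, delimiter=',', dtype=int)
print(loaded.dtype.kind, np.array_equal(loaded, small_array))  # i True
```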

### Miscellaneous exercises

1. Will add some questions from https://www.labri.fr/perso/nrougier/teaching/numpy/numpy.html
2. Will add some questions based on things I have needed in my scientific programming, for example subset sum for appliance energy consumption.
3. Can also include examples like k-means, etc.

1. Swap two rows of a 2-d array.
2. Given two vectors (1-d arrays), find the projection of the first vector on the second.
3. Given n arrays, create a matrix where each array is a column of the matrix.
4. Given a system of equations (say 3 variable, 3 equation system), solve it.
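   Hint: Use the `numpy.linalg` sub-module. It contains an inverse function (`np.linalg.inv`), and `np.linalg.solve` solves the system directly without forming the inverse.
##### The `linalg` sub-module is very useful: it collects NumPy's linear algebra functions.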
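Possible solutions to exercises 1 to 4, as a sketch. The example arrays and the 3x3 system are made up for illustration, and for exercise 4 both the inverse-based approach from the hint and `np.linalg.solve` are shown.

```python
import numpy as np

# 1. Swap rows 0 and 2; the right-hand side is a copy, so this is safe
m = np.arange(1, 10).reshape(3, 3)
m[[0, 2]] = m[[2, 0]]

# 2. Projection of u onto v: (u . v / v . v) * v
u = np.array([3.0, 4.0])
v = np.array([1.0, 0.0])
proj = (u @ v) / (v @ v) * v

# 3. Each 1-d array becomes one column of the matrix
cols = [np.array([1, 2, 3]), np.array([4, 5, 6])]
as_columns = np.column_stack(cols)  # shape (3, 2)

# 4. Solve A x = b for 2x + y - z = 8, -3x - y + 2z = -11, -2x + y + 2z = -3
A = np.array([[2.0, 1.0, -1.0], [-3.0, -1.0, 2.0], [-2.0, 1.0, 2.0]])
b = np.array([8.0, -11.0, -3.0])
x_inv = np.linalg.inv(A) @ b  # the hint's approach
x = np.linalg.solve(A, b)     # usually preferred: no explicit inverse

print(x)  # approximately [ 2.  3. -1.]
```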

In [109]:
!wget https://raw.githubusercontent.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/master/styles/custom.css --no-check-certificate


In [110]:
from IPython.display import HTML


def css_styling():
    # Read the downloaded stylesheet and display it as HTML to restyle the notebook
    with open("custom.css", "r") as f:
        styles = f.read()
    return HTML(styles)
css_styling()