# Why Numpy over Lists?

### Lists vs Arrays Efficiency

In [1]:
# Importing numpy
import numpy as np

list_1 = list(range(1,5000001))
list_2 = list(range(1,5000001))

array_1 = np.arange(1,5000001)
array_2 = np.arange(1,5000001)

## Basic Algebra Operation Comparisons

In [2]:
from time import time

# Elementwise Add
time1 = time()
add_res = [a+b for a,b in zip(list_1,list_2)]
time2 = time()
list_add_time = time2-time1

# Elementwise Subtract
time1 = time()
sub_res = [a-b for a,b in zip(list_1,list_2)]
time2 = time()
list_sub_time = time2-time1

# Elementwise Multiply
time1 = time()
mult_res = [a*b for a,b in zip(list_1,list_2)]
time2 = time()
list_mul_time = time2-time1

# Elementwise Division
time1 = time()
div_res = [a/b for a,b in zip(list_1,list_2)]
time2 = time()
list_div_time = time2-time1

In [3]:
from time import time

# Elementwise Add
time1 = time()
add_res = array_1 + array_2
time2 = time()
np_add_time = time2-time1

# Elementwise Sub
time1 = time()
sub_res = array_1 - array_2
time2 = time()
np_sub_time = time2-time1

# Elementwise Mult
time1 = time()
mult_res = array_1*array_2
time2 = time()
np_mul_time = time2-time1

# Elementwise Div
time1 = time()
div_res = array_1/array_2
time2 = time()
np_div_time = time2-time1

In [4]:
print("Add ==> List : {}, Numpy : {}".format(list_add_time,np_add_time))
print("Sub ==> List : {}, Numpy : {}".format(list_sub_time,np_sub_time))
print("Mult ==> List : {}, Numpy : {}".format(list_mul_time,np_mul_time))
print("Div ==> List : {}, Numpy : {}".format(list_div_time,np_div_time))

Add ==> List : 0.2737443447113037, Numpy : 0.061591148376464844
Sub ==> List : 0.17657804489135742, Numpy : 0.023220062255859375
Mult ==> List : 0.2674520015716553, Numpy : 0.04921674728393555
Div ==> List : 0.26050353050231934, Numpy : 0.05540132522583008


***In this run Numpy was roughly 4 to 8 times faster than the list comprehensions. The exact times vary from run to run and machine to machine.***
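
Single `time()` measurements are noisy, so a repeated measurement gives a steadier picture. As a rough cross-check, the sketch below repeats the addition benchmark with the standard-library `timeit` module and reports the best of several runs. The smaller size `n` and the names `small_list_1`, `add_lists` and so on are assumptions chosen to keep the sketch quick; they are not part of the benchmark above.

```python
import timeit

import numpy as np

n = 100_000  # assumed size, smaller than above so the sketch runs quickly
small_list_1 = list(range(1, n + 1))
small_list_2 = list(range(1, n + 1))
small_array_1 = np.arange(1, n + 1)
small_array_2 = np.arange(1, n + 1)


def add_lists():
    # Elementwise add with a Python-level loop
    return [a + b for a, b in zip(small_list_1, small_list_2)]


def add_arrays():
    # Elementwise add inside compiled Numpy code
    return small_array_1 + small_array_2


# Best of 5 repeats, 10 calls per repeat
list_best = min(timeit.repeat(add_lists, number=10, repeat=5))
numpy_best = min(timeit.repeat(add_arrays, number=10, repeat=5))
print("Add ==> List : {:.6f}s, Numpy : {:.6f}s".format(list_best, numpy_best))
```

Taking the minimum over repeats is a common convention because it is the measurement least disturbed by other activity on the machine.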

## Numpy

Numpy is the core library for scientific computing in Python. It provides a high-performance multidimensional array object, and tools for working with these arrays. If you are already familiar with MATLAB, you might find this [tutorial](http://wiki.scipy.org/NumPy_for_Matlab_Users) useful to get started with Numpy.

### 1. Array Creation

We can initialize numpy arrays from nested Python lists, and access elements using square brackets:

In [5]:
# Create a one dimensional array
a = np.array([1, 2, 3])
print(a[0], a[1], a[2])
# Change an element of the array
a[0] = 5
print(a)

1 2 3
[5 2 3]


In [6]:
# Create a two dimensional array
b = np.array([[1,2,3],[4,5,6]])
print(b[0, 0], b[0, 1], b[1, 0])

1 2 4


Numpy also provides many functions to create arrays:

In [7]:
# Create an array of all zeros
a = np.zeros((2,2))
print(a)

[[0. 0.]
 [0. 0.]]


In [8]:
# Create an array of all ones
b = np.ones((1,2))
print(b)

[[1. 1.]]


In [9]:
# Create a constant array
c = np.full((2,2), 7)
print(c)

[[7 7]
 [7 7]]


In [10]:
# Create a 2x2 identity matrix
d = np.eye(2)
print(d)

[[1. 0.]
 [0. 1.]]


In [11]:
# Create an array filled with random values
e = np.random.random((2,2))
print(e)

[[0.12651441 0.67238298]
 [0.13708    0.51541242]]


### 2. Array Indexing

In [12]:
a = np.array([[1,2,3,4],
              [5,6,7,8],
              [9,10,11,12]])
a.shape

(3, 4)

### Indexing type 1 : Fixed number

In [13]:
# The complete 3rd row - as the second dimension is not mentioned
# it is assumed that you want all elements.
temp = a[2]
temp.shape
temp

array([ 9, 10, 11, 12])

Note something interesting - **a** was two dimensional and **temp** is one dimensional. So if you use a **fixed** number at any index position, the dimensionality reduces by 1.

### Indexing type 2 : Range of Numbers

In [14]:
# The complete 1st and 2nd rows - as the second dimension is
# not mentioned it is assumed that you want all elements.
temp = a[0:2]
temp.shape
temp

array([[1, 2, 3, 4],
       [5, 6, 7, 8]])

Note something interesting - **a** and **temp** are both two dimensional. So if you use a **range** of numbers at any index position, the dimensionality remains the same.
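
A small sketch of the two indexing rules side by side: a fixed number drops a dimension, while a range keeps it, even when the range selects only one row. The array `grid` is a hypothetical stand-in with the same contents as **a**.

```python
import numpy as np

grid = np.array([[1, 2, 3, 4],
                 [5, 6, 7, 8],
                 [9, 10, 11, 12]])

row_fixed = grid[2]      # fixed number: the row dimension is dropped
row_range = grid[2:3]    # range of length 1: the row dimension is kept

print(row_fixed.shape)   # (4,)
print(row_range.shape)   # (1, 4)
```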

### Indexing type 3 : Combination of Fixed and Range of Numbers

In [15]:
# The 3rd row's 2nd and 3rd elements - the number (2) fixes the row
# and the range (1:3) selects the columns within that row
temp = a[2,1:3]
temp.shape
temp

array([10, 11])

Note - **temp** is a view on **a**, not a copy, so any modification made to the numbers in **temp** directly modifies the numbers in **a**.

In [16]:
# "temp" is a view on the data held by "a", so this write shows up in "a"
temp[0] = 100
a

array([[  1,   2,   3,   4],
       [  5,   6,   7,   8],
       [  9, 100,  11,  12]])
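
Writing through a slice works because basic slicing returns a view on the original data rather than a copy. A minimal sketch, using a hypothetical array `base`, of how to check this with `np.shares_memory` and how to get an independent array with `.copy()`:

```python
import numpy as np

base = np.array([[1, 2, 3, 4],
                 [5, 6, 7, 8],
                 [9, 10, 11, 12]])

view = base[2, 1:3]                 # basic slicing: a view on base
independent = base[2, 1:3].copy()   # explicit copy: separate memory

print(np.shares_memory(base, view))          # True
print(np.shares_memory(base, independent))   # False

independent[0] = 100   # does not touch base
print(base[2, 1])      # 10
view[0] = 100          # writes through to base
print(base[2, 1])      # 100
```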

### 3. Fancy Indexing

In [17]:
a = np.array([[1,2,3,4],
              [5,6,7,8],
              [9,10,11,12]])
a.shape

(3, 4)

### Fancy type 1 : Arbitrary or Special Arrays
Say we want to access the elements **2**, **6**, **4**, **12** from **a**. The normal indexing patterns we learnt will not be able to access these elements in one go.

In [18]:
# Option 1 - Specify the indices individually
temp = [ a[0,1], a[1,1], a[0,3], a[2,3] ]
temp

[2, 6, 4, 12]

In [19]:
# Option 2 - Pass the indices as a list. (All the indices of dimension 1 together and all the indices of dimension 2 together)
temp = a[ [0,1,0,2], [1,1,3,3] ]
temp

array([ 2,  6,  4, 12])

Note that the two lists that are passed must be of the same size. Numpy takes the 1st element of the 1st and 2nd lists and creates the index pair [0,1]. It repeats this process for the remaining elements of both lists.
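
The position-by-position pairing can be sketched as follows. The names `table`, `rows` and `cols` are hypothetical; the point is only that indexing with two lists gives the same values as indexing each (row, column) pair individually.

```python
import numpy as np

table = np.array([[1, 2, 3, 4],
                  [5, 6, 7, 8],
                  [9, 10, 11, 12]])
rows = [0, 1, 0, 2]
cols = [1, 1, 3, 3]

# Indexing with two lists pairs them position by position ...
picked = table[rows, cols]

# ... which matches indexing each (row, col) pair one at a time
paired = [int(table[r, c]) for r, c in zip(rows, cols)]

print(picked)                      # [ 2  6  4 12]
print(picked.tolist() == paired)   # True
```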

Why is the use of arrays in indexing useful? Let **data** be the array given below. **How do I increment all the diagonal elements in the array by 1?**

In [20]:
data = np.reshape(np.arange(0,10000),(100,100))
data

array([[   0,    1,    2, ...,   97,   98,   99],
       [ 100,  101,  102, ...,  197,  198,  199],
       [ 200,  201,  202, ...,  297,  298,  299],
       ...,
       [9700, 9701, 9702, ..., 9797, 9798, 9799],
       [9800, 9801, 9802, ..., 9897, 9898, 9899],
       [9900, 9901, 9902, ..., 9997, 9998, 9999]])

Manually specifying all the indices is tedious. We realise that the elements we are interested in are **[0,0]**, **[1,1]**, **[2,2]** ... **[99,99]**. Let's take a look at all the first indices - if we group them together they are **[0,1,2, ... ,99]**. The group of second indices is also **[0,1,2, ... ,99]**. Both these lists are identical and can easily be represented as **np.arange(100)**.

In [21]:
diagonals = data[np.arange(100),np.arange(100)]
diagonals

array([   0,  101,  202,  303,  404,  505,  606,  707,  808,  909, 1010,
       1111, 1212, 1313, 1414, 1515, 1616, 1717, 1818, 1919, 2020, 2121,
       2222, 2323, 2424, 2525, 2626, 2727, 2828, 2929, 3030, 3131, 3232,
       3333, 3434, 3535, 3636, 3737, 3838, 3939, 4040, 4141, 4242, 4343,
       4444, 4545, 4646, 4747, 4848, 4949, 5050, 5151, 5252, 5353, 5454,
       5555, 5656, 5757, 5858, 5959, 6060, 6161, 6262, 6363, 6464, 6565,
       6666, 6767, 6868, 6969, 7070, 7171, 7272, 7373, 7474, 7575, 7676,
       7777, 7878, 7979, 8080, 8181, 8282, 8383, 8484, 8585, 8686, 8787,
       8888, 8989, 9090, 9191, 9292, 9393, 9494, 9595, 9696, 9797, 9898,
       9999])

In [22]:
# Increment all the diagonal elements by 1
data[np.arange(100),np.arange(100)] += 1
data

# Googly Question - Will diagonals+=1 work ??

array([[    1,     1,     2, ...,    97,    98,    99],
       [  100,   102,   102, ...,   197,   198,   199],
       [  200,   201,   203, ...,   297,   298,   299],
       ...,
       [ 9700,  9701,  9702, ...,  9798,  9798,  9799],
       [ 9800,  9801,  9802, ...,  9897,  9899,  9899],
       [ 9900,  9901,  9902, ...,  9997,  9998, 10000]])
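
On the googly question in the cell above: `diagonals += 1` runs, but it does not change **data**, because indexing with index arrays returns a copy rather than a view. A minimal sketch on a hypothetical 4x4 array `small` (not the **data** array from the notebook):

```python
import numpy as np

small = np.arange(16).reshape(4, 4)
idx = np.arange(4)

diag_copy = small[idx, idx]   # indexing with arrays returns a copy
diag_copy += 1                # modifies only the copy
print(small[idx, idx])        # [ 0  5 10 15]

small[idx, idx] += 1          # indexed assignment writes into small
print(small[idx, idx])        # [ 1  6 11 16]
```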

### Fancy type 2 : Bool Indexing

Sometimes we need to modify only those elements in an array that satisfy a certain condition. One way is to loop over all the elements and modify only the relevant ones.

Example: change all the even elements in **data** to 3.

In [23]:
data = np.reshape(np.arange(0,10000),(100,100))
data

array([[   0,    1,    2, ...,   97,   98,   99],
       [ 100,  101,  102, ...,  197,  198,  199],
       [ 200,  201,  202, ...,  297,  298,  299],
       ...,
       [9700, 9701, 9702, ..., 9797, 9798, 9799],
       [9800, 9801, 9802, ..., 9897, 9898, 9899],
       [9900, 9901, 9902, ..., 9997, 9998, 9999]])

In [24]:
# Brute Force Method
time1 = time()
for i in range(100) :
    for j in range(100) :
        if data[i,j]%2 == 0:
            data[i,j]=3
time2 = time()
print("Time Difference : {}".format(time2-time1))
data

Time Difference : 0.008337974548339844


array([[   3,    1,    3, ...,   97,    3,   99],
       [   3,  101,    3, ...,  197,    3,  199],
       [   3,  201,    3, ...,  297,    3,  299],
       ...,
       [   3, 9701,    3, ..., 9797,    3, 9799],
       [   3, 9801,    3, ..., 9897,    3, 9899],
       [   3, 9901,    3, ..., 9997,    3, 9999]])

A much faster way is to create a **bool index**. Take a look below; **data%2==0** is explained in more detail after this example.

In [25]:
# Faster Method
data = np.reshape(np.arange(0,10000),(100,100))
time1 = time()
data[data%2==0] = 3
time2 = time()
print("Time Difference : {}".format(time2-time1))
data

Time Difference : 0.00030517578125


array([[   3,    1,    3, ...,   97,    3,   99],
       [   3,  101,    3, ...,  197,    3,  199],
       [   3,  201,    3, ...,  297,    3,  299],
       ...,
       [   3, 9701,    3, ..., 9797,    3, 9799],
       [   3, 9801,    3, ..., 9897,    3, 9899],
       [   3, 9901,    3, ..., 9997,    3, 9999]])

The output of **data%2==0** is given below. As shown it is an array full of bools where **True** indicates that the element satisfies the condition.

In [26]:
data = np.reshape(np.arange(0,10000),(100,100))
idxs = (data%2==0)
idxs

array([[ True, False,  True, ..., False,  True, False],
       [ True, False,  True, ..., False,  True, False],
       [ True, False,  True, ..., False,  True, False],
       ...,
       [ True, False,  True, ..., False,  True, False],
       [ True, False,  True, ..., False,  True, False],
       [ True, False,  True, ..., False,  True, False]])
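
A bool index can also be used to count and select elements, and several conditions can be combined. The sketch below uses a hypothetical 3x4 array `values` to keep the output small.

```python
import numpy as np

values = np.arange(12).reshape(3, 4)
mask = (values % 2 == 0)

print(mask.sum())      # number of True entries: 6
print(values[mask])    # selected elements as a 1-D array: [ 0  2  4  6  8 10]

# Masks can be combined with & (and), | (or) and ~ (not)
even_and_large = mask & (values > 5)
print(values[even_and_large])   # [ 6  8 10]
```

Note that selecting with a mask always returns a one dimensional array, whatever the shape of the array being indexed.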

### 4. More Algebraic Operations

In [27]:
x = np.array([[1,2],[3,4]])

In [28]:
# Elementwise square root
print(np.sqrt(x))

[[1.         1.41421356]
 [1.73205081 2.        ]]


Numpy provides many useful functions for performing computations on arrays; one of the most useful is `sum`:

In [29]:
# Sum Function
print(np.sum(x))  # Compute sum of all elements; prints "10"
print(np.sum(x, axis=0))  # Compute sum of each column; prints "[4 6]"
print(np.sum(x, axis=1))  # Compute sum of each row; prints "[3 7]"

10
[4 6]
[3 7]
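
One way to remember the `axis` argument is that it names the dimension that gets collapsed. A short sketch on a hypothetical non-square array `m`, where the two results have different lengths and are therefore easier to tell apart:

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])      # shape (2, 3)

col_sums = np.sum(m, axis=0)   # collapses the 2 rows -> shape (3,)
row_sums = np.sum(m, axis=1)   # collapses the 3 columns -> shape (2,)

print(col_sums)   # [5 7 9]
print(row_sums)   # [ 6 15]
```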


In [30]:
# Transpose Function
print(x.T)

[[1 3]
 [2 4]]


Note that unlike MATLAB, `*` is elementwise multiplication, not matrix multiplication. We instead use the `dot` function to compute **inner products of vectors, to multiply a vector by a matrix, and to multiply matrices**. `dot` is available both as a function in the numpy module and as an instance method of array objects:

In [31]:
# Data
x = np.array([[1,2],[3,4]])
y = np.array([[5,6],[7,8]])
v = np.array([9,10])
w = np.array([11, 12])

#### Dot Product between a vector and a vector

In [32]:
# Dot Product between 2 vectors - inner product = 219
print(v.dot(w))
print(np.dot(v, w))

219
219


#### Dot Product between a vector and a matrix

In [33]:
# Matrix / vector product; both produce the rank 1 array [29 67]
print(x.dot(v))
print(np.dot(x, v))

[29 67]
[29 67]


#### Dot Product between a matrix and a matrix

In [34]:
# Matrix / matrix product; both produce the rank 2 array
# [[19 22]
#  [43 50]]
print(x.dot(y))
print(np.dot(x, y))

[[19 22]
 [43 50]]
[[19 22]
 [43 50]]
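
To make the difference between `*` and `dot` concrete, the sketch below applies both to the same pair of matrices. It also uses the `@` operator, which is not covered in this notebook; for two dimensional arrays it gives the same result as `dot`. The names `p` and `q` are hypothetical.

```python
import numpy as np

p = np.array([[1, 2], [3, 4]])
q = np.array([[5, 6], [7, 8]])

elementwise = p * q        # [[ 5, 12], [21, 32]]
matrix_prod = np.dot(p, q) # [[19, 22], [43, 50]]
operator_prod = p @ q      # same as np.dot for 2-D arrays

print(elementwise)
print(matrix_prod)
print(np.array_equal(matrix_prod, operator_prod))   # True
```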


### 5. Broadcasting

Broadcasting is a powerful mechanism that allows numpy to work with arrays of different shapes when performing arithmetic operations. Frequently we have a smaller array and a larger array, and we want to use the smaller array multiple times to perform some operation on the larger array.

For example, suppose that we want to add a constant vector to each row of a matrix. We could do it like this:

In [35]:
# We will add the vector v to each row of the matrix x,
# storing the result in the matrix res
x = np.array([[1.,2.,3.], [4.,5.,6.], [7.,8.,9.], [10., 11., 12.]])
v = np.array([1., 0., 1.])

# Create a zero-filled matrix with the same shape as x
res = np.zeros(x.shape)

# Add the vector v to each row of the matrix x with an explicit loop
for i in range(4):
    res[i, :] = x[i, :] + v

print(res)

[[ 2.  2.  4.]
 [ 5.  5.  7.]
 [ 8.  8. 10.]
 [11. 11. 13.]]


This works; however when the matrix `x` is very large, computing an explicit loop in Python could be slow. Note that adding the vector v to each row of the matrix `x` is equivalent to forming a matrix `vv` by stacking multiple copies of `v` vertically, then performing elementwise summation of `x` and `vv`. We could implement this approach like this:

In [36]:
vv = np.tile(v, (4, 1))  # Stack 4 copies of v on top of each other
print(vv)                # Prints "[[1. 0. 1.]
                         #          [1. 0. 1.]
                         #          [1. 0. 1.]
                         #          [1. 0. 1.]]"

[[1. 0. 1.]
 [1. 0. 1.]
 [1. 0. 1.]
 [1. 0. 1.]]


In [37]:
y = x + vv  # Add x and vv elementwise
print(y)

[[ 2.  2.  4.]
 [ 5.  5.  7.]
 [ 8.  8. 10.]
 [11. 11. 13.]]


Numpy broadcasting allows us to perform this computation without actually creating multiple copies of v. Consider this version, using broadcasting:

In [38]:
# We will add the vector v to each row of the matrix x,
# storing the result in the matrix y
x = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])
v = np.array([1, 0, 1])
y = x + v  # Add v to each row of x using broadcasting
print(y)

[[ 2  2  4]
 [ 5  5  7]
 [ 8  8 10]
 [11 11 13]]


The line `y = x + v` works even though `x` has shape `(4, 3)` and `v` has shape `(3,)` due to broadcasting; this line works as if v actually had shape `(4, 3)`, where each row was a copy of `v`, and the sum was performed elementwise.

Broadcasting two arrays together follows these rules:

1. If the arrays do not have the same rank, prepend the shape of the lower rank array with 1s until both shapes have the same length.
2. The two arrays are said to be compatible in a dimension if they have the same size in the dimension, or if one of the arrays has size 1 in that dimension.
3. The arrays can be broadcast together if they are compatible in all dimensions.
4. After broadcasting, each array behaves as if it had shape equal to the elementwise maximum of shapes of the two input arrays.
5. In any dimension where one array had size 1 and the other array had size greater than 1, the first array behaves as if it were copied along that dimension.
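
The rules above can be sketched as a small function that computes the shape of the broadcast result. The helper `broadcast_shape` is hypothetical and written only to mirror the rules; it assumes no zero-length dimensions. Recent versions of Numpy provide `np.broadcast_shapes` for the same purpose.

```python
import numpy as np


def broadcast_shape(shape_a, shape_b):
    """Return the broadcast result shape, or raise ValueError."""
    # Rule 1: left-pad the shorter shape with 1s
    ndim = max(len(shape_a), len(shape_b))
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)

    result = []
    for size_a, size_b in zip(a, b):
        # Rules 2 and 3: sizes must match, or one of them must be 1
        if size_a != size_b and size_a != 1 and size_b != 1:
            raise ValueError(
                "incompatible shapes {} and {}".format(shape_a, shape_b))
        # Rule 4: the result takes the larger size in each dimension
        result.append(max(size_a, size_b))
    return tuple(result)


print(broadcast_shape((4, 3), (3,)))   # (4, 3)
print(broadcast_shape((3, 1), (2,)))   # (3, 2)
```

Rule 5 does not appear in the function because it describes how the values are reused, not what the result shape is.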

If this explanation does not make sense, try reading the explanation from the [documentation](http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html) or this [explanation](http://wiki.scipy.org/EricsBroadcastingDoc).

Functions that support broadcasting are known as universal functions. You can find the list of all universal functions in the [documentation](http://docs.scipy.org/doc/numpy/reference/ufuncs.html#available-ufuncs).

Here are some applications of broadcasting:

In [39]:
# Addition - Try 1
M = np.ones((4, 3))
a = np.arange(3)

Let's consider the shape of these arrays:
* M.shape = (4,3)
* a.shape = (3,)

By rule 1, prepend a 1 to the shape of a. The new shapes thus are:
* M.shape = (4,3)
* a.shape = (1,3)

Now by rule 5, a behaves as if it were copied along the first dimension, so the arrays act as if they had the shapes:
* M.shape = (4,3)
* a.shape = (4,3)

In [40]:
M+a

array([[1., 2., 3.],
       [1., 2., 3.],
       [1., 2., 3.],
       [1., 2., 3.]])

In [41]:
# Addition Try 2 : Why does this not work ?
M = np.ones((4, 3))
a = np.arange(3)
a = a.reshape((3,1))
M+a

ValueError: operands could not be broadcast together with shapes (4,3) (3,1) 
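
One way to see why Try 2 fails is to line the shapes up from the right: `(4, 3)` against `(3, 1)`. The last dimension is compatible (3 against 1), but the first compares 4 with 3, and neither of them is 1. The sketch below catches the error and shows that a `(1, 3)` row vector does broadcast; the names `M2`, `col` and `row` are hypothetical.

```python
import numpy as np

M2 = np.ones((4, 3))
col = np.arange(3).reshape((3, 1))   # shape (3, 1)
row = np.arange(3).reshape((1, 3))   # shape (1, 3)

try:
    M2 + col
    failed = False
except ValueError as err:
    failed = True
    print("Broadcast failed:", err)

print((M2 + row).shape)   # (4, 3)
```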

In [42]:
# Addition Try 3 : Why does this not work ?
m  = np.ones((4,4,8,9))
n = np.ones((4,9))
m+n

ValueError: operands could not be broadcast together with shapes (4,4,8,9) (4,9) 
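
Try 3 fails for a similar reason. By rule 1 the shape `(4, 9)` is padded on the left to `(1, 1, 4, 9)`, and the third dimension then compares 8 with 4. If the intent is to line up the first axis of `n` with the first axis of `m` (an assumption, since the cell does not say), the length-1 axes can be placed explicitly with `np.newaxis`. The names `m4`, `n2` and `n4` are hypothetical.

```python
import numpy as np

m4 = np.ones((4, 4, 8, 9))
n2 = np.ones((4, 9))

# Put the length-1 axes in the middle instead of on the left
n4 = n2[:, np.newaxis, np.newaxis, :]

print(n4.shape)          # (4, 1, 1, 9)
print((m4 + n4).shape)   # (4, 4, 8, 9)
```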

In [43]:
# Compute the outer product of vectors: w is treated as shape (1,2),
# then v and w both behave as if they had shape (3,2)
v = np.array([[1],
              [2],
              [3]])  # v has shape (3,1)
w = np.array([4,5])    # w has shape (2,)
print(v*w)

[[ 4  5]
 [ 8 10]
 [12 15]]
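
The steps for the outer product follow the same rules: the shape `(2,)` is padded to `(1, 2)`, then `(3, 1)` and `(1, 2)` are compatible in both dimensions and the result has shape `(3, 2)`. As a sanity check, the sketch below compares the broadcast product with `np.outer`; the names `col_vec` and `row_vec` are hypothetical.

```python
import numpy as np

col_vec = np.array([[1], [2], [3]])   # shape (3, 1)
row_vec = np.array([4, 5])            # shape (2,), treated as (1, 2)

product = col_vec * row_vec           # broadcast to shape (3, 2)
print(product.shape)                  # (3, 2)

# np.outer computes the same table from two 1-D inputs
same = np.array_equal(product, np.outer([1, 2, 3], [4, 5]))
print(same)                           # True
```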
