# Numerical Python

Python's built-in general-purpose sequence is the list, which is versatile but inefficient for numerical work (the standard library's `array` module offers only one-dimensional typed arrays). Numerical computations require an efficient array implementation, and this is provided by the **NumPy package**.

An array has these properties, which distinguish it from a list:

1. All elements in an array are of the same type
2. Memory for an array is contiguous
3. A multi-dimensional array is rectangular: in a two-dimensional array, for example, every row contains the same number of columns. The same rule extends to arrays of higher dimensions

The list is more general than an array, but it offers none of these guarantees. The NumPy module adds the n-dimensional array (**`numpy.ndarray`**) data structure to Python.
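The first two properties can be checked directly. The following is a minimal sketch, not part of the original notebook; the variable names `mixed` and `arr` are chosen here for illustration only.

```python
import numpy as np

# A list may mix element types freely.
mixed = [1, 2.5, "three"]

# An array stores a single element type: the integers are upcast to float.
arr = np.array([1, 2.5, 3])
print(arr.dtype)                    # float64
print(arr.flags['C_CONTIGUOUS'])    # True: the elements occupy one memory block
print(arr.itemsize, arr.nbytes)     # bytes per element, total bytes
```

Because every element has the same size, the total memory used is simply the element size times the number of elements.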

## Creating Arrays

Arrays can be created from existing container objects such as lists. One can define the lists by hand, build them with a list comprehension, or use functions that produce sequences, such as **`range()`**.

In [7]:
from __future__ import division, print_function
import numpy as np

a = np.array([1, 2, 3, 4, 5])
print(type(a), a, type(a[0]))
s = a.shape
print(type(s), s)

b = np.array([2*i for i in a])  # Array built from a list comprehension
print(b, type(b), b.shape)

<type 'numpy.ndarray'> [1 2 3 4 5] <type 'numpy.int32'>
<type 'tuple'> (5L,)
[ 2  4  6  8 10] <type 'numpy.ndarray'> (5L,)


In [8]:
b = np.array([[1, 2, 3], [4, 5, 6]], dtype=float)
print(type(b), type(b[0,0]))
print(b)

c = [2*i for i in b]
print(type(c), c)

d = 2 * b
print(type(d))
print(d.T)  # Transpose of d

<type 'numpy.ndarray'> <type 'numpy.float64'>
[[ 1.  2.  3.]
 [ 4.  5.  6.]]
<type 'list'> [array([ 2.,  4.,  6.]), array([  8.,  10.,  12.])]
<type 'numpy.ndarray'>
[[  2.   8.]
 [  4.  10.]
 [  6.  12.]]


## NumPy Array Metadata
An array has a shape, which is a tuple. The number of items in the shape tuple is the number of dimensions of the array (also available as **`ndim`**), and each item gives the size of the array along that dimension.
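The relationships between these attributes can be sketched as follows. This is an illustrative example with a hypothetical array `m`; it also uses the `size` attribute, which the notebook cells do not show.

```python
import numpy as np

m = np.zeros((2, 3, 4))

print(m.shape)                     # (2, 3, 4)
print(m.ndim == len(m.shape))      # True: one shape entry per dimension
print(m.size == np.prod(m.shape))  # True: total element count is the product of the sizes
print(len(m) == m.shape[0])        # True: len() reports the size of the first dimension only
```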

In [9]:
print(type(a), type(a.shape), len(a.shape), a.shape, a.ndim)
print(type(b), type(b.shape), len(b.shape), b.shape, b.ndim)

<type 'numpy.ndarray'> <type 'tuple'> 1 (5L,) 1
<type 'numpy.ndarray'> <type 'tuple'> 2 (2L, 3L) 2


## Array Operations
Arithmetic operations such as addition (+), subtraction (-), multiplication (\*) and division (/) are all performed elementwise. Matrix multiplication, however, is not an elementwise operation and is performed using the function **`dot()`**.

In [10]:
a = np.array([[1, 2, 3, 4], [5, 6, 7, 8]], dtype=float)
b = np.array([[2, 4, 6, 8], [10, 12, 14, 16]], dtype=float)
c = a + b
print(a)
print(b)
print(c)
d = b - a
print(d)
x = a * b
print(x)
y = b / a
print(y)
z = b.T
print(z)
p = np.dot(a, z)
print(p)

[[ 1.  2.  3.  4.]
 [ 5.  6.  7.  8.]]
[[  2.   4.   6.   8.]
 [ 10.  12.  14.  16.]]
[[  3.   6.   9.  12.]
 [ 15.  18.  21.  24.]]
[[ 1.  2.  3.  4.]
 [ 5.  6.  7.  8.]]
[[   2.    8.   18.   32.]
 [  50.   72.   98.  128.]]
[[ 2.  2.  2.  2.]
 [ 2.  2.  2.  2.]]
[[  2.  10.]
 [  4.  12.]
 [  6.  14.]
 [  8.  16.]]
[[  60.  140.]
 [ 140.  348.]]


In [28]:
x = np.array([range(1,6), range(6,11), range(11,16)])
print(x)
y = np.array([range(11,16), range(16,21), range(21, 26)])
print(y)
z = x + y
print(z)
print(y - x)
print(x * y)
print(np.dot(x, y.T))
print(np.sin(x))
print(np.dot(x, y)) # Raises ValueError: the inner dimensions (5 and 3) do not match

[[ 1  2  3  4  5]
 [ 6  7  8  9 10]
 [11 12 13 14 15]]
[[11 12 13 14 15]
 [16 17 18 19 20]
 [21 22 23 24 25]]
[[12 14 16 18 20]
 [22 24 26 28 30]
 [32 34 36 38 40]]
[[10 10 10 10 10]
 [10 10 10 10 10]
 [10 10 10 10 10]]
[[ 11  24  39  56  75]
 [ 96 119 144 171 200]
 [231 264 299 336 375]]
[[ 205  280  355]
 [ 530  730  930]
 [ 855 1180 1505]]
[[ 0.84147098  0.90929743  0.14112001 -0.7568025  -0.95892427]
 [-0.2794155   0.6569866   0.98935825  0.41211849 -0.54402111]
 [-0.99999021 -0.53657292  0.42016704  0.99060736  0.65028784]]


ValueError: shapes (3,5) and (3,5) not aligned: 5 (dim 1) != 3 (dim 0)
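The error arises because the matrix product of two 2-D arrays requires the number of columns of the first to equal the number of rows of the second. The sketch below checks that condition before calling `dot()`; the helper `inner_dims_match` is a hypothetical name introduced here, not a NumPy function.

```python
import numpy as np

x = np.arange(1, 16).reshape(3, 5)
y = np.arange(11, 26).reshape(3, 5)

def inner_dims_match(p, q):
    """Return True if 2-D arrays p and q can be multiplied as np.dot(p, q)."""
    return p.shape[1] == q.shape[0]

print(inner_dims_match(x, y))    # False: (3, 5) times (3, 5) is not defined
print(inner_dims_match(x, y.T))  # True:  (3, 5) times (5, 3) gives (3, 3)

product = np.dot(x, y.T)
print(product.shape)             # (3, 3)
```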

## Functions that Operate on Arrays
NumPy also provides functions that operate on arrays. They are analogous to the functions in **`math`**, except that they operate elementwise on whole arrays as well as on single values.

In [11]:
import math

print(a)
# x = math.sqrt(a) would raise a TypeError: math functions accept only scalars
x = np.sqrt(a)
print(x)
print(np.cos(a))

[[ 1.  2.  3.  4.]
 [ 5.  6.  7.  8.]]
[[ 1.          1.41421356  1.73205081  2.        ]
 [ 2.23606798  2.44948974  2.64575131  2.82842712]]
[[ 0.54030231 -0.41614684 -0.9899925  -0.65364362]
 [ 0.28366219  0.96017029  0.75390225 -0.14550003]]
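The difference between the two families of functions can be sketched as follows. The array `values` and the flag `math_accepts_arrays` are illustrative names, not part of the notebook.

```python
import math
import numpy as np

values = np.array([1.0, 4.0, 9.0])

# math.sqrt expects a single number, so passing an array fails.
try:
    math.sqrt(values)
    math_accepts_arrays = True
except TypeError:
    math_accepts_arrays = False
print(math_accepts_arrays)   # False

# np.sqrt handles both a whole array and a single value.
roots = np.sqrt(values)
print(roots)
print(np.sqrt(16.0))
```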


In [12]:
c = np.array([[[1, 2], [3, 4], [5, 6]], [[7, 8], [9, 10], [11, 12]]])
print(type(c), type(c.shape), len(c.shape), c.shape, c.ndim)
print(c)

<type 'numpy.ndarray'> <type 'tuple'> 3 (2L, 3L, 2L) 3
[[[ 1  2]
  [ 3  4]
  [ 5  6]]

 [[ 7  8]
  [ 9 10]
  [11 12]]]


In [13]:
print(len(a), len(b), len(c))
print(a.dtype)

2 2 2
float64


## Quick Creation of Common Arrays
Commonly required arrays, such as arrays of all zeros or all ones and identity matrices, can be created easily.

In [14]:
a = np.zeros((3,4), dtype=float)
print(a)

[[ 0.  0.  0.  0.]
 [ 0.  0.  0.  0.]
 [ 0.  0.  0.  0.]]


In [15]:
b = np.ones((2, 3, 4), dtype = int)
print(b)

[[[1 1 1 1]
  [1 1 1 1]
  [1 1 1 1]]

 [[1 1 1 1]
  [1 1 1 1]
  [1 1 1 1]]]


In [16]:
c = np.eye(4, dtype=int)
print(c)

[[1 0 0 0]
 [0 1 0 0]
 [0 0 1 0]
 [0 0 0 1]]


In [17]:
z = np.random.randn(3, 4)
print(z)

[[-0.66001357  0.68824462  1.409131    1.71253093]
 [-1.8589439  -0.20197068  1.90282863 -0.08021415]
 [-0.7289203  -1.14864007  0.11494438 -1.12937947]]


## diag()
**`diag()`** generates a square array with the given sequence placed on the main diagonal, or on a diagonal above or below it when a second argument (the diagonal offset) is given.

In [18]:
d = np.diag([1, 2, 3, 4])
print(d)

[[1 0 0 0]
 [0 2 0 0]
 [0 0 3 0]
 [0 0 0 4]]


In [19]:
x = np.diag([10, 20, 30], 1)  # List placed one place above the main diagonal
print(x)

[[ 0 10  0  0]
 [ 0  0 20  0]
 [ 0  0  0 30]
 [ 0  0  0  0]]


In [20]:
y = np.diag([10, 20, 30], -1)  # List placed one place below the main diagonal
print(y)

[[ 0  0  0  0]
 [10  0  0  0]
 [ 0 20  0  0]
 [ 0  0 30  0]]
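As the outputs above suggest, the side of the generated square array grows with the offset. The sketch below checks this for a few offsets and also uses the fact that `diag()` applied to a 2-D array extracts a diagonal instead of building one. The names `v`, `sides` and `recovered` are illustrative.

```python
import numpy as np

v = [10, 20, 30]
sides = {}       # offset -> side length of the generated square array
recovered = {}   # offset -> diagonal read back from the generated array

for k in (-2, -1, 0, 1, 2):
    m = np.diag(v, k)               # 1-D input: build a square array
    sides[k] = m.shape[0]
    recovered[k] = np.diag(m, k)    # 2-D input: extract the k-th diagonal

print(sides)   # side length is len(v) + abs(k)
```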


## Random Number Arrays

In [21]:
a = np.random.random((3,4))
print(a)
b = a * 100
print(b)

[[ 0.47728888  0.05950243  0.48813849  0.24359009]
 [ 0.95941596  0.65437824  0.79965035  0.71109664]
 [ 0.79434576  0.4086556   0.85469238  0.98870638]]
[[ 47.7288883    5.9502434   48.81384877  24.35900917]
 [ 95.94159585  65.43782432  79.96503537  71.10966401]
 [ 79.43457641  40.86555974  85.46923831  98.870638  ]]


# Slicing Arrays
It is possible to access subarrays inside an array, either to copy their contents or replace them. The operation of accessing parts of an array is called **slicing**. Slicing is a powerful operation and requires some effort to master.

## Copying Subarrays

In [22]:
a = np.zeros((4, 5), dtype=float)
print(a)
a[0,:] = [1, 2, 3, 4, 5]
print(a)
a[:, 2] = [10, 20, 30, 40]
print(a)
print(a[:4, :2])
print(a[:,3:])

[[ 0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.]]
[[ 1.  2.  3.  4.  5.]
 [ 0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.]]
[[  1.   2.  10.   4.   5.]
 [  0.   0.  20.   0.   0.]
 [  0.   0.  30.   0.   0.]
 [  0.   0.  40.   0.   0.]]
[[ 1.  2.]
 [ 0.  0.]
 [ 0.  0.]
 [ 0.  0.]]
[[ 4.  5.]
 [ 0.  0.]
 [ 0.  0.]
 [ 0.  0.]]
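One point worth checking when copying subarrays: a basic slice is a view onto the original data, not an independent copy, so writing through the slice changes the original array. An explicit `copy()` is needed for an independent subarray. The following sketch illustrates this; the names `view` and `independent` are chosen here for illustration.

```python
import numpy as np

a = np.zeros((4, 5))

view = a[:, 3:]            # basic slicing returns a view
view[0, 0] = 99.0
print(a[0, 3])             # 99.0: the original array changed too

independent = a[:, 3:].copy()   # an explicit copy owns its own data
independent[0, 0] = -1.0
print(a[0, 3])             # still 99.0

print(np.shares_memory(a, view), np.shares_memory(a, independent))
```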


## Rules for Slicing
1. If the start value is not specified, it defaults to 0
2. If the end value is not specified, it defaults to the length of the array, so the slice runs through the last element
3. The element at the end index is itself not included in the slice
4. Index -1 refers to the last element and -2 to the last but one element

In [23]:
b = np.array(range(9), dtype=float)
print(b)
print(b[0:4])   # b[0] to b[3]. b[4] NOT included
print(b[:4])    # Same as b[0:4]
print(b[2:9])   # b[2] to b[8]. The end index 9 is excluded (b has no element b[9])
print(b[2:])    # Same as b[2:9]
print(b[2:-1])  # b[2] to b[-2]. b[-1] NOT included
print(b[-1:-3]) # Empty array: the start (last element) lies after the end

[ 0.  1.  2.  3.  4.  5.  6.  7.  8.]
[ 0.  1.  2.  3.]
[ 0.  1.  2.  3.]
[ 2.  3.  4.  5.  6.  7.  8.]
[ 2.  3.  4.  5.  6.  7.  8.]
[ 2.  3.  4.  5.  6.  7.]
[]
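The empty result in the last line comes from walking forward from the last element. A slice also accepts an optional third value, the step, which the rules above do not cover; the sketch below shows, as a supplementary illustration, how a negative step walks backwards.

```python
import numpy as np

b = np.arange(9.0)

last_two = b[-3:-1]         # forward from b[-3] up to, but excluding, b[-1]
backwards = b[-1:-3:-1]     # step -1: backwards from b[-1], excluding b[-3]
every_other = b[::2]        # default start and end, step 2
reversed_b = b[::-1]        # whole array in reverse order

print(last_two)
print(backwards)
print(every_other)
print(reversed_b)
```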


## Replace Elements in an Array

In [24]:
x = np.array(range(11))
print(x)
x[0:5] = np.zeros(5)
print(x)
x[0:5] = np.ones(5)
print(x)

[ 0  1  2  3  4  5  6  7  8  9 10]
[ 0  0  0  0  0  5  6  7  8  9 10]
[ 1  1  1  1  1  5  6  7  8  9 10]


In [25]:
y = np.zeros((4,5))
print(y)
y[1:3, 2:] = np.ones((2, 3))
print(y)

[[ 0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.]]
[[ 0.  0.  0.  0.  0.]
 [ 0.  0.  1.  1.  1.]
 [ 0.  0.  1.  1.  1.]
 [ 0.  0.  0.  0.  0.]]
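Two details of slice assignment are easy to miss in the cells above. First, the assigned values are converted to the type of the target array, which is why the float arrays from `zeros()` and `ones()` end up as integers in `x`. Second, a single scalar can be assigned to a whole slice. A minimal sketch, with illustrative values:

```python
import numpy as np

x = np.arange(11)            # integer array
x[0:5] = 0                   # a scalar is repeated across the slice
x[5:7] = [1.9, 2.7]          # floats are truncated to fit the integer array
print(x)

y = np.zeros((4, 5))
y[1:3, 2:] = 1               # same effect as assigning np.ones((2, 3))
print(y)
```

If the fractional parts matter, the target array should be created with a float type in the first place.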


## Reshaping Arrays

In [26]:
a = np.array(range(1, 13))
print(a)
b = np.reshape(a, (3, 4))
print(b)
x = np.array(range(1, 25))
print(x)
y = np.reshape(x, (2, 3, 4))
print(y)

[ 1  2  3  4  5  6  7  8  9 10 11 12]
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]
[ 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24]
[[[ 1  2  3  4]
  [ 5  6  7  8]
  [ 9 10 11 12]]

 [[13 14 15 16]
  [17 18 19 20]
  [21 22 23 24]]]


In [27]:
y = np.reshape(y, (4, 3, 2)) # Returns a reshaped array (a view where possible); the name y is rebound to it
print(y)

[[[ 1  2]
  [ 3  4]
  [ 5  6]]

 [[ 7  8]
  [ 9 10]
  [11 12]]

 [[13 14]
  [15 16]
  [17 18]]

 [[19 20]
  [21 22]
  [23 24]]]
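A few properties of `reshape()` are worth verifying: it leaves the original array's shape untouched, the result normally shares memory with the original, one dimension may be given as -1 to have it inferred, and the total number of elements must not change. The sketch below illustrates these points; the variable names are chosen here for illustration.

```python
import numpy as np

x = np.arange(1, 25)
y = np.reshape(x, (2, 3, 4))

print(x.shape, y.shape)          # x keeps its shape (24,)
print(np.shares_memory(x, y))    # True: y is a view of the same data

y[0, 0, 0] = 100                 # writing through the view changes x as well
print(x[0])                      # 100

z = x.reshape(4, -1)             # -1 lets NumPy infer the remaining dimension
print(z.shape)                   # (4, 6)

# The element count must be preserved: 24 elements cannot fill a 5 x 5 array.
try:
    np.reshape(x, (5, 5))
    size_mismatch_rejected = False
except ValueError:
    size_mismatch_rejected = True
print(size_mismatch_rejected)    # True
```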


# Execution Speeds
The NumPy module has operations that are optimized for speed, so using them is faster than writing your own loops. For example, suppose we have two vectors of the same size and want to create a new vector that is the product of their corresponding elements. That is, **`c = a * b`**, where $a$ and $b$ are two vectors and we want the elementwise product $c_i = a_i * b_i$. This can be done in two ways: we can use the operator **`*`** supported by NumPy, or we can write our own loop for the purpose.

In [29]:
a = np.arange(1., 6)
b = np.arange(11., 16)
print(a)
print(b)
c = a * b
print(c)

def prod(a, b):
    c = np.zeros(a.shape)
    for i in range(len(a)):
        c[i] = a[i] * b[i]
    return c

d = prod(a, b)
print(d)

[ 1.  2.  3.  4.  5.]
[ 11.  12.  13.  14.  15.]
[ 11.  24.  39.  56.  75.]
[ 11.  24.  39.  56.  75.]


Now, let us compare their execution speeds. IPython has a **magic** command **`%timeit`** that executes Python code in a loop several times and reports the time per loop (in the output below, the best of three runs). We will use this command to time the execution of the two different ways of finding the elementwise product.

In [32]:
%timeit c = a * b

The slowest run took 23.58 times longer than the fastest. This could mean that an intermediate result is being cached 
1000000 loops, best of 3: 887 ns per loop


In [33]:
%timeit d = prod(a, b)

The slowest run took 10.46 times longer than the fastest. This could mean that an intermediate result is being cached 
100000 loops, best of 3: 5.4 µs per loop


From the execution times, we can see that the NumPy operator is much faster than our function using a loop. The main reason the loop is slow is that Python is dynamically typed: on every iteration the interpreter must work out the types of **`i`**, **`a[i]`** and **`b[i]`** before it can index and multiply, whereas NumPy performs the whole operation in compiled code. If the data types were static, it would be possible to speed up the execution of this function. This can be done using a Python module named **Cython**, but we will not cover it here.
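Outside IPython, a similar comparison can be made with the standard library's `timeit` module. The sketch below is one possible setup, assuming longer vectors (10,000 elements) and 20 repetitions; both numbers are arbitrary choices made here, and the measured times will vary from machine to machine.

```python
import timeit
import numpy as np

a = np.arange(1., 10001.)
b = np.arange(11., 10011.)

def prod(a, b):
    """Elementwise product computed with an explicit Python loop."""
    c = np.zeros(a.shape)
    for i in range(len(a)):
        c[i] = a[i] * b[i]
    return c

# Total time in seconds for 20 executions of each approach.
t_vec = timeit.timeit(lambda: a * b, number=20)
t_loop = timeit.timeit(lambda: prod(a, b), number=20)
print("vectorized: {:.6f} s, loop: {:.6f} s".format(t_vec, t_loop))
```

With longer vectors the gap between the two approaches is typically much wider than for the five-element example, because the fixed overhead of calling into NumPy is spread over more elements.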