## Using NumPy

NumPy is the standard Python package for scientific computing. It provides multidimensional arrays that use less memory and support faster numerical operations than standard Python data structures such as lists.

To start off, we import the NumPy package. By convention, it is imported as *np* for shorthand.

In [3]:
import numpy as np

### NumPy 1D Arrays

The fundamental NumPy data structure is an *array*: a memory-efficient container that provides fast numerical operations.
Unlike standard Python lists, a NumPy array holds values of a single type (e.g. only floats or only integers).

The simplest type of array is 1-dimensional (1D). We can create an array from an existing Python list:

In [4]:
mylist = [5,18,3,12,20,0,24]
a = np.array(mylist)
print(a)

[ 5 18  3 12 20  0 24]

The *shape* attribute reports the length of the array along each dimension. For a 1D array of 7 values it is a tuple with a single entry:


In [5]:
a.shape

(7,)

Every array has a *dtype* attribute that reports the single type of value it contains, such as an integer or a float.

In [6]:
a.dtype

dtype('int64')

In [7]:
b = np.array( [0.3, 0.12, 1.4, 2.3, 4.5] )
b

array([ 0.3 ,  0.12,  1.4 ,  2.3 ,  4.5 ])

In [8]:
b.dtype

dtype('float64')
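
As a small illustration of the single-type rule, the sketch below shows what happens when a list mixes integers and floats. The variable names are only illustrative. NumPy picks one common type for the whole array, and *astype()* can be used to convert explicitly.

```python
import numpy as np

# A list mixing ints and a float: NumPy stores every value as a float
mixed = np.array([1, 2, 3.5])
print(mixed.dtype)

# Convert explicitly to integers; the fractional part is truncated
as_ints = mixed.astype(int)
print(as_ints)
```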

### NumPy 2D Arrays

An array can have more than 1 dimension. A 2D array can be viewed as a matrix, with rows and columns. Arrays can also have more than 2 dimensions.

We can create a 2D array from a list containing other Python lists, one per row. Each inner list must contain the same number of values. Also, make sure to include the outer [ ] brackets!

In [9]:
r1 = [ 4, 3, 2, 3 ]
r2 = [ 3, 5, 6, 4 ]
m = np.array( [ r1, r2 ] )
m

array([[4, 3, 2, 3],
       [3, 5, 6, 4]])

The number of dimensions of an array (sometimes called its *rank*) is given by the *ndim* attribute.

In [10]:
m.ndim

2

The *shape* of an array is a tuple of integers giving the length of the array in each dimension.

In [11]:
m.shape

(2, 4)

The *size* of an array is the total number of elements it has. For a 2D array, this is the number of rows multiplied by the number of columns.

In [12]:
m.size

8
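
The same three attributes apply to arrays with more than 2 dimensions. The following sketch, using a hypothetical 3D array *t* built from nested lists, shows how *ndim*, *shape* and *size* relate to each other.

```python
import numpy as np

# A 3D array: 2 blocks, each with 2 rows and 3 columns
t = np.array([[[1, 2, 3], [4, 5, 6]],
              [[7, 8, 9], [10, 11, 12]]])

print(t.ndim)   # number of dimensions
print(t.shape)  # length along each dimension
print(t.size)   # total number of elements

# ndim is the length of the shape tuple, and
# size is the product of the entries in shape
print(len(t.shape) == t.ndim)
print(2 * 2 * 3 == t.size)
```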

### Array Creation Alternatives 

np.zeros  
np.ones  
np.arange  
np.linspace  

Rather than using Python lists, a variety of functions are available for conveniently creating and populating arrays.

Use the *np.zeros()* function to create an array full of 0s with the required shape:

In [13]:
x = np.zeros(4)
x

array([ 0.,  0.,  0.,  0.])

In [14]:
y = np.zeros( (3,2) )
y

array([[ 0.,  0.],
       [ 0.,  0.],
       [ 0.,  0.]])

Use the *np.ones()* function to create an array full of 1s with the required shape:

In [15]:
v = np.ones(5)
v

array([ 1.,  1.,  1.,  1.,  1.])

In [16]:
np.ones((2,2))

array([[ 1.,  1.],
       [ 1.,  1.]])

The default type for the above functions is float. Use the *dtype* parameter to tell NumPy we want an array of ints, not floats.

In [17]:
np.ones((2,4),dtype=int)

array([[1, 1, 1, 1],
       [1, 1, 1, 1]])

We can create an array corresponding to a sequence using the *arange()* function. For instance, create an array containing values starting at 2, ending before 9, in steps of 1.

In [18]:
v = np.arange(2,9)
v

array([2, 3, 4, 5, 6, 7, 8])

We can also use a different step size. For instance, create an array starting at 5, ending before 60, in steps of size 10.

In [19]:
v = np.arange( 5, 60, 10 )
v

array([ 5, 15, 25, 35, 45, 55])

The range and step sizes do not have to be integers. We can also specify floats:

In [20]:
x = np.arange(0.5, 9.4, 1.3)
x

array([ 0.5,  1.8,  3.1,  4.4,  5.7,  7. ,  8.3])

The *linspace()* function creates an array with a specified number of evenly-spaced samples in a given range. For example, we can divide up the range [1,10] into 4 evenly-spaced values, including the endpoints:

In [21]:
np.linspace(1, 10, 4)

array([  1.,   4.,   7.,  10.])

In [22]:
np.linspace(1, 20, 7)

array([  1.        ,   4.16666667,   7.33333333,  10.5       ,
        13.66666667,  16.83333333,  20.        ])
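
The sketch below explores two optional *linspace()* parameters, *retstep* and *endpoint*, and compares *linspace()* with the float-step *arange()* call used earlier. Variable names are illustrative. When the step is not an integer, specifying the number of samples with *linspace()* is often the safer choice, because the length of an *arange()* result can be affected by floating-point rounding.

```python
import numpy as np

# retstep=True also returns the spacing between consecutive samples
samples, step = np.linspace(1, 10, 4, retstep=True)
print(samples, step)

# endpoint=False leaves out the stop value, similar to arange
half_open = np.linspace(0, 1, 5, endpoint=False)
print(half_open)

# The same 7 values as np.arange(0.5, 9.4, 1.3), but with the count
# stated explicitly instead of derived from a float step
x = np.linspace(0.5, 8.3, 7)
print(np.allclose(x, np.arange(0.5, 9.4, 1.3)))
```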

### Array Shape Manipulation

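We can change the shape of an array with *reshape()*. This returns a new array object with the specified shape and leaves the shape of the original array unchanged. Where possible, the result is a view that shares its data with the original, so call *copy()* if an independent array is needed.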

In [23]:
x = np.arange(0,12)
x

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])

In [24]:
m = x.reshape(3,4)
m

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

Note that the size of the reshaped array has to be the same as the original.

In [25]:
y = np.ones(4)
y

array([ 1.,  1.,  1.,  1.])

In [26]:
y.reshape(2,2)

array([[ 1.,  1.],
       [ 1.,  1.]])

### Accessing Values

To access a value in a 1D array, specify the position *[i]* counting from 0.

In [27]:
a = np.array( [5,18,3,12,20,0,24] )
a[2]

3

Using a negative index allows us to access values counting back from the end of the array:

In [28]:
a[-1]

24

We can also use this notation to change the values in an existing array.

In [29]:
a[0] = 100
a

array([100,  18,   3,  12,  20,   0,  24])

When working with arrays with more than 1 dimension, use the notation *[i,j]*, where the position in each dimension is separated by commas.

In [30]:
r1 = [ 5, 9, 2, 11 ]
r2 = [ 0, 5, 6, 4 ]
m = np.array( [ r1, r2 ] )
m

array([[ 5,  9,  2, 11],
       [ 0,  5,  6,  4]])

In [31]:
m[0,1]

9

In [32]:
m[1,3]

4

In [33]:
m[0,3] = 200
m

array([[  5,   9,   2, 200],
       [  0,   5,   6,   4]])

NumPy provides concise syntax to access sub-arrays via slicing. This creates a "view" on the original array, not a copy, so changes made through the slice also change the original. Slicing 1D NumPy arrays uses the same *[i:j]* notation as slicing Python lists:

In [34]:
a = np.array([4,7,3,5,1,8])
a

array([4, 7, 3, 5, 1, 8])

In [35]:
# Start at position 2, end before 4
a[2:4]

array([3, 5])

In [36]:
# From position 2 onwards
a[2:]

array([3, 5, 1, 8])

In [37]:
# Stop before position 4
a[:4]

array([4, 7, 3, 5])

Again, we can use this notation to change values in a slice of the array.

In [38]:
# Set everything from position 3 onwards to 0
a[3:] = 0
a

array([4, 7, 3, 0, 0, 0])
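
Because a slice is a view rather than a copy, writing to a slice stored in a variable also changes the original array. The sketch below, with illustrative variable names, contrasts this with an explicit *copy()* and with ordinary Python list slicing.

```python
import numpy as np

a = np.array([4, 7, 3, 5, 1, 8])

# A slice is a view: writing through it changes the original array
view = a[1:3]
view[0] = 70
print(a)

# copy() gives an independent array, so a is not affected
safe = a[1:3].copy()
safe[0] = -1
print(a)

# Python list slices, by contrast, are always copies
lst = [4, 7, 3, 5, 1, 8]
part = lst[1:3]
part[0] = 70
print(lst)
```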

For multidimensional arrays, we specify the slices for each dimension, separated by commas, e.g. *[i:j,p:q]* for a 2D array:

In [39]:
r1 = [ 5, 9, 2, 11 ]
r2 = [ 0, 5, 6, 4 ]
r3 = [ 1, 8, 13, 16 ]
m = np.array( [ r1, r2, r3 ] )
m

array([[ 5,  9,  2, 11],
       [ 0,  5,  6,  4],
       [ 1,  8, 13, 16]])

In [40]:
m[0:2,1:3]

array([[9, 2],
       [5, 6]])

In [41]:
# Get a full row
m[0,:]

array([ 5,  9,  2, 11])

In [44]:
# Get a full column
m[:,2]

array([ 2,  6, 13])
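
Two further details of 2D slicing are sketched below, using the same matrix as above and illustrative variable names: an integer index drops a dimension while a slice of width 1 keeps it, and a third slice value acts as a step.

```python
import numpy as np

m = np.array([[5, 9, 2, 11],
              [0, 5, 6, 4],
              [1, 8, 13, 16]])

col_1d = m[:, 2]     # integer index: the column comes back as a 1D array
col_2d = m[:, 2:3]   # slice of width 1: the result stays 2D
print(col_1d.shape, col_2d.shape)

# A third slice value is a step: every second column of every row
every_other = m[:, ::2]
print(every_other)
```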

### Basic Array Operations

We can run batch operations on NumPy arrays without writing for loops. These operations return a new array and leave the original unchanged.

In [45]:
d = np.array([[1,4,2], [9,8,2]])
d

array([[1, 4, 2],
       [9, 8, 2]])

In [46]:
d * 5

array([[ 5, 20, 10],
       [45, 40, 10]])

In [47]:
d / 2

array([[ 0.5,  2. ,  1. ],
       [ 4.5,  4. ,  1. ]])

In [48]:
d + 1

array([[ 2,  5,  3],
       [10,  9,  3]])

In [49]:
1.0/d

array([[ 1.        ,  0.25      ,  0.5       ],
       [ 0.11111111,  0.125     ,  0.5       ]])

In [50]:
# note this is multiplying corresponding elements together
d * d

array([[ 1, 16,  4],
       [81, 64,  4]])
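
Since *\** multiplies corresponding elements, a matrix product needs a different operator. As a brief sketch, assuming a matrix product is what is wanted, the *@* operator can be used together with the transpose *d.T* so that the shapes are compatible.

```python
import numpy as np

d = np.array([[1, 4, 2], [9, 8, 2]])

elementwise = d * d   # shape (2, 3): each element multiplied by itself
matrix = d @ d.T      # shape (2, 2): rows of d dotted with rows of d
print(elementwise.shape, matrix.shape)
print(matrix)
```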

We can also apply functions to all elements in an array.

In [51]:
# calculate the log of every element in d
np.log(d)

array([[ 0.        ,  1.38629436,  0.69314718],
       [ 2.19722458,  2.07944154,  0.69314718]])

In [52]:
# apply square root to every element in d
np.sqrt(d)

array([[ 1.        ,  2.        ,  1.41421356],
       [ 3.        ,  2.82842712,  1.41421356]])

We can also apply standard boolean expressions to all elements of an array at once. The result is a new boolean array of the same shape.

In [53]:
# which elements are greater than 2?
d > 2

array([[False,  True, False],
       [ True,  True, False]], dtype=bool)

In [54]:
# return the values of the elements that are greater than 2
d[d>2]

array([4, 9, 8])

In [55]:
# update the values that are less than 3
d[d<3] = -1
d

array([[-1,  4, -1],
       [ 9,  8, -1]])
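
Boolean arrays can also be combined and counted. The sketch below starts again from the original values of *d* and uses illustrative variable names. It combines two conditions with *&*, counts the matching elements, and uses *np.where()* as an alternative that builds a new array rather than updating *d* in place.

```python
import numpy as np

d = np.array([[1, 4, 2], [9, 8, 2]])

# Combine conditions with & (and) and | (or); the parentheses are required
in_range = (d > 2) & (d < 9)
print(d[in_range])

# True counts as 1, so sum() gives the number of matching elements
n_greater = (d > 2).sum()
print(n_greater)

# np.where picks -1 where the condition holds and the value from d elsewhere
replaced = np.where(d < 3, -1, d)
print(replaced)
```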

### Basic Statistics

NumPy arrays also provide methods for basic descriptive statistics.

In [56]:
m = np.linspace(1, 20, 5)
m

array([  1.  ,   5.75,  10.5 ,  15.25,  20.  ])

In [57]:
m.mean()

10.5

In [58]:
m.max()

20.0

In [59]:
m.min()

1.0

For multidimensional arrays, the above can also take an optional axis parameter. If this is specified, calculations are only performed along that axis (dimension) and the result is a new array.

In [60]:
d = np.array([[5,4,0],[0,1,2]])
d

array([[5, 4, 0],
       [0, 1, 2]])

In [61]:
# mean of all values
d = np.array([[5,4,0],[0,1,2]])
d.mean()

2.0

In [62]:
# Mean for each of the 3 columns
d.mean(axis=0)

array([ 2.5,  2.5,  1. ])

In [65]:
# Mean for each of the 2 rows
d.mean(axis=1)

array([ 3.,  1.])
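
The *axis* parameter works the same way for other statistics methods such as *sum()* and *max()*. The sketch below, with illustrative variable names, also shows the optional *keepdims* parameter, which keeps the reduced axis with length 1 so that the result can be combined directly with the original array.

```python
import numpy as np

d = np.array([[5, 4, 0], [0, 1, 2]])

col_sums = d.sum(axis=0)   # one sum per column
row_max = d.max(axis=1)    # one maximum per row
print(col_sums, row_max)

# keepdims=True keeps the reduced axis with length 1, so the row means
# line up with d and can be subtracted from each row
row_means = d.mean(axis=1, keepdims=True)
print(row_means.shape)
centered = d - row_means
print(centered)
```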