# NumPy Tutorial

NumPy (short for Numerical Python) is the foundational package for scientific computing in Python. It provides:
- A multidimensional array object called *ndarray*
- Functions for element-wise computations with arrays or mathematical operations between arrays
- Tools for reading and writing array-based data sets to disk
- Linear algebra operations, Fourier transforms and random number generation

First we are going to import the NumPy library under its conventional alias *np*:

In [1]:
import numpy as np

In [8]:
n = 8
m = 9

In [10]:
n/m

0.8888888888888888

# The NumPy ndarray

An ndarray is a generic multidimensional container for homogeneous data; that is, all of the elements must be the same type. Every array has a *shape*, a tuple indicating the size of each dimension, and a *dtype*, an object describing the data type of the array.

## Creating ndarrays

The easiest way is to use the *np.array* function, which accepts any sequence-like object and returns a new ndarray containing the passed data:

In [3]:
# Convert simple sequence of numbers to ndarray
data1 = [6, 7.5, 8, 0, 1]
ndarray1 = np.array(data1)

# Convert a list of equal length lists to ndarray
data2 = [[1,2,3,4], [5,6,7,8]]
ndarray2 = np.array(data2)

The created ndarray has three important attributes:
- Data Type: array.dtype
- Number of dimensions: array.ndim
- Shape: array.shape

In [14]:
### data type (inferred from input)
print('Data type of first array: {}'.format(ndarray1.dtype))
print('Data type of second array: {}'.format(ndarray2.dtype))

Data type of first array: float64
Data type of second array: int64


In [15]:
### number of dimensions
print('Number of dims of first array: {}'.format(ndarray1.ndim))
print('Number of dims of second array: {}'.format(ndarray2.ndim))

Number of dims of first array: 1
Number of dims of second array: 2


In [17]:
print('Shape of first array: {}'.format(ndarray1.shape))
print('Shape of second array: {}'.format(ndarray2.shape))

Shape of first array: (5,)
Shape of second array: (2, 4)
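
Because an ndarray is homogeneous, mixed input is converted to one common data type. The following is a minimal sketch of that behaviour; the variable names and values are made up for illustration and do not appear elsewhere in this tutorial.

```python
import numpy as np

# Hypothetical inputs chosen only for illustration
ints = np.array([1, 2, 3])
mixed = np.array([1, 2.5, 3])            # a single float forces a float dtype
nested = np.array([[1, 2], [3, 4], [5, 6]])

print(ints.dtype.kind)             # 'i' (integer; the exact width depends on the platform)
print(mixed.dtype)                 # float64
print(nested.ndim, nested.shape)   # 2 (3, 2)
```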


There are other functions to create ndarrays:

- *np.array*: Convert input data (list, tuple, array, or other sequence type) to an ndarray either by
inferring a dtype or explicitly specifying a dtype. Copies the input data by default.

In [44]:
np.array([1,2,3], dtype='float64')

array([1., 2., 3.])

- *np.asarray*: Convert input to ndarray, but do not copy if the input is already an ndarray (of the same dtype if specified)

In [42]:
data = [1,2,3]
np.asarray(data)

array([1, 2, 3])
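
The difference between *np.array* and *np.asarray* only shows up when the input is already an ndarray. A small sketch, with illustrative variable names:

```python
import numpy as np

source = np.array([1, 2, 3])

copied = np.array(source)       # np.array copies the data by default
aliased = np.asarray(source)    # already an ndarray of the right dtype: no copy

source[0] = 99
print(copied[0])           # 1, the copy is unaffected
print(aliased[0])          # 99, same underlying data
print(aliased is source)   # True
```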

- *np.arange*: Like the built-in range, but returns an ndarray instead of a range object.

In [41]:
np.arange(15)

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

- *np.ones*: Produce an array of all 1’s with the given shape and dtype.
- *np.ones_like*: Takes another array and produces a ones array of the same shape and dtype

In [45]:
np.ones(shape = (2,3), dtype='float64')

array([[1., 1., 1.],
       [1., 1., 1.]])

In [46]:
np.ones_like(np.array([1,2,3]))

array([1, 1, 1])

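- *np.zeros, np.zeros_like*: same as *np.ones, np.ones_like*, but the array is filled with zeros
- *np.empty, np.empty_like*: same as *np.ones, np.ones_like*, but the array is allocated without being initialized, so its contents are arbitrary until you assign them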
- *np.eye, np.identity*: create a square NxN identity matrix

In [50]:
np.eye(6)

array([[1., 0., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0., 0.],
       [0., 0., 1., 0., 0., 0.],
       [0., 0., 0., 1., 0., 0.],
       [0., 0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 0., 1.]])
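
The remaining creation functions from the list above can be sketched as follows. The array named template is a made-up example, and the array from *np.empty* is filled before it is read because its initial contents are not defined.

```python
import numpy as np

template = np.array([[1.5, 2.5], [3.5, 4.5]])   # hypothetical example array

zeros = np.zeros((2, 3))                # float64 by default
zeros_like = np.zeros_like(template)    # same shape and dtype as template
scratch = np.empty((2, 3))              # allocated but NOT initialized

scratch[:] = 7.0                        # assign values before reading from it
print(zeros_like.shape, zeros_like.dtype)            # (2, 2) float64
print(np.array_equal(np.eye(3), np.identity(3)))     # True
```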

## Data Types for ndarrays 

- int8, int16, int32, int64: n-bit signed integer types
- float16, float32, float64: n-bit floating-point types
- bool: boolean type storing True and False values
- object: Python object type
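- string_: fixed-length byte string type (1 byte per character); an alias of bytes_, which is the only name available from NumPy 2.0 on
- unicode_: fixed-length unicode type; an alias of str_, which is the only name available from NumPy 2.0 on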

In [58]:
numeric_strings = np.array(['1.255', '-9.6', '42'], dtype='string_')
numeric_strings

array([b'1.255', b'-9.6', b'42'], dtype='|S5')

- *astype*: copies the old array into a new array changing the data type

In [60]:
floats = numeric_strings.astype('float64')
floats

array([ 1.255, -9.6  , 42.   ])

You can also specify the data type or shape of a new array by referring to the data type or shape of another array:

In [61]:
array = np.array([1,2,3], dtype='float32')
floats.astype(array.dtype)

array([ 1.255, -9.6  , 42.   ], dtype=float32)
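
Two properties of *astype* are easy to miss: converting floats to integers drops the fractional part, and the result is a new array even when the data type does not change. A short sketch with made-up values:

```python
import numpy as np

values = np.array([1.7, -2.9, 3.2])        # hypothetical data

as_ints = values.astype('int64')           # fractional part is dropped (toward zero)
print(as_ints.tolist())                    # [1, -2, 3]

same_type = values.astype(values.dtype)    # still a new array by default
print(np.shares_memory(values, same_type)) # False
```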

# Operations between arrays and scalars 

Arrays are important because they enable you to express batch operations on data without writing any for loops. This is usually called vectorization.

- array-array: any arithmetic operation between equal-size arrays applies the operation element-wise

In [62]:
arr = np.array([[1,2,3], [4,5,6]], dtype='float32')

In [63]:
arr

array([[1., 2., 3.],
       [4., 5., 6.]], dtype=float32)

In [65]:
arr * arr

array([[ 1.,  4.,  9.],
       [16., 25., 36.]], dtype=float32)

In [66]:
arr - arr

array([[0., 0., 0.],
       [0., 0., 0.]], dtype=float32)

- array-scalar: propagates the value to each element

In [67]:
1 / arr

array([[1.        , 0.5       , 0.33333334],
       [0.25      , 0.2       , 0.16666667]], dtype=float32)

In [71]:
arr * 0.5

array([[0.5, 1. , 1.5],
       [2. , 2.5, 3. ]], dtype=float32)

In [73]:
# a ** b means a to the power b
arr ** 0.5

array([[1.       , 1.4142135, 1.7320508],
       [2.       , 2.236068 , 2.4494898]], dtype=float32)
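
To make the idea of vectorization concrete, the sketch below computes the same result twice, once with an explicit loop and once with a single array expression. The arrays prices and quantities are invented for the example.

```python
import numpy as np

prices = np.array([10.0, 20.0, 30.0])      # hypothetical data
quantities = np.array([1.0, 2.0, 3.0])

# Explicit Python loop over the elements
totals_loop = np.empty_like(prices)
for i in range(len(prices)):
    totals_loop[i] = prices[i] * quantities[i]

# Vectorized equivalent: one expression, no loop
totals_vec = prices * quantities

print(totals_vec.tolist())                        # [10.0, 40.0, 90.0]
print(np.array_equal(totals_loop, totals_vec))    # True
```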

# Indexing and slicing

Let's start with one-dimensional arrays (similar to lists). Indexing refers to accessing a position in the array, which is specified with an integer counting the position (starting with 0). Slicing refers to viewing a segment of the array, and it is done by specifying the start index and the end index (the end is excluded) like this: array[0:4].

In [75]:
arr = np.arange(10)
arr

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [87]:
# Array indexing starts at zero
arr[0]

0

In [76]:
# View the elements at indices 5, 6 and 7 (the end index 8 is excluded)
arr[5:8]

array([5, 6, 7])

In [83]:
# View the first 4 elements
arr[:4]

array([0, 1, 2, 3])

In [82]:
# View all except the first 4 elements
arr[4:]

array([4, 5, 6, 7, 8, 9])

In [84]:
# View the last 4 elements
arr[-4:]

array([6, 7, 8, 9])

In [86]:
# View all except last 4 elements
arr[:-4]

array([0, 1, 2, 3, 4, 5])

In [88]:
# Assigning a value to a slice propagates it to the entire selection
arr[5:8] = 12
arr

array([ 0,  1,  2,  3,  4, 12, 12, 12,  8,  9])

An important first distinction from lists is that array slices are views on the original array. This means that the data is not copied, and any modifications to the view will be reflected in the source array:

In [91]:
arr = np.arange(12)
arr

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])

In [93]:
arr_slice = arr[5:8]
arr_slice

array([5, 6, 7])

In [94]:
# Modifying arr_slice will also modify arr
arr_slice[0] = 17
arr

array([ 0,  1,  2,  3,  4, 17,  6,  7,  8,  9, 10, 11])
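
If an independent copy is needed instead of a view, the slice can be copied explicitly with the *copy* method. The sketch below also contrasts this with a plain Python list, whose slices are always copies; the variable names are illustrative.

```python
import numpy as np

arr = np.arange(12)
independent = arr[5:8].copy()    # explicit copy, detached from arr
independent[0] = 17

print(arr[5])                                # 5, the source array is unchanged
print(independent.tolist())                  # [17, 6, 7]
print(np.shares_memory(arr, arr[5:8]))       # True, a plain slice is a view
print(np.shares_memory(arr, independent))    # False

# A Python list behaves differently: slicing a list copies
lst = list(range(12))
lst_slice = lst[5:8]
lst_slice[0] = 17
print(lst[5])                                # 5
```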

Now let's move on to higher dimensional arrays. In 2-dim arrays, the elements at each index are not scalars, but one-dimensional arrays:

In [96]:
arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
arr2d

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [97]:
# Accessing the second row
arr2d[1]

array([4, 5, 6])

In [98]:
# Accessing the first element of the second row
arr2d[1][0]

4

In [99]:
# Alternative
arr2d[1,0]

4

In [100]:
# Accessing the first column
arr2d[:,0]

array([1, 4, 7])

Higher dimensional objects give you more options, as you can slice one or more axes
and also mix integer indices with slices. Consider the 2D array above, arr2d. Slicing this array is a bit
different:

In [101]:
# Slice with the last element in the first two rows
arr2d[:2, -1:]

array([[3],
       [6]])
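
Whether an axis is indexed with an integer or with a slice changes the number of dimensions of the result. A small sketch using the same arr2d values:

```python
import numpy as np

arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

col_1d = arr2d[:2, -1]     # an integer index drops that axis
col_2d = arr2d[:2, -1:]    # a slice keeps the axis, even with length 1

print(col_1d.shape)             # (2,)
print(col_2d.shape)             # (2, 1)
print(arr2d[1, :2].tolist())    # [4, 5]
```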

Let’s consider an example where we have some data in an array and an array of names with duplicates. Here I’m going to use the randn function in numpy.random to generate some random normally distributed data:

In [102]:
names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])

In [105]:
data = np.random.randn(7, 4)

In [108]:
names.shape

(7,)

Suppose each name corresponds to a row in the data array, and we want to select all the rows with the corresponding name 'Bob'. Like arithmetic operations, comparisons (such as ==) with arrays are also vectorized. Thus, comparing names with the string 'Bob' yields a boolean array:

In [110]:
names == 'Bob'

array([ True, False, False,  True, False, False, False])

This boolean array can be passed when indexing the array:

In [111]:
data[names == 'Bob']

array([[ 0.36579341,  0.68272488,  0.8669413 , -0.16915819],
       [ 0.56578582,  0.45330712,  0.31845381,  0.75427909]])

The boolean array must be of the same length as the axis it’s indexing. You can even mix and match boolean arrays with slices or integers (or sequences of integers, more on this later):

In [113]:
data[names == 'Bob', :2] # select the first two items of the rows that correspond to Bob 

array([[0.36579341, 0.68272488],
       [0.56578582, 0.45330712]])

To select everything but Bob:

In [114]:
data[names != 'Bob']

array([[-0.85706009, -0.79362786,  1.53013811, -1.86416916],
       [ 1.6205412 ,  0.12010595,  0.56295024, -0.24123117],
       [ 0.12253316,  0.23203907,  1.03778728,  0.30614161],
       [-1.39856729,  1.1760835 ,  1.70454921,  0.64196264],
       [ 0.89771368,  0.94430259, -0.39174445,  1.27815526]])

In [116]:
data[~(names == 'Bob')]

array([[-0.85706009, -0.79362786,  1.53013811, -1.86416916],
       [ 1.6205412 ,  0.12010595,  0.56295024, -0.24123117],
       [ 0.12253316,  0.23203907,  1.03778728,  0.30614161],
       [-1.39856729,  1.1760835 ,  1.70454921,  0.64196264],
       [ 0.89771368,  0.94430259, -0.39174445,  1.27815526]])

To select two of the three names, combine multiple boolean conditions with the boolean operators & (and) and | (or). Each condition needs its own parentheses:

In [119]:
mask = (names == 'Bob') | (names == 'Will')
mask

array([ True, False,  True,  True,  True, False, False])

In [120]:
data[mask]

array([[ 0.36579341,  0.68272488,  0.8669413 , -0.16915819],
       [ 1.6205412 ,  0.12010595,  0.56295024, -0.24123117],
       [ 0.56578582,  0.45330712,  0.31845381,  0.75427909],
       [ 0.12253316,  0.23203907,  1.03778728,  0.30614161]])
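
The Python keywords and and or do not work on boolean arrays, because they try to reduce the whole array to a single truth value. The sketch below shows the resulting error and, as an alternative that this tutorial does not otherwise use, the *np.isin* function for testing membership in a set of names.

```python
import numpy as np

names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])

both = (names == 'Bob') | (names == 'Will')
neither = ~both
print(int(both.sum()), int(neither.sum()))     # 4 3

# The keyword `or` raises an error on arrays with more than one element
try:
    (names == 'Bob') or (names == 'Will')
    raised = False
except ValueError:
    raised = True
print(raised)                                  # True

# np.isin builds the same mask from a list of wanted values
print(np.array_equal(np.isin(names, ['Bob', 'Will']), both))   # True
```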

More examples of indexing with boolean arrays:

In [123]:
# Setting all the negative numbers to zero
data[data < 0] = 0
data

array([[0.36579341, 0.68272488, 0.8669413 , 0.        ],
       [0.        , 0.        , 1.53013811, 0.        ],
       [1.6205412 , 0.12010595, 0.56295024, 0.        ],
       [0.56578582, 0.45330712, 0.31845381, 0.75427909],
       [0.12253316, 0.23203907, 1.03778728, 0.30614161],
       [0.        , 1.1760835 , 1.70454921, 0.64196264],
       [0.89771368, 0.94430259, 0.        , 1.27815526]])

In [125]:
data[names != 'Joe'] = 19
data

array([[19.        , 19.        , 19.        , 19.        ],
       [ 0.        ,  0.        ,  1.53013811,  0.        ],
       [19.        , 19.        , 19.        , 19.        ],
       [19.        , 19.        , 19.        , 19.        ],
       [19.        , 19.        , 19.        , 19.        ],
       [ 0.        ,  1.1760835 ,  1.70454921,  0.64196264],
       [ 0.89771368,  0.94430259,  0.        ,  1.27815526]])

# Fancy indexing 

Fancy indexing is a term adopted by NumPy to describe indexing using integer arrays. 
**Fancy indexing (unlike slicing) always copies the data into a new array!**

In [145]:
# Define array
arr = np.empty( (8,4) )
for i in range(8):
    arr[i] = i
arr

array([[0., 0., 0., 0.],
       [1., 1., 1., 1.],
       [2., 2., 2., 2.],
       [3., 3., 3., 3.],
       [4., 4., 4., 4.],
       [5., 5., 5., 5.],
       [6., 6., 6., 6.],
       [7., 7., 7., 7.]])

In [129]:
# Select rows number 4, 3, 1
arr[[4,3,1]]

array([[4., 4., 4., 4.],
       [3., 3., 3., 3.],
       [1., 1., 1., 1.]])

In [130]:
# Select rows number 3, 5, 7 (counting backwards)
arr[[-3,-5,-7]]

array([[5., 5., 5., 5.],
       [3., 3., 3., 3.],
       [1., 1., 1., 1.]])

In [147]:
# Define another array
arr = np.arange(32).reshape((8,4))
arr

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23],
       [24, 25, 26, 27],
       [28, 29, 30, 31]])

In [134]:
# This returns the elements at positions (1,0), (5,3), (7,1), (2,2)
arr[[1,5,7,2], [0,3,1,2]]

array([ 4, 23, 29, 10])

In [138]:
# This returns rows 1,5,7,2 with columns reordered
arr[[1,5,7,2]][:, [0,3,1,2]]

array([[ 4,  7,  5,  6],
       [20, 23, 21, 22],
       [28, 31, 29, 30],
       [ 8, 11,  9, 10]])
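
The sketch below checks that fancy indexing returns a copy, and shows *np.ix_* as one possible alternative to the chained indexing used above for selecting a rectangular block of rows and columns.

```python
import numpy as np

arr = np.arange(32).reshape((8, 4))

picked = arr[[1, 5, 7, 2]]     # fancy indexing: a new array
picked[0, 0] = -1
print(arr[1, 0])               # 4, the original is untouched

# np.ix_ builds an index for the cross product of the rows and the columns
block = arr[np.ix_([1, 5, 7, 2], [0, 3, 1, 2])]
print(np.array_equal(block, arr[[1, 5, 7, 2]][:, [0, 3, 1, 2]]))   # True
```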

# Transposing arrays and swapping axes 

In [151]:
arr = np.arange(15).reshape((3,5))
arr

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

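Transposing rearranges the axes of an array and, like reshaping, normally returns a view of the same data instead of a copy (reshape falls back to a copy only when the requested shape cannot be expressed as a view). The two operations are not interchangeable: arr.T and arr.reshape((5,3)) have the same shape, but their elements are in a different order.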

In [152]:
arr.T

array([[ 0,  5, 10],
       [ 1,  6, 11],
       [ 2,  7, 12],
       [ 3,  8, 13],
       [ 4,  9, 14]])

In [153]:
arr.reshape((5,3))

array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11],
       [12, 13, 14]])
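
Whether a result is a view or a copy can be checked with *np.shares_memory*. The following sketch does this for the arrays above, and includes one case (reshaping a transposed array) where NumPy has to copy.

```python
import numpy as np

arr = np.arange(15).reshape((3, 5))

print(np.shares_memory(arr, arr.T))                   # True, a view
print(np.shares_memory(arr, arr.reshape((5, 3))))     # True, a view

# Same shape, different element order
print(np.array_equal(arr.T, arr.reshape((5, 3))))     # False

# arr.T is not contiguous in memory, so flattening it requires a copy
flat_from_t = arr.T.reshape(15)
print(np.shares_memory(arr, flat_from_t))             # False
```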

The transpose can be used, for example, to compute the inner matrix product X^T·X using np.dot:

In [161]:
# np.random.randn returns a sample of the standard normal distribution
# (univariate)
x = np.random.randn(6,3)
np.dot(x.T, x)

array([[ 8.38274644,  1.84519496, -5.72734767],
       [ 1.84519496,  0.90777276, -1.38798996],
       [-5.72734767, -1.38798996,  4.88874501]])
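
The same product can also be written with the @ operator. The sketch below uses a seeded generator, which is an assumption made here for reproducibility and not part of the original cell, and checks two properties of X^T·X: its shape and its symmetry.

```python
import numpy as np

rng = np.random.default_rng(0)        # seeded generator (assumption, for reproducibility)
x = rng.standard_normal((6, 3))

gram = x.T @ x                        # same result as np.dot(x.T, x)
print(gram.shape)                             # (3, 3)
print(np.allclose(gram, np.dot(x.T, x)))      # True
print(np.allclose(gram, gram.T))              # True, X^T·X is symmetric
```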

For higher dimensional arrays, transpose will accept a tuple of axis numbers to permute the axes (for extra mind bending):

In [162]:
arr = np.arange(16).reshape((2,2,4))
arr

array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7]],

       [[ 8,  9, 10, 11],
        [12, 13, 14, 15]]])

In [163]:
arr.transpose((1,0,2))

array([[[ 0,  1,  2,  3],
        [ 8,  9, 10, 11]],

       [[ 4,  5,  6,  7],
        [12, 13, 14, 15]]])

To swap two axes there's also a method called *.swapaxes* (it doesn't copy data either)

In [165]:
arr.swapaxes(0,1)

array([[[ 0,  1,  2,  3],
        [ 8,  9, 10, 11]],

       [[ 4,  5,  6,  7],
        [12, 13, 14, 15]]])
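
A short sketch confirming that swapping axes 0 and 1 matches the transpose with the tuple (1,0,2), and that the result is a view, so writing to it changes the original array:

```python
import numpy as np

arr = np.arange(16).reshape((2, 2, 4))
swapped = arr.swapaxes(0, 1)

print(np.array_equal(swapped, arr.transpose((1, 0, 2))))   # True
print(np.shares_memory(arr, swapped))                      # True

swapped[0, 1, 0] = -1     # position (0, 1, 0) in the view is (1, 0, 0) in arr
print(arr[1, 0, 0])       # -1
```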

# Universal Functions: Fast Element-Wise Array Functions 

A universal function, or *ufunc*, is a function that performs elementwise operations on data in ndarrays.

Many *ufuncs* are simple elementwise transformations. Some *unary* functions (they take a single array) are sqrt and exp:

In [166]:
arr = np.arange(10)

In [167]:
# square root
np.sqrt(arr)

array([0.        , 1.        , 1.41421356, 1.73205081, 2.        ,
       2.23606798, 2.44948974, 2.64575131, 2.82842712, 3.        ])

In [168]:
# exponential function
np.exp(arr)

array([1.00000000e+00, 2.71828183e+00, 7.38905610e+00, 2.00855369e+01,
       5.45981500e+01, 1.48413159e+02, 4.03428793e+02, 1.09663316e+03,
       2.98095799e+03, 8.10308393e+03])

Some *binary* functions (take two arrays and return one single array) are:

In [169]:
x = np.random.randn(8)
y = np.random.randn(8)

In [170]:
x

array([ 0.12139115,  0.10433766, -0.94962749,  1.66274984, -0.76287765,
        1.69046458, -0.08698191,  0.81301035])

In [171]:
y

array([-0.95621874, -0.22907944, -1.00866468,  0.68175572, -1.02737577,
       -0.70570079, -1.90704188,  2.2602543 ])

In [172]:
np.maximum(x, y) # element-wise maximum

array([ 0.12139115,  0.10433766, -0.94962749,  1.66274984, -0.76287765,
        1.69046458, -0.08698191,  2.2602543 ])

In [173]:
np.add(x, y) # element-wise addition

array([-0.8348276 , -0.12474178, -1.95829217,  2.34450557, -1.79025342,
        0.98476379, -1.9940238 ,  3.07326465])
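
Binary ufuncs are what the arithmetic operators call under the hood, and they also accept a scalar as one of the arguments. A minimal sketch with small hand-picked arrays in place of the random ones above:

```python
import numpy as np

x = np.array([1.0, -2.0, 3.0])     # hypothetical values
y = np.array([0.5, 4.0, -1.0])

print(np.array_equal(np.add(x, y), x + y))   # True, np.add is the + operator
print(np.maximum(x, y).tolist())             # [1.0, 4.0, 3.0]
print(np.maximum(x, 0).tolist())             # [1.0, 0.0, 3.0], scalar applied to each element
```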

There are also *unary* functions that 