# NumPy Basics: Arrays and Vectorized Computation

In [1]:
# print all the outputs in a cell
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

Let us import numpy as <i>np</i>. From now on, we can use np as an alias for the numpy package.

In [3]:
import numpy as np

Let us display at most 4 decimal places and suppress scientific notation for small numbers.

In [4]:
np.set_printoptions(precision=4, suppress=True)

## numpy.ndarray

The main object type in numpy is the <b>ndarray</b>. It is an n-dimensional array (for example, a vector or a matrix) whose items all have the same type. Because all items share one type, numpy can store them compactly and operate on them much faster than on a Python list. Each dimension is called an <i>axis</i>; for example, a 2-dimensional matrix has two axes: axis 0 (rows) and axis 1 (columns). Unless specified explicitly, the data type of an ndarray is inferred from the data used to create it.
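To make the type inference concrete, here is a minimal sketch. The variable names <i>ints</i>, <i>mixed</i>, and <i>explicit</i> are illustrative only and are not used in the rest of this notebook.

```python
import numpy as np

ints = np.array([1, 2, 3])        # all integers -> an integer dtype
mixed = np.array([1, 2.5, 3])     # a single float upcasts every item to float
explicit = np.array([1, 2, 3], dtype=np.float64)  # dtype requested explicitly

print(ints.dtype.kind)   # 'i' (the exact width, e.g. int32 or int64, depends on the platform)
print(mixed.dtype)       # float64
print(explicit.dtype)    # float64
```

Note that the integer width is platform dependent, which is why only the dtype <i>kind</i> is checked for <i>ints</i>.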

## Create a ndarray

You can create an ndarray from any array-like object, such as a list or a list of lists. Let us create a <b>one-dimensional</b> ndarray.

In [5]:
arr1 = np.array([1, 1.4, -0.5, 4.3, -3.9, -5.9, .1]) # with a list
arr1

array([ 1. ,  1.4, -0.5,  4.3, -3.9, -5.9,  0.1])

Let us now create a 2-by-4 ndarray

In [6]:
data2 = [[1, 2, 3, 4], [5, 6, 7, 8]] # data2 is a list of two elements, which are lists
arr2 = np.array(data2)
arr2

array([[1, 2, 3, 4],
       [5, 6, 7, 8]])

The rest of this notebook will be based on the 1-dimensional array <i>arr1</i> and the 2-dimensional array <i>arr2</i>

The attribute <b>shape</b> returns the shape of a ndarray

In [10]:
arr2.shape

(2L, 4L)

The attribute <b>dtype</b> returns the data type

In [11]:
arr2.dtype

dtype('int32')

The attribute <b>ndim</b> returns the number of dimensions.

In [12]:
arr2.ndim

2

We can cast an ndarray to float64 with the method <b>astype</b>. It returns a new array and leaves the original unchanged.

In [12]:
arr2.astype(np.float64)

array([[ 1.,  2.,  3.,  4.],
       [ 5.,  6.,  7.,  8.]])

astype is also used to cast from strings to numbers

In [13]:
s = np.array(['4.3', '-0.8', '2.1'])
s

array(['4.3', '-0.8', '2.1'], 
      dtype='|S4')

In [14]:
s.astype(np.float64)

array([ 4.3, -0.8,  2.1])
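The sketch below illustrates two properties of astype that are easy to miss: it returns a new array rather than modifying the original, and casting floats to integers truncates instead of rounding. The names <i>arr</i>, <i>as_float</i>, and <i>truncated</i> are illustrative.

```python
import numpy as np

arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
as_float = arr.astype(np.float64)   # a new array; arr keeps its integer dtype

print(arr.dtype.kind)    # 'i'
print(as_float.dtype)    # float64

# Casting floats to integers drops the fractional part (truncation toward zero)
truncated = np.array([1.9, -1.9, 0.5]).astype(np.int64)
print(truncated.tolist())   # [1, -1, 0]
```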

The numpy function <b>numpy.zeros</b> creates an array of zeros with the given shape.

In [17]:
np.zeros(4)

array([ 0.,  0.,  0.,  0.])

In [19]:
np.zeros((2,4,3))

array([[[ 0.,  0.,  0.],
        [ 0.,  0.,  0.],
        [ 0.,  0.,  0.],
        [ 0.,  0.,  0.]],

       [[ 0.,  0.,  0.],
        [ 0.,  0.,  0.],
        [ 0.,  0.,  0.],
        [ 0.,  0.,  0.]]])

The numpy function <b>arange</b> creates an ndarray from given start, stop, and step values, just like the Python function <i>range</i>. As with range, the stop value is excluded. Unlike range, arange also works with floats.

In [19]:
np.arange(15)

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

In [20]:
np.arange(2.2, 7.89, 1.25)

array([ 2.2 ,  3.45,  4.7 ,  5.95,  7.2 ])
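With a float step, the number of elements returned by arange can be affected by floating-point rounding. When the number of points is known in advance, <b>numpy.linspace</b> is a common alternative, since it takes the number of points and includes the end point by default. A minimal sketch (the names <i>with_arange</i> and <i>with_linspace</i> are illustrative):

```python
import numpy as np

with_arange = np.arange(2.2, 7.89, 1.25)       # start, stop (excluded), step
with_linspace = np.linspace(2.2, 7.2, num=5)   # start, stop (included), number of points

print(len(with_arange), len(with_linspace))       # 5 5
print(np.allclose(with_arange, with_linspace))    # True
```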

## Operations between ndarrays and scalars

The operation is applied to every element of the ndarray: the scalar is <i>broadcast</i> to the shape of the array.

In [21]:
arr2

array([[1, 2, 3, 4],
       [5, 6, 7, 8]])

In [22]:
arr2 + 5

array([[ 6,  7,  8,  9],
       [10, 11, 12, 13]])

In [23]:
arr2 * 5

array([[ 5, 10, 15, 20],
       [25, 30, 35, 40]])

In [24]:
1/arr2.astype(np.float64)

array([[ 1.    ,  0.5   ,  0.3333,  0.25  ],
       [ 0.2   ,  0.1667,  0.1429,  0.125 ]])

In [25]:
arr2 ** 2

array([[ 1,  4,  9, 16],
       [25, 36, 49, 64]])
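Broadcasting is not limited to scalars. As a small sketch that goes slightly beyond the examples above, an array with a compatible shape is stretched in the same way. The names <i>row</i> and <i>col</i> are illustrative.

```python
import numpy as np

arr2 = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

row = np.array([10, 20, 30, 40])   # shape (4,): added to every row of arr2
col = np.array([[100], [200]])     # shape (2, 1): added to every column of arr2

plus_row = arr2 + row
plus_col = arr2 + col

print(plus_row.tolist())   # [[11, 22, 33, 44], [15, 26, 37, 48]]
print(plus_col.tolist())   # [[101, 102, 103, 104], [205, 206, 207, 208]]
```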

### Comparison operators

Important: we can compare an array with a scalar. The comparison is applied element by element, and we get an array of booleans back.

In [26]:
arr2

array([[1, 2, 3, 4],
       [5, 6, 7, 8]])

In [27]:
arr2 > 5

array([[False, False, False, False],
       [False,  True,  True,  True]], dtype=bool)

Example: find the positions of the elements of arr2 that are equal to 2

In [29]:
arr2 == 2

array([[False,  True, False, False],
       [False, False, False, False]], dtype=bool)

Example: find positions of elements of arr2 that are >= 2 and <= 6

In [28]:
(arr2 >= 2) & (arr2 <=6)

array([[False,  True,  True,  True],
       [ True,  True, False, False]], dtype=bool)
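Boolean arrays are combined with the element-wise operators <b>&amp;</b> (and), <b>|</b> (or), and <b>~</b> (not). The Python keywords <i>and</i>, <i>or</i>, and <i>not</i> do not work element-wise, and the parentheses around each comparison are required because &amp; and | bind more tightly than the comparison operators. A minimal sketch (the names <i>inside</i> and <i>outside</i> are illustrative):

```python
import numpy as np

arr2 = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

inside = (arr2 >= 2) & (arr2 <= 6)    # elements between 2 and 6, inclusive
outside = (arr2 < 2) | (arr2 > 6)     # elements below 2 or above 6

print(np.array_equal(outside, ~inside))   # True: ~ negates a boolean array
print(inside.sum())                       # 5: True counts as 1 when summing
```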

### Indexing and slicing on one-dimensional ndarrays

Indexing and slicing work similarly to indexing and slicing with regular Python lists.

In [29]:
arr1

array([ 1. ,  1.4, -0.5,  4.3, -3.9, -5.9,  0.1])

In [30]:
arr1[5]

-5.9000000000000004

In [31]:
arr1[3:]

array([ 4.3, -3.9, -5.9,  0.1])

The only big difference is that we can assign a single value to a whole slice: the value is broadcast to every selected element.

<b>Exercise (together)</b>: Let's make a copy first of <i>arr1</i> and let's call it <i>b</i>

In [32]:
# wrong way: this only binds a second name to the same array, it does not copy it
b = arr1

In [33]:
b

array([ 1. ,  1.4, -0.5,  4.3, -3.9, -5.9,  0.1])

In [34]:
arr1

array([ 1. ,  1.4, -0.5,  4.3, -3.9, -5.9,  0.1])

In [35]:
b[0] = -100

In [36]:
b

array([-100. ,    1.4,   -0.5,    4.3,   -3.9,   -5.9,    0.1])

In [37]:
arr1

array([-100. ,    1.4,   -0.5,    4.3,   -3.9,   -5.9,    0.1])

Restore arr1

In [38]:
arr1[0] = 1

In [39]:
arr1

array([ 1. ,  1.4, -0.5,  4.3, -3.9, -5.9,  0.1])

The correct way to make a copy is as follows:

In [40]:
b = arr1.copy()

<b>In class exercise</b>: Modify <i>b</i> by setting its last 3 elements to -100

In [41]:
b[-3:] = -100
b

array([   1. ,    1.4,   -0.5,    4.3, -100. , -100. , -100. ])
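The same care is needed with slices. Basic slicing of an ndarray returns a <i>view</i> on the original data, not a copy, so writing to the slice also changes the original array. The sketch below uses fresh arrays <i>a</i> and <i>c</i> (illustrative names, with the same values as arr1) so that arr1 itself is left untouched.

```python
import numpy as np

a = np.array([1, 1.4, -0.5, 4.3, -3.9, -5.9, .1])
tail_view = a[3:]        # a view: shares memory with a
tail_view[:] = 0         # also overwrites the last four elements of a
shares = np.shares_memory(a, tail_view)
print(shares)            # True
print(a[3:])             # all zeros

c = np.array([1, 1.4, -0.5, 4.3, -3.9, -5.9, .1])
tail_copy = c[3:].copy() # an independent copy
tail_copy[:] = 0         # c is not affected
print(c[3:])             # still 4.3, -3.9, -5.9, 0.1
```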

### Indexing and slicing with multi-dimensional arrays

Let us now create the following 2-dimensional array, called <i>arr2d</i>: <br/>
1  2  3<br/>
4  5  6<br/>
7  8  9<br/>

In [42]:
arr2d = np.array([ [1,2,3], [4,5,6], [7,8,9]  ])

Select element (0,2)

In [43]:
arr2d[0,2]

3

Select the third row

In [44]:
arr2d[2,:]

array([7, 8, 9])

Select the third column

In [51]:
arr2d[:,2]

array([3, 6, 9])

In [52]:
type(arr2d[:,2])

numpy.ndarray

<b>In class exercise</b>: select the highlighted submatrix:<br/>
1  2  3<br/>
4  <b>5</b>  <b>6</b><br/>
7  <b>8</b>  <b>9</b><br/>

In [46]:
arr2d[1:,1:]

array([[5, 6],
       [8, 9]])

In [47]:
arr2d[1:3,1:3]

array([[5, 6],
       [8, 9]])

<b>In class exercise</b>: select the highlighted submatrix:<br/>
1  <b>2</b>  <b>3</b><br/>
4  5  6<br/>
7  <b>8</b>  <b>9</b><br/>

In [48]:
arr2d[ [0,2],1:]

array([[2, 3],
       [8, 9]])
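Indexing with a list of positions (often called fancy indexing) has two subtleties worth a short sketch. First, passing two lists selects elements pairwise rather than a block; <b>numpy.ix_</b> can be used when a block is wanted. Second, fancy indexing returns a copy, whereas plain slicing returns a view. The names <i>block</i>, <i>pairs</i>, and <i>grid</i> are illustrative.

```python
import numpy as np

arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

block = arr2d[[0, 2], 1:]               # rows 0 and 2, columns 1 onward
pairs = arr2d[[0, 2], [1, 2]]           # elements (0,1) and (2,2) only
grid = arr2d[np.ix_([0, 2], [1, 2])]    # rows 0 and 2 crossed with columns 1 and 2

print(block.tolist())   # [[2, 3], [8, 9]]
print(pairs.tolist())   # [2, 9]
print(grid.tolist())    # [[2, 3], [8, 9]]

print(np.shares_memory(arr2d, block))          # False: fancy indexing copies
print(np.shares_memory(arr2d, arr2d[1:, 1:]))  # True: slicing gives a view
```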

### Boolean indexing

We can also select elements with boolean arrays. This is extremely important for filtering data

In [49]:
arr1

array([ 1. ,  1.4, -0.5,  4.3, -3.9, -5.9,  0.1])

The idea is to:
<ol>
<li> apply a comparison operator to the ndarray arr1. This operation will return a ndarray of bool</li>
<li> use the array of bools to select items of <i>arr1</i></li>
</ol>

<b>Example</b>: find all numbers greater than 0.5

In [50]:
arr1

array([ 1. ,  1.4, -0.5,  4.3, -3.9, -5.9,  0.1])

In [51]:
mask = arr1 > 0.5

In [52]:
mask

array([ True,  True, False,  True, False, False, False], dtype=bool)

In [53]:
arr1[mask]

array([ 1. ,  1.4,  4.3])

In practice, we skip the intermediate variable and write the condition directly inside the brackets:

In [54]:
arr1[arr1 > 0.5]

array([ 1. ,  1.4,  4.3])

Before solving the next problem, let us make a copy of arr1 and let's call it b

In [55]:
b = arr1.copy()

<b>In class exercise</b>: modify b so that its negative elements are set to 0 (one line of code)

In [56]:
b

array([ 1. ,  1.4, -0.5,  4.3, -3.9, -5.9,  0.1])

In [58]:
b[b<0]=0
b

array([ 1. ,  1.4,  0. ,  4.3,  0. ,  0. ,  0.1])
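Assignment through a boolean mask modifies the array in place. If a new array is preferred and the original should stay untouched, <b>numpy.maximum</b> or <b>numpy.clip</b> are possible alternatives for this particular task. A minimal sketch (the names <i>in_place</i>, <i>with_maximum</i>, and <i>with_clip</i> are illustrative):

```python
import numpy as np

arr1 = np.array([1, 1.4, -0.5, 4.3, -3.9, -5.9, .1])

in_place = arr1.copy()
in_place[in_place < 0] = 0            # boolean mask assignment, modifies in_place

with_maximum = np.maximum(arr1, 0)    # element-wise maximum with 0, returns a new array
with_clip = np.clip(arr1, 0, None)    # lower bound 0, no upper bound

print(np.array_equal(in_place, with_maximum))   # True
print(np.array_equal(in_place, with_clip))      # True
print(arr1.min())                               # -5.9: arr1 itself is unchanged
```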

#### Boolean indexing based on another ndarray

Let us create another array which, like arr1, has length 7 but, unlike arr1, contains people's names.

In [60]:
names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
names

array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'], 
      dtype='|S4')

We can use conditions on the array <i>names</i> to select data in the array <i>arr1</i>. We proceed as follows:

1) We can compare a ndarray to a single value, and get an array of bool back

In [61]:
names == 'Bob'

array([ True, False, False,  True, False, False, False], dtype=bool)

2) We can use a ndarray of bool to select elements of arr1

In [62]:
names

array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'], 
      dtype='|S4')

In [63]:
arr1

array([ 1. ,  1.4, -0.5,  4.3, -3.9, -5.9,  0.1])

In [64]:
arr1[names == 'Bob']

array([ 1. ,  4.3])
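Masks built from <i>names</i> can be combined with each other and with masks built from <i>arr1</i>. The sketch below also uses <b>numpy.isin</b> to test membership in several names at once. The names <i>bob_or_will</i> and <i>joe_positive</i> are illustrative.

```python
import numpy as np

names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
arr1 = np.array([1, 1.4, -0.5, 4.3, -3.9, -5.9, .1])

# Elements that correspond to Bob or Will
bob_or_will = arr1[np.isin(names, ['Bob', 'Will'])]
print(bob_or_will.tolist())    # [1.0, -0.5, 4.3, -3.9]

# Elements that correspond to Joe and are positive
joe_positive = arr1[(names == 'Joe') & (arr1 > 0)]
print(joe_positive.tolist())   # [1.4, 0.1]
```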

#### Expressing conditional logic as array operations (where)

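We saw that <i>arr1 > 0</i> returns an ndarray of bools. But what if, instead of True/False, we want +1 and -1? The function <b>numpy.where(condition, a, b)</b> takes the value from <i>a</i> where the condition is True and from <i>b</i> where it is False.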

In [65]:
arr1 > 0

array([ True,  True, False,  True, False, False,  True], dtype=bool)

In [66]:
np.where(arr1 > 0, 1, -1)

array([ 1,  1, -1,  1, -1, -1,  1])

In [67]:
arr1

array([ 1. ,  1.4, -0.5,  4.3, -3.9, -5.9,  0.1])

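A more complicated use of where: the two alternatives can themselves be array-like, in which case the result is picked element by element from the matching position.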

In [68]:
np.where(arr1 > 0, [1,2,3,4,5,6,7],[-10,-100,-1000,-10000,-50000,-60000,-70000])

array([     1,      2,  -1000,      4, -50000, -60000,      7])
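Two further patterns are sketched below: mixing an array and a scalar as the two alternatives, and nesting calls to where to express more than two cases. The thresholds and the labels 'high', 'mid', and 'low' are made up for illustration.

```python
import numpy as np

arr1 = np.array([1, 1.4, -0.5, 4.3, -3.9, -5.9, .1])

# Keep the positive values, replace everything else with 0
positives = np.where(arr1 > 0, arr1, 0)
print(positives.tolist())   # [1.0, 1.4, 0.0, 4.3, 0.0, 0.0, 0.1]

# Three cases: above 1, below -1, and everything in between
labels = np.where(arr1 > 1, 'high', np.where(arr1 < -1, 'low', 'mid'))
print(labels.tolist())      # ['mid', 'high', 'mid', 'high', 'low', 'low', 'mid']
```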

### Mathematical and statistical methods

The method <b>mean</b> computes the mean of all the elements or, if an axis is given, the mean along that axis.<br/>
Other methods that work the same way are <b>sum</b>, <b>min</b>, <b>max</b>, <b>cumsum</b>, <b>cumprod</b>, and <b>std</b>.

In [77]:
arr2

array([[1, 2, 3, 4],
       [5, 6, 7, 8]])

In [78]:
arr2.mean()

4.5

In [79]:
arr2.sum()

36

In [80]:
arr2.std()

2.2912878474779199

In [84]:
arr2.cumsum()

array([ 1,  3,  6, 10, 15, 21, 28, 36])

In [72]:
arr2.cumsum(axis = 1)

array([[ 1,  3,  6, 10],
       [ 5, 11, 18, 26]])

In [73]:
arr2.mean(axis=0)

array([ 3.,  4.,  5.,  6.])

In [74]:
arr2.mean(axis=1)

array([ 2.5,  6.5])
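The axis argument names the axis that is collapsed: axis=0 aggregates down the rows and gives one value per column, while axis=1 aggregates across the columns and gives one value per row. The sketch below also shows the <i>keepdims</i> argument, which keeps the collapsed axis with length 1 so that the result broadcasts back against the original array. The names <i>col_sums</i>, <i>row_sums</i>, <i>row_means</i>, and <i>centered</i> are illustrative.

```python
import numpy as np

arr2 = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

col_sums = arr2.sum(axis=0)   # one value per column
row_sums = arr2.sum(axis=1)   # one value per row
print(col_sums.tolist(), col_sums.shape)   # [6, 8, 10, 12] (4,)
print(row_sums.tolist(), row_sums.shape)   # [10, 26] (2,)

# Subtract each row's mean from that row
row_means = arr2.mean(axis=1, keepdims=True)   # shape (2, 1)
centered = arr2 - row_means
print(centered.tolist())   # [[-1.5, -0.5, 0.5, 1.5], [-1.5, -0.5, 0.5, 1.5]]
```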

<b>In class exercise</b>: In one line of code, compute the average among those elements of arr2 greater than 3.0

In [78]:
arr2[arr2 > 3]

array([4, 5, 6, 7, 8])

In [90]:
arr2[arr2 > 3].mean()

6.0

<b>In class exercise</b>:
<ol>
<li>Make a copy of arr1 and call it <i>x</i></li>
<li>Considering the ndarrays <i>names</i> and <i>x</i>, compute the maximum among the elements of x that correspond to Bob, and then subtract it from all elements of x.</li>
</ol>

In [81]:
x = arr1.copy()

In [82]:
names

array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'], 
      dtype='|S4')

In [83]:
x

array([ 1. ,  1.4, -0.5,  4.3, -3.9, -5.9,  0.1])

In [84]:
names == 'Bob'

array([ True, False, False,  True, False, False, False], dtype=bool)

In [85]:
x[names == 'Bob']

array([ 1. ,  4.3])

In [86]:
x[names == 'Bob'].max()

4.2999999999999998

In [87]:
x = x - x[names == 'Bob'].max()

In [88]:
x

array([ -3.3,  -2.9,  -4.8,   0. ,  -8.2, -10.2,  -4.2])
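The same idea extends to every name at once. As a sketch, <b>numpy.unique</b> returns the distinct names, and a boolean mask per name selects the matching elements. The dictionary <i>max_by_name</i> is an illustrative helper, not part of the exercise. The subtraction can also be written in place with <i>-=</i>.

```python
import numpy as np

names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
arr1 = np.array([1, 1.4, -0.5, 4.3, -3.9, -5.9, .1])

# Maximum of the elements of arr1 that correspond to each distinct name
max_by_name = {str(name): float(arr1[names == name].max())
               for name in np.unique(names)}
print(max_by_name)   # {'Bob': 4.3, 'Joe': 1.4, 'Will': -0.5}

# Same result as the exercise, modifying the copy x in place
x = arr1.copy()
x -= x[names == 'Bob'].max()
print(x)
```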