# Introduction

Python lists are not well suited to numerical computations. A list stores references to arbitrary objects, so computations with lists are often slow and cannot take advantage of features of modern CPUs that speed up arithmetic on blocks of numbers.

The module `numpy` ("numerical python") contains data structures and functions better suited for scientific computing. The most important of these structures is `array`, which represents an array with one or more dimensions.

To use the module `numpy`, it must first be imported. This is done automatically when we load `pylab`, but it can also be imported explicitly under the conventional alias `np`:

In [1]:
import numpy as np
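To make the speed claim concrete, here is a rough timing sketch. It compares a Python-level loop over a list with a single array expression. The array size, the number of repetitions and the helper names `sum_squares_list` and `sum_squares_array` are arbitrary choices made for illustration, and the measured times will vary from machine to machine. The functions `np.arange` and `np.sum` are ordinary `numpy` functions; `arange` is discussed later in this lesson.

```python
import timeit

import numpy as np

n = 100_000
values_list = [float(i) for i in range(n)]      # plain Python list
values_array = np.arange(n, dtype=np.float64)   # the same numbers as an array

def sum_squares_list():
    # Explicit Python-level loop over the list
    return sum(x * x for x in values_list)

def sum_squares_array():
    # One array expression; the loop runs inside numpy
    return np.sum(values_array * values_array)

# Time 20 repetitions of each version (the numbers depend on the machine)
time_list = timeit.timeit(sum_squares_list, number=20)
time_array = timeit.timeit(sum_squares_array, number=20)
print(f"list:  {time_list:.4f} s")
print(f"array: {time_array:.4f} s")
```

On most machines the array version should be noticeably faster, but treat the numbers as indicative only.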

# Constructing arrays

The most straightforward method to construct an array is to define it from a regular Python list:

In [2]:
u = np.array([2,5,-1,4,7,0])
u

array([ 2,  5, -1,  4,  7,  0])

There is a very important point in which arrays are different from lists. Arrays are _homogeneous_ data structures, that is, their elements must all be of the same type. This is indicated by the array's `dtype`. The `d` in `dtype` stands for "data", that is, `dtype` indicates the data type of the elements of the array:

In [3]:
u.dtype

dtype('int64')

There is another important point here: usually we _don't_ want to work with integers when doing numerical work. To create an array of floats, we have two possible syntaxes:

In [4]:
v = np.array([2.,5.,-1.,4.,7.,0.])
v

array([ 2.,  5., -1.,  4.,  7.,  0.])

In [5]:
w = np.array([2,5,-1,4,7,0], dtype=np.float64)
w

array([ 2.,  5., -1.,  4.,  7.,  0.])

If you are getting unexpected results when working with arrays, the first thing to check is whether the arrays in your code have the correct `dtype`.
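The following sketch shows two typical ways in which the `dtype` can surprise us. The variable names are chosen only for illustration. `astype` is a standard array method that returns a converted copy.

```python
import numpy as np

# Mixing integers and floats in the input list: everything becomes a float
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)        # float64

# Assigning a float into an integer array silently drops the fractional part
counts = np.array([1, 2, 3])
counts[0] = 2.7
print(counts)             # [2 2 3]

# Converting explicitly with astype returns a new float array
counts_float = counts.astype(np.float64)
counts_float[0] = 2.7
print(counts_float)
```

Notice that no error or warning is issued when `2.7` is stored as `2`, which is why checking the `dtype` first is a good habit.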

Arrays use the same syntax as lists to refer to elements and slices:

In [6]:
v[3]

4.0

In [7]:
v[2] = 10
v

array([ 2.,  5., 10.,  4.,  7.,  0.])

(Notice the automatic conversion to `float`.)

In [8]:
v[1:4]

array([ 5., 10.,  4.])

In [9]:
v[0:5:2]

array([ 2., 10.,  7.])

Arrays have a powerful indexing facility that is not available for lists. Let's say we need to extract the elements with indexes 2, 5 and 1, in this order. This is how it can be done:

In [10]:
v[ [2, 5, 1] ]

array([10.,  0.,  5.])

Notice how the indexes are specified in the input cell. We are using the _list_ `[2, 5, 1]` as the index for the array. We can actually even use an array as the index for an array with similar results.
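As a small sketch of indexing with an array instead of a list (the names `idx`, `picked` and `repeated` are illustrative only):

```python
import numpy as np

v = np.array([2., 5., 10., 4., 7., 0.])

# An integer array works as an index just like the list [2, 5, 1]
idx = np.array([2, 5, 1])
picked = v[idx]
print(picked)             # [10.  0.  5.]

# Indexes may repeat, and the result is a new array, not a view of v
repeated = v[np.array([0, 0, 3])]
picked[0] = -1.0          # does not change v
print(v[2])               # 10.0
```

Since the result of this kind of indexing is a copy, modifying `picked` leaves `v` untouched.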

As with lists, we can define array ranges, with the `arange` function:

In [11]:
t = np.arange(2, 10)
t

array([2, 3, 4, 5, 6, 7, 8, 9])

Notice that this is an array of integers. We can specify the `dtype` to make this an array of floats:

In [12]:
t = np.arange(2, 10, dtype=np.float64)
t

array([2., 3., 4., 5., 6., 7., 8., 9.])

This is actually one of the main advantages of an `arange` over a `range`: we can have floats in an `arange`. As with a `range`, we can specify a step with a third argument:

In [13]:
r = np.arange(3, 7, .4)
r

array([3. , 3.4, 3.8, 4.2, 4.6, 5. , 5.4, 5.8, 6.2, 6.6])

Notice that we didn't need to specify the `dtype`. `numpy` is smart enough to realize that this should be an array of floats.

Finally, there is `linspace`, which is very useful for constructing graphs. `linspace` generates a specified number of equally spaced points between two given values. For example, to get 5 equally spaced points between 2 and 7, use:

In [14]:
p = np.linspace(2, 7, 5)
p

array([2.  , 3.25, 4.5 , 5.75, 7.  ])

Notice that the endpoint 7 _is included_ in the array. This is an exception to the normal rule for sequence-type data structures in Python, but it is a convenient one. `linspace` is designed for cases in which we need a regular grid of points in an interval, which is very common in numerical computing.
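Because both endpoints are included, `n` points divide the interval into `n - 1` subintervals. The sketch below checks this for the example above. `np.diff` and the `retstep` option of `linspace` are standard `numpy` features; the variable names are illustrative.

```python
import numpy as np

a, b, n = 2.0, 7.0, 5

# linspace takes the number of points; both endpoints are included
p = np.linspace(a, b, n)

# The spacing between consecutive points is (b - a) / (n - 1)
step = (b - a) / (n - 1)
print(step)               # 1.25
print(np.diff(p))         # differences between neighbours, all equal to step

# retstep=True asks linspace to return the spacing as well
q, q_step = np.linspace(a, b, n, retstep=True)
```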

# Array functions and methods

Many of the list functions and methods have array counterparts:

In [15]:
len(v)

6

In [16]:
v.sort()
v

array([ 0.,  2.,  4.,  5.,  7., 10.])

However, some don't:

In [17]:
v.append(3)

AttributeError: 'numpy.ndarray' object has no attribute 'append'

There is a good reason for that: arrays can be multidimensional, and it does not make much sense to append a single element to a multidimensional data structure. (To which dimension should it be appended?) Instead, `numpy` provides functions that build new arrays from existing ones. Somewhat confusingly, one of these functions is `append`, but it does not work like the list method:

In [18]:
np.append(v,3)
v

array([ 0.,  2.,  4.,  5.,  7., 10.])

The `numpy` function `append` does not modify the array. It returns a _new_ array with the elements appended, which is why `v` is unchanged above. The elements to append can be given as a single value, a list or an array:

In [19]:
np.append(v,[3])

array([ 0.,  2.,  4.,  5.,  7., 10.,  3.])
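The sketch below illustrates how `np.append` behaves, under the assumption (which matches the `numpy` documentation as far as we know) that the values to append may be a scalar, a list or an array. The loop at the end is a common alternative when the final size is known in advance; `np.zeros` is introduced later in this lesson.

```python
import numpy as np

v = np.array([0., 2., 4., 5., 7., 10.])

# np.append returns a new array; v itself is unchanged
w = np.append(v, 3)
print(len(v), len(w))     # 6 7

# Rebind the name to keep the longer array
v = np.append(v, [3, 8])
print(v)

# When the final size is known, filling a preallocated array avoids
# copying the data on every append
squares = np.zeros(5)
for i in range(5):
    squares[i] = i ** 2
```

Each call to `np.append` copies the whole array, so calling it repeatedly inside a long loop is usually a poor choice.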

# Higher dimensional arrays

Arrays can have any number of dimensions. The following defines a two-dimensional array (also known as a matrix, of course):

In [20]:
A = np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12]])
A

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

When accessing the elements of a multidimensional array, remember that all dimensions start at the index 0:

In [21]:
A[0][2]

3

We can also use the more familiar matrix notation for indexing:

In [22]:
A[1,2]

7

Arrays have a _shape_, which is a tuple containing the size of each dimension:

In [23]:
A.shape

(3, 4)

The _size_ of an array is the total number of elements:

In [24]:
A.size

12

Slices in multidimensional arrays are very powerful. We can select almost any kind of subarray we can imagine.

In [25]:
A[1:3, 0:2]

array([[ 5,  6],
       [ 9, 10]])
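Here are a few more slices of the same matrix, as a sketch. Steps and negative indexes work in each dimension just as they do for lists. The last part shows that a slice is a _view_ of the original array rather than a copy; the name `block` is illustrative.

```python
import numpy as np

A = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

# Every other row, and every other column starting at column 1
print(A[::2, 1::2])
# [[ 2  4]
#  [10 12]]

# Negative indexes count from the end: last row, last two columns
print(A[-1, -2:])         # [11 12]

# A slice is a view: writing through it changes A as well
block = A[1:3, 0:2]
block[0, 0] = 50
print(A[1, 0])            # 50
```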

The following are common idioms for extracting rows and columns:

In [26]:
A[1,:]  # Second row

array([5, 6, 7, 8])

In [27]:
A[:,2]  # Third column

array([ 3,  7, 11])

Notice that the column seems to be transposed: it is returned as a one-dimensional array and displayed like a row. There is a good reason for that, having to do with how `numpy` interprets vectors.
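Looking at the shapes makes this behavior easier to see. In the sketch below, an integer index drops a dimension, while a slice of length one keeps it. The variable names are illustrative.

```python
import numpy as np

A = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

# An integer index removes that dimension: the result is one-dimensional
col = A[:, 2]
print(col.shape)          # (3,)

# A slice of length one keeps the dimension: a 3 x 1 column
col2d = A[:, 2:3]
print(col2d.shape)        # (3, 1)

# reshape converts the one-dimensional result into a column explicitly
col_reshaped = col.reshape(3, 1)
```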

`numpy` has several predefined functions designed to create commonly used arrays:

In [28]:
np.zeros((2,3))   # 2 x 3 matrix of zeros

array([[0., 0., 0.],
       [0., 0., 0.]])

In [29]:
np.ones((4,3))   # 4 x 3 matrix of ones

array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])

In [30]:
np.eye(2)   # 2 x 2 identity matrix

array([[1., 0.],
       [0., 1.]])

# Array operations

`numpy` defines most arithmetic operations between arrays. Here are a few examples:

In [31]:
u = np.array([2.,-1.,0.,2.])
v = np.array([3.,6.,-4.,1.])

In [32]:
u + v

array([ 5.,  5., -4.,  3.])

In [33]:
u * v

array([ 6., -6., -0.,  2.])

In [34]:
u / v

array([ 0.66666667, -0.16666667, -0.        ,  2.        ])

In [35]:
-u

array([-2.,  1., -0., -2.])

In [36]:
3 * u

array([ 6., -3.,  0.,  6.])

In [37]:
v ** 2

array([ 9., 36., 16.,  1.])

Notice that all operations are element by element. In particular, multiplication is _not_ the vector dot product. For this, use the function `dot`:

In [38]:
np.dot(u,v)

2.0

This is also available as a method, which is useful sometimes:

In [39]:
u.dot(v)

2.0

Using the operator `*` with multidimensional arrays is tricky: it multiplies element by element, following a convention called _broadcasting_ that is not completely intuitive. So, if you want to do matrix multiplication, use the `dot()` method rather than `*`. The matrices must be of compatible sizes, of course. For example:

In [40]:
A = np.array([[1,-1,-2,0],[2,0,3,-4]], dtype=np.float64)
A

array([[ 1., -1., -2.,  0.],
       [ 2.,  0.,  3., -4.]])

In [41]:
B = np.array([[1,3],[-1,4],[0,5],[-3,2]], dtype=np.float64)

In [42]:
A.dot(B)

array([[  2., -11.],
       [ 14.,  13.]])
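As a sketch of the difference between the two kinds of product, the code below repeats the matrix product using the `@` operator, which Python 3.5 and later provide for matrix multiplication, and then shows a simple case of broadcasting with `*`. The array `scale` is a made-up example.

```python
import numpy as np

A = np.array([[1, -1, -2, 0], [2, 0, 3, -4]], dtype=np.float64)
B = np.array([[1, 3], [-1, 4], [0, 5], [-3, 2]], dtype=np.float64)

# Matrix product: (2 x 4) times (4 x 2) gives a 2 x 2 matrix
P = A @ B
print(np.array_equal(P, A.dot(B)))   # True

# Elementwise product needs matching shapes (2 x 4 with 2 x 4)
E = A * A

# Broadcasting: the row of length 4 is multiplied into every row of A
scale = np.array([1., 10., 100., 1000.])
S = A * scale
print(S)
```

Notice that `A * B` would raise an error here, since the shapes `(2, 4)` and `(4, 2)` are not compatible for an elementwise product.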

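There is also a subclass `matrix` of `array` that defines operations and methods that are useful for doing linear algebra. It is less efficient than plain `array`s, but it is handy for quick computations. Note that the current `numpy` documentation no longer recommends this class, so plain arrays are the safer choice for new code.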

# Vectorized functions

A very useful characteristic of arrays is the availability of _vectorized functions_. These are functions that operate on every element of an array individually. `numpy` defines vectorized versions of all elementary functions. For example, let's say we need the sines of the angles $0$, $\pi/3$, $2\pi/3$ up to $\pi$. We first generate an array with the points we need:

In [43]:
xvalues = np.arange(0, np.pi + 0.1, np.pi/3)
xvalues

array([0.        , 1.04719755, 2.0943951 , 3.14159265])

(Question: why did we use `np.pi + 0.1` as the upper bound for the `arange`?)

Now, to compute the sines of these angles, simply use:

In [44]:
yvalues = np.sin(xvalues)
yvalues

array([0.00000000e+00, 8.66025404e-01, 8.66025404e-01, 1.22464680e-16])

(Question: shouldn't $\sin(\pi)$ be zero? Why is it not?)

Some of the vectorized functions defined by `numpy` are:

- Square, square root, absolute value: `square`, `sqrt`, `abs`.
- Trigonometric and inverse trigonometric functions: `sin`, `cos`, `tan`, `arcsin`, `arccos`, `arctan`.
- Exponential and logarithmic functions: `exp`, `log`, `log10`, `log2`.
- Hyperbolic functions: `sinh`, `cosh`, `tanh`, `arcsinh`, `arccosh`, `arctanh`.
- Rounding: `round`, `trunc`, `floor`, `ceil`.

Check the documentation for a complete list, as well as specifics for individual functions.
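Here is a short sketch combining several of these functions. It checks the identity $\sin^2\theta + \cos^2\theta = 1$ on a grid of angles and then composes `exp`, `abs` and `sin` in one expression. The grid of 9 angles and the names `identity` and `damped` are arbitrary choices for illustration.

```python
import numpy as np

theta = np.linspace(0, 2 * np.pi, 9)   # 9 angles from 0 to 2*pi

# Each function is applied to every element; no explicit loop is needed
s = np.sin(theta)
c = np.cos(theta)

# Check the identity sin^2 + cos^2 = 1 at every angle, up to rounding error
identity = s ** 2 + c ** 2
print(np.allclose(identity, 1.0))      # True

# Functions can be composed freely
damped = np.exp(-theta) * np.abs(s)
```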

Notice that `log` is the natural logarithm:

In [45]:
np.log(10)

2.302585092994046

`log10` is the logarithm in base 10:

In [46]:
np.log10(10)

1.0

Finally, `log2` is the logarithm in base two. To compute a logarithm in an arbitrary base, you have to use the change of base formula. For example, to compute the logarithm of 531441 in base 3, use:

In [47]:
np.log(531441)/np.log(3)

12.0
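The change of base formula can be wrapped in a small function. In the sketch below, `log_base` is a hypothetical helper written for this lesson, not a `numpy` function. Since it is built from vectorized functions, it also works on arrays.

```python
import numpy as np

def log_base(x, base):
    """Logarithm of x in the given base, via the change of base formula.

    log_base is a hypothetical helper name, not a numpy function.
    """
    return np.log(x) / np.log(base)

# Works on scalars and, being built from vectorized functions, on arrays
print(np.isclose(log_base(531441, 3), 12.0))          # True
powers = np.array([1., 3., 9., 27., 81.])
print(np.round(log_base(powers, 3)))                  # [0. 1. 2. 3. 4.]
```

The comparison uses `np.isclose` because the quotient of two floating point logarithms is not guaranteed to be exactly an integer.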

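If you have ever used the `math` module in plain Python, notice that its `log` function behaves differently: `math.log` accepts an optional second argument giving the base, while `np.log` always computes the natural logarithm.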

Operations are also vectorized. For example:

In [48]:
u = np.array([2, 5, 1, -3], dtype=np.float64)
v = np.array([4, 1, -2, 0], dtype=np.float64)
u, v

(array([ 2.,  5.,  1., -3.]), array([ 4.,  1., -2.,  0.]))

In [49]:
u + v

array([ 6.,  6., -1., -3.])

In [50]:
u - v

array([-2.,  4.,  3., -3.])

In [51]:
u * v

array([ 8.,  5., -2., -0.])

In [52]:
2 * u

array([ 4., 10.,  2., -6.])

In [53]:
u ** 3

array([  8., 125.,   1., -27.])

In [54]:
3 ** u

array([9.0000000e+00, 2.4300000e+02, 3.0000000e+00, 3.7037037e-02])

In [55]:
u ** v

array([16.,  5.,  1.,  1.])

Notice that all operations are defined _componentwise_, so the arrays must have the same shape (unless one of the operands is a scalar).
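The sketch below shows what happens in each case. The array `short` and the flag `mismatch_raised` are made up for illustration.

```python
import numpy as np

u = np.array([2., 5., 1., -3.])
short = np.array([1., 2., 3.])

# A scalar combines with every component
print(u + 1)              # [ 3.  6.  2. -2.]

# Arrays of different lengths cannot be combined componentwise
try:
    u + short
    mismatch_raised = False
except ValueError as err:
    mismatch_raised = True
    print("ValueError:", err)
```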

In particular, notice that `*` is _not_ the dot product of vectors. To compute that, use the `dot` function:

In [56]:
np.dot(u,v)

11.0

Vectorized operations and functions should be used whenever possible, since this is the most efficient way to do computations with arrays. For example, suppose that `x` and `y` contain the coordinates of a list of points:

In [57]:
x = np.linspace(2, 6, 5)
y = np.linspace(-3, 4, 5)
x, y

(array([2., 3., 4., 5., 6.]), array([-3.  , -1.25,  0.5 ,  2.25,  4.  ]))

To compute an array with the distances from the points to the origin, we can use:

In [58]:
distances = np.sqrt(x ** 2 + y ** 2)
distances

array([3.60555128, 3.25      , 4.03112887, 5.48292805, 7.21110255])
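The distance from a point $(x, y)$ to the origin is $\sqrt{x^2 + y^2}$. As a sketch of why vectorized code is preferred, the block below computes these distances with an explicit loop, with a vectorized expression, and with `np.hypot`, which is a `numpy` function that evaluates the same formula elementwise. The variable names are illustrative.

```python
import math

import numpy as np

x = np.linspace(2, 6, 5)
y = np.linspace(-3, 4, 5)

# Loop version: one Python-level call per point
loop_result = np.array([math.sqrt(xi ** 2 + yi ** 2) for xi, yi in zip(x, y)])

# Vectorized version: the whole array is processed in one expression
vectorized = np.sqrt(x ** 2 + y ** 2)

# np.hypot computes sqrt(x**2 + y**2) elementwise
with_hypot = np.hypot(x, y)

print(np.allclose(loop_result, vectorized))   # True
print(np.allclose(vectorized, with_hypot))    # True
```

All three give the same values; the vectorized forms are shorter and, for large arrays, usually much faster.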

# Exercises

# What you learned in this lesson

# Further information