# Numpy essentials

Let's analyze the main features that make **numpy** the most important package for Computational Physics in Python.

In [1]:
import numpy as np

# using time to measure... time
from time import time

## Arrays vs lists

A plain Python **list** is literally a list, and a list can hold anything, even elements of different types.

In [2]:
# a random list of things
my_list = ['dogs', 'cats', 1.9 - 3j, 1, 2, 3, 'bananas']
print('Heterogeneous data types:', my_list)

Heterogeneous data types: ['dogs', 'cats', (1.9-3j), 1, 2, 3, 'bananas']


In [3]:
# easy to append elements
my_list.append('apples')
my_list += [10.0]
print('List with two extra elements:', my_list)

List with two extra elements: ['dogs', 'cats', (1.9-3j), 1, 2, 3, 'bananas', 'apples', 10.0]


In [4]:
# indexing starts at 0, and negative indices count backwards from the end
print('The third element is:', my_list[2])
print('The last element is:', my_list[-1])

The third element is: (1.9-3j)
The last element is: 10.0


In [5]:
# index ranges [init:end:step] cover the semi-open interval [init, end)
print('Elements third to sixth:', my_list[2:6])
print('Every two elements:', my_list[::2])

Elements third to sixth: [(1.9-3j), 1, 2, 3]
Every two elements: ['dogs', (1.9-3j), 2, 'bananas', 10.0]


Notice that the `+` operator concatenates lists (like *append*, but for a whole list), so multiplying a list by an integer repeats it that many times:

In [6]:
my_list = [0, 1, 2]
print('Two lists:', my_list * 2) # = my_list + my_list

Two lists: [0, 1, 2, 0, 1, 2]


In [7]:
# but append runs *in place*, while + and * return a new list
my_list.append(3)
print('Adding the element 3 gives:', my_list)

Adding the element 3 gives: [0, 1, 2, 3]


In [8]:
new_list = my_list + [4]
print('The previous list was:', my_list)
print('      The new list is:', new_list)

The previous list was: [0, 1, 2, 3]
      The new list is: [0, 1, 2, 3, 4]
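
The difference matters when two names refer to the same list. A minimal sketch (the names `original`, `alias` and `other` are just for illustration): `append` and `+=` modify the shared object, while `+` builds a new one.

```python
original = [0, 1, 2]
alias = original          # both names point to the same list object

alias.append(3)           # in place: visible through both names
alias += [4]              # += on a list also extends it in place
print(original)           # [0, 1, 2, 3, 4]

other = original + [5]    # + builds a new list; original is untouched
print(original)           # [0, 1, 2, 3, 4]
print(other)              # [0, 1, 2, 3, 4, 5]
print(alias is original, other is original)  # True False
```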


On the other hand, a **numpy array** acts as a vector, matrix, or tensor:

In [9]:
vec1 = np.array([0, 1, 2]) # dtype is inferred from the data: integers here
print('Multiply by 2 to get:', 2*vec1)

Multiply by 2 to get: [0 2 4]
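
Unlike a list, an array stores a single data type, which numpy infers from the input unless `dtype` is given explicitly. A quick check (the exact integer type, e.g. `int64`, can depend on the platform):

```python
import numpy as np

ints = np.array([0, 1, 2])                 # all integers -> integer dtype
mixed = np.array([0, 1, 2.5])              # one float upcasts everything to float
forced = np.array([0, 1, 2], dtype=float)  # explicit dtype

print(ints.dtype.kind, mixed.dtype, forced.dtype)  # i float64 float64
print(forced * 2)                          # [0. 2. 4.]
```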


In [10]:
vec2 = np.array([3, 4, 5])
print('Adding, adds as vectors:', vec1 + vec2)

Adding, adds as vectors: [3 5 7]


In [11]:
# elementwise: vec1 * vec2 = [vec1[0]*vec2[0], vec1[1]*vec2[1], ...]
print('Multiplication is elementwise:', vec1 * vec2)

Multiplication is elementwise: [ 0  4 10]


In [12]:
# vector products: for real 1D arrays np.vdot equals np.dot (vdot conjugates its first argument)
print('The dot product:', np.vdot(vec1, vec2))
print('The cross product:', np.cross(vec1, vec2))

The dot product: 14
The cross product: [-3  6 -3]
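
`np.dot` and `np.vdot` only differ for complex vectors: `np.vdot` conjugates its first argument, as in the inner product $\langle u, v\rangle = \sum_i u_i^* v_i$, while `np.dot` does not. A quick check with made-up vectors:

```python
import numpy as np

u = np.array([1j, 2])
v = np.array([1j, 3])

print(np.dot(u, v))   # (1j)(1j) + 2*3 = -1 + 6 = 5
print(np.vdot(u, v))  # conj(1j)(1j) + 2*3 = 1 + 6 = 7
print(u @ v)          # the @ operator matches np.dot for 1D arrays

# for real vectors both give the same result
a_real = np.array([0, 1, 2])
b_real = np.array([3, 4, 5])
print(np.dot(a_real, b_real), np.vdot(a_real, b_real))  # 14 14
```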


## Broadcasting, vectorizing, loops in C

Loops in Python are slow because Python is an interpreted language. To avoid explicit loops in Python, use numpy's vectorized operations and broadcasting, which move the loop into numpy's compiled C code.

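**Broadcasting:** numpy applies an operation to every element of an array inside its own compiled loops, stretching operands of compatible shapes (e.g., a scalar and an array) to match. Some operations, such as matrix multiplication, also call optimized linear algebra libraries that can use multiple **threads**.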
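
Strictly speaking, broadcasting is the set of rules that lets numpy combine arrays of different but compatible shapes: dimensions are compared from the right, and a dimension of size 1 (or a missing one) is stretched to match. A small sketch with illustrative arrays:

```python
import numpy as np

col = np.array([[0], [10], [20]])  # shape (3, 1)
row = np.array([1, 2, 3])          # shape (3,)

print(row + 100)       # the scalar is stretched to every element: [101 102 103]

table = col + row      # (3, 1) and (3,) broadcast to (3, 3)
print(table.shape)     # (3, 3)
print(table)           # row i is col[i] + row
```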

Let's use the **time** module to measure how long three different ways of building the same vector take:

In [13]:
n = 1000000

# 1) Using only python and lists
t0 = time()
y1 = []
for x in range(n):
    y1.append(x**2)
t1 = time()
print('List and appends:', t1-t0, ' seconds')

List and appends: 0.4214651584625244  seconds


In [14]:
# 2) Fill a numpy array with a for loop
t0 = time()
y2 = np.zeros(n)
for x in range(n):
    y2[x] = x**2
t1 = time()
print('Loop over predefined array:', t1-t0, ' seconds')

Loop over predefined array: 0.5192162990570068  seconds


In [15]:
# 3) Using numpy broadcasts
t0 = time()
xs = np.arange(n)
y3 = xs**2 # xs is the full array above, loop is implied elementwise
t1 = time()
print('Numpy broadcast:', t1-t0, ' seconds')

Numpy broadcast: 0.006101846694946289  seconds
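
A single `time()` measurement can vary quite a bit between runs. A sketch using the standard-library `timeit` module instead, which repeats each measurement so we can keep the fastest run (in Jupyter, the `%timeit` magic does this interactively). The helper names are ours, and the list version here uses a comprehension rather than `append`, so its timing will not match cell 1) exactly.

```python
import timeit

import numpy as np

n = 1000000

def with_list():
    # hypothetical helper: pure Python version
    return [x**2 for x in range(n)]

def with_numpy():
    # hypothetical helper: broadcast version
    return np.arange(n)**2

# run each version once per measurement, 5 measurements, keep the fastest
t_list = min(timeit.repeat(with_list, number=1, repeat=5))
t_numpy = min(timeit.repeat(with_numpy, number=1, repeat=5))
print('list comprehension:', t_list, ' seconds')
print('numpy broadcast:', t_numpy, ' seconds')
```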


Let's see another example with a **simple rectangle rule integral**:

$$\int_a^b f(x) dx \approx \sum_{i=0}^{N-2} f(x_i) \Delta x, \text{ with } x_i = a + i \Delta x, \Delta x = \dfrac{b-a}{N-1}$$ 

**Warning:** this is a poor method for integrals, used here only to illustrate broadcasting. **Numpy** and especially **scipy** implement more efficient and precise methods (e.g., `scipy.integrate.quad`, which wraps the Fortran library QUADPACK and uses adaptive Gauss-Kronrod quadrature).
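
Part of why the rectangle rule looks so accurate for the $f$ used below is that $f(0) = f(1) = 0$, which cancels its leading $O(\Delta x)$ error term. A sketch with $e^x$ on $[0, 1]$ (exact value $e - 1$), comparing the left rectangle rule with the trapezoidal rule, both written with broadcasting; the helper names `left_rectangle` and `trapezoid` are ours:

```python
import numpy as np

def left_rectangle(f, a, b, N):
    xs = np.linspace(a, b, N)
    dx = (b - a) / (N - 1)
    return np.sum(f(xs[:-1])) * dx                      # drops the last point

def trapezoid(f, a, b, N):
    xs = np.linspace(a, b, N)
    dx = (b - a) / (N - 1)
    ys = f(xs)
    return (np.sum(ys) - 0.5 * (ys[0] + ys[-1])) * dx   # endpoints weighted by 1/2

exact = np.e - 1
for N in (11, 101, 1001):
    err_rect = abs(left_rectangle(np.exp, 0, 1, N) - exact)
    err_trap = abs(trapezoid(np.exp, 0, 1, N) - exact)
    print(N, err_rect, err_trap)
```

Each tenfold increase in `N` should shrink the rectangle error roughly tenfold ($O(\Delta x)$) and the trapezoid error roughly a hundredfold ($O(\Delta x^2)$).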

In [16]:
# using a lambda call to define the function to integrate
f = lambda x: 6*x*(1-x)

a = 0
b = 1
N = 10000000
dx = (b-a)/(N-1)

# 1) a direct for loop in python 
t0 = time()
res1 = 0
for i in range(N-1): # range goes from 0 to N-2, semi-open interval
    x = a + i*dx
    res1 += f(x)
res1 *= dx
t1 = time()
print('Direct loop in python. The result:', res1)
print('And the time:', t1-t0)

Direct loop in python. The result: 0.9999999999999637
And the time: 4.028289318084717


In [17]:
# 2) using broadcast and auxiliary calls
t0 = time()
xs = np.linspace(a, b, N)
res2 = np.sum(f(xs[:-1]))*dx
t1 = time()
print('Numpy broadcasting. The result:', res2)
print('And the time:', t1-t0)

Numpy broadcasting. The result: 0.9999999999999905
And the time: 0.14052867889404297


The code above relies on `f(x)` being **vectorized**: since it only uses arithmetic operators, it accepts a whole array `x` and evaluates it elementwise in numpy's internal C loops, which leads to huge speedups.

Numpy's mathematical functions are vectorized as well: `np.sin(...), np.cos(...), np.exp(...)`, and so on. What would a non-vectorized function look like?

In [18]:
def larger(x, y):
    if x > y:
        return x
    else:
        return y
     
# Let's test with simple numbers
print('Which is larger, 3 or 5?', larger(3, 5))
print('Which is larger, 5 or 3?', larger(5, 3))

Which is larger, 3 or 5? 5
Which is larger, 5 or 3? 5


In [19]:
# and now with arrays
a = np.array([4, 5, 6])
b = np.array([7, 2, 8])

print('Compare element by element will fail:', larger(a, b))

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

**How to fix it?**

**See also:** `np.greater(...)` and other comparisons within numpy.

In [None]:
# wrapper that loops over the elements of the inputs
# (convenient, but still a Python-level loop, so it is not faster)
vlarger = np.vectorize(larger)

print('Now it will work:', vlarger(a, b))
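
For this particular function numpy already has compiled equivalents, which avoid the Python-level loop that `np.vectorize` keeps: `np.maximum` returns the elementwise larger value, and `np.where` selects elementwise from two arrays based on a condition such as `np.greater(a, b)`.

```python
import numpy as np

a = np.array([4, 5, 6])
b = np.array([7, 2, 8])

print(np.maximum(a, b))        # elementwise larger value: [7 5 8]
print(np.where(a > b, a, b))   # same result: take a where a > b, else b
print(np.greater(a, b))        # boolean array, True only where a > b
```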

## Creating vectors with arrays

Let's start with some simple methods to build arrays. Later we will see how they generalize to matrices and tensors.

In [None]:
# both zeros and ones take the shape as argument
x = np.zeros(5)
y = np.ones(5)
print('x =', x)
print('y =', y)

# creating one "by hand", or converting from list to array
z = np.array([10, 5, 9, 15, 42])
print('z =', z)

There are two similar and useful calls to define arrays over ranges:

- `np.arange(a, b, dx)` creates an array over the semi-open interval `[a,b)` in steps of `dx`.
- `np.linspace(a, b, n)` creates an array over the full interval `[a,b]` with `n` points.

For **arange** the number of points will be $n = \lceil (b-a)/dx \rceil$, i.e., $(b-a)/dx$ rounded up to an integer.

For **linspace** the step between points will be $dx = (b-a)/(n-1)$ by default. The option `endpoint=False` makes it use the semi-open interval `[a,b)` instead, which makes it compatible with **arange**, with $dx = (b-a)/n$.
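
A short check of these rules, including a float-step case where floating-point rounding makes `arange` produce one extra point. This is why the numpy documentation suggests `linspace` when the step is not an integer.

```python
import numpy as np

# arange: n = ceil((b - a)/dx), so a non-integer ratio rounds up
x = np.arange(0, 5, 2)
print(x, len(x))    # [0 2 4] 3, since ceil(2.5) = 3

# linspace with endpoint=False uses dx = (b - a)/n, matching arange
print(np.allclose(np.arange(0, 5, 1), np.linspace(0, 5, 5, endpoint=False)))  # True

# with a float step, (1.3 - 1)/0.1 is slightly above 3 in double precision,
# so arange returns 4 points (1.3 shows up) instead of 3
y = np.arange(1, 1.3, 0.1)
print(len(y))

# linspace fixes the number of points directly
z = np.linspace(1, 1.3, 3, endpoint=False)
print(len(z))       # 3
```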

Try it:

In [None]:
x = np.arange(0, 5, 1)
print('Using arange:')
print('n:', len(x))
print('dx:', x[1]-x[0])
print('x:', x)

In [None]:
x = np.linspace(0, 5, 5)
print('Using linspace with full interval:')
print('n:', len(x))
print('dx:', x[1]-x[0])
print('x:', x)

In [None]:
x = np.linspace(0, 5, 5, endpoint=False)
print('Using linspace with semi-open interval:')
print('n:', len(x))
print('dx:', x[1]-x[0])
print('x:', x)

## Indexing arrays

**Syntax:** `x[i:f:s]` refers to the range from index `i` to `f` (not inclusive) in steps of `s`.

**First and last:** indices start at 0, and the index -1 refers to the last element.

### Vectors

Let's start with a 1D array of 15 elements and extract parts of it.

In [None]:
x = np.arange(11, 26) # semi-open interval [11, 26) with steps of 1 (implied)
print('x:', x)
print('n:', len(x))

In [None]:
print('The fifth element:', x[4]) # because it starts from zero

In [None]:
print('The first five elements:', x[:5]) # if initial index is implied: from the start

In [None]:
print('The last five elements:', x[-5:]) # negative counts backwards

In [None]:
print('All odd elements:', x[::2]) # start and end implied, steps of 2

In [None]:
print('All even elements:', x[1::2]) # start at 1, end implied, steps of 2

In [None]:
print('Elements from 5th to 8th:', x[4:8]) # 4 since it starts from 0, 8 since it's a semi-open interval

In [None]:
print('Elements larger than 18:', x[x>18]) # x>18 returns a boolean array (mask) of True/False
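
Masks can be combined elementwise with `&` (and), `|` (or) and `~` (not). Python's `and`/`or` fail on arrays for the same reason as the `larger` function above. A sketch on the same `x`:

```python
import numpy as np

x = np.arange(11, 26)

mask = (x > 14) & (x % 2 == 0)   # parentheses needed: & binds tighter than > and ==
print(mask.dtype)                # bool
print(x[mask])                   # even elements larger than 14: [16 18 20 22 24]
print(x[(x < 13) | (x > 23)])    # either condition: [11 12 24 25]
print(np.count_nonzero(x > 18))  # how many elements pass the test: 7
```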

### Matrices

It all works as above, but now we have rows and columns.

`m[i,j]` refers to row i and column j.

In [None]:
m = x.reshape([3,5]) # returns x arranged as a 3x5 matrix (x itself keeps its 1D shape)

print(m)
print('Shape of m:', m.shape)

In [None]:
print('The second row:', m[1, :]) # 1, since it starts from 0, and : refers to all

In [None]:
print('The third column:', m[:, 2])

In [None]:
print('The element from row 2, column 3:', m[1,2]) # remember that it counts from zero
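
Slices like `m[1, :]` are *views*: they share memory with the original array, so writing to them changes the matrix too (boolean-mask indexing such as `x[x>18]` returns a copy instead). A sketch, with illustrative names `row` and `safe`:

```python
import numpy as np

m = np.arange(11, 26).reshape([3, 5])

row = m[1, :]          # a slice is a view: it shares memory with m
row[0] = -1
print(m[1, 0])         # -1: the change shows up in m

safe = m[2, :].copy()  # an explicit copy is independent
safe[0] = -2
print(m[2, 0])         # still 21

print(np.shares_memory(m, row), np.shares_memory(m, safe))  # True False
```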

## Creating matrices

In [20]:
# by hand (or use comprehensions, see the previous tutorial)
m = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

print(m)
print('Shape of m:', m.shape)

[[1 2 3]
 [4 5 6]
 [7 8 9]]
Shape of m: (3, 3)
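
Two alternatives to typing the entries, sketched for the same matrix: a nested list comprehension (the formula `3*i + j + 1` is simply what fits this example), or `np.arange` followed by `reshape`.

```python
import numpy as np

# nested list comprehension: outer loop over rows i, inner loop over columns j
m1 = np.array([[3*i + j + 1 for j in range(3)] for i in range(3)])

# a 1D range reshaped into 3 rows and 3 columns
m2 = np.arange(1, 10).reshape([3, 3])

print(np.array_equal(m1, m2))  # True
```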


In [21]:
# identity
m = np.eye(5)
print(m)
print('Shape of m:', m.shape)

[[1. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0.]
 [0. 0. 1. 0. 0.]
 [0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 1.]]
Shape of m: (5, 5)


In [22]:
# off-diagonals: k=1 is the first superdiagonal, k=-2 the second subdiagonal
m = np.eye(5) + 4*np.eye(5, k=1) + 3*np.eye(5, k=-2)
print(m)
print('Shape of m:', m.shape)

[[1. 4. 0. 0. 0.]
 [0. 1. 4. 0. 0.]
 [3. 0. 1. 4. 0.]
 [0. 3. 0. 1. 4.]
 [0. 0. 3. 0. 1.]]
Shape of m: (5, 5)
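
`np.diag(v, k)` is another way to build the same banded matrix. It is handy when the diagonals are not constant, since any vector with the right length (`n - |k|` entries for offset `k`) can be placed there. A sketch reproducing the matrix above:

```python
import numpy as np

n = 5
# np.diag(v, k) puts the vector v on the k-th diagonal (k > 0 above, k < 0 below)
m_diag = (np.diag(np.ones(n))
          + np.diag(4*np.ones(n - 1), k=1)
          + np.diag(3*np.ones(n - 2), k=-2))

m_eye = np.eye(n) + 4*np.eye(n, k=1) + 3*np.eye(n, k=-2)
print(np.array_equal(m_diag, m_eye))  # True
```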


In [23]:
# ones
m = np.ones([3,5]) # takes the shape as argument
print(m)
print('Shape of m:', m.shape)

[[1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]]
Shape of m: (3, 5)


In [24]:
# the same for zeros
m = np.zeros([3,5]) # takes the shape as argument
print(m)
print('Shape of m:', m.shape)

[[0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]]
Shape of m: (3, 5)
