# Introduction to numpy

### Summary

* Introduction
* numpy overview
* numpy arrays
    * Creation
    * Save and load
    * Manipulation
* numpy modules


Credits: V. A. Sole, ESRF Software Group

# Introduction

## Python basic operators

* `+`: Addition
* `-`: Subtraction
* `*`: Multiplication
* `/`: Division
* `**`: Exponentiation
* ``abs(x)``: Absolute value of x
* `x%y`: Remainder of x/y
* `x//y`: Floor division (integer part of x/y, rounded down)
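
As a minimal sketch of how these operators behave, the following snippet evaluates a few of them, including the remainder and floor division with a negative operand. The values are purely illustrative.

```python
# Floor division rounds towards negative infinity and the remainder takes
# the sign of the divisor, so that x == (x // y) * y + (x % y) always holds.
x, y = 7, 2

quotient = x // y          # floor division
remainder = x % y          # remainder of the division
true_division = x / y      # always a float in Python 3
power = 2 ** 10            # exponentiation

neg_quotient = -7 // 2     # rounds down, not towards zero
neg_remainder = -7 % 2     # non-negative because the divisor is positive

print(quotient, remainder, true_division, power)
print(neg_quotient, neg_remainder, abs(-3.5))
```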

## Python basic high level data types

* Numbers: ``10, 10.0, 1.0e+01, (10.0+3j)``
* Strings: ``"Hello world"``
* Bytes: ``b"Hello world"``
* Lists: ``['abc', 3, 'x']``
* Tuples: ``('abc', 3, 'x')``
* Dictionaries: ``{'key1': 'abc', 'key2': 3, 'key3': 'x'}``

## Exercise 1

Perform operations on a Python list:

In [1]:
# Let a be a list 
a = [1, 2, 3]
print(a)

[1, 2, 3]


In [2]:
# What is the result of
2 * a[2]

6

In [3]:
# and
2 * a

# Try other combinations of operations and data types

[1, 2, 3, 1, 2, 3]

## Conclusion

Without additional libraries, Python is poorly suited to scientific computing: lists are not numerical arrays, as ``2 * a`` shows.

## Scientist's Swiss knife

- The ``numpy`` package
- ``matplotlib`` provides high quality graphics
- ``SciPy`` provides additional scientific capabilities

# numpy

numpy is THE library providing number crunching capabilities to Python.

It extends Python by providing tools for:

* Treatment of multi-dimensional data
* Access to optimized linear algebra libraries
* Encapsulation of C and Fortran code

# numpy array

The ``numpy.ndarray`` object is:

* A collection of elements of the same type
* Multidimensional with flexible indexing
* Handled as any other Python object
* Stored in memory as a single typed data buffer, optimized for performance
    
It can be interfaced with other languages.
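
A small sketch of the "same type" property, using illustrative values: when an array is built from mixed integers and floats, numpy picks one common element type for the whole array.

```python
import numpy

# Mixing int and float values yields a single common type (float64 here)
mixed = numpy.array([1, 2.5, 3])
print(mixed, mixed.dtype)

# An array built only from Python integers gets an integer type
integers = numpy.array([1, 2, 3])
print(integers, integers.dtype.kind)  # kind 'i' means signed integer

# Arrays are ordinary Python objects: they can be stored in a list,
# passed to functions, and so on
arrays = [mixed, integers]
print([arr.shape for arr in arrays])
```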

In [4]:
import numpy

## Array creation given its content ``numpy.array``

In [5]:
# create an array from a list of values
a = numpy.array([1, 2, 3, 5, 7, 11, 13, 17])
a

array([ 1,  2,  3,  5,  7, 11, 13, 17])

In [6]:
# create a 2D array from nested lists: one inner list per row
b = numpy.array([[1, 2, 3], [4, 5, 6]])
b

array([[1, 2, 3],
       [4, 5, 6]])

In [7]:
numpy.array?

## Exercise 2

Use Python as a simple calculator and try the basic operations
on different arrays of numbers

In [8]:
import numpy

a = [1, 2, 3]
b = numpy.array(a)

In [9]:
# perform some operations with the list a:
print('2 * a[2] =', 2 * a[2])
print('2 * a =', 2 * a)

2 * a[2] = 6
2 * a = [1, 2, 3, 1, 2, 3]


In [10]:
# perform some operations with the numpy array b:
print('2 * b[2] =', 2 * b[2])
print('2 * b =', 2 * b)

2 * b[2] = 6
2 * b = [2 4 6]


## Array creation with dedicated methods 1/2

In [11]:
numpy.empty((2, 4))

array([[6.95155385e-310, 1.99997299e-316, 5.32185380e-317,
        6.95155472e-310],
       [5.32185380e-317, 6.95149943e-310, 3.16202013e-322,
        5.32185380e-317]])

In [12]:
numpy.zeros((3,))

array([0., 0., 0.])

In [13]:
numpy.ones((2, 2, 3))

array([[[1., 1., 1.],
        [1., 1., 1.]],

       [[1., 1., 1.],
        [1., 1., 1.]]])

## Array creation with dedicated methods 2/2

In [14]:
numpy.arange(start=0, stop=10, step=1)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [15]:
numpy.linspace(start=1, stop=10, num=10, endpoint=True)

array([ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10.])

In [16]:
numpy.identity(2)

array([[1., 0.],
       [0., 1.]])

### Types of elements

* Integers and real numbers with different precision (float, double, long double)
* Complex
* Character strings
* Any python object (object).

### Types of elements: `dtype`

Specify the element type with the ``dtype`` argument:


In [17]:
# numpy.int was removed in NumPy 1.24: use the Python built-in int instead
numpy.zeros((3,), dtype=int)

array([0, 0, 0])

In [18]:
# numpy.float was removed in NumPy 1.24: use the Python built-in float instead
numpy.arange(3, dtype=float)

array([0., 1., 2.])

### Types of elements: `dtype`

* ``float`` is ``double`` precision (64 bits)
* ``int`` is the platform default integer (either 64 or 32 bits)
* The former aliases ``numpy.float`` and ``numpy.int`` were removed in NumPy 1.24

Best to use sized types:
* Integers: ``int32``, ``int64``, ``uint8`` ...
* Real numbers: ``float32``, ``float64`` ...
* Complex: ``complex64``, ``complex128``

In [19]:
numpy.arange(2, dtype=numpy.float32)

array([0., 1.], dtype=float32)
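
To illustrate why sized types matter, here is a minimal sketch that prints the size of a few types and shows the precision difference between ``float32`` and ``float64``. The chosen values are only examples.

```python
import numpy

# Size in bytes of one element for a few sized types
for name in ("uint8", "int32", "int64", "float32", "float64", "complex128"):
    print(name, numpy.dtype(name).itemsize, "bytes")

# The same decimal value is stored with a different precision
single = numpy.float32(0.1)
double = numpy.float64(0.1)
print(float(single), float(double))

# astype returns a converted copy with the requested element type
small = numpy.arange(4, dtype=numpy.uint8)
converted = small.astype(numpy.float64)
print(small.nbytes, converted.nbytes)
```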

#### Array of objects

numpy arrays can contain any type of Python object when the element type is ``object``

In [20]:
a = dict({'key1': 0})
b = [1, 2, 3]
c = "element"
# dtype=object is given explicitly: recent NumPy versions refuse to guess it
numpy.array([a, b, c], dtype=object)

array([{'key1': 0}, list([1, 2, 3]), 'element'], dtype=object)

#### Record arrays

They allow access to the data using named fields.

If you picture your data as a spreadsheet, the field names are the column headings.

In [21]:
img = numpy.zeros((2,2),
        {'names': ('r','g','b'),
         'formats': (numpy.float32, numpy.float64, numpy.int32)})
img

array([[(0., 0., 0), (0., 0., 0)],
       [(0., 0., 0), (0., 0., 0)]],
      dtype=[('r', '<f4'), ('g', '<f8'), ('b', '<i4')])

In [22]:
img['r'] = 1.
img

array([[(1., 0., 0), (1., 0., 0)],
       [(1., 0., 0), (1., 0., 0)]],
      dtype=[('r', '<f4'), ('g', '<f8'), ('b', '<i4')])
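
Following the spreadsheet picture, here is a minimal sketch of a small table with named columns. The field names ``label``, ``x`` and ``y`` and the values are hypothetical.

```python
import numpy

# Each tuple is one row; the dtype gives the name and type of each column
points = numpy.array(
    [("a", 0.0, 1.5), ("b", 2.0, -1.0), ("c", 4.0, 0.5)],
    dtype=[("label", "U8"), ("x", numpy.float64), ("y", numpy.float64)],
)

print(points["x"])    # one column, as a regular float array
print(points[1])      # one row, with all its fields

# Fields can be combined with the usual indexing tools
positive_labels = points[points["y"] > 0]["label"]
print(positive_labels)
```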

## Save and load arrays

In [23]:
a = numpy.arange(start=0, stop=10, step=1, dtype=numpy.int32)

# Save as a binary file (.npy)
numpy.save('data.npy', a)

In [24]:
numpy.load('data.npy')

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int32)

In [25]:
# Save as a text file
numpy.savetxt('myarray.txt', a)

In [26]:
numpy.loadtxt('myarray.txt', dtype=numpy.int32)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int32)

#### ``numpy.loadtxt``

* Each row in the text file must have the same number of values
* Several other options exist

In [27]:
numpy.loadtxt?
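
A minimal sketch of a few ``numpy.loadtxt`` options (``delimiter``, ``usecols``, ``unpack``). An in-memory ``io.StringIO`` object with made-up content stands in for a text file.

```python
import io
import numpy

# Hypothetical file content: a comment line, then comma-separated values
text = io.StringIO(
    "# time,signal,error\n"
    "0.0,1.0,0.1\n"
    "0.5,2.0,0.2\n"
    "1.0,4.0,0.4\n"
)

# Lines starting with '#' are treated as comments and skipped by default
data = numpy.loadtxt(text, delimiter=",")
print(data.shape)

# Rewind, then read only the first two columns into separate 1D arrays
text.seek(0)
time, signal = numpy.loadtxt(text, delimiter=",", usecols=(0, 1), unpack=True)
print(time, signal)
```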

# Using arrays

## Indexing

Select elements as with any other Python sequence:
* Indexing starts at 0 for each array dimension
* Indexes can be negative: x[-1] is the same as x[len(x) - 1]

Slicing returns a view that refers to the data of the original array; such a view is often not contiguous in memory.

In [28]:
a = numpy.array([0, 1, 2, 3])
print('a[0] =', a[0])
print('a[-1] =', a[-1])

a[0] = 0
a[-1] = 3


In [29]:
a = numpy.array([(1, 2, 3, 4),
                 (5, 6, 7, 8),
                 (9, 10, 11, 12)])

In [30]:
a[1, 2]

7

In [31]:
a[0:2, 2]

array([3, 7])

In [32]:
a[2] # all the elements of the third row

array([ 9, 10, 11, 12])

In [33]:
a[2, :] # same as previous assuming a has at least two dimensions

array([ 9, 10, 11, 12])

In [34]:
a[0, -1]  # last element of the first row

4

In [35]:
a[0:2, 0:4:2]  # slicing allowed

array([[1, 3],
       [5, 7]])

In [36]:
a

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

In [37]:
a[0:2, :2] = 5  # assignment is also possible
a

array([[ 5,  5,  3,  4],
       [ 5,  5,  7,  8],
       [ 9, 10, 11, 12]])

## Indexing

In [38]:
a = numpy.arange(10., 18.)
a

array([10., 11., 12., 13., 14., 15., 16., 17.])

In [39]:
# The index can be a list or an array of integers
a[[0, 3, 5]]

array([10., 13., 15.])

In [40]:
# The index can be a boolean array (a mask)
mask = a > 13
print('a > 13 =', mask)
a[mask]

a > 13 = [False False False False  True  True  True  True]


array([14., 15., 16., 17.])
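
As a further sketch of indexing with masks and lists, the snippet below assigns through a mask, combines two conditions, and checks that indexing with a list returns a copy rather than a view. The threshold values are arbitrary.

```python
import numpy

a = numpy.arange(10., 18.)

# A mask on the left-hand side modifies only the selected elements
clipped = a.copy()
clipped[clipped > 13] = 13.
print(clipped)

# Conditions are combined with & (and), | (or), ~ (not);
# the parentheses are required
selection = a[(a > 11) & (a < 16)]
print(selection)

# Indexing with a list (or a mask) returns a copy of the data
subset = a[[0, 3, 5]]
subset[0] = -1.
print(a[0])  # the original array is unchanged
```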

## Exercise 3

* Calculate the element-wise difference between ``x`` and ``y``.
* Provide an expression to calculate the difference ``x[i+1]-x[i]`` for all the elements of the 1D array ``x``.

In [41]:
x = numpy.arange(10)
y = numpy.arange(1, 11)
print('x =', x)
print('y =', y)

x = [0 1 2 3 4 5 6 7 8 9]
y = [ 1  2  3  4  5  6  7  8  9 10]


In [42]:
# TODO

In [None]:
import exercicesolution
exercicesolution.show("ex3_1")

In [None]:
exercicesolution.show("ex3_2")

## Array attributes

### ``dtype`` attribute

Identifies the type of the elements of the array

In [45]:
a = numpy.array([[3, 2], [8, 12]])
a.dtype

dtype('int64')

In [46]:
a.dtype.str

'<i8'

### ``shape`` attribute

Tuple containing the array dimensions

In [47]:
a = numpy.array([[3, 2], [8, 12]])
a.shape

(2, 2)

In [48]:
# It is a Read and Write attribute.
a.shape = (4,)
a

array([ 3,  2,  8, 12])

### More array attributes

* ``ndim``: Number of dimensions
* ``size``: Total number of elements
* ``itemsize``: Size of a single item in bytes
* ``strides``: Bytes to step in each dimension
* ``flags``: Contiguity of the data in the buffer
* ``nbytes``: Size in bytes occupied in memory
* ``data``: Read/write buffer containing the data

In [49]:
a.ndim

1

## Array methods

In [50]:
a = numpy.arange(10, 0, -1)
a

array([10,  9,  8,  7,  6,  5,  4,  3,  2,  1])

In [51]:
# Return a value computed from the array
a.min(), a.max(), a.sum()

(1, 10, 55)

In [52]:
# In-place operation
a.sort()
a

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

In [53]:
# Return array views
b = a.reshape(2, 5)
b

array([[ 1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10]])

In [54]:
b.transpose()

array([[ 1,  6],
       [ 2,  7],
       [ 3,  8],
       [ 4,  9],
       [ 5, 10]])

## Array methods

Many array methods are also available as ``numpy`` functions:

In [55]:
a = numpy.array([(0, 1), (2, 3)])

In [56]:
b = a.copy()
c = numpy.copy(a)
# or even
d = numpy.array(a, copy=True)

In [57]:
a.transpose()

array([[0, 2],
       [1, 3]])

In [58]:
numpy.transpose(a)

array([[0, 2],
       [1, 3]])

### Views

A view is a new array object pointing to the same data buffer

In [59]:
a = numpy.arange(10.)
a.shape = (2, 5)
c = a.transpose()
print('a = ', a)
print('c =', c)
a[1, 2]

a =  [[0. 1. 2. 3. 4.]
 [5. 6. 7. 8. 9.]]
c = [[0. 5.]
 [1. 6.]
 [2. 7.]
 [3. 8.]
 [4. 9.]]


7.0

In [60]:
c[2, 1] = 10

a[1, 2]

10.0

### Views

In [61]:
a = numpy.array([[0, 1, 2], [3, 4, 5]])

b = a[:]
b.shape = -1  # -1 lets numpy infer the dimension from the number of elements

a[0, 0]

0

In [62]:
b[0] = 1000

a[0, 0]

1000
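
One way to check whether two arrays share the same buffer is ``numpy.shares_memory`` together with the ``base`` attribute. A minimal sketch with illustrative values:

```python
import numpy

a = numpy.arange(6)

view = a[::2]                # slicing returns a view
independent = view.copy()    # copy() allocates a new buffer

print(numpy.shares_memory(a, view))         # the view uses a's buffer
print(numpy.shares_memory(a, independent))  # the copy does not
print(view.base is a)                       # a view keeps a reference to its base

# Writing through the view changes the original, not the copy
view[0] = 100
print(a[0], independent[0])
```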

### Exercise 4: perform a 2x2 binning of an image

1. Generate a 1D array with 100 elements and perform a binning such that: ``1 2 3 4`` -> ``1+2`` ``3+4``
1. Generate a 100x100 array with elements in increasing order
1. Perform a 2x2 binning

|    |    |    |    |
|----|----|----|----|
| 1  | 2  | 3  | 4  |
| 5  | 6  | 7  | 8  |
| 9  | 10 | 11 | 12 |
| 13 | 14 | 15 | 16 |

2x2 binning:

|            |             |
|------------|-------------|
| 1+2+5+6    | 3+4+7+8     |
| 9+10+13+14 | 11+12+15+16 |

In [63]:
# TODO

In [None]:
import exercicesolution
exercicesolution.show("ex4_1")

In [None]:
exercicesolution.show("ex4_2")

In [None]:
exercicesolution.show("ex4_2_alt")

# Numpy modules

reference: https://docs.scipy.org/doc/numpy/reference/

## Array operations

Common functions are:
    
* Linear algebra: ``dot`` (matrix multiplication), ``inner`` product, ``outer`` product
* Statistics: ``mean``, ``std``, ``median``, ``percentile``, ...
* Sums: ``sum``, ``cumsum``, ...
* Math: ``cos``, ``sin``, ``log10``, ...
* Interpolation: ``interp``
* Indexing, Logic functions, Sorting
* ...

In [67]:
a = numpy.arange(9).reshape(3, 3)
b = numpy.identity(3)
numpy.dot(a, b)

array([[0., 1., 2.],
       [3., 4., 5.],
       [6., 7., 8.]])

In [68]:
numpy.std(a)

2.581988897471611

In [69]:
# Standard operations operate element by element
angles = numpy.linspace(0, numpy.pi, 5)
numpy.cos(angles)

array([ 1.00000000e+00,  7.07106781e-01,  6.12323400e-17, -7.07106781e-01,
       -1.00000000e+00])

## Array operations

Many ``numpy`` functions take an `axis` argument

In [70]:
a = numpy.arange(8).reshape(2, 4)
a

array([[0, 1, 2, 3],
       [4, 5, 6, 7]])

In [71]:
numpy.min(a)

0

In [72]:
numpy.min(a, axis=1)

array([0, 4])
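
As a sketch of what ``axis`` means: the reduction runs along the given axis, which therefore disappears from the result. The same 2x4 array is used here with ``sum``.

```python
import numpy

a = numpy.arange(8).reshape(2, 4)

# axis=0 reduces over the rows: one value per column
column_sums = a.sum(axis=0)
print(column_sums)

# axis=1 reduces over the columns: one value per row
row_sums = a.sum(axis=1)
print(row_sums)

# A tuple of axes reduces over several dimensions at once
total = a.sum(axis=(0, 1))
print(total)
```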

In [73]:
# argsort returns the indices that would sort the array, not the sorted array
a = [2, 5, 3]

idx = numpy.argsort(a)
idx

array([0, 2, 1])

In [74]:
# numpy.take picks the elements at the given indices, here giving a new sorted array
numpy.take(a, idx)

array([2, 3, 5])

## Linear algebra: ``numpy.linalg``

As usual, ``dir()`` and ``help()`` are your friends...

Some functions:

* ``numpy.linalg.det(x)``: determinant of x
* ``numpy.linalg.eig(x)``: eigenvalues and eigenvectors of x
* ``numpy.linalg.eigh(x)``: same, taking advantage of x being a Hermitian matrix
* ``numpy.linalg.inv(x)``: inverse matrix of x
* ``numpy.linalg.svd(x)``: singular value decomposition of x
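
A minimal sketch using some of these functions on an arbitrary symmetric 2x2 matrix, with the results checked against their definitions:

```python
import numpy

m = numpy.array([[2., 1.],
                 [1., 3.]])

det = numpy.linalg.det(m)
inv = numpy.linalg.inv(m)
# m is real and symmetric, hence Hermitian, so eigh applies
eigenvalues, eigenvectors = numpy.linalg.eigh(m)

print(det)
# m times its inverse is the identity (up to rounding errors)
print(numpy.allclose(m @ inv, numpy.identity(2)))
# Each column v of eigenvectors satisfies m @ v = eigenvalue * v
print(numpy.allclose(m @ eigenvectors, eigenvectors * eigenvalues))
```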

## Random sampling: ``numpy.random``

### Simple random data

In [75]:
import numpy.random

In [76]:
# random integers in the half-open interval [low, high)
numpy.random.randint(low=0, high=5, size=10)

array([4, 3, 3, 2, 2, 4, 1, 4, 3, 2])

In [77]:
# random floats in the half-open interval [0.0, 1.0)
numpy.random.random(10)

array([0.70958586, 0.55074398, 0.83880158, 0.24258886, 0.19500197,
       0.54153772, 0.58894925, 0.54654143, 0.41901308, 0.4061334 ])

In [78]:
numpy.random.bytes(10)

b'\x83\x1e\x90X\x08\xecID?\xf3'

## Random sampling: ``numpy.random``

### Permutations

In [79]:
a = numpy.arange(1, 10)
a

array([1, 2, 3, 4, 5, 6, 7, 8, 9])

In [80]:
# In-place
numpy.random.shuffle(a)
a

array([2, 8, 9, 5, 6, 3, 1, 4, 7])

In [81]:
# Out-of-place
numpy.random.permutation(a)

array([7, 8, 6, 4, 5, 2, 1, 9, 3])

## Random sampling: ``numpy.random``

### Distributions

``normal``, ``poisson``, ...

In [82]:
numpy.random.normal(loc=1., scale=1., size=10)

array([ 0.88158348,  0.08937197,  0.23049296,  2.84015366,  1.99408412,
        0.64941659,  0.95326144, -0.27226321,  0.15859168,  0.60410956])
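
Recent NumPy versions also provide a ``Generator`` interface through ``numpy.random.default_rng``. The sketch below assumes this interface is available (NumPy 1.17 or later) and uses an arbitrary seed to get reproducible samples.

```python
import numpy

# Two generators created with the same (arbitrary) seed
rng = numpy.random.default_rng(seed=42)
same_rng = numpy.random.default_rng(seed=42)

samples = rng.normal(loc=1., scale=1., size=1000)
counts = rng.poisson(lam=3., size=1000)

# The same seed reproduces the same sequence of numbers
repeated = same_rng.normal(loc=1., scale=1., size=1000)
print(numpy.array_equal(samples, repeated))

# Sample statistics approach the distribution parameters
print(samples.mean(), samples.std(), counts.mean())
```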

### Discrete Fourier Transform: ``numpy.fft``

* ``numpy.fft.fft`` --> 1D FFT
* ``numpy.fft.fft2`` --> 2D FFT
* ``numpy.fft.fftn`` --> nD FFT

In [83]:
numpy.fft.fft?
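
As an illustration, this sketch finds the dominant frequency of a synthetic signal. It uses ``numpy.fft.rfft`` and ``numpy.fft.rfftfreq``, the variants for real-valued input, which are not in the list above. The 5 Hz sine wave and the 100 Hz sampling rate are made-up values.

```python
import numpy

sampling_rate = 100.                        # samples per second (assumed)
t = numpy.arange(100) / sampling_rate       # 1 second of data
signal = numpy.sin(2 * numpy.pi * 5. * t)   # 5 Hz sine wave

# rfft returns only the non-negative frequencies of a real signal
spectrum = numpy.fft.rfft(signal)
frequencies = numpy.fft.rfftfreq(signal.size, d=1. / sampling_rate)

# The frequency with the largest amplitude is the one of the sine wave
peak = frequencies[numpy.argmax(numpy.abs(spectrum))]
print(peak)
```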

### Polynomials: ``numpy.polynomial``

Polynomials in numpy can be created, manipulated, and even fitted.

The module provides Polynomial, Chebyshev, Legendre, Laguerre, Hermite and HermiteE series.
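
A minimal sketch of creating, manipulating and fitting a polynomial with the ``Polynomial`` class. The coefficients are arbitrary and the fit is done on noise-free data generated from the polynomial itself.

```python
import numpy
from numpy.polynomial import Polynomial

# Coefficients are given in increasing degree: 1 + 2x + 3x^2
p = Polynomial([1., 2., 3.])
value = p(2.)             # evaluate at x = 2
derivative = p.deriv()    # 2 + 6x

# Fit a degree 2 polynomial to points sampled from p.
# fit() works in a scaled domain; convert() maps the result back
# to the usual coefficients.
x = numpy.linspace(-1., 1., 20)
fitted = Polynomial.fit(x, p(x), deg=2).convert()

print(value, derivative.coef, fitted.coef)
```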

## Exercise 5

* Write a function ``fill_array(height, width)`` to generate an array of dimension (height, width) in which ``X[row, column] = cos(row) * sin(column)``
* Time it for height=1000, width=1000
* Extra: Do the same for ``X[row, column] = cos(row) + sin(column)``

In [84]:
def fill_array(height, width):
    return None  # TODO

%timeit fill_array(1000, 1000)

187 ns ± 4.66 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


In [None]:
# inefficient fill
import exercicesolution
exercicesolution.show("ex5_inefficient_fill")
%timeit exercicesolution.ex5_inefficient_fill(1000, 1000)

In [None]:
# naive fill
exercicesolution.show("ex5_naive_fill")
%timeit exercicesolution.ex5_naive_fill(1000, 1000)

In [None]:
# clever fill
exercicesolution.show("ex5_clever_fill")
%timeit exercicesolution.ex5_clever_fill(1000, 1000)

In [None]:
# practical fill
exercicesolution.show("ex5_practical_fill")
%timeit exercicesolution.ex5_practical_fill(1000, 1000)

In [None]:
# optimized fill
exercicesolution.show("ex5_optimized_fill")
%timeit exercicesolution.ex5_optimized_fill(1000, 1000)

In [None]:
# atleast_2d fill
exercicesolution.show("ex5_atleast_2d_fill")
%timeit exercicesolution.ex5_atleast_2d_fill(1000, 1000)

Speed is a question of algorithm.

It is not just a question of language.
    
| Implementation       | Duration (seconds) |
|----------------------|--------------------|
| ex5_inefficient_fill | 5.052937           |
| ex5_naive_fill       | 0.886003           |
| ex5_clever_fill      | 0.016836           |
| ex5_practical_fill   | 0.014959           |
| ex5_optimized_fill   | 0.004497           |
| ex5_atleast_2d_fill  | 0.005262           |

Done on Intel(R) Xeon(R) CPU E5-1650 @ 3.50GHz

## Resources

- Complete reference material:
  http://docs.scipy.org/doc/numpy/reference/
- numpy user guide:
  https://docs.scipy.org/doc/numpy/user/
- Many recipes for different purposes:
  http://www.scipy.org/Cookbook
- Active mailing list where you can ask your questions:
  numpy-discussion@scipy.org
- Internal data-analysis mailing list:
  data-analysis@esrf.fr

## More exercises for the brave

Thanks to Nicolas Rougier: https://github.com/rougier/numpy-100

* Create a 5x5 matrix with values 1,2,3,4 just below the diagonal
* Create an 8x8 matrix and fill it with a checkerboard pattern
* Normalize a 5x5 random matrix
* Create a 5x5 matrix with row values ranging from 0 to 4
* Consider a random 10x2 matrix representing cartesian coordinates, convert them to polar coordinates
* Create a random vector of size 10 and replace the maximum value by 0
* Consider a random vector with shape (100,2) representing coordinates, find point by point distances
* Generate a generic 2D Gaussian-like array
* Subtract the mean of each row of a matrix
* How do I sort an array by the nth column?
* Find the nearest value from a given value in an array