# Introduction to numpy

0. Introduction
0. numpy overview
0. numpy arrays
    * Creation
    * Save and load
    * Manipulation
    * Indexing
0. numpy modules


Credits: V. A. Sole, ESRF Software Group

# Introduction

## Python basic operators

* `+`: Addition
* `-`: Subtraction
* `*`: Multiplication
* `/`: Division
* `**`: Exponentiation
* ``abs(x)``: Absolute value of x
* `x % y`: Remainder of x / y
* `x // y`: Floor division (x / y rounded down to the nearest integer)
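
As a quick illustration of the operators listed above, the following sketch applies them to plain Python numbers. The variable names `floor_quotient` and `remainder` are only chosen for this example. Note that floor division rounds toward negative infinity, which matters for negative operands.

```python
# Arithmetic operators on plain Python numbers
print(7 + 2)    # addition
print(7 - 2)    # subtraction
print(7 / 2)    # true division: always returns a float
print(7 // 2)   # floor division
print(7 % 2)    # remainder
print(2 ** 10)  # exponentiation
print(abs(-3))  # absolute value

# With a negative operand, floor division rounds down, not toward zero
floor_quotient = -7 // 2
remainder = -7 % 2
print(floor_quotient, remainder)
```

The two results are consistent with each other: `floor_quotient * 2 + remainder` gives back `-7`.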

## Python basic high level data types

* Numbers: ``10, 10.0, 1.0e+01, (10.0+3j)``
* Strings: ``"Hello world"``
* Bytes: ``b"Hello world"``
* Lists: ``['abc', 3, 'x']``
* Tuples: ``('abc', 3, 'x')``
* Dictionaries: ``{'key1': 'abc', 'key2': 3, 'key3': 'x'}``

## Exercise 1

Perform operations on a Python list:

In [None]:
# Let a be a list 
a = [1, 2, 3]
print(a)

In [None]:
# What is the result of
2 * a[2]

In [None]:
# and
2 * a

# Try other combinations of operations and data types

## Conclusion

Without additional libraries, Python is almost useless for scientific computing.

## Scientist's Swiss knife

- The ``numpy`` package
- ``matplotlib`` provides high quality graphics
- ``SciPy`` provides additional scientific capabilities

# numpy

numpy is THE library providing number crunching capabilities to Python.

It extends Python by providing tools for:

* Treatment of multi-dimensional data
* Access to optimized linear algebra libraries
* Encapsulation of C and Fortran code

# numpy array

The ``numpy.ndarray`` object is:

* A collection of elements of the same type
* Multidimensional with flexible indexing
* Handled as any other Python object
* Implemented in memory as a true array of raw values, optimized for performance
    
It can be interfaced with other languages.
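
The properties listed above can be checked on small arrays. This is a minimal sketch; the names `mixed` and `grid` are only used for this illustration.

```python
import numpy

# Mixed int/float input is promoted to one common element type
mixed = numpy.array([1, 2.5, 3])
print(mixed.dtype)

# Nested lists of equal length become a multidimensional array
grid = numpy.array([[1, 2, 3], [4, 5, 6]])
print(grid.shape, grid.ndim)

# The array behaves like other Python sequences: len() and iteration work
print(len(grid))
for row in grid:
    print(row)
```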

In [None]:
import numpy

## Array creation given its content ``numpy.array``

Documentation: https://docs.scipy.org/doc/numpy/reference/generated/numpy.array.html

In [None]:
# create an array from a list of values
a = numpy.array([1, 2, 3, 5, 7, 11, 13, 17])
a

In [None]:
# create a 2D array from nested lists (one inner list per row)
b = numpy.array([[1, 2, 3], [4, 5, 6]])
b

In [None]:
numpy.array?

## Exercise 2

Use Python as a simple calculator and try the basic operations
on different arrays of numbers

In [None]:
import numpy

a = [1, 2, 3]
b = numpy.array(a)

In [None]:
# perform some operations with the list a:
print('2 * a[2] =', 2 * a[2])
print('2 * a =', 2 * a)

In [None]:
# perform some operations with the numpy array b:
print('2 * b[2] =', 2 * b[2])
print('2 * b =', 2 * b)

## Array creation with dedicated methods

Documentation: https://docs.scipy.org/doc/numpy/reference/routines.array-creation.html

In [None]:
numpy.empty((2, 4))

In [None]:
numpy.zeros((3,))

In [None]:
numpy.ones((2, 2, 3))

### More array creation methods

In [None]:
numpy.arange(start=0, stop=10, step=1)

In [None]:
numpy.linspace(start=1, stop=10, num=10, endpoint=True)

In [None]:
numpy.identity(2)
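
The difference between ``arange`` and ``linspace`` is easy to miss: the first takes a step and excludes the stop value, the second takes a number of points and includes the stop value by default. The sketch below shows both, plus two other creation helpers (``zeros_like`` and ``full``) that are not used elsewhere in this notebook.

```python
import numpy

# arange uses a step and excludes the stop value
stepped = numpy.arange(0., 1., 0.25)
print(stepped)

# linspace uses a number of points and includes the stop value by default
spaced = numpy.linspace(0., 1., 5)
print(spaced)

# zeros_like copies shape and dtype from an existing array
template = numpy.zeros_like(spaced)

# full fills a new array with a constant value
filled = numpy.full((2, 3), 7)
print(filled)
```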

## Types of elements

* Integers and real numbers with different precision (float, double, long double)
* Complex numbers
* Strings
* Any Python object (``object``)

### Types of elements: `dtype`

Specify the element type with the ``dtype`` argument:


In [None]:
numpy.zeros((3,), dtype=int)  # the numpy.int alias was removed in numpy 1.24

In [None]:
numpy.arange(3, dtype=float)  # the numpy.float alias was removed in numpy 1.24

### Types of elements: `dtype`

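* ``float`` (Python's built-in type; the old alias ``numpy.float`` was removed in numpy 1.24) is ``double`` precision (64 bits)
* ``int`` (formerly also ``numpy.int``) maps to the platform default integer (either 64 or 32 bits)

It is best to use explicitly sized types: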
* Integers: ``int32``, ``int64``, ``uint8`` ...
* Real numbers: ``float32``, ``float64`` ...
* Complex: ``complex64``, ``complex128``

In [None]:
numpy.arange(2, dtype=numpy.float32)
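
To see what the sized types mean in practice, the sketch below prints the size in bytes of one element and of a four-element array for a few of them. It also uses ``astype``, which is not introduced above, to convert an existing array to another element type.

```python
import numpy

for dtype in (numpy.uint8, numpy.int32, numpy.int64,
              numpy.float32, numpy.float64, numpy.complex128):
    arr = numpy.zeros(4, dtype=dtype)
    # itemsize: bytes per element, nbytes: bytes for the whole array
    print(arr.dtype, arr.itemsize, arr.nbytes)

# astype returns a new array converted to the requested element type
values = numpy.array([1.7, 2.2, 3.9])
as_int = values.astype(numpy.int32)   # the fractional part is dropped
print(as_int)
```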

### Array of objects

numpy arrays can contain any type of Python object when created with ``dtype=object``.

In [None]:
a = {'key1': 0}
b = [1, 2, 3]
c = "element"
# dtype=object must be explicit: recent numpy versions raise an error on such mixed input
numpy.array([a, b, c], dtype=object)

### Record arrays

They allow access to the data using named fields.

If your data were a spreadsheet, the field names would be the column headings.

In [None]:
img = numpy.zeros((2,2),
        {'names': ('r','g','b'),
         'formats': (numpy.float32, numpy.float64, numpy.int32)})
img

In [None]:
img['r'] = 1.
img
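
The spreadsheet analogy can be made concrete with a one-dimensional array of records. This is only a sketch: the field names ``label``, ``x`` and ``count`` and their values are made up for the illustration.

```python
import numpy

# Hypothetical measurement table: one named field per "column"
table = numpy.zeros(3, dtype=[('label', 'U8'),
                              ('x', numpy.float64),
                              ('count', numpy.int32)])
table['label'] = ['a', 'b', 'c']
table['x'] = [0.5, 1.5, 2.5]
table['count'] = [10, 20, 30]

print(table['x'])         # a whole column, selected by field name
print(table[1])           # a whole row, selected by index
print(table.dtype.names)  # the field names
```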

# Save and load arrays

Documentation: https://docs.scipy.org/doc/numpy/reference/routines.io.html

In [None]:
a = numpy.arange(start=0, stop=10, step=1, dtype=numpy.int32)

# Save as a binary file (.npy)
numpy.save('data.npy', a)

In [None]:
numpy.load('data.npy')

In [None]:
# Save as a text file
numpy.savetxt('myarray.txt', a)

In [None]:
numpy.loadtxt('myarray.txt', dtype=numpy.int32)

## ``numpy.loadtxt``

* Each row in the text file must have the same number of values
* Several other options exist

In [None]:
numpy.loadtxt?
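
A few of these options are sketched below: ``delimiter`` and ``usecols``, together with the default handling of ``#`` comment lines. The file content is made up and read from an in-memory ``io.StringIO`` object so that the example does not depend on a file on disk.

```python
import io
import numpy

# In-memory stand-in for a text file (hypothetical content)
text = io.StringIO("# x,y,z\n1,10,100\n2,20,200\n3,30,300\n")

# Lines starting with '#' are skipped by default.
# delimiter sets the column separator, usecols selects columns 0 and 2.
data = numpy.loadtxt(text, delimiter=',', usecols=(0, 2))
print(data)
print(data.shape)
```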

# Using arrays

## Array operations

Common functions are:
    
* Linear algebra: ``dot`` (matrix multiplication), ``inner`` product, ``outer`` product
* Statistics: ``mean``, ``std``, ``median``, ``percentile``, ... (https://docs.scipy.org/doc/numpy/reference/routines.statistics.html)
* Sums: ``sum``, ``cumsum``, ...
* Math: ``cos``, ``sin``, ``log10``, ``interp``, ... (https://docs.scipy.org/doc/numpy/reference/routines.math.html)
* Indexing, Logic functions, Sorting
* ... See: https://docs.scipy.org/doc/numpy/reference/routines.html

In [None]:
a = numpy.linspace(0., 1., 100)
print('Mean:', numpy.mean(a), ', Std:', numpy.std(a))

In [None]:
# Standard operations operate element by element
angles = numpy.linspace(0, numpy.pi, 5)
numpy.cos(angles)

In [None]:
a = numpy.array([[0., 1., 2.],
                 [3., 4., 5.],
                 [6., 7., 8.]])
b = numpy.identity(3)
numpy.dot(a, b)
# Or: a @ b (with Python >= 3.5)
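
Some of the other functions from the list above, shown on small inputs. This is a sketch with arbitrary example vectors ``u`` and ``v``.

```python
import numpy

u = numpy.array([1., 2., 3.])
v = numpy.array([4., 5., 6.])

inner = numpy.inner(u, v)   # 1*4 + 2*5 + 3*6
outer = numpy.outer(u, v)   # 3x3 array with outer[i, j] = u[i] * v[j]
running = numpy.cumsum(u)   # running total of the elements

# Linear interpolation between the points (0, 0) and (10, 100) at x = 2.5
interpolated = numpy.interp(2.5, [0., 10.], [0., 100.])

print(inner, running, interpolated)
print(outer)
```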

### Array operations along an axis

Many ``numpy`` *reduction* functions take an `axis` argument:

In [None]:
a = numpy.array([[0, 1, 2, 3],
                 [4, 5, 6, 7]])

In [None]:
numpy.min(a)

In [None]:
numpy.min(a, axis=1)
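
The ``axis`` argument names the dimension that is collapsed by the reduction. The sketch below shows the resulting shapes for the same array, and the ``keepdims`` option, which keeps the collapsed dimension with length 1.

```python
import numpy

a = numpy.array([[0, 1, 2, 3],
                 [4, 5, 6, 7]])

per_column = a.sum(axis=0)   # collapses the 2 rows: one value per column
per_row = a.sum(axis=1)      # collapses the 4 columns: one value per row
print(per_column, per_column.shape)
print(per_row, per_row.shape)

# keepdims=True keeps the reduced axis with length 1
row_means = a.mean(axis=1, keepdims=True)
print(row_means.shape)
```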

### Array operation example

In [None]:
# argsort returns the indices that would sort the array, not the sorted array
a = [2, 5, 3]

idx = numpy.argsort(a)
idx

In [None]:
# numpy.take picks the elements at the given indices, here giving a new sorted array
numpy.take(a, idx)
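
The indices returned by ``argsort`` are mostly useful for reordering a second array in the same way. A minimal sketch, where ``labels`` is a made-up array that goes with ``values``:

```python
import numpy

values = numpy.array([2, 5, 3])
labels = numpy.array(['b', 'e', 'c'])  # hypothetical labels matching values

order = numpy.argsort(values)
sorted_values = values[order]      # same result as numpy.take(values, order)
sorted_labels = labels[order]      # reorder a second array the same way
descending = values[order[::-1]]   # reverse the indices for descending order

print(order, sorted_values, sorted_labels, descending)
```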

## Array methods

Some ``numpy`` functions are also available as array methods:

In [None]:
a = numpy.array([[7, 6, 5, 4],
                 [3, 2, 1, 0]])

In [None]:
# Return a value computed from the array
a.min(), a.max(), a.sum()

In [None]:
# In-place operation: sorts along the last axis (here, each row)
a.sort()
a

### More on array methods

In [None]:
a = numpy.array([(0, 1), (2, 3)])
a

In [None]:
b = a.copy()
c = numpy.copy(a)
# or even
d = numpy.array(a, copy=True)

In [None]:
a.transpose()

In [None]:
numpy.transpose(a)

## Array attributes

### ``dtype`` attribute

Identifies the type of the elements of the array

In [None]:
a = numpy.array([[3, 2], [8, 12]])
a.dtype

In [None]:
a.dtype.str

### ``shape`` attribute

Tuple containing the array dimensions

In [None]:
a = numpy.array([1, 2, 3, 4])
a.shape

In [None]:
# It is a read/write attribute.
a.shape = (2, 2)
a

### More array attributes

* ``ndim``: Number of dimensions
* ``size``: Total number of elements
* ``itemsize``: Size in bytes of a single item
* ``strides``: Bytes to step in each dimension
* ``flags``: Memory layout information, such as contiguity of the data in the buffer
* ``nbytes``: Size in bytes occupied in memory
* ``data``: Read/write buffer containing the data

In [None]:
a.ndim
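
These attributes are related to each other, as the following sketch on a 2x3 array of 64-bit floats shows. Transposing swaps the strides without moving any data.

```python
import numpy

a = numpy.zeros((2, 3), dtype=numpy.float64)

print(a.ndim)       # number of dimensions
print(a.size)       # 2 * 3 elements
print(a.itemsize)   # bytes per float64 element
print(a.nbytes)     # size * itemsize
print(a.strides)    # bytes to step to the next row, next column
print(a.flags['C_CONTIGUOUS'])

# The transpose has the same data with swapped strides
t = a.T
print(t.strides, t.flags['C_CONTIGUOUS'])
```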

## Plotting numpy arrays with matplotlib

``matplotlib`` is a versatile 1D and 2D plotting Python library producing publication quality figures.

``matplotlib.pyplot`` provides MATLAB-like functions, such as ``plot`` and ``imshow``.

In notebooks, this is enabled through the ``%pylab`` magic:

In [None]:
%pylab inline
# Or: %pylab nbagg

In [None]:
x = numpy.linspace(0., 2 * numpy.pi, 100)
cos_x = numpy.cos(x)

plot(cos_x)
#plot(x, cos_x, x, numpy.sin(x))

In [None]:
plot?

In [None]:
image = numpy.arange(5000)
image.shape = 100, 50

imshow(image)
colorbar() 

In [None]:
imshow?

## Exercise 3

0. Load the `anscombe.txt` file as a numpy.array.
0. Check the number of dimensions and the size of each dimension.
0. Compute the mean and standard deviation of each column (i.e., reduce along ``axis=0``).

In [None]:
# TODO

## Exercise 3

Data taken from Anscombe's quartet: https://en.wikipedia.org/wiki/Anscombe%27s_quartet

In [None]:
# Display the curves
data = numpy.loadtxt('anscombe.txt')

curves = numpy.transpose(data)
for curve in curves:
    plot(curve)

## Indexing

Select elements as with any other Python sequence:
* Indexing starts at 0 for each array dimension
* Indexes can be negative: `x[-1]` is the same as `x[len(x) - 1]`

Slicing returns a view that refers to the data of the original array; this view is usually not contiguous in memory.

In [None]:
a = numpy.array([0, 1, 2, 3])
print('a[0] =', a[0])
print('a[-1] =', a[-1])

In [None]:
a = numpy.array([(1, 2, 3, 4),
                 (5, 6, 7, 8),
                 (9, 10, 11, 12)])

In [None]:
a[1, 2]

In [None]:
a[0:2, 2]

In [None]:
a[2] # all the elements of the third row

In [None]:
a[2, :] # same as previous assuming a has at least two dimensions

In [None]:
a[0, -1]  # last element of the first row

In [None]:
a[0:2, 0:4:2]  # slicing allowed

In [None]:
a

In [None]:
a[0:2, :2] = 5  # assignment is also possible
a

### More indexing

In [None]:
a = numpy.arange(10., 18.)
a

In [None]:
# The index can be a list or an array of integers
a[[0, 3, 5]]

In [None]:
# The index can be a boolean array (a mask)
mask = a > 13
print('a > 13 =', mask)
a[mask]
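
Two points about this kind of indexing are worth a short sketch: masks can be combined with ``&`` and ``|`` (with parentheses around each condition), and indexing with a list or a mask returns a copy of the selected elements rather than a reference to the original array.

```python
import numpy

a = numpy.arange(10., 18.)

# Combine conditions with & (and) or | (or); the parentheses are required
selected = a[(a > 11) & (a < 15)]
print(selected)

# Indexing with a list returns a copy: changing it leaves a untouched
picked = a[[0, 3, 5]]
picked[0] = -1.
print(picked)
print(a[0])
```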

## Exercise 4

* Calculate the element-wise difference between ``x`` and ``y``.
* Provide an expression to calculate the difference ``x[i+1]-x[i]`` for all the elements of the 1D array ``x``.

In [None]:
x = numpy.arange(10)
y = numpy.arange(1, 11)
print('x =', x)
print('y =', y)

In [None]:
# TODO

In [None]:
import exercicesolution
exercicesolution.show("ex3_1")

In [None]:
exercicesolution.show("ex3_2")

## Array views

A view is a new array object that points to the same data buffer as the original array.

In [None]:
a = numpy.arange(10.)
a.shape = (2, 5)
c = a.transpose()
print('a =', a)
print('c =', c)
a[1, 2]

In [None]:
c[2, 1] = 10

a[1, 2]
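
Whether two arrays point to the same buffer can be checked with ``numpy.shares_memory``. The sketch below compares two views (a slice and a transpose) with an explicit copy.

```python
import numpy

a = numpy.arange(10.).reshape(2, 5)
row = a[1]              # basic slicing gives a view
t = a.T                 # so does transpose
independent = a.copy()  # a copy owns its own buffer

print(numpy.shares_memory(a, row))
print(numpy.shares_memory(a, t))
print(numpy.shares_memory(a, independent))

# Writing through a view changes the original and every other view
row[0] = -1.
independent[0, 0] = 99.
print(a[1, 0], t[0, 1], a[0, 0])
```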

### More on array views

In [None]:
a = numpy.array([[0, 1, 2], [3, 4, 5]])

b = a.reshape(-1)  # -1 lets numpy infer the dimension from the array size

a[0, 0]

In [None]:
b[0] = 1000

a[0, 0]
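
``reshape`` returns a view only when the memory layout allows it; otherwise it silently returns a copy. The sketch below shows one case of each, using a transposed array to force the copy.

```python
import numpy

a = numpy.array([[0, 1, 2], [3, 4, 5]])

flat_view = a.reshape(-1)     # contiguous data: no copy needed
flat_copy = a.T.reshape(-1)   # transposed data: reshape has to copy

print(numpy.shares_memory(a, flat_view))
print(numpy.shares_memory(a, flat_copy))
print(flat_copy)
```

When it matters, ``numpy.shares_memory`` is a simple way to find out which case applies.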

## Exercise 5: perform a 2x2 binning of an image

0. Generate a 1D array with 100 elements and perform a binning such that:

   ``1 2 3 4`` -> ``1+2`` ``3+4``
0. Generate a 100x100 array with elements in increasing order
0. Perform a 2x2 binning

| 1  | 2  | 3  | 4  |
|----|----|----|----|
| 5  | 6  | 7  | 8  |
| 9  | 10 | 11 | 12 |
| 13 | 14 | 15 | 16 |

2x2 binning:

| 1+2+5+6    | 3+4+7+8     |
|------------|-------------|
| 9+10+13+14 | 11+12+15+16 |

Bonus: Set all elements of the resulting array that are below 1000 to 0.

In [None]:
# TODO

In [None]:
import exercicesolution
exercicesolution.show("ex4_1")

In [None]:
exercicesolution.show("ex4_2")

In [None]:
exercicesolution.show("ex4_2_alt")

# numpy modules

Documentation: https://docs.scipy.org/doc/numpy/reference/

## Linear algebra: ``numpy.linalg``

As usual, dir() and help() are your friends...
Some functions:

* ``numpy.linalg.det(x)``: determinant of x
* ``numpy.linalg.eig(x)``: eigenvalues and eigenvectors of x
* ``numpy.linalg.eigh(x)``: same, taking advantage of x being a Hermitian matrix
* ``numpy.linalg.inv(x)``: inverse matrix of x
* ``numpy.linalg.svd(x)``: singular value decomposition of x
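
The functions listed above can be tried on a small symmetric matrix. This is a minimal sketch; the matrix ``m`` is an arbitrary example.

```python
import numpy

m = numpy.array([[2., 1.],
                 [1., 2.]])

det = numpy.linalg.det(m)      # 2*2 - 1*1
inverse = numpy.linalg.inv(m)

# m is symmetric, so eigh applies; eigenvalues come in ascending order
eigenvalues, eigenvectors = numpy.linalg.eigh(m)

# Singular value decomposition: m == u @ diag(s) @ vt
u, s, vt = numpy.linalg.svd(m)

print(det)
print(eigenvalues)
print(s)
print(numpy.allclose(m @ inverse, numpy.identity(2)))
```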

## Random sampling: ``numpy.random``

### Simple random data

In [None]:
import numpy.random

In [None]:
# random integers in the half-open interval [low, high)
numpy.random.randint(low=0, high=5, size=10)

In [None]:
# random floats in the half-open interval [0.0, 1.0)
numpy.random.random(10)

In [None]:
numpy.random.bytes(10)

### Permutations

In [None]:
a = numpy.arange(1, 10)
a

In [None]:
# In-place
numpy.random.shuffle(a)
a

In [None]:
# Out-of-place
numpy.random.permutation(a)
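
Random results change from run to run unless the generator is seeded. The sketch below uses ``numpy.random.seed``, which is not shown above, to get reproducible numbers, and checks that ``permutation`` leaves its input untouched.

```python
import numpy

numpy.random.seed(0)              # fix the seed to make runs reproducible
first = numpy.random.random(3)
numpy.random.seed(0)
second = numpy.random.random(3)   # same seed, same numbers
print(numpy.array_equal(first, second))

a = numpy.arange(1, 10)
shuffled = numpy.random.permutation(a)   # returns a shuffled copy
print(a)          # still in the original order
print(shuffled)
```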

### Distributions

``normal``, ``poisson``, ...

In [None]:
data = numpy.random.normal(loc=1., scale=1., size=100000)

In [None]:
%pylab inline

histo, bin_edges = numpy.histogram(data, bins=100)
bin_centers = (bin_edges[:-1] + bin_edges[1:]) / 2.
plot(bin_centers, histo)
# Or: hist(data, bins=100)

## Discrete Fourier Transform: ``numpy.fft``

* ``numpy.fft.fft`` --> 1D FFT
* ``numpy.fft.fft2`` --> 2D FFT
* ``numpy.fft.fftn`` --> nD FFT

In [None]:
numpy.fft.fft?
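
As a small illustration of the 1D FFT, the sketch below recovers the frequency of a sine wave. The signal parameters (100 samples over one second, a 5 Hz sine) are arbitrary choices, and ``numpy.fft.fftfreq`` is used to get the frequency of each output bin.

```python
import numpy

n = 100                                   # number of samples
t = numpy.arange(n) / n                   # one second sampled at 100 Hz
signal = numpy.sin(2 * numpy.pi * 5 * t)  # 5 Hz sine wave

spectrum = numpy.fft.fft(signal)
freqs = numpy.fft.fftfreq(n, d=1. / n)    # frequency of each bin in Hz

# Look for the strongest component among the positive frequencies
half = n // 2
peak_freq = freqs[numpy.argmax(numpy.abs(spectrum[:half]))]
print(peak_freq)
```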

## Polynomials: ``numpy.polynomial``

Polynomials in numpy can be created, manipulated, and even fitted.

The module provides Polynomial, Chebyshev, Legendre, Laguerre, Hermite and HermiteE series.
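
A minimal sketch of creating, manipulating and fitting a polynomial with the ``Polynomial`` class. The coefficients are arbitrary and given from the lowest degree to the highest.

```python
import numpy
from numpy.polynomial import Polynomial

p = Polynomial([1., 2., 3.])   # 1 + 2x + 3x**2, lowest degree first
print(p(2.))                   # evaluate at x = 2

dp = p.deriv()                 # derivative: 2 + 6x
print(dp.coef)

# Fit a degree-2 polynomial to noise-free samples of p.
# convert() expresses the result in the standard (unscaled) domain.
x = numpy.linspace(-1., 1., 20)
fitted = Polynomial.fit(x, p(x), deg=2).convert()
print(fitted.coef)
```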

## Exercise 6

* Write a function ``fill_array(height, width)`` to generate an array of dimension (height, width) in which ``X[row, column] = cos(row) * sin(column)``
* Time it for height=1000, width=1000

Bonus: Do the same for ``X[row, column] = cos(row) + sin(column)``

In [None]:
def fill_array(height, width):
    return None  # TODO

%timeit fill_array(1000, 1000)

In [None]:
# inefficient fill
import exercicesolution
exercicesolution.show("ex5_inefficient_fill")
%timeit exercicesolution.ex5_inefficient_fill(1000, 1000)

In [None]:
# naive fill
exercicesolution.show("ex5_naive_fill")
%timeit exercicesolution.ex5_naive_fill(1000, 1000)

In [None]:
# clever fill
exercicesolution.show("ex5_clever_fill")
%timeit exercicesolution.ex5_clever_fill(1000, 1000)

In [None]:
# practical fill
exercicesolution.show("ex5_practical_fill")
%timeit exercicesolution.ex5_practical_fill(1000, 1000)

In [None]:
# optimized fill
exercicesolution.show("ex5_optimized_fill")
%timeit exercicesolution.ex5_optimized_fill(1000, 1000)

In [None]:
# atleast_2d fill
exercicesolution.show("ex5_atleast_2d_fill")
%timeit exercicesolution.ex5_atleast_2d_fill(1000, 1000)

Speed is a question of algorithm.

It is not just a question of language.
    
| Implementation       | Duration (seconds) |
|----------------------|--------------------|
| ex5_inefficient_fill | 5.052937           |
| ex5_naive_fill       | 0.886003           |
| ex5_clever_fill      | 0.016836           |
| ex5_practical_fill   | 0.014959           |
| ex5_optimized_fill   | 0.004497           |
| ex5_atleast_2d_fill  | 0.005262           |

Measured on an Intel(R) Xeon(R) CPU E5-1650 @ 3.50GHz.

# Resources

- Complete reference material:
  http://docs.scipy.org/doc/numpy/reference/
- numpy user guide:
  https://docs.scipy.org/doc/numpy/user/
- Many recipes for different purposes:
  http://www.scipy.org/Cookbook
- Active mailing list where you can ask your questions:
  numpy-discussion@scipy.org
- Internal data-analysis mailing list:
  data-analysis@esrf.fr

## More exercises for the brave

Thanks to Nicolas Rougier: https://github.com/rougier/numpy-100:

* Create a 5x5 matrix with values 1,2,3,4 just below the diagonal
* Create an 8x8 matrix and fill it with a checkerboard pattern
* Normalize a 5x5 random matrix
* Create a 5x5 matrix with row values ranging from 0 to 4
* Consider a random 10x2 matrix representing cartesian coordinates, convert them to polar coordinates
* Create a random vector of size 10 and replace the maximum value by 0
* Consider a random vector with shape (100,2) representing coordinates, find point by point distances
* Generate a generic 2D Gaussian-like array
* Subtract the mean of each row of a matrix
* How do I sort an array by the nth column?
* Find the nearest value from a given value in an array