# Python for (open) Neuroscience

_Lecture 1.0_ - Introduction to `numpy`

Luigi Petrucco

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/vigji/python-cimec-2024/blob/main/lectures/Lecture1.0_Numpy-intro.ipynb)

## Working with libraries (/packages/modules)

We can import external libraries using `import`.

Some external libraries need to be installed before they can be imported (Colab already provides the ones we use here). We will cover installing libraries in a future lecture.

In [None]:
# this command adds a whole bunch of functions and classes for us to use:
import numpy

# This function creates an array from a list:
numpy.array([1,2,3])

We can give aliases to the library we import for the sake of brevity:

In [None]:
import numpy as np
np.array([1,2,3])

We can also import specific functions (or classes) from a library with this syntax:

In [None]:
from numpy import array
array([1,2,3])

## The `numpy` library

NumPy is a Python library to create, manipulate, and combine arrays. 

An array is an ordered collection of items that can be accessed via an index. 

While they may seem similar to lists, arrays are much more powerful because they support a wealth of operations that cannot be performed on lists. 

## Array arithmetic

Arrays let us do entirely new operations:

In [None]:
# With lists, there is no way to do vector operations; this line raises a TypeError:
[5, 5, 5] - [1, 1, 1]

In [None]:
import numpy as np
np.array([5, 5, 5]) - np.array([1, 1, 1])
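
To make the contrast with lists concrete, here is a small sketch (the variable names are illustrative and not part of the lecture). On lists, `+` and `*` concatenate and repeat; on arrays, the same operators act element by element.

```python
import numpy as np

# Illustrative values, chosen only for this example
values_list = [5, 5, 5]
values_arr = np.array(values_list)

# On lists, + concatenates and * repeats:
list_sum = values_list + [1, 1, 1]
list_rep = values_list * 2

# On arrays, the same operators work element-wise:
arr_sum = values_arr + np.array([1, 1, 1])
arr_scaled = values_arr * 2

print(list_sum, list_rep)
print(arr_sum, arr_scaled)
```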

### A small note on credit & citations

We often forget to do so, but remember: there are scientists behind many open-source tools, and citations are the way we can reward them in academic currency!

(but buy them beer if you ever encounter them in person 🍺)

    Harris, C.R., Millman, K.J., van der Walt, S.J. et al. Array programming with NumPy. Nature 585, 357–362 (2020). DOI: 10.1038/s41586-020-2649-2

Let's import the library!

In [5]:
import numpy as np

## `np.ndarray`

Data type (a class!) representing N-dimensional arrays

Workhorse of scientific computing!

A note on language: when we say `array` we do not imply any number (`n`) of dimensions:
- <span style="color:indianred">vectors</span> will be arrays with `n=1` (i.e., 1D)
- <span style="color:indianred">matrices</span> will be arrays with `n=2` (i.e., 2D)
- <span style="color:indianred">tensors</span> will be arrays with `n>=3` (i.e., 3D or more)

Related note: there is a `np.matrix` class in `numpy`, but you **should not** use it; it is no longer recommended and it might be removed in the future!

`np.ndarray` is a powerful data storing structure:

- it uses memory very efficiently even for large data (imaging data, movies, ephys...)

- it gives powerful indexing functionalities

- it implements vector/matrix algebraic operations

### Note for the MATLAB addicts

The NumPy documentation provides a [useful cheatsheet](https://numpy.org/doc/stable/user/numpy-for-matlab-users.html) to facilitate the transition from MATLAB to NumPy arrays!

## Creating arrays

### Initialize empty arrays

#### `np.zeros()`

We can create a simple array of zeros using the `np.zeros` function, and passing it a single integer n to create a 1D vector of length n:


In [None]:
np.zeros(2)  # we pass a single integer for a 1D array

If we want a multidimensional array (e.g., a 2D matrix), we pass a tuple of numbers indicating the size along every dimension:

In [None]:
np.zeros((3,2))  # we pass the tuple (3, 2) to have a 3 x 2 matrix

#### `np.ones()`

The `np.ones` function works in the same way but creates a matrix of ones:

In [None]:
np.ones((3,2))

#### `np.full()`

We can initialize a matrix filled with an arbitrary value using `np.full`:

In [None]:
np.full((2,3), 20)

## Types of values in arrays

In numpy arrays, **all elements must be of the same type**! (This is important to make arrays efficient)

By default, `np.zeros` and `np.ones` create `np.ndarray`s of `float` values:

In [None]:
np.ones((2,3))

To specify the data type of our array, we can pass the `dtype` argument (for data type). For example, we can make it `bool`:

In [None]:
np.zeros((2,3), dtype=bool)

To change the data type of an existing array, we can use the `.astype(new_type)` method:

In [None]:
my_arr = np.full((2,3), 1.)
my_arr.astype(int)
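
Two details of `.astype()` are easy to miss, so here is a minimal sketch with made-up values: converting floats to `int` drops the decimals (it truncates toward zero, it does not round), and the method returns a new array while leaving the original untouched.

```python
import numpy as np

# Illustrative values, chosen only for this example
float_arr = np.array([1.7, -1.7, 2.0])

# Decimals are dropped (truncation toward zero), not rounded:
int_arr = float_arr.astype(int)

print(int_arr)    # a new array of integers
print(float_arr)  # the original array is unchanged
```

If rounding is what you want, one option is to call `np.round()` on the array before converting it.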

## `np.nan`

`nan` (Not a Number) is how `numpy` marks missing or undefined values, playing a role similar to `None`. Since `nan` is itself a `float`, we can put nans in an array of floats:

In [None]:
my_arr = np.ones((2,3))
my_arr[0, 0] = np.nan
my_arr

In [None]:
# this code will fail with a ValueError, as the array type is int and nan is a float:
my_arr = np.full((2,3), 1)
my_arr[0] = np.nan
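
Once an array contains nans, we usually need to find them or work around them. The sketch below (with made-up values) shows three things that are standard NumPy behavior: `nan` is never equal to itself, `np.isnan()` marks the nan entries, and nan-aware functions such as `np.nanmean()` ignore them.

```python
import numpy as np

# Illustrative data with one missing value
data = np.array([1.0, np.nan, 3.0])

# nan is never equal to anything, itself included, so == cannot find it:
print(np.nan == np.nan)

# np.isnan gives a boolean array marking the nan entries:
nan_mask = np.isnan(data)
print(nan_mask)

# Regular statistics propagate nans; the nan-aware versions ignore them:
plain_mean = np.mean(data)
clean_mean = np.nanmean(data)
print(plain_mean, clean_mean)
```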

### Memory-efficient types

We can also use some special data types from numpy to save memory:
 - `np.uint8` (numbers from 0 to 255)
 - `np.int8` (numbers from -128 to 127)
 - `np.uint16` (numbers from 0 to 65535)
 - `np.int16` (numbers from -32768 to 32767)
 
The number (8 or 16) represents the number of **bits** used for every entry in the array!

In [None]:
a_python_int_array = np.ones((200, 300), dtype=int)

# With this code we can ask for the size of the array in RAM (in bytes), using the built-in sys library:
import sys
sys.getsizeof(a_python_int_array) 

In [None]:
a_uint8_array = np.ones((200, 300), dtype=np.uint8)

sys.getsizeof(a_uint8_array)  # the np.uint8 type is much more efficient!
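
As an alternative sketch, arrays also report their own memory footprint through the `.itemsize` (bytes per entry) and `.nbytes` (bytes for all the data) attributes. The example uses `np.int64` explicitly, an assumption made so that the numbers are the same on every platform. It also shows the price of small types: values that exceed the range wrap around silently.

```python
import numpy as np

big = np.ones((200, 300), dtype=np.int64)    # 8 bytes per entry
small = np.ones((200, 300), dtype=np.uint8)  # 1 byte per entry

print(big.itemsize, big.nbytes)
print(small.itemsize, small.nbytes)

# Values outside the 0-255 range of uint8 wrap around without any error:
pixels = np.array([250, 251], dtype=np.uint8)
wrapped = pixels + np.array([10, 10], dtype=np.uint8)
print(wrapped)
```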

We can also make arrays of text! 

In [None]:
txt = np.full((300,2000), "some text")

txt  # look at the dtype! the number you read is the number of characters per entry

## Useful attributes of `np.ndarray` objects

Being objects, `np.ndarray`s have **attributes** (or, more precisely, properties) that are useful for checking how they are structured!

### `.shape` and `.ndim`

The `.ndim` attribute gives us the number of dimensions of the array:

In [None]:
my_array = np.zeros((4,3, 5, 10))

my_array.ndim

The `.shape` attribute gives us the shape (the number of elements along each dimension of the array):

In [27]:
my_array = np.ones((4, 3, 3, 10))

In [None]:
my_array.shape

### `.dtype`

The `.dtype` attribute gives the type of the elements in the array:

In [None]:
my_array = np.full((4,3,5), 10.)
my_array.dtype

### `.size`

Do not confuse `.shape` with `.size`! `.size` gives the total number of elements in the whole array:

In [None]:
my_array = np.full((4,3), 3)
my_array.size

Note that an array's size is the product of its shape's elements (i.e., a (4,3) array contains 12 elements because 4*3 = 12).

In [None]:
my_array.size == (my_array.shape[0] * my_array.shape[1])
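
The same relation holds for any number of dimensions. As a small sketch (the array name is illustrative), `np.prod()` multiplies all the entries of the shape, while `len()` only reports the length of the first dimension:

```python
import numpy as np

# Illustrative 4D array
stack = np.zeros((4, 3, 5, 10))

# Multiply all the elements of the shape, whatever the number of dimensions:
n_from_shape = int(np.prod(stack.shape))

print(stack.size, n_from_shape)
print(len(stack))  # len() only gives the size of the first dimension
```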

### Convert lists to arrays

One way of creating an array is to convert an existing list into an array with the `np.array()` function:

In [None]:
my_list = [1,2,3,4]
np.array(my_list)

Converting **lists of lists** will add more dimensions:

In [None]:
my_list = [[1,2,3,4], [2,3,4,5]]
np.array(my_list)

If you do so, make sure that all lists have the same length!

#### `np.arange()`

We can create ordered sequences of numbers using `np.arange()`:

In [None]:
np.arange(10)  # numbers from 0 to 9

We can optionally specify start, stop, and step of the sequence (start and step are optional, and stop is excluded; the logic is the same as for list slicing):

In [None]:
np.arange(1, 10, 3)  # numbers from 1 up to (but excluding) 10, in steps of 3

#### `np.linspace()`

Alternatively, we can use `np.linspace()` to generate n=`num` equally spaced numbers in a specified range:

In [None]:
# 5 equispaced values between 0 and 10, both included:
np.linspace(0, 10, 5)  # aka np.linspace(0, 10, num=5)
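
The difference between the two functions is easiest to see side by side. In this minimal sketch (values chosen only for illustration), `np.arange()` is told the step and excludes the stop value, while `np.linspace()` is told the number of points and includes the stop value by default:

```python
import numpy as np

# We fix the step; the stop value (10) is excluded:
by_step = np.arange(0, 10, 2.5)

# We fix the number of points; the stop value (10) is included:
by_count = np.linspace(0, 10, 5)

print(by_step)
print(by_count)
```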

### Random arrays

We can use the `np.random` submodule to create random arrays. For example:

In [None]:
np.random.randint(0, 10, (3,2))  # we pass min (included), max (excluded), and desired shape of the random array
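
NumPy also offers a newer interface for random numbers, based on a generator object created with `np.random.default_rng()`. The sketch below is not what the lecture uses; it is shown here as one way to get reproducible random arrays by fixing a seed (the seed value 42 is arbitrary).

```python
import numpy as np

# Create a generator with a fixed (arbitrary) seed for reproducibility:
rng = np.random.default_rng(seed=42)

random_ints = rng.integers(0, 10, size=(3, 2))       # low included, high excluded
random_floats = rng.normal(loc=0, scale=10, size=4)  # mean 0, std 10

# A second generator with the same seed produces the same numbers:
rng_again = np.random.default_rng(seed=42)
same_ints = rng_again.integers(0, 10, size=(3, 2))

print(random_ints)
print(np.array_equal(random_ints, same_ints))
```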

(Practicals 1.0.0)

## Indexing arrays

We have three ways of indexing arrays:
 - **slicing**: as for lists (specifying single values, or start/end/steps)
 - **integer indexing**: specifying with lists/arrays of indexes which elements to keep
 - **boolean indexing**: using `True`/`False` lists/arrays to specify which elements to keep

### Slicing

We can index arrays as we were doing with lists (this operation is called array **slicing**). For a 1D array:

In [None]:
my_vect = np.arange(0, 10)
print(my_vect[:5])  # first 5
print(my_vect[-3:])  # last 3
print(my_vect[:6:2])  # first 6, one every two

print(my_vect[:])  # this is a "null" indexing that returns all values

But! With `np.ndarray`s we have more flexibility than with lists!

### Indexing with arrays (or lists) of integer indexes

We can index passing an array (or a list) of the index values that we want to retrieve!

In [None]:
my_vect = np.random.normal(0, 10, 4)  # 4 values from a normal distribution of mean 0, std 10
print(my_vect)

In [None]:
index_list = [0, 2]  # to be indexes, those values have to be integers!

my_vect[index_list]  # we pass the index list (or array) in square brackets

In [None]:
index_list = [0.0, 2.0]  # won't work with floats

my_vect[index_list]
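
If the indexes come out of a computation as floats, one possible fix is to convert them to integers with `.astype(int)` before indexing. A minimal sketch with made-up values:

```python
import numpy as np

# Illustrative values, chosen only for this example
my_vect = np.array([10.0, 20.0, 30.0, 40.0])
float_indexes = np.array([0.0, 2.0])

# Convert to integers first, then index:
int_indexes = float_indexes.astype(int)
selected = my_vect[int_indexes]

print(selected)
```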

### Boolean indexing

Alternatively, we can use arrays of boolean values.

The boolean indexing vector must have the same shape as the array to be indexed (or, when indexing a single axis, the same length as that axis)!

In [None]:
my_vect = np.array([1,2,3,4, 5])

boolean_selector = [True, False, True, False, False]

my_vect[boolean_selector]

Boolean indexing is a powerful way of filtering arrays based on some criterion:

In [None]:
thr = 3
boolean_selector = my_vect > thr  # this operation returns a boolean array with the element-wise results
print(boolean_selector)

In [None]:
my_vect[boolean_selector]
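
Several criteria can be combined into one boolean selector. This sketch uses the element-wise operators `&` (and), `|` (or), and `~` (not); note the parentheses around each comparison, which are required because these operators bind more tightly than `>` and `<`.

```python
import numpy as np

my_vect = np.array([1, 2, 3, 4, 5])

# Element-wise "and": values strictly between 1 and 5
in_range = (my_vect > 1) & (my_vect < 5)

# Element-wise "or": values at the two extremes
outside = (my_vect <= 1) | (my_vect >= 5)

print(my_vect[in_range])
print(my_vect[outside])
print(my_vect[~in_range])  # ~ inverts a boolean array
print(in_range.sum())      # True counts as 1: number of selected elements
```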

### Multidimensional indexing

Many times, we want to index independently one or many axes of an n-dimensional array.

We can index over multiple dimensions specifying **comma-separated** indexes along each dimension:

In [None]:
my_mat = np.array([[1,   2,  3,  4, 5],
                   [6,   7,  8,  9, 10],
                   [11, 12, 13, 14, 15]])

# this indexes the first dimension (select the 3rd row) and leaves all elements over the second (:):
print(my_mat[2, :])

# this indexes the second dimension (select the 1st column) and leaves all elements over the rows (:):
print(my_mat[:, 0])

In [None]:
# this takes one every 2 columns, for the first 2 rows:
my_mat[:2, ::2]

If we specify a single index, `numpy` assumes you're indexing the first dimension:


In [None]:
my_mat[2]  # the same as writing my_mat[2, :], but discouraged!

### Combining indexing

We can use any combination of indexes and boolean selectors for each axis, as long as the dimension matches:

In [None]:
my_mat = np.array([[1,   2,  3,  4, 5],
                   [6,   7,  8,  9, 10],
                   [11, 12, 13, 14, 15]])

# Boolean indexing over one dimension
boolean_selector = np.array([True, False, False])
my_mat[boolean_selector, :]

In [None]:
# Boolean selector over the first axis, integer indexing over the second axis
# (the two selectors are broadcast together; this works here because a single row is selected):
my_mat[boolean_selector, [1,2, 4]]
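
When a boolean selector and a list of integers are used together, NumPy pairs them up element by element instead of taking every selected row at every selected column. The sketch below (with a hypothetical `row_selector` that picks two rows) shows the resulting error and two possible ways of extracting the full block: indexing one axis at a time, or using `np.ix_()`.

```python
import numpy as np

my_mat = np.array([[1,   2,  3,  4, 5],
                   [6,   7,  8,  9, 10],
                   [11, 12, 13, 14, 15]])

row_selector = np.array([True, False, True])  # two rows selected
col_indexes = [1, 2, 4]                       # three columns selected

# Passing both at once fails: 2 rows cannot be paired with 3 columns
try:
    my_mat[row_selector, col_indexes]
    paired_ok = True
except IndexError:
    paired_ok = False

# Option 1: index one axis at a time
block_two_steps = my_mat[row_selector, :][:, col_indexes]

# Option 2: np.ix_ builds indexes selecting every row/column combination
block_ix = my_mat[np.ix_(row_selector, col_indexes)]

print(block_two_steps)
print(block_ix)
```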

## Array views

Slicing operations return **views** on the original arrays, NOT COPIES! Changing values in the slice will also alter the original array! (Integer and boolean indexing, instead, return copies.)

In [None]:
# import numpy as np
my_mat = np.array([[1,   2,  3,  4, 5],
                   [6,   7,  8,  9, 10],
                   [11, 12, 13, 14, 15]])

a_slice = my_mat[2, :]  # this is a view of the original data, not a copy!
print(a_slice)

a_slice[0] = 2000  # this also changes my_mat, unless we take the slice with .copy()


In [None]:
my_mat
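
To check whether two arrays point to the same data, one option is `np.shares_memory()`. The sketch below (variable names are illustrative) compares a slice, which is a view, with the result of indexing with a list of integers, which is a copy:

```python
import numpy as np

my_mat = np.arange(1, 16).reshape(3, 5)  # the values 1..15 arranged in 3 rows

row_view = my_mat[2, :]    # slicing: a view on my_mat
row_copy = my_mat[[2], :]  # indexing with a list of integers: a copy

print(np.shares_memory(my_mat, row_view))
print(np.shares_memory(my_mat, row_copy))

row_copy[0, 0] = -1  # my_mat is not affected
row_view[0] = 2000   # my_mat is changed too

print(my_mat)
```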

### Mind singleton dimensions!

Arrays can have **singleton dimensions** - _i.e._ dimensions along which there is a single entry.
In NumPy, an array of shape `(4,)` and an array of shape `(4,1)` are different even if they contain the same values!

In [None]:
my_arr = np.zeros(4)  # this is a 1D array

print(f"{my_arr}; shape: {my_arr.shape}")

In [None]:
my_arr_1 = np.zeros((4, 1))  # this is a 2D array with a singleton dimension!
print(f"{my_arr_1}; shape: {my_arr_1.shape}")

Sometimes, it can be useful to quickly add a singleton dimension:

In [None]:
my_arr = np.zeros(4)  # 1D array

# With this special indexing, we artificially add a singleton dimension on first dim:
my_arr = my_arr[np.newaxis, :]  

print(f"{my_arr}; shape: {my_arr.shape}")
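
The position of `np.newaxis` decides where the singleton dimension goes. This short sketch also shows two related tools, assuming they fit your use case: `.reshape(-1, 1)` as another way to get a column, and `np.squeeze()` to remove singleton dimensions again.

```python
import numpy as np

my_arr = np.zeros(4)  # 1D array, shape (4,)

row = my_arr[np.newaxis, :]  # singleton on the first dimension: shape (1, 4)
col = my_arr[:, np.newaxis]  # singleton on the second dimension: shape (4, 1)

# reshape can do the same; -1 means "work out this size for me":
col_reshaped = my_arr.reshape(-1, 1)

# np.squeeze removes all singleton dimensions:
flat_again = np.squeeze(col)

print(row.shape, col.shape, col_reshaped.shape, flat_again.shape)
```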

## Visualize arrays and matrices

We can visualize arrays and matrices using the `matplotlib.pyplot` library.

In [62]:
from matplotlib import pyplot as plt  # code you'll write many times in Python...
import numpy as np

We can plot 1D arrays (or lists!) using `plt.plot()`:

In [None]:
random_vect = np.random.randn(100)  # array of normally distributed random values, shape (100,)
plt.plot(random_vect)

We can also plot multiple arrays together:

In [None]:
# each line will be a column (an element over the second dimension); in this case we will have 2 lines:
random_vect = np.random.randn(100, 2)  # normal random vals array of shape (100, 2)
plt.plot(random_vect)

We can visualize 2D matrices with `plt.matshow()`

In [None]:
random_mat = np.random.randint(0, 255, (100, 200))  # (100, 200) shape array of random integers from 0 to 254 (the upper bound, 255, is excluded)

plt.matshow(random_mat)  # plt.imshow() works in a similar way
plt.colorbar()

This can be convenient to check quickly the results of slicing:

In [None]:
plt.matshow(random_mat[:3, ::10])

There's much more to the matplotlib library; we'll discover more things as we go!

(Practicals 1.0.1)

In [None]:
a = np.concatenate([np.full((1,4), i) for i in range(5)], axis=0)
print(a)

In [None]:
b = np.ones(5)
print("shape a: ", a.shape)
print("shape b: ", b.shape)

# this will not work, as the rightmost dimensions (4 for a, 5 for b) do not match:
a + b 

To make it work, we can use a trick: with the syntax `[:, np.newaxis]`, we add a new "dummy" singleton dimension to `b`, which will then be broadcast along the second dimension of `a`:

In [None]:
b_twodim = b[:, np.newaxis]  # This does the trick by adding a dummy singleton dimension
print("shape a: ", a.shape)
print("shape b_twodim: ", b_twodim.shape)
a + b_twodim  # now the last dimension is compatible between the two arrays:
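
A common use of this trick is subtracting from each row its own mean. The sketch below uses made-up data and the `keepdims=True` argument of `.mean()`, which keeps the singleton dimension so that no `np.newaxis` is needed. Subtracting column means needs no extra step, because a `(4,)` array already lines up with the last dimension of a `(5, 4)` array.

```python
import numpy as np

# Illustrative data: 5 rows, 4 columns
a = np.arange(20, dtype=float).reshape(5, 4)

# One mean per row, kept as a (5, 1) array thanks to keepdims=True:
row_means = a.mean(axis=1, keepdims=True)
centered_rows = a - row_means  # (5, 4) - (5, 1) broadcasts over columns

# One mean per column, shape (4,): matches the last dimension of a
col_means = a.mean(axis=0)
centered_cols = a - col_means  # (5, 4) - (4,) broadcasts over rows

print(row_means.shape, col_means.shape)
print(centered_rows)
```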

Practicals 1.0.1