# Data Structures: `NumPy`

[docs](https://numpy.org/doc/stable/reference/index.html)  
[absolute basics for beginners](https://numpy.org/doc/stable/user/absolute_beginners.html)  
[quickstart](https://numpy.org/doc/stable/user/quickstart.html)  
[w<sup>3</sup>](https://www.w3schools.com/python/numpy/default.asp), [RealPython](https://realpython.com/numpy-tutorial/)

What is `NumPy`? It is a library built around **arrays**: multi-dimensional containers of elements of a single type, together with a large set of functions that operate on them. It is useful for numerical work in general, and also for speed: operations on whole arrays run in optimised, compiled code and are often much faster than native Python loops!

In [1]:
# the convention is to give the alias `np`
import numpy as np
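
To make the "arrays instead of loops" idea concrete, here is a minimal sketch that computes a sum of squares both ways. The names `values`, `array`, `loop_total` and `vectorised_total` are only illustrative. Timings are left out because they depend on the machine; the standard `timeit` module is one way to compare the two versions yourself.

```python
import numpy as np

values = list(range(1_000))      # plain Python list
array = np.arange(1_000)         # the same numbers as a NumPy array

# native Python: an explicit loop (here a generator expression)
loop_total = sum(v * v for v in values)

# NumPy: one expression over the whole array, the loop runs in compiled code
vectorised_total = int((array * array).sum())

print(loop_total == vectorised_total)
```

Both versions give the same number; only the second avoids a Python-level loop.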

## Array creation

[absolute basics: creation](https://numpy.org/doc/stable/user/absolute_beginners.html#how-to-create-a-basic-array)  
[quickstart: creation](https://numpy.org/doc/stable/user/quickstart.html#array-creation)

[`np.array` doc](https://numpy.org/doc/stable/reference/generated/numpy.array.html)  
[`np.ones` doc](https://numpy.org/doc/stable/reference/generated/numpy.ones.html)  
[`np.zeros` doc](https://numpy.org/doc/stable/reference/generated/numpy.zeros.html)  
[`np.arange` doc](https://numpy.org/doc/stable/reference/generated/numpy.arange.html)  
[`np.linspace` doc](https://numpy.org/doc/stable/reference/generated/numpy.linspace.html)  

In [None]:
python_list_of_lists = [[1,2,3],[4,5,6],[7,8,9]]

# turn a Python list into a numpy array
np_list_of_lists = np.array(python_list_of_lists)

# # same as:
# np_list_of_lists = np.array([[1,2,3],[4,5,6],[7,8,9]])

np_list_of_lists

In [None]:
# pass the shape as a tuple
np.ones(shape=(3,3))
# # same as:
# np.ones((3,3))

In [None]:
np.zeros((2,4))

In [None]:
np_range = np.arange(24)
np_range

In [None]:
# create 20 equally spaced numbers between -10 and 10
np.linspace(-10, 10, 20)
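
`np.arange` and `np.linspace` are easy to mix up, so here is a small side-by-side sketch (the variable names are illustrative): `arange` takes a step and excludes the stop value, while `linspace` takes a number of points and, by default, includes both endpoints.

```python
import numpy as np

# arange: fixed step, the stop value is excluded
stepped = np.arange(0, 10, 2)

# linspace: fixed number of points, both endpoints included by default
spaced = np.linspace(0, 10, 6)

print(stepped)
print(spaced)
```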

## Array Shapes

[absolute basics: attributes](https://numpy.org/doc/stable/user/absolute_beginners.html#array-attributes)  
[quickstart: printing](https://numpy.org/doc/stable/user/quickstart.html#printing-arrays)

In [None]:
np_list_of_lists

In [None]:
# the shape is 3x3 (3 rows, 3 columns)
np_list_of_lists.shape
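
Besides `.shape`, a few related attributes are often handy. The sketch below rebuilds the same 3x3 array under the illustrative name `grid` so that it runs on its own.

```python
import numpy as np

grid = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

print(grid.shape)   # (rows, columns)
print(grid.ndim)    # number of axes
print(grid.size)    # total number of elements
print(grid.dtype)   # element type (the default integer type depends on the platform)
```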

## Reshaping

[absolute basics: reshape](https://numpy.org/doc/stable/user/absolute_beginners.html#can-you-reshape-an-array)  
[quickstart: changing the shape](https://numpy.org/doc/stable/user/quickstart.html#changing-the-shape-of-an-array)

[`np.reshape` doc](https://numpy.org/doc/stable/reference/generated/numpy.reshape.html)

In [None]:
np_range

In [None]:
np_range.shape

In [None]:
# pass a tuple describing the shape (or separate arguments)
# NOTE: it must be compatible with the number of elements! (2 x 12 = 24)
np_range.reshape((2,12))

# # same as
# np_range.reshape(2,12)

In [None]:
np_range.reshape((2,3,4))

In [None]:
np_range.reshape((4,3,2))
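
Two details of `reshape` are worth a quick sketch: one axis can be given as `-1`, in which case NumPy infers it from the total number of elements, and an incompatible shape raises a `ValueError`. The names `numbers` and `inferred` are illustrative.

```python
import numpy as np

numbers = np.arange(24)

# -1 asks NumPy to infer that axis from the total size: 24 / 6 = 4
inferred = numbers.reshape((6, -1))
print(inferred.shape)

# an incompatible shape raises a ValueError (5 x 5 = 25, not 24)
try:
    numbers.reshape((5, 5))
except ValueError as error:
    print("cannot reshape:", error)
```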

## Indexing & Slicing

[absolute basics: indexing and slicing](https://numpy.org/doc/stable/user/absolute_beginners.html#indexing-and-slicing)  
[quickstart: indexing, slicing and iterating](https://numpy.org/doc/stable/user/quickstart.html#indexing-slicing-and-iterating)

In [None]:
np_list_of_lists

Selecting the first row works the same way in Python and NumPy. Selecting the first **column**, however, is clunky in plain Python, where it requires a loop:

In [None]:
# row ok
print(python_list_of_lists[0])
print(np_list_of_lists[0])

In [None]:
# column uh oh
print([row[0] for row in python_list_of_lists])

# yay indexing!
# - ':' in the first dimension means all rows
# - '0' in the second dimension means the first element of each row (the first column)
print(np_list_of_lists[:, 0])

In [None]:
np_list_of_lists

In [None]:
# in each axis, slicing works exactly like in regular Python
print(np_list_of_lists[1:3, 1])

The `[]` syntax for selecting elements is the same as in Python, extended to take one index or slice per axis:

```python
      ← outer axes              inner axes →
array[first axis, second axis, ... last axis]
```

In [None]:
# the use of `...` means: take everything in all the remaining axes
print(np_list_of_lists[-1:, ...])
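
The `...` shorthand becomes more useful with more than two axes. A minimal sketch on a 3-dimensional array (the name `cube` is illustrative):

```python
import numpy as np

cube = np.arange(24).reshape((2, 3, 4))

# `...` stands for as many full slices `:` as needed
first_of_last_axis = cube[..., 0]    # same as cube[:, :, 0]
first_block = cube[0, ...]           # same as cube[0, :, :]

print(first_of_last_axis.shape)
print(first_block.shape)
```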

## Methods

[absolute basics: more useful operations](https://numpy.org/doc/stable/user/absolute_beginners.html#more-useful-array-operations)  
[quickstart: basic operations](https://numpy.org/doc/stable/user/quickstart.html#basic-operations)

Many, many operations are available! Reductions can usually be called either as a method (`array.max()`) or as a function (`np.max(array)`); element-wise maths is only available as functions (`np.exp(array)`).

Examples:
- `.min() / .max()`
- `.sum() / .mean() / .std()`
- `np.exp() / np.log() / np.sqrt() / etc.`

In [None]:
np_list_of_lists

In [None]:
np_list_of_lists.max()

In [None]:
# max of each column (`axis=0`); try `axis=1` for the max of each row
np_list_of_lists.max(axis=0)

In [None]:
# mean of all elements; `.mean()` also accepts an `axis` argument, like most of these methods
np_list_of_lists.mean()
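
The `axis` argument is easiest to read as "the axis that gets collapsed". A short sketch with `.mean()`, again using the illustrative name `grid` for the 3x3 array, which also compares the method form with the function form:

```python
import numpy as np

grid = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

column_means = grid.mean(axis=0)   # collapse the rows: one value per column
row_means = grid.mean(axis=1)      # collapse the columns: one value per row

print(column_means)
print(row_means)

# the function form gives the same result as the method
print(np.array_equal(np.mean(grid, axis=0), column_means))
```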

## Random Numbers

[Random Sampling doc](https://numpy.org/doc/stable/reference/random/index.html#random-sampling)  
[RealPython tutorial](https://realpython.com/numpy-random-number-generator/)

In [None]:
seed = 42

# instantiate a random number generator object
rng = np.random.default_rng(seed)
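
Why pass a seed? Two generators created with the same seed produce the same sequence of numbers, which makes experiments reproducible. A minimal sketch (the names `first_rng` and `second_rng` are illustrative):

```python
import numpy as np

seed = 42

first_rng = np.random.default_rng(seed)
second_rng = np.random.default_rng(seed)

# same seed, same sequence of draws
first_draw = first_rng.integers(0, 10, size=5)
second_draw = second_rng.integers(0, 10, size=5)

print(np.array_equal(first_draw, second_draw))
```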

### Integers

[`rng.integers` doc](https://numpy.org/doc/stable/reference/random/generated/numpy.random.Generator.integers.html)  
[`np.random.randint` doc](https://numpy.org/doc/stable/reference/random/generated/numpy.random.RandomState.randint.html) (legacy)

In [None]:
# create a 2x2 matrix of integers from 0 (inclusive) to 10 (exclusive)
rng.integers(0, 10, (2,2))

# # old way:
# np.random.randint(0, 10, (2,2))
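
If the upper bound should be included, `rng.integers` accepts `endpoint=True`. The sketch below contrasts the two behaviours; the dice example is just an illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

# default: the upper bound is excluded, so values lie in 0..9
exclusive = rng.integers(0, 10, size=1_000)

# endpoint=True includes the upper bound, e.g. for dice: values lie in 1..6
dice = rng.integers(1, 6, size=1_000, endpoint=True)

print(exclusive.min(), exclusive.max())
print(dice.min(), dice.max())
```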

### Sequences

[`rng.choice` doc](https://numpy.org/doc/stable/reference/random/generated/numpy.random.Generator.choice.html)  
[`np.random.choice` doc](https://numpy.org/doc/stable/reference/random/generated/numpy.random.RandomState.choice.html) (legacy)

In [None]:
# create matrix of 2x2, numbers sampled uniformly from the given list, with replacement
rng.choice([1,2,3,4,5,6], replace=True, size=(2,2))

# # old way:
# np.random.choice([1,2,3,4,5,6], replace=True, size=(2,2))
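
For comparison, a sketch of sampling *without* replacement: each element can then be drawn at most once, and asking for more elements than the list contains raises a `ValueError`. The names `faces` and `unique_draw` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)
faces = [1, 2, 3, 4, 5, 6]

# replace=False: every element can be drawn at most once
unique_draw = rng.choice(faces, size=3, replace=False)
print(unique_draw)

# asking for more elements than available fails without replacement
try:
    rng.choice(faces, size=7, replace=False)
except ValueError as error:
    print("cannot sample:", error)
```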

[`rng.shuffle` doc](https://numpy.org/doc/stable/reference/random/generated/numpy.random.Generator.shuffle.html)  
[`np.random.shuffle` doc](https://numpy.org/doc/stable/reference/random/generated/numpy.random.RandomState.shuffle.html) (legacy)

In [None]:
# shuffle the elements in-place!
a = [1,2,3,4,5,6]
rng.shuffle(a)

# # old way:
# np.random.shuffle(a)
a
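
When the original order should be kept, `rng.permutation` is an alternative worth knowing: it returns a shuffled copy instead of modifying its argument. A minimal sketch with illustrative names:

```python
import numpy as np

rng = np.random.default_rng(42)
original = [1, 2, 3, 4, 5, 6]

# returns a new, shuffled array; `original` is left untouched
shuffled_copy = rng.permutation(original)

print(original)
print(shuffled_copy)
```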

### Floats (distributions)

[`rng.random` doc](https://numpy.org/doc/stable/reference/random/generated/numpy.random.Generator.random.html)  
[`np.random.rand` doc](https://numpy.org/doc/stable/reference/random/generated/numpy.random.RandomState.rand.html) (legacy)

In [None]:
# one number sampled from the uniform distribution in [0, 1)
rng.random()

In [None]:
# create matrix of 2x2, numbers sampled from the uniform distribution, in [0, 1)
rng.random(size=(2,2))

[`rng.uniform` doc](https://numpy.org/doc/stable/reference/random/generated/numpy.random.Generator.uniform.html)   
[`np.random.uniform` doc](https://numpy.org/doc/stable/reference/random/generated/numpy.random.RandomState.uniform.html) (legacy)

In [None]:
# create matrix of 2x2, numbers from a uniform distribution in [2, 4)
rng.uniform(2, 4, size=(2,2))

# # old way
# np.random.uniform(2, 4, size=(2,2))
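
A quick sanity check of the interval: the sketch below draws many samples and confirms that they stay within [2, 4). It also shows that the same kind of sample can be built by rescaling `rng.random()`. The names `samples` and `rescaled` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)

samples = rng.uniform(2, 4, size=1_000)
print(samples.min() >= 2, samples.max() < 4)

# equivalent construction: stretch and shift numbers from [0, 1)
rescaled = 2 + (4 - 2) * rng.random(size=1_000)
print(rescaled.min() >= 2, rescaled.max() < 4)
```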

[`rng.normal` doc](https://numpy.org/doc/stable/reference/random/generated/numpy.random.Generator.normal.html)  
[`np.random.randn` doc](https://numpy.org/doc/stable/reference/random/generated/numpy.random.RandomState.randn.html) (legacy)  
[`np.random.normal` doc](https://numpy.org/doc/stable/reference/random/generated/numpy.random.RandomState.normal.html) (legacy)

In [None]:
# one number sampled from a standard normal distribution
rng.normal()

In [None]:
# create matrix of 2x2, numbers from a gaussian of mu 3, sigma 10
# (`loc` is mu, `scale` is sigma)
rng.normal(loc=3, scale=10, size=(2,2))

# # old way, same shape, numbers from the standard normal (mu = 0, sigma = 1)
# np.random.randn(2,2)

# # old way, same shape, numbers from a normal with mu = 3, sigma = 10
# np.random.normal(loc=3, scale=10, size=(2,2))
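
To see that `loc` and `scale` really are the mean and the standard deviation, one can draw a large sample and compare, as in this rough sketch. The sample size of 100,000 is an arbitrary choice, and the printed values are only approximately 3 and 10.

```python
import numpy as np

rng = np.random.default_rng(42)

samples = rng.normal(loc=3, scale=10, size=100_000)

print(samples.mean())   # close to loc
print(samples.std())    # close to scale
```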

## Extra: Broadcasting

[doc](https://numpy.org/doc/stable/user/basics.broadcasting.html)  
[absolute basics: broadcasting](https://numpy.org/doc/stable/user/absolute_beginners.html#broadcasting)

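Broadcasting is one of the great powers of `NumPy`. It lets you combine arrays of different shapes (or an array and a single number) in one operation: the smaller operand is virtually stretched to match the larger one, again eliminating the need for loops!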

![Sasha Rush, broadcasting](pics/srush-broadcasting.png)
[Source](https://twitter.com/srush_nlp/status/1516781757596680194?t=RwVp5kUWPvHG-e42wo0ryw&s=19)

In [None]:
# fails: Python lists do not support adding a number (raises a TypeError)
python_list_of_lists + 1

In [None]:
# plain Python needs a nested loop (here a nested list comprehension)
[[x + 1 for x in row] for row in python_list_of_lists]


In [None]:
np_list_of_lists + 1

In [None]:
# it's also possible to combine arrays
np_row_vector = np.array([[5,10,15]])

print("shape:", np_row_vector.shape)
print(np_row_vector)
print()

# add the vector to every row
print("shape:", np_list_of_lists.shape)
print(np_list_of_lists + np_row_vector)

In [None]:

# `.T` transposes the vector, same as `np.transpose(np_row_vector)`
np_column_vector = np_row_vector.T
print("shape:", np_column_vector.shape)
print(np_column_vector)
print()

# add the vector to every column
print("shape:", np_list_of_lists.shape)
print(np_list_of_lists + np_column_vector)
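
The two vectors can also be combined with each other. In the sketch below, which uses the illustrative names `row`, `column` and `table`, a `(1, 3)` array and a `(3, 1)` array are both stretched along their length-1 axis, giving a `(3, 3)` result. Shapes that cannot be aligned this way raise a `ValueError`.

```python
import numpy as np

row = np.array([[5, 10, 15]])    # shape (1, 3)
column = row.T                   # shape (3, 1)

# axes of length 1 are stretched to match the other operand
table = row + column
print(table.shape)
print(table)

# shapes that cannot be aligned raise a ValueError
try:
    np.ones((3, 3)) + np.ones((2, 3))
except ValueError as error:
    print("cannot broadcast:", error)
```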
