# NumPy

NumPy is the fundamental package for scientific computing in Python. It is a Python library that provides a multidimensional array object, various derived objects (such as masked arrays and matrices), and an assortment of routines for fast operations on arrays, including mathematical, logical, shape manipulation, sorting, selecting, I/O, discrete Fourier transforms, basic linear algebra, basic statistical operations, random simulation and much more.

In this section, we’ll look at some of the main ways to use NumPy and how it can represent different types of data (tables, images, text, etc.) before we feed them to machine learning models.

As mentioned before, `numpy` is a Python package and must be imported before it can be used:

In [1]:
import numpy as np

## Creating Arrays

We can create a NumPy array by passing a Python list to `np.array()`. In this case, NumPy creates the array we can see on the right here:

![Creating array](img/create-numpy-array-1.png)

In [2]:
np.array([1, 2, 3])

array([1, 2, 3])

There are often cases when we want NumPy to initialize the values of the array for us. NumPy provides functions like `ones()`, `zeros()`, and `random.random()` for these cases. We just pass them the number of elements we want generated:

![Creating ones_zeros_random](img/create-numpy-array-ones-zeros-random.png)

In [14]:
np.random.random(3)

array([0.79908316, 0.72695767, 0.1991393 ])
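The other two initializers are called the same way. A minimal sketch (the variable names are only illustrative):

```python
import numpy as np

ones_vec = np.ones(3)             # three elements, all 1.0
zeros_vec = np.zeros(3)           # three elements, all 0.0
random_vec = np.random.random(3)  # three floats drawn uniformly from [0.0, 1.0)

print(ones_vec)        # [1. 1. 1.]
print(zeros_vec)       # [0. 0. 0.]
print(ones_vec.dtype)  # float64
```

Note that `ones()` and `zeros()` produce floating-point values by default, which is why the outputs carry a trailing dot.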

Similar to Python's `range()` function, NumPy provides an `arange()` function to generate arrays:

In [72]:
np.arange(10)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [73]:
np.arange(1, 10)

array([1, 2, 3, 4, 5, 6, 7, 8, 9])

In [74]:
np.arange(1, 10, 2)

array([1, 3, 5, 7, 9])
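One difference from `range()` is that `arange()` returns an array directly and also accepts a non-integer step. A small sketch, with illustrative variable names:

```python
import numpy as np

# Same start/stop/step semantics as range(), but the result is an array.
from_range = np.array(list(range(1, 10, 2)))
from_arange = np.arange(1, 10, 2)
print(np.array_equal(from_range, from_arange))  # True

# A fractional step works too (range() would raise a TypeError here).
quarters = np.arange(0, 1, 0.25)  # values 0.0, 0.25, 0.5, 0.75
print(quarters)
```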

Once an array is created, we can start to manipulate it in interesting ways.

## Array Arithmetic

Let’s create two NumPy arrays to showcase their usefulness. We’ll call them data and ones:

![example1](img/numpy-arrays-example-1.png)

In [15]:
data = np.array([1, 2])
ones = np.ones(2)

Adding them up position-wise (i.e. adding the values of each row) is as simple as typing data + ones:

![adding1](img/numpy-arrays-adding-1.png)

In [16]:
data + ones

array([2., 3.])

Note how an abstraction like this saves us from having to program such a calculation in loops. It’s a wonderful abstraction that allows you to think about problems at a higher level.
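To make the comparison concrete, here is a sketch of the same addition written with an explicit loop and then as a single array expression (the names `looped` and `vectorized` are only illustrative):

```python
import numpy as np

data = np.array([1, 2])
ones = np.ones(2)

# Explicit loop: add the elements one position at a time.
looped = np.empty(2)
for i in range(len(data)):
    looped[i] = data[i] + ones[i]

# Array expression: NumPy performs the position-wise addition for us.
vectorized = data + ones

print(looped)      # [2. 3.]
print(vectorized)  # [2. 3.]
```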

And it’s not only addition that we can do this way:

![subs_mult_div1](img/numpy-array-subtract-multiply-divide.png)

In [17]:
data - ones

array([0., 1.])

In [18]:
data * data

array([1, 4])

In [19]:
data / data

array([1., 1.])

## Broadcasting 

There are often cases when we need to carry out an operation between an array and a single number (known as an operation between a vector and a scalar). For example, say the `data` array represents distances in miles, and we want to convert them to kilometers (one mile is roughly 1.6 km). We simply say:

![scalar_ops](img/numpy-array-broadcast.png)

In [22]:
data * 1.6

array([1.6, 3.2])

Notice how NumPy understood that operation to mean that the multiplication should happen with each cell. That concept is called **broadcasting**, and it’s very useful.

## Indexing

NumPy arrays can be indexed and sliced in all the ways we can slice Python lists:

![index_slice](img/numpy-array-slice.png)

In [27]:
data = np.array((1, 2, 3))
data[0]

1

In [24]:
data[1]

2

In [28]:
data[0:2]

array([1, 2])

In [29]:
data[1:]

array([2, 3])
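The other list-style forms, such as negative indices and slices with a step, behave the same way on arrays. A short sketch with illustrative variable names:

```python
import numpy as np

data = np.array((1, 2, 3))

last = data[-1]             # 3, counting from the end
every_other = data[::2]     # elements at positions 0 and 2: 1 and 3
reversed_data = data[::-1]  # the elements in reverse order: 3, 2, 1

print(last, every_other, reversed_data)
```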

## Aggregation

Additional benefits NumPy gives us are aggregation functions:

![agregations1](img/numpy-array-aggregation.png)

In [30]:
data.max()

3

In [31]:
data.min()

1

In [32]:
data.sum()

6

In addition to min, max, and sum, you get all the greats like mean to get the average, prod to get the result of multiplying all the elements together, std to get standard deviation, and [plenty of others](https://jakevdp.github.io/PythonDataScienceHandbook/02.04-computation-on-arrays-aggregates.html).
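A quick sketch of those three on the same `data` vector used above:

```python
import numpy as np

data = np.array([1, 2, 3])

average = data.mean()  # 2.0
product = data.prod()  # 6, i.e. 1 * 2 * 3
spread = data.std()    # about 0.816 (population standard deviation by default)

print(average, product, spread)
```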

## Two Dimensions (2D)

All the examples shown so far deal with vectors in one dimension. A key part of the beauty of NumPy is its ability to apply these operations to any number of dimensions.

### Creating Matrices

We can pass a Python list of lists in the following shape to have NumPy create a matrix to represent it:

![array_create_2d](img/numpy-array-create-2d.png)

In [33]:
np.array([[1, 2], [3, 4]])

array([[1, 2],
       [3, 4]])

The same functions mentioned above (`ones()`, `zeros()`, and `random.random()`) can be used to create multi-dimensional arrays, as long as we give them a tuple describing the dimensions of the matrix we are creating:

![mattrix_ones_zeros](img/numpy-matrix-ones-zeros-random.png)

In [34]:
np.ones((3, 2))

array([[1., 1.],
       [1., 1.],
       [1., 1.]])

In [35]:
np.zeros((3, 2))

array([[0., 0.],
       [0., 0.],
       [0., 0.]])

In [37]:
np.random.random((3, 2))

array([[0.82630354, 0.08662493],
       [0.9852385 , 0.84994476],
       [0.49167097, 0.8544155 ]])

Iterating over multidimensional arrays is done with respect to the first axis:

In [106]:
for row in np.random.random((3, 2)):
    print(row)

[0.53401533 0.51058443]
[0.29770069 0.61887417]
[0.21380881 0.27476924]


However, if one wants to perform an operation on each element in the array, one can use the `flat` attribute which is an iterator over all the elements of the array:

In [107]:
for item in np.random.random((3, 2)).flat:
    print(item)

0.7502288946545267
0.49038170326113784
0.18011613541799254
0.2409503014568425
0.2866269035260519
0.10961356273237899


### Matrix Arithmetic

Matrices can be added and multiplied using arithmetic operators (+-*/) if the two matrices are the same size. NumPy handles those as position-wise operations:

![matrix_arith](img/numpy-matrix-arithmetic.png)

In [43]:
data = np.array([[1, 2], [3, 4]])
ones = np.ones((2, 2))

data + ones

array([[2., 3.],
       [4., 5.]])

We can get away with doing these arithmetic operations on matrices of different size only if the different dimension is one (e.g. the matrix has only one column or one row), in which case NumPy uses its **broadcast** rules for that operation:

![matrix_broadcast](img/numpy-matrix-broadcast.png)

In [47]:
data = np.array([[1, 2], [3, 4], [5, 6]])
ones_row = np.ones(2)
data + ones_row

array([[2., 3.],
       [4., 5.],
       [6., 7.]])
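Broadcasting works along the other axis as well, and NumPy raises an error when the shapes cannot be matched. A minimal sketch (the `column` array and the `broadcast_failed` flag are illustrative):

```python
import numpy as np

data = np.array([[1, 2], [3, 4], [5, 6]])  # shape (3, 2)
column = np.array([[10], [20], [30]])      # shape (3, 1): one value per row

# The single column is stretched across both columns of data.
stretched = data + column
print(stretched)
# [[11 12]
#  [23 24]
#  [35 36]]

# Shapes (3, 2) and (3,) do not line up, so this is rejected.
broadcast_failed = False
try:
    data + np.ones(3)
except ValueError as exc:
    broadcast_failed = True
    print("cannot broadcast:", exc)
```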

### Dot Product

A key distinction to make with arithmetic is the case of [matrix multiplication](https://www.mathsisfun.com/algebra/matrix-multiplying.html) using the dot product. NumPy gives every matrix a `dot()` method we can use to carry out dot product operations with other matrices:

![matrix_dot_product1](img/numpy-matrix-dot-product-1.png)

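For the product to be defined, the two matrices must have the same dimension on the side where they face each other: the number of columns of the first must equal the number of rows of the second. You can visualize this operation as looking like this: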

![matrix_dot_product2](img/numpy-matrix-dot-product-2.png)

In [79]:
data = np.arange(1, 4)
power_of_ten = np.array([1, 10, 100, 1000, 10000, 100000]).reshape(3, 2)

data.dot(power_of_ten)

array([ 30201, 302010])
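The same product can also be written with Python's `@` operator, and the shape rule is easy to check on a pair of two-dimensional matrices. A sketch, where `a` and `b` are illustrative matrices not taken from the figures:

```python
import numpy as np

data = np.arange(1, 4)                                   # shape (3,)
power_of_ten = np.array([1, 10, 100, 1000, 10000, 100000]).reshape(3, 2)

# The @ operator performs the same matrix product as dot() here.
with_operator = data @ power_of_ten
print(with_operator)  # [ 30201 302010]

# (2, 3) times (3, 2): the inner dimensions match, the result is (2, 2).
a = np.array([[1, 2, 3], [4, 5, 6]])
b = np.array([[1, 2], [3, 4], [5, 6]])
product = a.dot(b)
print(product.shape)  # (2, 2)
print(product)
# [[22 28]
#  [49 64]]
```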

### Matrix Indexing

Indexing and slicing operations become even more useful when manipulating matrices.

![matrix_indexing1](img/numpy-matrix-indexing.png)

In [91]:
data = np.array([[1, 2], [3, 4], [5, 6]])

data[0, 1]

2

In [92]:
data[1:3]

array([[3, 4],
       [5, 6]])
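Rows and columns can be sliced together by separating the two slices with a comma. A short sketch on the same `data` matrix (the variable names are illustrative):

```python
import numpy as np

data = np.array([[1, 2], [3, 4], [5, 6]])

first_column = data[:, 0]  # every row, column 0: 1, 3, 5
top_right = data[0:2, 1]   # rows 0 and 1, column 1: 2, 4
last_row = data[-1]        # the final row: 5, 6

print(first_column, top_right, last_row)
```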

### Matrix Aggregation

Matrices can be aggregated the same way vectors can:

![matrix_agregation1](img/numpy-matrix-aggregation-1.png)

Not only can we aggregate all the values in a matrix, but we can also aggregate across the rows or columns by using the axis parameter:

![matrix_agregation2](img/numpy-matrix-aggregation-4.png)

In [93]:
data.max()

6

In [94]:
data.min()

1

In [97]:
data.sum()

21
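The `axis` parameter mentioned above can be sketched as follows: `axis=0` collapses the rows (one result per column), while `axis=1` collapses the columns (one result per row). Variable names are illustrative:

```python
import numpy as np

data = np.array([[1, 2], [3, 4], [5, 6]])

max_per_column = data.max(axis=0)  # 5, 6
max_per_row = data.max(axis=1)     # 2, 4, 6
sum_per_column = data.sum(axis=0)  # 9, 12
sum_per_row = data.sum(axis=1)     # 3, 7, 11

print(max_per_column, max_per_row)
print(sum_per_column, sum_per_row)
```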

### Transposing and Reshaping

A common need when dealing with matrices is the need to rotate them. This is often the case when we need to take the dot product of two matrices and need to align the dimension they share. NumPy arrays have a convenient property called T to get the transpose of a matrix:

![traspose1](img/numpy-transpose.png)

In [108]:
data

array([[1, 2],
       [3, 4],
       [5, 6]])

In [109]:
data.T

array([[1, 3, 5],
       [2, 4, 6]])

In more advanced use cases, you may find yourself needing to switch the dimensions of a certain matrix. This is often the case in machine learning applications where a certain model expects a certain shape for the inputs that is different from your dataset. NumPy’s `reshape()` method is useful in these cases. You just pass it the new dimensions you want for the matrix. You can pass -1 for one dimension and NumPy will infer its size from the number of elements in your matrix:

![reshape1](img/numpy-reshape.png)

In [111]:
data = np.arange(1, 7)
data

array([1, 2, 3, 4, 5, 6])

In [112]:
data.reshape(2, 3)

array([[1, 2, 3],
       [4, 5, 6]])

In [113]:
data.reshape(3, 2)

array([[1, 2],
       [3, 4],
       [5, 6]])
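Passing -1 for one of the dimensions, as described above, might look like this. The `reshape_failed` flag is only there to show that an impossible shape is rejected:

```python
import numpy as np

data = np.arange(1, 7)  # six elements

two_rows = data.reshape(2, -1)  # NumPy infers 3 columns
two_cols = data.reshape(-1, 2)  # NumPy infers 3 rows
print(two_rows.shape)  # (2, 3)
print(two_cols.shape)  # (3, 2)

# Six elements cannot be split into 4 equal rows, so this raises an error.
reshape_failed = False
try:
    data.reshape(4, -1)
except ValueError as exc:
    reshape_failed = True
    print("cannot reshape:", exc)
```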

### Yet More Dimensions

NumPy can do everything mentioned in any number of dimensions. Its central data structure is called ndarray (N-Dimensional Array) for a reason.

![3d_array1](img/numpy-3d-array.png)

In [114]:
np.array([
    [
        [1, 2], [3, 4]
    ],
    [
        [5, 6], [7, 8]
    ]
])

array([[[1, 2],
        [3, 4]],

       [[5, 6],
        [7, 8]]])

Dealing with a new dimension is just adding a comma to the parameters of a NumPy function:

![3d_creation1](img/numpy-3d-array-creation.png)
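In code, that extra comma is simply one more entry in the shape tuple. A minimal sketch, with a shape of (4, 3, 2) chosen only for illustration:

```python
import numpy as np

ones_3d = np.ones((4, 3, 2))
zeros_3d = np.zeros((4, 3, 2))
random_3d = np.random.random((4, 3, 2))

print(ones_3d.ndim)     # 3
print(zeros_3d.shape)   # (4, 3, 2)
print(random_3d.size)   # 24 elements in total
```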

## Practical Usage

Next are some examples of the useful things NumPy will help you through.

### Formulas

Implementing mathematical formulas that work on matrices and vectors is a key use case to consider NumPy for. It’s why NumPy is the darling of the scientific Python community. For example, consider the mean square error formula that is central to supervised machine learning models tackling regression problems:

![mean_sq_formula1](img/mean-square-error-formula.png)

Implementing this is a breeze in NumPy:

![mean_sq_formula2](img/numpy-mean-square-error-formula.png)

The beauty of this is that NumPy does not care if predictions and labels contain one or a thousand values (as long as they’re both the same size).

Let's walk through an example stepping sequentially through the four operations in that line of code:

![mean_sq_formula3](img/numpy-mse-1.png)

Both the predictions and labels vectors contain three values, which means n has a value of three. After we carry out the subtraction, we end up with the values looking like this:

![mean_sq_formula4](img/numpy-mse-2.png)

Then we can square the values in the vector:

![mean_sq_formula5](img/numpy-mse-3.png)

Now we sum these values:

![mean_sq_formula6](img/numpy-mse-4.png)

Multiplying that sum by 1/n gives the error value for that prediction and a score for the quality of the model.
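The steps above can be sketched in code, assuming the usual definition of mean square error as 1/n times the sum of squared differences. The three-element `predictions` and `labels` vectors below are illustrative values, not necessarily those in the figures:

```python
import numpy as np

predictions = np.array([1.0, 1.0, 1.0])  # illustrative values
labels = np.array([1.0, 2.0, 3.0])       # illustrative values
n = len(predictions)

differences = predictions - labels  # 0, -1, -2
squared = np.square(differences)    # 0, 1, 4
total = np.sum(squared)             # 5.0
error = (1 / n) * total             # 5/3, about 1.667

# The same four operations written as a single line.
one_liner = (1 / n) * np.sum(np.square(predictions - labels))
print(error, one_liner)
```

Because every step is an array operation, the same lines work unchanged for vectors of any length.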

### Data Representation

Think of all the data types you’ll need to crunch and build models around (spreadsheets, images, audio, etc.). So many of them are perfectly suited for representation in an n-dimensional array:

### Tables and Spreadsheets

A spreadsheet or a table of values is a two-dimensional matrix. Each sheet in a spreadsheet can be its own variable. The most popular abstraction in Python for those is the pandas DataFrame, which actually uses NumPy and builds on top of it.

![data_rep](img/excel-to-pandas.png)

### Audio and Timeseries

An audio file is a one-dimensional array of samples. Each sample is a number representing a tiny chunk of the audio signal. CD-quality audio may have 44,100 samples per second, and each sample is an integer between -32768 and 32767. This means that if you have a ten-second WAVE file of CD quality, you can load it in a NumPy array with length 10 * 44,100 = 441,000 samples. Want to extract the first second of audio? Simply load the file into a NumPy array that we’ll call `audio`, and get `audio[:44100]`.

Here’s a look at a slice of an audio file:


![audio1](img/numpy-audio.png)

The same goes for time-series data; for example, the price of a stock over time.
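As a rough sketch of the audio case, the example below builds ten seconds of a synthetic 440 Hz tone instead of loading a real file (the tone and the variable names are assumptions made for illustration) and then slices out the first second:

```python
import numpy as np

sample_rate = 44100  # samples per second for CD-quality audio
seconds = 10

# Synthetic stand-in for a loaded file: a 440 Hz sine wave as 16-bit integers.
t = np.arange(sample_rate * seconds) / sample_rate
amplitude = np.iinfo(np.int16).max  # 32767, the largest 16-bit sample value
audio = (np.sin(2 * np.pi * 440 * t) * amplitude).astype(np.int16)

first_second = audio[:sample_rate]

print(audio.shape)         # (441000,)
print(first_second.shape)  # (44100,)
print(np.iinfo(np.int16).min, np.iinfo(np.int16).max)  # -32768 32767
```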

### Images

An image is a matrix of pixels of size (height x width).

If the image is black and white (a.k.a. grayscale), each pixel can be represented by a single number (commonly between 0 (black) and 255 (white)). Want to crop the top left 10 x 10 pixel part of the image? Just tell NumPy to get you `image[:10,:10]`.

Here’s a look at a slice of an image file:

![numpy-grayscale-image](img/numpy-grayscale-image.png)
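A minimal sketch of that crop, using a randomly generated 48 x 64 grayscale image as a stand-in for a real file (the size and the random content are assumptions for illustration):

```python
import numpy as np

# Hypothetical grayscale image: 48 rows (height) by 64 columns (width),
# each pixel an integer from 0 (black) to 255 (white).
image = np.random.randint(0, 256, size=(48, 64), dtype=np.uint8)

# Top-left 10 x 10 pixels: the first 10 rows and the first 10 columns.
crop = image[:10, :10]

print(image.shape)  # (48, 64)
print(crop.shape)   # (10, 10)
```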

If the image is colored, then each pixel is represented by three numbers - a value for each of red, green, and blue. In that case we need a 3rd dimension (because each cell can only contain one number). So a colored image is represented by an ndarray of dimensions: (height x width x 3).

![numpy-color-image](img/numpy-color-image.png)
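The color case can be sketched the same way. The example below builds a solid red image of an arbitrary 48 x 64 size, assuming the channels are stored in red, green, blue order:

```python
import numpy as np

# Hypothetical color image: height x width x 3 channels, all pixels black.
color_image = np.zeros((48, 64, 3), dtype=np.uint8)

# Set the first channel (red, under the assumed ordering) to full intensity.
color_image[:, :, 0] = 255

red_channel = color_image[:, :, 0]  # a (48, 64) matrix of red values
crop = color_image[:10, :10]        # the crop keeps all three channels

print(color_image.shape)  # (48, 64, 3)
print(red_channel.shape)  # (48, 64)
print(crop.shape)         # (10, 10, 3)
```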

This content is based on Alammar, J (2018). [A Visual Intro to NumPy and Data Representation](https://jalammar.github.io/visual-numpy/)