# Numpy basics
Numpy is a library that provides a convenient interface for working with N-dimensional arrays, as an alternative to the pure Python approach of nested lists.

**The main limitation we can run into with numpy is that it is designed for numeric values** (see: https://numpy.org/doc/stable/user/basics.types.html for all the available types).
*This is not entirely true, as we will see when using pandas, but the library is focused on working with numbers.*

We can install it using the following command in our shell (usually inside a virtualenv):

In [1]:
!pip install numpy



Once we have it, let's see how to create an array, access a single value, or access an entire row or column.

In [2]:
import numpy

pure_python_data = [
    [1,  2,  3,  4],
    [5,  6,  7,  8],
    [9, 10, 11, 12]
]

array = numpy.array(pure_python_data)

print(array)
print(type(array))

[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]
<class 'numpy.ndarray'>


We can fetch additional information about the array:

In [3]:
print("There are", array.ndim, "dimensions in the array")
print("The shape of the array is", array.shape)
print("In total, there are", array.size, "values")
print("We have an array of", array.dtype)

There are 2 dimensions in the array
The shape of the array is (3, 4)
In total, there are 12 values
We have an array of int64
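
The `dtype` shown above is inferred from the values passed to `numpy.array`. The following is a small sketch of that inference (the variable names are only illustrative). It also shows what happens with a non-numeric value: numpy does not reject it, but converts every element to text, which is rarely what we want.

```python
import numpy

ints = numpy.array([1, 2, 3])        # all integers -> an integer dtype
floats = numpy.array([1, 2.5, 3])    # a single float promotes everything to float64
texts = numpy.array([1, "two", 3])   # a single string turns every value into text

print(ints.dtype.kind)    # 'i' (signed integer; the exact width depends on the platform)
print(floats.dtype)       # float64
print(texts.dtype.kind)   # 'U' (unicode string)
print(texts)              # the numbers are now the strings '1' and '3'
```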


Other creation methods:

In [4]:
numpy.arange(1, 13).reshape(2, 3, 2)

array([[[ 1,  2],
        [ 3,  4],
        [ 5,  6]],

       [[ 7,  8],
        [ 9, 10],
        [11, 12]]])

In [5]:
numpy.eye(10,5)

array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]])

For other methods see: https://numpy.org/doc/stable/reference/routines.array-creation.html
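
As a quick sketch of a few of those creation routines (the shapes and values here are arbitrary choices for illustration):

```python
import numpy

zeros = numpy.zeros((2, 3))            # 2x3 array filled with 0.0 (float64 by default)
ones = numpy.ones((2, 3), dtype=int)   # 2x3 array filled with the integer 1
steps = numpy.arange(0, 10, 2)         # like range(): 0, 2, 4, 6, 8
grid = numpy.linspace(0.0, 1.0, 5)     # 5 evenly spaced values from 0 to 1, both included

print(zeros.shape, zeros.dtype)  # (2, 3) float64
print(ones)
print(steps)                     # [0 2 4 6 8]
print(grid)
```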

## Numpy axes

One of the most difficult concepts in numpy is the concept of axes (we will see later why).

It is important to have this concept clear, as it will save us trouble when using numpy functions such as `sum`, `mean`, `max`, `min`...

Assuming two dimensions, we have the following array:

In [6]:
numpy.eye(5, 6, dtype=int)  # numpy.int was removed in NumPy 1.24; the builtin int is equivalent

array([[1, 0, 0, 0, 0, 0],
       [0, 1, 0, 0, 0, 0],
       [0, 0, 1, 0, 0, 0],
       [0, 0, 0, 1, 0, 0],
       [0, 0, 0, 0, 1, 0]])

When we say that we are operating on axis 0, we mean that we traverse the array in the direction along which the rows are stacked (downwards):

![Source: https://www.sharpsightlabs.com/blog/numpy-axes-explained/](https://vrzkj25a871bpq7t1ugcgmn9-wpengine.netdna-ssl.com/wp-content/uploads/2018/12/numpy-axis0.png)
Source: https://www.sharpsightlabs.com/blog/numpy-axes-explained/

Thus, when we apply an operation over the row axis, we are collapsing the rows into a single row, while keeping all the other dimensions.

The next dimension is the columns, so when we apply an operation over the column axis (axis 1) we are collapsing the columns.
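
A minimal sketch of both cases, using `sum` on the 3x4 array from the beginning of this notebook (rebuilt here with `arange` so the example is self-contained):

```python
import numpy

# The same 3x4 array used at the start of this notebook.
array = numpy.arange(1, 13).reshape(3, 4)

# axis=0 collapses the rows: we get one result per column.
col_totals = array.sum(axis=0)
# axis=1 collapses the columns: we get one result per row.
row_totals = array.sum(axis=1)

print(col_totals, col_totals.shape)  # [15 18 21 24] (4,)
print(row_totals, row_totals.shape)  # [10 26 42] (3,)
```

Note how the axis we pass is the one that disappears from the shape: `(3, 4)` becomes `(4,)` for `axis=0` and `(3,)` for `axis=1`.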

**We will come back to this later.**

## Accessing array values

If we want to access a single element, with python lists we would use nested indexing, such as:

In [7]:
pure_python_data[1][3]

8

With numpy, we can access the value using only one indexing that combines both the first and second dimensions:

In [8]:
array[1, 3]

8

If we skip a dimension, we get all the values along the dimension we skipped.

For example, if we do not specify the column, we get the whole row with all its columns:

In [9]:
array[1,]

array([5, 6, 7, 8])

This resembles the code that we use with pure python (to get a row we use `list[row_no]`). The advantage of using numpy is that we can also access columns directly:

In [10]:
array[:, 2]

array([ 3,  7, 11])

**Note** that we have to set the `:` indexing value (fetch all values) for all the dimensions that we skip before specifying a value, to be explicit about which dimension we are using.
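
The same rule applies with more dimensions. As a small sketch, using a 2x3x2 array like the one created earlier with `reshape` (the name `cube` is only illustrative):

```python
import numpy

cube = numpy.arange(1, 13).reshape(2, 3, 2)

first_block = cube[0]        # trailing dimensions can be omitted: shape (3, 2)
middle_rows = cube[:, 1]     # ':' keeps the whole first dimension: shape (2, 2)
last_values = cube[:, :, 1]  # ':' keeps the first two dimensions: shape (2, 3)

print(first_block.shape)
print(middle_rows)
print(last_values)
```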

This simplifies the code when we are working with lists of data.

For example, when implementing the KMeans algorithm, we must compute each centroid as the mean point of all the points assigned to it.

With pure python loops, we would write:

In [11]:
points = numpy.array([
    (1, 2),
    (3, 2),
    (4, 4)
])

def mean_points(points):
    n_feats = len(points[0])
    acc = [0.0] * n_feats
    for i in range(n_feats):
        for p in points:
            acc[i] += p[i]
        
        acc[i] /= len(points)
    
    return acc
print(mean_points(points))

[2.6666666666666665, 2.6666666666666665]


Using numpy indexing, we can rewrite this function as:

In [12]:
def mean_points(points):
    n_points, n_feats = points.shape
    acc = [sum(points[:,i]) / n_points for i in range(n_feats)]
    return acc

print(mean_points(points))

[2.6666666666666665, 2.6666666666666665]


### Numpy convenience methods

As some operations are common in most use cases, Numpy provides methods to apply them to an array. Thus, if we want to sum an entire array we do not need to write a loop, we can just use the `.sum` method:

In [13]:
def mean_points(points):
    n_points, n_feats = points.shape
    return points.sum(axis=0) / n_points
print(mean_points(points))

[2.66666667 2.66666667]


Other available methods that we have are:

- `.min`
- `.max`
- `.mean`
- `numpy.median` (only available as a function: arrays have no `.median` method)
- ...

So we can simplify our `mean_points` function even further (renamed to `mean_point` here):

In [14]:
def mean_point(points):
    # return points.mean(axis=0)
    return numpy.mean(points, axis=0)

print(mean_point(points))

[2.66666667 2.66666667]
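
The other reductions accept the same `axis` argument. A short sketch on the same `points` array (redefined here so the example is self-contained), with the median computed through the `numpy.median` function:

```python
import numpy

points = numpy.array([(1, 2), (3, 2), (4, 4)])

print(points.min(axis=0))            # smallest value of each column: [1 2]
print(points.max(axis=0))            # largest value of each column: [4 4]
print(points.mean(axis=1))           # mean of each row, one value per point: [1.5 2.5 4. ]
print(numpy.median(points, axis=0))  # median of each column: [3. 2.]
```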


## Operations using numpy arrays

We can apply arithmetic operators between arrays *elementwise*. This means that, for example, we can add two matrices directly.

**Note the "elementwise": `a*b` in Numpy multiplies the values position by position, so it is not the matrix product we use in maths. For the matrix product, use `a @ b` or `numpy.matmul(a, b)`.**

In [15]:
array1 = numpy.arange(9).reshape(3, 3)
# arange creates a single dimension array with the elements
# from 0 to 8
# with reshape we give it the desired shape (a 3x3 matrix)

array2 = numpy.ones((3,3), dtype=numpy.int32)
# ones creates an array with the desired shape (3,3) filled with ones

print(array1)
print(array2)

array1 * array2

[[0 1 2]
 [3 4 5]
 [6 7 8]]
[[1 1 1]
 [1 1 1]
 [1 1 1]]


array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [16]:
print((array1 * array2) / 2.)
print((array1 * array2) / 2)   # <- `/` is true division in python 3, even between ints (python 2 would floor the result)

[[0.  0.5 1. ]
 [1.5 2.  2.5]
 [3.  3.5 4. ]]
[[0.  0.5 1. ]
 [1.5 2.  2.5]
 [3.  3.5 4. ]]
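
To make the difference between the elementwise product and the matrix product concrete, here is a small sketch with two 2x2 arrays (`a` and `b` are illustrative values, not taken from the cells above):

```python
import numpy

a = numpy.array([[1, 2],
                 [3, 4]])
b = numpy.array([[0, 1],
                 [1, 0]])

elementwise = a * b   # multiplies the values at matching positions
matrix_prod = a @ b   # rows-by-columns matrix product, same as numpy.matmul(a, b)

print(elementwise)  # [[0 2]
                    #  [3 0]]
print(matrix_prod)  # [[2 1]
                    #  [4 3]]
```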


## Conclusion

This was a really basic introduction to the numpy library. The library is very extensive, and has lots of methods to help us deal with matrix operations.

Numpy is interesting by itself, but it also matters when using other frameworks such as Pandas, Tensorflow, Pytorch... you will see that their interfaces resemble the one used by this library.

For more information, check the official documentation at https://numpy.org/doc/stable/