This notebook is based on the incredible classes taught by Prof. Sebastian Raschka, which are all available for free on his [YouTube Channel](https://www.youtube.com/watch?v=I8vRP4GVs_E).

## Why NumPy? Why not stick to Python's built-in data types for multidimensional data?
#### *(A practical, yet somewhat detailed answer)*

The short answer: NumPy is much more efficient than Python's built-in data types for manipulating collections of numeric values.  
NumPy's performance-critical core is written in [compiled languages](https://github.com/numpy/numpy) such as C and C++, which makes it much faster than equivalent pure-Python loops.  
A Python list stores pointers to objects that can live anywhere in memory. This lets a single list mix different data types, but every element access costs an extra lookup and every element carries its own object overhead. A NumPy array instead stores its values in a contiguous block of memory, which makes CPU caching effective and element lookup very fast, with the restriction of a single data type per array.  
As an example:  
> Let's say you have 8 64-bit integers. Since every element has the same size (8 bytes, so 64 bytes or 512 bits in total), you do not have to store a memory address for each element. The address of the first element is enough: element `i` sits at that address plus `i * 8` bytes.  
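
The layout from this example can be inspected directly on an array. The following is a minimal sketch; the names `arr` and `offsets` are illustrative and not part of the original notebook.

```python
import numpy as np

# Eight 64-bit integers stored in one contiguous block
arr = np.arange(8, dtype=np.int64)

print(arr.itemsize)  # bytes per element
print(arr.nbytes)    # bytes for the whole array (64 bytes = 512 bits)
print(arr.strides)   # bytes to step over to reach the next element

# Byte offset of each element from the start of the block: i * itemsize
offsets = [i * arr.itemsize for i in range(arr.size)]
print(offsets)
```

Because the offset of every element follows from its index, no per-element pointer has to be stored.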

Apart from the computational advantages, NumPy code is often more elegant and readable, thanks to vectorized operations and broadcasting.
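
To make "vectorized operations and broadcasting" concrete, here is a small sketch. The arrays and their names (`prices`, `matrix`, `row`) are made up for illustration.

```python
import numpy as np

# Vectorized operation: one expression acts on every element, no explicit loop
prices = np.array([10.0, 20.0, 30.0])
discounted = prices * 0.9
print(discounted)

# Broadcasting: the 1-D row is added to each row of the 2-D matrix
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])
row = np.array([10, 20, 30])
shifted = matrix + row
print(shifted)
```

With plain lists, both operations would need an explicit loop or a list comprehension.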

**Summing it up:** essentially, you can and should use lists when you need to mix different data types in the same variable, when the dataset is small, or when performance simply is not a requirement. However, when you need to manipulate a large amount of numeric data and performance matters, go for NumPy: in most cases it is the best choice for scientific computing!

## Demonstration and performance comparison: Dot Product
> A and B are n-dimensional vectors  
> A's dimension: 1 x n  
> B's dimension: 1 x n  
> C = A . B = A \* B^T = A1 \* B1 + A2 \* B2 + A3 \* B3 + ... + An \* Bn  

In [1]:
import numpy as np

In [2]:
# Dot product using Lists
def dot_product_list(A, B):
    return sum([A[i] * B[i] for i in range(len(A))])

# Dot product using Numpy
def dot_product_numpy(A, B):
    return np.dot(A, B)

In [3]:
# Defining two big lists of floating-point values and their respective ndarrays
A = [i/5.5 for i in range(1000000)]
B = [i/8.8 for i in range(1000000)]
A_numpy = np.array(A)
B_numpy = np.array(B)

In [4]:
# Measuring the time it takes for the dot product to be executed using Lists
%timeit C = dot_product_list(A, B)

179 ms ± 28.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [5]:
# Measuring the time it takes for the dot product to be executed using Numpy
%timeit C_numpy = dot_product_numpy(A_numpy, B_numpy)

676 µs ± 19.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


The execution times speak for themselves! In this run, NumPy was roughly 265 times faster (676 µs versus 179 ms).
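
Outside of IPython, where the `%timeit` magic is not available, a similar comparison can be sketched with the standard-library `timeit` module. This version is an assumption-laden variant of the cells above: it uses a smaller `n` to keep the run short, and it also checks that both implementations agree. Absolute timings will differ from machine to machine.

```python
import math
import timeit

import numpy as np


def dot_product_list(A, B):
    return sum([A[i] * B[i] for i in range(len(A))])


def dot_product_numpy(A, B):
    return np.dot(A, B)


n = 100_000  # smaller than in the notebook, to keep the check quick
A = [i / 5.5 for i in range(n)]
B = [i / 8.8 for i in range(n)]
A_numpy = np.array(A)
B_numpy = np.array(B)

# Both implementations should agree up to floating-point rounding
same_result = math.isclose(
    dot_product_list(A, B), dot_product_numpy(A_numpy, B_numpy), rel_tol=1e-9
)
print(same_result)

# Total time for 10 calls of each implementation
list_time = timeit.timeit(lambda: dot_product_list(A, B), number=10)
numpy_time = timeit.timeit(lambda: dot_product_numpy(A_numpy, B_numpy), number=10)
print(f"list: {list_time:.4f} s, numpy: {numpy_time:.4f} s (10 runs each)")
```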

## Exploring some nice NumPy features:

### Custom Data Types:

NumPy lets you choose an explicit data type (dtype) for an array, such as float16 or float32.  
While this gives you more control over memory use, you must be careful: fewer bits mean less precision and a smaller representable range. In scientific computing, higher-precision data types such as float64 are generally recommended.

In [6]:
type(A_numpy)

numpy.ndarray

In [7]:
type(A_numpy[0])

numpy.float64

In [8]:
A_numpy = A_numpy.astype(np.float16)

In [9]:
type(A_numpy[0])

numpy.float16
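
The cost of the conversion above is easy to see on a few values. Note that `A_numpy` holds values up to about 181818 (999999 / 5.5), which is beyond the largest finite float16 (65504), so its largest entries become `inf` after `astype(np.float16)`. The sketch below uses a small hypothetical array, `values`, to show both effects.

```python
import numpy as np

values = np.array([0.1, 1000.1, 70000.0], dtype=np.float64)

# Newer NumPy versions warn about the overflow in this cast; silence it here
with np.errstate(over="ignore"):
    values16 = values.astype(np.float16)

print(values16)                             # rounded values, last one overflows
print(np.finfo(np.float16).max)             # largest finite float16
print(values.itemsize, values16.itemsize)   # bytes per element
```

The memory saving is real (2 bytes instead of 8 per element), but 1000.1 is rounded to 1000.0 and 70000.0 no longer fits at all.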

### Useful ndarray attributes:

In [10]:
A_numpy.ndim

1

In [11]:
A_numpy.shape

(1000000,)

These and many more attributes are listed in the [official NumPy documentation](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.html)!
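
A few more attributes that are often useful, shown on a small hypothetical array named `sample`:

```python
import numpy as np

sample = np.zeros((2, 3), dtype=np.float32)

print(sample.dtype)     # element data type
print(sample.size)      # total number of elements
print(sample.itemsize)  # bytes per element
print(sample.nbytes)    # total bytes: size * itemsize
print(sample.T.shape)   # shape of the transposed array
```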

## Array Construction Routines

In [78]:
# Creates a 3x3 matrix filled with ones, with dtype int32
# (when dtype is omitted, the default is np.float64)
np.ones((3,3), dtype=np.int32)

array([[1, 1, 1],
       [1, 1, 1],
       [1, 1, 1]])
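
`np.ones` has several siblings that follow the same calling pattern. A brief sketch, with illustrative variable names:

```python
import numpy as np

zeros = np.zeros((2, 3))          # filled with 0.0
sevens = np.full((2, 3), 7)       # filled with a value of your choice

# ones_like copies the shape and dtype of an existing array
template = np.array([[1.5, 2.5], [3.5, 4.5]])
ones_like_template = np.ones_like(template)

print(zeros)
print(sevens)
print(ones_like_template)
```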

In [60]:
help(np.ones)

Help on function ones in module numpy:

ones(shape, dtype=None, order='C')
    Return a new array of given shape and type, filled with ones.
    
    Parameters
    ----------
    shape : int or sequence of ints
        Shape of the new array, e.g., ``(2, 3)`` or ``2``.
    dtype : data-type, optional
        The desired data-type for the array, e.g., `numpy.int8`.  Default is
        `numpy.float64`.
    order : {'C', 'F'}, optional, default: C
        Whether to store multi-dimensional data in row-major
        (C-style) or column-major (Fortran-style) order in
        memory.
    
    Returns
    -------
    out : ndarray
        Array of ones with the given shape, dtype, and order.
    
    See Also
    --------
    ones_like : Return an array of ones with shape and type of input.
    empty : Return a new uninitialized array.
    zeros : Return a new array setting values to zero.
    full : Return a new array of given shape filled with value.
    
    
    Examples
    --------
   

In [64]:
# Faster than np.ones or np.zeros: nothing is written, so the array holds whatever values were already in that memory
np.empty((3,3))

array([[0.00000000e+000, 0.00000000e+000, 0.00000000e+000],
       [0.00000000e+000, 0.00000000e+000, 1.13239846e-320],
       [1.42410974e-306, 8.70018274e-313, 2.56761491e-312]])
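
Since the contents of an `np.empty` array are arbitrary, it is only safe when every cell is written before it is read. A minimal sketch of that pattern, using a hypothetical array called `buffer`:

```python
import numpy as np

buffer = np.empty((3, 3))  # contents are arbitrary at this point

# Fill every row before using the array
for i in range(3):
    buffer[i, :] = i

print(buffer)
```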

In [67]:
np.eye(2) # Identity Matrix

array([[1., 0.],
       [0., 1.]])

In [68]:
np.arange(5)

array([0, 1, 2, 3, 4])

In [79]:
np.linspace(1,10,19) # start, stop, number of values: 19 evenly spaced values from 1 to 10, both endpoints included

array([ 1. ,  1.5,  2. ,  2.5,  3. ,  3.5,  4. ,  4.5,  5. ,  5.5,  6. ,
        6.5,  7. ,  7.5,  8. ,  8.5,  9. ,  9.5, 10. ])

In [77]:
np.arange(1,5,2) # start, stop, step

array([1, 3])
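
The main difference between the two routines is how the end of the range is handled: `np.arange` takes a step and excludes the stop value, while `np.linspace` takes a number of points and includes the stop value by default. A short sketch, with illustrative variable names:

```python
import numpy as np

# retstep=True also returns the spacing that linspace computed
points, step = np.linspace(1, 10, 19, retstep=True)
print(step)

# arange never includes the stop value
half_open = np.arange(1, 5, 2)
print(half_open)

# linspace can exclude the stop value too, if asked
no_endpoint = np.linspace(0, 1, 5, endpoint=False)
print(no_endpoint)
```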

## Array Indexing

In [81]:
a = np.empty((4,4)) # a 4x4 matrix

In [82]:
a[0]

array([6.23042070e-307, 3.56043053e-307, 1.37961641e-306, 6.23039694e-307])

In [84]:
a[0][0]

6.230420704259778e-307

In [90]:
a[0][1:3]

array([3.56043053e-307, 1.37961641e-306])

In [87]:
a[0][:2]

array([6.23042070e-307, 3.56043053e-307])

In [89]:
a[0][2:]

array([1.37961641e-306, 6.23039694e-307])

In [92]:
# So far, this works much like nested lists! NumPy also accepts all indices inside a single pair of brackets
a[0, 0]

6.230420704259778e-307

In [94]:
a[0, -1]

6.230396937285255e-307

In [95]:
a[0,:]

array([6.23042070e-307, 3.56043053e-307, 1.37961641e-306, 6.23039694e-307])

In [96]:
a[:]

array([[6.23042070e-307, 3.56043053e-307, 1.37961641e-306,
        6.23039694e-307],
       [6.23053954e-307, 9.34609790e-307, 8.45593934e-307,
        9.34600963e-307],
       [1.11261774e-306, 6.23037657e-307, 6.23053954e-307,
        8.06638080e-308],
       [8.01106038e-307, 6.89805151e-307, 1.78020169e-306,
        1.42410974e-306]])
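
Because `a` was created with `np.empty`, its values are hard to read. The same indexing rules are easier to follow on an array with known contents. The sketch below uses a hypothetical 4x4 array called `grid`; it also shows that basic slices are views onto the original data rather than copies.

```python
import numpy as np

grid = np.arange(16).reshape(4, 4)  # values 0..15, row by row

column = grid[:, 1]      # second column
block = grid[1:3, 0:2]   # rows 1-2, columns 0-1

# Chained indexing and tuple indexing select the same elements
same = np.array_equal(grid[0][1:3], grid[0, 1:3])

print(column)
print(block)
print(same)

# Writing through a slice changes the original array
block[0, 0] = -1
print(grid[1, 0])
```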

## Mathematical Operations, Iterations Over NumPy Arrays and NumPy's Universal Functions (ufuncs)

In [99]:
for value in a[0]:
    print(value)

6.230420704259778e-307
3.5604305343967845e-307
1.379616413496319e-306
6.230396937285255e-307


In [101]:
a[0] + a[1] # vector sum

array([1.24609602e-306, 1.29065284e-306, 2.22521035e-306, 1.55764066e-306])

In [102]:
a[0] - a[1] # vector difference

array([-1.18833940e-311, -5.78566737e-307,  5.34022479e-307,
       -3.11561269e-307])
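
Other arithmetic operators and ufuncs work element by element in the same way. A small sketch on readable values; the arrays `u` and `v` are made up for illustration.

```python
import numpy as np

u = np.array([1.0, 4.0, 9.0])
v = np.array([2.0, 2.0, 2.0])

product = u * v            # element-wise product, not a dot product
roots = np.sqrt(u)         # ufunc applied to every element
larger = np.maximum(u, v)  # element-wise maximum of two arrays

print(product)
print(roots)
print(larger)
```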

#### Exercise: creating a one-dimensional array from our matrix 'a':

In [146]:
# Collect every cell, row by row, with a flat list comprehension
# (the built-in equivalents are a.flatten() or a.ravel())
cells = np.array([cell for row in a for cell in row])

In [166]:
print(type(cells), type(cells[0]), cells.shape, cells.ndim)

cells

<class 'numpy.ndarray'> <class 'numpy.float64'> (16,) 1


array([6.23042070e-307, 3.56043053e-307, 1.37961641e-306, 6.23039694e-307,
       6.23053954e-307, 9.34609790e-307, 8.45593934e-307, 9.34600963e-307,
       1.11261774e-306, 6.23037657e-307, 6.23053954e-307, 8.06638080e-308,
       8.01106038e-307, 6.89805151e-307, 1.78020169e-306, 1.42410974e-306])
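
NumPy also has built-in ways to do this without a Python-level loop. The sketch below uses a small hypothetical array, `grid`, to compare them; the key difference is that `flatten` always copies, while `ravel` returns a view when the memory layout allows it.

```python
import numpy as np

grid = np.arange(6).reshape(2, 3)

flat_copy = grid.flatten()        # always a new array
flat_view = grid.ravel()          # a view when possible
flat_reshaped = grid.reshape(-1)  # -1 lets NumPy infer the length

print(flat_copy)
print(np.shares_memory(grid, flat_copy))
print(np.shares_memory(grid, flat_view))
```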

#### Exercise: how to sum all the values in the matrix 'a'?

In [165]:
sum(sum(row for row in a))

1.345419564494449e-305
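
The nested built-in `sum` works because the inner call adds the rows together into one vector and the outer call adds up that vector. NumPy offers the same result in a single call. A sketch on a hypothetical matrix `m` with known contents:

```python
import numpy as np

m = np.arange(16).reshape(4, 4)  # values 0..15

total_builtin = sum(sum(row for row in m))  # approach used in the exercise
total_numpy = m.sum()                       # equivalent: np.sum(m)

print(total_builtin, total_numpy)
```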

#### Exercise: how to add 1 to each element of a matrix?

In [171]:
my_matrix = np.ones((3,3))

In [172]:
my_matrix

array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])

In [176]:
# np.add is a ufunc: shorter than an explicit Python loop and much faster, because the loop runs in compiled, vectorized code
my_matrix = np.add(my_matrix, 1)

In [177]:
my_matrix

array([[2., 2., 2.],
       [2., 2., 2.],
       [2., 2., 2.]])

#### Operator Overloading

In [179]:
my_matrix + 2 

array([[4., 4., 4.],
       [4., 4., 4.],
       [4., 4., 4.]])
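
Each arithmetic operator on an ndarray is shorthand for a NumPy function. The sketch below checks a few of these pairs on a small hypothetical matrix `m`.

```python
import numpy as np

m = np.full((2, 2), 2.0)

add_matches = np.array_equal(m + 2, np.add(m, 2))
multiply_matches = np.array_equal(m * 3, np.multiply(m, 3))
power_matches = np.array_equal(m ** 2, np.power(m, 2))

# @ is matrix multiplication, unlike *, which is element-wise
matmul_result = m @ m

print(add_matches, multiply_matches, power_matches)
print(matmul_result)
```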

#### NumPy's *reduce* operation

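The name echoes the reduce step known from functional programming and MapReduce: a binary operation (here, addition) is applied repeatedly along one axis, collapsing that axis into a single result. Unlike MapReduce, NumPy does this inside a single process; nothing is distributed across multiple computing nodes.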

In [187]:
my_matrix[0][0] = 8
my_matrix[1][0] = 1

In [188]:
my_matrix

array([[8., 2., 2.],
       [1., 2., 2.],
       [2., 2., 2.]])

In [189]:
np.add.reduce(my_matrix, axis=0) # column sums

array([11.,  6.,  6.])

In [190]:
np.add.reduce(my_matrix, axis=1) # row sums

array([12.,  5.,  6.])

In [191]:
np.add.reduce(my_matrix) # axis defaults to 0, so this is the column sums again

array([11.,  6.,  6.])
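
`reduce` is available on other binary ufuncs as well, and `axis=None` collapses every axis at once. The sketch below rebuilds the same matrix so that it runs on its own; the result names are illustrative.

```python
import numpy as np

my_matrix = np.array([[8., 2., 2.],
                      [1., 2., 2.],
                      [2., 2., 2.]])

total = np.add.reduce(my_matrix, axis=None)              # sum of all cells
column_sums = np.sum(my_matrix, axis=0)                  # same as np.add.reduce(..., axis=0)
row_max = np.maximum.reduce(my_matrix, axis=1)           # largest value in each row
column_products = np.multiply.reduce(my_matrix, axis=0)  # product down each column

print(total)
print(column_sums)
print(row_max)
print(column_products)
```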

`np.add` and many other ufuncs are listed at https://numpy.org/doc/stable/reference/ufuncs.html#available-ufuncs