This notebook is based on the wonderful classes taught by Prof. Sebastian Raschka, which are all available for free on his [YouTube channel](https://www.youtube.com/watch?v=I8vRP4GVs_E).

## Why Numpy? Why not stick to Python's built-in data types for multidimensional data? 
#### *(A practical, yet somewhat detailed answer)*

The short answer is: because Numpy is much more efficient than Python's built-in data types for manipulating collections of numeric values.    
Numpy is written in [multiple programming languages](https://github.com/numpy/numpy), with its performance-critical parts in compiled languages such as C and C++, and this makes it much faster than equivalent pure-Python code. 
A Python list stores pointers to objects that may be scattered anywhere in memory. This lets a single list hold elements of different data types, but it means that both the pointer and the object it refers to must be stored, and the pointer must be followed on every access. Numpy instead stores the values themselves in a contiguous block of memory, which makes good use of the CPU cache and allows very fast element lookup, with the restriction that each array holds a single data type.  
As an example:  
> Let's say you have eight 64-bit integers. Since every element has the same size (64 bits, so 512 bits in total), you do not have to store a memory address for each element, only the address of the first one. From it, the element at index i is found by simply adding i × 8 bytes.  
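
The layout described in the example above can be inspected directly on an array. The following is a minimal sketch; the variable names `arr`, `i` and `offset` are illustrative and not part of the original notebook.

```python
import numpy as np

# Eight 64-bit integers stored in one contiguous block
arr = np.arange(8, dtype=np.int64)

print(arr.itemsize)  # bytes per element: 8
print(arr.nbytes)    # total bytes: 8 * 8 = 64 (512 bits)
print(arr.strides)   # bytes to step to the next element: (8,)

# Offset (in bytes) of the element at index i from the start of the block
i = 5
offset = i * arr.strides[0]
print(offset)        # 40
```

No per-element address is stored anywhere: the position of every element follows from the start of the block and the element size.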

Apart from the computational advantages, Numpy code can also be more elegant and readable, thanks to vectorized operations and broadcasting.
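
As a small illustration of these two ideas (the arrays and their names are made up for this sketch), a vectorized operation applies to every element without an explicit loop, and broadcasting lets arrays of compatible but different shapes be combined:

```python
import numpy as np

prices = np.array([10.0, 20.0, 30.0])

# Vectorized operation: the multiplication is applied to every element
discounted = prices * 0.9

# Broadcasting: a (2, 3) matrix plus a (3,) vector;
# the vector is added to each row of the matrix
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])
row = np.array([10, 20, 30])
shifted = matrix + row

print(discounted)
print(shifted)  # [[11 22 33]
                #  [14 25 36]]
```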

**Summing it up:** essentially, lists are a good fit when you need to hold different data types in the same variable, when your dataset is small, or when you simply don't have a performance requirement. However, when you need to manipulate large amounts of numeric data and performance matters, you should go for Numpy: in most cases, it is the best choice for scientific computing! 

## Demonstration and performance comparison: Dot Product
> A and B are n-dimensional vectors  
> A's dimension: 1 x n  
> B's dimension: 1 x n  
> C = A . B = A \* B^T = A1 \* B1 + A2 \* B2 + A3 \* B3 + ... + An \* Bn  
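
To make the formula concrete, here is a tiny worked example with n = 3. The vectors `a_small` and `b_small` are hypothetical values chosen for this sketch, not data from the notebook.

```python
import numpy as np

a_small = [1, 2, 3]
b_small = [4, 5, 6]

# Sum of the element-wise products: 1*4 + 2*5 + 3*6
manual = sum(a * b for a, b in zip(a_small, b_small))

# The same computation with Numpy
with_numpy = np.dot(np.array(a_small), np.array(b_small))

print(manual)      # 32
print(with_numpy)  # 32
```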

In [57]:
import numpy as np

In [71]:
# Dot product using Lists
def dot_product_list(A, B):
    return sum([A[i] * B[i] for i in range(len(A))])

# Dot product using Numpy
def dot_product_numpy(A, B):
    return np.dot(A, B)

In [72]:
# Defining two large lists of floats and their corresponding ndarrays
A = [i/5.5 for i in range(1000000)]
B = [i/8.8 for i in range(1000000)]
A_numpy = np.array(A)
B_numpy = np.array(B)

In [73]:
# Measuring the time it takes for the dot product to be executed using Lists
%timeit C = dot_product_list(A, B)

193 ms ± 3.24 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [79]:
# Measuring the time it takes for the dot product to be executed using Numpy
%timeit C_numpy = dot_product_numpy(A_numpy, B_numpy)

686 µs ± 44.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


The execution times speak for themselves! On this run, Numpy was roughly 280 times faster (686 µs versus 193 ms).
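
Note that `%timeit` is an IPython magic, so it only works in a notebook or an IPython shell. In a plain Python script, a comparable measurement can be sketched with the standard library's `timeit` module. The vector size, the repeat count and the names `xs`, `ys`, `xs_np`, `ys_np` are arbitrary choices for this sketch, and the absolute numbers will differ from machine to machine.

```python
import timeit

import numpy as np

def dot_product_list(A, B):
    return sum([A[i] * B[i] for i in range(len(A))])

n = 100_000  # smaller than in the notebook, to keep the script quick
xs = [i / 5.5 for i in range(n)]
ys = [i / 8.8 for i in range(n)]
xs_np = np.array(xs)
ys_np = np.array(ys)

# Total time, in seconds, for 10 calls of each version
runs = 10
t_list = timeit.timeit(lambda: dot_product_list(xs, ys), number=runs)
t_numpy = timeit.timeit(lambda: np.dot(xs_np, ys_np), number=runs)

print(f"list:  {t_list / runs * 1e3:.3f} ms per call")
print(f"numpy: {t_numpy / runs * 1e3:.3f} ms per call")

# Both versions should agree up to floating-point rounding
same_result = np.isclose(dot_product_list(xs, ys), np.dot(xs_np, ys_np))
print(same_result)
```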

## Exploring some nice Numpy features:

### Custom Data Types:

Numpy lets you choose the data type of an array explicitly, such as float16 or float32.  
While this is a nice feature for adding more control over your data types, you must be careful, because using fewer bits means less precision and a smaller range of representable values. In scientific computing, using higher-precision data types, such as float64, is highly encouraged.

In [81]:
type(A_numpy)

numpy.ndarray

In [87]:
type(A_numpy[0])

numpy.float64

In [90]:
A_numpy = A_numpy.astype(np.float16)

In [91]:
type(A_numpy[0])

numpy.float16
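
The cost of the smaller type can be seen on a few sample values. This is a minimal sketch with made-up numbers; the last one is close to the largest entry of `A_numpy` (999999 / 5.5 ≈ 181818), which lies beyond the float16 range.

```python
import numpy as np

values = np.array([0.1, 1000.1, 181818.0])  # float64 by default

as_f32 = values.astype(np.float32)

# Recent Numpy versions emit an overflow warning on this cast;
# it is silenced here because the overflow is the point of the example
with np.errstate(over="ignore"):
    as_f16 = values.astype(np.float16)

print(as_f32)
print(as_f16)  # 1000.1 is rounded to 1000.0, 181818.0 overflows to inf

# Largest finite value that float16 can represent
print(np.finfo(np.float16).max)  # 65504.0
```

So after the conversion above, every entry of `A_numpy` larger than 65504 is stored as `inf`, and the remaining entries keep only about three significant decimal digits.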

### Useful ndarray attributes:

In [94]:
A_numpy.ndim

1

In [95]:
A_numpy.shape

(1000000,)
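
A few more attributes are worth knowing. The sketch below uses a small, hypothetical 2-D array `M` so that `ndim` and `shape` are easier to tell apart than on a 1-D array.

```python
import numpy as np

M = np.arange(6, dtype=np.float64).reshape(2, 3)

print(M.ndim)      # number of dimensions: 2
print(M.shape)     # size along each dimension: (2, 3)
print(M.size)      # total number of elements: 6
print(M.dtype)     # data type of the elements: float64
print(M.itemsize)  # bytes per element: 8
print(M.nbytes)    # total bytes: 48
```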

This and much more can be found in the [official Numpy documentation](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.html)!