# 1. Introduction to NumPy (LAYG - Learn-As-You-Go)

**References:**
* http://www.numpy.org/
* https://docs.scipy.org/doc/numpy/user/quickstart.html
* https://www.datacamp.com/community/tutorials/python-numpy-tutorial
* https://blog.thedataincubator.com/2018/02/numpy-and-pandas/
* https://medium.com/@ericvanrees/pandas-series-objects-and-numpy-arrays-15dfe05919d7
* https://www.machinelearningplus.com/python/numpy-tutorial-part1-array-python-examples/
* https://towardsdatascience.com/a-hitchhiker-guide-to-python-numpy-arrays-9358de570121
* McKinney, Wes. Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython. O'Reilly Media. Kindle Edition

## 1.1 What is NumPy?
* NumPy is short for "Numerical Python" and is the fundamental Python package for scientific computing.
* Its core data structure is a high-performance multi-dimensional array object known as the **n-dimensional array** or **ndarray**, used for efficient computation on arrays and matrices.

## 1.2 What is an Array?
* Python arrays are data structures that store data much like a list, except that the type of the objects stored in them is constrained.
* Elements of an array are all of the same type and are indexed by non-negative integers (for a multi-dimensional NumPy array, by a tuple of them, one per dimension).
* The Python `array` module lets you specify the type of the array at creation time with a type code, which is a single character (e.g. `'i'` for signed int). You can read more about each type code here: https://docs.python.org/3/library/array.html?highlight=array#module-array

In [1]:
import array

In [2]:
array_one = array.array('i',[1,2,3,4])
print(type(array_one))
array_one

<class 'array.array'>


array('i', [1, 2, 3, 4])
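The type constraint is enforced whenever an element is added. A minimal sketch (the variable name `ints` is just for illustration) showing that an `'i'` array accepts another integer but rejects a float:

```python
import array

ints = array.array('i', [1, 2, 3, 4])  # 'i' = signed int type code
ints.append(5)                         # same type: accepted

rejected = False
try:
    ints.append(2.5)                   # a float is not a valid 'i' element
except TypeError as err:
    rejected = True
    print("TypeError:", err)

print(ints)           # array('i', [1, 2, 3, 4, 5])
print(ints.itemsize)  # bytes per element; platform-dependent (commonly 4)
```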

## 1.3 What is a NumPy N-Dimensional Array (ndarray)?
* It is an efficient multidimensional array providing fast array-oriented arithmetic operations.
* Like any other array, an ndarray is a container for homogeneous data (elements of the same type).
* In NumPy, an ndarray is usually referred to simply as an array.
* As with other container objects in Python, the contents of an ndarray can be accessed and modified by indexing or slicing operations.
* For numerical data, NumPy arrays are more efficient for storing and manipulating data than the other built-in Python data structures. 

In [3]:
array_one[0]

1
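Note that the cell above indexes `array_one`, the `array.array` from section 1.2. An ndarray is indexed and modified in the same way, and it also exposes its number of dimensions, shape and element type as attributes. A short sketch (the name `nd` is illustrative):

```python
import numpy as np

nd = np.array([[1, 2, 3], [4, 5, 6]])  # homogeneous: every element shares one dtype

print(nd[0, 1])  # indexing works like any container: 2
nd[0, 1] = 20    # ...and so does modifying in place
print(nd.ndim)   # number of dimensions: 2
print(nd.shape)  # size of each dimension: (2, 3)
print(nd.dtype)  # element type; the default integer is platform-dependent (int64 or int32)
```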

## 1.4 Creating NumPy arrays

NumPy offers several built-in functions for creating arrays.

It's idiomatic to import NumPy with the statement

```python
import numpy as np
```

and use the abbreviated `np` namespace.

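```python
import numpy as np
x = np.array([2, 3, 11])                     # from a list; integer dtype inferred
x = np.array([[1, 2.], [0, 0], [1+1j, 2.]])  # 2-D; mixed values are upcast to complex
x = np.arange(-10, 10, 2, dtype=float)       # -10., -8., ..., 8. (stop value excluded)
x = np.zeros((2, 4))                         # 2 x 4 array of 0.
x = np.ones((3, 4))                          # 3 x 4 array of 1.
x = np.linspace(1., 4., 6)                   # 6 evenly spaced values from 1. to 4., inclusive
# x = np.fromfile('foo.dat')                 # reads raw binary data; needs an existing file 'foo.dat'
```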

NumPy arrays can be created from regular Python lists of floats, integers, strings, etc., and the element type will be inferred. However, it's also possible (and often a good idea for readability/clarity) to specify the data type explicitly using the optional keyword `dtype`. There are several other ways to create arrays, such as `arange`, `linspace`, etc.
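A quick sketch of dtype inference versus an explicit `dtype` (variable names are illustrative): a single float upcasts the whole array, strings become a fixed-width Unicode type, and `dtype=` overrides the inference.

```python
import numpy as np

ints = np.array([1, 2, 3])                        # all ints -> an integer dtype
mixed = np.array([1, 2.5, 3])                     # one float upcasts everything to float64
texts = np.array(['a', 'bc'])                     # strings -> fixed-width Unicode ('<U2')
explicit = np.array([1, 2, 3], dtype=np.float32)  # dtype stated explicitly

print(ints.dtype, mixed.dtype, texts.dtype, explicit.dtype)
```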

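Don't forget we can use the introspection features of IPython to show the documentation and signature of NumPy functions, e.g. `np.ones?` (or `help(np.ones)` outside IPython). For example, `np.ones` takes the shape of the array as a tuple: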

In [4]:
import numpy as np
x = np.ones((10,1))
x

array([[1.],
       [1.],
       [1.],
       [1.],
       [1.],
       [1.],
       [1.],
       [1.],
       [1.],
       [1.]])

First, check which NumPy version is installed:

In [5]:
import numpy as np
np.__version__

'1.24.3'

We can create a NumPy array from a list:

In [6]:
list_one = [1,2,3,4,5,6,7,8,9,20]
list_one

[1, 2, 3, 4, 5, 6, 7, 8, 9, 20]

In [7]:
numpy_array = np.array(list_one)
type(numpy_array)

numpy.ndarray

In [8]:
numpy_array

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 20])

Or with any of the other creation functions described earlier:

In [9]:
another_numpy_array = np.arange(1,20)
another_numpy_array

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
       18, 19])

In [10]:
array_of_zeros = np.zeros((5,5))
array_of_zeros

array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]])

In [11]:
array_of_ones = np.ones((6,2))
array_of_ones

array([[1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.]])

## 1.5 Advantages of NumPy Arrays

### 1.5.1 Blazing fast

If you've heard that Python is slow for numerical computations, it's largely due to dynamic typing, often described with the colloquial phrase "duck typing". In Python, we don't declare variable types (e.g. string, float, int, etc.) when we create variables; types are determined at runtime. In summary,

**Duck Typing:**

   * If it looks like a duck and quacks like a duck, then it is a duck.
   * Closely related to dynamic typing: types are checked at runtime rather than declared.
   * Dynamic typing requires lots of metadata around every value (each Python object carries its own type information and reference count).

This runtime overhead is paid again for every element, leading to poor performance in numerical computations.

**Solution**: NumPy data structures

   * Array objects with a single element type and contiguous storage.
   * Common operations implemented in C.

Contiguous storage means that the array data is stored in one continuous "chunk" of memory, i.e. the elements of an array sit next to each other in memory in the order they appear in the array. This improves performance by reducing "[cache misses](https://www.quora.com/What-is-a-cache-miss)" and other low-level performance issues.
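You can inspect this layout directly: `flags['C_CONTIGUOUS']` reports whether the data is one contiguous row-major block, and `strides` gives how many bytes to step to move one position along each dimension. A small sketch using an explicit `int64` dtype so the byte counts do not depend on the platform:

```python
import numpy as np

a = np.arange(12, dtype=np.int64).reshape(3, 4)  # 3 x 4, 8 bytes per element

print(a.flags['C_CONTIGUOUS'])  # True: rows are laid out one after another
print(a.itemsize)               # 8 bytes per element
print(a.strides)                # (32, 8): next row is 4 * 8 bytes away, next column 8 bytes

col = a[:, 0]                     # a column view has to skip over memory
print(col.strides)                # (32,): consecutive column elements are 32 bytes apart
print(col.flags['C_CONTIGUOUS'])  # False
```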

Most NumPy operations can be expected to perform close to what you would expect from compiled C code.

The fact that NumPy arrays are objects makes them slightly different from arrays in C. NumPy arrays have attributes that can be queried and changed, e.g. the shape or data type of an array.

#### 1.5.1.1 How slow is Python?

To demonstrate the efficacy of NumPy, we'll perform the same operation, i.e. adding 1 to each of ten million numbers, first using a Python list comprehension and then using a NumPy array.

We use the `%timeit` magic function from [IPython](https://ipython.org/), which runs the statement that follows it several times and reports statistics on how long it takes to execute. First, the Python list comprehension:

In [12]:
%timeit [i+1 for i in range(10000000)]

638 ms ± 25 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


And now the NumPy equivalent. Here the `+ 1` is *broadcast* across the array, i.e. each element has `1` added to it:

In [13]:
import numpy as np
%timeit np.arange(10000000) + 1

25.7 ms ± 2.27 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


Here, adding 1 to ten million numbers with NumPy is roughly 25 times faster than with a Python list comprehension (25.7 ms vs. 638 ms on this machine), and a list comprehension is itself usually faster than an explicit `for` loop in pure Python.
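`%timeit` only exists in IPython/Jupyter. In a plain Python script, the standard-library `timeit` module gives a comparable measurement. A rough sketch; the size `N` and the repeat counts are arbitrary choices, and the timings will vary by machine:

```python
import timeit

import numpy as np

N = 1_000_000  # smaller than the notebook's ten million to keep the run short

# First check that both approaches compute the same thing
py_result = [i + 1 for i in range(N)]
np_result = np.arange(N) + 1
assert np_result.tolist() == py_result

# Best of 5 repeats, each running the statement 3 times, reported per run
py_time = min(timeit.repeat(lambda: [i + 1 for i in range(N)], number=3, repeat=5)) / 3
np_time = min(timeit.repeat(lambda: np.arange(N) + 1, number=3, repeat=5)) / 3

print(f"list comprehension: {py_time * 1e3:.1f} ms per loop")
print(f"NumPy:              {np_time * 1e3:.1f} ms per loop")
print(f"speed-up:           {py_time / np_time:.0f}x")
```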

### 1.5.2 Memory
* NumPy internally stores data in a contiguous block of memory, independent of other built-in Python objects.
* NumPy arrays take significantly less memory than Python lists holding the same numbers.

In [14]:
import numpy as np
import sys

In [15]:
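python_list = [1,2,3,4,5,6]

# Size in bytes of one Python int object (28 on 64-bit CPython) * number of elements.
# This ignores the list's own array of pointers, so it underestimates the list's footprint.
python_list_size = sys.getsizeof(10) * len(python_list)
python_list_size

168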

In [16]:
python_numpy_array = np.array([1,2,3,4,5,6])

# Bytes per element (4 here, since the default integer dtype is int32 on this platform) * number of elements; same as .nbytes
python_numpy_array_size = python_numpy_array.itemsize * python_numpy_array.size
python_numpy_array_size

24
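The cells above only count the integer objects and the element bytes. A somewhat fairer sketch also includes the list's own array of pointers (`sys.getsizeof` on the list itself) and uses the same 1,000 values for both containers; `nbytes` is NumPy's shorthand for `itemsize * size`. Exact numbers depend on the Python build and platform:

```python
import sys

import numpy as np

values = list(range(1000))

# A list stores pointers to separate int objects, so count both.
# (CPython shares small ints, so this slightly overcounts.)
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)

arr = np.array(values, dtype=np.int64)  # fixed 8-byte elements in one block
array_bytes = arr.nbytes                # data buffer only: itemsize * size = 8000

print(f"list:  {list_bytes} bytes")
print(f"array: {array_bytes} bytes")
```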

### 1.5.3 Vectorized Operations
* The key difference between an array and a list is that arrays are designed to handle vectorized operations, while Python lists are not.
* NumPy operations perform computations on entire arrays without the need for Python `for` loops.
* In other words, if you apply a function to an array, it is performed on every item in the array, rather than on the whole array object.
* With a Python list, you have to loop over the elements yourself.

In [17]:
list_two = [2023,2023,2023,2023,2023]
# The following will throw an error:
list_two + 1

TypeError: can only concatenate list (not "int") to list

* Performing a loop to add **1** to every integer in the list

In [18]:
for index, item in enumerate(list_two):
    list_two[index] = item + 1
list_two

[2024, 2024, 2024, 2024, 2024]

* With a NumPy array, you can do the same simply by doing the following:

In [19]:
python_numpy_array = np.array([2023,2023])
python_numpy_array

array([2023, 2023])

In [20]:
python_numpy_array + 1

array([2024, 2024])

* Any arithmetic operation between equal-size arrays is applied element-wise.
* Arithmetic operators (`+`, `-`, `/`, `*`, `**`) are overloaded to work in an element-by-element fashion, and comparison operators (`>`, `<`, `==`, etc.) likewise return an element-wise boolean array:

In [21]:
numpy_array_one = np.array([1,2])
numpy_array_two = np.array([4,6])

In [22]:
numpy_array_one + numpy_array_two

array([5, 8])

In [23]:
numpy_array_one > numpy_array_two

array([False, False])
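The remaining operators follow the same element-wise rule. A short sketch reusing the same values (renamed `a` and `b` for brevity); it also shows that a boolean array produced by a comparison can be used to select elements:

```python
import numpy as np

a = np.array([1, 2])
b = np.array([4, 6])

print(a - b)   # [-3 -4]
print(a * b)   # [ 4 12]
print(b / a)   # [4. 3.]  true division always returns floats
print(a ** 2)  # [1 4]

mask = b > 5   # element-wise comparison gives a boolean array
print(mask)    # [False  True]
print(b[mask]) # [6]  boolean arrays can select elements
```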

Another speed comparison, this time squaring the sine of a million numbers:

In [24]:
import math
%timeit [math.sin(i) ** 2 for i in range(1000000)]

215 ms ± 4.39 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [25]:
%timeit np.sin(np.arange(1000000)) ** 2

9.59 ms ± 359 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


## 1.6 Array Functions

NumPy has many built-in functions and methods for reshaping, slicing, inspecting arrays, etc.

In [26]:
import numpy as np
x = np.arange(1,17).reshape(4,4)
x

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12],
       [13, 14, 15, 16]])

Here we create a one-dimensional array containing the integers 1 through 16 with the `np.arange()` function (the stop value 17 is excluded). Then we reshape it into a 4 x 4 array. Keep in mind that the memory is still contiguous and the reshape operation merely offers a different 'view' of the same data. It also allows us to use multiple indexes and special notation to retrieve parts of the data, e.g.

In [27]:
x[2,2]

11

An explanation of the notation used in the operation above: the comma (`,`) separates the dimensions of the array in the indexing operation, and indexes start at 0. So `x[2,2]` selects row index 2 (the third row) along the first dimension and column index 2 (the third column) along the second, giving `11`. A colon (`:`) in place of an index represents *all* of that dimension: `x[:,0]` would select all rows of the first column (index 0).

Getting parts of an array with the colon notation (e.g. `x[1:3, :]`) is called *slicing* in NumPy; selecting a single element with integers, as in `x[2,2]`, is called *indexing*.
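A few more selections on the same 4 x 4 array `x`, sketched with the resulting values noted in comments:

```python
import numpy as np

x = np.arange(1, 17).reshape(4, 4)

first_col = x[:, 0]    # all rows, column 0: values 1, 5, 9, 13
second_row = x[1, :]   # row 1, all columns: values 5, 6, 7, 8
middle = x[1:3, 1:3]   # rows 1-2, columns 1-2: [[6, 7], [10, 11]]

print(first_col, second_row, middle, sep="\n")
```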

In [28]:
x.shape

(4, 4)

`shape` is an attribute of the `x` NumPy array object. It reports `(4, 4)` because of the `reshape()` call; before reshaping, the array returned by `np.arange(1,17)` had shape `(16,)`.
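Other common attributes can be read the same way, and the 'view' claim can be checked with `np.shares_memory`: writing through the reshaped array changes the original one-dimensional array too. A short sketch (the names `flat` and `grid` are illustrative):

```python
import numpy as np

flat = np.arange(1, 17)    # shape (16,)
grid = flat.reshape(4, 4)  # shape (4, 4), same underlying data

print(flat.shape, grid.shape)        # (16,) (4, 4)
print(grid.ndim, grid.size)          # 2 16
print(np.shares_memory(flat, grid))  # True: reshape returned a view here

grid[0, 0] = 100  # write through the 2-D view...
print(flat[0])    # ...and the 1-D array sees it: 100
```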

## 1.7 Basic Indexing and Slicing

### 1.7.1 One-Dimensional Array
* When it comes to slicing and indexing, one-dimensional arrays behave much like Python lists.

In [29]:
import numpy as np
numpy_array = np.arange(5)
numpy_array

array([0, 1, 2, 3, 4])

In [30]:
numpy_array[2]

2

In [31]:
numpy_array[1:4]

array([1, 2, 3])

* You can slice the array and assign the slice to a variable.
* Unlike slicing a Python list, which creates a copy, slicing a NumPy array creates a *view* that shares data with the original array. Any change you make to the slice is therefore also made to the original array object. Use `.copy()` when you need an independent array.

In [None]:
numpy_array_slice = numpy_array[1:4]
numpy_array_slice

In [None]:
numpy_array_slice[1] = 10
numpy_array_slice

In [None]:
numpy_array
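The cells above write to a slice and then inspect `numpy_array`. The sketch below contrasts this with a Python list, whose slices are copies, and shows `.copy()` for when an independent NumPy array is wanted:

```python
import numpy as np

py_list = [0, 1, 2, 3, 4]
list_slice = py_list[1:4]  # list slicing copies the elements
list_slice[1] = 10
print(py_list)             # [0, 1, 2, 3, 4]  (unchanged)

arr = np.arange(5)
view = arr[1:4]            # NumPy slicing returns a view
view[1] = 10
print(arr)                 # [ 0  1 10  3  4]  (changed)

arr2 = np.arange(5)
independent = arr2[1:4].copy()  # an explicit copy breaks the link
independent[1] = 10
print(arr2)                # [0 1 2 3 4]  (unchanged)
```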

### 1.7.2 Two-Dimensional Array
* In a two-dimensional array, each element (row) of the array is a one-dimensional array.

In [None]:
numpy_two_dimensional_array = np.array([[1,2,3],[4,5,6],[7,8,9]])

In [None]:
numpy_two_dimensional_array

In [None]:
numpy_two_dimensional_array[1]

* To access a specific element, you can index twice, as in `[1][2]`, or, more idiomatically, pass both index values in one pair of brackets separated by a comma, as in `[1,2]`:

In [None]:
numpy_two_dimensional_array[1][2]

In [None]:
numpy_two_dimensional_array[1,2]

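* Slicing two-dimensional arrays is a little different from slicing one-dimensional ones: a single slice such as `[:2]` selects rows, while a comma-separated pair such as `[:2,1:]` slices rows and columns.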

In [None]:
numpy_two_dimensional_array

In [None]:
numpy_two_dimensional_array[:1]

In [None]:
numpy_two_dimensional_array[:2]

In [None]:
numpy_two_dimensional_array[:3]

In [None]:
numpy_two_dimensional_array[:2,1:]

In [None]:
numpy_two_dimensional_array[:2,:1]

In [None]:
numpy_two_dimensional_array[2][1:]
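The slices above can be checked against their expected values. A sketch reusing the same 3 x 3 array (renamed `a` for brevity); note that a slice keeps a dimension while an integer index removes it, so `[:2, :1]` is a 2 x 1 array but `[:2, 0]` is one-dimensional:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

assert a[:1].tolist() == [[1, 2, 3]]             # first row, still 2-D
assert a[:2].tolist() == [[1, 2, 3], [4, 5, 6]]  # first two rows
assert a[:3].tolist() == a.tolist()              # all three rows
assert a[:2, 1:].tolist() == [[2, 3], [5, 6]]    # rows 0-1, columns 1-2
assert a[:2, :1].tolist() == [[1], [4]]          # rows 0-1, column 0
assert a[2][1:].tolist() == [8, 9]               # row 2, columns 1-2

# A slice keeps the dimension; an integer index removes it
print(a[:2, :1].shape)  # (2, 1)
print(a[:2, 0].shape)   # (2,)
```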