In [1]:
import numpy as np # np is a commonly-used alias for the numpy library

## Introduction to NumPy

NumPy is a Python library that provides support for large, **multi-dimensional arrays and matrices**, along with a collection of mathematical functions to operate on these arrays. NumPy is your first choice for linear algebra and any kind of scientific computing in Python.

<center>
<img src="imgs/numpy.png" width=400>
</center>

### Why use NumPy when we have Python lists?
NumPy arrays are faster and more memory-efficient than Python lists. NumPy arrays are homogeneous: all elements of a NumPy array have the same data type. This lets NumPy store the elements compactly and run operations on whole arrays in compiled code rather than in a Python loop, which is much faster than processing a list element by element. In machine learning and data science, where you often work with large datasets, computations written using only native Python data structures quickly become impractically slow.
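The homogeneity and memory claims above can be checked directly. The following is a minimal sketch; the variable names (`ints`, `mixed`, `py_list`) are illustrative, and the list measurement is only a rough estimate that counts the list object plus its integer objects.

```python
import sys
import numpy as np

# Every element of a NumPy array shares one dtype
ints = np.arange(1000, dtype=np.int64)
print(ints.dtype)     # int64
print(ints.itemsize)  # 8 bytes per element
print(ints.nbytes)    # 8000 bytes for the whole data buffer

# Mixing integers and floats upcasts everything to a common type
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)    # float64

# A Python list stores references to separate int objects,
# so its rough memory footprint is considerably larger
py_list = list(range(1000))
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)
print("Approximate bytes used by the list:", list_bytes)
```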

### Let's compare the performance of NumPy arrays and Python lists on a simple task:

In [2]:
from math import sqrt, cos 
import time 

# With Python lists
start_time = time.time() # Record the start time

# Create a sequence of the integers from 0 to 999 999 (range is lazy; the results are collected in a list below)
data = range(1000000)

# Perform some operation on the dataset (first take the square root, then find the cosine)
list_results = []
for x in data:
    list_results.append(cos(sqrt(x)))

time_elapsed_python = time.time() - start_time # Calculate the time elapsed

print("First 5 elements of the results:")
print(list_results[:5])
print("Time elapsed using Python lists:", round(time_elapsed_python*1000), "milliseconds")

First 5 elements of the results:
[1.0, 0.5403023058681398, 0.15594369476537437, -0.16055653857469052, -0.4161468365471424]
Time elapsed using Python lists: 614 milliseconds


In [3]:
# With NumPy arrays

start_time = time.time() # Record the start time

# Create an array of the integers from 0 to 999 999
data = np.arange(1000000)

# Perform some operation on the dataset (first take the square root, then find the cosine)
numpy_results = np.cos(np.sqrt(data))

time_elapsed_numpy = time.time() - start_time # Calculate the time elapsed

print("First 5 elements of the results:")
print(numpy_results[:5])
print("Time elapsed using NumPy arrays:", round(time_elapsed_numpy*1000), "milliseconds")
print("NumPy was", round(time_elapsed_python/time_elapsed_numpy, 2), "times faster than Python lists in this task")

First 5 elements of the results:
[ 1.          0.54030231  0.15594369 -0.16055654 -0.41614684]
Time elapsed using NumPy arrays: 104 milliseconds
NumPy was 5.88 times faster than Python lists in this task
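A single `time.time()` measurement can vary noticeably from run to run. As a hedged alternative, the sketch below repeats the same comparison with the standard library's `timeit` module and keeps the best of several runs. The helper names (`with_python_list`, `with_numpy`) and the smaller input size are assumptions made for this illustration; the actual timings depend on your machine.

```python
import timeit
from math import sqrt, cos
import numpy as np

n = 100_000  # smaller than the 1 000 000 used above so the repeats stay quick

def with_python_list():
    # Pure Python: one interpreter-level call per element
    return [cos(sqrt(x)) for x in range(n)]

def with_numpy():
    # NumPy: each function is applied to the whole array at once
    return np.cos(np.sqrt(np.arange(n)))

# Run each function 5 times (once per measurement) and keep the fastest run
best_python = min(timeit.repeat(with_python_list, number=1, repeat=5))
best_numpy = min(timeit.repeat(with_numpy, number=1, repeat=5))

print("Python list:", round(best_python * 1000, 2), "milliseconds")
print("NumPy array:", round(best_numpy * 1000, 2), "milliseconds")

# Both approaches compute the same values (up to floating-point rounding)
same_values = np.allclose(with_python_list(), with_numpy())
print("Same results:", same_values)
```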


### Basics of NumPy arrays

A one-dimensional array will often be called a **vector**.

NumPy arrays are created using the `np.array` function. This function takes an array-like object (such as a Python list) and returns a NumPy array.

The elements of a NumPy array are accessed using square brackets and indexed from zero, just like Python lists.

In [4]:
some_numbers = [0.1, 0.2, 0.4, 0.2, 0.8, 1.2, 0.7, 0.1, 0.9, 0.3]
vector = np.array(some_numbers)

print("First element of the array:", vector[0])
print("Third element of the array:", vector[2])
print("Last element of the array:", vector[-1])
print("Second to fourth elements of the array:", vector[1:4])

First element of the array: 0.1
Third element of the array: 0.4
Last element of the array: 0.3
Second to fourth elements of the array: [0.2 0.4 0.2]


In [5]:
# You can create an array of zeros, ones, random numbers, or a range of numbers using NumPy functions. These are faster than passing a list to np.array.

zero_array = np.zeros(5)
print("Zero array:", zero_array)

ones_array = np.ones(5)
print("Ones array:", ones_array)

random_array = np.random.rand(5)
print("Random array:", random_array)

range_array = np.arange(0, 18, 3) # Start at 0, stop before 18, step by 3
print("Range array:", range_array)

Zero array: [0. 0. 0. 0. 0.]
Ones array: [1. 1. 1. 1. 1.]
Random array: [0.52563768 0.6186933  0.08518754 0.93573703 0.61328551]
Range array: [ 0  3  6  9 12 15]
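The random array above changes on every run. If you need reproducible random numbers, one option is a seeded generator created with `np.random.default_rng`. This is a minimal sketch; the seed value `42` and the variable names are arbitrary choices for illustration.

```python
import numpy as np

# Two generators created with the same seed produce the same numbers
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)

sample_a = rng_a.random(5)  # 5 floats drawn uniformly from [0, 1)
sample_b = rng_b.random(5)

print("First sample: ", sample_a)
print("Second sample:", sample_b)
print("Identical:", np.array_equal(sample_a, sample_b))
```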


In [8]:
# Calculations with NumPy arrays are vectorized, meaning that you can perform operations on entire arrays at once.
# Let's create two arrays and add them together.

first_array = np.array([1, 1, 2, 3, 5, 8, 13, 21, 34, 55])
second_array = np.array([2, 3, 5, 7, 11, 13, 17, 19, 23, 29])

sum_array = first_array + second_array
print("Sum of the arrays:", sum_array)

# You can also multiply, subtract, divide, and apply other mathematical operations to arrays.

product_array = first_array * second_array
print("Product of the arrays:", product_array)

# Note that these operations are not element-wise on native Python lists. What would happen if you tried to add two Python lists together?
# In what way is this different from adding two NumPy arrays?

Sum of the arrays: [ 3  4  7 10 16 21 30 40 57 84]
Product of the arrays: [   2    3   10   21   55  104  221  399  782 1595]
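The difference between lists and arrays raised in the cell above can be sketched as follows. The names `list_a` and `list_b` are illustrative.

```python
import numpy as np

list_a = [1, 2, 3]
list_b = [4, 5, 6]

# Adding two lists concatenates them
concatenated = list_a + list_b
print(concatenated)  # [1, 2, 3, 4, 5, 6]

# Adding two arrays adds them element by element
elementwise = np.array(list_a) + np.array(list_b)
print(elementwise)   # [5 7 9]

# Multiplying two lists is not defined at all
try:
    list_a * list_b
    multiplication_failed = False
except TypeError as error:
    multiplication_failed = True
    print("TypeError:", error)
```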


In [9]:
# You can also apply mathematical functions to entire arrays at once.

squared_array = first_array ** 2
print("Squared array:", squared_array)

# You can also apply functions that are not part of the standard Python math library, such as the sigmoid function,
# which is often used in machine learning as an activation function.
# It squashes the input values into the range (0, 1), which is useful for binary classification tasks.
# We will implement the sigmoid function and apply it to an array.

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

sigmoid_array = sigmoid(first_array)
print("Sigmoid array:", sigmoid_array)

Squared array: [   1    1    4    9   25   64  169  441 1156 3025]
Sigmoid array: [0.73105858 0.73105858 0.88079708 0.95257413 0.99330715 0.99966465
 0.99999774 1.         1.         1.        ]
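For large negative inputs, `np.exp(-x)` in the sigmoid above can overflow, which NumPy typically reports as a `RuntimeWarning`. One common workaround, sketched here under the assumption that the input is a NumPy array, is to use a different but mathematically equivalent formula for negative values. The name `stable_sigmoid` is hypothetical and not part of NumPy.

```python
import numpy as np

def stable_sigmoid(x):
    """Sigmoid that avoids calling exp on large positive numbers."""
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    positive = x >= 0
    # For x >= 0, exp(-x) is at most 1, so it cannot overflow
    out[positive] = 1 / (1 + np.exp(-x[positive]))
    # For x < 0, use the equivalent form exp(x) / (1 + exp(x))
    exp_x = np.exp(x[~positive])
    out[~positive] = exp_x / (1 + exp_x)
    return out

extreme_values = np.array([-1000.0, -1.0, 0.0, 1.0, 1000.0])
stable_result = stable_sigmoid(extreme_values)
print(stable_result)
```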


### Multi-dimensional arrays

A two-dimensional array will often be called a **matrix**. You can create a matrix by passing a list of lists to `np.array`. The outer list represents the **rows of the matrix**, and the inner lists represent the **elements in each row**.

In [10]:
matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(matrix)

# You can access elements in a matrix by providing the row and column indices in square brackets.

print("First element of the first row:", matrix[0, 0]) 
print("Second element of the third row:", matrix[2, 1])

# You can also access entire rows or columns using the `:` symbol.

print("First row:", matrix[0, :])
print("Second column:", matrix[:, 1])

[[1 2 3]
 [4 5 6]
 [7 8 9]]
First element of the first row: 1
Second element of the third row: 8
First row: [1 2 3]
Second column: [2 5 8]


### Shape of an array

Trying to visualise higher-dimensional arrays can be a bit tricky. A 3-dimensional array can be thought of as a cube of numbers, a 4-dimensional array as a row of such cubes, and so on. The generalisation of vectors and matrices to any number of dimensions is called a **tensor**, and you will soon encounter them in the context of deep learning. A **vector** is a tensor of **rank 1**, a **matrix** is a tensor of **rank 2**, and so on.

Instead of trying to wrap one's head around multidimensional hypercubes, it's often easier to think of tensors in terms of their **shape**. The shape of an array tells you how many elements there are in each dimension. You can access the shape of an array using the `shape` attribute.

So, for example, a vector of 10 numbers has shape `(10,)`, a 3x4 matrix has shape `(3, 4)`, and a 2x2x2 cube of numbers has shape `(2, 2, 2)`.
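These shapes can be verified with a short sketch. The array names are illustrative, and `ndim` is the NumPy attribute that gives the number of dimensions (the rank). Note that the shape of a vector is a one-element tuple, which Python prints as `(10,)`.

```python
import numpy as np

vec10 = np.zeros(10)            # vector of 10 numbers
mat34 = np.zeros((3, 4))        # 3x4 matrix
cube222 = np.zeros((2, 2, 2))   # 2x2x2 cube of numbers

for name, array in [("vector", vec10), ("matrix", mat34), ("cube", cube222)]:
    print(name, "| shape:", array.shape, "| rank:", array.ndim, "| elements:", array.size)
```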

In [11]:
print("Shape of the matrix:", matrix.shape)

Shape of the matrix: (3, 3)


### Concatenation (joining) of arrays

Let's imagine we have a matrix A of shape (3, 3), and we want to add a new row to it. We can do this by creating a second array B that holds the new row and using the `np.concatenate` function to join the two. The `axis` parameter specifies the axis along which the arrays are joined. With `axis=0`, the arrays are stacked vertically, so the result gains rows; with `axis=1`, they are placed side by side, so the result gains columns.

Note that, when adding new rows to a matrix, the number of columns must be the same so that the result remains rectangular. Similarly, when adding new columns, the number of rows must match. Remember that the dimensions of the arrays you are joining must **match at all but the joining axis**. So we can join an array of shape (3, 3, 2) with an array of shape (1, 3, 2) along axis 0, but not along axis 1.
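The rule for the three-dimensional example above can be sketched as follows. The array names are illustrative, and the arrays are filled with placeholder zeros and ones because only their shapes matter here.

```python
import numpy as np

array_332 = np.zeros((3, 3, 2))
array_132 = np.ones((1, 3, 2))

# The shapes match on axes 1 and 2, so joining along axis 0 works
joined = np.concatenate([array_332, array_132], axis=0)
print("Joined shape:", joined.shape)  # (4, 3, 2)

# Along axis 1 the shapes differ on axis 0 (3 vs 1), so NumPy raises an error
try:
    np.concatenate([array_332, array_132], axis=1)
    axis1_failed = False
except ValueError as error:
    axis1_failed = True
    print("Cannot join along axis 1:", error)
```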

In [12]:
matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print("Shape of the matrix A:", matrix.shape, '\n')

new_row = np.array([[10, 11, 12]])
print("Shape of the new row:", new_row.shape)
print("matches along axis 1 (columns), so we can join them along axis 0 (by rows)\n")

new_matrix = np.concatenate([matrix, new_row], axis=0)
print("New matrix:")
print(new_matrix)
print("Shape of the new matrix:", new_matrix.shape)

Shape of the matrix A: (3, 3) 

Shape of the new row: (1, 3)
matches along axis 1 (columns), so we can join them along axis 0 (by rows)

New matrix:
[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]
Shape of the new matrix: (4, 3)
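Adding a column works the same way with `axis=1`. In this minimal sketch, the names `matrix_a`, `new_column`, and `wider_matrix` are illustrative. The new column must have shape (3, 1) so that its number of rows matches the matrix.

```python
import numpy as np

matrix_a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# One inner list per row, each holding a single element: shape (3, 1)
new_column = np.array([[10], [11], [12]])
print("Shape of the new column:", new_column.shape)

wider_matrix = np.concatenate([matrix_a, new_column], axis=1)
print(wider_matrix)
print("Shape of the wider matrix:", wider_matrix.shape)  # (3, 4)
```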
