## NumPy Tutorial

NumPy, which stands for **Num**erical **Py**thon, is a fundamental package for scientific computing in Python. It provides a high-performance multidimensional array object and tools for working with these arrays. Whether you are dealing with mathematics, engineering, or data science, NumPy is an essential library.

In this lab, we'll explore the basics of NumPy, including array creation, array manipulation, and basic operations. By the end, you'll have a solid understanding of how to work with NumPy arrays and use them in your Python programming projects.

Our goals are to familiarize you with NumPy's array structure, show you how to perform operations on arrays, and introduce some of the key functionality that NumPy offers, such as mathematical functions, linear algebra, and random number generation.

We begin by importing the NumPy library. By convention it is imported as `np`, which keeps the code short and readable.

In [None]:
import numpy as np

### Array Creation

Let's first explore basic functionalities of the NumPy library, specifically focusing on array creation, types, shapes, dimensions, and data type conversions.

**Creating Arrays**: We start by creating three different NumPy arrays (`array1`, `array2`, `array3`) using `np.array()`. This demonstrates how the nesting of the input list determines the dimensionality of the resulting array.

**Inspecting Array Attributes**: We print the type and shape of these arrays to understand their structure. The `type()` function confirms that each variable is a NumPy array (`numpy.ndarray`), while the `.shape` attribute gives the size along each dimension. Note the difference between the 1D `array1`, with shape `(3,)`, and the 2D `array3`, with shape `(1, 3)`, even though both hold the same three values.

**Dimensionality of Arrays**: By using the `.ndim` attribute, we examine the dimensionality of each array, distinguishing between 1D and 2D arrays in our examples.

In [None]:
array1 = np.array([1,2,3])
print('array1 type:',type(array1))
print('array1 array shape:',array1.shape)

array2 = np.array([[1,2,3],
                   [2,3,4]])
print('array2 type:',type(array2))
print('array2 array shape:',array2.shape)

array3 = np.array([[1,2,3]])
print('array3 type:',type(array3))
print('array3 array shape:',array3.shape)

In [None]:
print('array1: {} dim, array2: {} dim, array3: {} dim'.format(array1.ndim, array2.ndim, array3.ndim))

NumPy automatically assigns a data type to an array based on the elements provided at creation, and we can explicitly cast that data type to suit our needs. Let's check how a list of integers is converted into a NumPy array, and observe the data type (`dtype`) assigned to it.

In [None]:
list1 = [1, 2, 3]
print(type(list1))
array1 = np.array(list1)
print(type(array1))
print(array1, array1.dtype)

By including a string in a list, we see that NumPy adjusts the entire array's data type to accommodate the mixed types, resulting in an array of strings. Similarly, a list containing a floating-point number leads NumPy to create an array of floats. In both cases NumPy infers a single data type that can represent every element.

In [None]:
list2 = [1, 2, 'test']
array2 = np.array(list2)
print(array2, array2.dtype)

list3 = [1, 2, 3.0]
array3 = np.array(list3)
print(array3, array3.dtype)
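As a quick check of the inference described above, the following sketch inspects `dtype.kind`, a one-character code for the general category of a data type. The variable names are illustrative and are not part of the original notebook.

```python
import numpy as np

# dtype.kind codes: 'i' = signed integer, 'f' = float, 'U' = Unicode string
int_kind = np.array([1, 2, 3]).dtype.kind
float_kind = np.array([1, 2, 3.0]).dtype.kind

mixed = np.array([1, 2, 'test'])
str_kind = mixed.dtype.kind

print(int_kind, float_kind, str_kind)

# In the mixed array the numbers are stored as text, not as integers
print(mixed[0] == '1')
```

Because every element of `mixed` is a string, arithmetic such as `mixed + 1` is not available on it.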

**Data Type Conversions**: We can convert the data type of array elements using the `.astype()` method. We convert integers to floats and vice versa, highlighting how Numpy allows for flexible data type conversions within arrays.

In [None]:
array_int = np.array([1, 2, 3])
array_float = array_int.astype('float64')
print(array_float, array_float.dtype)

array_int1= array_float.astype('int32')
print(array_int1, array_int1.dtype)

array_float1 = np.array([1.1, 2.1, 3.1]) # also try np.array([1.7, 2.1, 3.1]): the fractional part is dropped, not rounded
array_int2= array_float1.astype('int32')
print(array_int2, array_int2.dtype)
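Converting floats to an integer type with `.astype()` discards the fractional part rather than rounding. A minimal sketch, using illustrative values that are not from the original notebook, contrasts this with rounding first via `np.rint`:

```python
import numpy as np

values = np.array([1.7, -1.7, 2.5])

# astype drops the fractional part (truncation toward zero)
truncated = values.astype('int32')

# np.rint rounds to the nearest integer first; ties go to the even neighbor
rounded = np.rint(values).astype('int32')

print(truncated.tolist())  # [1, -1, 2]
print(rounded.tolist())    # [2, -2, 2]
```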

In this part, we focus on Numpy functions for generating specific types of arrays quickly and efficiently. These functions are incredibly useful for initializing arrays for further data processing or computational tasks.

**Creating Sequence Arrays**: We start with `np.arange(10)`, a function that generates sequence arrays. It's similar to Python's `range` but returns a Numpy array. The output, `sequence_array`, demonstrates how we can easily create a sequence of numbers.

In [None]:
sequence_array = np.arange(10)
print(sequence_array)
print(sequence_array.dtype, sequence_array.shape)

**Arrays of Zeros and Ones**: Next, we explore `np.zeros` and `np.ones`, which create arrays filled with zeros and ones, respectively. These functions are handy when we need an array with a default value as a placeholder before filling it with actual data. By specifying the shape (for example, (3, 2) for a 3x2 array) and the data type (`dtype`), we can tailor these arrays to our needs. While `np.zeros` defaults to creating floating-point arrays, we explicitly set `zero_array` to integer data type (`int32`).

In [None]:
zero_array = np.zeros((3, 2), dtype='int32') # dtype default: float64
print(zero_array)
print(zero_array.dtype, zero_array.shape)

one_array = np.ones((3, 2)) # dtype default: float64
print(one_array)
print(one_array.dtype, one_array.shape)

**Empty and Identity Arrays**: The `np.empty` function creates an array without initializing its values to any particular number. This can be faster than `np.zeros` or `np.ones` for large arrays if you plan to fill the data immediately afterwards. However, it's important to note that it will contain arbitrary values, potentially very large or small, depending on the state of the memory. The `np.identity` and `np.eye` functions generate identity matrices, which are square arrays with ones on the main diagonal and zeros elsewhere. `np.eye` can also create non-square matrices and shift the diagonal with the `k` parameter.

In [None]:
np.empty((3, 5))

In [None]:
np.identity(3)

In [None]:
np.eye(3, k=0)
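The non-square and shifted-diagonal behaviour of `np.eye` mentioned above can be sketched as follows. The variable names are illustrative only.

```python
import numpy as np

# A 2x3 array with ones where the row index equals the column index
rect_eye = np.eye(2, 3)

# k=1 places the ones on the first diagonal above the main diagonal
upper_eye = np.eye(3, k=1)

# k=-1 places the ones on the first diagonal below the main diagonal
lower_eye = np.eye(3, k=-1)

print(rect_eye)
print(upper_eye)
print(lower_eye)
```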

### Reshaping

This section demonstrates the versatility and power of reshaping arrays in Numpy, a fundamental operation that allows us to alter the dimensions of an array without changing its data.

**Basic Reshaping**: We begin by creating a one-dimensional array with a range of 10 elements. We then reshape this array into two different formats: 2x5 (2 rows, 5 columns) matrix and 5x2 (5 rows, 2 columns) matrix. These operations illustrate how the same data set can be viewed in different shapes, facilitating various computational tasks.

In [None]:
array1 = np.arange(10)
print('array1:\n', array1)

array2 = array1.reshape(2, 5)
print('array2:\n',array2)

array3 = array1.reshape(5, 2)
print('array3:\n',array3)

In [None]:
# Raises ValueError: 10 elements cannot fill a 4x3 (12-element) array
array1.reshape(4,3)

**Flexible Reshaping with `-1`**: A particularly useful feature of the `reshape` method is the ability to specify one dimension as `-1`, which tells Numpy to automatically calculate the size of this dimension. For `array2`, reshaping with `-1, 5` results in Numpy determining the appropriate number of rows to accommodate a 5-column structure given the data. Similarly, for `array3`, specifying `5, -1` lets Numpy decide on the number of columns for a 5-row structure.

In [None]:
array1 = np.arange(10)
print(array1)

array2 = array1.reshape(-1, 5)
print('array2 shape:',array2.shape)

array3 = array1.reshape(5, -1)
print('array3 shape:',array3.shape)

In [None]:
array1 = np.arange(10)
# Raises ValueError: 10 is not divisible by 4, so no row count fits
array4 = array1.reshape(-1, 4)
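A reshape only succeeds when the target shape holds exactly as many elements as the original array. The sketch below tries several target shapes on a 10-element array and records which ones work; the `results` dictionary is an illustrative helper, not part of the original notebook.

```python
import numpy as np

array1 = np.arange(10)  # 10 elements

# Record either the resulting shape or the fact that reshape raised an error
results = {}
for shape in [(2, 5), (4, 3), (-1, 5), (-1, 4)]:
    try:
        results[shape] = array1.reshape(shape).shape
    except ValueError:
        results[shape] = 'ValueError'

for shape, outcome in results.items():
    print(shape, '->', outcome)
```

Catching `ValueError` this way is handy when the target shape comes from user input or another computation.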

**Reshaping to Higher Dimensions**: The flexibility of `reshape` extends to transforming a one-dimensional array into a three-dimensional one, as demonstrated with `array3d`. Here, we reshape `array1` into a 2x2x2 structure, showcasing how Numpy can manage data in multiple dimensions, which is crucial for tasks involving matrices or higher-dimensional datasets.

**Flattening Multi-dimensional Arrays**: Multi-dimensional arrays can be collapsed again using `-1`. In the example, `array3d.reshape(-1, 1)` produces `array5`, a two-dimensional column of shape `(8, 1)`. Passing `-1` alone, as in `array3d.reshape(-1)`, yields a truly one-dimensional array of shape `(8,)`. Either form is useful when you need to linearize a multi-dimensional array for certain types of processing or output formatting.

In [None]:
array1 = np.arange(8)
print('array:\n', array1.tolist())

array3d = array1.reshape((2, 2, 2))
print('array3d:\n', array3d.tolist())

# 3-dim ndarray to 2-dim ndarray
array5 = array3d.reshape(-1, 1)
print('array5:\n', array5.tolist())
print('array5 shape:',array5.shape)
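To make the difference between a column and a flat array concrete, here is a small sketch comparing `reshape(-1, 1)`, `reshape(-1)`, and the `.flatten()` method. The names `column`, `flat`, and `flat_copy` are illustrative.

```python
import numpy as np

array3d = np.arange(8).reshape(2, 2, 2)

column = array3d.reshape(-1, 1)  # 2D column, shape (8, 1)
flat = array3d.reshape(-1)       # 1D, shape (8,)
flat_copy = array3d.flatten()    # 1D, always an independent copy

print(column.shape, flat.shape, flat_copy.shape)

# flatten() copies the data, so writing to the result leaves array3d untouched
flat_copy[0] = 99
print(array3d[0, 0, 0])
```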

### Indexing and Slicing

In this section, we explore various ways to index, slice, and filter Numpy arrays, demonstrating the flexibility and power of Numpy for data manipulation.

**Basic Indexing**: We start with a one-dimensional array, `array1`, created using `np.arange()`. Accessing individual elements through indexing is straightforward; indexes start at 0, so `array1[2]` refers to the third element. We also explore accessing elements from the end using negative indexes.

In [None]:
array1 = np.arange(start=1, stop=10)
print('array1:', array1)

value = array1[2]
print('value:',value)
print(type(value))

In [None]:
print('Last element:', array1[-1], ', Second to last element:', array1[-2])

In [None]:
# print(array1[-10])  # IndexError: array1 has 9 elements, so valid negative indexes are -1 to -9

**Modifying Elements**: Modifying array elements is as simple as accessing them. For example, changing the first and last elements of `array1` demonstrates how mutable Numpy arrays are.

In [None]:
array1[0] = 9
array1[-1] = 0
print('array1:',array1)

**Multidimensional Array Indexing**: With a 2D array, we illustrate accessing elements with row and column indexes, e.g., `array2d[0, 0]`.

In [None]:
array1d = np.arange(start=1, stop=10)
array2d = array1d.reshape(3, 3)
print(array2d)

print('Value at (row=0, col=0):', array2d[0, 0])
print('Value at (row=0, col=1):', array2d[0, 1])
print('Value at (row=1, col=0):', array2d[1, 0])
print('Value at (row=2, col=2):', array2d[2, 2])

**Slicing**: Slicing in Numpy is similar to slicing in standard Python lists but extends to multiple dimensions. We can select a range of elements and apply slicing to select subarrays.

In [None]:
array1 = np.arange(start=1, stop=10)
print(array1)
array2 = array1[0:3]
print(array2)
print(type(array2))

In [None]:
array1 = np.arange(start=1, stop=10)
array3 = array1[:3]
print(array3)

array4 = array1[3:]
print(array4)

array5 = array1[:]
print(array5)

In [None]:
array1d = np.arange(start=1, stop=10)
array2d = array1d.reshape(3, 3)
print('array2d:\n',array2d)

print('array2d[0:2, 0:2] \n', array2d[0:2, 0:2])
print('array2d[1:3, 0:3] \n', array2d[1:3, 0:3])
print('array2d[1:3, :] \n', array2d[1:3, :])
print('array2d[:, :] \n', array2d[:, :])
print('array2d[:2, 1:] \n', array2d[:2, 1:])
print('array2d[:2, 0] \n', array2d[:2, 0])
print('array2d[:5, 0] \n', array2d[:5, 0])

In [None]:
print(array2d[0])
print(array2d[1])
print('array2d[0] shape:', array2d[0].shape, 'array2d[1] shape:', array2d[1].shape)
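One difference from Python lists is worth seeing in code: a basic NumPy slice is a view onto the original data, not a copy. The sketch below uses illustrative variable names.

```python
import numpy as np

array1 = np.arange(start=1, stop=10)

# A basic slice is a view: it shares memory with array1
view = array1[0:3]
view[0] = 100
print(array1)  # the first element of array1 is now 100

# .copy() produces independent data
independent = array1[0:3].copy()
independent[1] = -1
print(array1)  # unaffected by the write to `independent`
```

Use `.copy()` whenever a slice needs to be modified without touching the source array.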

In [None]:
array1d = np.arange(start=1, stop=10)
array2d = array1d.reshape(3, 3)
print(array2d)

array1 = array2d[[0,2]]
print('array2d[[0,2]] => ',array1.tolist())

array2 = array2d[[0,1], 2]
print('array2d[[0,1], 2] => ',array2.tolist())

array3 = array2d[[0,1], 0:2]
print('array2d[[0,1], 0:2] => ',array3.tolist())

**Boolean Indexing**: A powerful feature of Numpy is boolean indexing, which allows us to filter elements based on conditions. For example, `array1d > 5` creates a boolean array, and `array1d[array1d > 5]` selects elements greater than 5.

In [None]:
array1d = np.arange(start=1, stop=10)
print(array1d)

condition = array1d > 5
print(condition)

array1 = array1d[condition]
print(array1)

In [None]:
boolean_indexes = np.array([False, False, False, False, False, True, True, True, True])
array2 = array1d[boolean_indexes]
print(array2)
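Boolean conditions can also be combined, and a mask can be used on the left-hand side of an assignment. The following is a short sketch with illustrative names:

```python
import numpy as np

array1d = np.arange(start=1, stop=10)

# Combine conditions with & (and) and | (or); the parentheses are required
middle = array1d[(array1d > 3) & (array1d < 8)]
even_or_large = array1d[(array1d % 2 == 0) | (array1d > 7)]

# A boolean mask can also select the elements to overwrite
capped = array1d.copy()
capped[capped > 5] = 5

print(middle.tolist())         # [4, 5, 6, 7]
print(even_or_large.tolist())  # [2, 4, 6, 8, 9]
print(capped.tolist())         # [1, 2, 3, 4, 5, 5, 5, 5, 5]
```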

### Sorting

Here, we demonstrate several ways to sort arrays with NumPy: the difference between `np.sort()` and the in-place `.sort()` method, sorting in descending order, and sorting multi-dimensional arrays along different axes. We also show how to use sorting indices to reorder other related arrays.

**Sorting with `np.sort()`**: This function returns a sorted copy of the provided array without altering the original array. The example shows sorting `org_array` in ascending order, confirming that the original array remains unchanged after sorting.

**In-place Sorting with `.sort()` Method**: Unlike `np.sort()`, the `.sort()` method sorts an array in-place, modifying the original array directly. This is illustrated by sorting `org_array`, which alters the original array itself, and no separate sorted array is returned.

In [None]:
org_array = np.array([3, 1, 9, 5])
print('Original:', org_array)

# np.sort(): Return a sorted copy of an array
sort_array1 = np.sort(org_array)
print('After calling np.sort(), array returned:', sort_array1)
print('After calling np.sort(), original array:', org_array)

# ndarray.sort(): Sort an array in-place
sort_array2 = org_array.sort()
print('After calling org_array.sort(), array returned:', sort_array2)
print('After calling org_array.sort(), original array:', org_array)

**Sorting in Descending Order**: To sort an array in descending order, the sorted array (using `np.sort()`) is reversed using the `[::-1]` slicing technique.

In [None]:
sort_array1_desc = np.sort(org_array)[::-1]
print(sort_array1_desc)

# print(np.array([[3, 1], [9, 5]][::-1]))

**Sorting Multi-dimensional Arrays**: The example sorts a 2-dimensional array along `axis=0` and along `axis=1`, showing how the `axis` parameter controls the direction of sorting. Sorting with `axis=0` compares values down the rows, so each column is sorted independently. Sorting with `axis=1` compares values across the columns, so each row is sorted independently.

In [None]:
array2d = np.array([[8, 12], [7, 1]])
print(array2d)

sort_array2d_axis0 = np.sort(array2d, axis=0)
print('axis=0:\n', sort_array2d_axis0)

sort_array2d_axis1 = np.sort(array2d, axis=1)
print('axis=1:\n', sort_array2d_axis1)
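The effect of each axis, and a descending sort within rows, can be sketched on the same matrix. The variable names are illustrative.

```python
import numpy as np

array2d = np.array([[8, 12], [7, 1]])

by_column = np.sort(array2d, axis=0)  # each column sorted top to bottom
by_row = np.sort(array2d, axis=1)     # each row sorted left to right

# Descending order within each row: sort, then reverse along the column axis
by_row_desc = np.sort(array2d, axis=1)[:, ::-1]

print(by_column.tolist())    # [[7, 1], [8, 12]]
print(by_row.tolist())       # [[8, 12], [1, 7]]
print(by_row_desc.tolist())  # [[12, 8], [7, 1]]
```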

**Using `np.argsort()` for Indices**: `np.argsort()` returns the indices that would sort an array. This is used to sort `org_array` both in ascending and descending order. The sorted indices are then applied to reorder another array (`name_array`), demonstrating how to align the elements of two related arrays based on the sorted order of one.

In [None]:
org_array = np.array([3, 1, 9, 5])
sort_indices = np.argsort(org_array) # Returns the indices that would sort an array
print(type(sort_indices))
print('Indices for the ascending sort:', sort_indices)

In [None]:
org_array = np.array([ 3, 1, 9, 5])
sort_indices_desc = np.argsort(org_array)[::-1]
print('Indices for the descending sort:', sort_indices_desc)

**Example**: Sorting scores (`score_array`) in ascending order and using the sorted indices to rearrange the names (`name_array`) accordingly showcases a practical application of sorting and indexing in data analysis, where related data structures must be aligned according to sorted values.

In [None]:
name_array = np.array(['John', 'Mike', 'Sarah', 'Kate', 'Samuel'])
score_array= np.array([78, 95, 84, 98, 88])

sort_indices_asc = np.argsort(score_array)
print('Indices for the ascending sort:', sort_indices_asc)
print('Sorted array using the indices:', name_array[sort_indices_asc])
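Building on the same data, a common variation is to rank from highest to lowest and keep only the top entries. This sketch reverses the ascending indices; `ranking`, `top3_names`, and `top3_scores` are illustrative names.

```python
import numpy as np

name_array = np.array(['John', 'Mike', 'Sarah', 'Kate', 'Samuel'])
score_array = np.array([78, 95, 84, 98, 88])

# Reverse the ascending indices to rank from highest to lowest score
ranking = np.argsort(score_array)[::-1]

# The same index array keeps names and scores aligned
top3_names = name_array[ranking][:3]
top3_scores = score_array[ranking][:3]

print(top3_names.tolist())   # ['Kate', 'Mike', 'Samuel']
print(top3_scores.tolist())  # [98, 95, 88]
```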

### Linear Algebra

Scalar-vector multiplication

In [None]:
x = np.array([3, 4])
y = 5 * x
print(y)

Linear combination

In [None]:
x1 = np.array([1, 2, 3])
x2 = np.array([0, 3, 1])
x3 = np.array([0, 2, 1])
y = 2*x1 + 1*x2 - 2*x3
print(y)

Inner product

In [None]:
a = np.array([3, 4])
b = np.array([5, 6])
np.dot(a, b)

Norm

In [None]:
v = np.array([3, 4])
np.linalg.norm(v)

Unit vector

In [None]:
u = v / np.linalg.norm(v)
print(u)
print(np.linalg.norm(u))

Orthogonal projection

In [None]:
y = np.array([2, 4])
x = np.array([4, 3])
u_x = x / np.linalg.norm(x)
y_x = np.dot(y,u_x) * u_x
print(y_x)
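A useful sanity check for an orthogonal projection is that the leftover part of `y` is perpendicular to `x`, so their inner product should be zero up to floating-point error. The sketch below repeats the calculation and adds that check; `residual` is an illustrative name.

```python
import numpy as np

y = np.array([2, 4])
x = np.array([4, 3])

u_x = x / np.linalg.norm(x)  # unit vector along x
y_x = np.dot(y, u_x) * u_x   # projection of y onto x
residual = y - y_x           # component of y perpendicular to x

print(y_x)
print(np.isclose(np.dot(residual, x), 0.0))
```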

Matrix multiplication

In [None]:
A = np.array([[1, 2, 3],
              [4, 5, 6]])
B = np.array([[7, 8],
              [9, 10],
              [11, 12]])

print(A)
print(B)
dot_product = np.dot(A, B)
print(dot_product)
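For 2D arrays, the `@` operator gives the same matrix product as `np.dot`. A short sketch using the same matrices:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])
B = np.array([[7, 8],
              [9, 10],
              [11, 12]])

# (2x3) @ (3x2) -> 2x2: inner dimensions must match
product = A @ B
print(product.tolist())                       # [[58, 64], [139, 154]]
print(np.array_equal(product, np.dot(A, B)))  # True
```

Note that `A * B` is element-wise multiplication, which is a different operation and does not work for these two shapes.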

Transpose

In [None]:
A = np.array([[1, 2, 3],
              [4, 5, 6]])
transpose_mat = np.transpose(A)
print(transpose_mat)

Trace

In [None]:
A = np.array([[1, 2], [3, 4]])
trace_A = np.trace(A)
print(trace_A)

Determinant

In [None]:
det_A = np.linalg.det(A)
print(det_A)

Eigenvalues and Eigenvectors

In [None]:
eigenvalues, eigenvectors = np.linalg.eig(A)
print("Eigenvalues of A:", eigenvalues)
print("Eigenvectors of A:\n", eigenvectors)
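The result of `np.linalg.eig` can be verified against the definition: each eigenvector `v` (a column of `eigenvectors`) should satisfy `A @ v = lambda * v`. The sketch below also checks that the eigenvalues sum to the trace and multiply to the determinant. The helper names are illustrative.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
eigenvalues, eigenvectors = np.linalg.eig(A)

# Column i of `eigenvectors` pairs with eigenvalues[i]
for i in range(len(eigenvalues)):
    v = eigenvectors[:, i]
    print(np.allclose(A @ v, eigenvalues[i] * v))

# Sum of eigenvalues equals the trace; product equals the determinant
trace_matches = np.isclose(eigenvalues.sum(), np.trace(A))
det_matches = np.isclose(eigenvalues.prod(), np.linalg.det(A))
print(trace_matches, det_matches)
```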

### [Random Sampling](https://numpy.org/doc/stable/reference/random/index.html)

`np.random.rand()` creates an array of the given shape and populates it with random samples from a uniform distribution over `[0, 1)`. Called with no arguments, it returns a single float.

In [None]:
np.random.rand()

In [None]:
np.random.rand(2, 3)

`np.random.randint` returns random integers from low (inclusive) to high (exclusive) from the “discrete uniform” distribution.

In [None]:
np.random.randint(10)

In [None]:
np.random.randint(low=1, high=10, size=(2, 2))

`np.random.seed()` sets the seed for NumPy's random number generator, ensuring reproducible results. For example, `np.random.seed(10)` initializes the generator in a fixed state, so the calls that follow produce the same numbers on every run.

In [None]:
np.random.seed(10)

np.random.randint(10, size=10)
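NumPy's random module also provides a newer `Generator` interface, created with `np.random.default_rng()`. The sketch below is an addition to the original notebook and shows a seeded equivalent of the calls above; `rng_a` and `rng_b` are illustrative names, and the numbers drawn will differ from those produced by `np.random.seed(10)`.

```python
import numpy as np

# Two generators created with the same seed produce the same sequence
rng_a = np.random.default_rng(10)
rng_b = np.random.default_rng(10)

draws_a = rng_a.integers(low=0, high=10, size=10)  # high is exclusive by default
draws_b = rng_b.integers(low=0, high=10, size=10)

print(draws_a)
print(np.array_equal(draws_a, draws_b))  # True

# Floats in [0, 1), analogous to np.random.rand(2, 3)
uniform_block = rng_a.random((2, 3))
print(uniform_block)
```

Because each generator carries its own state, seeding one does not affect random numbers drawn elsewhere in a program.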

We can also generate numbers from various probability distributions.

In [None]:
print(np.random.randn())
print(np.random.uniform(low=0.0, high=1.0, size=None))
print(np.random.binomial(n=10, p=0.5, size=None))
print(np.random.normal(loc=0.0, scale=1.0, size=None)) # loc: mean, scale: standard deviation
print(np.random.chisquare(df=1.0, size=None))
print(np.random.beta(a=5, b=10, size=None))
print(np.random.gamma(shape=1, scale=1.0, size=None))

- `np.random.permutation()` randomly permutes a sequence or returns a permuted range. If you pass an integer `n`, it will create a permutation of the range `[0, n)`. If you pass an array, it will create a copy of the array with its elements shuffled.

- `np.random.shuffle()` modifies the sequence **in-place** by shuffling its contents. This means that the original array or list is changed.


In [None]:
array = np.array([1, 2, 3, 4, 5])
permuted_array = np.random.permutation(array)

print("Original array:", array)
print("Permuted array:", permuted_array)

In [None]:
np.random.shuffle(array)
print("Shuffled array:", array)

### Superiority of NumPy

Let's compare the same mathematical operation on a Python list and on a NumPy array. We'll square each element of a list and of an array holding one million integers and compare the execution times.

**Using Python List**

In [None]:
import time

large_list = list(range(1, 1000001))

start_time_list = time.time()
squared_list = [x**2 for x in large_list]
end_time_list = time.time()

print(f"Time taken using list: {end_time_list - start_time_list} seconds")

**Using NumPy Array**

In [None]:
large_array = np.arange(1, 1000001)

start_time_array = time.time()
squared_array = large_array**2
end_time_array = time.time()

print(f"Time taken using NumPy array: {end_time_array - start_time_array} seconds")
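A single measurement with `time.time()` can be noisy, so the comparison can also be sketched with the standard library's `timeit` module, which repeats each operation and lets us average the result. The smaller size `n`, the run count, and the explicit `dtype=np.int64` are assumptions made for this sketch; the explicit dtype is there so the squares do not overflow on platforms where NumPy's default integer is 32-bit.

```python
import timeit
import numpy as np

n = 100_000
large_list = list(range(1, n + 1))
large_array = np.arange(1, n + 1, dtype=np.int64)  # explicit 64-bit integers

# Run each version several times and report the average time per run
runs = 20
list_seconds = timeit.timeit(lambda: [x**2 for x in large_list], number=runs) / runs
array_seconds = timeit.timeit(lambda: large_array**2, number=runs) / runs

print(f"list:  {list_seconds:.6f} s per run")
print(f"array: {array_seconds:.6f} s per run")

# Both approaches compute the same values
same_result = np.array_equal(large_array**2, np.array([x**2 for x in large_list]))
print(same_result)
```

The exact timings depend on the machine, so no specific numbers are shown here.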