## Duration:

## Difficulty:

## Prerequisites:

<a name='0'></a>
## Table of Contents:

- [1. Introduction to Numpy](#1)
    - [1.1. What is Numpy and why it is essential for machine learning?](#1-1)
    - [1.2. Installing Numpy and setting up the development environment](#1-2)
- [2. Numpy Arrays](#2)
    - [2.1. Creating Numpy arrays using np.array()](#2-1)
    - [2.2. Understanding array shapes and dimensions](#2-2)
    - [2.3. Analogy between Numpy arrays and mathematical data structures](#2-3)
    - [2.4. Accessing and modifying array elements](#2-4)
    - [2.5. Basic array operations (element-wise operations, broadcasting)](#2-5)
- [3. Array Initialisation and Attributes](#3)
    - [3.1. Creating arrays with specific values (np.zeros(), np.ones(), np.full(), etc.)](#3-1)
    - [3.2. Generating arrays with a range of values (np.arange(), np.linspace(), etc.)](#3-2)
    - [3.3. Array attributes (shape, size, data type, etc.)](#3-3)
- [4. Array Operations](#4)
    - [4.1. Mathematical operations with arrays (+, -, *, /, etc.)](#4-1)
    - [4.2. Array aggregation functions (np.sum(), np.mean(), np.min(), np.max(), etc.)](#4-2)
    - [4.3. Array transformations (np.reshape(), np.transpose(), etc.)](#4-3)
- [5. Linear Algebra with Numpy](#5)
    - [5.1. Matrix operations (np.dot(), np.matmul(), np.linalg.inv(), etc.)](#5-1)
    - [5.2. Eigenvalues and eigenvectors (np.linalg.eig())](#5-2)
    - [5.3. Solving linear equations (np.linalg.solve())](#5-3)
- [6. Advanced Numpy Techniques](#6)
    - [6.1. Broadcasting (Advanced examples)](#6-1)
    - [6.2. Vectorisation and performance optimisation](#6-2)
    - [6.3. Memory optimisation techniques (views, copies, etc.)](#6-3)
- [7. Numpy and Machine Learning](#7)
    - [7.1. Loading data into Numpy arrays](#7-1)
    - [7.2. Preprocessing data with Numpy for machine learning tasks](#7-2)
    
    
    








    
    


<a name='1'></a>
# 1. Introduction to Numpy

<a name='1-1'></a>
## 1.1. What is Numpy and why it is essential for machine learning?

`Numpy` is a powerful Python library that stands for _'Numerical Python'_. It provides support for large, multi-dimensional arrays and matrices, along with a vast collection of mathematical functions to operate on these arrays efficiently. 

`Numpy` is essential for machine learning because it enables fast and efficient numerical computations, making it possible to work with large datasets and perform complex mathematical operations with ease. Its ability to handle arrays and matrices efficiently allows for concise and optimised code implementations of various machine learning algorithms.

<a name='1-2'></a>
## 1.2. Installing Numpy and setting up the development environment

To get started with `Numpy`, we first need to install the library and set up our development environment. Follow the steps below to install `Numpy` and set up a Jupyter notebook:

### Step 1: Install Numpy

1. Open your command prompt or terminal (e.g., on a Mac, press Command + Space, type "Terminal", and press Enter).
2. If you have Python installed, you can install `Numpy` by running the following command:
`pip install numpy`. This command will download and install `Numpy` from the Python Package Index (PyPI).

### Step 2: Set up Jupyter Notebook

Jupyter Notebook is an interactive coding environment that allows you to create and share documents containing code, visualizations, and explanatory text. Here's how to set it up:

1. Install Jupyter Notebook by running the following command in your command prompt or terminal: `pip install jupyter`
2. Launch Jupyter Notebook by typing the following command and pressing Enter: `jupyter notebook`. This will open Jupyter Notebook in your default web browser.
3. In the Jupyter Notebook interface, click on "New" and select "Python 3" to create a new Python notebook.

### Step 3: Importing Numpy

In your Jupyter notebook, you need to import the `Numpy` library before using its functions and features. To import `Numpy`, add the following line of code at the beginning of your notebook: `import numpy as np`

**Note:** By convention, `Numpy` is often imported with the alias `np`, which allows us to use the shorthand notation `np` when referring to `Numpy` functions throughout our notebook.
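A quick way to confirm that the installation and the import both succeeded is to print the installed version. This is only a minimal sanity check, and the version string you see depends on your environment:

```python
import numpy as np

# If this prints a version string, Numpy is installed and importable
print(np.__version__)
```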

That's it! You have successfully installed `Numpy` and set up your Jupyter notebook environment. Now, you're ready to dive into the basics of `Numpy` and explore its powerful capabilities for machine learning.

<a name='2'></a>
# 2. Numpy Arrays

`Numpy` arrays are the foundation of the `Numpy` library and provide a powerful way to store and manipulate data. In this section, we will explore various aspects of `Numpy` arrays, starting with creating arrays using the `np.array()` function.

<a name='2-1'></a>
### 2.1. Creating Numpy arrays using np.array()

To create a `Numpy` array, you can use the `np.array()` function and provide a Python list or tuple as an argument. The `np.array()` function converts the input into a Numpy array. Here's an example:

In [10]:
# Create a Numpy array from a Python list
my_list = [1, 2, 3, 4, 5]
my_array = np.array(my_list)

print(my_array)

[1 2 3 4 5]


In the example above, we created a `Numpy` array `my_array` from a Python list `my_list` using the `np.array()` function. The resulting `Numpy` array contains the same elements as the original list.

`Numpy` arrays are homogeneous, meaning they can only contain elements of the same data type. If the elements in the input list have different data types, Numpy will attempt to convert them to a common data type. For example:

In [11]:
my_list = [1, 2.5, "hello", True]
my_array = np.array(my_list)

print(my_array)

['1' '2.5' 'hello' 'True']


In this case, the elements in the input list have different data types (integer, float, string, and a boolean). `Numpy` converts all the elements to strings, resulting in a `Numpy` array of strings.
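The same conversion happens with purely numeric input. The short sketch below (variable names are illustrative) shows integers being promoted to floats, and how passing `dtype` explicitly avoids relying on the inferred type:

```python
import numpy as np

# Integers mixed with a float are promoted to a floating-point dtype
mixed_numbers = np.array([1, 2.5, 3])
print(mixed_numbers.dtype)  # float64

# Requesting a dtype explicitly makes the intent clear
as_floats = np.array([1, 2, 3], dtype=np.float64)
print(as_floats)  # [1. 2. 3.]
```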

It's important to note that `Numpy` arrays are fixed in size once created. If you try to append or remove elements from a Numpy array, a new array will be created with the updated elements. Therefore, Numpy arrays are not designed to be dynamically resizable like Python lists.
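As a small illustration of this behaviour, `np.append()` returns a new array and leaves the original untouched (the variable names here are only for demonstration):

```python
import numpy as np

original = np.array([1, 2, 3])

# np.append() does not modify `original`; it builds and returns a new array
extended = np.append(original, 4)

print(original)  # [1 2 3]
print(extended)  # [1 2 3 4]
```

Because every append copies the data, growing an array element by element in a loop is usually slower than collecting values in a Python list and converting once at the end.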

That's it for creating `Numpy` arrays using the `np.array()` function.

<a name='2-2'></a>
### 2.2. Understanding array shapes and dimensions

`Numpy` arrays can have different shapes and dimensions, which define the structure and size of the array. In this subsection, we will explore how to determine the shape and dimensions of a `Numpy` array.

#### Shape of an Array

The shape of a `Numpy` array refers to the number of elements along each dimension of the array. You can access the shape of an array using the `shape` attribute. Here's an example:

In [12]:
my_array = np.array([[1, 2, 3], [4, 5, 6]])
print(my_array.shape)

(2, 3)


In this example, `my_array` is a **2-dimensional** array with 2 rows and 3 columns. The `shape` attribute returns a tuple `(2, 3)` indicating the shape of the array.

If you have a **1-dimensional** array, the shape will be a tuple with a single entry giving the number of elements in the array. For example:

In [13]:
my_array = np.array([1, 2, 3, 4, 5])
print(my_array.shape)

(5,)


In this case, `my_array` is a **1-dimensional** array with 5 elements. The `shape` attribute returns a tuple `(5,)` indicating the shape of the array. 

Here's an example of a **3-dimensional** array:

In [15]:
my_array = np.array([ [[1, 2, 3], [4, 5, 6], [7, 8, 9]], [[1, 2, 3], [4, 5, 6], [7, 8, 9]], [[1, 2, 3], [4, 5, 6], [7, 8, 9]] ])
print(my_array.shape)

(3, 3, 3)


#### Dimensions of an Array

The dimensions of a `Numpy` array refer to the number of axes or dimensions it has. You can determine the number of dimensions using the `ndim` attribute. Here's an example:

In [16]:
my_array = np.array([[1, 2, 3], [4, 5, 6]])
print(my_array.ndim)

2


In this example, `my_array` is a **2-dimensional** array, so the `ndim` attribute returns `2`.

If you have a **1-dimensional** array, the `ndim` attribute will return `1`. For example:

In [17]:
my_array = np.array([1, 2, 3, 4, 5])
print(my_array.ndim)

1


Understanding the shape and dimensions of a `Numpy` array is crucial when working with multi-dimensional arrays, as it helps you correctly access and manipulate the array elements.
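The two attributes are closely related: `ndim` is simply the length of the `shape` tuple. A minimal sketch (the name `grid` is illustrative):

```python
import numpy as np

grid = np.array([[1, 2, 3], [4, 5, 6]])

print(grid.shape)       # (2, 3)
print(grid.ndim)        # 2
print(len(grid.shape))  # 2

# The shape tuple can be unpacked into named sizes
n_rows, n_cols = grid.shape
print(n_rows, n_cols)   # 2 3
```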

<a name='2-3'></a>
### 2.3. Analogy between Numpy arrays and mathematical data structures

`Numpy` arrays can be thought of as multi-dimensional counterparts to **basic mathematical data structures** such as **scalars**, **vectors**, and **matrices**. Here's an analogy to help you understand this relationship:

1. **Scalars:**
- In mathematics, a scalar is a single numerical value, representing magnitude but not direction.
- In `Numpy`, a scalar is the simplest form of an array, with zero dimensions. It represents a single value.
- Analogously, a `Numpy` scalar can be seen as the equivalent of a scalar in mathematics.

2. **Vectors:**
- In mathematics, a vector is a one-dimensional array of values, with both magnitude and direction.
- In `Numpy`, a one-dimensional array represents a vector.
- Analogously, a `Numpy` one-dimensional array can be seen as the equivalent of a vector in mathematics.

3. **Matrices:**
- In mathematics, a matrix is a two-dimensional array of values, arranged in rows and columns.
- In `Numpy`, a two-dimensional array represents a matrix.
- Analogously, a `Numpy` two-dimensional array can be seen as the equivalent of a matrix in mathematics.

4. **n-dimensional arrays:**
- In mathematics, higher-dimensional arrays can be thought of as extensions of vectors and matrices.
- In `Numpy`, n-dimensional arrays represent arrays with more than two dimensions.
- Analogously, a `Numpy` n-dimensional array can be seen as the extension of vectors and matrices to higher dimensions.

By leveraging the power of n-dimensional arrays, `Numpy` provides a versatile framework for handling and manipulating data in various dimensions, making it well-suited for tasks in **machine learning** and **scientific computing**.

**Note:** It's important to note that while the analogy helps in understanding the relationship between `Numpy` arrays and mathematical data structures, `Numpy` arrays also have additional capabilities and functionalities specific to array operations, broadcasting, and other numerical computations.
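The analogy can be sketched in code by creating one array of each kind and inspecting its `ndim` and `shape`. The names `scalar`, `vector`, `matrix`, and `tensor` are chosen for illustration only:

```python
import numpy as np

scalar = np.array(7)                  # zero dimensions
vector = np.array([1, 2, 3])          # one dimension
matrix = np.array([[1, 2], [3, 4]])   # two dimensions
tensor = np.zeros((2, 3, 4))          # three dimensions

for name, arr in [("scalar", scalar), ("vector", vector),
                  ("matrix", matrix), ("tensor", tensor)]:
    print(name, arr.ndim, arr.shape)
```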

<a name='2-4'></a>
### 2.4. Accessing and modifying array elements

`Numpy` arrays provide convenient ways to access and modify individual elements or subsets of elements within the array.

#### Accessing Array Elements

You can access specific elements in a `Numpy` array using indexing. `Numpy` arrays are zero-indexed, which means the first element has an index of `0`. Here are a few examples:

In [21]:
my_array = np.array([1, 2, 3, 4, 5])

# Access the first element
print(my_array[0])  # Output: 1

# Access the third element
print(my_array[2])  # Output: 3

# Access the last element
print(my_array[-1])  # Output: 5

1
3
5


In this example, `my_array` is a **1-dimensional** array. We use square brackets `[]` with the desired index to access specific elements. Negative indices can be used to access elements from the end of the array.

For multi-dimensional arrays, you can use multiple indices to access elements in different dimensions. Here's an example with a **2-dimensional** array:

In [25]:
my_array = np.array([[1, 2, 3], [4, 5, 6]])

# Access the element at row 0, column 1
print(my_array[0, 1])  # Output: 2

# Access the element at row 1, column 2
print(my_array[1, 2])  # Output: 6

2
6


In this case, `my_array` is a **2-dimensional** array. We use comma-separated indices within the square brackets to access specific elements based on their row and column positions.
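Subsets of elements can be selected with slices, where `:` means "everything along this dimension". A short sketch with illustrative names:

```python
import numpy as np

grid = np.array([[1, 2, 3], [4, 5, 6]])

first_row = grid[0]         # the whole first row
second_column = grid[:, 1]  # every row, column 1
right_block = grid[:, 1:]   # every row, columns 1 onwards

print(first_row)      # [1 2 3]
print(second_column)  # [2 5]
print(right_block)
```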

#### Modifying Array Elements

`Numpy` arrays allow you to modify individual elements or subsets of elements by assigning new values. Here are a few examples:

In [29]:
my_array = np.array([1, 2, 3, 4, 5])

# Modify the second element
my_array[1] = 10
print(my_array)  # Output: [1, 10, 3, 4, 5]

# Modify a subset of elements (from the third element to the last element)
my_array[2:] = [20, 30, 40]
print(my_array)  # Output: [1, 10, 20, 30, 40]

[ 1 10  3  4  5]
[ 1 10 20 30 40]


In this example, we first modify the second element of `my_array` by assigning a new value. Then, we modify a subset of elements from index 2 to 4 (inclusive) by assigning a new list of values.

For multi-dimensional arrays, you can modify elements in a similar way using indexing. Here's an example:

In [30]:
my_array = np.array([[1, 2, 3], [4, 5, 6]])

# Modify the element at row 1, column 0
my_array[1, 0] = 10
print(my_array)

[[ 1  2  3]
 [10  5  6]]


In this case, we modify the element at row `1` and column `0` by assigning a new value.

Understanding how to access and modify array elements is fundamental when working with Numpy arrays, as it allows you to extract and update the data within the arrays based on your specific needs.
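Assignment also works with a boolean condition or a column slice, which is handy for tasks such as clipping invalid values. The following is a minimal sketch; the array names and the "replace negatives with zero" rule are illustrative assumptions:

```python
import numpy as np

readings = np.array([3, -1, 4, -5, 9])

# Replace every negative value with 0 using a boolean mask
readings[readings < 0] = 0
print(readings)  # [3 0 4 0 9]

grid = np.array([[1, 2, 3], [4, 5, 6]])

# Overwrite the whole first column with a single value
grid[:, 0] = 0
print(grid)
```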

<a name='2-5'></a>
### 2.5. Basic array operations (element-wise operations, broadcasting)

`Numpy` arrays support various basic operations that can be performed element-wise on the arrays. These operations include arithmetic operations, mathematical functions, and logical operations. In this subsection, we will explore how to perform element-wise operations and utilize broadcasting in `Numpy`.

#### Element-wise Operations

Element-wise operations allow you to perform arithmetic operations or apply mathematical functions to each element in a `Numpy` array independently. Here are a few examples:

In [33]:
my_array = np.array([1, 2, 3, 4, 5])

# Addition
result_add = my_array + 2
print(result_add)  # Output: [3, 4, 5, 6, 7]

# Subtraction
result_subtract = my_array - 2
print(result_subtract)  # Output: [-1, 0, 1, 2, 3]

# Multiplication
result_multiply = my_array * 2
print(result_multiply)  # Output: [2, 4, 6, 8, 10]

# Division
result_divide = my_array / 2
print(result_divide)  # Output: [0.5, 1.0, 1.5, 2.0, 2.5]

# Exponentiation
result_power = my_array ** 2
print(result_power)  # Output: [1, 4, 9, 16, 25]

[3 4 5 6 7]
[-1  0  1  2  3]
[ 2  4  6  8 10]
[0.5 1.  1.5 2.  2.5]
[ 1  4  9 16 25]


In this example, we perform various arithmetic operations (addition, subtraction, multiplication, division, and exponentiation) on `my_array`, resulting in new arrays with the element-wise operation applied.

`Numpy` also provides a wide range of mathematical functions that can be applied element-wise to arrays, such as `np.sin()`, `np.cos()`, `np.exp()`, etc. Here's an example:

In [35]:
my_array = np.array([0, np.pi/2, np.pi])

# Sine function
result_sin = np.sin(my_array)
print(result_sin)  # Output: [0.0, 1.0, 1.2246468e-16]

# Exponential function
result_exp = np.exp(my_array)
print(result_exp)  # Output: [ 1., 4.81047738, 23.14069263]

[0.0000000e+00 1.0000000e+00 1.2246468e-16]
[ 1.          4.81047738 23.14069263]


In this case, we apply the `np.sin()` and `np.exp()` functions to `my_array`, resulting in new arrays with the element-wise function applied.
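Comparison and logical operations are applied element-wise in the same way and return arrays of booleans. A brief sketch with illustrative names:

```python
import numpy as np

values = np.array([1, 2, 3, 4, 5])

is_large = values > 2      # [False False  True  True  True]
is_even = values % 2 == 0  # [False  True False  True False]

# Combine two boolean arrays element by element
both = np.logical_and(is_large, is_even)
print(both)          # [False False False  True False]

# A boolean array can be used to select the matching elements
print(values[both])  # [4]
```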

#### Broadcasting

Broadcasting is a powerful feature in `Numpy` that enables operations between arrays of different shapes or dimensions. `Numpy` automatically performs broadcasting when certain conditions are met. Here's an example:

In [37]:
my_array = np.array([[1, 2, 3], [4, 5, 6]])
scalar = 2

result_broadcast = my_array * scalar
print(result_broadcast)

[[ 2  4  6]
 [ 8 10 12]]


In this example, we multiply a **2-dimensional** array `my_array` with a scalar value `scalar`. `Numpy` automatically broadcasts the `scalar` to match the shape of the array, and the element-wise multiplication is performed accordingly.

**Broadcasting** allows for concise and efficient code implementation, eliminating the need for explicit loops or repetitions when operating on arrays of different shapes.

Understanding and utilizing element-wise operations and broadcasting in `Numpy` are essential when performing mathematical computations and transformations.
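Broadcasting is not limited to scalars. As a hedged sketch, the example below combines a `(2, 3)` array with a `(3,)` row and with a `(2, 1)` column; the names `matrix`, `row`, and `column` are illustrative:

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6]])  # shape (2, 3)
row = np.array([10, 20, 30])               # shape (3,)
column = np.array([[100], [200]])          # shape (2, 1)

# The row is repeated for every row of the matrix
print(matrix + row)
# [[11 22 33]
#  [14 25 36]]

# The column is repeated for every column of the matrix
print(matrix + column)
# [[101 102 103]
#  [204 205 206]]
```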

<a name='3'></a>
# 3. Array Initialisation and Attributes

In this section, we will explore different methods of initialising `Numpy` arrays with specific values or patterns. We will also learn about various attributes associated with `Numpy` arrays that provide useful information about the array's shape, size, data type, etc.

<a name='3-1'></a>
### 3.1. Creating arrays with specific values (np.zeros(), np.ones(), np.full(), etc.)

#### Initializing arrays with a single value
You can create an array filled with a single value using the `np.full()` function. Here's an example:

In [39]:
# Create a 3x3 matrix filled with value 5
my_array = np.full((3, 3), 5)
print(my_array)

[[5 5 5]
 [5 5 5]
 [5 5 5]]


#### Creating arrays of zeros or ones
You can create an array filled with zeros or ones using the `np.zeros()` or `np.ones()` functions. Here's an example:

In [49]:
# Create a 2x3 matrix filled with zeros
my_zeros_array = np.zeros(shape=(2, 3))
print(my_zeros_array)

# Create a 2x3 array filled with ones
my_ones_array = np.ones(shape=(2, 3))
print(my_ones_array)

[[0. 0. 0.]
 [0. 0. 0.]]
[[1. 1. 1.]
 [1. 1. 1.]]


#### Generating random arrays
`Numpy` provides functions to generate random arrays with different distributions. Here's an example:

In [50]:
# Create a 3x3 matrix with random values from a standard normal distribution
my_random_array = np.random.randn(3, 3)
print(my_random_array)

[[ 0.56304351 -0.98598801 -1.1967482 ]
 [ 1.41891495  0.03252234 -0.71940226]
 [ 0.15576343 -2.01528573  0.76977121]]


**Note:** `np.random.randn` generates an array filled with random floats sampled from a univariate “normal” (Gaussian) distribution of mean `0` and variance `1`.
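Random output changes on every run, which makes experiments hard to repeat. One way to get repeatable numbers is to create a seeded generator with `np.random.default_rng()`. This is a sketch of an alternative to `np.random.randn`, not what the example above uses, and the seed value `42` is arbitrary:

```python
import numpy as np

# A generator created with a fixed seed produces the same numbers each time
rng = np.random.default_rng(seed=42)
sample = rng.standard_normal((3, 3))
print(sample.shape)  # (3, 3)

# Re-creating the generator with the same seed reproduces the sample
repeat = np.random.default_rng(seed=42).standard_normal((3, 3))
print(np.array_equal(sample, repeat))  # True
```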

<a name='3-2'></a>
### 3.2. Generating arrays with a range of values (np.arange(), np.linspace(), etc.)
You can create an array with a sequence of values using the `np.arange()` or `np.linspace()` functions. Here are examples of both:

In [40]:
# Create an array with values from 0 to 10 (exclusive)
my_array1 = np.arange(10)
print(my_array1)  # Output: [0 1 2 3 4 5 6 7 8 9]

# Create an array with 5 equally spaced values from 0 to 1 (inclusive)
my_array2 = np.linspace(0, 1, 5)
print(my_array2)  # Output: [0.   0.25 0.5  0.75 1.  ]

[0 1 2 3 4 5 6 7 8 9]
[0.   0.25 0.5  0.75 1.  ]
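
Both functions accept further arguments. The sketch below shows `np.arange()` with an explicit start, stop, and step, and `np.linspace()` with the end point excluded; the variable names are illustrative:

```python
import numpy as np

# start=2, stop=11 (exclusive), step=2
evens = np.arange(2, 11, 2)
print(evens)  # [ 2  4  6  8 10]

# A non-integer step also works
halves = np.arange(0, 2, 0.5)
print(halves)  # [0.  0.5 1.  1.5]

# endpoint=False leaves out the stop value
open_interval = np.linspace(0, 1, 5, endpoint=False)
print(open_interval)  # [0.  0.2 0.4 0.6 0.8]
```

When the step is not an integer, `np.linspace()` is often the safer choice because the number of points is fixed in advance rather than depending on floating-point rounding.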


<a name='3-3'></a>
### 3.3. Array attributes (shape, size, data type, etc.)

In this sub-section, we will explore various attributes associated with `Numpy` arrays that provide useful information about the array.

#### Shape of the array
The shape of an array refers to its dimensions. You can access the shape attribute of a `Numpy` array using the `.shape` property. Here's an example:

In [51]:
my_array = np.array([[1, 2, 3], [4, 5, 6]])
print(my_array.shape)  # Output: (2, 3)

(2, 3)


In this example, the shape of `my_array` is `(2, 3)`, indicating that it has `2` rows and `3` columns.

#### Dimensions and size of the array
The number of dimensions of an array can be obtained using the `.ndim` property, and the total number of elements in the array can be obtained using the `.size` property. Here's an example:

In [54]:
my_array = np.array([[1, 2, 3], [4, 5, 6]])
print(my_array.ndim)  # Output: 2
print(my_array.size)  # Output: 6

2
6


In this example, `my_array` has `2` dimensions and a total of `6` elements.

#### Data type of the array
The data type of an array can be accessed using the `.dtype` property. Here's an example:

In [56]:
my_array = np.array([1, 2, 3])
print(my_array.dtype)  # Output: int64

int64


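In this example, the data type of `my_array` is `int64`, indicating that it contains 64-bit integers. The default integer type depends on the platform and `Numpy` version, so you may see `int32` on some systems.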

#### Number of elements in the array
For a **1-dimensional** array, the number of elements can be obtained using the built-in `len()` function. Here's an example:

In [62]:
my_array = np.array([1, 2, 3])
print(len(my_array))  # Output: 3

3


In this example, `my_array` contains `3` elements.

**Note:** In the case of n-dimensional arrays ($n>1$), `len()` returns the length of the first dimension only, whereas the attribute `.size` returns the total number of elements in the array. Here's an example:

In [66]:
my_array = np.ones(shape=(3,3))
print(len(my_array))  # Output: 3
print(my_array.size)  # Output: 9

3
9

#### Memory consumption of the array (optional)
The memory consumed by an array's elements can be obtained using the `.nbytes` property. It equals the array's size (number of elements) multiplied by its item size (memory size of each element in bytes). Here's an example:

In [67]:
my_array = np.array([1, 2, 3])
print(my_array.nbytes)  # Output: 24

24


In this example, `my_array` consumes `24` bytes of memory (assuming each integer element requires `8` bytes).
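The relationship between `.nbytes`, `.size`, and `.itemsize` can be checked directly. The sketch below fixes the dtype explicitly so that the numbers do not depend on the platform default; the variable names are illustrative:

```python
import numpy as np

ints = np.array([1, 2, 3], dtype=np.int64)

print(ints.itemsize)              # 8 bytes per element
print(ints.size * ints.itemsize)  # 24
print(ints.nbytes)                # 24

# A smaller dtype reduces the memory footprint
small = ints.astype(np.int8)
print(small.nbytes)               # 3
```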

In [71]:
my_array = np.full(shape=(3,3), fill_value=.5)
my_array.nbytes

72

<a name='4'></a>
# 4. Array Operations

In this section, we will explore various mathematical operations that can be performed on `Numpy` arrays, such as addition, subtraction, multiplication, division, exponentiation, and more.

<a name='4-1'></a>
### 4.1. Mathematical operations with arrays (+, -, *, /, etc.)

`Numpy` provides element-wise operations for mathematical operations between arrays, allowing you to perform arithmetic operations on corresponding elements of two or more arrays. Here's how it works:

In [74]:
# Creating two arrays
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

# Addition
add_result = arr1 + arr2
print("Addition:", add_result)

# Subtraction
sub_result = arr1 - arr2
print("Subtraction:", sub_result)

# Multiplication
mul_result = arr1 * arr2
print("Multiplication:", mul_result)

# Division
div_result = arr1 / arr2
print("Division:", div_result)

# Exponentiation
exp_result = arr1 ** arr2
print("Exponentiation:", exp_result)

Addition: [5 7 9]
Subtraction: [-3 -3 -3]
Multiplication: [ 4 10 18]
Division: [0.25 0.4  0.5 ]
Exponentiation: [  1  32 729]


In this example, we have two arrays `arr1` and `arr2`. We perform various mathematical operations on these arrays using the corresponding operators `+`, `-`, `*`, `/`, and `**`. The operations are performed element-wise, meaning each element of one array is operated on with the corresponding element of the other array.

It's important to note that the arrays involved in the operations must have compatible shapes, or they must be broadcastable to compatible shapes. Broadcasting allows arrays with different shapes to be used in arithmetic operations by automatically extending or duplicating their values to match each other's shapes.
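When two shapes cannot be broadcast together, `Numpy` raises a `ValueError`. A minimal sketch of that failure case (the flag `shapes_compatible` is an illustrative name):

```python
import numpy as np

a = np.array([1, 2, 3])  # shape (3,)
b = np.array([1, 2])     # shape (2,)

try:
    a + b
    shapes_compatible = True
except ValueError as error:
    # Shapes (3,) and (2,) cannot be broadcast to a common shape
    shapes_compatible = False
    print("Incompatible shapes:", error)
```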

`Numpy` also provides many other mathematical functions and operations that can be applied to arrays, such as `np.sin()`, `np.cos()`, `np.sqrt()`, and more. These functions operate element-wise on the array and return a new array with the calculated values.

In [76]:
# Creating an array
my_array = np.array([1, 2, 3])

# Applying mathematical functions
sin_result = np.sin(my_array)
print("Sine:", sin_result)

cos_result = np.cos(my_array)
print("Cosine:", cos_result)

sqrt_result = np.sqrt(my_array)
print("Square Root:", sqrt_result)

Sine: [0.84147098 0.90929743 0.14112001]
Cosine: [ 0.54030231 -0.41614684 -0.9899925 ]
Square Root: [1.         1.41421356 1.73205081]


In this example, we apply various mathematical functions (`np.sin()`, `np.cos()`, and `np.sqrt()`) to `my_array` and obtain new arrays with the calculated values.

These mathematical operations and functions play a vital role in performing computations on arrays in machine learning and numerical computing tasks, allowing you to manipulate and analyze data efficiently.

<a name='4-2'></a>
### 4.2. Array aggregation functions (np.sum(), np.mean(), np.min(), np.max(), etc.)

`Numpy` provides a wide range of aggregation functions that allow you to calculate various statistics and properties of `Numpy` arrays. These functions can help you summarize and analyze your data efficiently. Here are some commonly used aggregation functions:

In [77]:
# Creating an array
my_array = np.array([1, 2, 3, 4, 5])

# Sum of array elements
sum_result = np.sum(my_array)
print("Sum:", sum_result)

# Mean of array elements
mean_result = np.mean(my_array)
print("Mean:", mean_result)

# Minimum value in the array
min_result = np.min(my_array)
print("Minimum:", min_result)

# Maximum value in the array
max_result = np.max(my_array)
print("Maximum:", max_result)

Sum: 15
Mean: 3.0
Minimum: 1
Maximum: 5


In this example, we have an array `my_array`, and we apply various aggregation functions to calculate the `sum`, `mean`, `minimum`, and `maximum` values of the array.

`Numpy` aggregation functions also support specifying the axis parameter to calculate the aggregation along a specific axis in multi-dimensional arrays. This allows you to perform aggregations row-wise or column-wise. Here's an example:

In [78]:
# Creating a 2D array
my_array = np.array([[1, 2, 3],
                     [4, 5, 6]])

# Sum down the rows (axis=0), giving one total per column
col_sums = np.sum(my_array, axis=0)
print("Column sums (axis=0):", col_sums)

# Sum across the columns (axis=1), giving one total per row
row_sums = np.sum(my_array, axis=1)
print("Row sums (axis=1):", row_sums)

Column sums (axis=0): [5 7 9]
Row sums (axis=1): [ 6 15]


In this example, `axis=0` collapses the rows of the 2D array `my_array`, producing one `sum` per column, while `axis=1` collapses the columns, producing one `sum` per row.
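The `axis` argument works the same way for other aggregations. The sketch below also uses `keepdims=True`, which keeps the collapsed axis with length 1 so that the result can be broadcast back against the original array; the names are illustrative:

```python
import numpy as np

table = np.array([[1, 2, 3],
                  [4, 5, 6]])

column_means = np.mean(table, axis=0)
print(column_means)  # [2.5 3.5 4.5]

row_means = np.mean(table, axis=1)
print(row_means)     # [2. 5.]

# keepdims=True gives shape (2, 1) instead of (2,)
row_totals = np.sum(table, axis=1, keepdims=True)
print(row_totals.shape)  # (2, 1)

# Divide each row by its own total so that every row sums to 1
row_fractions = table / row_totals
print(row_fractions)
```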

`Numpy` provides many other aggregation functions, such as `np.median()`, `np.std()`, `np.var()` and more. These functions allow you to calculate the median, standard deviation, variance and other statistical properties of your arrays.

In [80]:
# Creating an array
my_array = np.array([1, 2, 3, 4, 5])

# Median of array elements
median_result = np.median(my_array)
print("Median:", median_result)

# Standard deviation of array elements
std_result = np.std(my_array)
print("Standard Deviation:", std_result)

# Variance of array elements
var_result = np.var(my_array)
print("Variance:", var_result)

Median: 3.0
Standard Deviation: 1.4142135623730951
Variance: 2.0


In this example, we calculate the median, standard deviation, and variance of the array `my_array`.
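By default, `np.std()` and `np.var()` divide by the number of elements (the population formulas). Passing `ddof=1` divides by one less than the number of elements, which gives the sample versions commonly used in statistics:

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5])

population_std = np.std(data)      # divides by n, about 1.414
sample_std = np.std(data, ddof=1)  # divides by n - 1, about 1.581
sample_var = np.var(data, ddof=1)  # 2.5

print(population_std, sample_std, sample_var)
```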

These aggregation functions are essential for analyzing and summarizing data in machine learning and data analysis tasks. They provide valuable insights into the distribution and properties of the arrays.

<a name='4-3'></a>
### 4.3. Array transformations (np.reshape(), np.transpose(), etc.)

`Numpy` provides several functions to transform and manipulate the shape and dimensions of `Numpy` arrays. These functions allow you to `reshape`, `transpose`, `concatenate`, and `split` arrays. Let's explore some commonly used array transformation functions.

#### Reshaping arrays using `np.reshape()`
The `np.reshape()` function allows you to change the shape of an array while maintaining the same elements. Here's an example:

In [84]:
# Creating a 1D array
arr1d = np.array([1, 2, 3, 4, 5, 6])

# Reshaping the array to a 2D array
arr2d = np.reshape(arr1d, (2, 3))
print("2D Array:\n", arr2d)


2D Array:
 [[1 2 3]
 [4 5 6]]


In this example, we reshape the 1D array `arr1d` into a 2D array `arr2d` with a shape of `(2, 3)`.

**Note:** The total number of elements before and after a reshaping operation should remain the same. For example, if you try: `arr2d = np.reshape(arr1d, (2, 2))`, you'll get a `ValueError`.
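One dimension may be given as `-1`, in which case `Numpy` infers it from the total number of elements. A short sketch with illustrative names:

```python
import numpy as np

arr1d = np.arange(1, 7)  # [1 2 3 4 5 6]

# Ask for 2 rows and let Numpy work out the number of columns
two_rows = arr1d.reshape(2, -1)
print(two_rows.shape)    # (2, 3)

# The same idea with the function form
three_rows = np.reshape(arr1d, (3, -1))
print(three_rows.shape)  # (3, 2)

# reshape(-1) flattens back to one dimension
flat = two_rows.reshape(-1)
print(flat)              # [1 2 3 4 5 6]
```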

#### Transposing arrays using `np.transpose()`
The `np.transpose()` function reverses the order of an array's axes by default. For a 2D array, this swaps its rows and columns. Here's an example:

In [89]:
# Creating a 2D array
arr2d = np.array([[1, 2, 3],
                  [4, 5, 6]])

# Transposing the array
transposed_arr = np.transpose(arr2d)
print("Transposed Array:\n", transposed_arr)

Transposed Array:
 [[1 4]
 [2 5]
 [3 6]]


In this example, we transpose the 2D array `arr2d` by interchanging its rows and columns.
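The `.T` attribute is a shorthand for the same operation. The sketch below also shows that transposing a 1D array leaves it unchanged, since there is only one axis to reorder:

```python
import numpy as np

arr2d = np.array([[1, 2, 3],
                  [4, 5, 6]])

print(arr2d.T.shape)                                 # (3, 2)
print(np.array_equal(arr2d.T, np.transpose(arr2d)))  # True

vector = np.array([1, 2, 3])
print(vector.T.shape)                                # (3,)
```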

#### Concatenating arrays using `np.concatenate()`
The `np.concatenate()` function allows you to concatenate multiple arrays along a specified axis. Here's an example:

In [94]:
# Creating two 1D arrays
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

# Concatenating the arrays along the axis 0
concatenated_arr = np.concatenate((arr1, arr2), axis=0)
print("Concatenated Array:", concatenated_arr)

Concatenated Array: [1 2 3 4 5 6]


In this example, we concatenate the two 1D arrays `arr1` and `arr2` along the `axis=0`, resulting in a single 1D array.
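With 2D arrays, the `axis` argument decides whether rows or columns are added. The arrays must match in every dimension except the one being joined. A minimal sketch with illustrative names:

```python
import numpy as np

top = np.array([[1, 2],
                [3, 4]])
bottom = np.array([[5, 6]])
side = np.array([[7],
                 [8]])

# axis=0 adds rows: (2, 2) + (1, 2) -> (3, 2)
rows_joined = np.concatenate((top, bottom), axis=0)
print(rows_joined.shape)  # (3, 2)

# axis=1 adds columns: (2, 2) + (2, 1) -> (2, 3)
cols_joined = np.concatenate((top, side), axis=1)
print(cols_joined)
# [[1 2 7]
#  [3 4 8]]
```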

#### Stacking arrays using `np.stack()`
The `np.stack()` function allows you to stack arrays along a new axis. Here's an example:

In [98]:
# Creating two arrays
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

# Stacking the arrays along a new axis (axis=0)
stacked_arr = np.stack((arr1, arr2), axis=0)
print("Stacked Array:\n", stacked_arr)

Stacked Array:
 [[1 2 3]
 [4 5 6]]


In this example, we stack the two 1D arrays `arr1` and `arr2` along a new axis (`axis=0`), resulting in a 2D array.
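Choosing a different `axis` changes where the new dimension is placed. As a sketch, stacking the same two arrays along `axis=1` pairs up their elements instead:

```python
import numpy as np

arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

# The new axis is inserted at position 1, giving shape (3, 2)
paired = np.stack((arr1, arr2), axis=1)
print(paired)
# [[1 4]
#  [2 5]
#  [3 6]]
```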

#### Horizontally stacking arrays using `np.hstack()`
The `np.hstack()` function allows you to horizontally stack arrays. Here's an example:

In [99]:
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

# Horizontally stacking the arrays
hstacked_arr = np.hstack((arr1, arr2))
print("Horizontally Stacked Array:", hstacked_arr)

Horizontally Stacked Array: [1 2 3 4 5 6]


In this example, we horizontally stack the two 1D arrays `arr1` and `arr2`, resulting in a single 1D array.

#### Vertically stacking arrays using `np.vstack()`
The `np.vstack()` function allows you to vertically stack arrays. Here's an example:

In [100]:
# Creating two arrays
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

# Vertically stacking the arrays
vstacked_arr = np.vstack((arr1, arr2))
print("Vertically Stacked Array:\n", vstacked_arr)

Vertically Stacked Array:
 [[1 2 3]
 [4 5 6]]


In this example, we vertically stack the two 1D arrays `arr1` and `arr2`, resulting in a 2D array.

These array combining functions provide flexibility in combining arrays based on your desired axis and dimensions. They are useful when you need to stack arrays to form larger structures or perform operations that require combined arrays.

#### Splitting arrays using `np.split()`
The `np.split()` function allows you to split an array into multiple sub-arrays along a specified axis. Here's an example:

In [95]:
# Creating a 1D array
arr1d = np.array([1, 2, 3, 4, 5, 6])

# Splitting the array into three equally sized sub-arrays
split_arr = np.split(arr1d, 3)
print("Split Array:", split_arr)

Split Array: [array([1, 2]), array([3, 4]), array([5, 6])]


In this example, we split the 1D array `arr1d` into three equal-sized sub-arrays.
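`np.split()` raises a `ValueError` when the array cannot be divided into equal parts. Two common alternatives, sketched below with illustrative names, are passing explicit split positions or using `np.array_split()`, which allows unequal parts:

```python
import numpy as np

arr = np.arange(7)  # [0 1 2 3 4 5 6]

# Split before index 2 and before index 5
by_position = np.split(arr, [2, 5])
print(by_position)  # parts: [0 1], [2 3 4], [5 6]

# array_split() tolerates a length that is not divisible by 3
uneven = np.array_split(arr, 3)
print(uneven)       # parts: [0 1 2], [3 4], [5 6]
```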

These array transformation functions allow you to manipulate and restructure arrays to meet the requirements of your machine learning or data analysis tasks. They provide flexibility in reshaping, transposing, concatenating, and splitting arrays based on your specific needs.

<a name='5'></a>
# 5. Linear Algebra with Numpy

Linear algebra is a fundamental mathematical discipline that plays a crucial role in many areas of machine learning and numerical computing. `Numpy`, with its powerful array operations and linear algebra functions, provides a convenient and efficient way to perform various linear algebra operations. In this section, we will explore the capabilities of `Numpy` for linear algebra tasks. We will cover matrix operations, such as matrix multiplication and inversion, as well as other essential linear algebra operations. By understanding and utilising these functions, you will be equipped to handle a wide range of linear algebra computations in your machine learning projects.

<a name='5-1'></a>
### 5.1. Matrix operations (np.dot(), np.matmul(), np.linalg.inv(), etc.)

`Numpy` provides a wide range of functions for performing various matrix operations. These functions allow you to perform matrix multiplication, dot product, inverse, determinant calculation, and more. Let's explore some commonly used matrix operations functions.

#### Matrix multiplication using `np.dot()` or `@` operator
The `np.dot()` function or the `@` operator allows you to perform matrix multiplication between two arrays. Here's an example:

In [101]:
# Creating two matrices
matrix1 = np.array([[1, 2], [3, 4]])
matrix2 = np.array([[5, 6], [7, 8]])

# Matrix multiplication using np.dot()
result_dot = np.dot(matrix1, matrix2)
print("Matrix Multiplication (np.dot()):\n", result_dot)

# Matrix multiplication using @ operator
result_at = matrix1 @ matrix2
print("Matrix Multiplication (@ operator):\n", result_at)

Matrix Multiplication (np.dot()):
 [[19 22]
 [43 50]]
Matrix Multiplication (@ operator):
 [[19 22]
 [43 50]]


In this example, we perform matrix multiplication between `matrix1` and `matrix2` using both the `np.dot()` function and the `@` operator, resulting in the product matrix.
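Matrix multiplication requires the inner dimensions to agree: a `(m, n)` array can be multiplied by a `(n, p)` array, giving a `(m, p)` result. The sketch below uses small illustrative arrays (not part of the example above) to show the shape rule, the error raised on a mismatch, and the fact that the order of the operands matters:

```python
import numpy as np

a = np.arange(6).reshape(2, 3)   # shape (2, 3)
b = np.arange(12).reshape(3, 4)  # shape (3, 4)

# Inner dimensions match: (2, 3) @ (3, 4) -> (2, 4)
product = a @ b
print("Product shape:", product.shape)

# Reversing the operands gives (3, 4) @ (2, 3): inner dimensions 4 and 2 differ
try:
    b @ a
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
print("Shape mismatch raised ValueError:", mismatch_raised)

# Even for square matrices, matrix multiplication is not commutative
m1 = np.array([[1, 2], [3, 4]])
m2 = np.array([[5, 6], [7, 8]])
same_both_ways = np.array_equal(m1 @ m2, m2 @ m1)
print("m1 @ m2 equals m2 @ m1:", same_both_ways)
```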

#### Matrix multiplication with `np.matmul()`
The `np.matmul()` function also allows you to perform matrix multiplication between arrays, similar to `np.dot()`. Here's an example:

In [102]:
# Creating two matrices
matrix1 = np.array([[1, 2], [3, 4]])
matrix2 = np.array([[5, 6], [7, 8]])

# Matrix multiplication using np.matmul()
result_matmul = np.matmul(matrix1, matrix2)
print("Matrix Multiplication (np.matmul()):\n", result_matmul)

Matrix Multiplication (np.matmul()):
 [[19 22]
 [43 50]]


In this example, we perform matrix multiplication between `matrix1` and `matrix2` using the `np.matmul()` function, resulting in the product matrix.

**Note:**
- **`np.matmul()`**: The `np.matmul()` function performs matrix multiplication between two arrays. Arrays with more than two dimensions are treated as stacks of matrices, with broadcasting applied to the leading dimensions. It is recommended to use `np.matmul()` for matrix multiplication operations.

- **`np.dot()`**: The `np.dot()` function performs the dot product between two arrays. For 1D arrays, it computes the inner product, and for 2D arrays, it is equivalent to matrix multiplication. For higher-dimensional arrays, however, `np.dot()` computes a sum product over the last axis of the first array and the second-to-last axis of the second, which differs from the stacked behaviour of `np.matmul()`.

- **`@` operator**: The `@` operator is a shorthand notation for matrix multiplication in `Numpy`. It can be used between two arrays to perform matrix multiplication, similar to `np.matmul()`. The `@` operator was introduced in `Python 3.5` as a convenient way to express matrix multiplication.
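The difference between `np.matmul()` and `np.dot()` only shows up beyond two dimensions. As a rough illustration (the array shapes are chosen arbitrarily for this sketch), compare the result shapes for two 3D arrays:

```python
import numpy as np

a = np.ones((2, 3, 4))
b = np.ones((2, 4, 5))

# matmul treats the leading axis as a stack of two (3, 4) @ (4, 5) products
stacked = np.matmul(a, b)
print("np.matmul shape:", stacked.shape)

# dot sums over the last axis of a and the second-to-last axis of b,
# pairing every matrix in a with every matrix in b
pairwise = np.dot(a, b)
print("np.dot shape:", pairwise.shape)

# For 1D arrays, both compute the inner product
v = np.array([1, 2, 3])
w = np.array([4, 5, 6])
print("Inner products:", np.dot(v, w), v @ w)
```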

#### Matrix transpose using `np.transpose()`
The `np.transpose()` function allows you to transpose a matrix by swapping its rows and columns. Here's an example:

In [105]:
# Creating a matrix
matrix = np.array([[1, 2], [3, 4], [5, 6]])

# Matrix transpose using np.transpose()
transpose_matrix = np.transpose(matrix)
print("Matrix Transpose:\n", transpose_matrix)

Matrix Transpose:
 [[1 3 5]
 [2 4 6]]


#### Matrix inversion using `np.linalg.inv()`
The `np.linalg.inv()` function allows you to calculate the inverse of a matrix. Here's an example:

In [103]:
# Creating a matrix
matrix = np.array([[1, 2], [3, 4]])

# Matrix inversion using np.linalg.inv()
inverse_matrix = np.linalg.inv(matrix)
print("Inverse Matrix:\n", inverse_matrix)

Inverse Matrix:
 [[-2.   1. ]
 [ 1.5 -0.5]]


In this example, we calculate the inverse of the matrix using the `np.linalg.inv()` function.
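A quick way to check an inverse is to multiply it by the original matrix and compare the result with the identity. The sketch below does that and also shows what happens with a singular matrix, which has no inverse; the `singular` matrix is an illustrative example, not part of the tutorial's data:

```python
import numpy as np

matrix = np.array([[1.0, 2.0], [3.0, 4.0]])
inverse = np.linalg.inv(matrix)

# A matrix times its inverse should be the identity, up to rounding error
is_identity = np.allclose(matrix @ inverse, np.eye(2))
print("matrix @ inverse is identity:", is_identity)

# A singular matrix (the second row is twice the first) cannot be inverted
singular = np.array([[1.0, 2.0], [2.0, 4.0]])
try:
    np.linalg.inv(singular)
    inversion_failed = False
except np.linalg.LinAlgError:
    inversion_failed = True
print("Singular matrix rejected:", inversion_failed)
```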

These are just a few examples of matrix operations that can be performed using `Numpy`. `Numpy` provides a wide range of functions for linear algebra operations, including solving linear equations, calculating eigenvalues and eigenvectors, and more. These functions are powerful tools for performing various matrix computations in machine learning and numerical computations.

<a name='5-2'></a>
### 5.2. Eigenvalues and eigenvectors (np.linalg.eig())

Eigenvalues and eigenvectors are essential concepts in linear algebra, and they have wide applications in various fields, including machine learning and data analysis. Numpy provides the `np.linalg.eig()` function to compute the eigenvalues and eigenvectors of a square matrix. Let's explore how to use this function:

In [106]:
# Creating a matrix
matrix = np.array([[1, 2], [3, 4]])

# Computing eigenvalues and eigenvectors using np.linalg.eig()
eigenvalues, eigenvectors = np.linalg.eig(matrix)

print("Eigenvalues:\n", eigenvalues)
print("Eigenvectors:\n", eigenvectors)

Eigenvalues:
 [-0.37228132  5.37228132]
Eigenvectors:
 [[-0.82456484 -0.41597356]
 [ 0.56576746 -0.90937671]]


In this example, we compute the eigenvalues and eigenvectors of the matrix using `np.linalg.eig()`. The eigenvalues are stored in the `eigenvalues` array, and the eigenvectors are stored as the columns of the `eigenvectors` array, so that `eigenvectors[:, i]` corresponds to `eigenvalues[i]`. Each eigenvector is normalised to unit length.

Eigenvalues and eigenvectors are defined together: an eigenvector of a square matrix `A` is a non-zero vector `v` that the matrix only scales, so that `A @ v = λ * v`, and the scalar `λ` is the corresponding eigenvalue.
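The defining relation between a matrix, its eigenvalues, and its eigenvectors can be checked numerically. The following sketch reuses the matrix from the example above and verifies, column by column, that multiplying by the matrix is the same as scaling by the eigenvalue:

```python
import numpy as np

matrix = np.array([[1, 2], [3, 4]])
eigenvalues, eigenvectors = np.linalg.eig(matrix)

# Column i of `eigenvectors` pairs with eigenvalues[i]
checks = []
for i in range(len(eigenvalues)):
    v = eigenvectors[:, i]
    lam = eigenvalues[i]
    # matrix @ v should equal lam * v up to rounding error
    checks.append(bool(np.allclose(matrix @ v, lam * v)))
print("Eigenpairs satisfy A v = lambda v:", checks)

# The returned eigenvectors have unit length
norms = np.linalg.norm(eigenvectors, axis=0)
print("Eigenvector norms:", norms)
```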

The eigenvalues and eigenvectors provide valuable insights into the properties and behavior of linear transformations, which are useful in various applications, such as dimensionality reduction, spectral analysis, and understanding the dynamics of linear systems.

It's important to note that `np.linalg.eig()` requires a square input matrix, because eigenvalues and eigenvectors are only defined for square matrices. For a non-square matrix, the closest analogue is the singular value decomposition (SVD), available through the `np.linalg.svd()` function, which yields singular values and singular vectors instead.
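As a minimal sketch of the SVD mentioned above, the snippet below decomposes an illustrative `(2, 3)` matrix (`rect` is a made-up example), rebuilds it from its factors, and checks the link between singular values and eigenvalues: the squared singular values of `rect` equal the eigenvalues of `rect @ rect.T`.

```python
import numpy as np

rect = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0]])  # shape (2, 3), not square

# full_matrices=False returns the compact factors
U, s, Vt = np.linalg.svd(rect, full_matrices=False)
print("Factor shapes:", U.shape, s.shape, Vt.shape)

# The factors multiply back to the original matrix
reconstructed = U @ np.diag(s) @ Vt
print("Reconstruction matches:", np.allclose(reconstructed, rect))

# rect @ rect.T is square and symmetric, so eigvalsh applies (ascending order)
gram_eigenvalues = np.linalg.eigvalsh(rect @ rect.T)
print("Squared singular values match:", np.allclose(np.sort(s**2), gram_eigenvalues))
```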

Using `np.linalg.eig()` allows you to perform eigenvalue and eigenvector computations efficiently in `Numpy`, empowering you to analyze and understand the characteristics of linear transformations and systems.

<a name='5-3'></a>
### 5.3. Solving linear equations (np.linalg.solve())

In many applications, solving linear equations is a common task in linear algebra. `Numpy` provides the `np.linalg.solve()` function to efficiently solve systems of linear equations. This function takes the coefficient matrix and the constant vector as inputs and returns the solution vector. Let's see how to use this function:

In [107]:
# Coefficient matrix
A = np.array([[2, 3], [1, -2]])

# Constant vector
b = np.array([5, -4])

# Solving linear equations using np.linalg.solve()
solution = np.linalg.solve(A, b)

print("Solution:\n", solution)

Solution:
 [-0.28571429  1.85714286]


In this example, we have a system of linear equations represented by the coefficient matrix `A` and the constant vector `b`. We use `np.linalg.solve()` to find the solution vector that satisfies the equation `A @ solution = b`. The solution vector contains the values that solve this system of equations.
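The solution can be verified by substituting it back into the system. The sketch below repeats the example's `A` and `b`, checks that `A @ solution` reproduces `b`, and shows that `np.linalg.solve()` also accepts several right-hand sides at once (the matrix `B` is an illustrative addition):

```python
import numpy as np

A = np.array([[2, 3], [1, -2]])
b = np.array([5, -4])
solution = np.linalg.solve(A, b)

# Substituting the solution back should reproduce b
is_valid = np.allclose(A @ solution, b)
print("A @ solution equals b:", is_valid)

# Each column of B is treated as a separate right-hand side
B = np.array([[5, 1], [-4, 0]])
solutions = np.linalg.solve(A, B)
print("All systems solved:", np.allclose(A @ solutions, B))
```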

It's important to note that `np.linalg.solve()` requires the coefficient matrix to be square and non-singular, so that the system has a unique solution. If the matrix is singular or not square (for example, an overdetermined system with more equations than unknowns), it raises `np.linalg.LinAlgError`. In such cases, consider a least-squares solution with `np.linalg.lstsq()`.
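A least-squares fit of an overdetermined system can be sketched with `np.linalg.lstsq()`. The data below is made up for illustration: three points that lie exactly on the line `y = 2x + 1`, giving three equations for two unknowns (slope and intercept). The second half shows the error that `np.linalg.solve()` raises for a singular matrix.

```python
import numpy as np

# Three equations, two unknowns: fit y = slope * x + intercept
x = np.array([0.0, 1.0, 2.0])
y = np.array([1.0, 3.0, 5.0])
design = np.column_stack([x, np.ones_like(x)])  # shape (3, 2)

coeffs, residuals, rank, singular_values = np.linalg.lstsq(design, y, rcond=None)
print("Slope and intercept:", coeffs)

# A singular square matrix is rejected by np.linalg.solve()
singular = np.array([[1.0, 2.0], [2.0, 4.0]])
try:
    np.linalg.solve(singular, np.array([1.0, 2.0]))
    solve_failed = False
except np.linalg.LinAlgError:
    solve_failed = True
print("Singular system rejected:", solve_failed)
```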

Solving linear equations is a fundamental operation in various mathematical and scientific applications, including optimisation problems, data fitting, and signal processing. `Numpy`'s `np.linalg.solve()` function provides a straightforward and efficient way to find the solutions, allowing you to tackle these problems effectively in your machine learning and numerical computing tasks.

<a name='6'></a>
# 6. Advanced Numpy Techniques

In this section, we will explore advanced techniques and functionalities in `Numpy` that can enhance your array manipulation and computational capabilities. These techniques go beyond the basics and provide powerful tools for handling complex data operations efficiently. By understanding and leveraging these advanced `Numpy` features, you can streamline your code and tackle more sophisticated tasks in machine learning and data analysis.

<a name='6-1'></a>
### 6.1. Broadcasting (Advanced examples)

Broadcasting is a powerful mechanism in `Numpy` that allows arrays of different shapes to be used together in element-wise operations. In a previous section, we introduced broadcasting briefly, but now we will delve into advanced examples to illustrate its capabilities.

Broadcasting eliminates the need for explicit loops and enables efficient computations by automatically expanding arrays to match their shapes. This feature is particularly useful when dealing with arrays of different sizes or dimensions, as it simplifies the code and improves performance.

Let's explore some advanced examples of broadcasting in `Numpy`:

In [108]:
# Broadcasting example 1
a = np.array([[1, 2, 3], [4, 5, 6]])
b = np.array([10, 20, 30])
result1 = a + b
print("Example 1 - Broadcasting:\n", result1)

Example 1 - Broadcasting:
 [[11 22 33]
 [14 25 36]]


In [109]:
# Broadcasting example 2
c = np.array([[1], [2], [3]])
d = np.array([10, 20, 30])
result2 = c * d
print("\nExample 2 - Broadcasting:\n", result2)


Example 2 - Broadcasting:
 [[10 20 30]
 [20 40 60]
 [30 60 90]]


In the first example, we have a 2D array `a` of shape `(2, 3)` and a 1D array `b` of shape `(3,)`. By simply adding these arrays together `(a + b)`, `Numpy` automatically broadcasts the dimensions to match and performs element-wise addition. The result is an array of the same shape as `a`, in which `b` has been added to each row of `a`.

In the second example, we have a 2D array `c` of shape `(3, 1)` and a 1D array `d` of shape `(3,)`. By multiplying these arrays `(c * d)`, `Numpy` stretches both operands to the common shape `(3, 3)` and performs element-wise multiplication. The result is a `(3, 3)` array in which row `i` is `d` multiplied by the `i`-th element of `c`.

These examples illustrate how broadcasting simplifies element-wise operations between arrays of different shapes. `Numpy`'s broadcasting rules enable efficient computations by expanding arrays implicitly, saving you from writing explicit loops and allowing for more concise and readable code.
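Broadcasting compares shapes from the trailing dimension backwards: two dimensions are compatible when they are equal or when one of them is 1. The sketch below, using illustrative arrays, shows the resulting shapes, two common patterns (centring columns and building a table of pairwise differences with `np.newaxis`), and the error raised for incompatible shapes:

```python
import numpy as np

c = np.array([[1], [2], [3]])  # shape (3, 1)
d = np.array([10, 20, 30])     # shape (3,), treated as (1, 3)
print("Shape of c * d:", (c * d).shape)

# Centre each column of a (4, 3) matrix by subtracting the (3,) column means
data = np.arange(12, dtype=float).reshape(4, 3)
centred = data - data.mean(axis=0)
print("Column means after centring:", centred.mean(axis=0))

# Pairwise differences: (3, 1) minus (1, 2) broadcasts to (3, 2)
p = np.array([0.0, 1.0, 3.0])
q = np.array([1.0, 2.0])
diff = p[:, np.newaxis] - q[np.newaxis, :]
print("Pairwise differences:\n", diff)

# Shapes (2, 3) and (2,) are incompatible: trailing dimensions 3 and 2 differ
try:
    np.ones((2, 3)) + np.ones(2)
    incompatible = False
except ValueError:
    incompatible = True
print("Incompatible shapes rejected:", incompatible)
```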

By understanding and utilising broadcasting effectively, you can perform complex operations on arrays of different shapes seamlessly, leading to efficient and elegant code in your machine learning workflows.

<a name='6-2'></a>
### 6.2. Vectorisation and performance optimisation

Vectorisation is a technique in `Numpy` that allows you to perform operations on entire arrays or large chunks of data simultaneously, rather than iterating over individual elements. By leveraging vectorisation, you can achieve significant performance improvements and write more concise and efficient code.

`Numpy` provides a wide range of built-in functions and operations that are vectorised, meaning they can be applied element-wise to arrays without the need for explicit loops. This enables you to express complex computations in a more compact and readable manner.

Let's explore some examples of vectorisation and performance optimisation in `Numpy`:

In [114]:
# Vectorisation example 1
a = np.array([1, 2, 3, 4, 5])
b = np.array([10, 20, 30, 40, 50])
result1 = a + b
print("Example 1 - Vectorisation:\n", result1)

# Vectorisation example 2
c = np.array([1, 2, 3, 4, 5])
result2 = np.sin(c)
print("\nExample 2 - Vectorisation:\n", result2)

# Performance optimisation example
n = 1000000
data = np.random.rand(n)
result3 = np.sum(data)
print("\nPerformance Optimisation Example - Summing an Array:\n", result3)

Example 1 - Vectorisation:
 [11 22 33 44 55]

Example 2 - Vectorisation:
 [ 0.84147098  0.90929743  0.14112001 -0.7568025  -0.95892427]

Performance Optimisation Example - Summing an Array:
 499664.585091213


In the first example, we perform element-wise addition between two arrays `a` and `b` `(a + b)`. `Numpy`'s vectorisation capabilities enable us to add the corresponding elements of the arrays directly, resulting in an array with the same shape as the input arrays.

In the second example, we apply the `np.sin()` function to an array `c`. `Numpy`'s vectorised implementation allows us to compute the sine of each element of the array without the need for a loop.

Furthermore, `Numpy`'s vectorised operations can significantly enhance the performance of computations on large datasets. In the performance optimisation example, we generate a random array of size 1,000,000 (`data`), and then use `np.sum()` to calculate the sum of its elements. Because `np.sum()` runs in compiled code rather than in the Python interpreter, it is typically much faster than an equivalent Python loop.
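The speed difference can be measured with the standard library's `timeit` module. The sketch below compares a plain Python loop (the helper `loop_sum` is written for this illustration) with `np.sum()` on a smaller array; the exact timings depend on your machine, so no specific numbers are claimed here.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)
data = rng.random(100_000)


def loop_sum(values):
    """Sum the values one element at a time in pure Python."""
    total = 0.0
    for value in values:
        total += value
    return total


# Run each approach five times and record the total elapsed time
loop_time = timeit.timeit(lambda: loop_sum(data), number=5)
numpy_time = timeit.timeit(lambda: np.sum(data), number=5)

print(f"Python loop: {loop_time:.4f} s")
print(f"np.sum():    {numpy_time:.4f} s")
print("Results agree:", np.isclose(loop_sum(data), np.sum(data)))
```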

By utilising vectorisation and taking advantage of `Numpy`'s built-in functions, you can optimise your code and achieve better performance in data processing, numerical computations, and machine learning tasks. Vectorisation allows you to express complex operations succinctly and efficiently, leading to more readable and faster code execution.

<a name='6-3'></a>
### 6.3. Memory optimisation techniques (views, copies, etc.)

Memory optimisation is crucial when working with large datasets or performing computationally intensive tasks. `Numpy` provides various techniques that allow you to manage memory efficiently, including views and copies of arrays. Understanding these techniques can help you avoid unnecessary memory usage and improve the performance of your code.

#### Views
In `Numpy`, a view refers to a different way of accessing the same data without creating a new copy. Views allow you to manipulate or extract specific portions of an array without incurring the memory overhead of creating a new array. They provide a convenient way to work with sub-arrays or modify the shape of an array without copying the data.

Let's see an example of creating a view in `Numpy`:

In [120]:
# Creating an array
a = np.array([1, 2, 3, 4, 5])

# Creating a view
view = a[1:4]

print("Original array:", a)
print("View:", view)

view[0] = 1
print("Original array:", a)
print("View:", view)

Original array: [1 2 3 4 5]
View: [2 3 4]
Original array: [1 1 3 4 5]
View: [1 3 4]


In this example, we create a view `view` by slicing the original array `a` using the range `[1:4]`. The view represents a subset of the original array, sharing the same underlying data. Modifying the view will affect the original array, and vice versa.

Views are particularly useful when working with large arrays, as they provide a memory-efficient way to manipulate and extract portions of data without the need for additional memory allocation.
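Whether two arrays share data can be checked directly. As a small sketch (the array names are illustrative), `np.shares_memory()` and the `.base` attribute confirm that both a slice and a reshape of a contiguous array are views:

```python
import numpy as np

a = np.arange(6)
sliced = a[1:4]
reshaped = a.reshape(2, 3)

# Both results refer to the same underlying buffer as `a`
print("Slice shares memory:", np.shares_memory(a, sliced))
print("Reshape shares memory:", np.shares_memory(a, reshaped))

# A view records the array it was derived from in `.base`
print("Slice base is a:", sliced.base is a)

# Writing through the reshaped view changes the original array
reshaped[0, 0] = 99
print("a after modifying the view:", a)
```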

#### Copies
Sometimes, you may need to create an independent copy of an array to avoid unintended side effects or modify the data without affecting the original array. In `Numpy`, you can create a copy of an array using the `copy()` method.

Let's see an example of creating a copy of an array:

In [122]:
# Creating an array
a = np.array([1, 2, 3, 4, 5])

# Creating a copy
copy = a.copy()

print("Original array:", a)
print("Copy:", copy)

copy[1] = 1
print("Original array:", a)
print("Copy:", copy)

Original array: [1 2 3 4 5]
Copy: [1 2 3 4 5]
Original array: [1 2 3 4 5]
Copy: [1 1 3 4 5]


In this example, we create a copy `copy` of the original array `a` using the `copy()` method. The copy is a new array with its own separate data, allowing modifications to one array without affecting the other.

Creating copies can be beneficial when you want to preserve the original data or perform independent operations on the array without modifying the original data.
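Not every copy is created with `copy()`. Indexing with an integer list or a boolean mask also returns a new array rather than a view, which the following sketch illustrates (the variable names are chosen for this example):

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])

fancy = a[[0, 2, 4]]      # integer-array indexing returns a copy
masked = a[a > 2]         # boolean-mask indexing returns a copy
explicit = a[1:4].copy()  # a slice is a view until copied explicitly

# None of these writes reach the original array
fancy[0] = 100
masked[0] = 100
explicit[0] = 100

print("Original array:", a)
print("Shares memory with a:",
      np.shares_memory(a, fancy),
      np.shares_memory(a, masked),
      np.shares_memory(a, explicit))
```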

Understanding when to use views and copies is essential for memory optimisation. Views provide a memory-efficient way to manipulate data, while copies provide independence and isolation. By using the appropriate technique, you can optimise memory usage and improve the performance of your code.

Memory optimisation is particularly critical when working with large datasets or performing computations on resource-constrained systems. By leveraging `Numpy`'s views and copies, you can effectively manage memory and ensure efficient execution of your machine learning and data analysis workflows.

<a name='7'></a>
# 7. Numpy and Machine Learning

`Numpy` plays a vital role in machine learning workflows as it provides efficient data structures and functions for numerical computations. This section focuses on the integration of `Numpy` with machine learning tasks, covering various aspects such as data loading, preprocessing, and handling.

<a name='7-1'></a>
### 7.1. Loading data into Numpy arrays

Before applying machine learning algorithms, it's essential to load and preprocess the data. `Numpy` provides convenient methods to load data from various sources into `Numpy` arrays, allowing you to easily manipulate and analyze the data using the extensive capabilities of `Numpy`.

Let's explore some common methods for loading data into `Numpy` arrays:

#### Loading data from CSV files
CSV (Comma-Separated Values) files are a popular format for storing structured data. `Numpy` provides the `np.genfromtxt()` function to load data from CSV files into `Numpy` arrays.

In [159]:
# Load data from a CSV file
data = np.genfromtxt('./datasets/data.csv', delimiter=';', skip_header=1)

print("Data loaded from CSV:\n", data)

Data loaded from CSV:
 [[ 1.   17.    5.   ...  1.4   1.74   nan]
 [ 1.   15.    1.   ... -0.3   0.79   nan]
 [ 1.    1.    5.   ...  1.4   1.74   nan]
 ...
 [ 1.    1.    1.   ... -0.3   0.79   nan]
 [ 1.    1.    1.   ... -0.8  -3.12   nan]
 [ 1.   10.    1.   ...  3.7  -1.7    nan]]


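In this example, we use `np.genfromtxt()` to load data from a CSV file named `data.csv`. The `delimiter` parameter specifies the character used to separate the values in the file, and `skip_header=1` skips the header row. The resulting `Numpy` array `data` contains the loaded data. By default, any field that is empty or cannot be parsed as a number (for example, a text label) is loaded as `nan`, which may explain the `nan` values in the last column of this output.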
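To experiment with `np.genfromtxt()` without the dataset file, you can parse text held in memory. The sketch below uses a small made-up table (`csv_text`) with one empty field, and shows the default `nan` result alongside the `filling_values` parameter, which substitutes a value for missing fields:

```python
import io

import numpy as np

# Hypothetical semicolon-separated data with one empty field in the last row
csv_text = "a;b;c\n1;2;3\n4;;6\n"

# Missing fields become nan by default
data = np.genfromtxt(io.StringIO(csv_text), delimiter=';', skip_header=1)
print("Default handling:\n", data)

# filling_values replaces missing fields while loading
filled = np.genfromtxt(io.StringIO(csv_text), delimiter=';',
                       skip_header=1, filling_values=0)
print("With filling_values=0:\n", filled)
```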

#### Loading data from text files
`Numpy` also allows loading data from general text files using the `np.loadtxt()` function. This function supports various options to handle different formats and data structures.

In [149]:
# Load data from a text file
data = np.loadtxt('./datasets/data.txt')

print("Data loaded from text file:\n", data)

Data loaded from text file:
 [[ 1.   17.    5.   ... 10.8   1.4   1.74]
 [ 1.   15.    1.   ... 13.9  -0.3   0.79]
 [ 1.    1.    5.   ... 10.8   1.4   1.74]
 ...
 [ 1.    1.    1.   ... 13.9  -0.3   0.79]
 [ 1.    1.    1.   ...  9.4  -0.8  -3.12]
 [ 1.   10.    1.   ... 12.7   3.7  -1.7 ]]


In this example, we use `np.loadtxt()` to load data from a text file named `data.txt` into the `Numpy` array `data`. By default, the function splits each line on whitespace and expects every row to contain the same number of values; parameters such as `delimiter`, `skiprows`, and `usecols` adapt it to other layouts.

**Note:** The dataset used in these examples belongs to: Realinho, Valentim; Vieira Martins, Mónica; Machado, Jorge; and Baptista, Luís (2021). __Predict students' dropout and academic success__. UCI Machine Learning Repository. https://doi.org/10.24432/C5MC89.
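The most commonly used `np.loadtxt()` options can be tried on in-memory text. In this sketch the two small tables are invented for illustration; the first relies on the defaults (whitespace-separated values, lines starting with `#` treated as comments), and the second sets `delimiter`, `skiprows`, and `usecols` explicitly:

```python
import io

import numpy as np

# Whitespace-separated values; the '#' line is skipped as a comment
text = "# id height weight\n1 1.70 65.0\n2 1.82 80.5\n"
data = np.loadtxt(io.StringIO(text))
print("Default parsing:\n", data)

# Comma-separated values with a header row; keep only columns 1 and 2
csv_text = "id,height,weight\n1,1.70,65.0\n2,1.82,80.5\n"
subset = np.loadtxt(io.StringIO(csv_text), delimiter=',',
                    skiprows=1, usecols=(1, 2))
print("Selected columns:\n", subset)
```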

#### Loading data from other sources
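`Numpy` provides additional functions for creating arrays from other sources. For instance, you can use `np.fromfile()` to read raw binary data from a file, `np.frombuffer()` to interpret an existing buffer (such as a `bytes` object) as an array, `np.load()` to read arrays saved in `Numpy`'s own `.npy` format, and `np.fromfunction()` to build an array by evaluating a function over its indices. Data held in databases or image files generally needs a dedicated library to read it before it can be converted to a `Numpy` array.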

These functions offer flexibility in loading data into `Numpy` arrays, allowing you to seamlessly integrate your data from different sources into your machine learning pipeline.
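A few of these loaders can be sketched without any files on disk. The example below interprets a `bytes` object with `np.frombuffer()`, builds an array from its indices with `np.fromfunction()`, and round-trips an array through `np.save()` and `np.load()` using an in-memory buffer in place of a `.npy` file (an assumption made here to keep the sketch self-contained):

```python
import io

import numpy as np

# Interpret four raw bytes as unsigned 8-bit integers
raw = bytes([1, 2, 3, 4])
from_bytes = np.frombuffer(raw, dtype=np.uint8)
print("From buffer:", from_bytes)

# Evaluate a function of the row and column indices over a (2, 3) grid
grid = np.fromfunction(lambda i, j: i + j, (2, 3), dtype=int)
print("From function:\n", grid)

# Save to and load from an in-memory buffer standing in for a .npy file
buffer = io.BytesIO()
np.save(buffer, grid)
buffer.seek(0)
restored = np.load(buffer)
print("Round trip matches:", np.array_equal(grid, restored))
```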

Loading data into `Numpy` arrays is an essential step in preparing your data for machine learning tasks. `Numpy`'s functions provide versatile options to handle different file formats and data sources, enabling you to efficiently load and manipulate your data using the powerful features of `Numpy`.

<a name='7-2'></a>
### 7.2. Preprocessing data with Numpy for machine learning tasks

Data preprocessing is a crucial step in preparing data for machine learning tasks. `Numpy` provides a range of functions and techniques to preprocess and transform data efficiently. This sub-section explores some common preprocessing techniques using `Numpy` for machine learning tasks.

#### Standardising features
Standardisation is a common preprocessing technique that involves scaling the features of the dataset to have zero mean and unit variance. `Numpy`'s `np.mean()` and `np.std()` functions are useful for calculating the mean and standard deviation of the data, respectively. You can then use these values to standardise the features.

In [157]:
data = np.genfromtxt('./datasets/features.csv', delimiter=',', skip_header=1)

# Calculate mean and standard deviation of each feature
mean = np.mean(data, axis=0)
std = np.std(data, axis=0)

# Standardise the features
standardised_data = (data - mean) / std

print("Standardised data:\n", standardised_data)
print(data.shape == standardised_data.shape)

Standardised data:
 [[-0.29482875 -0.09547022  2.49089589 ... -0.28763846  0.12438647
   0.76576084]
 [-0.29482875 -0.20986898 -0.55406775 ...  0.87622207 -1.10522155
   0.34719942]
 [-0.29482875 -1.01066035  2.49089589 ... -0.28763846  0.12438647
   0.76576084]
 ...
 [-0.29482875 -1.01066035 -0.55406775 ...  0.87622207 -1.10522155
   0.34719942]
 [-0.29482875 -1.01066035 -0.55406775 ... -0.81325289 -1.46687097
  -1.37551124]
 [-0.29482875 -0.4958659  -0.55406775 ...  0.42569541  1.7879738
  -0.74987207]]
True


In this example, we calculate the mean and standard deviation along each feature axis using `np.mean()` and `np.std()` with `axis=0`. Then, we subtract the mean and divide by the standard deviation to standardise the features. The resulting `standardised_data` array will have zero mean and unit variance for each feature.
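One practical caveat is that a constant feature has a standard deviation of zero, so dividing by it produces `nan` values. The sketch below uses synthetic data (generated here for illustration, not taken from the dataset) with a constant third column, and replaces zero standard deviations with 1 before dividing. This guard is one possible convention, not the only one.

```python
import numpy as np

rng = np.random.default_rng(42)

# Two random features plus one constant feature
features = rng.normal(loc=[10.0, -3.0], scale=[2.0, 0.5], size=(100, 2))
features = np.column_stack([features, np.ones(100)])

mean = features.mean(axis=0)
std = features.std(axis=0)

# Avoid division by zero for constant columns
safe_std = np.where(std == 0, 1.0, std)
standardised = (features - mean) / safe_std

print("Means:", standardised.mean(axis=0))
print("Standard deviations:", standardised.std(axis=0))
```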

#### Normalising features
Normalisation is another common preprocessing technique that involves scaling the values of the features to a specific range, typically between 0 and 1. `Numpy`'s `np.min()` and `np.max()` functions are helpful for finding the minimum and maximum values of the data, respectively. You can use these values to normalise the features.

In [160]:
data = np.genfromtxt('./datasets/features.csv', delimiter=',', skip_header=1)

# Calculate minimum and maximum values of each feature
min_val = np.min(data, axis=0)
max_val = np.max(data, axis=0)

# Normalise the features
normalised_data = (data - min_val) / (max_val - min_val)

print("Normalised data:\n", normalised_data)

Normalised data:
 [[0.         0.28571429 0.55555556 ... 0.37209302 0.48888889 0.7661823 ]
 [0.         0.25       0.11111111 ... 0.73255814 0.11111111 0.64068692]
 [0.         0.         0.55555556 ... 0.37209302 0.48888889 0.7661823 ]
 ...
 [0.         0.         0.11111111 ... 0.73255814 0.11111111 0.64068692]
 [0.         0.         0.11111111 ... 0.20930233 0.         0.12417437]
 [0.         0.16071429 0.11111111 ... 0.59302326 1.         0.31175694]]


In this example, we compute the minimum and maximum values along each feature axis using `np.min()` and `np.max()` with `axis=0`. Then, we subtract the minimum and divide by the range (maximum minus minimum) to normalise the features. The resulting `normalised_data` array will have values within the range `[0, 1]` for each feature.
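The same division-by-zero concern applies to min-max scaling when a feature's maximum equals its minimum. As a minimal sketch on a small made-up array, the code below guards the range before dividing and leaves the constant column at 0:

```python
import numpy as np

features = np.array([[1.0, 200.0, 5.0],
                     [2.0, 400.0, 5.0],
                     [4.0, 300.0, 5.0]])  # the third column is constant

min_val = features.min(axis=0)
value_range = features.max(axis=0) - min_val

# Replace a zero range with 1 so that constant columns map to 0
safe_range = np.where(value_range == 0, 1.0, value_range)
normalised = (features - min_val) / safe_range

print("Normalised data:\n", normalised)
```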

#### Handling missing values
`Numpy` provides functions to handle missing values in datasets. The `np.isnan()` function allows you to identify missing values in an array, and the `np.nan_to_num()` function replaces these missing values with specified values.

In [172]:
data = np.genfromtxt('./datasets/data.csv', delimiter=';', skip_header=1)

# Identify missing values
missing_values = np.isnan(data)
print(missing_values)

# Replace missing values with 0
data_without_missing_values = np.nan_to_num(data, nan=0)

print("\nData without missing values:\n", data_without_missing_values)

[[False False False ... False False  True]
 [False False False ... False False  True]
 [False False False ... False False  True]
 ...
 [False False False ... False False  True]
 [False False False ... False False  True]
 [False False False ... False False  True]]

Data without missing values:
 [[ 1.   17.    5.   ...  1.4   1.74  0.  ]
 [ 1.   15.    1.   ... -0.3   0.79  0.  ]
 [ 1.    1.    5.   ...  1.4   1.74  0.  ]
 ...
 [ 1.    1.    1.   ... -0.3   0.79  0.  ]
 [ 1.    1.    1.   ... -0.8  -3.12  0.  ]
 [ 1.   10.    1.   ...  3.7  -1.7   0.  ]]


In this example, we use `np.isnan()` to identify missing values in the `data` array. The resulting `missing_values` array will have `True` where missing values are present and `False` otherwise. Then, we use `np.nan_to_num()` to replace the missing values with `0` in the `data` array, creating the `data_without_missing_values` array.
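Replacing missing values with `0` is not always appropriate, since zero may be a meaningful value for a feature. A common alternative is to fill each gap with the mean of its column. The sketch below shows one way to do this on a small made-up array, using `np.nanmean()` (which ignores `nan` entries) and `np.where()`:

```python
import numpy as np

data = np.array([[1.0, np.nan, 3.0],
                 [4.0, 5.0, np.nan],
                 [7.0, 9.0, 9.0]])

# Count the missing values in each column
missing = np.isnan(data)
print("Missing per column:", missing.sum(axis=0))

# Column means computed over the non-missing entries only
column_means = np.nanmean(data, axis=0)

# Take the column mean where a value is missing, the original value otherwise
imputed = np.where(missing, column_means, data)
print("Imputed data:\n", imputed)
```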

These are just a few examples of data preprocessing techniques using `Numpy` for machine learning tasks. `Numpy`'s versatile functions and array operations enable you to efficiently preprocess and transform data, making it ready for training machine learning models.


# [Table of Content](#0)