# Intro to NumPy
NumPy, short for Numerical Python, is a fundamental Python library for numerical computing. It provides support for arrays, matrices, and a wide range of mathematical functions to operate on these data structures efficiently. NumPy is widely used in data science, machine learning, and scientific computing due to its performance and ease of use.

What is covered below:
1. Creating arrays
    1. List vs NumPy array
    1. Arrays with different data types
    1. Matrices and higher-dimensional arrays
1. Common matrix operations
    1. Transpose
    1. Indexing
    1. Slicing
    1. Reshaping
    1. Creating new axes
    1. Concatenation and stacking
1. Broadcasting

In [1]:
import numpy as np

## Data Structures: list vs array

In [2]:
my_list = [4, 5, 2, 6, 8]
print("List:", my_list, "with type:", type(my_list))
my_array = np.array(my_list)
print("Array:", my_array, "with type:", type(my_array))

List: [4, 5, 2, 6, 8] with type: <class 'list'>
Array: [4 5 2 6 8] with type: <class 'numpy.ndarray'>
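
The practical difference between the two types shows up as soon as you do arithmetic. The following is a minimal sketch (the variable names are illustrative and not part of the cells above): `+` and `*` join or repeat lists, but operate element by element on arrays.

```python
import numpy as np

list_a = [1, 2, 3]
list_b = [10, 20, 30]
arr_a = np.array(list_a)
arr_b = np.array(list_b)

# For lists, + joins the two sequences end to end.
list_sum = list_a + list_b
print("List + list:", list_sum)

# For arrays, + adds matching elements.
arr_sum = arr_a + arr_b
print("Array + array:", arr_sum)

# Multiplying by a scalar repeats a list but scales an array.
list_times_2 = list_a * 2
arr_times_2 = arr_a * 2
print("List * 2:", list_times_2)
print("Array * 2:", arr_times_2)
```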


## Data Types: int vs float32

In [3]:
my_list = [4, 5, 2, 6, 8]
print("List:", my_list)
my_array = np.array(my_list, dtype='float32')
print("Array with float dtype:", my_array)
print("\nShape of array:", my_array.shape)
print("Size of array:", my_array.size)
print("Dtype of array:", my_array.dtype)

List: [4, 5, 2, 6, 8]
Array with float dtype: [4. 5. 2. 6. 8.]

Shape of array: (5,)
Size of array: 5
Dtype of array: float32
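
An existing array can also be converted to another dtype after creation. The sketch below assumes a small illustrative array (`float_array` is a made-up name) and uses `astype`, which returns a new array. Note that converting floats to integers drops the fractional part rather than rounding.

```python
import numpy as np

float_array = np.array([4.7, 5.2, 2.9], dtype='float32')

# astype returns a new array; the original is left untouched.
int_array = float_array.astype('int32')
print("As int32:", int_array)  # fractional parts are dropped, not rounded
print("Original dtype:", float_array.dtype)

# itemsize is the number of bytes used per element.
wide_array = float_array.astype('float64')
print("Bytes per float32 element:", float_array.itemsize)
print("Bytes per float64 element:", wide_array.itemsize)
```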


## Matrix / 2D arrays

In [4]:
list_of_lists = [
	[1, 2, 3],
	[4, 5, 6],
	[7, 8, 9]
]
two_d_array = np.array(list_of_lists)
print(two_d_array)
print("\nShape of array:", two_d_array.shape)
print("Size of array:", two_d_array.size)
print("Dtype of array:", two_d_array.dtype)

[[1 2 3]
 [4 5 6]
 [7 8 9]]

Shape of array: (3, 3)
Size of array: 9
Dtype of array: int64


### Common NumPy functions for creating arrays

1. np.zeros() - creates an array of the given shape filled with zeros
1. np.ones() - creates an array of the given shape filled with ones
1. np.full() - creates an array of the given shape filled with a specified value
1. np.linspace() - creates a 1D array of evenly spaced values over a specified interval (both endpoints included by default)
1. np.random.random() - creates an array of the given shape filled with random values in the half-open interval [0, 1)
1. np.eye() - creates a 2D array with ones on the diagonal and zeros elsewhere (an identity matrix when square)

In [5]:
zero_matrix = np.zeros((3,3), dtype='float32')
print("Zeros matrix: \n", zero_matrix)

ones_matrix = np.ones((3,3), dtype='float32')
print("\nOnes matrix: \n", ones_matrix)

full_matrix = np.full((3,3), fill_value=118927, dtype='float32')
print("\nFull matrix: \n", full_matrix)

between_0_1 = np.linspace(0, 1, 20)
print("\nEvenly spaced values between 0 and 1: \n", between_0_1)

randoms_array = np.random.random((5,5))
print("\nCompletely random array: \n", randoms_array)

Zeros matrix: 
 [[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]

Ones matrix: 
 [[1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]]

Full matrix: 
 [[118927. 118927. 118927.]
 [118927. 118927. 118927.]
 [118927. 118927. 118927.]]

Evenly spaced values between 0 and 1: 
 [0.         0.05263158 0.10526316 0.15789474 0.21052632 0.26315789
 0.31578947 0.36842105 0.42105263 0.47368421 0.52631579 0.57894737
 0.63157895 0.68421053 0.73684211 0.78947368 0.84210526 0.89473684
 0.94736842 1.        ]

Completely random array: 
 [[6.64989812e-01 1.05290255e-01 2.91311251e-01 6.98002282e-02
  6.09994046e-01]
 [2.95909982e-01 8.96106457e-01 6.65940237e-02 8.69614713e-01
  6.37000041e-01]
 [9.31157273e-02 5.13062771e-01 5.91930423e-01 7.95268882e-01
  3.55647716e-02]
 [1.67976703e-01 8.40323692e-04 6.71008847e-01 2.38519667e-02
  5.75349356e-01]
 [5.63374350e-01 5.99448214e-01 8.47815309e-01 3.26644371e-01
  8.67168628e-01]]
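
Two details of the functions above are worth checking by hand. This is a small sketch, not part of the original cells: it uses the `retstep` option of `np.linspace` to show the spacing, and `np.random.default_rng` with a seed (an alternative to `np.random.random` that this notebook does not otherwise use) to get repeatable random values.

```python
import numpy as np

# 20 points from 0 to 1 inclusive means 19 equal gaps of 1/19.
points, step = np.linspace(0, 1, 20, retstep=True)
print("Step between points:", step)
print("All gaps equal:", np.allclose(np.diff(points), step))

# A seeded generator gives the same "random" values on every run.
rng_a = np.random.default_rng(seed=0)
rng_b = np.random.default_rng(seed=0)
sample_a = rng_a.random((2, 2))
sample_b = rng_b.random((2, 2))
print("Same seed, same values:", np.array_equal(sample_a, sample_b))
```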


### Identity matrix
The identity matrix is a square matrix with ones on the main diagonal and zeros elsewhere. It acts as the multiplicative identity in matrix multiplication: multiplying any compatible matrix by the identity matrix leaves it unchanged.

In machine learning, identity matrices appear in places such as identity-style weight initialization, regularization terms (for example, adding a multiple of the identity to a matrix in ridge regression), and as the starting point for transformations that should initially leave the data unchanged.

In [6]:
identity_matrix = np.eye(3,3)
print("3x3 identity matrix:\n", identity_matrix)

3x3 identity matrix:
 [[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]
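
The "multiplicative identity" property can be checked directly. A minimal sketch, assuming an arbitrary 2x2 matrix `a` chosen for illustration:

```python
import numpy as np

a = np.array([[2., 3.],
              [5., 7.]])
identity = np.eye(2)

# Multiplying by the identity on either side returns the original matrix.
left = identity @ a
right = a @ identity
print("I @ A equals A:", np.array_equal(left, a))
print("A @ I equals A:", np.array_equal(right, a))
```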


## Common matrix operations

### Transpose
Transposing a matrix involves flipping it over its diagonal, which means converting its rows into columns and vice versa. In NumPy, you can transpose a matrix using the `.T` attribute or the `np.transpose()` function.

In machine learning, transposing matrices is often necessary when aligning data for operations such as dot products, which are fundamental in algorithms like linear regression and neural networks. Transposition helps ensure that the dimensions of matrices are compatible for multiplication, enabling the correct computation of outputs and gradients during training processes.

In [7]:
arr = np.array([
	[0, 1, 2, 3, 4, 5, 6],
	[6, 5, 4, 3, 2, 1, 0]
], dtype='float32')
print("Shape of arr:", arr.shape)
print("Arr: \n", arr)

transposed_arr = arr.T
print("\nShape of transposed arr:", transposed_arr.shape)
print("Transposed arr: \n", transposed_arr)

Shape of arr: (2, 7)
Arr: 
 [[0. 1. 2. 3. 4. 5. 6.]
 [6. 5. 4. 3. 2. 1. 0.]]

Shape of transposed arr: (7, 2)
Transposed arr: 
 [[0. 6.]
 [1. 5.]
 [2. 4.]
 [3. 3.]
 [4. 2.]
 [5. 1.]
 [6. 0.]]
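
To illustrate why transposition matters for matrix products, here is a hedged sketch with made-up `features` and `weights` arrays (neither appears in the cells above). The product is only defined once the inner dimensions line up.

```python
import numpy as np

features = np.array([[0., 1., 2.],
                     [3., 4., 5.]])   # shape (2, 3): 2 samples, 3 features
weights = np.array([[1., 0., -1.]])   # shape (1, 3): one weight per feature

# features @ weights would fail: (2, 3) @ (1, 3) has mismatched inner dimensions.
# Transposing weights gives shape (3, 1), so the product is defined.
predictions = features @ weights.T
print("Predictions shape:", predictions.shape)
print(predictions)

# .T returns a view of the same data; nothing is copied.
print("Shares memory:", np.shares_memory(weights, weights.T))
```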


### Indexing
Indexing in NumPy allows you to access and manipulate individual elements or groups of elements within an array or matrix using their specific positions (indices). You can use integer indices, slices, or boolean arrays to select data.

In machine learning, indexing is crucial for tasks such as selecting features, extracting samples, and manipulating datasets. It enables efficient data handling and preprocessing, which are essential steps in building and training machine learning models.

In [8]:
print("First value:", arr[0, 0])
print("Middle value:", arr[0, 3], "and", arr[1, 3])
print("Last value:", arr[-1, -1])

First value: 0.0
Middle value: 3.0 and 3.0
Last value: 0.0
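
The text above also mentions boolean arrays as a way to select data. The following sketch, using a small illustrative array named `values`, shows boolean indexing alongside integer-array indexing:

```python
import numpy as np

values = np.array([[0., 1., 2., 3.],
                   [6., 5., 4., 3.]])

# A single comma-separated index is the idiomatic form of values[1][2].
print("Row 1, column 2:", values[1, 2])

# Boolean indexing: keep only elements where the mask is True.
mask = values > 3
selected = values[mask]
print("Values greater than 3:", selected)

# Integer-array indexing: pick columns 0 and 3 from every row.
picked_columns = values[:, [0, 3]]
print("Columns 0 and 3:\n", picked_columns)
```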


### Slicing
Slicing in NumPy allows you to extract specific portions of an array or matrix by specifying ranges of indices. This is done using the colon (`:`) operator, which can define start, stop, and step values for each dimension of the array.

In machine learning, slicing is frequently used to select feature columns, split a dataset into training and test sets, or pull out a batch of samples. Basic slices return views of the original array rather than copies, so slicing stays cheap even on large datasets.

In [9]:
print("First 4 elements of first row:", arr[0, :4])
print("Elements from index 3 onward in first row:", arr[0, 3:])
print("Even indices for first row:", arr[0, ::2])
print("Odd indices for first row:", arr[0, 1::2])
print("Reversed first row:", arr[0, ::-1])
print("Top-left 2x2 block of matrix:\n", arr[:2, :2])

First 4 elements of first row: [0. 1. 2. 3.]
Elements from index 3 onward in first row: [3. 4. 5. 6.]
Even indices for first row: [0. 2. 4. 6.]
Odd indices for first row: [1. 3. 5.]
Reversed first row: [6. 5. 4. 3. 2. 1. 0.]
Top-left 2x2 block of matrix:
 [[0. 1.]
 [6. 5.]]
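
As a sketch of the train/test use case mentioned above, the example below splits a toy dataset by rows (the names `data`, `train_rows`, and `test_rows` are illustrative). It also shows that a basic slice is a view, so writing through it modifies the original array unless you call `.copy()`.

```python
import numpy as np

data = np.arange(10).reshape(5, 2)  # 5 samples, 2 features

# Train/test split by slicing rows.
train_rows = data[:4]
test_rows = data[4:]
print("Train shape:", train_rows.shape, "Test shape:", test_rows.shape)

# Basic slices are views: writing through the slice changes the original.
first_column = data[:, 0]
first_column[0] = 99
print("data[0, 0] after writing to the slice:", data[0, 0])

# Use .copy() when an independent array is needed.
independent = data[:, 1].copy()
independent[0] = -1
print("data[0, 1] is unchanged:", data[0, 1])
```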


### Reshaping arrays
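Reshaping in NumPy changes the shape of an array without altering its data. You can use the `np.reshape()` function or the array's `.reshape()` method, passing the new shape as a tuple. The new shape must contain the same total number of elements as the original; one dimension may be given as `-1` to let NumPy infer it.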

In machine learning, reshaping is often necessary to prepare data for model input, especially when dealing with images or multi-dimensional data. It helps ensure that the data conforms to the expected input dimensions of algorithms and neural networks, facilitating proper training and evaluation.

In [10]:
reshape_arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
print("Original array:\n", reshape_arr)
reshape_arr = np.reshape(reshape_arr, (3, 3))
print("Reshaped array:\n", reshape_arr)

Original array:
 [1 2 3 4 5 6 7 8 9]
Reshaped array:
 [[1 2 3]
 [4 5 6]
 [7 8 9]]
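
Two behaviours of reshaping are easy to verify: a dimension given as `-1` is inferred from the total size, and a shape with the wrong element count is rejected. A minimal sketch with an illustrative 12-element array:

```python
import numpy as np

flat = np.arange(12)

# -1 asks NumPy to infer that dimension from the total size (12 / 3 = 4).
three_rows = flat.reshape(3, -1)
print("Inferred shape:", three_rows.shape)

# The same data viewed as a batch of two 2x3 blocks.
batch = flat.reshape(2, 2, 3)
print("Batch shape:", batch.shape)

# The element count must match, otherwise reshape raises ValueError.
try:
    flat.reshape(5, 3)
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print("Cannot reshape:", err)
```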


### Creating new axes for arrays
Creating new axes in NumPy means adding dimensions of length 1 to an array. You can do this by indexing with `np.newaxis` (an alias for `None`) or by calling `np.expand_dims()`. This is useful for aligning shapes for broadcasting or for preparing data for machine learning models that expect inputs with a specific number of dimensions.

In machine learning, new axes are often needed when a model expects a particular input layout. For example, a single grayscale image of shape (height, width) usually needs a leading batch axis or a trailing channel axis before it can be passed to a model.

In [11]:
axis_arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
print("Shape of array:", axis_arr.shape)
print("Array:", axis_arr)

while True:
	which_axis = input("Add axis before or after? (b/a) \n[To EXIT press any other key]: ")

	if which_axis.lower() == 'b':
		axis_arr = axis_arr[np.newaxis, :]
		print("Shape of array after adding new axis:\n", axis_arr.shape)
		print("Array after adding new axis:", axis_arr)
	elif which_axis.lower() == 'a':
		axis_arr = axis_arr[:, np.newaxis]
		print("Shape of array after adding new axis:\n", axis_arr.shape)
		print("Array after adding new axis:", axis_arr)
	else:
		break

Shape of array: (9,)
Array: [1 2 3 4 5 6 7 8 9]
Shape of array after adding new axis:
 (9, 1)
Array after adding new axis: [[1]
 [2]
 [3]
 [4]
 [5]
 [6]
 [7]
 [8]
 [9]]
Shape of array after adding new axis:
 (1, 9, 1)
Array after adding new axis: [[[1]
  [2]
  [3]
  [4]
  [5]
  [6]
  [7]
  [8]
  [9]]]
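
For a non-interactive comparison, the sketch below (using a short illustrative vector `vec`) shows that `np.newaxis` indexing and `np.expand_dims()` produce the same shapes:

```python
import numpy as np

vec = np.array([1, 2, 3])

row_newaxis = vec[np.newaxis, :]          # shape (1, 3)
row_expand = np.expand_dims(vec, axis=0)  # shape (1, 3)
col_newaxis = vec[:, np.newaxis]          # shape (3, 1)
col_expand = np.expand_dims(vec, axis=1)  # shape (3, 1)

print(row_newaxis.shape, row_expand.shape)
print(col_newaxis.shape, col_expand.shape)

# np.newaxis is simply another name for None.
print("np.newaxis is None:", np.newaxis is None)
```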


### Concatenation
Concatenation in NumPy joins two or more arrays along an existing axis using `np.concatenate()`. The helpers `np.vstack()` and `np.hstack()` stack arrays vertically (row-wise) and horizontally (column-wise) respectively; for 1D inputs, `np.vstack()` treats each array as a row of a new 2D array, while `np.hstack()` joins them end to end.

In machine learning, concatenation is often used to combine datasets, features, or model outputs. It allows for the integration of different sources of data or the merging of results from various models, facilitating more comprehensive analyses and improved model performance.

In [12]:
conc_a = np.array([1, 2, 3])
conc_b = np.array([4, 5, 6])
conc_c = np.array([7, 8, 9])

conc_res_1 = np.concatenate((conc_a, conc_b), axis=0)
print("Simple concatenated array:", conc_res_1)

vstack_arr = np.vstack((conc_a, conc_b, conc_c))
print("\nVStacked array:\n", vstack_arr)

hstack_arr = np.hstack((conc_a, conc_b, conc_c))
print("\nHStacked array:\n", hstack_arr)

Simple concatenated array: [1 2 3 4 5 6]

VStacked array:
 [[1 2 3]
 [4 5 6]
 [7 8 9]]

HStacked array:
 [1 2 3 4 5 6 7 8 9]
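
The cell above only joins 1D arrays. As a hedged sketch with small made-up 2D arrays, the example below shows how the `axis` argument decides whether rows or columns are added, and how `np.vstack()` and `np.hstack()` relate to it for 2D inputs:

```python
import numpy as np

top = np.array([[1, 2],
                [3, 4]])
bottom = np.array([[5, 6]])

# axis=0 adds rows; column counts must match (both have 2 columns).
more_rows = np.concatenate((top, bottom), axis=0)
print("axis=0 shape:", more_rows.shape)

# axis=1 adds columns; row counts must match (both have 2 rows).
extra_column = np.array([[10],
                         [20]])
more_columns = np.concatenate((top, extra_column), axis=1)
print("axis=1 result:\n", more_columns)

# For 2D inputs, vstack and hstack are shorthand for axis=0 and axis=1.
print(np.array_equal(np.vstack((top, bottom)), more_rows))
print(np.array_equal(np.hstack((top, extra_column)), more_columns))
```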


## Broadcasting
Broadcasting in NumPy is a mechanism that allows element-wise operations on arrays of different shapes without explicitly replicating data. Shapes are compared from the trailing dimension backwards, and two dimensions are compatible when they are equal or one of them is 1. The array with the size-1 (or missing) dimension is then virtually "stretched" along that dimension to match the other.

In machine learning, broadcasting is particularly useful for efficiently handling operations on datasets with varying dimensions, such as applying weights to features or normalizing data. It simplifies code and improves performance by eliminating the need for manual reshaping or replication of arrays, making it easier to implement algorithms and perform computations on large datasets.

In [13]:
m = np.array(
    [[1., 1., 1.],
     [1., 1., 1.],
     [1., 1., 1.]]
)
v = np.array([0, 1, 2])
print("Matrix:\n", m)
print("Vector:\n", v)
print("Broadcasting result:\n", m + v)

Matrix:
 [[1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]]
Vector:
 [0 1 2]
Broadcasting result:
 [[1. 2. 3.]
 [1. 2. 3.]
 [1. 2. 3.]]
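
The normalization use case mentioned above can be sketched as follows. The `data` array is made up for illustration. The example centers each column by its mean, combines a column and a row into a grid, and shows the error raised when trailing dimensions are neither equal nor 1.

```python
import numpy as np

data = np.array([[1., 10.],
                 [3., 20.],
                 [5., 30.]])  # shape (3, 2)

# Column means have shape (2,), which broadcasts across the 3 rows.
column_means = data.mean(axis=0)
centered = data - column_means
print("Centered columns:\n", centered)

# A (3, 1) column and a (3,) row broadcast to a (3, 3) grid.
column = np.array([[0], [10], [20]])
row = np.array([1, 2, 3])
grid = column + row
print("Grid:\n", grid)

# Shapes (3, 2) and (3,) are incompatible: trailing dimensions 2 and 3 differ.
try:
    data + row
    broadcast_failed = False
except ValueError as err:
    broadcast_failed = True
    print("Broadcast error:", err)
```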
