<H1 style="text-align: center">EEEM066 - Fundamentals of Machine Learning</H1>
<H1 style="text-align: center">Week 1 (Part 2): Introduction to NumPy</H1>

> Dr. Xiatian (Eddy) Zhu, Dr Syed Sameed Husain

> xiatian.zhu@surrey.ac.uk, sameed.husain@surrey.ac.uk

**Introduction**

Prerequisites: a basic understanding of the Python programming language, as covered in the Week 1 Part 1 lab notebook.


The objective of today’s lab is to introduce you to the NumPy and PyTorch libraries.

“**Num**erical **Py**thon” (NumPy) and **PyTorch** are popular scientific computing libraries in Python that
provide support for arrays of numerical data, specifically multidimensional arrays. Although both NumPy
and PyTorch are used from Python, their underlying numerical operations are largely implemented in C or
C++, which makes operations on arrays both efficient and convenient.


* NumPy is a fundamental package for scientific computing in Python. It provides support for
multidimensional arrays and matrices, as well as functions that operate on them. It is commonly used to
preprocess and manipulate data before feeding it into machine learning models. For example, you can use
NumPy to reshape, normalise, or scale datasets, as well as to split datasets into training and testing sets.


* PyTorch is an open-source machine learning library developed by Meta. It is designed to provide flexibility
and speed for building deep learning models. PyTorch provides a range of functions and tools for building
and training neural networks, including automatic differentiation, GPU acceleration, and distributed training.

Both NumPy and PyTorch can be used to create, manipulate, and transform numerical data in ways that are
essential to building and training machine learning models. They are key tools in the modern data scientist’s
toolkit.


Happy Programming!⚡⚡

# Contents


*   [Arrays](https://colab.research.google.com/drive/1OsdZtZbC3hUsyxubwRA_v4YLp_65onqK#scrollTo=TLWP0Mj8rwgX)
*   [Datatypes](https://colab.research.google.com/drive/1OsdZtZbC3hUsyxubwRA_v4YLp_65onqK#scrollTo=m4UGxdyIuQuH)
*   [Array Broadcasting](https://colab.research.google.com/drive/1OsdZtZbC3hUsyxubwRA_v4YLp_65onqK#scrollTo=mz7HbAAkvSmJ)
*   [Shape Manipulation](https://colab.research.google.com/drive/1OsdZtZbC3hUsyxubwRA_v4YLp_65onqK#scrollTo=_-eiN0rt2uMq)
*   [References](https://colab.research.google.com/drive/1OsdZtZbC3hUsyxubwRA_v4YLp_65onqK#scrollTo=Epc2J0BLdyJo)



## Arrays



NumPy arrays store and manipulate numerical data efficiently.
* homogeneous and multidimensional!

All the elements in an array share the same data type, and the array can have one or more dimensions.

In [1]:
import numpy as np

# create a 1-dimensional NumPy array
arr1 = np.array([1, 2, 3, 4, 5])
print(arr1)

# create a 2-dimensional NumPy array
arr2 = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(arr2)


[1 2 3 4 5]
[[1 2 3]
 [4 5 6]
 [7 8 9]]


In [2]:
# access elements of a NumPy array
print(arr1[0])      # prints 1
print(arr2[1, 2])   # prints 6
print(arr2[:, 1])   # prints [2 5 8]


1
6
[2 5 8]
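The two properties named at the start of this section (homogeneous and multidimensional) can be inspected directly. The sketch below is illustrative only; the variable names `mixed` and `grid` are not part of the original notebook.

```python
import numpy as np

# Mixed int/float input: NumPy upcasts every element to one common dtype.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)   # float64

# ndim is the number of dimensions; shape is the size along each of them.
grid = np.array([[1, 2, 3], [4, 5, 6]])
print(grid.ndim)     # 2
print(grid.shape)    # (2, 3)
print(grid.size)     # 6
```

Because all elements share one dtype, NumPy can store them in a single contiguous block of memory, which is what makes array operations fast.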


## Datatypes



Integer data types:
* int8, int16, int32, int64: signed integers of 8, 16, 32, or 64 bits respectively
* uint8, uint16, uint32, uint64: unsigned integers of 8, 16, 32, or 64 bits respectively

In [3]:

arr_int8 = np.array([1, 2, 3], dtype=np.int8)
arr_uint16 = np.array([10, 20, 30], dtype=np.uint16)

print(arr_int8.dtype)   # output: int8
print(arr_uint16.dtype) # output: uint16


int8
uint16
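The bit width determines the range of values an integer type can hold. As a small illustration (the variable names are chosen for this sketch), `np.iinfo` reports that range, and arithmetic on fixed-width integer arrays wraps around when the result does not fit:

```python
import numpy as np

# np.iinfo reports the representable range of an integer dtype.
int8_info = np.iinfo(np.int8)
print(int8_info.min, int8_info.max)      # -128 127

uint16_info = np.iinfo(np.uint16)
print(uint16_info.min, uint16_info.max)  # 0 65535

# Arithmetic on fixed-width integer arrays wraps around silently on overflow.
a = np.array([250], dtype=np.uint8)
b = np.array([10], dtype=np.uint8)
wrapped = a + b
print(wrapped)                           # [4], because 260 mod 256 == 4
```

This is worth keeping in mind when working with image data, which is commonly stored as `uint8`.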


Floating-point and complex data types:
* float16, float32, float64: floating-point numbers of 16, 32, or 64 bits respectively
* complex64, complex128: complex numbers represented by 2 floats of 32 or 64 bits respectively

In [4]:
arr_float32 = np.array([1.0, 2.5, 3.7], dtype=np.float32)
arr_complex128 = np.array([1 + 2j, 3 + 4j], dtype=np.complex128)

print(arr_float32.dtype)      # output: float32
print(arr_complex128.dtype)   # output: complex128


float32
complex128
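The bit widths listed above translate into storage size and precision. The following sketch (with illustrative variable names) checks both using `itemsize` and `np.finfo`:

```python
import numpy as np

# itemsize is the number of bytes used per element.
print(np.dtype(np.float16).itemsize)     # 2
print(np.dtype(np.float32).itemsize)     # 4
print(np.dtype(np.float64).itemsize)     # 8
print(np.dtype(np.complex128).itemsize)  # 16 (two 64-bit floats)

# np.finfo reports the machine epsilon: the gap between 1.0 and the
# next representable value. A larger epsilon means lower precision.
eps32 = float(np.finfo(np.float32).eps)
eps64 = float(np.finfo(np.float64).eps)
print(eps32 > eps64)                     # True: float32 is coarser

# The real and imaginary parts of a complex128 array are float64 arrays.
z = np.array([1 + 2j, 3 + 4j], dtype=np.complex128)
print(z.real.dtype, z.imag.dtype)        # float64 float64
```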


Boolean data types:
* bool: boolean values True or False represented by a byte

In [5]:
# np.bool was deprecated in NumPy 1.20 and removed in 1.24, so pin an earlier version
!pip install numpy==1.23



In [6]:
import numpy as np
arr_bool = np.array([True, False, True], dtype=np.bool)

print(arr_bool.dtype)    # output: bool


bool


Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  arr_bool = np.array([True, False, True], dtype=np.bool)
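The warning above appears because `np.bool` was only an alias for Python's built-in `bool`. As a sketch of a form that does not rely on the deprecated alias, either the built-in `bool` or `np.bool_` can be passed as the dtype; the name `arr_np_bool` is introduced here for illustration.

```python
import numpy as np

# The built-in bool and NumPy's np.bool_ both select the one-byte boolean dtype.
arr_bool = np.array([True, False, True], dtype=bool)
arr_np_bool = np.array([True, False, True], dtype=np.bool_)

print(arr_bool.dtype)     # bool
print(arr_np_bool.dtype)  # bool
print(arr_bool.itemsize)  # 1
```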


String data types:
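* string_: fixed-length byte strings (an alias of bytes_; the string_ name was removed in NumPy 2.0)
* unicode_: fixed-length strings of unicode characters (an alias of str_; the unicode_ name was removed in NumPy 2.0)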

In [7]:
arr_string = np.array(['hello', 'world'], dtype=np.string_)
arr_unicode = np.array(['こんにちは', '안녕하세요'], dtype=np.unicode_)

print(arr_string.dtype)   # output: |S5 (5 bytes per string)
print(arr_unicode.dtype)  # output: <U5 (5 unicode characters per string)


|S5
<U5
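"Fixed-length" has a practical consequence: every element occupies the same width, so a longer string assigned later is truncated. The sketch below lets NumPy infer the dtype instead of naming it, so it does not depend on the `string_` and `unicode_` aliases; the variable names are illustrative.

```python
import numpy as np

# Without an explicit dtype, NumPy infers the width from the longest element.
words = np.array(['hello', 'world'])
print(words.dtype.kind, words.dtype.itemsize)  # U 20 (5 characters x 4 bytes each)

# Assigning a longer string truncates it to the fixed width of 5 characters.
words[0] = 'goodbye'
print(words[0])                                # goodb

# Byte strings use one byte per character.
raw = np.array([b'hello', b'world'])
print(raw.dtype.kind, raw.dtype.itemsize)      # S 5
```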


## Array Broadcasting

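Why?
Element-wise operations normally require arrays of the same shape. When the shapes differ, NumPy applies its broadcasting rules: shapes are compared from the trailing dimension backwards, and two dimensions are compatible if they are equal or if one of them is 1. Dimensions of size 1 are then stretched (without copying data) to match; if the shapes are still incompatible, NumPy raises a ValueError.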
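Compatibility is decided by comparing shapes from the last dimension backwards: two dimensions match if they are equal or if one of them is 1. The sketch below checks a few shape pairs with `np.broadcast_shapes`, which is available in NumPy 1.20 and later; the `compatible` flag is introduced only for this illustration.

```python
import numpy as np

# Compatible: trailing dimensions match (3 == 3); the missing leading
# dimension of the second shape is treated as 1.
print(np.broadcast_shapes((2, 3), (3,)))    # (2, 3)

# Compatible: each dimension of size 1 stretches to match the other shape.
print(np.broadcast_shapes((2, 1), (1, 3)))  # (2, 3)

# Incompatible: trailing dimensions 3 and 2 differ and neither is 1.
try:
    np.broadcast_shapes((2, 3), (2,))
    compatible = True
except ValueError:
    compatible = False
print(compatible)                           # False
```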

In [8]:
#1. Broadcasting a scalar value to an array

arr = np.array([1, 2, 3])
scalar = 2

# multiply every element of arr by scalar
result = arr * scalar

print(result) # output: [2, 4, 6]

# the scalar 2 is broadcast to the shape of arr, (3,), so it behaves like [2, 2, 2]

[2 4 6]


In [9]:
#2. Broadcasting a one-dimensional array to a two-dimensional array

arr1 = np.array([1, 2, 3])
arr2 = np.array([[1, 2, 3], [4, 5, 6]])

# add arr1 to every row of arr2
result = arr1 + arr2

print(result)
# output: [[2, 4, 6], [5, 7, 9]]
# arr1 (shape (3,)) is broadcast across both rows of arr2 (shape (2, 3)), so it behaves like [[1, 2, 3], [1, 2, 3]].

[[2 4 6]
 [5 7 9]]


In [10]:
#3. Broadcasting two arrays with different shapes

arr1 = np.array([[1, 2, 3], [4, 5, 6]])
arr2 = np.array([10, 20, 30])

# add arr2 to every row of arr1
result = arr1 + arr2

print(result)
# output: [[11, 22, 33], [14, 25, 36]]
# arr2 (shape (3,)) is broadcast across the rows of arr1 (shape (2, 3)), so it behaves like [[10, 20, 30], [10, 20, 30]]

[[11 22 33]
 [14 25 36]]
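In the cells above, the 1D array lines up with the last axis of the 2D array, so the same values are added to each row. To broadcast along the other axis instead, the smaller array needs an explicit axis of size 1. A minimal sketch, where `col` is a hypothetical column vector holding one value per row:

```python
import numpy as np

arr1 = np.array([[1, 2, 3], [4, 5, 6]])   # shape (2, 3)
col = np.array([[10], [20]])              # shape (2, 1): one value per row

# The size-1 axis of col is stretched across the three columns of arr1.
result = arr1 + col
print(result)
# [[11 12 13]
#  [24 25 26]]
```

A plain 1D array of shape `(2,)` would not work here, because its trailing dimension (2) does not match the trailing dimension of `arr1` (3).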


## Shape Manipulation


1. Reshaping arrays: You can reshape an array to have a different shape using the reshape method.

In [11]:
#reshape a 1D array to a 2D array using the following code:

# create a 1D array with 9 elements
a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])

# reshape the array to a 3x3 matrix
b = a.reshape((3, 3))

print(b)


[[1 2 3]
 [4 5 6]
 [7 8 9]]
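Reshaping must preserve the total number of elements. As an illustration (the `reshape_failed` flag is introduced only for this sketch), one dimension can be given as `-1` so that NumPy infers it, while a shape with the wrong element count raises an error:

```python
import numpy as np

a = np.arange(1, 13)          # 12 elements: 1, 2, ..., 12

# -1 asks NumPy to infer that dimension from the total size (12 / 3 = 4).
b = a.reshape((3, -1))
print(b.shape)                # (3, 4)

# The element count must be preserved: 12 elements cannot fill a 5x5 grid.
reshape_failed = False
try:
    a.reshape((5, 5))
except ValueError as err:
    reshape_failed = True
    print("reshape failed:", err)
```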


2. Transposing arrays: You can transpose an array (i.e., switch its rows and columns) using the transpose method or the T attribute.

In [12]:

# create a 2D array
a = np.array([[1, 2], [3, 4]])

# transpose the array
b = a.transpose()

print(b)


[[1 3]
 [2 4]]
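The `T` attribute mentioned above is a shorthand for `transpose()`. The effect on the shape is easier to see with a non-square array, as in this small sketch:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])  # shape (2, 3)

b = a.T                                # same result as a.transpose()
print(b.shape)                         # (3, 2)
print(b)
# [[1 4]
#  [2 5]
#  [3 6]]

# Transposing a 1D array leaves it unchanged: there is only one axis.
v = np.array([1, 2, 3])
print(v.T.shape)                       # (3,)
```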


3. Flattening arrays: You can flatten a multi-dimensional array into a 1D array using the flatten method.

In [13]:
# create a 2D array
a = np.array([[1, 2], [3, 4]])

# flatten the array
b = a.flatten()

print(b)


[1 2 3 4]
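`flatten` always returns a copy of the data. NumPy also offers `ravel`, which returns a view of the original array when possible. `ravel` is not used in the notebook itself; the sketch below is included only to show the difference:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])

flat_copy = a.flatten()   # always a new array
flat_view = a.ravel()     # a view here, because a is stored contiguously

flat_copy[0] = 100        # does not affect a
flat_view[1] = 200        # writes through to a

print(a)
# [[  1 200]
#  [  3   4]]
print(np.shares_memory(a, flat_copy))  # False
print(np.shares_memory(a, flat_view))  # True
```

Use `flatten` when the result will be modified independently of the original, and `ravel` when avoiding a copy matters.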


4. Expanding dimensions: You can add dimensions to an array by indexing with np.newaxis (an alias for None). For example, you can add a new leading dimension to a 1D array to create a 2D array with one row and multiple columns.

In [14]:
# create a 1D array with 3 elements
a = np.array([1, 2, 3])

# add a new dimension to create a 2D array
b = a[np.newaxis, :]

print(b)


[[1 2 3]]
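The position of `np.newaxis` in the index decides where the new dimension goes. As a sketch, placing it last produces a column instead of a row; `np.expand_dims` is an equivalent function form that is not used in the notebook itself.

```python
import numpy as np

a = np.array([1, 2, 3])

# Placing np.newaxis last creates a column: 3 rows, 1 column.
col = a[:, np.newaxis]
print(col.shape)                  # (3, 1)

# np.expand_dims inserts the new axis at the given position.
row = np.expand_dims(a, axis=0)   # same as a[np.newaxis, :]
print(row.shape)                  # (1, 3)

# np.newaxis is simply another name for None.
print(np.newaxis is None)         # True
```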


## References


*   https://cs231n.github.io/python-numpy-tutorial/#numpy
*   https://numpy.org/doc/stable/user/quickstart.html
*   https://www.w3schools.com/python/numpy/default.asp



