<img src="img/numpy-logo.png">

# NumPy

In this lab, we will learn the basics of NumPy, a Python package for scientific computing. It provides a multidimensional array object and many routines for fast operations on arrays.

There are several important differences between NumPy arrays and the standard Python sequences:

1. Performance:
- Speed: NumPy arrays are more efficient than Python lists because they are implemented in C and use contiguous blocks of memory. This allows for faster access and processing.
- Vectorization: NumPy supports vectorized operations, which means that you can apply operations to entire arrays without the need for explicit loops, leading to significant speed improvements.

2. Memory Efficiency:
- Compact Storage: NumPy arrays use a homogeneous data type, which makes them more memory-efficient than Python lists that can contain elements of different types.
- Efficient Data Representation: NumPy arrays store data in contiguous blocks of memory, reducing the overhead associated with dynamic type checking and pointer dereferencing found in Python lists.

3. Functionality and Flexibility:
- Broad Range of Functions: NumPy provides a wide range of mathematical functions, including statistical, linear algebra, and random number generation functions, which are not available with standard Python sequences.
- Advanced Indexing and Slicing: NumPy supports advanced indexing and slicing operations, allowing for more complex data manipulation compared to standard Python sequences.
- Automatic Expansion (Broadcasting): NumPy allows operations between arrays of different shapes and sizes through broadcasting, where smaller arrays are automatically expanded to match the dimensions of larger arrays. This simplifies code and improves readability.
- Built-in Mathematical Functions: NumPy comes with a plethora of built-in functions for mathematical operations, reducing the need to implement these manually.

4. Interoperability:
- Integration with Other Libraries: NumPy is the foundation of many other scientific computing libraries in Python, such as SciPy, pandas, and scikit-learn. Using NumPy arrays makes it easier to work with these libraries.
- Compatibility with C/C++: NumPy can interface with code written in C or C++, making it suitable for high-performance applications that require integration with these languages.

5. Numerical Precision:
- Precision Control: NumPy allows you to specify the data type of the elements, enabling control over the precision of the numerical computations.
- Overflow Awareness: NumPy's fixed-width numeric types have a limited range, so integer results can silently wrap around and floating-point results can overflow to `inf`. Choosing a suitable data type helps avoid these pitfalls.
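
As a minimal sketch of the vectorization and precision points in the list above, the example below compares an explicit Python loop with a single vectorized expression, and then shows what happens when a result does not fit in a fixed-width integer type. The variable names are illustrative only.

```python
import numpy as np

values = list(range(5))

# Pure-Python approach: an explicit loop (here, a list comprehension).
squares_list = [v * v for v in values]

# NumPy approach: one vectorized expression over the whole array.
squares_array = np.array(values) ** 2

print(squares_list)   # [0, 1, 4, 9, 16]
print(squares_array)  # [ 0  1  4  9 16]

# Fixed-width integers have a limited range: int8 holds -128..127,
# so adding 1 to 127 wraps around instead of raising an error.
small = np.array([127], dtype=np.int8)
wrapped = small + np.array([1], dtype=np.int8)
print(wrapped)        # [-128]
```

Picking a wider type (for example `np.int64`) when creating the array avoids the wrap-around in this case.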

# Jupyter Notebook

Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations, and narrative text. It is widely used for data cleaning and transformation, numerical simulation, statistical modeling, data visualization and machine learning. A Jupyter Notebook document is called a notebook and consists of cells. Each cell can contain code or Markdown text (like this cell). Code cells can be executed interactively, and the results appear directly below the cell. 

In [1]:
# this is a code cell
print("Hello, World!")  # with print, you can output text (as in classic Python scripts)
3 + 2  # the last line of a cell is evaluated and its result is printed
# this is similar to writing IPython.display.display(3 + 2)
# it is not exactly the same as print: depending on the value's type, the display may be enriched (e.g., HTML, images, pandas DataFrames)

Hello, World!


5

# NumPy's arrays

NumPy’s main object is the homogeneous multidimensional `ndarray`, also known by the alias `array`. It is a table of elements (usually numbers), all of the same type, indexed by a tuple of non-negative integers.

In NumPy, dimensions are called _axes_. For example, the array for the coordinates of a point in 3D space, `[1, 2, 1]`, has one axis. That axis has 3 elements in it, so we say it has a length of 3.

Consider the following array:


In [2]:
import numpy as np
my_array = np.array([(1., 0., 0.), (0., 1., 2.)])
print(f"Dimensions of my_array: {my_array.ndim}.")
print(f"Array: {my_array}")
print(f"Type of the array: {type(my_array)}.")

Dimensions of my_array: 2.
Array: [[1. 0. 0.]
 [0. 1. 2.]]
Type of the array: <class 'numpy.ndarray'>.


In the example above, the array has 2 axes (`ndim` attribute). The first axis has a length of 2, the second axis has a length of 3.


In [3]:
print(f"Length of the first dimension: {len(my_array)}.")
print(f"Length of the second dimension: {len(my_array[0])}.")

Length of the first dimension: 2.
Length of the second dimension: 3.


The `shape` attribute returns a tuple of integers with as many elements as dimensions or axes in the array. The length of the `shape` tuple is therefore the number of axes, `ndim`. Each value in the tuple indicates the length in each dimension. For a matrix with _n_ rows and _m_ columns, `shape` will be `(n, m)`. `shape` is a very **important** attribute of the NumPy array.

In [4]:
print(f"Shape of the my_array: {my_array.shape}.")

Shape of the my_array: (2, 3).


The `dtype` attribute is an object describing the type of the elements in the array. NumPy provides types of its own, such as `numpy.int32`, `numpy.int16`, `numpy.float64`, `numpy.float32` and `numpy.bool_` (also available as `numpy.bool` in NumPy 2.0 and later).

In [5]:
print(f"Type of the elements in the array: {my_array.dtype}.")
my_int_array = my_array.astype(np.int32)
print(my_int_array)

Type of the elements in the array: float64.
[[1 0 0]
 [0 1 2]]
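
The conversion above happens to be lossless because every value is a whole number. As a small sketch with made-up values, the example below shows that `astype` returns a new array, leaves the original untouched, and truncates the fractional part toward zero when converting floats to integers.

```python
import numpy as np

measurements = np.array([1.9, -1.9, 2.5])    # illustrative values
as_int = measurements.astype(np.int32)

print(as_int)                 # [ 1 -1  2]  (fraction dropped, not rounded)
print(measurements)           # the original array is unchanged
print(measurements.itemsize)  # 8 bytes per float64 element
print(as_int.itemsize)        # 4 bytes per int32 element
```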


The `size` attribute is the total number of elements of the array. Thus, `size` is equal to the product of the elements of `shape`.

In [6]:
print(f"Size of the array: {my_array.size}.")

Size of the array: 6.
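
To tie the attributes together, here is a short sketch on a hypothetical 3D array (not one used elsewhere in this lab) that checks the relationships between `ndim`, `shape` and `size`.

```python
import numpy as np

example = np.zeros((2, 3, 4))  # illustrative array with three axes

print(example.ndim)    # 3
print(example.shape)   # (2, 3, 4)
print(example.size)    # 24

# ndim is the length of shape, and size is the product of its entries.
same_ndim = len(example.shape) == example.ndim
same_size = int(np.prod(example.shape)) == example.size
print(same_ndim, same_size)  # True True
```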


# Array Creation

There are several ways to create arrays in NumPy. 

You can create an array from a regular Python list or tuple using the `array` function. `array` transforms sequences of sequences into two-dimensional arrays, sequences of sequences of sequences into three-dimensional arrays, and so on.

In [7]:
my_array = np.array([(1.5, 2, 3), (4, 5, 6)])
print(my_array)

[[1.5 2.  3. ]
 [4.  5.  6. ]]


Notice that the type of the elements in the array is not the same as the type of the elements in the list. Integers have been promoted to `float`s. You can specify the type of the elements in the array using the `dtype` argument.
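
A brief sketch of the `dtype` argument follows; the variable names are illustrative. The same list of integers is stored with different element types depending on what is requested.

```python
import numpy as np

default_ints = np.array([1, 2, 3])                  # integer type chosen by NumPy
as_float = np.array([1, 2, 3], dtype=np.float64)    # force 64-bit floats
as_complex = np.array([1, 2, 3], dtype=complex)     # force complex numbers

print(default_ints.dtype)  # an integer type (its width depends on the platform)
print(as_float)            # [1. 2. 3.]
print(as_complex.dtype)    # complex128
```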

Often, the elements of an array are initially unknown, but its size is known. Hence, NumPy offers several functions to create arrays with initial placeholder content: `zeros`, `ones` and `empty` create arrays filled with 0, with 1, and with uninitialized content (whatever values already exist in memory), respectively.

In [8]:
print(np.zeros((3, 2), dtype=np.float32))
print(np.empty((1, 4)))

[[0. 0.]
 [0. 0.]
 [0. 0.]]
[[2.12199579e-314 8.36066342e-312 4.76279283e-321 3.79442416e-321]]


To create sequences of numbers, NumPy provides functions analogous to `range` that return arrays instead of lists. The `arange` function returns evenly spaced values within a given interval using a fixed step, while the `linspace` function returns a given number of evenly spaced values over the interval (the third parameter is the number of elements).

The `reshape` function returns an array containing the same data with a new shape (the original array is not modified).

In [9]:
print(np.arange(10, 30, 5))  # 5 is the step (increment); 30 is not included
print(np.linspace(10, 30, 5))  # 5 is the number of elements; 30 is included
print("\nShowing a 3D array (2,3,4):")
print(np.arange(1, 2*3*4 + 1).reshape(2, 3, 4), end='\n\n')  
print("\nShowing a 3D array with the size inferred (3, -1=2, 4):")
print(np.arange(1, 2*3*4 + 1).reshape(3, -1, 4))  # -1 means that the size is inferred

[10 15 20 25]
[10. 15. 20. 25. 30.]

Showing a 3D array (2,3,4):
[[[ 1  2  3  4]
  [ 5  6  7  8]
  [ 9 10 11 12]]

 [[13 14 15 16]
  [17 18 19 20]
  [21 22 23 24]]]


Showing a 3D array with the size inferred (3, -1=2, 4):
[[[ 1  2  3  4]
  [ 5  6  7  8]]

 [[ 9 10 11 12]
  [13 14 15 16]]

 [[17 18 19 20]
  [21 22 23 24]]]
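
One detail worth a quick sketch: `reshape` leaves the shape of the original array alone, but for a contiguous array like the ones above the result is a view that shares the same data. The names `flat` and `grid` are illustrative.

```python
import numpy as np

flat = np.arange(6)
grid = flat.reshape(2, 3)

print(flat.shape)  # (6,)   the original keeps its shape
print(grid.shape)  # (2, 3)

# The reshaped array shares memory with the original here,
# so writing through one is visible through the other.
grid[0, 0] = 99
print(flat[0])                       # 99
print(np.shares_memory(flat, grid))  # True
```

Calling `.copy()` on the reshaped array gives independent data when that is what you need.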


# Mathematical Operations

Arithmetic operators on arrays apply elementwise. A new array is created and filled with the result (functional style).

In [10]:
a = np.array([20, 30, 40, 50])
b = np.arange(4)
print(f"Subtraction a-b = {a - b}")
print(f"Exponential: b**2 = {b**2}")
print(f"Scalar product and sin function: 10 * sin(a) = {10 * np.sin(a)}")
print(f"Logical operations: a < 35 {a < 35}")

Subtraction a-b = [20 29 38 47]
Exponential: b**2 = [0 1 4 9]
Scalar product and sin function: 10 * sin(a) = [ 9.12945251 -9.88031624  7.4511316  -2.62374854]
Logical operations: a < 35 [ True  True False False]


Unlike in many matrix languages, the product operator `*` operates elementwise on NumPy arrays. The matrix product can be performed using the `@` operator (in Python >= 3.5) or the `dot` function or method:

In [11]:
a = np.array([[1, 1],
              [0, 1]])
b = np.array([[2, 0],
              [3, 4]])
print(f"Elementwise product a * b = \n{a * b}") 
print(f"Matrix product a @ b = \n{a @ b}") 
print(f"Matrix product a.dot(b) = \n{a.dot(b)}") 

Elementwise product a * b = 
[[2 0]
 [0 4]]
Matrix product a @ b = 
[[5 4]
 [3 4]]
Matrix product a.dot(b) = 
[[5 4]
 [3 4]]
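
The matrix product also works for non-square arrays as long as the inner dimensions agree. The sketch below uses two illustrative arrays of shapes (2, 3) and (3, 2), and shows that incompatible shapes raise a `ValueError`.

```python
import numpy as np

left = np.arange(6).reshape(2, 3)    # shape (2, 3)
right = np.arange(6).reshape(3, 2)   # shape (3, 2)

product = left @ right               # inner dimensions (3 and 3) match
print(product.shape)                 # (2, 2)
print(product)
# [[10 13]
#  [28 40]]

# Shapes (2, 3) and (2, 3) cannot be matrix-multiplied.
shape_error = False
try:
    left @ left
except ValueError:
    shape_error = True
print(shape_error)                   # True
```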


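Many unary operations, such as `sum`, `min` and `max`, are implemented as methods of the `ndarray` class. Elementwise mathematical functions such as `exp` and `sqrt` are provided as NumPy functions instead (`np.exp`, `np.sqrt`).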

In [12]:
a = np.array([(1.5, 2, 3), (4, 5, 6)])  # explicit values; np.empty would leave the content uninitialized
print(f"Array: \n{a}")
print(f"Sum of all elements: {a.sum()}")
print(f"Minimum of all elements: {a.min()}")
print(f"Maximum of all elements: {a.max()}")

Array: 
[[1.5 2.  3. ]
 [4.  5.  6. ]]
Sum of all elements: 21.5
Minimum of all elements: 1.5
Maximum of all elements: 6.0
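
As a short sketch of the difference between array methods and elementwise functions, the example below uses an illustrative array of perfect squares. `sum` is called as a method, whereas `sqrt` and `exp` are called as functions from the `np` namespace.

```python
import numpy as np

squares = np.array([1.0, 4.0, 9.0])

total = squares.sum()        # method of the array
roots = np.sqrt(squares)     # elementwise function, returns a new array
growth = np.exp(np.array([0.0, 1.0]))

print(total)                 # 14.0
print(roots)                 # [1. 2. 3.]
print(growth)                # e**0 and e**1
print(hasattr(squares, "sqrt"))  # False: there is no sqrt method on arrays
```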


By default, these operations apply to the array as though it were a flat list of numbers, regardless of its shape. However, by specifying the `axis` parameter you can apply an operation along the specified axis of an array. For example, in a 2D array, `axis=0` means that the operation runs along the first axis (down the rows, collapsing them), while `axis=1` means that it runs along the second axis (across the columns). Thus, the `sum` method with `axis=0` returns an array with the sum of each column. The `axis` parameter is very **important** because it is available in many NumPy and pandas functions.

In [13]:
a = np.arange(12).reshape(3, 4)
print(f"Array: \n{a}")
print(f"Sum along the rows (for each column): {a.sum(axis=0)}")
print(f"Minimum along the columns (for each row): {a.min(axis=1)}")

Array: 
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
Sum along the rows (for each column): [12 15 18 21]
Minimum along the columns (for each row): [0 4 8]
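
One way to remember what `axis` does is to look at the shape of the result: the axis you reduce over disappears. The sketch below illustrates this on the same kind of (3, 4) array; `keepdims=True` is an optional argument that keeps the reduced axis with length 1.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

column_sums = a.sum(axis=0)               # axis 0 (length 3) is collapsed
row_sums = a.sum(axis=1)                  # axis 1 (length 4) is collapsed
row_sums_2d = a.sum(axis=1, keepdims=True)

print(column_sums.shape)   # (4,)
print(row_sums.shape)      # (3,)
print(row_sums)            # [ 6 22 38]
print(row_sums_2d.shape)   # (3, 1)
```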


# Indexing and Slicing

One-dimensional arrays can be indexed and sliced, much like lists and other Python sequences. You can use Python's slice notation (`start:stop` and `start:stop:step`) to select and modify the elements of an array.


In [14]:
a = np.arange(10)**3
print(f"Array: {a}")
print(f"Element at index 2: {a[2]}")
print(f"Elements from index 2 to 4: {a[2:5]}")  # 5 is not included
print(f"Elements from start to the second last: {a[:-1]}")
print(f"Elements from index 0 to 5 with step 2: {a[:6:2]}")  # 6 is not included
print(f"A view of the whole array: {a[:]}")  # unlike with lists, a[:] is a view, not a copy; use a.copy() for a copy
print(f"From one, two by two: {a[1::2]}")  # starts from index 1 and goes to the end with step 2 


Array: [  0   1   8  27  64 125 216 343 512 729]
Element at index 2: 8
Elements from index 2 to 4: [ 8 27 64]
Elements from start to the second last: [  0   1   8  27  64 125 216 343 512]
Elements from index 0 to 5 with step 2: [ 0  8 64]
A view of the whole array: [  0   1   8  27  64 125 216 343 512 729]
From one, two by two: [  1  27 125 343 729]
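
Slices can also be used to modify an array. The following sketch, with illustrative variable names, shows that a basic slice is a view onto the original data (unlike a slice of a Python list), that `copy()` gives independent data, and that a slice can appear on the left-hand side of an assignment.

```python
import numpy as np

a = np.arange(10) ** 3

window = a[2:5]          # a view: no data is copied
window[0] = -1
print(a[2])              # -1, the original array changed

independent = a[2:5].copy()
independent[1] = -2
print(a[3])              # 27, the original array is untouched

a[::2] = 0               # assign to every second element
print(a.tolist())        # [0, 1, 0, 27, 0, 125, 0, 343, 0, 729]
```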


Multidimensional arrays can have one index per axis. These indices are given in a tuple separated by commas:

In [15]:
a = np.arange(3*4).reshape(3, 4)
print(f"Array: \n{a}")
print(f"Element at index (1, 2): {a[1, 2]}")
print(f"Row at index 1: {a[1]}")
print(f"Row at index 2: {a[2, :]}")
print(f"Column at index 1: {a[:, 1]}")
print(f"Subarray (1:3, 2:4): \n{a[1:3, 2:4]}")  # 3 and 4 are not included

Array: 
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
Element at index (1, 2): 6
Row at index 1: [4 5 6 7]
Row at index 2: [ 8  9 10 11]
Column at index 1: [1 5 9]
Subarray (1:3, 2:4): 
[[ 6  7]
 [10 11]]


# Iterating Over Arrays

Iterating over multidimensional arrays is done with respect to the first axis:

In [16]:
print(f"Array: \n{a}")
print("Iterating over the array:")
for index, row in enumerate(a):
    print(f"Row {index}: {row}")

Array: 
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
Iterating over the array:
Row 0: [0 1 2 3]
Row 1: [4 5 6 7]
Row 2: [ 8  9 10 11]
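
If you need every element rather than every row, a minimal sketch is shown below. It uses the `flat` attribute, which iterates over all elements, and `np.ndenumerate`, which also yields the index of each element. The small array is illustrative.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)

# flat iterates over all elements in row-major order.
elements = [int(value) for value in a.flat]
print(elements)  # [0, 1, 2, 3, 4, 5]

# ndenumerate yields (index tuple, value) pairs.
indexed = [(index, int(value)) for index, value in np.ndenumerate(a)]
print(indexed[0])   # ((0, 0), 0)
print(indexed[-1])  # ((1, 2), 5)
```

Explicit loops like these are much slower than vectorized operations, so they are best kept for cases where no array expression fits.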


# Broadcasting

Broadcasting allows NumPy to work with arrays of different shapes when performing arithmetic operations. Frequently, we have a smaller array and a larger array, and we want to use the smaller array multiple times to perform some operation on the larger array.

In [17]:
# example of broadcasting
a = np.array([1, 2, 3])
b = 2
print(f"Array a: {a}")
print(f"Scalar b: {b}")
print(f"Elementwise sum: a + b = {a + b}")

c = np.arange(3*3).reshape(3, 3)
print(f"Array c: \n{c}")
print(f"Elementwise sum with broadcasting: a + c = \n{a + c}")  # a (shape (3,)) is stretched along the first axis to match c (shape (3, 3))

Array a: [1 2 3]
Scalar b: 2
Elementwise sum: a + b = [3 4 5]
Array c: 
[[0 1 2]
 [3 4 5]
 [6 7 8]]
Elementwise sum with broadcasting: a + c = 
[[ 1  3  5]
 [ 4  6  8]
 [ 7  9 11]]
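
Broadcasting compares shapes from the last axis backwards: two lengths are compatible when they are equal or when one of them is 1. As a sketch with illustrative arrays, a row of shape (3,) and a column of shape (2, 1) combine into a (2, 3) table, while shapes (2, 3) and (4,) cannot be combined.

```python
import numpy as np

row = np.array([1, 2, 3])          # shape (3,)
column = np.array([[10], [20]])    # shape (2, 1)

table = row + column               # both operands are stretched to shape (2, 3)
print(table)
# [[11 12 13]
#  [21 22 23]]

# Trailing lengths 3 and 4 differ and neither is 1, so this fails.
incompatible = False
try:
    np.ones((2, 3)) + np.ones(4)
except ValueError:
    incompatible = True
print(incompatible)                # True
```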


# Indexing with arrays

NumPy offers more indexing facilities than regular Python sequences. In addition to indexing by integers and slices, as we saw before, arrays can be indexed by arrays of integers and arrays of booleans.

In [18]:
a = np.arange(10)**2
print(f"Array: {a}")
indices = np.array([1, 1, 3, 8])
print(f"Elements at the indices {indices}: {a[indices]}")

bool_array = a > 35
print(f"Boolean array: {bool_array}")
print(f"Elements greater than 35: {a[bool_array]}")

Array: [ 0  1  4  9 16 25 36 49 64 81]
Elements at the indices [1 1 3 8]: [ 1  1  9 64]
Boolean array: [False False False False False False  True  True  True  True]
Elements greater than 35: [36 49 64 81]
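
Indexing with arrays behaves differently from slicing in one respect that a short sketch can show: selecting with an index array returns a copy, while assigning through a boolean mask modifies the array in place. The variable names are illustrative.

```python
import numpy as np

a = np.arange(10) ** 2

picked = a[np.array([0, 2, 4])]   # indexing with an array returns a copy
picked[0] = -1
print(a[0])                       # 0, the original is unchanged

capped = a.copy()
capped[capped > 35] = 35          # assignment through a boolean mask
print(capped)                     # [ 0  1  4  9 16 25 35 35 35 35]

positions = np.where(a > 35)[0]   # indices where the condition holds
print(positions)                  # [6 7 8 9]
```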


# ✨ Activity ✨

1. Create a 2D array with shape (3, 4) containing the square roots of 12 equally spaced numbers from 1 to 5 (both included), so that the first element is the square root of 1 and the last is the square root of 5.
2. Print the array.
3. Print the sum of the columns.
4. Print the minimum of the elements of the rows.
5. Print the subarray from the second row and the third column to the end.
6. Print the elements greater than 5.
7. Compute the transpose of the array and print it (do not use the T attribute or the transpose method). Then, compare it with the `T` attribute or the `transpose` method.
8. Print the matrix multiplication of the original matrix and its transpose.


In [19]:
# Write your code here

# SOLUTION
# 1. Create a 2D array with shape (3, 4) containing the square roots of 12 equally spaced numbers from 1 to 5 (both included).
a = np.linspace(1, 5, 3*4)
print(f"Values before the square root operation: {a}")
a = np.sqrt(a).reshape(3, 4)

# 2. Print the array.
print(f"Array: \n{a}")

# 3. Print the sum of the columns.
print(f"Sum of the columns: {a.sum(axis=0)}")

# 4. Print the minimum of the elements of the rows.
print(f"Minimum of the elements of the rows: {a.min(axis=1)}")

# 5. Print the subarray from the second row and the third column to the end.
print(f"Subarray from the second row and the third column to the end: \n{a[1:, 2:]}")

# 6. Print the elements greater than 5.
print(f"Elements greater than 5: {a[a > 5]}")

# 7. Compute the transpose of the array and print it (do not use the T attribute or the transpose method). Then, compare it with the `T` attribute or the `transpose` method.
transpose = np.zeros((a.shape[1], a.shape[0]))
for row in range(a.shape[0]):
    transpose[:, row] = a[row, :]
print(f"Transpose of the array: \n{transpose}")
print(f"Transpose of the array using the T attribute: \n{a.T}")

# 8. Print the matrix multiplication of the original matrix and its  transpose.
print(f"Matrix multiplication of the original matrix and its transpose: \n{a @ transpose}")


Values before the square root operation: [1.         1.36363636 1.72727273 2.09090909 2.45454545 2.81818182
 3.18181818 3.54545455 3.90909091 4.27272727 4.63636364 5.        ]
Array: 
[[1.         1.16774842 1.31425748 1.44599761]
 [1.5666989  1.67874412 1.78376517 1.88293774]
 [1.97714211 2.06705764 2.15322169 2.23606798]]
Sum of the columns: [4.54384101 4.91355017 5.25124434 5.56500333]
Minimum of the elements of the rows: [1.         1.5666989  1.97714211]
Subarray from the second row and the third column to the end: 
[[1.78376517 1.88293774]
 [2.15322169 2.23606798]]
Elements greater than 5: []
Transpose of the array: 
[[1.         1.5666989  1.97714211]
 [1.16774842 1.67874412 2.06705764]
 [1.31425748 1.78376517 2.15322169]
 [1.44599761 1.88293774 2.23606798]]
Transpose of the array using the T attribute: 
[[1.         1.5666989  1.97714211]
 [1.16774842 1.67874412 2.06705764]
 [1.31425748 1.78376517 2.15322169]
 [1.44599761 1.88293774 2.23606798]]
Matrix multiplication of the or

In [20]:
print([1, 2, 3])
print([[1, 2, 3],
       [4, 5, 6]])
print([[[1, 2, 3, 4],
        [5, 6, 7, 8],
        [9, 10, 11, 12]],

       [[13, 14, 15, 16],
        [17, 18, 19, 20]]])


[1, 2, 3]
[[1, 2, 3], [4, 5, 6]]
[[[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]], [[13, 14, 15, 16], [17, 18, 19, 20]]]
