# Array-Oriented Programming with NumPy - Part 1


<table align="left">
  <td>
    <a href="https://colab.research.google.com/github/phonchi/nsysu-math106A/blob/master/static_files/presentations/09-1_Basic_NumPy.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>
  </td>
  <td>
    <a target="_blank" href="https://kaggle.com/kernels/welcome?src=https://github.com/phonchi/nsysu-math106A/blob/master/static_files/presentations/09-1_Basic_NumPy.ipynb"><img src="https://kaggle.com/static/images/open-in-kaggle.svg" /></a>
  </td>
</table>

Execute the following two cells for the setup:

In [None]:
!pip install jupyterquiz 
!pip install jupytercards

from IPython.display import display, Javascript
display(Javascript('Jupyter.notebook.kernel.restart()'))

In [None]:
from jupyterquiz import display_quiz

path="https://raw.githubusercontent.com/phonchi/nsysu-math106A/refs/heads/main/extra/questions/ch9/"

1. Introduction

2. Creating an array using different approaches (Constructors)

3. Indexing and slicing (Getter and Setter)

4. NumPy calculation methods (Reduction)

## 1. Introduction

The `NumPy` (Numerical Python) library is the favored Python array implementation. It provides a high-performance, feature-rich $n$-dimensional array type called `array`. Array operations are typically one or two orders of magnitude faster than those on `lists`. 

Although built-in `lists` can also have multiple dimensions and be processed with nested loops, a key advantage of NumPy is "array-oriented programming," which employs ***functional-style programming*** and ***internal iteration*** to make array manipulation concise and straightforward, reducing the likelihood of bugs that can arise from explicitly programmed loops.
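
As a minimal sketch of the difference (the data and variable names are illustrative and not part of the course material), the following totals the same values once with an explicit loop and once with an `array` method that iterates internally:

```python
import numpy as np

prices = [10.0, 20.0, 30.0]   # illustrative data

# External iteration: we write and maintain the loop ourselves
total_loop = 0.0
for price in prices:
    total_loop += price

# Internal iteration: the array method loops over the elements for us
total_array = np.array(prices).sum()

print(total_loop)    # 60.0
print(total_array)   # 60.0
```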

<img src="https://raw.githubusercontent.com/phonchi/nsysu-math106A/refs/heads/main/extra/Figures/img1.png" width="60%" height="60%" style="margin-left:auto; margin-right:auto">

In `Python`, we don't have to declare types or manage memory by hand. Every variable holds more than just the value itself: it also carries additional information about the value's <u>type</u> and <u>size</u>:

<img src="https://raw.githubusercontent.com/phonchi/nsysu-math106A/refs/heads/main/extra/Figures/img2.jpg" width="60%" height="60%" style="margin-left:auto; margin-right:auto">

Likewise, a `Python` `list` is very flexible: it can hold `objects` of many different types. But that flexibility comes at a price — because the interpreter has to know what each element is, every item carries its own notes about type, size, and other details.

When all elements happen to share the same type, most of that extra data is just repeated over and over! A fixed‑type `NumPy` `array` avoids this overhead by recording the type only once and storing all the raw values in one tightly packed block of memory, making it far more efficient than a dynamic‑type `list` for large, uniform data.

<center><img src="https://raw.githubusercontent.com/phonchi/nsysu-math106A/refs/heads/main/extra/Figures/img3.jpg" width="60%" height="60%"></center>

From the figure, we can see that at the implementation level, the `array` primarily consists of a single pointer to a contiguous data block. In contrast, the `Python` `list` features a pointer to a block of pointers, each of which points to a `Python` object, such as a `Python` `integer`.
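
The storage difference can be estimated roughly with the sketch below. It assumes CPython, where `sys.getsizeof()` reports the size of a single object, and it uses the `nbytes` attribute of an `array`, which gives the size of the packed data block. The list figure is only an estimate because small integers are shared between references.

```python
import sys
import numpy as np

n = 1000
values_list = list(range(n))
values_array = np.arange(n, dtype=np.int64)

# A list stores pointers; each int object it points to has its own overhead
list_bytes = sys.getsizeof(values_list) + sum(sys.getsizeof(v) for v in values_list)

# An array stores the raw 8-byte values back to back
array_bytes = values_array.nbytes

print(array_bytes)               # 8000
print(list_bytes > array_bytes)  # True
```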

All in all, the primary benefit of the `list` is its flexibility. Since each `list` element is a complete structure containing both data and type information, the `list` can accommodate data of any type. Fixed-type `NumPy` `arrays` do not offer this level of adaptability, but:

- They are significantly more efficient for storing and manipulating data. 
- Every object consists of <u>data</u> and <u>methods</u>. The `array` object of the `NumPy` package provides not only efficient storage of array-based data but also efficient operations on that data. 

First, we need to make sure that `NumPy` is installed:

In [None]:
package_name = "numpy"

try:
    __import__(package_name)
    print(f"{package_name} is already installed.")
except ImportError:
    print(f"{package_name} not found. Installing...")
    %pip install {package_name}

The official `NumPy` documentation recommends importing the `numpy` <u>module</u> as `np` so that we can access its functions with the `np.` prefix:

In [None]:
import numpy as np

In [None]:
display_quiz(path+"list_array.json", max_width=800)

## 2. Creating an `array` using different approaches (Constructors)

### 2.1 Creating an `array` from a fixed sequence

The `numpy` module offers numerous <u>functions</u> to create arrays. In this case, we employ the `array()` function, which accepts a sequence of elements and returns a new `array` containing the input elements. For instance, let's pass a `list`:

In [None]:
import numpy as np
numbers = np.array([2, 3, 5, 7, 11])
print(numbers)
numbers, type(numbers)

The `array()` function copies its <u>argument</u>'s contents into the `array`. Note that the type is `numpy.ndarray`, and that the displayed representation prefixes the data with the word `array`.

The `array()` function copies its argument's dimensions. Let's create an `array` from a two-row-by-three-column nested `list`:

In [None]:
np.array([[1, 2, 3], [4, 5, 6]]), type(np.array([[1, 2, 3], [4, 5, 6]]))

A 2D array is a sequence of 1D arrays that represent each row.

####  `array`  Attributes 

The `array` function determines an array's element type from its argument's elements. We can check the element type with an array's `dtype` <u>attribute</u>:

In [None]:
integers = np.array([[1, 2, 3], [4, 5, 6]])
floats = np.array([0.0, 0.1, 0.2, 0.3, 0.4])

integers.dtype, floats.dtype

In the upcoming section, we will notice that several array-creation functions include a `dtype` keyword argument, allowing us to define an array's element type.

> For speed, **NumPy** is implemented in C and adopts C‑level data types. The default integer dtype, `int_`, is platform dependent: it is 64‑bit on most 64‑bit systems, but NumPy versions before 2.0 mapped it to C's `long`, which is 32‑bit on Windows. Floating‑point arrays default to `float64`, identical to C's 64‑bit IEEE *double*.

> In practice you will most often see the dtypes `int64`, `float64`, and `bool`. The complete list is in the NumPy documentation: <https://docs.scipy.org/doc/numpy/user/basics.types.html>.

The attribute `ndim` contains an array's number of dimensions and the attribute `shape` contains a `tuple` specifying an array's dimensions: 

In [None]:
print(integers.ndim)
print(floats.ndim)

In [None]:
print(integers.shape)
print(floats.shape)

Here, `integers` has 2 rows and 3 columns (6 elements), and `floats` is one-dimensional, containing 5 floating-point numbers.

We can view an array's total number of elements with the attribute `size` and the number of bytes required to store each element with `itemsize`:

In [None]:
print(integers.size)
print(integers.itemsize)
print(floats.size)
print(floats.itemsize)

Note that the `size` of `integers` is the product of the values in its `shape` `tuple`: two rows with three elements each, totaling six elements. The `itemsize` of `floats` is 8 because it holds `float64` values. The `itemsize` of `integers` is 4 where the default integer type is `int32` and 8 where it is `int64`.
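
To make these relationships independent of the platform's default integer type, the sketch below fixes the `dtype` explicitly. The `nbytes` attribute (total bytes used by the data) is an addition not used elsewhere in this section.

```python
import numpy as np

grid = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int32)

rows, cols = grid.shape
print(grid.size == rows * cols)                   # True
print(grid.itemsize)                              # 4
print(grid.nbytes == grid.size * grid.itemsize)   # True

# The same values stored as 64-bit floats take twice the space per element
grid_float = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.float64)
print(grid_float.itemsize)                        # 8
```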

### 2.2 Filling an `array` with specific values

`NumPy` offers the functions `zeros()`, `ones()`, and `full()` for creating arrays filled with 0s, 1s, or a specified value, respectively. By default, `zeros()` and `ones()` generate arrays containing `float64` values. We will demonstrate how to customize the element type shortly. The first argument for these functions should be either an `integer` or a `tuple` of integers defining the desired dimensions. When given an integer, each function returns a one-dimensional array containing the specified number of elements:

In [None]:
np.zeros(5)

When provided with a `tuple` of integers, these functions return a multidimensional array featuring the specified dimensions. We can define the array's element type using the `dtype` keyword argument for the `zeros()` and `ones()` functions:

In [None]:
np.ones((2, 4), dtype=np.int32)

The `array` returned by `full()` contains elements with the second argument's value and type: 

In [None]:
np.full((3, 5), 13+2j), np.full((3, 5), 13+2j).dtype
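
As a small sketch of how `full()` picks its element type (variable names are illustrative), the fill value determines the `dtype` unless the `dtype` keyword argument overrides it:

```python
import numpy as np

int_fill = np.full((2, 3), 7)                    # integer fill value -> integer dtype
float_fill = np.full((2, 3), 7.0)                # float fill value -> float64
forced = np.full((2, 3), 7, dtype=np.float64)    # dtype overrides the inferred type

print(int_fill.dtype)     # an integer dtype (its width is platform dependent)
print(float_fill.dtype)   # float64
print(forced.dtype)       # float64
```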

### 2.3 Creating an `array` from a sequence generated by different methods

#### Creating a sequence with a fixed step by `arange()` 

We can employ `NumPy`'s `arange()` function to create integer ranges, similar to using the built-in `range()` function. The first two arguments of the function determine the starting and ending values of the range, with the ending value excluded from the array. The optional third argument represents the step size which has a default value of 1:

In [None]:
np.arange(5)

In [None]:
np.arange(5, 10)

In [None]:
np.arange(10, 1, -2) 

Note that `arange()` follows the same convention as `range()`: the full form is `numpy.arange(start, stop, step)`, and the first and third arguments can be omitted.
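
The following sketch checks this correspondence directly and also shows one difference: unlike `range()`, `arange()` accepts a non-integer step.

```python
import numpy as np

# Same start/stop/step semantics as range(): the stop value is excluded
from_range = list(range(10, 1, -2))
from_arange = np.arange(10, 1, -2).tolist()
print(from_range == from_arange)   # True

# Unlike range(), arange() also accepts a non-integer step
quarters = np.arange(0.0, 1.0, 0.25)
print(quarters.tolist())           # [0.0, 0.25, 0.5, 0.75]
```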

#### Creating a sequence with a fixed number of samples by `linspace()`

Additionally, we can generate evenly spaced floating-point ranges using `NumPy`'s `linspace()` function. The first two arguments of the function determine the starting and ending values of the range, with the ending value **included** in the `array`. The optional keyword argument `num` designates the number of evenly spaced values to create:

In [None]:
np.linspace(0.0, 1.0, num=5)
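
Because the endpoint is included, `num` points span `num - 1` equal gaps. A short sketch of that arithmetic, with illustrative variable names:

```python
import numpy as np

start, stop, num = 0.0, 1.0, 5
points = np.linspace(start, stop, num=num)

# With the endpoint included, num points are separated by num - 1 equal gaps
step = (stop - start) / (num - 1)

print(points.tolist())   # [0.0, 0.25, 0.5, 0.75, 1.0]
print(step)              # 0.25
```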

#### Reshaping an `array` 

We can also first create an `array` using the previous methods and then utilize the `array` method `reshape()` to convert the one-dimensional array into a multidimensional array. Let's generate an array containing values from 1 to 20 and then reshape it into a matrix with four rows and five columns:

In [None]:
np.arange(1, 21).reshape(4, 5)

Note the ***chained method*** calls in the previous example. Initially, `arange()` generates an array containing values 1 to 20. Then, we invoke `reshape()` on that array to obtain the displayed 4-by-5 array. We can `reshape()` any array as long as the new shape contains the same number of elements as the original. Thus, a six-element one-dimensional array can be transformed into a 3-by-2 or 2-by-3 array, and vice versa!
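
The element-count rule can be checked with a small sketch: a six-element array reshapes to 2-by-3 or 3-by-2, while a shape that needs eight elements raises a `ValueError`.

```python
import numpy as np

six = np.arange(6)

print(six.reshape(2, 3).shape)   # (2, 3)
print(six.reshape(3, 2).shape)   # (3, 2)

# 6 elements cannot fill a 4-by-2 array, so NumPy raises ValueError
try:
    six.reshape(4, 2)
except ValueError as error:
    print("reshape failed:", error)
```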

In [None]:
display_quiz(path+"constructors.json", max_width=850)

### Example 1: `list` vs. `array` performance: Introducing `%%timeit`

Most `array` operations execute significantly faster than corresponding `list` operations. To demonstrate, we'll use the `%%timeit` magic command, which benchmarks the average duration of operations. 

In [None]:
import random

Here, let's use the `random` module's `randint()` function with a list comprehension to create a list of six million die rolls and time the operation using `%%timeit`:

In [None]:
%%timeit
rolls_list = [random.randint(1, 6) for i in range(0, 6_000_000)]  # underscores make long integer literals easier to read

> By default, `%%timeit` executes a statement in a loop, and it runs the loop seven times. If we do not indicate the number of loops, [`%%timeit`](https://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-timeit) chooses an appropriate value.

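Now, let's use the `randint()` function from the `numpy.random` module to create an array of six million die rolls. Unlike `random.randint()`, its upper bound is exclusive, so we pass 7 to generate values from 1 to 6: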

In [None]:
%%timeit 
rolls_array = np.random.randint(1, 7, 6_000_000)
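
Outside of IPython or Jupyter, the `%%timeit` magic is not available. A comparable measurement can be sketched with the standard library's `timeit` module. The smaller problem size and the choice of three repetitions are assumptions made so that the sketch runs quickly, and the timings depend on the machine.

```python
import random
import timeit

import numpy as np

n = 100_000   # smaller than the six million rolls above, to keep the sketch fast

# timeit.timeit() calls the function `number` times and returns the total seconds
list_time = timeit.timeit(
    lambda: [random.randint(1, 6) for _ in range(n)], number=3)
array_time = timeit.timeit(
    lambda: np.random.randint(1, 7, n), number=3)

print(f"list:  {list_time:.4f} s for 3 runs")
print(f"array: {array_time:.4f} s for 3 runs")
```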

## 3. Indexing and slicing (Getter and Setter)

One-dimensional `arrays` can be *indexed* and *sliced* using the same syntax and techniques applied when handling other sequence data types, such as built-in `lists` or `tuples`.

In [None]:
a = np.array([1,2,3,4,5])
a[0], a[1:3]

To select an element in a two-dimensional array, specify two indices containing the element's row and column indices in square brackets:

In [None]:
grades = np.array([[87, 96, 70], [60, 87, 90],
                   [94, 77, 92], [100, 81, 82]])
grades

In [None]:
grades[0, 1]  # row 0, column 1

To select a single row, we can specify only one index in square brackets:

In [None]:
grades, grades[1]

To select multiple sequential rows, use slice notation:

In [None]:
grades[0:2]

To select multiple non-sequential rows, use a list of row indices which is called ***fancy indexing***:

In [None]:
grades[[1, 3]]

Let's select only the elements in the first column: 

In [None]:
grades, grades[:, 0]

The 0 after the comma signifies that we are selecting only column 0. The `:` before the comma indicates which rows within that column to choose. In this instance, `:` is a slice representing all rows. We can also select consecutive columns using a slice:

In [None]:
grades[:, 1:3]

or specific columns with fancy indexing using a list of column indices:

In [None]:
grades, grades[:, [0, 2]]

An `array` is <u>mutable</u>. Therefore, to modify values in an array, we can use any of the indexing expressions above on the left-hand side of an assignment: 

In [None]:
print(grades)
grades[3, 2] = 42
grades
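
The same idea extends to rows, columns, and fancy indexing. The sketch below rebuilds the original `grades` array so that it is self-contained; the assigned values are arbitrary.

```python
import numpy as np

grades = np.array([[87, 96, 70], [60, 87, 90],
                   [94, 77, 92], [100, 81, 82]])

grades[0] = [88, 97, 71]   # replace a whole row
grades[:, 2] = 0           # set every value in column 2 to 0
grades[[1, 3], 0] = 50     # fancy indexing: rows 1 and 3 of column 0

print(grades.tolist())
# [[88, 97, 0], [50, 87, 0], [94, 77, 0], [50, 81, 0]]
```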

### Views: Shallow copies

***Views*** are objects that see the data in other objects, instead of having their own copies of the data. Views are also referred to as ***shallow copies***. Several `array` methods and slicing operations generate views of an `array`'s data. The `array` method `view()` returns a new `array` object with a view of the original `array` object's data. First, let's create an `array` and a view of that `array`:

In [None]:
numbers = np.arange(1, 6)
numbers2 = numbers.view()

We can use the built-in `id()` function to verify that `numbers` and `numbers2` are different objects:

In [None]:
id(numbers), id(numbers2)

`NumPy` also has a handy function called `shares_memory()` that can be utilized in this scenario:

In [None]:
np.shares_memory(numbers, numbers2)

To prove that `numbers2` views the same data as `numbers`, let's modify an element in `numbers`, then display both arrays:

In [None]:
numbers[1] *= 10
numbers

In [None]:
numbers2

Similarly, changing a value in the view also changes that value in the original array:

In [None]:
numbers2[1] /= 5
numbers, numbers2

Slices also create views. Let's make `numbers2` a slice that views only the first three elements of numbers:

In [None]:
numbers2 = numbers[0:3]
numbers2

Now, let's modify an element both arrays share, then display them. Again, we see that `numbers2` is a view of `numbers`:

In [None]:
numbers[1] *= 20
numbers

In [None]:
numbers2

Note that this behavior differs from that of a `list`, where slicing creates a new, independent sub-`list`! 
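
A side-by-side sketch of the two behaviors, using illustrative variable names:

```python
import numpy as np

values_list = [1, 2, 3, 4, 5]
values_array = np.array(values_list)

list_slice = values_list[0:3]     # a new, independent list
array_slice = values_array[0:3]   # a view onto values_array

list_slice[0] = 99
array_slice[0] = 99

print(values_list)             # [1, 2, 3, 4, 5]
print(values_array.tolist())   # [99, 2, 3, 4, 5]
```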

### Deep Copies

While views are distinct `array` objects, they save memory by sharing element data with other `arrays`. Nonetheless, when dealing with mutable values, it is occasionally essential to create a ***deep copy*** containing independent copies of the original data.

> This is particularly crucial in multi-core programming, where different components of our program may try to modify our data simultaneously, potentially leading to data corruption.

The `array` method `copy()` returns a new `array` object with a deep copy of the original `array` object's data. First, let's create an `array` and a deep copy of that `array`:

In [None]:
numbers = np.arange(1, 6)
numbers2 = numbers.copy()

To prove that `numbers2` has a separate copy of the data in `numbers`, let's modify an element in `numbers`, then display both arrays: 

In [None]:
numbers[1] *= 5
numbers

In [None]:
numbers2

>  Recall that if we need deep copies of other types of `Python` objects, just pass them to the `copy` module’s `deepcopy()` function. 
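
For example, with a nested `list` the difference between a shallow and a deep copy can be sketched as follows:

```python
import copy

nested = [[1, 2], [3, 4]]

shallow = copy.copy(nested)     # new outer list, but the same inner lists
deep = copy.deepcopy(nested)    # the inner lists are copied as well

nested[0][0] = 99

print(shallow[0][0])   # 99
print(deep[0][0])      # 1
```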

In [None]:
display_quiz(path+"view_copy.json", max_width=850)

### More about Reshaping and Transposing 

We've used `array` method `reshape()` to produce two-dimensional arrays from one-dimensional ranges. `NumPy` provides various other ways to reshape arrays.

Both the `reshape()` and `resize()` array methods allow us to alter an array's dimensions. The `reshape()` method returns a view (shallow copy) of the original array with updated dimensions whenever the memory layout allows it (as it does here), leaving the original array's shape unaltered:

In [None]:
grades = np.array([[87, 96, 70], [99, 87, 90]])
grades

In [None]:
grades2 = grades.reshape(1, 6)

In [None]:
grades2[0, 0] = 0
grades2, grades
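
Whether `reshape()` can return a view depends on the memory layout. The sketch below uses `np.shares_memory()` to check two cases: reshaping the whole array, and reshaping a column slice whose elements are not evenly spaced in memory, which forces NumPy to copy.

```python
import numpy as np

grades = np.array([[87, 96, 70], [99, 87, 90]])

# Reshaping the full, contiguous array gives a view
reshaped = grades.reshape(1, 6)
print(np.shares_memory(grades, reshaped))       # True

# The first two columns are not evenly spaced in memory,
# so flattening them cannot be done without copying
first_two = grades[:, 0:2].reshape(4)
print(np.shares_memory(grades, first_two))      # False
```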

A widely used technique involves using `-1` to specify the shape in `reshape()`. The length of the dimension set to `-1` is automatically deduced based on the specified values of other dimensions:

In [None]:
grades, grades.reshape(-1, 3) # Same as grades.reshape(2, 3)
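
A few more uses of `-1`, sketched on a 12-element array. Only one dimension may be left for NumPy to deduce.

```python
import numpy as np

values = np.arange(12)

print(values.reshape(-1, 4).shape)              # (3, 4)
print(values.reshape(2, -1).shape)              # (2, 6)
print(values.reshape(3, 4).reshape(-1).shape)   # (12,)
```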

Method `resize()`, on the other hand, modifies the original `array`'s shape <u>in-place</u>:

In [None]:
grades.resize(1, 6)
grades

We can also do the opposite: take a multidimensional array and flatten it into a single dimension with the method `flatten()`, which deep copies the original array's data:

In [None]:
grades = np.array([[87, 96, 70], [99, 87, 90]])
grades

In [None]:
flattened = grades.flatten()
flattened

In [None]:
flattened[0] = 100
grades # Original array does not change

Additionally, we can transpose an `array`'s rows and columns: the `T` attribute returns a transposed view of the array. 

Assume that the original `grades` `array` presents two students' grades (the rows) across three exams (the columns). Let's transpose the rows and columns to examine the data as the grades for three exams (the rows) taken by two students (the columns):

In [None]:
grades.T

Transposing does not modify the original array:

In [None]:
grades
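
The view and copy behavior of `T` and `flatten()` can be confirmed with `np.shares_memory()`, as in this short sketch:

```python
import numpy as np

grades = np.array([[87, 96, 70], [99, 87, 90]])
transposed = grades.T

print(grades.shape, transposed.shape)               # (2, 3) (3, 2)
print(grades[0, 2] == transposed[2, 0])             # True

print(np.shares_memory(grades, transposed))         # True: T is a view
print(np.shares_memory(grades, grades.flatten()))   # False: flatten() copies
```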

Finally, we can combine `arrays` by adding more columns or more rows — known as horizontal stacking and vertical stacking. Let's first create another 2-by-3 `array` of grades:

In [None]:
grades2 = np.array([[94, 77, 90], [100, 81, 82]])
grades2

Suppose `grades2` represents three more exam grades for the two students in the `grades` array. We can merge `grades` and `grades2` using `NumPy`'s `hstack()` (horizontal stack) function by passing a `tuple` containing the arrays to combine. The extra parentheses are necessary because `hstack()` expects a single argument:

In [None]:
np.hstack((grades, grades2))

Moving forward, let's suppose that `grades2` represents the grades of two additional students on three exams. In this scenario, we can combine `grades` and `grades2` using `NumPy`'s `vstack()` (vertical stack) function:

In [None]:
np.vstack((grades, grades2))
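
A quick way to keep the two functions apart is to look at the resulting shapes. In this sketch the arrays are named `first` and `second` for illustration:

```python
import numpy as np

first = np.array([[87, 96, 70], [99, 87, 90]])
second = np.array([[94, 77, 90], [100, 81, 82]])

# hstack() keeps the number of rows and adds columns
print(np.hstack((first, second)).shape)   # (2, 6)

# vstack() keeps the number of columns and adds rows
print(np.vstack((first, second)).shape)   # (4, 3)
```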

### Exercise 1: Suppose we are developing a chess game that provides two special checkerboards as follows:

<center><img src="https://raw.githubusercontent.com/phonchi/nsysu-math106A/refs/heads/main/extra/Figures/chb1.png" width="20%" height="20%"></center>

<center><img src="https://raw.githubusercontent.com/phonchi/nsysu-math106A/refs/heads/main/extra/Figures/chb2.png" width="40%" height="40%"></center>

We decide to use 1 to represent the white square and 0 to represent the black square. Write a program to create two 2D arrays to represent the two checkerboards as follows:

```python
[[1, 0, 1, 0, 1, 0],
 [0, 1, 0, 1, 0, 1],
 [1, 0, 1, 0, 1, 0],
 [0, 1, 0, 1, 0, 1],
 [1, 0, 1, 0, 1, 0],
 [0, 1, 0, 1, 0, 1]]
```

```python
[[1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1],
 [0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0],
 [1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1]]
```

Note that you should not hardcode the above arrays directly. Use `NumPy` methods to create them. After finishing the exercise, you can display the checkerboard using the plotting cell below.

In [None]:
# Your answer here
chb1 = np.ones((___,___), dtype=int)
chb1[___, ___] = 0
chb1[___, ___] = 0
chb1

In [None]:
# Your answer here
chb2 = np.____((chb1[__,___], chb1[___,___]))
chb2

In [None]:
# Plot the checkerboard
package_name = "matplotlib"

try:
    __import__(package_name)
    print(f"{package_name} is already installed.")
except ImportError:
    print(f"{package_name} not found. Installing...")
    %pip install {package_name}

import matplotlib.pyplot as plt
plt.imshow(chb1, cmap='gray')
plt.show()

## 4. `NumPy` calculation methods (Reduction)

An `array` includes several methods that carry out computations based on its contents. **By default, these methods disregard the array's shape and utilize all the elements in the calculations.** 

For instance, when computing the mean of an array, it sums all of its elements irrespective of its shape, and then divides by the total number of elements. **We can also execute these calculations on each dimension.** For example, in a two-dimensional array, we can determine the mean of each row and each column.

In [None]:
grades = np.array([[87, 96, 70], [100, 87, 90],
                    [94, 77, 90], [100, 81, 82]])
grades

We can use methods to calculate `sum()`, `min()`, `max()`, `mean()`, `std()` (standard deviation) and `var()` (variance) — each is a functional-style programming reduction:

In [None]:
print(grades.sum())
print(grades.min())
print(grades.max())
print(grades.mean())
print(grades.std())
print(grades.var())
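
These reductions are related to one another. As a sketch, the mean is the sum divided by the number of elements, and the variance is the square of the standard deviation. `np.isclose()` is used here to compare floating-point results.

```python
import numpy as np

grades = np.array([[87, 96, 70], [100, 87, 90],
                   [94, 77, 90], [100, 81, 82]])

print(grades.sum(), grades.size)                          # 1054 12
print(np.isclose(grades.mean(), grades.sum() / grades.size))   # True
print(np.isclose(grades.var(), grades.std() ** 2))        # True
```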

### Calculations by Row or Column

Numerous calculation methods can be applied to specific `array` dimensions, referred to as the `array`'s ***axes***. These methods accept an `axis` keyword argument that designates the dimension to be utilized in the calculation, providing a convenient means to perform computations by row or column in a two-dimensional `array`.

Suppose we want to find the maximum grade for each exam, represented by the columns of `grades`. By specifying `axis=0`, the calculation is performed on all the row values within each column:

In [None]:
grades, grades.max(axis=0), grades.argmax(axis=0)

Here, 100 is the maximum value in the first column and its corresponding index (row) is 1 (if there are duplicate elements, the index of the first element will be reported). 96 and 90 are the maximum values in the second and third columns, respectively.

In [None]:
grades, grades.mean(axis=0)

Hence, 95.25 above represents the average of the first column's grades (87, 100, 94, and 100), 85.25 is the average of the second column's grades (96, 87, 77, and 81), and 83 is the average of the third column's grades (70, 90, 90, and 82).

Similarly, specifying `axis=1` performs the calculation on all the column values within each individual row. To determine each student's average grade for all exams, we can use:

In [None]:
grades.mean(axis=1)

This generates four averages — one for the values in each row. Therefore, 84.33333333 is the average of row 0's grades (87, 96, and 70), and the other averages correspond to the remaining rows. For more methods, refer to [https://numpy.org/doc/stable/reference/arrays.ndarray.html](https://numpy.org/doc/stable/reference/arrays.ndarray.html).

<center><img src="https://raw.githubusercontent.com/phonchi/nsysu-math106A/refs/heads/main/extra/Figures/img4.jpg" width="50%" height="50%"></center>
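
One way to remember the `axis` argument is that the named axis is the one that disappears from the result. A sketch with `sum()` on the 4-by-3 `grades` array:

```python
import numpy as np

grades = np.array([[87, 96, 70], [100, 87, 90],
                   [94, 77, 90], [100, 81, 82]])

# axis=0 collapses the rows: one result per column (exam)
print(grades.sum(axis=0).tolist())   # [381, 341, 332]
print(grades.sum(axis=0).shape)      # (3,)

# axis=1 collapses the columns: one result per row (student)
print(grades.sum(axis=1).tolist())   # [253, 277, 261, 263]
print(grades.sum(axis=1).shape)      # (4,)
```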

> For more operations, such as those related to linear algebra, we can use the sub-module `numpy.linalg`, which implements basic linear algebra such as solving linear systems and singular value decomposition. However, it is not guaranteed to be compiled using efficient routines, so we recommend `scipy.linalg`, which we will introduce in a later chapter.
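
As a brief illustration of `numpy.linalg`, the sketch below solves a small linear system with `np.linalg.solve()`. The system itself ($2x + y = 5$, $x + 3y = 10$) is made up for this example.

```python
import numpy as np

# Coefficient matrix and right-hand side of
#   2x +  y = 5
#    x + 3y = 10
coefficients = np.array([[2.0, 1.0], [1.0, 3.0]])
rhs = np.array([5.0, 10.0])

solution = np.linalg.solve(coefficients, rhs)

# The solution is x = 1, y = 3 (up to floating-point rounding)
print(np.allclose(solution, [1.0, 3.0]))                # True
print(np.allclose(coefficients.dot(solution), rhs))     # True
```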

### Exercise 2: Find the maximum and minimum values of the function $f(x) = x^2$ on the interval $[-3, 5]$ by substituting 1000 evenly spaced numbers between $-3$ and $5$ into the function. What are the corresponding $x$ values for the maximum and minimum, and how do they compare with the exact values?

Hint: You may find `np.linspace()`, `np.max()/np.min()` and `np.argmax()/np.argmin()` useful.

In [None]:
# Your answer here
N = 1000   # Number of points to sample in the interval
x = np._____(___,___, num=N)  # Create 1000 evenly spaced values from -3 to 5 (inclusive)
y = ____  # Compute y = x² for every x in the array

y_max = np.___(y) # Largest value of y (the maximum of the parabola on this interval)
y_min = np.___(y) # Smallest value of y (the minimum of the parabola on this interval)
x_max = x[np.___(y)] # x‑value at which y reaches its maximum
x_min = x[np.___(y)] # x‑value at which y reaches its minimum 

print("max y=", y_max, "x=", x_max)
print("min y=", y_min, "x=", x_min)

In [None]:
from jupytercards import display_flashcards
fpath= "https://raw.githubusercontent.com/phonchi/nsysu-math106A/refs/heads/main/extra/flashcards/"
display_flashcards(fpath + 'ch9-1.json')

## References

1. [https://scipy-lectures.org/intro/numpy/index.html](https://scipy-lectures.org/intro/numpy/index.html)

2. [https://scipy-lectures.org/advanced/advanced_numpy/index.html](https://scipy-lectures.org/advanced/advanced_numpy/index.html)

3. [https://jakevdp.github.io/PythonDataScienceHandbook/02.01-understanding-data-types.html](https://jakevdp.github.io/PythonDataScienceHandbook/02.01-understanding-data-types.html)

## Key terms

- **Array-Oriented Programming**: Working with whole matrices or arrays of data at once, so you don’t have to write a loop for every item.  
- **Functional-Style Programming**: A paradigm that builds programs by composing pure functions, that is, functions whose result depends only on their inputs and which have no side effects.  
- **Internal iteration**: Letting a built-in method or function do the looping for you instead of writing the `for` loop yourself.  
- **Chain Method**: Linking several operations together in one line, where each method is called on the object returned by the previous step, for example: `text.strip().lower().split()`.  
- **Fancy Indexing**: Picking out multiple items from an array using a list of positions or a mask of True/False values instead of one index at a time.  
- **Copies**: Making a totally new version of a data structure so you can change it without affecting the original.  
- **Views**: Making a "window" into the original data so that if you change the view, you also change the original.  
- **Reduction**: Turning a list of values into one single result, like adding everything up or finding the highest value.  