# NumPy

<div class="admonition danger">
    <p class="admonition-title">DRAFT</p>
    <p style="padding-top: 1em">
        This page is a work in progress and is subject to change at any moment.
    </p>
</div>

NumPy is a Python library used for working with arrays.
It provides support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays.
NumPy is a fundamental package for scientific computing with Python and is widely used in the field of data science.

NumPy is particularly well suited for numerical computations because its array object can be much faster than a traditional Python list; a figure of up to 50x is often quoted, although the actual speedup depends on the operation and the size of the data.
The array object in NumPy is called `ndarray`, and NumPy provides many supporting functions that make working with an `ndarray` easy.
Arrays are used frequently in data science, where speed and memory usage are important.
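
A rough way to check the speed claim on your own machine is to time the same operation on a list and on an array.
The following is a minimal sketch using the standard-library `timeit` module; the array size and the number of repetitions are arbitrary choices, and the measured ratio will vary with your hardware, NumPy version, and the operation being timed.

```python
import timeit

import numpy as np

n = 100_000  # arbitrary size chosen for illustration
values_list = list(range(n))
values_array = np.arange(n)

# Time adding 2 to every element, repeated 20 times for each data structure.
list_time = timeit.timeit(lambda: [v + 2 for v in values_list], number=20)
array_time = timeit.timeit(lambda: values_array + 2, number=20)

print(f"List:  {list_time:.4f} s")
print(f"Array: {array_time:.4f} s")
```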

One of the most important features of NumPy is its ability to perform element-wise operations on arrays.
This means that you can perform mathematical operations on entire arrays at once, without having to loop over each element of the array.
NumPy also provides support for linear algebra, Fourier transforms, and other advanced mathematical functions, making it a powerful tool for scientific computing.

Key resources:

-   [NumPy documentation](https://numpy.org/doc/stable)
-   [NumPy learn](https://numpy.org/learn/)

## One-dimensional arrays

A one-dimensional array is a fundamental data structure in programming that represents a collection of elements stored in a linear sequence.
Each element in the array is identified by an index, starting from 0 for the first element.
This sounds a lot like a list (and it is).

Let's start by creating a simple list of `int` values.

In [1]:
data_list = [0, 1, 2, 3, 4, 5]
print(data_list)
print(type(data_list))

[0, 1, 2, 3, 4, 5]
<class 'list'>


We can create something very similar in [NumPy](https://numpy.org/doc/stable) called an [array](https://numpy.org/doc/stable/user/basics.creation.html).

<div class="admonition note">
    <p class="admonition-title">Note</p>
    <p style="padding-top: 1em">
        Note that other programming languages use the term "array" or "vector" for the equivalent data structure of a Python list.
        Whenever we are talking about Python, we will often say "array" to mean specifically a NumPy array.
        If you are ever confused, always ask to clarify if we mean a NumPy array.
    </p>
</div>

Okay, back to creating our NumPy array.

In [2]:
import numpy as np

data_array = np.array([0, 1, 2, 3, 4, 5])
print(data_array)
print(type(data_array))

[0 1 2 3 4 5]
<class 'numpy.ndarray'>


`data_array` contains the same information (`int`s from 0 to 5) with some major usability differences.
First, we notice that there are no commas in between the elements when we print the array; this is mainly just for aesthetic purposes and you *could* tell NumPy to print them.
We also check that, indeed, the data type is not a `list`, but a `numpy.ndarray`.
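
If you do want commas, one option is `np.array2string` with its `separator` argument.
The sketch below also prints the array's `dtype` (the exact integer type depends on your platform) and converts the array back to a plain list with `tolist()`.

```python
import numpy as np

data_array = np.array([0, 1, 2, 3, 4, 5])

# Build a string representation with commas between the elements.
print(np.array2string(data_array, separator=", "))

# Every element of an array shares one data type, stored in `dtype`.
print(data_array.dtype)

# Convert back to a plain Python list when one is needed.
print(data_array.tolist())
```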

Let's do some numerical operations with our two data structures and see the differences.
First, let's just add `2` to each element.

For a list we need to create a new list and then [loop](../../python-basics/loops) over each element, add `2`, and then `append` it to `list_added`.

In [3]:
list_added = []
for num in data_list:
    list_added.append(num + 2)
print(f"List:  {list_added}")

List:  [2, 3, 4, 5, 6, 7]


For a NumPy array, we do this.

In [4]:
array_added = data_array + 2
print(f"Array: {array_added}")

Array: [2 3 4 5 6 7]


Wow, that was easy.
And indeed, NumPy is designed to be the de facto [library](../../python-basics/libraries) for numerical operations.
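
The same element-wise behavior applies to other operators and to operations between two arrays of the same length.
In the sketch below, `other_array` is a hypothetical second array made up for illustration.
Note that `+` and `*` mean something quite different for lists, which is an easy source of bugs.

```python
import numpy as np

data_array = np.array([0, 1, 2, 3, 4, 5])
other_array = np.array([10, 20, 30, 40, 50, 60])  # hypothetical second array

print(data_array * 2)  # multiply every element by 2
print(data_array + other_array)  # add the two arrays element by element
print(data_array**2)  # square every element

# Caution: `+` and `*` behave differently for lists.
data_list = [0, 1, 2, 3, 4, 5]
print(data_list + [2])  # concatenation: a new list with 2 at the end
print(data_list * 2)  # repetition: the list repeated twice
```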

What about a sum?

In [5]:
list_sum = sum(data_list)
print(f"List:  {list_sum}")

List:  15


Okay, not too bad.

In [6]:
array_sum = np.sum(data_array)
print(f"Array: {array_sum}")

Array: 15


There was not much of a difference there.

What about computing the mean?

In [7]:
list_mean = sum(data_list) / len(data_list)
print(f"List:  {list_mean}")

array_mean = np.mean(data_array)
print(f"Array: {array_mean}")

List:  2.5
Array: 2.5


Alright, that does not seem like too big of a difference.
What gives?

## Multi-dimensional arrays

The real selling point for NumPy is its support for multi-dimensional arrays, which are important in computational biology.

For our example, suppose we have a data set of different patient vital signs like `pulse`, `temperature`, and `spo2`.
We could potentially have thousands, but let's stick with only three patients.

While we have not covered it yet, you can actually store a list inside of a list!

In [8]:
patient_data_list = [[57, 99.0, 0.98], [68, 101.2, 0.92], [60, 98.3, 1.00]]
print("List")
print(patient_data_list)

List
[[57, 99.0, 0.98], [68, 101.2, 0.92], [60, 98.3, 1.0]]


In [9]:
patient_data_array = np.array([[57, 99.0, 0.98], [68, 101.2, 0.92], [60, 98.3, 1.00]])
print("Array")
print(patient_data_array)

Array
[[ 57.    99.     0.98]
 [ 68.   101.2    0.92]
 [ 60.    98.3    1.  ]]


We can also use `patient_data_list` to create the array!

In [10]:
patient_data_array = np.array(patient_data_list)
print("Array")
print(patient_data_array)

Array
[[ 57.    99.     0.98]
 [ 68.   101.2    0.92]
 [ 60.    98.3    1.  ]]
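
Unlike a nested list, the array knows its own dimensions.
This short sketch inspects the `ndim`, `shape`, and `dtype` attributes, and shows that the integer pulse values were converted to floats so that every element shares one data type.

```python
import numpy as np

patient_data_list = [[57, 99.0, 0.98], [68, 101.2, 0.92], [60, 98.3, 1.00]]
patient_data_array = np.array(patient_data_list)

print(patient_data_array.ndim)  # number of dimensions
print(patient_data_array.shape)  # (rows, columns) = (patients, vital signs)
print(patient_data_array.dtype)  # one data type shared by every element

# The integer pulse values were converted to floats to match the other columns.
print(patient_data_list[0][0], type(patient_data_list[0][0]))
print(patient_data_array[0, 0], type(patient_data_array[0, 0]))
```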


Okay, now let's compute the mean of each data category (i.e., column).

In [11]:
# First, we create a list to store our means.
n_patients = len(patient_data_list)  # total number of rows/patients we have
print(n_patients)
n_categories = len(patient_data_list[0])  # number of columns (pulse, temperature, spo2)
patient_data_mean_list = [0.0] * n_categories  # Creates list with a zero for each column mean
print(patient_data_mean_list)

3
[0.0, 0.0, 0.0]


In [12]:
for patient_data in patient_data_list:
    for i in range(len(patient_data)):
        patient_data_mean_list[i] = patient_data_mean_list[i] + patient_data[i]

for i in range(len(patient_data_mean_list)):
    patient_data_mean_list[i] = patient_data_mean_list[i] / n_patients

print(patient_data_mean_list)

[61.666666666666664, 99.5, 0.9666666666666667]


In [13]:
patient_data_mean_array = np.mean(patient_data_array, axis=0)
print(f"Array: {patient_data_mean_array}")

Array: [61.66666667 99.5         0.96666667]
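
The `axis` argument controls which dimension is collapsed.
As a small sketch, here is the same array averaged along each axis and with no axis at all; the per-patient means mix different units and are only there to illustrate what `axis=1` does.

```python
import numpy as np

patient_data_array = np.array(
    [[57, 99.0, 0.98], [68, 101.2, 0.92], [60, 98.3, 1.00]]
)

# axis=0 collapses the rows: one mean per column (vital sign).
column_means = np.mean(patient_data_array, axis=0)
print(column_means)

# axis=1 collapses the columns: one mean per row (patient).
# Mixing units like this is not meaningful; it only illustrates the axis.
row_means = np.mean(patient_data_array, axis=1)
print(row_means)

# With no axis, every element is averaged into a single number.
overall_mean = np.mean(patient_data_array)
print(overall_mean)
```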


## Creating arrays

[Documentation](https://numpy.org/doc/stable/user/basics.creation.html#arrays-creation)

### `np.linspace`

Creates an array with evenly spaced values over a specified range.

[Documentation](https://numpy.org/doc/stable/reference/generated/numpy.linspace.html)

In [14]:
linspace_array = np.linspace(0, 1, 5)
print(linspace_array)

[0.   0.25 0.5  0.75 1.  ]
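
The third argument is the number of points, not the spacing.
A brief sketch of two optional arguments, `endpoint` and `retstep`, shows how the spacing is determined:

```python
import numpy as np

# Five points from 0 to 1, with the end point included (the default).
print(np.linspace(0, 1, 5))

# Exclude the end point: the spacing changes from 0.25 to 0.2.
print(np.linspace(0, 1, 5, endpoint=False))

# retstep=True also returns the spacing between neighboring values.
values, step = np.linspace(0, 1, 5, retstep=True)
print(step)
```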


### `np.arange`

Creates an array with regularly spaced values within a given interval.

[Documentation](https://numpy.org/doc/stable/reference/generated/numpy.arange.html)

In [15]:
arange_array = np.arange(10)
print(arange_array)

[0 1 2 3 4 5 6 7 8 9]
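
Like Python's built-in `range`, `np.arange` also accepts `start`, `stop`, and `step`, and the `stop` value is excluded.
The values below are arbitrary examples.
For non-integer steps, the NumPy documentation suggests that `np.linspace` is often the better choice.

```python
import numpy as np

print(np.arange(2, 10))  # start at 2, stop before 10
print(np.arange(2, 10, 2))  # same range, taking every second value
print(np.arange(10, 0, -3))  # a negative step counts down
```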


### `np.zeros`

Creates an array filled with zeros.

[Documentation](https://numpy.org/doc/stable/reference/generated/numpy.zeros.html)

In [16]:
zeros_array = np.zeros(5)
print(zeros_array)

[0. 0. 0. 0. 0.]
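
The printed `0.` values show that the default data type is a float.
As a short sketch, passing a tuple gives a multi-dimensional shape, and the `dtype` argument changes the element type:

```python
import numpy as np

# A tuple gives the shape: 2 rows and 3 columns.
zeros_2d = np.zeros((2, 3))
print(zeros_2d)

# The default data type is float; dtype=int stores integers instead.
zeros_int = np.zeros(4, dtype=int)
print(zeros_int)
```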


### `np.full`

Creates an array of a given shape and type, filled with `fill_value`.

[Documentation](https://numpy.org/doc/stable/reference/generated/numpy.full.html)

In [17]:
full_array = np.full((2, 2), np.nan)
print(full_array)

[[nan nan]
 [nan nan]]
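
One possible use for an array full of `nan` is as a placeholder for data that has not been collected yet.
The sketch below assumes a hypothetical table named `vitals` for three patients and fills in one row using the first patient's values from the earlier example.

```python
import numpy as np

# A placeholder table for 3 patients and 3 vital signs, not yet measured.
vitals = np.full((3, 3), np.nan)  # `vitals` is a hypothetical name

# Fill in the first patient's row.
vitals[0] = [57, 99.0, 0.98]
print(vitals)

# np.isnan marks the entries that still need data.
print(np.isnan(vitals))
```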


## Indexing and slicing

In Python, the colon `:` is used for slicing in sequences, such as strings, lists, and NumPy arrays.
The syntax for slicing is generally `start:stop:step`.
Here's an explanation of each part:

-   `start`: The index at which the slice begins (inclusive). If omitted, it defaults to the beginning of the sequence.
-   `stop`: The index at which the slice ends (exclusive). If omitted, it defaults to the end of the sequence.
-   `step`: The step size, i.e., the difference in index between consecutive selected elements. If omitted, it defaults to 1.

[Documentation](https://numpy.org/doc/stable/user/basics.indexing.html)
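
Before moving to two dimensions, here is a small sketch of `start:stop:step` on a one-dimensional array; the array `values` is made up for illustration.

```python
import numpy as np

values = np.arange(10)  # [0 1 2 3 4 5 6 7 8 9]

print(values[2:5])  # start=2, stop=5 (exclusive), step=1
print(values[:3])  # start omitted: begin at the first element
print(values[7:])  # stop omitted: continue to the last element
print(values[1:8:3])  # every third element from index 1 up to index 7
print(values[::-1])  # a negative step walks backwards

# The same syntax works on lists and strings.
print(list(range(10))[2:5])
print("NumPy"[:3])
```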

In [18]:
array = np.array(
    [
        [0, 1, 2, 3, 4, 5],
        [10, 11, 12, 13, 14, 15],
        [20, 21, 22, 23, 24, 25],
        [30, 31, 32, 33, 34, 35],
        [40, 41, 42, 43, 44, 45],
        [50, 51, 52, 53, 54, 55],
    ]
)
print(array)

[[ 0  1  2  3  4  5]
 [10 11 12 13 14 15]
 [20 21 22 23 24 25]
 [30 31 32 33 34 35]
 [40 41 42 43 44 45]
 [50 51 52 53 54 55]]


In [19]:
array[0]

array([0, 1, 2, 3, 4, 5])

In [20]:
array[0, 3:5]

array([3, 4])

In [21]:
array[4:, 4:]

array([[44, 45],
       [54, 55]])

In [22]:
array[:, 2]

array([ 2, 12, 22, 32, 42, 52])

In [23]:
array[2::2, ::2]

array([[20, 22, 24],
       [40, 42, 44]])
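
In a two-dimensional index, the part before the comma selects rows and the part after it selects columns.
As a sketch, the last example can be split into two steps, and the final two lines show that an integer index drops a dimension while a slice keeps it.

```python
import numpy as np

# Rebuild the 6x6 array from above: the entry in row r, column c is 10*r + c.
array = np.array([[10 * r + c for c in range(6)] for r in range(6)])

# array[2::2, ::2] in two steps.
rows = array[2::2]  # rows 2 and 4 (start at 2, step by 2)
result = rows[:, ::2]  # from those rows, columns 0, 2, and 4
print(result)

# A single integer drops a dimension; a slice keeps it.
print(array[:, 2].shape)  # 1-D: the column as a flat array
print(array[:, 2:3].shape)  # 2-D: a table with a single column
```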

## Acknowledgements

Some material here was adapted with permission from the following sources:

-   [Scientific Python Lectures](https://github.com/scipy-lectures/scientific-python-lectures)