Lecture: AI I - Basics 

Previous:
[**Chapter 2.5: Additionals**](../02_python/05_additionals.ipynb)

---

# Chapter 3.1: Numpy

- [Creating NumPy Arrays](#creating-numpy-arrays)
- [Data Types](#data-types)
- [Working with Arrays](#working-with-arrays)
- [Applying Functions to Arrays](#applying-functions-to-arrays)
- [Numpy Arrays as Sequences](#numpy-arrays-as-sequences)
- [Broadcasting](#broadcasting)
- [Aggregation Functions](#aggregation-functions)
- [Reading and Saving Data](#reading-and-saving-data)
- [Advanced Indexing](#advanced-indexing)
- [Expanding, Reducing, Combining Arrays](#expanding-reducing-combining-arrays)
- [Numpy Print Options](#numpy-print-options)


## The NumPy Module

Python lists are very flexible since they can hold values of different data types and can easily be modified (e.g., with `append`).  
However, this flexibility comes at the cost of performance, making lists less suitable for numerical computations.

The **NumPy** [module](https://numpy.org/doc/stable/user/index.html) therefore defines the n-dimensional **array** data type `numpy.ndarray`, which relies on highly optimized C and Fortran code for efficient numerical calculations.

Arrays can only store values of a single numerical data type (e.g., floating-point values) and are much more rigid than lists.  
Nevertheless, this is exactly what we need for many scientific applications, such as working with datasets!

By convention, we import the NumPy module under the abbreviation `np`:


In [1]:
import numpy as np
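
The difference between a flexible list and a single-type array can be sketched as follows. The variable names are only illustrative, and the example assumes nothing beyond a standard NumPy installation:

```python
import numpy as np

# A list may mix data types and can grow in place.
mixed_list = [1, 2.5, "three"]
mixed_list.append(4)

# An array has exactly one data type for all of its elements.
int_array = np.array([1, 2, 3])
float_array = np.array([1, 2.5, 3])   # the integers are upcast to float

print(float_array.dtype)  # float64

# Assigning a float to an integer array does not change the array's dtype;
# the value is converted to an integer (the fractional part is dropped).
int_array[0] = 7.9
print(int_array[0])       # 7
```

The array never changes its data type after creation, which is the rigidity that makes fast numerical code possible.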

### Introductory Example

Built-in Python containers such as `list` provide a flexible way to store and manage data.  
As mentioned earlier, collections usually store only references to objects. While this is very convenient when writing code, it comes with memory and performance costs.  

Let’s look at an example. Suppose we conducted an experiment with one million measurements and now want to calculate their average.  
We could do this as follows:

In [2]:
import random 
measurements = [random.randint(150, 200) for _ in range(1_000_000)]
print(measurements[:10])

[187, 168, 165, 163, 163, 154, 161, 176, 200, 183]


In [3]:
def mean(values):
    accumulator = 0
    for value in values:
        accumulator += value
    mean_value = accumulator / len(values)
    return mean_value

%timeit mean(measurements)

17 ms ± 486 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)


That’s quite slow because, in each loop iteration, Python has to bind `value` to the next element and then check at run time whether the `+` operation is supported between the `accumulator` and the current `value`.  
This check prevents attempts to add objects that cannot be added, but in our case we know that we are dealing only with integers.  
If we could tell the interpreter that we are only adding integers, it could skip all that type checking and speed things up.  
This is exactly the use case `numpy` was created for.


We can already achieve faster computations by relying on Python’s built-in functions wherever possible, such as `sum`, whose loop runs in compiled C code instead of interpreted Python.

In [4]:
%timeit sum(measurements) / len(measurements)

3.55 ms ± 21.8 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)


The standard data type in NumPy is the `ndarray` (short for n-dimensional array). In the simplest case, a NumPy array can be created from a Python list.

In [5]:
measurements_array = np.array(measurements)
measurements_array

array([187, 168, 165, ..., 170, 164, 158])

In [6]:
type(measurements_array)

numpy.ndarray

NumPy arrays behave very similarly to lists but have a fixed underlying data type. NumPy automatically detects that all our values are integers and chooses a matching data type, here a 64-bit integer (the default integer type on most 64-bit platforms). For more details, see the documentation: https://docs.scipy.org/doc/numpy-1.13.0/user/basics.types.html

In [7]:
measurements_array.dtype

dtype('int64')
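
The fixed data type also determines how much memory the array needs. The following is a minimal sketch with a small illustrative list (`values` is not part of the lecture code) and an explicitly requested `np.int64` so that the result does not depend on the platform:

```python
import sys

import numpy as np

values = list(range(1000))
array = np.array(values, dtype=np.int64)

# Every element occupies the same number of bytes in one contiguous buffer.
print(array.itemsize)  # 8
print(array.nbytes)    # 8000

# The list itself only stores references; the int objects live elsewhere
# and are not included in this number.
print(sys.getsizeof(values))
```

The attributes `itemsize` and `nbytes` report the bytes per element and the bytes for all elements, so `nbytes` equals `size * itemsize`.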

In addition, NumPy provides a wide range of routines for performing mathematical operations on arrays. Let’s see whether using NumPy actually gives us a performance advantage.

In [8]:
%timeit np.mean(measurements_array)

370 μs ± 4.96 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)


A clear speedup compared to pure Python implementations! Now that we’ve seen the usefulness of NumPy, let’s take a closer look at the NumPy array.
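
Outside of Jupyter, where the `%timeit` magic is not available, the comparison can be reproduced roughly with the standard `timeit` module. This is only a sketch: the smaller sample size and the number of runs are arbitrary choices, and the measured times depend on the machine.

```python
import random
import timeit

import numpy as np

# Smaller sample than in the text to keep the run short (arbitrary choice).
measurements = [random.randint(150, 200) for _ in range(100_000)]
measurements_array = np.array(measurements)


def mean(values):
    """Pure Python mean using an explicit loop."""
    accumulator = 0
    for value in values:
        accumulator += value
    return accumulator / len(values)


runs = 20  # timeit.timeit returns the total time in seconds for all runs

loop_time = timeit.timeit(lambda: mean(measurements), number=runs) / runs
builtin_time = timeit.timeit(
    lambda: sum(measurements) / len(measurements), number=runs
) / runs
numpy_time = timeit.timeit(lambda: np.mean(measurements_array), number=runs) / runs

print(f"loop:    {loop_time * 1e3:.3f} ms per call")
print(f"builtin: {builtin_time * 1e3:.3f} ms per call")
print(f"numpy:   {numpy_time * 1e3:.3f} ms per call")
```

All three variants compute the same value; only the time needed per call differs.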

## Creating NumPy Arrays

The simplest way to create NumPy arrays is from Python lists, using the `numpy.array` function:


In [9]:
a = np.array([1, 2, 3])   # Create a rank 1 array
print(type(a))            # Prints "<class 'numpy.ndarray'>"

<class 'numpy.ndarray'>


In [10]:
a = np.array([ 1, 2, 3, 5, 8, 13])
a

array([ 1,  2,  3,  5,  8, 13])

In [11]:
b = np.array([[  1.5, 2.2, 3.1 ], [ 4.0, 5.2, 6.7 ]])
b

array([[1.5, 2.2, 3.1],
       [4. , 5.2, 6.7]])

NumPy arrays have several **attributes** that provide useful information about the array.

The number of dimensions of the array:

In [12]:
a.ndim, b.ndim

(1, 2)

The length of the array in each dimension:

In [13]:
a.shape, b.shape

((6,), (2, 3))

The data type of the array:

In [14]:
a.dtype, b.dtype

(dtype('int64'), dtype('float64'))
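
These attributes are related to each other, which the following short sketch illustrates for the array `b` from above (recreated here so that the example is self-contained). The attribute `size` is not used in the text above but is a standard `ndarray` attribute:

```python
import numpy as np

b = np.array([[1.5, 2.2, 3.1], [4.0, 5.2, 6.7]])

print(b.ndim)    # 2, the number of dimensions
print(b.shape)   # (2, 3), one entry per dimension
print(b.size)    # 6, the total number of elements
print(len(b))    # 2, the length of the first dimension only

# ndim is the length of shape, and size is the product of its entries.
print(len(b.shape) == b.ndim)       # True
print(b.shape[0] * b.shape[1])      # 6
```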

> **Reminder:** Use `<TAB>` autocompletion and the `?` documentation in Jupyter Notebook if you’re unsure which functions exist or what they do!

In [15]:
values = [[0, 1, 2, 3, 4]] * 3
two_dim_arr = np.array(values)
two_dim_arr

array([[0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4]])

In [16]:
two_dim_arr.shape

(3, 5)

In [17]:
two_dim_arr.ndim

2
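
One detail of the nested list used above: the `*` operator repeats references to the same inner list, whereas `np.array` copies the values into its own memory. A small sketch of the difference (the names `rows` and `arr` are only illustrative):

```python
import numpy as np

rows = [[0, 1, 2, 3, 4]] * 3   # three references to one and the same inner list
arr = np.array(rows)           # the values are copied into the array

rows[0][0] = 99                # modifies the single shared inner list
arr[0, 0] = -1                 # modifies exactly one array element

print(rows)                 # [[99, 1, 2, 3, 4], [99, 1, 2, 3, 4], [99, 1, 2, 3, 4]]
print(arr[:, 0].tolist())   # [-1, 0, 0]
```

The array is independent of the list it was created from, so later changes to the list do not affect the array.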

In [18]:
values = [[[0, 1, 2, 3, 4]] * 3] * 6
three_dim_arr = np.array(values)
three_dim_arr

array([[[0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4]],

       [[0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4]],

       [[0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4]],

       [[0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4]],

       [[0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4]],

       [[0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4]]])

In [19]:
three_dim_arr.shape

(6, 3, 5)

In [20]:
three_dim_arr.ndim

3

---

Exercise: [**Exercise 3.1: Numpy**](../03_data/exercises/01_numpy.ipynb)

Next: [**Chapter 3.2: Pandas**](../03_data/02_pandas.ipynb)