Lecture: AI I - Basics 

Previous:
[**Chapter 2.5: Additionals**](../02_python/05_additionals.ipynb)

---

# Chapter 3.1: Numpy

- [Create Numpy Arrays](#creating-numpy-arrays)
- [Data Types](#data-types)
- [Working with Arrays](#working-with-arrays)
- [Applying Functions to Arrays](#applying-functions-to-arrays)
- [Numpy Arrays as Sequences](#numpy-arrays-as-sequences)
- [Broadcasting](#broadcasting)
- [Aggregation Functions](#aggregation-functions)
- [Reading and Saving Data](#reading-and-saving-data)
- [Advanced Indexing](#advanced-indexing)
- [Expanding, Reducing, Combining Arrays](#expanding-reducing-combining-arrays)
- [Numpy Print Options](#numpy-print-options)


## The NumPy Module

Python lists are very flexible since they can hold values of different data types and can easily be modified (e.g., with `append`).  
However, this flexibility comes at the cost of performance, making lists less suitable for numerical computations.

The **NumPy** [module](https://numpy.org/doc/stable/user/index.html) therefore defines the n-dimensional **array** data type `numpy.ndarray`, which relies on highly optimized C and Fortran code for efficient numerical calculations.

Arrays can only store values of a single numerical data type (e.g., floating-point values) and are much more rigid than lists.  
Nevertheless, this is exactly what we need for many scientific applications, such as working with datasets!

By convention, we import the NumPy module under the abbreviation `np`:


In [1]:
import numpy as np

### Introductory Example

Built-in Python containers such as `list` provide a flexible way to store and manage data.  
As mentioned earlier, collections usually store only references to objects. While this is very convenient when writing code, it comes at a cost in both memory and performance.  

Let’s look at an example. Suppose we conducted an experiment with one million measurements and now want to calculate their average.  
We could do this as follows:

In [2]:
import random 
measurements = [random.randint(150, 200) for _ in range(1_000_000)]
print(measurements[:10])

[187, 168, 165, 163, 163, 154, 161, 176, 200, 183]


A plain Python loop, such as the `mean` function in the next cell, is quite slow because, in each iteration, Python has to bind a new variable and then check whether the `+` operation is supported between the `accumulator` and the current `value`.  
This check prevents attempts to add objects that aren’t addable, but in our case we’re confident we’re dealing only with integers.  
If we could tell the interpreter that we’re only adding integers, it could skip all that type checking and speed things up.  
This is exactly the use case `numpy` was created for.


In [3]:
def mean(values):
    accumulator = 0
    for value in values:
        accumulator += value
    mean_value = accumulator / len(values)
    return mean_value

%timeit mean(measurements)

17 ms ± 486 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)


We can already achieve faster computations by making use of as many of Python’s built-in functions as possible, such as `sum`.

In [4]:
%timeit sum(measurements) / len(measurements)

3.55 ms ± 21.8 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)


The standard data type in NumPy is the `ndarray` (short for n-dimensional array). In the simplest case, a NumPy array can be created from a Python list.

In [5]:
measurements_array = np.array(measurements)
measurements_array

array([187, 168, 165, ..., 170, 164, 158])

In [6]:
type(measurements_array)

numpy.ndarray

NumPy arrays behave very similarly to lists but have a fixed underlying data type. NumPy automatically detects that all our values are integers and chooses an appropriate data type, here a 64-bit integer (the default integer type on most platforms). For more details, see the documentation: https://docs.scipy.org/doc/numpy-1.13.0/user/basics.types.html

In [7]:
measurements_array.dtype

dtype('int64')

In addition, NumPy provides a wide range of routines for performing mathematical operations on arrays. Let’s see whether using NumPy actually gives us a performance advantage.

In [8]:
%timeit np.mean(measurements_array)

370 μs ± 4.96 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)


A clear speedup compared to pure Python implementations! Now that we’ve seen the usefulness of NumPy, let’s take a closer look at the NumPy array.

## Creating NumPy Arrays

The simplest way to create NumPy arrays is from Python lists, using the `numpy.array` function:


In [9]:
a = np.array([1, 2, 3])   # Create a rank 1 array
print(type(a))            # Prints "<class 'numpy.ndarray'>"

<class 'numpy.ndarray'>


In [10]:
a = np.array([ 1, 2, 3, 5, 8, 13])
a

array([ 1,  2,  3,  5,  8, 13])

In [11]:
b = np.array([[  1.5, 2.2, 3.1 ], [ 4.0, 5.2, 6.7 ]])
b

array([[1.5, 2.2, 3.1],
       [4. , 5.2, 6.7]])

NumPy arrays have several **attributes** that provide useful information about the array.

The number of dimensions of the array:

In [12]:
a.ndim, b.ndim

(1, 2)

The length of the array in each dimension:

In [13]:
a.shape, b.shape

((6,), (2, 3))

The data type of the array:

In [14]:
a.dtype, b.dtype

(dtype('int64'), dtype('float64'))

> **Reminder:** Use `<TAB>` autocompletion and the `?` documentation in Jupyter Notebook if you’re unsure which functions exist or what they do!

In [15]:
values = [[0, 1, 2, 3, 4]] * 3
two_dim_arr = np.array(values)
two_dim_arr

array([[0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4]])

In [16]:
two_dim_arr.shape

(3, 5)

In [17]:
two_dim_arr.ndim

2

In [18]:
values = [[[0, 1, 2, 3, 4]] * 3] * 6
three_dim_arr = np.array(values)
three_dim_arr

array([[[0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4]],

       [[0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4]],

       [[0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4]],

       [[0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4]],

       [[0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4]],

       [[0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4]]])

In [19]:
three_dim_arr.shape

(6, 3, 5)

In [20]:
three_dim_arr.ndim

3

#### There are many ways to create arrays

- The `numpy.arange` function works similarly to Python’s `range` function, but it can also accept floating-point arguments:


In [21]:
np.arange(10)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [22]:
np.arange(1.5, 2, 0.1)

array([1.5, 1.6, 1.7, 1.8, 1.9])

- Also very useful are `numpy.linspace` and `numpy.logspace`, which generate a sequence of values spaced linearly or logarithmically between two numbers:


In [23]:
np.linspace(10, 20, 4)

array([10.        , 13.33333333, 16.66666667, 20.        ])

In [24]:
np.logspace(1, 3, 4)

array([  10.        ,   46.41588834,  215.443469  , 1000.        ])
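The relationship between the two functions can be sketched as follows, assuming the default `base=10` of `numpy.logspace`: the logarithmically spaced values are powers of ten of linearly spaced exponents.

```python
import numpy as np

exponents = np.linspace(1, 3, 4)     # 4 evenly spaced exponents between 1 and 3
log_values = np.logspace(1, 3, 4)    # 4 values between 10**1 and 10**3

# Both constructions yield the same values (up to floating-point rounding)
print(np.allclose(log_values, 10 ** exponents))
```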

- We can create arrays filled with zeros or ones using `numpy.zeros` and `numpy.ones`. By passing a tuple to the `shape` argument instead of a single integer, we can also generate multidimensional arrays:

In [25]:
np.zeros(5)

array([0., 0., 0., 0., 0.])

In [26]:
np.ones((5, 2, 3))

array([[[1., 1., 1.],
        [1., 1., 1.]],

       [[1., 1., 1.],
        [1., 1., 1.]],

       [[1., 1., 1.],
        [1., 1., 1.]],

       [[1., 1., 1.],
        [1., 1., 1.]],

       [[1., 1., 1.],
        [1., 1., 1.]]])

In [27]:
c = np.full((2, 2, 2, 2), 4)  # Create a constant array of shape (2, 2, 2, 2) filled with 4
print(c)

[[[[4 4]
   [4 4]]

  [[4 4]
   [4 4]]]


 [[[4 4]
   [4 4]]

  [[4 4]
   [4 4]]]]


In [28]:
# Contains whatever was left in memory. Initializing arrays with zeros is usually safer.
np.empty(shape=(2, 3, 2))

array([[[ 4.68293136e-310,  0.00000000e+000],
        [ 5.83665407e-315,  0.00000000e+000],
        [ 1.30077963e-258,  2.10966031e-321]],

       [[-1.11190339e-262,  5.28614192e-308],
        [ 1.40406108e-309,  3.73305803e-301],
        [ 1.34164568e-301,  6.72812621e-310]]])

In [29]:
# empty is faster than initialized arrays but usually doesn't make a difference
%timeit np.empty(shape=100_000)
%timeit np.ones(shape=100_000)

141 ns ± 1.34 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
12.5 μs ± 33.8 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)


In [30]:
d = np.eye(3)         # Create a 3x3 identity matrix
print(d)

[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]


In [31]:
e = np.random.random((2, 2, 2, 2))  # Create an array of shape (2, 2, 2, 2) filled with random values in [0, 1)
print(e)

[[[[0.66813678 0.51223623]
   [0.99186461 0.10947994]]

  [[0.73995002 0.53591944]
   [0.03925507 0.30572672]]]


 [[[0.29953368 0.05070483]
   [0.77964292 0.1909645 ]]

  [[0.48108779 0.87617989]
   [0.56279217 0.5683731 ]]]]


You can read about other methods of creating arrays in the [documentation](http://docs.scipy.org/doc/numpy/user/basics.creation.html#arrays-creation).


## Data Types

Every NumPy array is a grid of elements of the same type.  
NumPy provides a wide range of numerical data types that you can use to construct arrays.  
When you create an array, NumPy tries to infer the data type automatically, but functions that construct arrays usually also include an optional argument to explicitly specify the data type.  

Here’s an example:


In [32]:
x = np.array([1, 2])   # Let numpy choose the datatype
print(x.dtype)         # Prints "int64"

int64


In [33]:
x = np.array([1.0, 2.0])   # Let numpy choose the datatype
print(x.dtype)             # Prints "float64"

float64


In [34]:
x = np.array([1, 2], dtype=np.int64)   # Force a particular datatype
print(x.dtype)

int64


### dtype

`dtype` provides information about the data type.  
Arrays can contain bools, ints, unsigned ints, floats, or complex numbers of various byte sizes.  
They can also store strings or Python objects, but this has very few practical use cases.


In [35]:
values = [0, 1, 2, 3, 4]
int_arr = np.array(values, dtype='int')
int_arr, int_arr.dtype

(array([0, 1, 2, 3, 4]), dtype('int64'))

If the specified `dtype` does not match the given values, NumPy will cast everything to that data type.


In [36]:
bool_arr = np.array(values, dtype='bool')
bool_arr, bool_arr.dtype

(array([False,  True,  True,  True,  True]), dtype('bool'))

If no explicit data type is specified, NumPy chooses a common data type that can represent all given values.  
In the following example, everything is converted to a float, since integers can be represented as floats, but not the other way around.


In [37]:
values = [0, 1, 2.5, 3, 4]
float_arr = np.array(values)
float_arr, float_arr.dtype

(array([0. , 1. , 2.5, 3. , 4. ]), dtype('float64'))

Once the data type has been defined, every value assigned to the array is cast to that type. In the following example, `2.5` is silently truncated to `2`:


In [38]:
int_arr[1] = 2.5
int_arr, int_arr.dtype

(array([0, 2, 2, 3, 4]), dtype('int64'))
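If truncation is not what you want, one option is to round explicitly before converting. The following is a minimal sketch; the array `float_values` is made up for illustration.

```python
import numpy as np

float_values = np.array([0.4, 1.5, 2.7])

truncated = float_values.astype(np.int64)          # fractional part is dropped
rounded = np.rint(float_values).astype(np.int64)   # round to the nearest integer first

print(truncated)
print(rounded)
```

Note that `np.rint` rounds halfway cases to the nearest even integer, so `1.5` becomes `2`.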

These non-Python data types force us to once again consider issues such as overflow and similar limitations.


In [39]:
values = [0, 1, 2, 3, 4]
uint_arr = np.array(values, dtype='uint8')
uint_arr, uint_arr.dtype

(array([0, 1, 2, 3, 4], dtype=uint8), dtype('uint8'))

In [40]:
uint_arr[1] += 255
uint_arr


array([0, 0, 2, 3, 4], dtype=uint8)
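To check in advance whether a value fits into an integer type, `np.iinfo` reports the representable range. The sketch below uses a small made-up array and converts it to a wider type before adding, so that the result does not wrap around.

```python
import numpy as np

info = np.iinfo(np.uint8)
print(info.min, info.max)              # smallest and largest representable value

small = np.array([0, 1, 2], dtype=np.uint8)
wide = small.astype(np.uint16) + 255   # uint16 has enough room for the result
print(wide, wide.dtype)
```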

...and this can lead to certain issues when comparing them with standard Python types.


In [41]:
print(type(uint_arr[0]), type(183))

<class 'numpy.uint8'> <class 'int'>


In [42]:
val = 1.2 - 1.0
arr = np.array([val], dtype=np.float32)
print(f'{val} == {arr[0]} -> {val == arr[0]}')

0.19999999999999996 == 0.20000000298023224 -> True


For a more reliable comparison, you can use an epsilon value:


In [43]:
epsilon = 1e-6  # 1*10^(-6); 0.000001
abs(arr[0] - val) < epsilon

np.True_
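NumPy also ships a ready-made tolerance check, `np.isclose`. As a small sketch with the same values as above, it can replace the hand-written epsilon comparison:

```python
import numpy as np

val = 1.2 - 1.0
arr = np.array([val], dtype=np.float32)

# np.isclose checks |a - b| <= atol + rtol * |b| with small default tolerances
print(np.isclose(arr[0], val))
```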

For a deeper dive into why floating-point calculations can be inaccurate, see the [Python documentation](https://docs.python.org/3/tutorial/floatingpoint.html).

You can read all about NumPy data types in the [documentation](https://numpy.org/doc/stable/reference/arrays.dtypes.html).  


## Working with Arrays

Arrays can be combined **element-wise** using the standard operators `+`, `-`, `*`, `/`, and `**`:


In [44]:
x = np.array([[1,2],[3,4]], dtype=np.float64)
y = np.array([[5,6],[7,8]], dtype=np.float64)
print(x)
print(y)

[[1. 2.]
 [3. 4.]]
[[5. 6.]
 [7. 8.]]


In [45]:
# Elementwise sum; both produce the array
# [[ 6.0  8.0]
#  [10.0 12.0]]
print(x + y)
print(np.add(x, y))

[[ 6.  8.]
 [10. 12.]]
[[ 6.  8.]
 [10. 12.]]


In [46]:
# Elementwise difference; both produce the array
# [[-4.0 -4.0]
#  [-4.0 -4.0]]
print(x - y)
print(np.subtract(x, y))

[[-4. -4.]
 [-4. -4.]]
[[-4. -4.]
 [-4. -4.]]


In [47]:
# Elementwise product; both produce the array
# [[ 5.0 12.0]
#  [21.0 32.0]]
print(x * y)
print(np.multiply(x, y))

[[ 5. 12.]
 [21. 32.]]
[[ 5. 12.]
 [21. 32.]]


In [48]:
# Elementwise division; both produce the array
# [[ 0.2         0.33333333]
#  [ 0.42857143  0.5       ]]
print(x / y)
print(np.divide(x, y))

[[0.2        0.33333333]
 [0.42857143 0.5       ]]
[[0.2        0.33333333]
 [0.42857143 0.5       ]]


In [49]:
# Elementwise square root; produces the array
# [[ 1.          1.41421356]
#  [ 1.73205081  2.        ]]
print(np.sqrt(x))

[[1.         1.41421356]
 [1.73205081 2.        ]]


In [50]:
x = np.array([1,2,3])
y = np.array([4,5,6])
print(x)
print(y)

[1 2 3]
[4 5 6]


In [51]:
x + 2 * y

array([ 9, 12, 15])

In [52]:
x ** y

array([  1,  32, 729])

With `@` you can even perform matrix multiplication.  
In the case of 1D arrays, this corresponds to the inner product between two vectors.


In [53]:
x @ y

np.int64(32)

In [54]:
# That's the same as
np.sum(x * y)

np.int64(32)

In [55]:
x.dot(y)

np.int64(32)

> **Note:** For Python lists, these operators are defined completely differently!


Note that in contrast to MATLAB, `*` performs element-wise multiplication rather than matrix multiplication.  
Instead, we use the `dot` function (or the `@` operator shown above) to compute inner products of vectors, multiply a vector by a matrix, and multiply matrices.  

`dot` is available both as a function in the NumPy module and as an instance method of array objects:


In [56]:
x = np.array([[1,2],[3,4]])
y = np.array([[5,6],[7,8]])

v = np.array([9,10])
w = np.array([11, 12])

In [57]:
# Inner product of vectors; both produce 219
print(v.dot(w))
print(np.dot(v, w))

219
219


In [59]:
# Matrix / matrix product; both produce the rank 2 array
# [[19 22]
#  [43 50]]
print(x.dot(y))
print(np.dot(x, y))

[[19 22]
 [43 50]]
[[19 22]
 [43 50]]
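The text above also mentions multiplying a matrix by a vector. Here is a short sketch of that case, reusing the values of `x` and `v` from above:

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])
v = np.array([9, 10])

# Matrix / vector product; all three produce the rank 1 array [29 67]
print(x.dot(v))
print(np.dot(x, v))
print(x @ v)
```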


## Applying Functions to Arrays

While functions from the `math` module such as `sin` or `exp` can be applied to single numbers, the corresponding functions from the `numpy` module can be applied directly to arrays.  
**The function is applied to every element of the array.**


In [60]:
phi = np.linspace(0, 2 * np.pi, 10) # 10 values between 0 and 2π
np.sin(phi) # The sine of each of these values

array([ 0.00000000e+00,  6.42787610e-01,  9.84807753e-01,  8.66025404e-01,
        3.42020143e-01, -3.42020143e-01, -8.66025404e-01, -9.84807753e-01,
       -6.42787610e-01, -2.44929360e-16])

In [61]:
arr = np.arange(-9, 9)
arr

array([-9, -8, -7, -6, -5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5,  6,  7,
        8])

In [62]:
np.log(arr)


array([       nan,        nan,        nan,        nan,        nan,
              nan,        nan,        nan,        nan,       -inf,
       0.        , 0.69314718, 1.09861229, 1.38629436, 1.60943791,
       1.79175947, 1.94591015, 2.07944154])
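The logarithm is undefined for negative numbers (`nan`) and diverges at zero (`-inf`), and NumPy emits a `RuntimeWarning` in both cases. If such values are expected, one possible approach is to silence the warnings locally with `np.errstate` and check the result afterwards. This is a sketch, not the only way to handle it.

```python
import numpy as np

arr = np.arange(-9, 9)

# Temporarily silence the "divide by zero" and "invalid value" warnings
with np.errstate(divide="ignore", invalid="ignore"):
    log_values = np.log(arr)

# Count the entries that are neither nan nor infinite
print(np.isfinite(log_values).sum())
```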

In [63]:
np.exp(arr)

array([1.23409804e-04, 3.35462628e-04, 9.11881966e-04, 2.47875218e-03,
       6.73794700e-03, 1.83156389e-02, 4.97870684e-02, 1.35335283e-01,
       3.67879441e-01, 1.00000000e+00, 2.71828183e+00, 7.38905610e+00,
       2.00855369e+01, 5.45981500e+01, 1.48413159e+02, 4.03428793e+02,
       1.09663316e+03, 2.98095799e+03])

In [64]:
np.sin(arr)

array([-0.41211849, -0.98935825, -0.6569866 ,  0.2794155 ,  0.95892427,
        0.7568025 , -0.14112001, -0.90929743, -0.84147098,  0.        ,
        0.84147098,  0.90929743,  0.14112001, -0.7568025 , -0.95892427,
       -0.2794155 ,  0.6569866 ,  0.98935825])

`np.sign` returns -1 for negative values, +1 for positive values, and 0 for zero:

In [65]:
np.sign(arr)

array([-1, -1, -1, -1, -1, -1, -1, -1, -1,  0,  1,  1,  1,  1,  1,  1,  1,
        1])

NumPy provides many useful functions for performing computations on arrays; one of the most useful is `sum`:


In [66]:
x = np.array([[1,2],[3,4]])

print(np.sum(x))  # Compute sum of all elements; prints "10"
print(np.sum(x, axis=0))  # Compute sum of each column; prints "[4 6]"
print(np.sum(x, axis=1))  # Compute sum of each row; prints "[3 7]"

10
[4 6]
[3 7]


In addition, there are other functions that compute properties of an array:


In [67]:
x = np.linspace(0, 10, 100)
np.sum(x), np.mean(x), np.std(x)

(np.float64(500.0), np.float64(5.0), np.float64(2.9157646512850626))

These functions generalize to multiple dimensions by specifying the axis along which the computation should be performed:


In [68]:
x = np.array([[ 1, 2 ], [ 3, 4 ]])
np.sum(x), np.sum(x, axis=0), np.sum(x, axis=1)

(np.int64(10), array([4, 6]), array([3, 7]))

Apart from computing mathematical functions with arrays, we often need to reshape or otherwise manipulate data within arrays.  
The simplest example of this type of operation is transposing a matrix. To transpose a matrix, simply use the `T` attribute of an array object:


In [69]:
x = np.array([[1,2], [3,4]])
print(x)    # Prints "[[1 2]
            #          [3 4]]"
print(x.T)  # Prints "[[1 3]
            #          [2 4]]"

[[1 2]
 [3 4]]
[[1 3]
 [2 4]]


In [70]:
# Note that taking the transpose of a rank 1 array does nothing:
v = np.array([1, 2, 3])
print(v)    # Prints "[1 2 3]"
print(v.T)  # Prints "[1 2 3]"

[1 2 3]
[1 2 3]
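Because `.T` leaves a rank 1 array unchanged, a row or column vector has to be created explicitly as a rank 2 array. The sketch below shows two common ways to do this; the names `row` and `col` are chosen only for illustration.

```python
import numpy as np

v = np.array([1, 2, 3])

row = v.reshape(1, -1)     # shape (1, 3): a row vector
col = v[:, np.newaxis]     # shape (3, 1): a column vector

print(row.shape, row.T.shape)
print(col.shape, col.T.shape)
```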


Always try to use vectorized ufuncs instead of explicit loops!  

Using these operators/universal functions is generally faster than writing out the operations manually:


In [71]:
func1 = lambda: np.repeat(np.arange(1, 4), 30).reshape(3, -1).T.flatten()  # noqa: E731
func2 = lambda: np.arange(3 * 30) % 3 + 1  # noqa: E731
func3 = lambda: np.array([[1, 2, 3] for _ in range(30)]).flatten()  # noqa: E731

print(func1())
print(func2())
print(func3())

%timeit func1()
%timeit func2()
%timeit func3()

[1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1
 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2
 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3]
[1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1
 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2
 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3]
[1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1
 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2
 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3]


2.04 μs ± 25.5 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.74 μs ± 10.6 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
6.41 μs ± 72.9 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)


### Random Values

`np.random` includes a variety of functions for generating arrays filled with random values from different probability distributions.


In [72]:
np.random.random((3, 3))

array([[0.59890043, 0.36160733, 0.35258061],
       [0.4445614 , 0.6260483 , 0.50785895],
       [0.89112932, 0.77679426, 0.86881184]])

In [73]:
np.random.randint(0, 10, (5, 5))

array([[7, 0, 5, 3, 7],
       [5, 6, 3, 2, 1],
       [6, 0, 1, 8, 6],
       [9, 6, 6, 5, 9],
       [3, 5, 8, 9, 4]])

With `np.random.randint` and a boolean dtype, you can generate random boolean arrays!
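A minimal sketch of this idea, where the shape `(3, 4)` is an arbitrary choice:

```python
import numpy as np

# Draw integers from {0, 1} and store them directly as booleans
random_mask = np.random.randint(0, 2, size=(3, 4), dtype=bool)
print(random_mask)
```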


### Repeating Values

With `np.repeat`, elements of an array are repeated:


In [74]:
np.repeat(3, 5)

array([3, 3, 3, 3, 3])

In [75]:
np.repeat([[1,2], [3,4]], 2)

array([1, 1, 2, 2, 3, 3, 4, 4])

`np.tile` is another way to repeat values with NumPy.


In [76]:
print('Repeat:', np.repeat([1, 2, 3], 3))
print('Tile:', np.tile([1, 2, 3], 3))

Repeat: [1 1 1 2 2 2 3 3 3]
Tile: [1 2 3 1 2 3 1 2 3]


### Reshape

In [77]:
a = np.arange(start=2, stop=14)
print(a.shape)
a

(12,)


array([ 2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13])

In [78]:
b = a.reshape(3, 4)
b

array([[ 2,  3,  4,  5],
       [ 6,  7,  8,  9],
       [10, 11, 12, 13]])

Passing `-1` for one of the dimensions tells NumPy to infer its size from the total number of elements and the remaining dimensions:


In [79]:
a.reshape(-1, 2)

array([[ 2,  3],
       [ 4,  5],
       [ 6,  7],
       [ 8,  9],
       [10, 11],
       [12, 13]])
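The inference only works if the total number of elements is divisible by the product of the given dimensions. A short sketch with the same 12-element array:

```python
import numpy as np

a = np.arange(start=2, stop=14)    # 12 elements

print(a.reshape(2, -1).shape)      # 12 / 2 = 6 columns are inferred
print(a.reshape(-1, 3, 2).shape)   # 12 / (3 * 2) = 2 is inferred

try:
    a.reshape(5, -1)               # 12 is not divisible by 5
except ValueError as error:
    print("ValueError:", error)
```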

Example: We want to create a 2D array where each row is `[1, 2, 3]`, and the array should have 10 rows.


In [80]:
print(np.repeat(np.arange(1, 4), 10).reshape(-1, 10).T, "\n")
print(np.tile(np.arange(1, 4), 10).reshape(10, -1))

[[1 2 3]
 [1 2 3]
 [1 2 3]
 [1 2 3]
 [1 2 3]
 [1 2 3]
 [1 2 3]
 [1 2 3]
 [1 2 3]
 [1 2 3]] 

[[1 2 3]
 [1 2 3]
 [1 2 3]
 [1 2 3]
 [1 2 3]
 [1 2 3]
 [1 2 3]
 [1 2 3]
 [1 2 3]
 [1 2 3]]


### Comparing Arrays


In [81]:
epsilon = 0.000000000001
a = np.zeros((3, 3))
a[0, 0] += epsilon  # With nested lists, this would be written a[0][0]

b = np.zeros((3, 3))
print(a)
print(b)

[[1.e-12 0.e+00 0.e+00]
 [0.e+00 0.e+00 0.e+00]
 [0.e+00 0.e+00 0.e+00]]
[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]


In [82]:
a == b

array([[False,  True,  True],
       [ True,  True,  True],
       [ True,  True,  True]])

In [83]:
(a == b).all()

np.False_

In [84]:
c = np.array([])
d = np.array([1])
(c == d).all()

np.True_

Issues with this approach:
* If one array is empty and the other contains a single element, the result is `True`: the comparison broadcasts to an empty array, for which `all` returns `True`.
* If the two arrays do not have the same shape and are not broadcastable, this approach will raise an error.

Instead, use NumPy’s built-in functions!



In [85]:
np.array_equal(c, d)

False

In [86]:
np.allclose(a, b)

True

In [87]:
np.isclose(a, b)

array([[ True,  True,  True],
       [ True,  True,  True],
       [ True,  True,  True]])
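`np.isclose` and `np.allclose` treat two values as equal if `|a - b| <= atol + rtol * |b|`, with the defaults `rtol=1e-05` and `atol=1e-08`. That is why the difference of `1e-12` above counts as equal. The sketch below tightens the tolerances to show the effect.

```python
import numpy as np

a = np.zeros((3, 3))
a[0, 0] += 1e-12
b = np.zeros((3, 3))

print(np.allclose(a, b))                       # default tolerances
print(np.allclose(a, b, rtol=0, atol=1e-15))   # much stricter absolute tolerance
```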

The complete list of mathematical functions provided by NumPy can be found in the [documentation](https://numpy.org/doc/stable/reference/routines.math.html).


## NumPy Arrays as Sequences

All functions that are defined for sequences can also be applied to NumPy arrays.


In [88]:
a = np.arange(3)
len(a)

3

In [89]:
for x in a:
    print(x)

0
1
2


We can access elements using square brackets:

In [90]:
a[0]

np.int64(0)

In [91]:
a[0] = 5                 
print(a)                  

[5 1 2]


#### Slicing selects parts of an array

We have already seen the slicing syntax for sequences. It allows us to access individual elements or parts of a sequence:

```python
a[start:stop:step]
```

Extended to multidimensional arrays:
```python
b[start:stop:step, start:stop:step]
```


In [92]:
x = np.arange(10)
print(x[:5])
print(x[::2])

[0 1 2 3 4]
[0 2 4 6 8]


In [93]:
a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
a

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

In [94]:
b = a[:2, 1:3]
b

array([[2, 3],
       [6, 7]])

In [95]:
print(a[0, 1])
b[0, 0] = 77     # b[0, 0] is the same piece of data as a[0, 1]
print(a[0, 1])  

2
77
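A slice is a *view* on the original data, which is why modifying `b` also changed `a`. If an independent array is needed, one option is to call `.copy()` on the slice. The following sketch uses the hypothetical names `view` and `copy` for the two variants.

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

view = a[:2, 1:3]           # basic slicing returns a view on a's data
copy = a[:2, 1:3].copy()    # an explicit copy owns its own data

copy[0, 0] = 77             # does not affect a
print(a[0, 1])

print(np.shares_memory(a, view), np.shares_memory(a, copy))
```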


Alternatively, instead of a single index we can also provide a **list of indices** (also called *fancy indexing* — we’ll revisit this in more detail later if time allows) in the subscript, which returns the corresponding elements from the array:


In [96]:
x = np.array([1, 6, 4, 7, 9])
indices = [1, 0, 2, 1]
x[indices]

array([6, 1, 4, 6])

You can also combine integer indexing with slice indexing.  
However, this results in an array of lower rank than the original array.  
Note that this behavior differs from how MATLAB handles array slicing:


In [97]:
a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
a

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

In [98]:
row_r1 = a[1, :]    # Rank 1 view of the second row of a
row_r2 = a[1:2, :]  # Rank 2 view of the second row of a
print(row_r1, row_r1.shape)
print(row_r2, row_r2.shape) 

[5 6 7 8] (4,)
[[5 6 7 8]] (1, 4)


In [99]:
# We can make the same distinction when accessing columns of an array:
col_r1 = a[:, 1]
col_r2 = a[:, 1:2]
print(col_r1, col_r1.shape) 
print(col_r2, col_r2.shape)  

[ 2  6 10] (3,)
[[ 2]
 [ 6]
 [10]] (3, 1)


**Integer Array Indexing**:  
When you index NumPy arrays with slices, the resulting array view is always a subarray of the original array.  
In contrast, integer array indexing allows you to construct arbitrary arrays by pulling data from another array.  

Here’s an example:


In [100]:
a = np.array([[1, 2], 
              [3, 4], 
              [5, 6]])

In [101]:
print(a[[0, 1, 2], [0, 1, 0]]) 

[1 4 5]


In [102]:
print(np.array([a[0, 0], a[1, 1], a[2, 0]]))

[1 4 5]


In [103]:
print(a[[0, 0], [1, 1]])

[2 2]


In [104]:
print(np.array([a[0, 1], a[0, 1]]))

[2 2]


A useful trick with integer array indexing is selecting or modifying one element from each row of a matrix:


In [105]:
a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])

print(a)

[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]


In [106]:
# Create an array of indices
b = np.array([0, 2, 0, 1])

# Select one element from each row of a using the indices in b
print(a[np.arange(4), b])

[ 1  6  7 11]


In [107]:
# Mutate one element from each row of a using the indices in b
a[np.arange(4), b] += 10

print(a)

[[11  2  3]
 [ 4  5 16]
 [17  8  9]
 [10 21 12]]


#### Masking filters an array

NumPy also extends this syntax with **masking** functionality.  
Here, we provide an **array of booleans** with the same length as the array in the subscript, and only the elements corresponding to `True` are returned:


In [108]:
x = np.array([1, 6, 4, 7, 9])
mask = np.array([True, True, False, False, True])
x[mask]

array([1, 6, 9])

Masking is extremely useful because **comparison operators**, when used with NumPy arrays, return boolean arrays:


In [109]:
x > 4

array([False,  True, False,  True,  True])

This way, we can filter parts of an array that meet a specific **condition**:


In [110]:
x[x > 4]

array([6, 7, 9])

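Conditions can be combined using the `&` operator. Each condition must be wrapped in parentheses because `&` binds more tightly than the comparison operators: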


In [111]:
x[(x > 4) & (x < 8)]

array([6, 7])
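Besides `&` (and), masks can also be combined with `|` (or) and inverted with `~` (not). A brief sketch using the same array:

```python
import numpy as np

x = np.array([1, 6, 4, 7, 9])

print(x[(x < 4) | (x > 8)])   # elements smaller than 4 or larger than 8
print(x[~(x > 4)])            # elements that are not larger than 4
```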

#### Slices or masks of an array can also be assigned to

When a slice or a mask of an array appears on the left-hand side of an assignment, the corresponding part of the original array is updated:


In [112]:
x = np.array([1, 6, 4, 7, 9])
x[x > 4] = 0
x

array([1, 0, 4, 0, 0])
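Assigning to a mask modifies the array in place. If the original values should be kept, one alternative is `np.where`, which builds a new array from a condition. The name `replaced` below is only illustrative.

```python
import numpy as np

x = np.array([1, 6, 4, 7, 9])

# Take 0 where the condition holds and the original value elsewhere
replaced = np.where(x > 4, 0, x)

print(replaced)
print(x)   # the original array is unchanged
```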

For the sake of brevity, we have omitted many details about additional indexing options in NumPy arrays.  
If you’d like to learn more, check out the [documentation](http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html).


## Broadcasting

Broadcasting is a powerful mechanism that allows NumPy to work with arrays of different shapes when performing arithmetic operations.  
Often, we have a smaller array and a larger array, and we want to use the smaller array multiple times to apply an operation to the larger array.

NumPy attempts to expand the arrays according to three rules so that their shapes match, allowing the operation to be applied element-wise.

**Rule 1** If the arrays have a different number of dimensions, the shape of the smaller array is padded with ones on the left.  
&nbsp;&nbsp;&nbsp;&nbsp;Example: (5 × 3) + (3) → (5 × 3) + (**1** × 3)  

**Rule 2** If the arrays have the same number of dimensions but their sizes differ in some dimension, the array with size 1 in that dimension is stretched to match the other.  
&nbsp;&nbsp;&nbsp;&nbsp;Example: (5 × 3) + (1 × 3) → (5 × 3) + (**5** × 3)  

**Rule 3** If the shapes still do not align after applying Rules 1 and 2, a broadcasting error is raised.

![](https://jakevdp.github.io/PythonDataScienceHandbook/figures/02.05-broadcasting.png)

The [NumPy documentation](https://numpy.org/doc/stable/user/basics.broadcasting.html) provides more insights.
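The three rules can be tried out on shapes alone, without creating any arrays. The sketch below assumes a NumPy version that provides `np.broadcast_shapes` (1.20 or newer).

```python
import numpy as np

print(np.broadcast_shapes((5, 3), (3,)))     # Rule 1, then Rule 2
print(np.broadcast_shapes((5, 1), (1, 3)))   # Rule 2 applied to both shapes

try:
    np.broadcast_shapes((5, 3), (5,))        # Rule 3: trailing sizes 3 and 5 clash
except ValueError as error:
    print("ValueError:", error)
```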

For example, suppose we want to add a constant vector to every row of a matrix. We could do it like this:


In [113]:
x = np.array([[1, 2, 3], 
              [4, 5, 6], 
              [7, 8, 9], 
              [10, 11, 12]])

v = np.array([1, 0, 1])
y = np.empty_like(x)   

for i in range(4):
    y[i, :] = x[i, :] + v
    
print(y)

[[ 2  2  4]
 [ 5  5  7]
 [ 8  8 10]
 [11 11 13]]


This works; however, if the matrix `x` is very large, running an explicit loop in Python can be slow.  
Note that adding the vector `v` to each row of the matrix `x` is equivalent to constructing a matrix `vv` by vertically stacking multiple copies of `v`, and then performing an element-wise sum of `x` and `vv`.  
We could implement this approach as follows:


In [114]:
x = np.array([[1, 2, 3], 
              [4, 5, 6], 
              [7, 8, 9], 
              [10, 11, 12]])

v = np.array([1, 0, 1])
vv = np.tile(v, (4, 1))   
print(vv)                 

[[1 0 1]
 [1 0 1]
 [1 0 1]
 [1 0 1]]


In [115]:
y = x + vv
print(y)

[[ 2  2  4]
 [ 5  5  7]
 [ 8  8 10]
 [11 11 13]]


With NumPy broadcasting, we can perform this computation without actually creating multiple copies of `v`.  
Consider this broadcasting-based version:


In [116]:

x = np.array([[1, 2, 3], 
              [4, 5, 6], 
              [7, 8, 9], 
              [10, 11, 12]])

v = np.array([1, 0, 1])
y = x + v 

print(y)  

[[ 2  2  4]
 [ 5  5  7]
 [ 8  8 10]
 [11 11 13]]


The line `y = x + v` works even though `x` has the shape `(4, 3)` and `v` has the shape `(3,)`; thanks to broadcasting, this line behaves as if `v` actually had the shape `(4, 3)`, with each row being a copy of `v`, and the sum is carried out element-wise.
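To make the "behaves as if" part more concrete, `np.broadcast_to` exposes the broadcast version of `v` as a read-only view. The following sketch suggests that no additional copies of the data are created:

```python
import numpy as np

v = np.array([1, 0, 1])
vv = np.broadcast_to(v, (4, 3))   # read-only view that behaves like 4 stacked copies

print(vv)
print(np.shares_memory(v, vv))    # the data of v is reused, not copied
print(vv.strides[0])              # stepping to the next row moves 0 bytes
```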

The rules for combining two arrays are as follows:

1. If the arrays do not have the same rank, the shape of the array with the lower rank is padded with ones on the left until both shapes have the same length.  
2. The two arrays are considered compatible in a dimension if they have the same size in that dimension or if one of them has size 1 in that dimension.  
3. The arrays can be broadcast together if they are compatible in all dimensions.  
4. After broadcasting, each array behaves as if it had the shape equal to the element-wise maximum of the input shapes.  
5. In any dimension where one array has size 1 and the other has a size greater than 1, the first array behaves as if it were copied along that dimension.  

If this explanation doesn’t make sense, try reading the [documentation explanation](https://numpy.org/doc/stable/user/basics.broadcasting.html) or [this alternative explanation](https://scipy.github.io/old-wiki/pages/EricsBroadcastingDoc.html).

Functions that operate element-wise on arrays and support broadcasting are called *universal functions* (ufuncs).  
You can find the complete list of universal functions in the [documentation](https://numpy.org/doc/stable/reference/ufuncs.html#available-ufuncs).

Here are some applications of broadcasting:


In [117]:
# Compute outer product of vectors
v = np.array([1, 2, 3])  # v has shape (3,)
w = np.array([4, 5])    # w has shape (2,)
# To compute an outer product, we first reshape v to be a column
# vector of shape (3, 1); we can then broadcast it against w to yield
# an output of shape (3, 2), which is the outer product of v and w:
# [[ 4  5]
#  [ 8 10]
#  [12 15]]
print(np.reshape(v, (3, 1)) * w)

[[ 4  5]
 [ 8 10]
 [12 15]]


In [118]:
# Add a vector to each row of a matrix
x = np.array([[1, 2, 3], [4, 5, 6]])
# x has shape (2, 3) and v has shape (3,) so they broadcast to (2, 3),
# giving the following matrix:
# [[2 4 6]
#  [5 7 9]]
print(x + v)

[[2 4 6]
 [5 7 9]]


In [119]:
# Add a vector to each column of a matrix
# x has shape (2, 3) and w has shape (2,).
# If we transpose x then it has shape (3, 2) and can be broadcast
# against w to yield a result of shape (3, 2); transposing this result
# yields the final result of shape (2, 3) which is the matrix x with
# the vector w added to each column. Gives the following matrix:
# [[ 5  6  7]
#  [ 9 10 11]]
print((x.T + w).T)

[[ 5  6  7]
 [ 9 10 11]]


In [120]:
# Another solution is to reshape w to be a column vector of shape (2, 1);
# we can then broadcast it directly against x to produce the same
# output.
print(x + np.reshape(w, (2, 1)))

[[ 5  6  7]
 [ 9 10 11]]


In [121]:
# Multiply a matrix by a constant:
# x has shape (2, 3). Numpy treats scalars as arrays of shape ();
# these can be broadcast together to shape (2, 3), producing the
# following array:
# [[ 2  4  6]
#  [ 8 10 12]]
print(x * 2)

[[ 2  4  6]
 [ 8 10 12]]


Broadcasting usually makes your code more concise and faster, so you should make an effort to use it whenever possible.
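As a rough illustration of the "more concise" part, the sketch below subtracts a per-column offset from a matrix once with an explicit loop and once with broadcasting. The arrays `data_demo` and `offsets` are invented for this example:

```python
import numpy as np

data_demo = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])  # shape (3, 2)
offsets = np.array([2.0, 20.0])                                # shape (2,)

# Loop version: subtract the offsets row by row
shifted_loop = np.empty_like(data_demo)
for i in range(data_demo.shape[0]):
    shifted_loop[i] = data_demo[i] - offsets

# Broadcasting version: (3, 2) - (2,) -> (3, 2)
shifted = data_demo - offsets
print(shifted)
```

Both versions give the same result; the broadcasting version avoids the Python-level loop.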


## Aggregation Functions

Aggregation functions are functions that reduce the dimensionality of an array.  
They provide an `axis` argument to specify along which dimension the reduction should be performed.


In [122]:
np.random.seed(1)
two_dim_arr = np.random.randint(0, high=20, size=(4, 4))
two_dim_arr

array([[ 5, 11, 12,  8],
       [ 9, 11,  5, 15],
       [ 0, 16,  1, 12],
       [ 7, 13,  6, 18]])

If only the array is passed, the aggregation operation is applied over the entire array.


In [123]:
np.min(two_dim_arr)

np.int64(0)

With the optional `axis` argument, we can specify which dimension should be aggregated.  
You can think of this as applying the operation to all entries that result when keeping the indices fixed in all dimensions except the specified `axis`.

Let’s look at the result of the minimum operation with `axis=0`:


In [124]:
np.min(two_dim_arr, axis=0)

array([ 0, 11,  1,  8])
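To make the effect of `axis` more tangible, here is a small sketch on a hand-written matrix (the name `m` is only for illustration). It also shows the `keepdims` argument, which keeps the reduced axis with size 1 so that the result can be broadcast against the original array:

```python
import numpy as np

m = np.array([[3, 7, 1],
              [4, 2, 9]])          # shape (2, 3)

col_min = np.min(m, axis=0)        # collapse the rows    -> shape (3,)
row_min = np.min(m, axis=1)        # collapse the columns -> shape (2,)
print(col_min)  # [3 2 1]
print(row_min)  # [1 2]

# keepdims=True keeps the reduced axis with size 1, which is handy for broadcasting
row_min_kept = np.min(m, axis=1, keepdims=True)   # shape (2, 1)
print(m - row_min_kept)
```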

The concept of axes extends to arrays with more than two dimensions.


In [125]:
np.random.seed(1)
three_dim_arr = np.random.randint(0, high=20, size=(4, 4, 4))
three_dim_arr

array([[[ 5, 11, 12,  8],
        [ 9, 11,  5, 15],
        [ 0, 16,  1, 12],
        [ 7, 13,  6, 18]],

       [[ 5, 18, 11, 10],
        [14, 18,  4,  9],
        [17,  0, 13,  9],
        [ 9,  7,  1,  0]],

       [[17,  8, 13, 19],
        [15, 10,  8,  7],
        [ 3,  6, 17,  3],
        [ 4, 17, 11, 12]],

       [[16, 13, 19,  9],
        [18, 15,  0,  4],
        [15,  2,  7,  8],
        [ 9,  3,  7,  4]]])

In [126]:
np.min(three_dim_arr, axis=0)

array([[ 5,  8, 11,  8],
       [ 9, 10,  0,  4],
       [ 0,  0,  1,  3],
       [ 4,  3,  1,  0]])

Here, the entry at index `[0, 0]`, i.e. `5`, is the minimum of the following values:


In [127]:
for i in range(4):
    print(three_dim_arr[i, 0, 0])

5
5
17
16


Let’s once again demonstrate all axes using another three-dimensional array:


In [128]:
a = np.array([[[2, 4], [6, 9]], [[3, 1], [7, 8]], [[4, 5], [9, 0]]])
a, a.shape

(array([[[2, 4],
         [6, 9]],
 
        [[3, 1],
         [7, 8]],
 
        [[4, 5],
         [9, 0]]]),
 (3, 2, 2))

In [129]:
np.min(a)

np.int64(0)

In [130]:
np.min(a, axis=0)

array([[2, 1],
       [6, 0]])

Setting the axis argument is equivalent to iterating through all other axes of the array in sequence and returning the corresponding aggregate for each combination.


In [131]:
for i in range(a.shape[1]):
    for j in range(a.shape[2]):
        print(a[:, i, j])

[2 3 4]
[4 1 5]
[6 7 9]
[9 8 0]


For `axis=1`, a loop is performed over axis 0 and axis 2:


In [132]:
a

array([[[2, 4],
        [6, 9]],

       [[3, 1],
        [7, 8]],

       [[4, 5],
        [9, 0]]])

In [133]:
np.min(a, axis=1)

array([[2, 4],
       [3, 1],
       [4, 0]])

In [134]:
for i in range(a.shape[0]):
    for j in range(a.shape[2]):
        print(a[i, :, j])

[2 6]
[4 9]
[3 7]
[1 8]
[4 9]
[5 0]


...and finally, for `axis=2`, we loop through axes 0 and 1:


In [135]:
a

array([[[2, 4],
        [6, 9]],

       [[3, 1],
        [7, 8]],

       [[4, 5],
        [9, 0]]])

In [136]:
np.min(a, axis=2)

array([[2, 6],
       [1, 7],
       [4, 0]])

In [137]:
for i in range(a.shape[0]):
    for j in range(a.shape[1]):
        print(a[i, j, :])

[2 4]
[6 9]
[3 1]
[7 8]
[4 5]
[9 0]


The shape of the resulting array is simply the shape of the original array with the specified axis removed:


In [138]:
mins = []
for i in range(a.shape[0]):
    for j in range(a.shape[1]):
        mins.append(min(a[i,j,:]))
np.array(mins).reshape([a.shape[0], a.shape[1]])

array([[2, 6],
       [1, 7],
       [4, 0]])

...however, using NumPy is of course much faster than looping over the array manually:


In [139]:
def find_min_manual(arr):
    mins = []
    for i in range(arr.shape[0]):
        for j in range(arr.shape[1]):
            mins.append(min(arr[i,j,:]))
    return np.array(mins).reshape([arr.shape[0], arr.shape[1]])

%timeit find_min_manual(a)
%timeit np.min(a, axis=2)

5.95 μs ± 37.4 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.91 μs ± 10.3 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)


#### np.nan

In [140]:
np.nan == np.nan

False

In [141]:
np.isnan(np.nan)

np.True_

In [142]:
a = np.r_[np.arange(5), np.repeat(0, 5)]
a

array([0, 1, 2, 3, 4, 0, 0, 0, 0, 0])

In [143]:
b = a / a
b


array([nan,  1.,  1.,  1.,  1., nan, nan, nan, nan, nan])

In [144]:
~np.isnan(b)

array([False,  True,  True,  True,  True, False, False, False, False,
       False])

In [145]:
b[~np.isnan(b)]

array([1., 1., 1., 1.])

In [146]:
np.divide(a, a, out=np.zeros(a.shape), where=(a!=0)) 
# at the positions where a!=0, make the division,
# at other indices use what's specified as "out"

array([0., 1., 1., 1., 1., 0., 0., 0., 0., 0.])
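Because this subsection sits inside the aggregation chapter, it may help to see how a NaN affects a reduction. This is a minimal sketch, assuming the NaN-aware variants `np.nanmin` and `np.nanmean` fit your use case; the array `values` is made up for illustration:

```python
import numpy as np

values = np.array([1.0, np.nan, 3.0])

print(np.min(values))      # nan: the missing value propagates
print(np.nanmin(values))   # 1.0: NaN entries are skipped
print(np.nanmean(values))  # 2.0
```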

### More Than One Dimension

Aggregation functions can also aggregate across more than one dimension at a time.


In [147]:
three_dim_arr = np.random.randint(0, 10, (4, 2, 3))
three_dim_arr

array([[[5, 9, 3],
        [6, 8, 0]],

       [[2, 7, 7],
        [9, 7, 3]],

       [[0, 8, 7],
        [7, 1, 1]],

       [[3, 0, 8],
        [6, 4, 5]]])

In [148]:
np.min(three_dim_arr, axis=(1, 2))

array([0, 2, 0, 0])
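One way to think about reducing over several axes is to merge those axes into one and reduce over the merged axis. The sketch below checks this on a small deterministic array (the name `t` is only for illustration):

```python
import numpy as np

t = np.arange(24).reshape(4, 2, 3)

# Reduce over axes 1 and 2 at once ...
direct = np.min(t, axis=(1, 2))
# ... or merge those two axes into one and reduce over it
merged = np.min(t.reshape(4, -1), axis=1)

print(direct)  # [ 0  6 12 18]
```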

### Other Aggregation Functions


In [149]:
two_dim_arr

array([[ 5, 11, 12,  8],
       [ 9, 11,  5, 15],
       [ 0, 16,  1, 12],
       [ 7, 13,  6, 18]])

In [150]:
np.max(two_dim_arr)

np.int64(18)

In [151]:
np.max(two_dim_arr, axis=0)

array([ 9, 16, 12, 18])

In [152]:
np.max(two_dim_arr, axis=1)

array([12, 15, 16, 18])

In [153]:
np.sum(two_dim_arr)

np.int64(149)

In [154]:
np.sum(two_dim_arr, axis=0)

array([21, 51, 24, 53])

In [155]:
np.sum(two_dim_arr, axis=1)

array([36, 40, 29, 44])

Many of these functions are also available as methods on the array object itself.


In [156]:
two_dim_arr.sum(axis=0)

array([21, 51, 24, 53])

### Flatten

To convert an array of any shape into a 1D array, use `flatten`:


In [157]:
a = np.arange(64).reshape((2, 2, 2, 2, 2, 2))
a

array([[[[[[ 0,  1],
           [ 2,  3]],

          [[ 4,  5],
           [ 6,  7]]],


         [[[ 8,  9],
           [10, 11]],

          [[12, 13],
           [14, 15]]]],



        [[[[16, 17],
           [18, 19]],

          [[20, 21],
           [22, 23]]],


         [[[24, 25],
           [26, 27]],

          [[28, 29],
           [30, 31]]]]],




       [[[[[32, 33],
           [34, 35]],

          [[36, 37],
           [38, 39]]],


         [[[40, 41],
           [42, 43]],

          [[44, 45],
           [46, 47]]]],



        [[[[48, 49],
           [50, 51]],

          [[52, 53],
           [54, 55]]],


         [[[56, 57],
           [58, 59]],

          [[60, 61],
           [62, 63]]]]]])

In [158]:
a.flatten()

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
       34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
       51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63])
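A related method is `ravel`. The sketch below illustrates the difference, assuming a contiguous input array (the name `grid` is invented): `flatten` always returns a copy, whereas `ravel` returns a view when that is possible.

```python
import numpy as np

grid = np.arange(6).reshape(2, 3)

flat_copy = grid.flatten()   # always a new array
flat_view = grid.ravel()     # a view here, because grid is contiguous

flat_copy[0] = 100           # does not affect grid
flat_view[1] = 200           # writes through to grid

print(grid)
print(np.shares_memory(grid, flat_copy), np.shares_memory(grid, flat_view))
```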

## Reading and Saving Data

With the `numpy.loadtxt` function, we can read data from a file into a NumPy array:


In [159]:
data = np.loadtxt('data/matplotlib/temperatures.txt')
data.shape

(6679, 2)

The function returns a two-dimensional array with the _rows_ of the file that was read.  
We can access all values of a specific _column_ through slicing:


In [160]:
date = data[:,0] # All rows, first column
T = data[:,1] # All rows, second column
date, T

(array([1995.00274, 1995.00548, 1995.00821, ..., 2013.27926, 2013.282  ,
        2013.28474]),
 array([ 0.944444, -1.61111 , -3.55556 , ..., 10.5556  ,  8.94444 ,
        11.1667  ]))

> **Note:** The `numpy.loadtxt` function can also directly return an array for each column when the argument `unpack=True` is passed:
>
> ```python
> date, T = np.loadtxt('data/matplotlib/temperatures.txt', unpack=True)
> ```
>
> Additional useful options, such as skipping the first few lines, can be found in the documentation.  
> Run the following cell to take a look at the options:


In [161]:
np.loadtxt?

Signature:
np.loadtxt(
    fname,
    dtype=<class 'float'>,
    comments='#',
    delimiter=None,
    converters=None,
    skiprows=0,
    usecols=None,
    unpack=False,
    ndmin=0,
    encoding=None,
    max_rows=None,
    *,
    quotechar=None,
    like=None,
)
Docstring:
Load data from a text file.

Parameters
----------
fname : file, str, pathlib.Path, list of str, generator
    File, filename, list, or generator to read.  If the filename
    extension is ``.gz`` or ``.bz2``, the file is first decompressed. Note
    that generators must return bytes or strings. The strings
    in a list or produced by a generator are treated as lines.
dtype : data-type, optional
    Data-type of the resulting array; default: float.
    ...

With the related `np.savetxt` function, we can save data as a text file:


In [162]:
np.savetxt?

Signature:
np.savetxt(
    fname,
    X,
    fmt='%.18e',
    delimiter=' ',
    newline='\n',
    header='',
    footer='',
    comments='# ',
    encoding=None,
)
Call signature:  np.savetxt(*args, **kwargs)
Type:            _ArrayFunctionDispatcher
String form:     <function savetxt at 0x7bda88556340>
File:            /workspaces/.venv/lib/python3.12/site-packages/numpy/lib/_npyio_impl.py
Docstring:
Save an array to a text file.

Parameters
----------
fname : filename, file handle or pathlib.Path
    If the filename ends in ``.gz``, the file is automatically saved in
    compressed gzip format.  `loadtxt` understands gzipped files
    transparently.
X : 1D or 2D array_like
    Data to be saved to a text file.
fmt : str or sequence of strs, optional
    A single format (%10.5f), a sequence of formats, or a
    multi-format string, ...
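The two functions can be combined into a round trip. The following is a minimal sketch that writes a small table to a temporary file and reads it back column by column; the file name `demo.txt` and the array `table` are made up for this example:

```python
import os
import tempfile

import numpy as np

table = np.array([[1995.0, 0.5],
                  [1996.0, -1.25],
                  [1997.0, 3.0]])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo.txt")   # hypothetical file name
    # The header is written as a comment line ("# year temperature")
    np.savetxt(path, table, fmt="%.2f", header="year temperature")
    # loadtxt skips comment lines; unpack=True returns one array per column
    years, temps = np.loadtxt(path, unpack=True)

print(years)
print(temps)
```

Because the header is prefixed with the default comment marker `# `, `loadtxt` ignores it without needing `skiprows`.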

#### Caching computations with `numpy.save`

The `numpy.loadtxt` and `numpy.savetxt` functions work with text files.  
However, if you just want to temporarily store a NumPy array—for example, the result of a long numerical computation—you can save it as a `.npy` binary file using `numpy.save`:


In [163]:
result = np.random.random(10)

print(result)

np.save('data/numpy/result.npy', result)

[0.51488911 0.94459476 0.58655504 0.90340192 0.1374747  0.13927635
 0.80739129 0.39767684 0.1653542  0.92750858]


Instead of having to repeat the computation each time, you can simply load the cached result with `numpy.load`:


In [164]:
result = np.load('data/numpy/result.npy')
print(result)

[0.51488911 0.94459476 0.58655504 0.90340192 0.1374747  0.13927635
 0.80739129 0.39767684 0.1653542  0.92750858]
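The save/load pair can be wrapped in a small helper that only computes when no cached file exists yet. This is a sketch under stated assumptions: `expensive_computation` and `load_or_compute` are hypothetical names, and a temporary directory stands in for a real cache location.

```python
import os
import tempfile

import numpy as np


def expensive_computation():
    # Stand-in for a long-running calculation (hypothetical)
    return np.arange(5) ** 2


def load_or_compute(path):
    """Return the cached array if it exists, otherwise compute and cache it."""
    if os.path.exists(path):
        return np.load(path)
    result = expensive_computation()
    np.save(path, result)
    return result


with tempfile.TemporaryDirectory() as tmp_dir:
    cache_path = os.path.join(tmp_dir, "result.npy")
    first = load_or_compute(cache_path)    # computes and saves
    second = load_or_compute(cache_path)   # loads from disk

print(first, second)
```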


> **Note:** This approach can save a lot of time while working on parts of your program that do not involve the numerical computation itself, such as generating graphical output (plots).


## Advanced Indexing

NumPy provides indexing methods that go beyond the standard indexing techniques known from regular Python sequences.


### Multidimensional Indexing

You can use a colon (`:`) to select all values along a dimension.


In [165]:
large_two_dim_arr = np.arange(81).reshape((9, 9))
large_two_dim_arr

array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8],
       [ 9, 10, 11, 12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23, 24, 25, 26],
       [27, 28, 29, 30, 31, 32, 33, 34, 35],
       [36, 37, 38, 39, 40, 41, 42, 43, 44],
       [45, 46, 47, 48, 49, 50, 51, 52, 53],
       [54, 55, 56, 57, 58, 59, 60, 61, 62],
       [63, 64, 65, 66, 67, 68, 69, 70, 71],
       [72, 73, 74, 75, 76, 77, 78, 79, 80]])

In [166]:
large_two_dim_arr[:, 1]

array([ 1, 10, 19, 28, 37, 46, 55, 64, 73])

Standard slicing with `start:stop:step` works as expected.


In [167]:
large_two_dim_arr[:, 1:3]

array([[ 1,  2],
       [10, 11],
       [19, 20],
       [28, 29],
       [37, 38],
       [46, 47],
       [55, 56],
       [64, 65],
       [73, 74]])

In [168]:
large_two_dim_arr[:, 2:7:2]

array([[ 2,  4,  6],
       [11, 13, 15],
       [20, 22, 24],
       [29, 31, 33],
       [38, 40, 42],
       [47, 49, 51],
       [56, 58, 60],
       [65, 67, 69],
       [74, 76, 78]])

Slices of an array are always *views*.  
This means you are “looking at” the same part of the array from a different perspective.  
This saves a lot of memory, but it also means that modifying the view will change the original array.


In [169]:
arr_slice = large_two_dim_arr[:, 1]
arr_slice[:] = 0
large_two_dim_arr

array([[ 0,  0,  2,  3,  4,  5,  6,  7,  8],
       [ 9,  0, 11, 12, 13, 14, 15, 16, 17],
       [18,  0, 20, 21, 22, 23, 24, 25, 26],
       [27,  0, 29, 30, 31, 32, 33, 34, 35],
       [36,  0, 38, 39, 40, 41, 42, 43, 44],
       [45,  0, 47, 48, 49, 50, 51, 52, 53],
       [54,  0, 56, 57, 58, 59, 60, 61, 62],
       [63,  0, 65, 66, 67, 68, 69, 70, 71],
       [72,  0, 74, 75, 76, 77, 78, 79, 80]])

In [170]:
large_two_dim_arr[:, 2] = 0
large_two_dim_arr

array([[ 0,  0,  0,  3,  4,  5,  6,  7,  8],
       [ 9,  0,  0, 12, 13, 14, 15, 16, 17],
       [18,  0,  0, 21, 22, 23, 24, 25, 26],
       [27,  0,  0, 30, 31, 32, 33, 34, 35],
       [36,  0,  0, 39, 40, 41, 42, 43, 44],
       [45,  0,  0, 48, 49, 50, 51, 52, 53],
       [54,  0,  0, 57, 58, 59, 60, 61, 62],
       [63,  0,  0, 66, 67, 68, 69, 70, 71],
       [72,  0,  0, 75, 76, 77, 78, 79, 80]])

In [171]:
l2 = np.copy(large_two_dim_arr)
l2[:, 6] = 0
large_two_dim_arr

array([[ 0,  0,  0,  3,  4,  5,  6,  7,  8],
       [ 9,  0,  0, 12, 13, 14, 15, 16, 17],
       [18,  0,  0, 21, 22, 23, 24, 25, 26],
       [27,  0,  0, 30, 31, 32, 33, 34, 35],
       [36,  0,  0, 39, 40, 41, 42, 43, 44],
       [45,  0,  0, 48, 49, 50, 51, 52, 53],
       [54,  0,  0, 57, 58, 59, 60, 61, 62],
       [63,  0,  0, 66, 67, 68, 69, 70, 71],
       [72,  0,  0, 75, 76, 77, 78, 79, 80]])

In [172]:
l2

array([[ 0,  0,  0,  3,  4,  5,  0,  7,  8],
       [ 9,  0,  0, 12, 13, 14,  0, 16, 17],
       [18,  0,  0, 21, 22, 23,  0, 25, 26],
       [27,  0,  0, 30, 31, 32,  0, 34, 35],
       [36,  0,  0, 39, 40, 41,  0, 43, 44],
       [45,  0,  0, 48, 49, 50,  0, 52, 53],
       [54,  0,  0, 57, 58, 59,  0, 61, 62],
       [63,  0,  0, 66, 67, 68,  0, 70, 71],
       [72,  0,  0, 75, 76, 77,  0, 79, 80]])
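Whether two arrays share data can also be tested programmatically. The sketch below uses `np.shares_memory` on a small array; the names `base_arr`, `col_view`, and `col_copy` are invented for illustration:

```python
import numpy as np

base_arr = np.arange(12).reshape(3, 4)

col_view = base_arr[:, 1]          # basic slicing -> view
col_copy = base_arr[:, 1].copy()   # explicit copy

print(np.shares_memory(base_arr, col_view))  # True
print(np.shares_memory(base_arr, col_copy))  # False

col_copy[:] = -1                   # base_arr is untouched
col_view[:] = 0                    # base_arr's second column becomes 0
print(base_arr)
```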

If you need all values from several consecutive dimensions, you can use an ellipsis (`...`) as a shorthand.


In [173]:
# Ellipsis is an actual Python object.
print(...)

Ellipsis


In [174]:
# np.stack joins arrays along a new axis.
four_dim_arr = np.stack((
    np.ones((3, 3, 3)), 
    np.ones((3, 3, 3)) * 2, 
    np.ones((3, 3, 3)) * 3, 
    np.ones((3, 3, 3)) * 4
))
four_dim_arr

array([[[[1., 1., 1.],
         [1., 1., 1.],
         [1., 1., 1.]],

        [[1., 1., 1.],
         [1., 1., 1.],
         [1., 1., 1.]],

        [[1., 1., 1.],
         [1., 1., 1.],
         [1., 1., 1.]]],


       [[[2., 2., 2.],
         [2., 2., 2.],
         [2., 2., 2.]],

        [[2., 2., 2.],
         [2., 2., 2.],
         [2., 2., 2.]],

        [[2., 2., 2.],
         [2., 2., 2.],
         [2., 2., 2.]]],


       [[[3., 3., 3.],
         [3., 3., 3.],
         [3., 3., 3.]],

        [[3., 3., 3.],
         [3., 3., 3.],
         [3., 3., 3.]],

        [[3., 3., 3.],
         [3., 3., 3.],
         [3., 3., 3.]]],


       [[[4., 4., 4.],
         [4., 4., 4.],
         [4., 4., 4.]],

        [[4., 4., 4.],
         [4., 4., 4.],
         [4., 4., 4.]],

        [[4., 4., 4.],
         [4., 4., 4.],
         [4., 4., 4.]]]])

In [175]:
four_dim_arr.shape

(4, 3, 3, 3)

In [176]:
four_dim_arr[3, :, :, :]

array([[[4., 4., 4.],
        [4., 4., 4.],
        [4., 4., 4.]],

       [[4., 4., 4.],
        [4., 4., 4.],
        [4., 4., 4.]],

       [[4., 4., 4.],
        [4., 4., 4.],
        [4., 4., 4.]]])

In [177]:
four_dim_arr[1,..., 1]

array([[2., 2., 2.],
       [2., 2., 2.],
       [2., 2., 2.]])

In [178]:
four_dim_arr[..., 1]

array([[[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]],

       [[2., 2., 2.],
        [2., 2., 2.],
        [2., 2., 2.]],

       [[3., 3., 3.],
        [3., 3., 3.],
        [3., 3., 3.]],

       [[4., 4., 4.],
        [4., 4., 4.],
        [4., 4., 4.]]])
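To confirm what the ellipsis stands for, the following sketch compares it with the written-out slices on an array whose entries are all distinct (the name `t4` is only for illustration):

```python
import numpy as np

t4 = np.arange(2 * 3 * 4 * 5).reshape(2, 3, 4, 5)

# "..." expands to as many full slices as are needed
same_last = np.array_equal(t4[..., 1], t4[:, :, :, 1])
same_mid = np.array_equal(t4[1, ..., 1], t4[1, :, :, 1])

print(same_last, same_mid)
print(t4[..., 1].shape)     # (2, 3, 4)
print(t4[1, ..., 1].shape)  # (3, 4)
```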

### Fancy Indexing

You can pass an array of indices, which is especially useful for selecting random elements from an array.


In [179]:
arr = np.arange(9) + 10
arr

array([10, 11, 12, 13, 14, 15, 16, 17, 18])

In [180]:
indices = np.array([1, 4, 5])
arr[indices]

array([11, 14, 15])

The resulting array will reflect the shape of the index array.


In [181]:
indices = np.array([[1, 4], [5, 7]])
arr[indices]

array([[11, 14],
       [15, 17]])

You can index each dimension separately.


In [182]:
two_dim_arr = np.arange(25).reshape(5, 5)
two_dim_arr

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24]])

In [183]:
x_indices = np.array([3, 4])
y_indices = np.array([1, 2])
two_dim_arr[x_indices, y_indices] # Corresponds to indexing at [3, 1] and [4, 2].

array([16, 22])
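Index arrays passed for several dimensions are paired element by element. If you instead want every combination of the given rows and columns, one option is `np.ix_`. A small sketch with illustrative names (`grid5`, `rows`, `cols`):

```python
import numpy as np

grid5 = np.arange(25).reshape(5, 5)
rows = np.array([3, 4])
cols = np.array([1, 2])

paired = grid5[rows, cols]          # elements [3, 1] and [4, 2]
block = grid5[np.ix_(rows, cols)]   # all combinations of rows 3, 4 and columns 1, 2

print(paired)  # [16 22]
print(block)   # [[16 17]
               #  [21 22]]
```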

With `np.argsort`, you can obtain the indices that would sort an array, allowing you to sort other arrays in the same way (note that `vstack` is used here only for demonstration purposes):


In [184]:
a = np.random.randint(0, 10, 10)
b = a ** 2
np.vstack((a, b))

array([[ 0,  2,  0,  7,  1,  7,  9,  8,  4,  0],
       [ 0,  4,  0, 49,  1, 49, 81, 64, 16,  0]])

In [185]:
indices = a.argsort()
indices

array([0, 2, 9, 4, 1, 8, 3, 5, 7, 6])

In [186]:
np.vstack((a[indices], b[indices]))

array([[ 0,  0,  0,  1,  2,  4,  7,  7,  8,  9],
       [ 0,  0,  0,  1,  4, 16, 49, 49, 64, 81]])

### Advanced Masking

In [187]:
arr = np.arange(1, 7)
arr

array([1, 2, 3, 4, 5, 6])

Different masks can be combined using bitwise logical operators.  
These are the vectorized versions of the logical operators and should not be confused with `and`, `or`, and `not`, which evaluate the truth value of an entire object.


In [188]:
smaller_or_equal_four = (arr <= 4)
smaller_or_equal_four   

array([ True,  True,  True,  True, False, False])

In [189]:
greater_two = (arr > 2)
greater_two

array([False, False,  True,  True,  True,  True])

Bitwise `and` with `&`.


In [190]:
greater_two & smaller_or_equal_four

array([False, False,  True,  True, False, False])

In [191]:
# This does not work.
try:
    greater_two and smaller_or_equal_four
except ValueError as e:
    print("ValueError:", e)

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()


In [192]:
arr

array([1, 2, 3, 4, 5, 6])

In [193]:
arr[greater_two & smaller_or_equal_four]

array([3, 4])
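When the comparisons are written inline rather than stored in variables, each one needs its own parentheses, because `&` binds more tightly than `>` or `<=`. The sketch below shows this form next to `np.logical_and`, which is an equivalent spelling; the array `nums` is made up for the example:

```python
import numpy as np

nums = np.arange(1, 7)

# & binds more tightly than comparison operators, so each comparison needs parentheses
mask = (nums > 2) & (nums <= 4)

# np.logical_and is an equivalent, more explicit spelling
mask_alt = np.logical_and(nums > 2, nums <= 4)

print(nums[mask])  # [3 4]
```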

Bitwise `or` with `|`.


In [194]:
arr[greater_two | smaller_or_equal_four]

array([1, 2, 3, 4, 5, 6])

Bitwise `xor` with `^`.



In [195]:
arr

array([1, 2, 3, 4, 5, 6])

In [196]:
arr[greater_two ^ smaller_or_equal_four]

array([1, 2, 5, 6])

Bitwise negation with `~`.


In [197]:
arr[~((arr < 2) ^ (arr > 2))]

array([2])

In [198]:
arr[~greater_two]

array([1, 2])

In [199]:
# Set every element that is not greater than 2 (i.e. <= 2) to 2.
arr[~greater_two] = 2
arr

array([2, 2, 3, 4, 5, 6])

#### Using `np.where`

Assigning through a mask modifies the original array in place, whereas sometimes we want the original array to remain unchanged.  
When called with only a condition, `np.where` returns the indices at which the condition is true.


In [200]:
a = np.arange(9).reshape(3, 3)
a[a % 3 == 0] = 0
a

array([[0, 1, 2],
       [0, 4, 5],
       [0, 7, 8]])

In [201]:
a = np.arange(9).reshape(3, 3)
indices = np.where(a % 3 == 0)
indices

(array([0, 1, 2]), array([0, 0, 0]))

In [202]:
b = np.ones((3, 3))
b[indices] = 0
b

array([[0., 1., 1.],
       [0., 1., 1.],
       [0., 1., 1.]])

`where` can also be used to assign values to a new array:


In [203]:
np.where(a % 3 == 0, 0, a)

array([[0, 1, 2],
       [0, 4, 5],
       [0, 7, 8]])

In [204]:
a

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

`np.argwhere` returns the indices grouped by element:


In [205]:
a = np.eye(4) * np.arange(16).reshape(4,4)
a

array([[ 0.,  0.,  0.,  0.],
       [ 0.,  5.,  0.,  0.],
       [ 0.,  0., 10.,  0.],
       [ 0.,  0.,  0., 15.]])

In [206]:
np.argwhere(a)

array([[1, 1],
       [2, 2],
       [3, 3]])

In [207]:
np.where(a)

(array([1, 2, 3]), array([1, 2, 3]))
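The two return formats carry the same information. As a quick check, the sketch below shows that the `np.argwhere` result is the transposed form of the tuple returned by `np.where`, and that the tuple can be used directly as an index (the array `d` is only for illustration):

```python
import numpy as np

d = np.diag([0, 5, 10, 15])

idx_tuple = np.where(d)      # one index array per dimension
idx_rows = np.argwhere(d)    # one [row, col] pair per matching element

# argwhere is the transposed (stacked) form of the tuple returned by where
print(np.array_equal(idx_rows, np.transpose(idx_tuple)))

# The tuple form can be used directly for indexing
print(d[idx_tuple])  # [ 5 10 15]
```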

## Expanding, Reducing, Combining Arrays

### Adding New Dimensions with `np.newaxis`

Indexing with `np.newaxis` inserts a new axis of length 1 at that position. Instead of `np.newaxis`, you can also use `None`.


In [208]:
one_dim_arr = np.arange(5)
one_dim_arr, one_dim_arr.shape

(array([0, 1, 2, 3, 4]), (5,))

In [209]:
two_dim_arr = one_dim_arr[np.newaxis, :]
two_dim_arr, two_dim_arr.shape

(array([[0, 1, 2, 3, 4]]), (1, 5))

In [210]:
two_dim_arr = one_dim_arr[:, np.newaxis, None]
two_dim_arr, two_dim_arr.shape

(array([[[0]],
 
        [[1]],
 
        [[2]],
 
        [[3]],
 
        [[4]]]),
 (5, 1, 1))

Adding new dimensions is useful, for example, when a library such as TensorFlow expects batched inputs but you want to pass a single data point for prediction:


In [211]:
one_dim_arr[:, None]

array([[0],
       [1],
       [2],
       [3],
       [4]])
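Where the new axis is placed determines the resulting shape. The following sketch, with invented names (`sample`, `batch`, `column`), builds a one-element batch and a column vector from the same 1D array and then combines them through broadcasting:

```python
import numpy as np

sample = np.arange(5)            # a single data point with 5 features, shape (5,)

batch = sample[np.newaxis, :]    # shape (1, 5): a batch containing one sample
column = sample[:, np.newaxis]   # shape (5, 1): a column vector

# Combining both forms via broadcasting gives all pairwise differences
pairwise = column - batch        # shape (5, 5)
print(batch.shape, column.shape, pairwise.shape)
```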

### Removing Dimensions

`arr.squeeze()` removes dimensions of size 1:


In [212]:
one_dim_arr = np.arange(5)
two_dim_arr = one_dim_arr[np.newaxis, :]
two_dim_arr, two_dim_arr.shape

(array([[0, 1, 2, 3, 4]]), (1, 5))

In [213]:
two_dim_arr.squeeze(), two_dim_arr.squeeze().shape

(array([0, 1, 2, 3, 4]), (5,))

In [214]:
a = np.arange(5).reshape(1, -1, 1, 1)
a

array([[[[0]],

        [[1]],

        [[2]],

        [[3]],

        [[4]]]])

In [215]:
a.squeeze()

array([0, 1, 2, 3, 4])

### Combining Arrays

There are many ways to combine existing arrays, such as `np.append`, `np.concatenate`, and `np.stack`.  
However, these operations always require copying the entire array.  
Therefore, it is often more efficient to allocate an array in the required final size beforehand and then fill in only the necessary parts.


In [216]:
np.concatenate((np.arange(10), np.arange(10)[::-1]))

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0])
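The advice about preallocating can be sketched as follows. Both loops build the same array; the first copies all existing data on every iteration, while the second writes into memory that was allocated once. The sizes `n_chunks` and `chunk_size` are arbitrary choices for this example:

```python
import numpy as np

n_chunks, chunk_size = 4, 3

# Growing an array step by step: every np.concatenate call copies all data so far
grown = np.empty(0, dtype=int)
for k in range(n_chunks):
    grown = np.concatenate((grown, np.full(chunk_size, k)))

# Preallocating once and filling slices in place avoids the repeated copies
filled = np.empty(n_chunks * chunk_size, dtype=int)
for k in range(n_chunks):
    filled[k * chunk_size:(k + 1) * chunk_size] = k

print(filled)  # [0 0 0 1 1 1 2 2 2 3 3 3]
```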

A quick and easy way to combine scalars and arrays is to use `np.r_`, with the desired arrays, lists, or numbers inside square brackets:


In [217]:
np.r_[2, 2, 2, np.arange(10), np.arange(10)[::-1], [0, 1, 2]]

array([2, 2, 2, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 9, 8, 7, 6, 5, 4, 3, 2, 1,
       0, 0, 1, 2])

`np.append` internally uses concatenation:


In [218]:
np.append(np.arange(10), np.arange(10))

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

For higher-dimensional arrays, other functions are more appropriate:

In [219]:
np.stack((np.arange(10), np.arange(10)))

array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
       [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])

There are also the functions `np.vstack` (stacking by rows) and `np.hstack` (stacking by columns):  

* `hstack` is equivalent to concatenation along the second axis, except for 1-D arrays, where it concatenates along the first axis.  
* `vstack` is equivalent to concatenation along the first axis, after reshaping 1-D arrays from shape `(N,)` to `(1, N)`.


In [220]:
two_dim_arr = np.arange(16).reshape(4, -1)
two_dim_arr_2 = np.arange(16).reshape(4, -1) + 16
two_dim_arr, two_dim_arr_2

(array([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11],
        [12, 13, 14, 15]]),
 array([[16, 17, 18, 19],
        [20, 21, 22, 23],
        [24, 25, 26, 27],
        [28, 29, 30, 31]]))

In [221]:
np.hstack((two_dim_arr, two_dim_arr_2))

array([[ 0,  1,  2,  3, 16, 17, 18, 19],
       [ 4,  5,  6,  7, 20, 21, 22, 23],
       [ 8,  9, 10, 11, 24, 25, 26, 27],
       [12, 13, 14, 15, 28, 29, 30, 31]])

In [222]:
np.vstack((two_dim_arr, two_dim_arr_2))

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23],
       [24, 25, 26, 27],
       [28, 29, 30, 31]])
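The equivalences stated in the two bullet points can be verified on small arrays. A minimal sketch with illustrative names (`p`, `q`, `u`, `w`):

```python
import numpy as np

p = np.arange(4).reshape(2, 2)
q = p + 10

same_h = np.array_equal(np.hstack((p, q)), np.concatenate((p, q), axis=1))
same_v = np.array_equal(np.vstack((p, q)), np.concatenate((p, q), axis=0))
print(same_h, same_v)

# 1-D inputs: hstack joins them end to end, vstack turns each into a row
u = np.array([1, 2, 3])
w = np.array([4, 5, 6])
print(np.hstack((u, w)).shape)  # (6,)
print(np.vstack((u, w)).shape)  # (2, 3)
```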

### Controlling Randomness with `np.random.seed`

When a random seed is set, the random number generator produces the same sequence of numbers every time. This is very useful for testing, but it also removes all randomness, so it should not be used in the final code:

In [223]:
for _ in range(5):
    np.random.seed(0)
    print(np.random.random(5))

[0.5488135  0.71518937 0.60276338 0.54488318 0.4236548 ]
[0.5488135  0.71518937 0.60276338 0.54488318 0.4236548 ]
[0.5488135  0.71518937 0.60276338 0.54488318 0.4236548 ]
[0.5488135  0.71518937 0.60276338 0.54488318 0.4236548 ]
[0.5488135  0.71518937 0.60276338 0.54488318 0.4236548 ]


To disable it, set the random seed to `None`.  
In this case, NumPy generates random numbers using your system’s randomness source (or the system time, …).


In [224]:
for _ in range(5):
    np.random.seed(None)
    print(np.random.random(5))

[0.60258284 0.00743434 0.47123958 0.03191451 0.93058322]
[0.09845678 0.54235906 0.10451891 0.90807219 0.98687735]
[0.83357324 0.8571592  0.05444303 0.7225635  0.84661718]
[0.28018492 0.6070292  0.01711465 0.10579684 0.66800474]
[0.47879055 0.91306722 0.30292259 0.48621762 0.43568078]
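As an alternative to seeding the global state, NumPy also offers generator objects created with `np.random.default_rng`, which its documentation recommends for new code. The sketch below is not what this notebook uses elsewhere; the seed value 42 is an arbitrary choice:

```python
import numpy as np

# Two generators created with the same seed produce identical sequences
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)
draw_a = rng_a.random(3)
draw_b = rng_b.random(3)
print(np.array_equal(draw_a, draw_b))  # True

# Without a seed, the generator is initialized from the operating system's entropy
rng_c = np.random.default_rng()
print(rng_c.random(3))
```

A local generator keeps the seeding confined to the code that needs reproducibility and leaves the global random state untouched.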


### Shuffling Arrays

`np.random.shuffle` shuffles an array along the first dimension.  
This means a one-dimensional array is fully shuffled, while for multidimensional arrays the subarrays from the second dimension onward remain intact.


In [225]:
a = np.arange(10)
np.random.shuffle(a)
a

array([3, 5, 8, 4, 2, 1, 6, 7, 9, 0])

In [226]:
a = np.arange(9).reshape(3, 3)
np.random.shuffle(a)
a

array([[3, 4, 5],
       [0, 1, 2],
       [6, 7, 8]])

To shuffle the array completely, you can flatten it and then reshape it back to its original form:


In [227]:
a = np.arange(9).reshape(3, 3)
a

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [228]:
b = a.flatten()
np.random.shuffle(b)
a = b.reshape(a.shape)
a

array([[3, 2, 8],
       [4, 5, 0],
       [6, 1, 7]])

Note that `np.random.shuffle` shuffles the array in place.  
To return a permuted copy instead, you would use `np.random.permutation`:


In [229]:
a = np.arange(9).reshape(3, 3)
a

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [230]:
np.random.permutation(a)

array([[3, 4, 5],
       [6, 7, 8],
       [0, 1, 2]])

In [231]:
a

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

If you are working with multiple arrays that you want to shuffle without breaking their alignment, it is better to shuffle the indices instead:


In [232]:
a = np.arange(9) + 1
b = a ** 2
np.vstack((a, b))

array([[ 1,  2,  3,  4,  5,  6,  7,  8,  9],
       [ 1,  4,  9, 16, 25, 36, 49, 64, 81]])

In [233]:
order = np.random.permutation(a.shape[0])
np.vstack((a[order], b[order]))

array([[ 8,  9,  6,  4,  5,  2,  7,  3,  1],
       [64, 81, 36, 16, 25,  4, 49,  9,  1]])

### np.random.choice

`np.random.choice` draws a random sample from a given 1D array (with replacement by default):


In [234]:
np.random.seed(1)
a = np.arange(10)
np.random.choice(a, size=5)

array([5, 8, 9, 5, 0])

In [235]:
np.random.choice(a, size=5, replace=False)

array([6, 3, 9, 2, 8])

In [236]:
np.random.choice(a, size=5, replace=False)

array([9, 6, 1, 0, 8])

You can also specify probabilities for selecting certain elements.  
For example, to create an array in which each element is `True` with probability 0.01 (so about 1% of the elements), you can use:


In [237]:
np.random.choice(np.array([True, False]), size=(5, 5), p=[0.01, 0.99])

array([[False, False, False, False, False],
       [False, False, False, False, False],
       [False, False, False, False, False],
       [False, False, False, False, False],
       [False, False, False, False, False]])
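A quick way to sanity-check the `p` argument is to draw a large sample and look at the observed fraction. This sketch assumes a local generator from `np.random.default_rng` with an arbitrary seed and uses the probabilities 0.25 and 0.75 purely as an example:

```python
import numpy as np

rng = np.random.default_rng(0)   # local generator, seed chosen arbitrarily

# p lists the probability of each candidate, in the same order as the candidates
flags = rng.choice(np.array([True, False]), size=100_000, p=[0.25, 0.75])

fraction_true = flags.mean()     # the mean of a boolean array is the fraction of True values
print(fraction_true)
```

With this many draws, the printed fraction should land close to 0.25.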

## NumPy `print` Options

By default, NumPy prints numbers only up to a certain precision, or summarizes the output of large arrays.  
This behavior can be modified using `np.set_printoptions`:


In [238]:
rand_arr = np.random.random((5, 3))
rand_arr

array([[0.98886109, 0.74816565, 0.28044399],
       [0.78927933, 0.10322601, 0.44789353],
       [0.9085955 , 0.29361415, 0.28777534],
       [0.13002857, 0.01936696, 0.67883553],
       [0.21162812, 0.26554666, 0.49157316]])

In [239]:
np.set_printoptions(precision=3)
rand_arr

array([[0.989, 0.748, 0.28 ],
       [0.789, 0.103, 0.448],
       [0.909, 0.294, 0.288],
       [0.13 , 0.019, 0.679],
       [0.212, 0.266, 0.492]])

In [240]:
rand_arr = rand_arr / 1e3
rand_arr

array([[9.889e-04, 7.482e-04, 2.804e-04],
       [7.893e-04, 1.032e-04, 4.479e-04],
       [9.086e-04, 2.936e-04, 2.878e-04],
       [1.300e-04, 1.937e-05, 6.788e-04],
       [2.116e-04, 2.655e-04, 4.916e-04]])

In [241]:
np.set_printoptions(suppress=True, precision=6)
rand_arr

array([[0.000989, 0.000748, 0.00028 ],
       [0.000789, 0.000103, 0.000448],
       [0.000909, 0.000294, 0.000288],
       [0.00013 , 0.000019, 0.000679],
       [0.000212, 0.000266, 0.000492]])

In [242]:
arr = np.arange(90)
arr

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
       34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
       51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67,
       68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84,
       85, 86, 87, 88, 89])

In [243]:
np.set_printoptions(threshold=10)
arr

array([ 0,  1,  2, ..., 87, 88, 89])
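`np.set_printoptions` changes the settings for the rest of the session. If a setting is only needed briefly, one option is the `np.printoptions` context manager, which restores the previous settings on exit. A small sketch with a made-up array `vals`:

```python
import numpy as np

vals = np.array([0.000123456, 1.5, 22.25])

before = np.get_printoptions()["precision"]
with np.printoptions(precision=3, suppress=True):
    inside = np.get_printoptions()["precision"]
    print(vals)                  # printed with the temporary settings
after = np.get_printoptions()["precision"]

print(before, inside, after)
```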

## Further Reading

NumPy chapter from the *Python Data Science Handbook* by Jake VanderPlas  
https://jakevdp.github.io/PythonDataScienceHandbook/02.00-introduction-to-numpy.html  

### NumPy Documentation

This short overview has covered many of the important things you need to know about NumPy, but it is by no means complete.  
Take a look at the [_NumPy_ documentation](https://numpy.org/doc/stable/reference/) to learn much more about _NumPy_.


---

Lecture: AI I - Basics 

Exercise: [**Exercise 3.1: Numpy**](../03_data/exercises/01_numpy.ipynb)

Next: [**Chapter 3.2: Pandas**](../03_data/02_pandas.ipynb)