Lecture: AI I - Basics 

Previous:
[**Chapter 2.5: Additionals**](../02_python/05_additionals.ipynb)

---

# Chapter 3.1: Numpy

- [Creating NumPy Arrays](#creating-numpy-arrays)
- [Data Types](#data-types)
- [Working with Arrays](#working-with-arrays)
- [Applying Functions to Arrays](#applying-functions-to-arrays)
- [Numpy Arrays as Sequences](#numpy-arrays-as-sequences)
- [Broadcasting](#broadcasting)
- [Aggregation Functions](#aggregation-functions)
- [Reading and Saving Data](#reading-and-saving-data)
- [Advanced Indexing](#advanced-indexing)
- [Expanding, Reducing, Combining Arrays](#expanding-reducing-combining-arrays)
- [Numpy Print Options](#numpy-print-options)


## The NumPy Module

Python lists are very flexible since they can hold values of different data types and can easily be modified (e.g., with `append`).  
However, this flexibility comes at the cost of performance, making lists less suitable for numerical computations.

The **NumPy** [module](https://numpy.org/doc/stable/user/index.html) therefore defines the n-dimensional **array** data type `numpy.ndarray`, which relies on highly optimized C and Fortran code for efficient numerical calculations.

Arrays can only store values of a single numerical data type (e.g., floating-point values) and are much more rigid than lists.  
Nevertheless, this is exactly what we need for many scientific applications, such as working with datasets!

By convention, we import the NumPy module under the abbreviation `np`:


In [1]:
import numpy as np

### Introductory Example

Built-in Python containers such as `list` provide a flexible way to store and manage data.  
As mentioned earlier, collections usually store only references to objects. While this is very convenient when writing code, it comes at a cost in memory usage and performance.  

Let’s look at an example. Suppose we conducted an experiment with one million measurements and now want to calculate their average.  
We could do this as follows:

In [2]:
import random 
measurements = [random.randint(150, 200) for _ in range(1_000_000)]
print(measurements[:10])

[187, 168, 165, 163, 163, 154, 161, 176, 200, 183]


In [3]:
def mean(values):
    accumulator = 0
    for value in values:
        accumulator += value
    mean_value = accumulator / len(values)
    return mean_value

%timeit mean(measurements)

17 ms ± 486 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)


That’s quite slow because, in each loop iteration, Python has to bind a new variable and then check whether the `+` operation is supported between `accumulator` and the current `value`.  
This prevents attempts to add objects that aren’t addable, but in our case we know we’re dealing only with integers.  
If we could tell the interpreter that we’re only adding integers, it could skip all that type checking and speed things up.  
This is exactly the use case `numpy` was created for.


We can already achieve faster computations by making use of as many of Python’s built-in functions as possible, such as `sum`.

In [4]:
%timeit sum(measurements) / len(measurements)

3.55 ms ± 21.8 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)


The standard data type in NumPy is the `ndarray` (short for n-dimensional array). In the simplest case, a NumPy array can be created from a Python list.

In [5]:
measurements_array = np.array(measurements)
measurements_array

array([187, 168, 165, ..., 170, 164, 158])

In [6]:
type(measurements_array)

numpy.ndarray

Arrays behave similarly to lists in many respects (e.g., indexing and iteration), but they have a fixed underlying data type. NumPy automatically detects that all our values are integers and chooses an appropriate data type, here a 64-bit integer (the default integer size can depend on the platform and NumPy version). For more details, see the documentation: https://docs.scipy.org/doc/numpy-1.13.0/user/basics.types.html

In [7]:
measurements_array.dtype

dtype('int64')

In addition, NumPy provides a wide range of routines for performing mathematical operations on arrays. Let’s see whether using NumPy actually gives us a performance advantage.

In [8]:
%timeit np.mean(measurements_array)

370 μs ± 4.96 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
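Speed only matters if the result is correct. The following is a minimal sketch that checks that the explicit loop, the built-in `sum`, and `np.mean` agree. It uses a smaller sample and a fixed seed (both assumptions chosen here for reproducibility), and the helper name `loop_mean` is hypothetical:

```python
import random

import numpy as np

random.seed(0)  # fixed seed so the check is reproducible
data = [random.randint(150, 200) for _ in range(10_000)]

def loop_mean(values):
    # Pure-Python accumulation, as in the first version above
    total = 0
    for value in values:
        total += value
    return total / len(values)

builtin_mean = sum(data) / len(data)
numpy_mean = np.mean(np.array(data))

# All three approaches should agree up to floating-point rounding
print(loop_mean(data), builtin_mean, numpy_mean)
print(np.isclose(loop_mean(data), numpy_mean))
```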


A clear speedup compared to pure Python implementations! Now that we’ve seen the usefulness of NumPy, let’s take a closer look at the NumPy array.

## Creating NumPy Arrays

The simplest way to create NumPy arrays is from Python lists, using the `numpy.array` function:


In [9]:
a = np.array([1, 2, 3])   # Create a rank 1 array
print(type(a))            # Prints "<class 'numpy.ndarray'>"

<class 'numpy.ndarray'>


In [10]:
a = np.array([ 1, 2, 3, 5, 8, 13])
a

array([ 1,  2,  3,  5,  8, 13])

In [11]:
b = np.array([[  1.5, 2.2, 3.1 ], [ 4.0, 5.2, 6.7 ]])
b

array([[1.5, 2.2, 3.1],
       [4. , 5.2, 6.7]])

NumPy arrays have several **attributes** that provide useful information about the array.

The number of dimensions of the array:

In [12]:
a.ndim, b.ndim

(1, 2)

The length of the array in each dimension:

In [13]:
a.shape, b.shape

((6,), (2, 3))

The data type of the array:

In [14]:
a.dtype, b.dtype

(dtype('int64'), dtype('float64'))
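A few further attributes describe how much memory an array occupies: `size` (number of elements), `itemsize` (bytes per element), and `nbytes` (total bytes of the data). A short sketch using the float array `b` from above:

```python
import numpy as np

b = np.array([[1.5, 2.2, 3.1], [4.0, 5.2, 6.7]])

print(b.size)      # total number of elements: 2 * 3 = 6
print(b.itemsize)  # bytes per element: 8 for float64
print(b.nbytes)    # total bytes of the data: size * itemsize = 48
```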

> **Reminder:** Use `<TAB>` autocompletion and the `?` documentation in Jupyter Notebook if you’re unsure which functions exist or what they do!

In [15]:
values = [[0, 1, 2, 3, 4]] * 3
two_dim_arr = np.array(values)
two_dim_arr

array([[0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4]])

In [16]:
two_dim_arr.shape

(3, 5)

In [17]:
two_dim_arr.ndim

2

In [18]:
values = [[[0, 1, 2, 3, 4]] * 3] * 6
three_dim_arr = np.array(values)
three_dim_arr

array([[[0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4]],

       [[0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4]],

       [[0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4]],

       [[0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4]],

       [[0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4]],

       [[0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4]]])

In [19]:
three_dim_arr.shape

(6, 3, 5)

In [20]:
three_dim_arr.ndim

3

#### There are many ways to create arrays

- The `numpy.arange` function works similarly to Python’s `range` function, but it can also accept floating-point arguments:


In [21]:
np.arange(10)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [22]:
np.arange(1.5, 2, 0.1)

array([1.5, 1.6, 1.7, 1.8, 1.9])
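With a floating-point step, the number of elements is derived from `(stop - start) / step`, which is subject to rounding errors. A small sketch of a well-known case: `np.arange(1, 1.3, 0.1)` returns four values, so the supposedly excluded stop value shows up (approximately) at the end. When the exact number of points matters, `np.linspace` is usually the safer choice:

```python
import numpy as np

# (1.3 - 1) / 0.1 evaluates to slightly more than 3 in floating point,
# so arange produces 4 elements instead of the expected 3
values = np.arange(1, 1.3, 0.1)
print(len(values), values)

# linspace takes the number of points explicitly, so there is no ambiguity
safe_values = np.linspace(1, 1.2, 3)
print(len(safe_values), safe_values)
```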

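- Also very useful are `numpy.linspace` and `numpy.logspace`. `linspace(start, stop, num)` returns `num` evenly spaced values from `start` to `stop` (both included), while `logspace(start, stop, num)` returns values evenly spaced on a logarithmic scale from `10**start` to `10**stop`: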


In [23]:
np.linspace(10, 20, 4)

array([10.        , 13.33333333, 16.66666667, 20.        ])

In [24]:
np.logspace(1, 3, 4)

array([  10.        ,   46.41588834,  215.443469  , 1000.        ])
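Note that `np.logspace` expects exponents: `np.logspace(1, 3, 4)` is the same as raising 10 to the values of `np.linspace(1, 3, 4)`. A quick check, plus the optional `base` parameter:

```python
import numpy as np

# logspace(start, stop, num) interprets start/stop as exponents of base 10
log_values = np.logspace(1, 3, 4)
manual = 10 ** np.linspace(1, 3, 4)
print(np.allclose(log_values, manual))  # True

# A different base can be chosen, e.g. powers of two: 2**0, 2**1, ..., 2**4
powers_of_two = np.logspace(0, 4, 5, base=2)
print(powers_of_two)
```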

- We can create arrays filled with zeros or ones using `numpy.zeros` and `numpy.ones`. By passing a tuple to the `shape` argument instead of a single integer, we can also generate multidimensional arrays:

In [25]:
np.zeros(5)

array([0., 0., 0., 0., 0.])

In [26]:
np.ones((5, 2, 3))

array([[[1., 1., 1.],
        [1., 1., 1.]],

       [[1., 1., 1.],
        [1., 1., 1.]],

       [[1., 1., 1.],
        [1., 1., 1.]],

       [[1., 1., 1.],
        [1., 1., 1.]],

       [[1., 1., 1.],
        [1., 1., 1.]]])

In [27]:
c = np.full((2,2,2,2), 4)  # Create a 4D array in which every element is 4
print(c)

[[[[4 4]
   [4 4]]

  [[4 4]
   [4 4]]]


 [[[4 4]
   [4 4]]

  [[4 4]
   [4 4]]]]


In [28]:
# The values are whatever was left in memory. Initializing arrays with zeros is usually safer.
np.empty(shape=(2, 3, 2))

array([[[ 4.68293136e-310,  0.00000000e+000],
        [ 5.83665407e-315,  0.00000000e+000],
        [ 1.30077963e-258,  2.10966031e-321]],

       [[-1.11190339e-262,  5.28614192e-308],
        [ 1.40406108e-309,  3.73305803e-301],
        [ 1.34164568e-301,  6.72812621e-310]]])

In [29]:
# np.empty is faster than creating an initialized array, but the difference rarely matters in practice
%timeit np.empty(shape=100_000)
%timeit np.ones(shape=100_000)

141 ns ± 1.34 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
12.5 μs ± 33.8 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
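When a new array should match the shape and data type of an existing one, the `*_like` variants (`np.zeros_like`, `np.ones_like`, `np.full_like`, and also `np.empty_like`) save us from passing both explicitly. A short sketch:

```python
import numpy as np

template = np.array([[1.5, 2.2, 3.1], [4.0, 5.2, 6.7]])

# The *_like functions copy shape and dtype from an existing array
zeros = np.zeros_like(template)
ones = np.ones_like(template)
sevens = np.full_like(template, 7)

print(zeros.shape, zeros.dtype)  # (2, 3) float64
print(sevens)
```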


In [30]:
d = np.eye(3)         # Create a 3x3 identity matrix
print(d)

[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]


In [31]:
e = np.random.random((2,2,2,2))     # Create a 4D array filled with random values in [0, 1)
print(e)                            # The values differ on every run

[[[[0.66813678 0.51223623]
   [0.99186461 0.10947994]]

  [[0.73995002 0.53591944]
   [0.03925507 0.30572672]]]


 [[[0.29953368 0.05070483]
   [0.77964292 0.1909645 ]]

  [[0.48108779 0.87617989]
   [0.56279217 0.5683731 ]]]]


You can read about other methods of creating arrays in the [documentation](http://docs.scipy.org/doc/numpy/user/basics.creation.html#arrays-creation).


## Data Types

Every NumPy array is a grid of elements of the same type.  
NumPy provides a wide range of numerical data types that you can use to construct arrays.  
When you create an array, NumPy tries to infer the data type automatically, but functions that construct arrays usually also include an optional argument to explicitly specify the data type.  

Here’s an example:


In [32]:
x = np.array([1, 2])   # Let numpy choose the datatype
print(x.dtype)         # Prints "int64"

int64


In [33]:
x = np.array([1.0, 2.0])   # Let numpy choose the datatype
print(x.dtype)             # Prints "float64"

float64


In [34]:
x = np.array([1, 2], dtype=np.int64)   # Force a particular datatype
print(x.dtype)

int64
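To convert an existing array to another data type, arrays provide the `astype` method. A minimal sketch (the example values are made up for illustration):

```python
import numpy as np

floats = np.array([1.7, -2.3, 3.9])

# astype returns a new array with the requested dtype;
# converting float to int truncates toward zero (no rounding)
ints = floats.astype(np.int64)
print(ints)  # [ 1 -2  3]

# The original array is unchanged
print(floats.dtype, ints.dtype)  # float64 int64
```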


### dtype

`dtype` provides information about the data type.  
Arrays can contain bools, ints, unsigned ints, floats, or complex numbers of various byte sizes.  
They can also store strings or Python objects, but this has very few practical use cases.


In [35]:
values = [0, 1, 2, 3, 4]
int_arr = np.array(values, dtype='int')
int_arr, int_arr.dtype

(array([0, 1, 2, 3, 4]), dtype('int64'))

If the specified `dtype` does not match the given values, NumPy will cast everything to that data type.


In [36]:
bool_arr = np.array(values, dtype='bool')
bool_arr, bool_arr.dtype

(array([False,  True,  True,  True,  True]), dtype('bool'))

If no explicit data type is specified, NumPy chooses a common data type that can represent all of the values.  
In the following example, everything is converted to a float, since integers can be represented as floats, but a float such as 2.5 cannot be represented as an integer without losing information.


In [37]:
values = [0, 1, 2.5, 3, 4]
float_arr = np.array(values)
float_arr, float_arr.dtype

(array([0. , 1. , 2.5, 3. , 4. ]), dtype('float64'))

Once the data type has been set, every value assigned to the array is converted to that type. Here, 2.5 is truncated to 2:


In [38]:
int_arr[1] = 2.5
int_arr, int_arr.dtype

(array([0, 2, 2, 3, 4]), dtype('int64'))

Because these are fixed-size data types (unlike Python's arbitrary-precision `int`), we once again have to consider issues such as overflow.


In [39]:
values = [0, 1, 2, 3, 4]
uint_arr = np.array(values, dtype='uint8')
uint_arr, uint_arr.dtype

(array([0, 1, 2, 3, 4], dtype=uint8), dtype('uint8'))
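Which values fit into a given type can be looked up with `np.iinfo` (integer types) and `np.finfo` (floating-point types). A short sketch:

```python
import numpy as np

# np.iinfo reports the representable range of an integer dtype
info = np.iinfo(np.uint8)
print(info.min, info.max)  # 0 255

# np.finfo does the same for floating-point types; eps is the gap
# between 1.0 and the next representable number
print(np.finfo(np.float32).eps, np.finfo(np.float64).eps)
```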

In [40]:
uint_arr[1] += 255
uint_arr



array([0, 0, 2, 3, 4], dtype=uint8)

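A `uint8` can only hold values from 0 to 255, so 1 + 255 = 256 does not fit and wraps around to 0. NumPy may emit a `RuntimeWarning` about the overflow, but it does not raise an error.

NumPy types can also lead to surprises when compared with standard Python types.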


In [41]:
print(type(uint_arr[0]), type(183))

<class 'numpy.uint8'> <class 'int'>


In [42]:
val = 1.2 - 1.0
arr = np.array([val], dtype=np.float32)
print(f'{val} == {arr[0]} -> {val == arr[0]}')

0.19999999999999996 == 0.20000000298023224 -> True


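Although the two printed values differ, the comparison returns `True`: in recent NumPy versions, the Python float is converted to `float32` before comparing. Exact equality checks on floating-point numbers are therefore unreliable. A more robust approach is to check whether the difference is smaller than a small tolerance `epsilon`: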


In [43]:
epsilon = 1e-6  # 1*10^(-6); 0.000001
abs(arr[0] - val) < epsilon

np.True_

For a deeper dive into why floating-point calculations can be inaccurate, see the [Python documentation](https://docs.python.org/3/tutorial/floatingpoint.html).

You can read all about NumPy data types in the [documentation](https://numpy.org/doc/stable/reference/arrays.dtypes.html).  


## Working with Arrays

Arrays can be combined **element-wise** using the standard operators `+`, `-`, `*`, `/`, and `**`:


In [44]:
x = np.array([[1,2],[3,4]], dtype=np.float64)
y = np.array([[5,6],[7,8]], dtype=np.float64)
print(x)
print(y)

[[1. 2.]
 [3. 4.]]
[[5. 6.]
 [7. 8.]]


In [45]:
# Elementwise sum; both produce the array
# [[ 6.0  8.0]
#  [10.0 12.0]]
print(x + y)
print(np.add(x, y))

[[ 6.  8.]
 [10. 12.]]
[[ 6.  8.]
 [10. 12.]]


In [46]:
# Elementwise difference; both produce the array
# [[-4.0 -4.0]
#  [-4.0 -4.0]]
print(x - y)
print(np.subtract(x, y))

[[-4. -4.]
 [-4. -4.]]
[[-4. -4.]
 [-4. -4.]]


In [47]:
# Elementwise product; both produce the array
# [[ 5.0 12.0]
#  [21.0 32.0]]
print(x * y)
print(np.multiply(x, y))

[[ 5. 12.]
 [21. 32.]]
[[ 5. 12.]
 [21. 32.]]


In [48]:
# Elementwise division; both produce the array
# [[ 0.2         0.33333333]
#  [ 0.42857143  0.5       ]]
print(x / y)
print(np.divide(x, y))

[[0.2        0.33333333]
 [0.42857143 0.5       ]]
[[0.2        0.33333333]
 [0.42857143 0.5       ]]


In [49]:
# Elementwise square root; produces the array
# [[ 1.          1.41421356]
#  [ 1.73205081  2.        ]]
print(np.sqrt(x))

[[1.         1.41421356]
 [1.73205081 2.        ]]


In [50]:
x = np.array([1,2,3])
y = np.array([4,5,6])
print(x)
print(y)

[1 2 3]
[4 5 6]


In [51]:
x + 2 * y

array([ 9, 12, 15])

In [52]:
x ** y

array([  1,  32, 729])

With `@` you can even perform matrix multiplication.  
In the case of 1D arrays, this corresponds to the inner product between two vectors.


In [53]:
x @ y

np.int64(32)

In [54]:
# That's the same as
np.sum(x * y)

np.int64(32)

In [55]:
x.dot(y)

np.int64(32)

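> **Note:** For Python lists, these operators mean something completely different: `+` concatenates two lists, `*` with an integer repeats a list, and `@` is not defined for lists at all!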


Note that in contrast to MATLAB, `*` performs element-wise multiplication rather than matrix multiplication.  
Instead, we use the `dot` function to compute inner products of vectors, multiply a vector by a matrix, and multiply matrices.  

`dot` is available both as a function in the NumPy module and as an instance method of array objects:


In [56]:
x = np.array([[1,2],[3,4]])
y = np.array([[5,6],[7,8]])

v = np.array([9,10])
w = np.array([11, 12])

In [57]:
# Inner product of vectors; both produce 219
print(v.dot(w))
print(np.dot(v, w))

219
219


In [58]:
# Matrix / vector product; both produce the rank 1 array [29 67]
print(x.dot(v))
print(np.dot(x, v))

[29 67]
[29 67]


In [59]:
# Matrix / matrix product; both produce the rank 2 array
# [[19 22]
#  [43 50]]
print(x.dot(y))
print(np.dot(x, y))

[[19 22]
 [43 50]]
[[19 22]
 [43 50]]
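For 1D and 2D arrays, the `@` operator (which calls `np.matmul`) gives the same results as `dot`; for arrays with more than two dimensions the two functions follow different rules, so it is worth checking the documentation there. A quick sketch using the arrays from above:

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])
y = np.array([[5, 6], [7, 8]])
v = np.array([9, 10])
w = np.array([11, 12])

# Vector @ vector: inner product, same as np.dot
print(v @ w, np.dot(v, w))  # 219 219

# Matrix @ vector and matrix @ matrix agree with np.dot as well
print(np.array_equal(x @ v, np.dot(x, v)))  # True
print(np.array_equal(x @ y, np.dot(x, y)))  # True
```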


## Applying Functions to Arrays

While functions from the `math` module such as `sin` or `exp` can only be applied to single numbers, the corresponding functions from the `numpy` module can be applied directly to arrays.  
**The function is applied to every element of the array**, and the result is a new array of the same shape:


In [60]:
phi = np.linspace(0, 2 * np.pi, 10) # 10 values between 0 and 2π
np.sin(phi) # The sine of each of these values

array([ 0.00000000e+00,  6.42787610e-01,  9.84807753e-01,  8.66025404e-01,
        3.42020143e-01, -3.42020143e-01, -8.66025404e-01, -9.84807753e-01,
       -6.42787610e-01, -2.44929360e-16])
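To see the difference from the `math` module, here is a sketch comparing an explicit loop over `math.sin` with a single call to `np.sin`:

```python
import math

import numpy as np

phi = np.linspace(0, 2 * np.pi, 10)

# math.sin only accepts a single number, so we need a Python loop ...
looped = np.array([math.sin(p) for p in phi])

# ... whereas np.sin is applied to every element at once
vectorized = np.sin(phi)

print(np.allclose(looped, vectorized))  # True
print(vectorized.shape == phi.shape)    # True: same shape as the input

try:
    math.sin(phi)  # math functions do not accept arrays with several elements
except TypeError as err:
    print("TypeError:", err)
```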

In [61]:
arr = np.arange(-9, 9)
arr

array([-9, -8, -7, -6, -5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5,  6,  7,
        8])

In [62]:
np.log(arr)



array([       nan,        nan,        nan,        nan,        nan,
              nan,        nan,        nan,        nan,       -inf,
       0.        , 0.69314718, 1.09861229, 1.38629436, 1.60943791,
       1.79175947, 1.94591015, 2.07944154])
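The logarithm is undefined for negative numbers, so NumPy returns `nan` there, and `log(0)` gives `-inf`. Instead of raising an error, NumPy emits a `RuntimeWarning`. If these values are expected, `np.errstate` can silence the warnings temporarily. A short sketch:

```python
import numpy as np

arr = np.arange(-9, 9)

# np.errstate temporarily changes how floating-point errors are reported;
# here we silence the warnings because nan / -inf are expected
with np.errstate(divide="ignore", invalid="ignore"):
    logs = np.log(arr)

print(np.isnan(logs[:9]).all())  # True: log of a negative number
print(logs[9])                   # -inf: log(0)
```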

In [63]:
np.exp(arr)

array([1.23409804e-04, 3.35462628e-04, 9.11881966e-04, 2.47875218e-03,
       6.73794700e-03, 1.83156389e-02, 4.97870684e-02, 1.35335283e-01,
       3.67879441e-01, 1.00000000e+00, 2.71828183e+00, 7.38905610e+00,
       2.00855369e+01, 5.45981500e+01, 1.48413159e+02, 4.03428793e+02,
       1.09663316e+03, 2.98095799e+03])

In [64]:
np.sin(arr)

array([-0.41211849, -0.98935825, -0.6569866 ,  0.2794155 ,  0.95892427,
        0.7568025 , -0.14112001, -0.90929743, -0.84147098,  0.        ,
        0.84147098,  0.90929743,  0.14112001, -0.7568025 , -0.95892427,
       -0.2794155 ,  0.6569866 ,  0.98935825])

`np.sign` returns -1 for negative values, +1 for positive values, and 0 for zero:

In [65]:
np.sign(arr)

array([-1, -1, -1, -1, -1, -1, -1, -1, -1,  0,  1,  1,  1,  1,  1,  1,  1,
        1])

NumPy provides many useful functions for performing computations on arrays; one of the most useful is `sum`:


In [66]:
x = np.array([[1,2],[3,4]])

print(np.sum(x))  # Compute sum of all elements; prints "10"
print(np.sum(x, axis=0))  # Compute sum of each column; prints "[4 6]"
print(np.sum(x, axis=1))  # Compute sum of each row; prints "[3 7]"

10
[4 6]
[3 7]
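The `axis` argument names the dimension that is collapsed: `axis=0` sums down the columns, `axis=1` across the rows. If the result should keep the reduced dimension (with length 1), pass `keepdims=True`. A short sketch:

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])

row_sums = np.sum(x, axis=1)                      # shape (2,): the axis is removed
row_sums_kept = np.sum(x, axis=1, keepdims=True)  # shape (2, 1): the axis is kept with length 1

print(row_sums.shape, row_sums_kept.shape)  # (2,) (2, 1)
print(row_sums_kept)
```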


NumPy also provides functions that compute statistics of an array, such as the mean and the standard deviation:


In [67]:
x = np.linspace(0, 10, 100)
np.sum(x), np.mean(x), np.std(x)

(np.float64(500.0), np.float64(5.0), np.float64(2.9157646512850626))

These functions generalize to multiple dimensions by specifying the axis along which the computation should be performed:


In [68]:
x = np.array([[ 1, 2 ], [ 3, 4 ]])
np.sum(x), np.sum(x, axis=0), np.sum(x, axis=1)

(np.int64(10), array([4, 6]), array([3, 7]))

Apart from computing mathematical functions with arrays, we often need to reshape or otherwise manipulate data within arrays.  
The simplest example of this type of operation is transposing a matrix. To transpose a matrix, simply use the `T` attribute of an array object:


In [69]:
x = np.array([[1,2], [3,4]])
print(x)    # Prints "[[1 2]
            #          [3 4]]"
print(x.T)  # Prints "[[1 3]
            #          [2 4]]"

[[1 2]
 [3 4]]
[[1 3]
 [2 4]]


In [70]:
# Note that taking the transpose of a rank 1 array does nothing:
v = np.array([1, 2, 3])
print(v)    # Prints "[1 2 3]"
print(v.T)  # Prints "[1 2 3]"

[1 2 3]
[1 2 3]
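If you actually need a row or a column vector, give the 1D array an explicit second axis, for example with `reshape` or `np.newaxis`. A short sketch:

```python
import numpy as np

v = np.array([1, 2, 3])

# A 1D array has only one axis, so .T cannot turn it into a column.
# Adding an explicit second axis produces a 2D row or column vector.
row = v.reshape(1, -1)         # shape (1, 3)
column = v.reshape(-1, 1)      # shape (3, 1)
column_alt = v[:, np.newaxis]  # same as reshape(-1, 1)

print(row.shape, column.shape, row.T.shape)  # (1, 3) (3, 1) (3, 1)
```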


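Whenever possible, use vectorized operations and universal functions (ufuncs) instead of explicit Python loops!  

They are generally faster than building the result by hand. The following cell creates the same sequence `1 2 3 1 2 3 ...` in three ways; the variant that builds a Python list first is the slowest: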


In [71]:
func1 = lambda: np.repeat(np.arange(1, 4), 30).reshape(3, -1).T.flatten()  # noqa: E731
func2 = lambda: np.arange(3 * 30) % 3 + 1  # noqa: E731
func3 = lambda: np.array([[1, 2, 3] for _ in range(30)]).flatten()  # noqa: E731

print(func1())
print(func2())
print(func3())

%timeit func1()
%timeit func2()
%timeit func3()

[1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1
 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2
 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3]
[1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1
 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2
 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3]
[1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1
 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2
 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3]


2.04 μs ± 25.5 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.74 μs ± 10.6 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
6.41 μs ± 72.9 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)


### Random Values

`np.random` includes a variety of functions for generating arrays filled with random values from different probability distributions.


In [72]:
np.random.random((3, 3))

array([[0.59890043, 0.36160733, 0.35258061],
       [0.4445614 , 0.6260483 , 0.50785895],
       [0.89112932, 0.77679426, 0.86881184]])

In [73]:
np.random.randint(0, 10, (5, 5))

array([[7, 0, 5, 3, 7],
       [5, 6, 3, 2, 1],
       [6, 0, 1, 8, 6],
       [9, 6, 6, 5, 9],
       [3, 5, 8, 9, 4]])
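The functions above belong to NumPy's legacy random API. For new code, the NumPy documentation recommends `np.random.default_rng`, which returns a `Generator` object; passing a seed makes the results reproducible. A minimal sketch (the seed value 42 is an arbitrary choice):

```python
import numpy as np

# A Generator with a fixed seed produces the same numbers on every run
rng = np.random.default_rng(seed=42)

uniform = rng.random((3, 3))                 # floats in [0, 1)
integers = rng.integers(0, 10, size=(5, 5))  # ints in [0, 10), upper bound excluded

print(uniform)
print(integers)
```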

With `np.random.randint` and a boolean dtype, you can generate random boolean arrays!
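A minimal sketch (the shape `(3, 4)` is arbitrary): since the upper bound is exclusive, `randint(0, 2, ...)` draws 0 or 1, which is stored as `False` or `True`:

```python
import numpy as np

# Draw 0 or 1 (upper bound 2 is excluded) and store the result as booleans
mask = np.random.randint(0, 2, size=(3, 4), dtype=bool)
print(mask)
print(mask.dtype)  # bool
```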


### Repeating Values

`np.repeat` repeats each element of an array a given number of times. Without an `axis` argument, a multidimensional input is flattened first:


In [74]:
np.repeat(3, 5)

array([3, 3, 3, 3, 3])

In [75]:
np.repeat([[1,2], [3,4]], 2)

array([1, 1, 2, 2, 3, 3, 4, 4])
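`np.repeat` also accepts an `axis` argument that repeats whole rows or columns instead of flattening the input, and the number of repetitions can be given per element. A short sketch:

```python
import numpy as np

m = np.array([[1, 2], [3, 4]])

# Without axis, the input is flattened first (as above)
print(np.repeat(m, 2))          # [1 1 2 2 3 3 4 4]

# With axis=0 every row is repeated; with axis=1 every column
print(np.repeat(m, 2, axis=0))  # rows: [1 2], [1 2], [3 4], [3 4]
print(np.repeat(m, 2, axis=1))  # rows: [1 1 2 2], [3 3 4 4]

# The number of repetitions can also differ per element
print(np.repeat([1, 2, 3], [3, 0, 1]))  # [1 1 1 3]
```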

`np.tile` is another way to repeat values: instead of repeating each element, it repeats the whole array.


In [76]:
print('Repeat:', np.repeat([1, 2, 3], 3))
print('Tile:', np.tile([1, 2, 3], 3))

Repeat: [1 1 1 2 2 2 3 3 3]
Tile: [1 2 3 1 2 3 1 2 3]
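The number of repetitions passed to `np.tile` can also be a tuple, which tiles the array along several dimensions. A short sketch with `(2, 2)`:

```python
import numpy as np

# reps=(2, 2): tile twice along the rows and twice along the columns
tiled = np.tile([1, 2, 3], (2, 2))
print(tiled)
# [[1 2 3 1 2 3]
#  [1 2 3 1 2 3]]
print(tiled.shape)  # (2, 6)
```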


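### Reshape

`reshape` returns the same elements arranged in a new shape; the total number of elements must stay the same.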

In [77]:
a = np.arange(start=2, stop=14)
print(a.shape)
a

(12,)


array([ 2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13])

In [78]:
b = a.reshape(3, 4)
b

array([[ 2,  3,  4,  5],
       [ 6,  7,  8,  9],
       [10, 11, 12, 13]])

Passing -1 for one dimension lets NumPy infer its size from the total number of elements:


In [79]:
a.reshape(-1, 2)

array([[ 2,  3],
       [ 4,  5],
       [ 6,  7],
       [ 8,  9],
       [10, 11],
       [12, 13]])
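Because `reshape` only rearranges existing elements, the new shape must contain exactly as many elements as the array, and at most one dimension may be `-1`. A sketch of both error cases using the 12-element array `a` from above:

```python
import numpy as np

a = np.arange(start=2, stop=14)  # 12 elements

# 12 elements cannot be split evenly into 5 rows
try:
    a.reshape(5, -1)
except ValueError as err:
    print("ValueError:", err)

# Only one dimension may be -1; otherwise the shape would be ambiguous
try:
    a.reshape(-1, -1)
except ValueError as err:
    print("ValueError:", err)

# Valid: 2 x 2 x 3 = 12
print(a.reshape(2, 2, -1).shape)  # (2, 2, 3)
```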

Example: We want to create a 2D array where each row is `[1, 2, 3]`, and the array should have 10 rows.


In [80]:
print(np.repeat(np.arange(1, 4), 10).reshape(-1, 10).T, "\n")
print(np.tile(np.arange(1, 4), 10).reshape(10, -1))

[[1 2 3]
 [1 2 3]
 [1 2 3]
 [1 2 3]
 [1 2 3]
 [1 2 3]
 [1 2 3]
 [1 2 3]
 [1 2 3]
 [1 2 3]] 

[[1 2 3]
 [1 2 3]
 [1 2 3]
 [1 2 3]
 [1 2 3]
 [1 2 3]
 [1 2 3]
 [1 2 3]
 [1 2 3]
 [1 2 3]]


### Comparing Arrays


In [81]:
epsilon = 0.000000000001
a = np.zeros((3, 3))
a[0, 0] += epsilon  # NumPy-style indexing, equivalent to the list-style a[0][0]

b = np.zeros((3, 3))
print(a)
print(b)

[[1.e-12 0.e+00 0.e+00]
 [0.e+00 0.e+00 0.e+00]
 [0.e+00 0.e+00 0.e+00]]
[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]


In [82]:
a == b

array([[False,  True,  True],
       [ True,  True,  True],
       [ True,  True,  True]])

In [83]:
(a == b).all()

np.False_

In [84]:
c = np.array([])
d = np.array([1])
(c == d).all()

np.True_

Issues with this approach:
* If one array is empty and the other contains a single element, the result is `True`: the comparison `c == d` broadcasts to an empty array, and `all` returns `True` for an empty array.
* If `a` and `b` do not have the same shape and cannot be broadcast together, the comparison raises an error.

Instead, use NumPy's built-in functions: `np.array_equal` checks that shape and elements are identical, while `np.allclose` and `np.isclose` compare floating-point values within a tolerance.



In [85]:
np.array_equal(c, d)

False

In [86]:
np.allclose(a, b)

True

In [87]:
np.isclose(a, b)

array([[ True,  True,  True],
       [ True,  True,  True],
       [ True,  True,  True]])
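`np.allclose` returns `True` here even though `a[0, 0]` differs from zero. Both `isclose` and `allclose` treat two values as equal if `|a - b| <= atol + rtol * |b|`, with default tolerances `rtol=1e-05` and `atol=1e-08`; the difference of `1e-12` is far below `atol`. A short sketch showing how the tolerances change the outcome:

```python
import numpy as np

a = np.zeros((3, 3))
a[0, 0] += 1e-12
b = np.zeros((3, 3))

# Default tolerances: the tiny difference is ignored
print(np.allclose(a, b))  # True

# With atol=0 (and b == 0, so rtol * |b| is 0 too) the difference becomes visible
print(np.allclose(a, b, atol=0.0))  # False
print(np.isclose(a, b, atol=0.0))   # False only at position [0, 0]
```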

The complete list of mathematical functions provided by NumPy can be found in the [documentation](https://numpy.org/doc/stable/reference/routines.math.html).


---


Exercise: [**Exercise 3.1: Numpy**](../03_data/exercises/01_numpy.ipynb)

Next: [**Chapter 3.2: Pandas**](../03_data/02_pandas.ipynb)