## NumPy Demo Examples
Demo examples based on and selected from material by UZH ZI & David Pinezich.

In [1]:
import numpy as np

np.__version__


'1.26.2'

## NumPy Arrays
* Python's vanilla lists are heterogeneous: Each item in the list can be of a different data type
 * This flexibility comes at a cost: Each item in the list must carry its own type info and other information
 * It is much more efficient to store data in a fixed-type array (all elements are of the same type)
* NumPy arrays are homogeneous: Each item in the array is of the same type
 * They are much more efficient for storing and manipulating data
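
The difference can be seen with a small sketch. The variable names (`mixed`, `ints`, `upcast`) are made up for illustration; the point is that a list keeps each item's own type, while `np.array()` picks one common type for all elements:

```python
import numpy as np

# A Python list may mix types freely
mixed = [1, "two", 3.0]
print([type(item).__name__ for item in mixed])  # ['int', 'str', 'float']

# A NumPy array stores a single dtype for all of its elements
ints = np.array([1, 2, 3])
print(ints.dtype.kind)   # 'i' (an integer type; the exact width is platform dependent)

# Mixing integers and floats upcasts every element to a common float type
upcast = np.array([1, 2, 3.5])
print(upcast.dtype)      # float64
```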

## Creating NumPy Arrays
* Use the `np.array()` function to create a NumPy array from a Python list:

In [67]:
example = np.array([0,1,2,5])
example


array([0, 1, 2, 5])

## Multidimensional NumPy Arrays
* _One-dimensional_ array: we only need one coordinate to address a single item, namely an integer index
* _Multidimensional_ array: we now need multiple indices to address a single item
 * For an $n$-dimensional array we need up to $n$ indices to address a single item
 * We're going to mainly work with two-dimensional arrays in this course, i.e. $n=2$ 

In [98]:
twodim = np.array([[1,2,3],
                   [4,5,6],
                   [7,8,9]])
one = np.array([1,2,3])
print(one[:1])

[1]


## Array Indexing
* Array indexing for one-dimensional arrays works as usual: `onedim[0]`
* Accessing items in a two-dimensional array requires you to specify two indices: `twodim[0,1]`
* First index is the row number (here `0`), second index is the column number (here `1`)
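
A short sketch of both indexing forms, reusing the names `onedim` and `twodim` from the bullets above (the array contents are chosen only for illustration):

```python
import numpy as np

onedim = np.array([10, 20, 30])
twodim = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

first = onedim[0]        # first item of the one-dimensional array: 10
item = twodim[0, 1]      # row 0, column 1: 2
last = twodim[-1, -1]    # negative indices count from the end: 9
print(first, item, last)
```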
 
### NumPy Array Attributes
* The type of a NumPy array is `numpy.ndarray` ($n$-dimensional array):

In [4]:
example = np.array([0,1,2,3])
type(example)

numpy.ndarray

* Useful array attributes
 * `ndim`: The number of dimensions, e.g. for a two-dimensional array it is 2
 * `shape`: Tuple containing the size of each dimension
 * `size`: The total size of the array (total number of elements)

In [9]:
rng = np.random.RandomState(41) # Ensure that the same random numbers are generated each time we run this code
x1 = rng.randint(10, size=6) # One-dimensional array
x2 = rng.randint(10, size=(3, 4)) # Two-dimensional array
print("x2 ndim: ", x2.ndim)
print("x2 shape:", x2.shape)
print("x2 size: ", x2.size)
print("x2 dtype: ", x2.dtype)

x2 ndim:  2
x2 shape: (3, 4)
x2 size:  12
x2 dtype:  int32


## Creating Arrays from Scratch
* NumPy provides a wide range of functions for the creation of arrays:<br>
  https://docs.scipy.org/doc/numpy-1.15.4/reference/routines.array-creation.html#routines-array-creation 
 * For example: `np.arange`, `np.zeros`, `np.ones`, `np.linspace`, etc.
* NumPy also provides functions to create arrays filled with random data:<br>
  https://docs.scipy.org/doc/numpy-1.15.1/reference/routines.random.html
 * For example: `np.random.random`, `np.random.randint`, etc.

In [5]:
np.zeros(10, dtype=int)

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

In [7]:
np.full((3, 5), 3.14)

array([[3.14, 3.14, 3.14, 3.14, 3.14],
       [3.14, 3.14, 3.14, 3.14, 3.14],
       [3.14, 3.14, 3.14, 3.14, 3.14]])

In [8]:
np.arange(0, 20, 2)

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])

In [10]:
np.random.random((3, 3))

array([[0.41729229, 0.25587389, 0.96706724],
       [0.16044103, 0.28557146, 0.95684705],
       [0.54043281, 0.93135554, 0.44030644]])

In [11]:
np.random.randint(0, 10, (3, 3))

array([[3, 9, 0],
       [5, 7, 4],
       [0, 2, 1]])
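
The cells above do not show `np.ones` and `np.linspace`, so here is a minimal sketch of both. The last two lines use `np.random.default_rng`, NumPy's newer random generator interface; it is not part of the original material and is included only as an assumed alternative for reproducible random arrays:

```python
import numpy as np

# 2x3 array filled with ones (float by default)
ones = np.ones((2, 3))

# Five evenly spaced values from 0 to 1, both endpoints included
points = np.linspace(0, 1, 5)
print(points)   # [0.   0.25 0.5  0.75 1.  ]

# Assumption: seeded Generator as an alternative to np.random.randint
rng = np.random.default_rng(0)
random_ints = rng.integers(0, 10, size=(3, 3))  # integers in [0, 10)
```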

## Array Slicing: One-Dimensional Subarrays
* The NumPy slicing syntax follows that of the standard Python list: `x[start:stop:step]`

In [13]:
x = np.arange(10)
x

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [14]:
x[:5]

array([0, 1, 2, 3, 4])

In [15]:
x[5:]

array([5, 6, 7, 8, 9])

In [16]:
x[::-1]

array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])
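
The examples above each use only one or two of the three slice parts. As a small additional sketch, `start`, `stop`, and `step` can be combined:

```python
import numpy as np

x = np.arange(10)

every_other = x[1:8:2]    # from index 1 up to (not including) 8, step 2
print(every_other)        # [1 3 5 7]

backwards = x[5::-2]      # start at index 5 and walk backwards in steps of 2
print(backwards)          # [5 3 1]
```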

## Array Slicing: Multidimensional Subarrays
* Let `x2` be a two-dimensional NumPy array. Multiple slices are now separated by commas: `x2[start:stop:step, start:stop:step]`

In [None]:
x2 = np.array([[9, 7, 5, 8], [3, 3, 2, 6], [0, 4, 6, 9]])
x2

array([[9, 7, 5, 8],
       [3, 3, 2, 6],
       [0, 4, 6, 9]])

In [18]:
x2[:2, :3]

array([[9, 7, 5],
       [3, 3, 2]])

In [19]:
x2[:3, ::2] # All rows, every other column

array([[9, 5],
       [3, 2],
       [0, 6]])

In [20]:
x2[:, 0] # Select the first column of x2

array([9, 3, 0])

In [21]:
x2[1, :] # Select the second row of x2

array([3, 3, 2, 6])

In [22]:
x2[1] # Select the second row of x2

array([3, 3, 2, 6])

## Array Views and Copies
* With Python lists, the slices will be _copies_: If we modify the slice, only the copy gets changed
* With NumPy arrays, the slices will be _direct views_: If we modify the subarray, the original array gets changed, too
 * Very useful: When working with large datasets, we don't need to copy any data (costly operation)
* Creating copies: We can use the `copy()` method of a slice to create a copy of the specific subarray
 * Note: The type of a slice is again `numpy.ndarray`

In [11]:
x2_sub_copy = x2[:2, :2].copy()
x2_sub_copy

array([[9, 7],
       [3, 3]])

In [12]:
x2_sub_copy[0, 0] = 42

In [13]:
x2

array([[9, 7, 5, 8],
       [3, 3, 2, 6],
       [0, 4, 6, 9]])

In [14]:
x2_sub_copy

array([[42,  7],
       [ 3,  3]])
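
For contrast, the following sketch shows what happens without `copy()`. The names `original` and `view` are illustrative; `np.shares_memory` is used only to confirm that both arrays refer to the same data:

```python
import numpy as np

original = np.array([[9, 7, 5, 8],
                     [3, 3, 2, 6],
                     [0, 4, 6, 9]])

view = original[:2, :2]     # a slice is a view, no data is copied
view[0, 0] = 42             # writing through the view ...

print(original[0, 0])       # ... changes the original array as well: 42
print(np.shares_memory(original, view))          # True
print(np.shares_memory(original, view.copy()))   # False
```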

## Array Concatenation and Splitting
* Concatenation, or joining of two or more arrays in NumPy, can be accomplished through the functions `np.concatenate`, `np.vstack`, and `np.hstack`
 * Join multiple two-dimensional arrays: `np.concatenate([twodim1, twodim2,…], axis=0)`
   * A two-dimensional array has two axes: The first running vertically downwards across rows (axis `0`), and the second running horizontally across columns (axis `1`)
* The opposite of concatenation is splitting, which is provided by the functions `np.split`, `np.hsplit` (split horizontally), and `np.vsplit` (split vertically)
 * For each of these we can pass a list of indices giving the split points

In [21]:
x = np.array([1, 2, 3])
y = np.array([3, 2, 1])
np.concatenate([x, y])

array([1, 2, 3, 3, 2, 1])

In [22]:
grid = np.array([[1, 2, 3], [4, 5, 6]])
np.concatenate([grid, grid])

array([[1, 2, 3],
       [4, 5, 6],
       [1, 2, 3],
       [4, 5, 6]])

In [24]:
np.concatenate([grid, grid, grid, grid], axis=1)

array([[1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3],
       [4, 5, 6, 4, 5, 6, 4, 5, 6, 4, 5, 6]])

In [27]:
x = np.array([1, 2, 3])
grid = np.array([[9, 8, 7],
                 [6, 5, 4]])

np.vstack([x, grid])

array([[1, 2, 3],
       [9, 8, 7],
       [6, 5, 4]])

In [28]:
y = np.array([[99],
              [99]])

np.hstack([grid, y])

array([[ 9,  8,  7, 99],
       [ 6,  5,  4, 99]])

In [29]:
grid = np.arange(16).reshape((4, 4))
grid

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])

In [33]:
left, right = np.hsplit(grid, [2]) # hsplit cuts along the column axis, giving a left and a right part

In [34]:
left

array([[ 0,  1],
       [ 4,  5],
       [ 8,  9],
       [12, 13]])

In [35]:
right

array([[ 2,  3],
       [ 6,  7],
       [10, 11],
       [14, 15]])
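
The other two splitting functions work the same way. A minimal sketch, with variable names chosen for illustration: `np.vsplit` cuts along the row axis, and `np.split` with a list of $k$ split points returns $k+1$ subarrays:

```python
import numpy as np

grid = np.arange(16).reshape((4, 4))

# Split after row index 2: rows 0-1 and rows 2-3
top, bottom = np.vsplit(grid, [2])
print(top)
print(bottom)

# Two split points (3 and 5) give three one-dimensional subarrays
x = np.array([1, 2, 3, 99, 99, 3, 2, 1])
x1, x2, x3 = np.split(x, [3, 5])
print(x1, x2, x3)   # [1 2 3] [99 99] [3 2 1]
```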

## Aggregations
* If we want to compute summary statistics for the data in question, aggregates are very useful
  * Common summary statistics: mean, standard deviation, median, minimum, maximum, quantiles, etc.
* NumPy provides fast built-in aggregation functions for working with arrays:

In [45]:
x = np.random.random(10000)
%timeit np.max(x) # NumPy aggregation function
%timeit max(x)    # Python built-in function

10.5 µs ± 289 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
764 µs ± 43.3 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)


* Summing values in an array:

In [46]:
%timeit np.sum(x) # NumPy aggregation function
%timeit sum(x)    # Python built-in function

13.2 µs ± 1.05 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
1.32 ms ± 457 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
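
Besides `np.max` and `np.sum`, the other summary statistics listed above have NumPy counterparts too. A small sketch on made-up data (the array `data` is an illustration only):

```python
import numpy as np

data = np.array([1.0, 2.0, 3.0, 4.0, 10.0])

mean = np.mean(data)              # 4.0
median = np.median(data)          # 3.0
std = np.std(data)                # population standard deviation, sqrt(10)
lowest, highest = np.min(data), np.max(data)   # 1.0 and 10.0
q25 = np.percentile(data, 25)     # first quartile: 2.0

print(mean, median, std, lowest, highest, q25)
```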


## Multidimensional Aggregates
* By default, each NumPy aggregation function will return the aggregate over the entire array
* Aggregation functions take an additional argument specifying the axis along which the aggregate is computed
 * For example, we can find the minimum value within each column by specifying `axis=0`:

In [46]:
twodim = np.array([[1,2,3],[0.12, -1, 0.41],[10,9,8]])
twodim.min(axis=0)

array([ 0.12, -1.  ,  0.41])
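
To complement the `axis=0` example, this sketch shows the default behaviour (no axis) and `axis=1`, which aggregates within each row. The result names are illustrative:

```python
import numpy as np

twodim = np.array([[1, 2, 3], [0.12, -1, 0.41], [10, 9, 8]])

overall_min = twodim.min()        # minimum over the entire array: -1.0
row_min = twodim.min(axis=1)      # one minimum per row
row_max = twodim.max(axis=1)      # one maximum per row

print(overall_min)
print(row_min)    # [ 1. -1.  8.]
print(row_max)    # [ 3.    0.41 10.  ]
```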

## Comparison Operators as ufuncs
* NumPy also implements comparison operators as element-wise ufuncs
* The result of these comparison operators is always an array with a Boolean data type:

In [47]:
np.array([1,2,3]) < 2

array([ True, False, False])

* It is also possible to do an element-by-element comparison of two arrays:

In [48]:
np.array([1,2,3]) < np.array([0,4,2])

array([False,  True, False])

## Working with Boolean Arrays: Counting Entries
* The `np.count_nonzero()` function will count the number of `True` entries in a Boolean array

In [49]:
nums = np.array([1,2,3,4,5])
np.count_nonzero(nums < 4)

3

* We can also use the `np.sum()` function to accomplish the same. In this case, `True` is interpreted as `1` and `False` as `0`:

In [50]:
np.sum(nums < 4)

3

* NumPy also implements bitwise logic operators as element-wise ufuncs
* We can use these bitwise logic operators to construct compound conditions (consisting of multiple conditions)

In [51]:
(nums < 2) | (nums > 3)

array([ True, False, False,  True,  True])
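
The other bitwise operators work in the same way. In this sketch, `&` combines two conditions with "and", and `~` negates a condition. The parentheses around each comparison are needed because `&`, `|`, and `~` bind more tightly than `<` and `>`:

```python
import numpy as np

nums = np.array([1, 2, 3, 4, 5])

between = (nums > 1) & (nums < 5)   # both conditions must hold
print(between)                      # [False  True  True  True False]

not_small = ~(nums < 4)             # element-wise negation
print(not_small)                    # [False False False  True  True]
```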

## Boolean Arrays as Masks
* In the previous slides we looked at aggregates computed directly on Boolean arrays
* Once we have a Boolean array, for example from a comparison, we can select the entries that meet the condition by using the Boolean array as a _mask_

In [52]:
x = np.array([[3,1,5],[10,32,100],[-1,3,4]])
x[x<5]

array([ 3,  1, -1,  3,  4])
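
Masks can also be built from compound conditions and used on the left-hand side of an assignment. The following is a sketch; the name `capped` and the threshold of 10 are chosen only for illustration:

```python
import numpy as np

x = np.array([[3, 1, 5], [10, 32, 100], [-1, 3, 4]])

# Select entries strictly between 2 and 10 (result is one-dimensional)
selected = x[(x > 2) & (x < 10)]
print(selected)      # [3 5 3 4]

# Assign through a mask: cap every entry above 10 at 10, working on a copy
capped = x.copy()
capped[capped > 10] = 10
print(capped)
```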