## NumPy Demo Examples
Demo examples based on and selected from material by UZH ZI & David Pinezich.

In [1]:
import numpy as np
np.__version__


'2.3.3'

## NumPy Arrays
* Python's vanilla lists are heterogeneous: Each item in the list can be of a different data type
 * Comes at a cost: Each item in the list must contain its own type info and other information 
 * It is much more efficient to store data in a fixed-type array (all elements are of the same type)
* NumPy arrays are homogeneous: Each item in the array is of the same type
 * They are much more efficient for storing and manipulating data
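As a minimal sketch of the difference (the names `mixed_list` and `homogeneous` are made up for this illustration), a Python list keeps each element's own type, whereas `np.array()` converts all elements to one common data type:

```python
import numpy as np

# A plain Python list may mix types; every element carries its own type info.
mixed_list = [1, "two", 3.0]
element_types = [type(item).__name__ for item in mixed_list]
print(element_types)  # ['int', 'str', 'float']

# A NumPy array has a single dtype; mixing ints and floats upcasts to float.
homogeneous = np.array([1, 2, 3.5])
print(homogeneous.dtype)  # float64
```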

## Creating NumPy Arrays
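* Use the `np.array()` function to create a NumPy array from a Python list, or a creation function such as `np.zeros()` or `np.arange()` to build one directly: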

In [14]:
example = np.zeros((4,4,4), dtype=int)  # Three-dimensional 4x4x4 array filled with integer zeros
print(example.ndim)
example

3


array([[[0, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0]],

       [[0, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0]],

       [[0, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0]],

       [[0, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0]]])

In [7]:
ex2 = np.arange(3,31,3)
ex2

array([ 3,  6,  9, 12, 15, 18, 21, 24, 27, 30])

## Multidimensional NumPy Arrays
* _One-dimensional_ array: we only need one coordinate to address a single item, namely an integer index
* _Multidimensional_ array: we now need multiple indices to address a single item
 * For an $n$-dimensional array we need up to $n$ indices to address a single item
 * We're going to mainly work with two-dimensional arrays in this course, i.e. $n=2$ 

In [12]:
twodim = np.array([[1,2,3],
                   [4,5,6],
                   [7,8,9]])

twodim[0,2] = 111
print(twodim.ndim)
print(twodim.shape)

2
(3, 3)


## Array Indexing
* Array indexing for one-dimensional arrays works as usual: `onedim[0]`
* Accessing items in a two-dimensional array requires you to specify two indices: `twodim[0,1]`
* First index is the row number (here `0`), second index is the column number (here `1`)
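The indexing rules above can be sketched as follows. The arrays `onedim_demo` and `twodim_demo` are hypothetical and exist only for this illustration:

```python
import numpy as np

onedim_demo = np.array([10, 20, 30])
twodim_demo = np.array([[1, 2, 3],
                        [4, 5, 6]])

first_item = onedim_demo[0]      # one index is enough -> 10
row0_col1 = twodim_demo[0, 1]    # row 0, column 1 -> 2
last_item = twodim_demo[-1, -1]  # negative indices count from the end -> 6
second_row = twodim_demo[1]      # a single index returns the whole row [4 5 6]
```

Supplying fewer indices than dimensions, as in the last line, returns a subarray rather than a single item.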
 
### NumPy Array Attributes
* The type of a NumPy array is `numpy.ndarray` ($n$-dimensional array):

In [9]:
example = np.array([0,1,2,3])
type(example)

numpy.ndarray

* Useful array attributes
 * `ndim`: The number of dimensions, e.g. for a two-dimensional array it is 2
 * `shape`: Tuple containing the size of each dimension
 * `size`: The total size of the array (total number of elements)
 * `dtype`: The data type shared by all elements of the array

In [10]:
rng = np.random.RandomState(41) # Ensure that the same random numbers are generated each time we run this code
x1 = rng.randint(10, size=6) # One-dimensional array
x2 = rng.randint(10, size=(3, 4)) # Two-dimensional array
print("x2 ndim: ", x2.ndim)
print("x2 shape:", x2.shape)
print("x2 size: ", x2.size)
print("x2 dtype: ", x2.dtype)

x2 ndim:  2
x2 shape: (3, 4)
x2 size:  12
x2 dtype:  int64


## Creating Arrays from Scratch
* NumPy provides a wide range of functions for the creation of arrays:<br>
  https://docs.scipy.org/doc/numpy-1.15.4/reference/routines.array-creation.html#routines-array-creation 
 * For example: `np.arange`, `np.zeros`, `np.ones`, `np.linspace`, etc.
* NumPy also provides functions to create arrays filled with random data:<br>
  https://docs.scipy.org/doc/numpy-1.15.1/reference/routines.random.html
 * For example: `np.random.random`, `np.random.randint`, etc.

In [5]:
np.zeros(10, dtype=int)

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

In [7]:
np.full((3, 5), 3.14)

array([[3.14, 3.14, 3.14, 3.14, 3.14],
       [3.14, 3.14, 3.14, 3.14, 3.14],
       [3.14, 3.14, 3.14, 3.14, 3.14]])

In [8]:
np.arange(0, 20, 2)

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])

In [10]:
np.random.random((3, 3))

array([[0.41729229, 0.25587389, 0.96706724],
       [0.16044103, 0.28557146, 0.95684705],
       [0.54043281, 0.93135554, 0.44030644]])

In [11]:
np.random.randint(0, 10, (3, 3))

array([[3, 9, 0],
       [5, 7, 4],
       [0, 2, 1]])
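The list above also names `np.ones` and `np.linspace`, which the cells do not show. A short sketch follows. It additionally uses `np.random.default_rng`, the newer generator interface, as an assumed alternative to the `np.random.*` functions above; the seed `0` and all variable names are arbitrary choices for this illustration.

```python
import numpy as np

ones_grid = np.ones((2, 3))        # 2x3 array of floating-point ones
even_steps = np.linspace(0, 1, 5)  # 5 evenly spaced values, both endpoints included

# Generator-based random numbers; seeding makes the results reproducible.
demo_rng = np.random.default_rng(0)
random_ints = demo_rng.integers(0, 10, size=(3, 3))  # integers in [0, 10)
random_floats = demo_rng.random((3, 3))              # floats in [0, 1)

print(even_steps)  # [0.   0.25 0.5  0.75 1.  ]
```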

## Array Slicing: One-Dimensional Subarrays
* The NumPy slicing syntax follows that of the standard Python list: `x[start:stop:step]`

In [15]:
x = np.arange(10)
x

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [21]:
print(x[::2])
print(x[1::2])
print(x[3::3])

[0 2 4 6 8]
[1 3 5 7 9]
[3 6 9]


In [22]:
x[5:]

array([5, 6, 7, 8, 9])

In [23]:
x[::-1]

array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])
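The cells above each leave out at least one of `start`, `stop`, and `step`. As a small sketch with all three given (the name `x_demo` is made up for this illustration):

```python
import numpy as np

x_demo = np.arange(10)

every_third = x_demo[2:9:3]      # start at index 2, stop before 9, step 3
backwards = x_demo[7:2:-2]       # negative step walks from index 7 down to 3
first_three = x_demo[:3]         # omitted start defaults to the beginning

print(every_third)  # [2 5 8]
print(backwards)    # [7 5 3]
print(first_three)  # [0 1 2]
```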

## Array Slicing: Multidimensional Subarrays
* Let `x2` be a two-dimensional NumPy array. Multiple slices are now separated by commas: `x2[start:stop:step, start:stop:step]`

In [24]:
x2 = np.array([[9, 7, 5, 8], [3, 3, 2, 6], [0, 4, 6, 9]])
x2

array([[9, 7, 5, 8],
       [3, 3, 2, 6],
       [0, 4, 6, 9]])

In [31]:
x2[::-1]

array([[0, 4, 6, 9],
       [3, 3, 2, 6],
       [9, 7, 5, 8]])

In [26]:
x2[::-1, ::-1]

array([[9, 6, 4, 0],
       [6, 2, 3, 3],
       [8, 5, 7, 9]])

In [29]:
x2[:3, ::2] # All rows, every other column, equal to x2[:, ::2]

array([[9, 5],
       [3, 2],
       [0, 6]])

In [30]:
x2[:, 0] # Select the first column of x2

array([9, 3, 0])

In [21]:
x2[1, :] # Select the second row of x2

array([3, 3, 2, 6])

In [32]:
x2[1] # Select the second row of x2

array([3, 3, 2, 6])

## Array Views and Copies
* With Python lists, slices are _copies_: If we modify the slice, the original list stays unchanged
* With NumPy arrays, slices are _views_ on the same data: If we modify the subarray, the original array gets changed, too
 * Very useful: When working with large datasets, we don't need to copy any data (costly operation)
* Creating copies: We can use the `copy()` method of a slice to create a copy of the specific subarray
 * Note: The type of a slice is again `numpy.ndarray`
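The difference between the three cases can be sketched side by side. The names `py_list` and `arr_demo` are hypothetical, and `np.shares_memory` is used only to confirm that the view and the original refer to the same data:

```python
import numpy as np

# Python list: the slice is a copy, so the original is untouched.
py_list = [0, 1, 2, 3, 4]
list_slice = py_list[:2]
list_slice[0] = 99
print(py_list)  # [0, 1, 2, 3, 4]

# NumPy array: the slice is a view, so the original changes as well.
arr_demo = np.arange(5)
arr_view = arr_demo[:2]
arr_view[0] = 99
print(arr_demo)  # [99  1  2  3  4]
views_share = np.shares_memory(arr_demo, arr_view)  # True

# An explicit copy is independent of the original again.
arr_copy = arr_demo[:2].copy()
arr_copy[1] = -1
print(arr_demo[1])  # 1
```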

In [33]:
x2_sub_copy = x2[:2, :2].copy()
x2_sub_copy

array([[9, 7],
       [3, 3]])

In [34]:
x2_sub_copy[0, 0] = 42

In [50]:
x2[x2<5] = 100
x2

array([[  9,   7,   5,   8],
       [100, 100, 100,   6],
       [100, 100,   6,   9]])

In [51]:
x2_sub_copy

array([[42,  7],
       [ 3,  3]])

## Array Concatenation and Splitting
* Concatenation, i.e. joining two or more arrays, can be accomplished in NumPy with the functions `np.concatenate`, `np.vstack`, and `np.hstack`
 * Join multiple two-dimensional arrays: `np.concatenate([twodim1, twodim2,…], axis=0)`
   * A two-dimensional array has two axes: The first running vertically downwards across rows (axis `0`), and the second running horizontally across columns (axis `1`)
* The opposite of concatenation is splitting, which is provided by the functions `np.split`, `np.hsplit` (split horizontally), and `np.vsplit` (split vertically)
 * For each of these we can pass a list of indices giving the split points

In [52]:
x = np.array([1, 2, 3])
y = np.array([3, 2, 1])
np.concatenate([x, y])

array([1, 2, 3, 3, 2, 1])

In [56]:
grid = np.array([[1, 2, 3], [4, 5, 6]])
np.concatenate([grid, grid], axis=0)

array([[1, 2, 3],
       [4, 5, 6],
       [1, 2, 3],
       [4, 5, 6]])

In [57]:
np.concatenate([grid, grid, grid, grid], axis=1)

array([[1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3],
       [4, 5, 6, 4, 5, 6, 4, 5, 6, 4, 5, 6]])

In [58]:
x = np.array([1, 2, 3])
grid = np.array([[9, 8, 7],
                 [6, 5, 4]])

np.vstack([x, grid])

array([[1, 2, 3],
       [9, 8, 7],
       [6, 5, 4]])

In [59]:
y = np.array([[99],
              [99]])

np.hstack([grid, y])

array([[ 9,  8,  7, 99],
       [ 6,  5,  4, 99]])

In [62]:
grid = np.arange(16).reshape((4, 4))
grid

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])

In [86]:
left, right = np.hsplit(grid, [2])  # hsplit cuts between columns, giving a left and a right part

In [87]:
left

array([[ 0,  1],
       [ 4,  5],
       [ 8,  9],
       [12, 13]])

In [88]:
right

array([[ 2,  3],
       [ 6,  7],
       [10, 11],
       [14, 15]])

In [89]:
top, mid, down = np.vsplit(grid, [1,2])

In [90]:
top

array([[0, 1, 2, 3]])

In [91]:
mid

array([[4, 5, 6, 7]])

In [93]:
down

array([[ 8,  9, 10, 11],
       [12, 13, 14, 15]])
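The general `np.split` function is mentioned above but not demonstrated. A minimal sketch, with made-up variable names: a list of indices gives the split points, while a single integer asks for that many equal parts along the chosen axis.

```python
import numpy as np

flat = np.arange(9)

# Split points 3 and 5 produce the pieces [0:3], [3:5], and [5:].
head, middle, tail = np.split(flat, [3, 5])
print(head, middle, tail)  # [0 1 2] [3 4] [5 6 7 8]

# An integer splits into that many equal parts (here along axis 0, the rows).
grid_demo = np.arange(16).reshape((4, 4))
top_half, bottom_half = np.split(grid_demo, 2, axis=0)
print(top_half.shape, bottom_half.shape)  # (2, 4) (2, 4)
```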

In [102]:
arrone = np.ones(shape=[10,10])
arrone[1:-1, 1:-1] = 0
arrone

array([[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 0., 0., 0., 0., 0., 0., 0., 0., 1.],
       [1., 0., 0., 0., 0., 0., 0., 0., 0., 1.],
       [1., 0., 0., 0., 0., 0., 0., 0., 0., 1.],
       [1., 0., 0., 0., 0., 0., 0., 0., 0., 1.],
       [1., 0., 0., 0., 0., 0., 0., 0., 0., 1.],
       [1., 0., 0., 0., 0., 0., 0., 0., 0., 1.],
       [1., 0., 0., 0., 0., 0., 0., 0., 0., 1.],
       [1., 0., 0., 0., 0., 0., 0., 0., 0., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]])

## Aggregations
* If we want to compute summary statistics for the data in question, aggregates are very useful
  * Common summary statistics: mean, standard deviation, median, minimum, maximum, quantiles, etc.
* NumPy provides fast built-in aggregation functions for working with arrays:

In [94]:
x = np.random.random(10000)
# %timeit np.max(x) # NumPy aggregation function
# %timeit max(x)    # Python built-in function

* Summing values in an array:

In [96]:
%timeit np.sum(x) # NumPy aggregation function
%timeit sum(x)    # Python built-in function

7.71 μs ± 139 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
793 μs ± 9.18 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
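The summary statistics listed at the start of this section each have a NumPy counterpart. A small sketch on a hypothetical five-element array `data`:

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5])

mean_value = np.mean(data)               # arithmetic mean -> 3.0
median_value = np.median(data)           # middle value -> 3.0
std_value = np.std(data)                 # population standard deviation, sqrt(2)
min_value, max_value = np.min(data), np.max(data)  # 1 and 5
first_quartile = np.percentile(data, 25) # 25th percentile -> 2.0

print(mean_value, median_value, min_value, max_value, first_quartile)
```

Note that `np.std` computes the population standard deviation by default; passing `ddof=1` gives the sample standard deviation instead.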


## Multidimensional Aggregates
* By default, each NumPy aggregation function will return the aggregate over the entire array
* Aggregation functions take an optional `axis` argument specifying the axis along which the aggregate is computed
 * For example, we can find the minimum value within each column by specifying `axis=0`:

In [108]:
twodim = np.array([[1,2,3],[0.12, -1, 0.41],[10,9,8]])
print(twodim.min())
print(twodim.min(axis=0))
print(twodim.min(axis=1))

-1.0
[ 0.12 -1.    0.41]
[ 1. -1.  8.]


## Comparison Operators as ufuncs
* NumPy also implements comparison operators as element-wise ufuncs
* The result of these comparison operators is always an array with a Boolean data type:

In [109]:
np.array([1,2,3]) < 2

array([ True, False, False])

* It is also possible to do an element-by-element comparison of two arrays:

In [110]:
np.array([1,2,3]) < np.array([0,4,2])

array([False,  True, False])
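Each comparison operator corresponds to a named NumPy ufunc, for instance `<` to `np.less` and `==` to `np.equal`. A brief sketch, with `values` as a made-up example array:

```python
import numpy as np

values = np.array([1, 2, 3])

with_operator = values < 2          # operator form
with_ufunc = np.less(values, 2)     # equivalent ufunc call
equal_to_two = np.equal(values, 2)  # same as values == 2

print(with_operator)       # [ True False False]
print(with_operator.dtype) # bool
```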

## Working with Boolean Arrays: Counting Entries
* The `np.count_nonzero()` function will count the number of `True` entries in a Boolean array

In [111]:
nums = np.array([1,2,3,4,5])
np.count_nonzero(nums < 4)

np.int64(3)

* We can also use the `np.sum()` function to accomplish the same. In this case, `True` is interpreted as `1` and `False` as `0`:

In [117]:
print(np.sum(nums < 4)) # number of entries < 4 (counts the True values)
print(np.sum(nums[nums <4])) # sum of all entries <4

3
6


* NumPy also implements bitwise logic operators as element-wise ufuncs
* We can use these bitwise logic operators to construct compound conditions (consisting of multiple conditions)

In [120]:
(nums < 2) | (nums > 3)

array([ True, False, False,  True,  True])
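Besides `|` (or), the operators `&` (and) and `~` (not) work the same way. The sketch below uses a hypothetical array `nums_demo`. The parentheses around each condition are needed because the bitwise operators bind more tightly than the comparisons, and Python's `and` keyword cannot be used on whole arrays:

```python
import numpy as np

nums_demo = np.array([1, 2, 3, 4, 5])

between = (nums_demo > 1) & (nums_demo < 5)  # both conditions must hold
not_small = ~(nums_demo < 3)                 # element-wise negation

print(between)    # [False  True  True  True False]
print(not_small)  # [False False  True  True  True]

# The keyword `and` needs a single truth value, which an array does not have.
try:
    (nums_demo > 1) and (nums_demo < 5)
    keyword_and_failed = False
except ValueError:
    keyword_and_failed = True
```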

## Boolean Arrays as Masks
* In the previous slides we looked at aggregates computed directly on Boolean arrays
* Once we have a Boolean array from, say, a comparison, we can select the entries that meet the condition by using the Boolean array as a _mask_

In [121]:
x = np.array([[3,1,5],[10,32,100],[-1,3,4]])
x[x<5]

array([ 3,  1, -1,  3,  4])
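A mask can also be stored in a variable and reused, for example to aggregate only the selected entries or to pick the complementary ones. A sketch under the assumption of a separate array `x_demo` holding the same values as above:

```python
import numpy as np

x_demo = np.array([[3, 1, 5], [10, 32, 100], [-1, 3, 4]])

mask = x_demo < 5            # Boolean array with the same shape as x_demo
selected = x_demo[mask]      # one-dimensional array of the matching entries
others = x_demo[~mask]       # entries that do not meet the condition

print(mask.shape)        # (3, 3)
print(selected.mean())   # 2.0
print(others)            # [  5  10  32 100]
```

Masking always returns a one-dimensional array, because the selected entries generally do not form a rectangular block.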