# Data are often represented as multi-dimensional arrays

# Disclaimer

### Many of the images are from https://jalammar.github.io/visual-numpy/

# Spreadsheet = 2D array

![](images/csv-sheet.png)

# Time series = 1D array

### e.g. audio waveform

![](images/audio-waveform.png)

# Bunch of EEGs from multiple leads = 2D array

![](images/eeg-waveforms.png)

# Repeated trials for EEG recordings from multiple leads = 3D array

![](images/batch-sequences.png)

# Grayscale image = 2D array

![](images/gray-image.png)

# Color image = 3D array

![](images/rgb-image.png)

# Series of color images = 4D array

### Hard to visualize, but easy to represent as a 4D array.

# We can represent multi-dimensional arrays with nested lists, but it's annoying and slow.
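
To make the annoyance concrete, here is a minimal sketch comparing column extraction from a nested list and from a NumPy array. The names `table`, `col_list`, `arr`, and `col_arr` are illustrative only and the values are made up.

```python
import numpy as np

# A small 3x2 table stored as a nested list (illustrative values).
table = [[1, 2], [3, 4], [5, 6]]

# Pulling out the second column of a nested list needs an explicit loop.
col_list = [row[1] for row in table]

# The same table as a NumPy array supports column slicing directly.
arr = np.array(table)
col_arr = arr[:, 1]

print(col_list)  # [2, 4, 6]
print(col_arr)   # [2 4 6]
```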

# Learning Goals

* You will be able to import and use the numpy module.
* You will be able to manipulate multi-dimensional data arrays using NumPy.
* You will be able to time your code.
* You will appreciate that without NumPy, Python would NOT be a very useful language for data analysis.

# NumPy module

### https://numpy.org/

### See https://jalammar.github.io/visual-numpy/ for a good tutorial.

In [1]:
import numpy as np

data = np.array([1, 2, 5, 8])

data

array([1, 2, 5, 8])

In [2]:
print(data)

[1 2 5 8]


# 1D arrays

![](images/np-array1d.png)
![](images/np-zeros-ones-random-1d.png)

# 1D arrays

In [3]:
a = np.array([1, 2, 3])
z = np.zeros(3)
o = np.ones(3)
r = np.random.random(3)  # random numbers in the interval [0, 1)

a, z, o, r

(array([1, 2, 3]),
 array([0., 0., 0.]),
 array([1., 1., 1.]),
 array([0.34017108, 0.31045504, 0.33215336]))

# NumPy simplifies getting common statistics about our data

In [4]:
data = np.array([1, 2, 5, 8])

data.min(), data.max(), data.sum(), data.prod()

(1, 8, 16, 80)

In [5]:
data.mean(), data.var(), data.std()

(4.0, 7.5, 2.7386127875258306)

In [6]:
np.max(data)

8
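
One detail worth knowing: `var()` and `std()` use the population formulas by default (they divide by N). If a sample estimate (divide by N - 1) is wanted instead, the `ddof` argument can be set. A small sketch using the same `data` values as above; the names `pop_var` and `sample_var` are illustrative:

```python
import numpy as np

data = np.array([1, 2, 5, 8])

pop_var = data.var()           # population variance: divides by N
sample_var = data.var(ddof=1)  # sample variance: divides by N - 1

print(pop_var, sample_var)  # 7.5 10.0
```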

# <font color=red>Exercise</font> <font color=blue>~2:10 pm</font>
    
Use NumPy to compute the mean and variance of the array of measurements given below.

In [7]:
np.random.seed(0)  # so we always get the SAME random values
measurements = np.random.random(3)

measurements

array([0.5488135 , 0.71518937, 0.60276338])

In [8]:
measureMean = ...
measureVariance = ...

measureMean, measureVariance

(Ellipsis, Ellipsis)

# NumPy makes array math easy

![](images/np-math1d-add.png)
![](images/np-math1d-sub-mul-div.png)

# NumPy makes array math easy

In [9]:
a = np.array([1, 2, 3])
b = np.ones(3)

a, b

(array([1, 2, 3]), array([1., 1., 1.]))

In [10]:
a+b, a-b, a*a, a/a, a**2

(array([2., 3., 4.]),
 array([0., 1., 2.]),
 array([1, 4, 9]),
 array([1., 1., 1.]),
 array([1, 4, 9]))
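
For contrast, the same operators mean something different for plain Python lists. The sketch below (with illustrative names `nums` and `arr`) shows that `+` and `*` concatenate and repeat lists, whereas on arrays they act element by element.

```python
import numpy as np

nums = [1, 2, 3]
arr = np.array(nums)

list_sum = nums + nums  # list concatenation: [1, 2, 3, 1, 2, 3]
arr_sum = arr + arr     # elementwise addition: array([2, 4, 6])

list_rep = nums * 2     # list repetition: [1, 2, 3, 1, 2, 3]
arr_times = arr * 2     # elementwise multiplication: array([2, 4, 6])
```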

# <font color=red>Quiz 1</font> <font color=blue>~2:20 pm</font>

```python
x = np.array([0, 1, 2])
y = np.array([1, 2, 3])
```

What is the result of $x^2 + y$?

*On your honor, please do not test this in a code cell.*

    A) [1, 3, 5]
    B) [1, 5, 11]
    C) [1, 3, 7]

# Broadcasting

![](images/np-broadcast1d.png)

In [11]:
a = np.array([1, 2, 3])
b = a * 10

b

array([10, 20, 30])

In [12]:
a += 10

a

array([11, 12, 13])
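
One way to think about broadcasting a scalar is that it behaves as if the scalar were first stretched into an array of matching length. The sketch below spells that out with `np.full`; the names `tens`, `explicit`, and `broadcast` are illustrative.

```python
import numpy as np

a = np.array([1, 2, 3])

# Explicit version: build an array of tens with the same length as a.
tens = np.full(3, 10)
explicit = a * tens

# Broadcast version: NumPy applies the scalar to every element.
broadcast = a * 10

print(np.array_equal(explicit, broadcast))  # True
```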

# Indexing works the same as for lists

![](images/np-index1d.png)

In [13]:
data = np.array([1, 2, 3])

data[::2] *= 10

data

array([10,  2, 30])

In [14]:
# !!! Can also index with an arbitrary array.
data[[0, 2]]

array([10, 30])
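
A few more indexing forms, sketched on the same values. Note that indexing with a list of positions returns a new array, so changing the result leaves the original untouched. The names `first`, `last`, `middle`, and `picked` are illustrative.

```python
import numpy as np

data = np.array([10, 2, 30])

first = data[0]     # 10
last = data[-1]     # 30 (negative indices count from the end)
middle = data[1:3]  # array([ 2, 30])

# Indexing with a list of positions gives a new array (a copy).
picked = data[[0, 2]]
picked[0] = -1

print(data)  # [10  2 30]
```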

# <font color=red>Exercise</font> <font color=blue>~2:30 pm</font>

In [None]:
data = np.array([8, 4, 1, 5])

# Add 3 to the last two values in data.
...

# Take a minute to collect your thoughts...

# Is the presentation clear?

![](../../images/zoom-yes-no.png)

# Do you want me to go slower or faster?

![](../../images/zoom-slower-faster.png)

# 2D arrays

![](images/np-array2d.png)
![](images/np-zeros-ones-random-2d.png)
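
In code, 2D arrays can be created much like 1D arrays, except that the shape is given as a `(rows, cols)` tuple. A minimal sketch with illustrative values:

```python
import numpy as np

# From a nested list: each inner list becomes one row.
data = np.array([[1, 2], [3, 4], [5, 6]])

# Pass the shape as a (rows, cols) tuple.
zeros = np.zeros((3, 2))
ones = np.ones((3, 2))
rand = np.random.random((3, 2))

print(data.shape, zeros.shape, ones.shape, rand.shape)
```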

# 2D indexing

![](images/np-index2d.png)

# 2D indexing

![](images/np-indexing2d.png)
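
The same ideas in code: a 2D array takes a row index and a column index separated by a comma, and either one can be a slice. The array values and the names `elem`, `rows`, and `col_part` are illustrative.

```python
import numpy as np

data = np.array([[1, 2], [3, 4], [5, 6]])

elem = data[0, 1]        # row 0, column 1 -> 2
rows = data[1:3]         # rows 1 and 2 -> [[3, 4], [5, 6]]
col_part = data[0:2, 0]  # rows 0 and 1 of column 0 -> [1, 3]
```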

# <font color=red>Quiz 2</font> <font color=blue>~2:40 pm</font>

![](images/np-data3x2.png)

How would you multiply the right column only by 10?

    A) data[:,1] *= 10
    B) data[:,2] *= 10
    C) data[1,:] *= 10
    D) data[1:,1] *= 10

# Matrix math

![](images/np-math2d-add.png)

In [16]:
data = np.array([[1, 2], [3, 4]])
ones = np.ones((2,2))

data + ones

array([[2., 3.],
       [4., 5.]])

# Broadcasting

![](images/np-broadcast2d.png)
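
A minimal sketch of 2D broadcasting, assuming a 3x2 array and a single row of length 2: the row is applied to every row of the larger array. The names `ones_row` and `result` are illustrative.

```python
import numpy as np

data = np.array([[1, 2], [3, 4], [5, 6]])  # shape (3, 2)
ones_row = np.ones(2)                      # shape (2,)

# The single row is stretched across all three rows of data.
result = data + ones_row

print(result.shape)  # (3, 2)
```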

# <font color=red>Quiz 3</font> <font color=blue>~2:45 pm</font>

Is the following operation possible?

```python
np.ones((3,3)) / np.random.random((1,2))
```

*On your honor, please do not test this in a code cell.*

    A) Yes
    B) No

# Matrix multiplication (dot product)

![](images/np-dot-prod.png)

In [17]:
data = np.array([[1, 2, 3]])
powers = np.array([[1, 10], [100, 1000], [10000, 100000]])

dotprod = data.dot(powers)
newdotprod = data @ powers

print(dotprod)
print(newdotprod)

[[ 30201 302010]]
[[ 30201 302010]]


# Transpose

![](images/np-transpose.png)
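
In code, the transpose is available as the `.T` attribute, which swaps rows and columns. A small sketch with illustrative values:

```python
import numpy as np

data = np.array([[1, 2], [3, 4], [5, 6]])  # shape (3, 2)

transposed = data.T                        # shape (2, 3)

print(transposed)
# [[1 3 5]
#  [2 4 6]]
```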

# 2D array stats

![](images/np-min-max-sum-2d.png)

# Compute stats along a particular dimension

![](images/np-max-axis-2d.png)
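
The `axis` argument names the dimension that gets collapsed. In the sketch below (illustrative values), `axis=0` collapses the rows and leaves one value per column, while `axis=1` collapses the columns and leaves one value per row.

```python
import numpy as np

data = np.array([[1, 2], [5, 3], [4, 6]])

col_max = data.max(axis=0)  # max down each column -> [5, 6]
row_max = data.max(axis=1)  # max across each row  -> [2, 5, 6]

print(col_max, row_max)  # [5 6] [2 5 6]
```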

# <font color=red>Exercise</font> <font color=blue>~2:55 pm</font>

Mean of audio waveforms.

In [18]:
np.random.seed(0)  # So we always get the SAME random waveforms.

# Three short audio waveforms stored as rows of a matrix.
waveforms = np.random.random((3,5))

waveforms

array([[0.5488135 , 0.71518937, 0.60276338, 0.54488318, 0.4236548 ],
       [0.64589411, 0.43758721, 0.891773  , 0.96366276, 0.38344152],
       [0.79172504, 0.52889492, 0.56804456, 0.92559664, 0.07103606]])

In [19]:
# Compute the average of the three waveforms.
...

# Take a minute to collect your thoughts...

# Is the presentation clear?

![](../../images/zoom-yes-no.png)

# Do you want me to go slower or faster?

![](../../images/zoom-slower-faster.png)

# 3D arrays (and higher dimensions)

![](images/np-array3d.png)
![](images/np-zeros-ones-random-3d.png)
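
Higher-dimensional arrays are created the same way, with a longer shape tuple. The sketch below is illustrative; in particular, the `frames` array and its axis order (image, row, column, color channel) are an assumption chosen for the example, not a fixed convention.

```python
import numpy as np

ones = np.ones((4, 3, 2))  # 3D array
print(ones.ndim, ones.shape, ones.size)  # 3 (4, 3, 2) 24

# Hypothetical stack of 10 color images, each 64x64 pixels with 3 channels.
frames = np.zeros((10, 64, 64, 3))
print(frames.ndim)  # 4
```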

# <font color=red>Quiz 4</font> <font color=blue>~ 3:00 pm</font>

![](images/np-zeros-ones-random-3d.png)

How would you reference the value 0.7 in the random array on the right (assume it is referred to by the variable name data)?

    A) data[1,0]
    B) data[2,1,1]
    C) data[0,1,0]
    D) data[1,0,0]

# Array shape

![](images/np-shape.png)

# Array shape

In [20]:
a = np.zeros((2,3))

a.shape

(2, 3)

In [21]:
a.shape[0]

2

In [22]:
(rows, cols) = a.shape

rows, cols

(2, 3)

# Reshape array

![](images/np-reshape.png)
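
A short sketch of `reshape` with illustrative values. The total number of elements has to stay the same, and one dimension can be given as `-1` to let NumPy infer it.

```python
import numpy as np

data = np.arange(1, 7)              # array([1, 2, 3, 4, 5, 6])

two_by_three = data.reshape(2, 3)   # [[1, 2, 3], [4, 5, 6]]
three_by_two = data.reshape(3, 2)   # [[1, 2], [3, 4], [5, 6]]
inferred = data.reshape(-1, 2)      # -1 is inferred as 3 rows

print(inferred.shape)  # (3, 2)
```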

# <font color=red>Exercise</font> <font color=blue>~ 3:05 pm</font>

Extract a chunk of an EEG recording.

In [23]:
np.random.seed(0)  # So we always get the SAME random EEGs.

# EEG waveforms from 32 channels (leads) over 99 trials.
# eegs[channel,time,trial]
eegs = np.random.random((32,1000,99))

In [None]:
# Extract EEGs for trial 8 only.
eegs_trial8 = ...

In [None]:
# Extract the EEG from the 32nd channel on trial 82.
eeg_32nd_t82 = ...

In [None]:
# For the single extracted EEG above,
# extract only the first 5 time points of the recording.
first5 = ...

first5

# Timing your code (NumPy vs pure Python)

In [27]:
%%timeit
# time this cell

tot = 0
for i in range(50000):
    tot += i

2.26 ms ± 17.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [28]:
# time a single line
%timeit np.sum(np.arange(50000))

39.7 µs ± 1.02 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
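
The `%timeit` magics only work in IPython and Jupyter. In a plain Python script, a similar comparison can be sketched with the standard library `timeit` module. The function names `python_sum` and `numpy_sum` are illustrative, and the measured times will differ from machine to machine.

```python
import timeit

import numpy as np


def python_sum():
    """Sum 0..49999 with a pure Python loop."""
    tot = 0
    for i in range(50000):
        tot += i
    return tot


def numpy_sum():
    """Sum 0..49999 with NumPy."""
    return np.sum(np.arange(50000))


# Total time in seconds for 100 calls of each function.
t_py = timeit.timeit(python_sum, number=100)
t_np = timeit.timeit(numpy_sum, number=100)

print(f"pure Python: {t_py / 100 * 1e3:.3f} ms per call")
print(f"NumPy:       {t_np / 100 * 1e3:.3f} ms per call")
```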


# Unlike lists, NumPy slices are views (references), NOT copies

In [29]:
data = np.random.random((3,2))

data

array([[0.59856554, 0.18788058],
       [0.28943415, 0.2114099 ],
       [0.12899101, 0.61919738]])

In [30]:
col0 = data[:,0]

col0

array([0.59856554, 0.28943415, 0.12899101])

In [31]:
col0[1] = 0

data

array([[0.59856554, 0.18788058],
       [0.        , 0.2114099 ],
       [0.12899101, 0.61919738]])

# Copying slices

In [32]:
data = np.random.random((3,2))

data

array([[0.61661391, 0.60851789],
       [0.59828832, 0.71833817],
       [0.53472013, 0.51904142]])

In [33]:
col0 = data[:,0].copy()
col0[1] = 0

col0

array([0.61661391, 0.        , 0.53472013])

In [34]:
data

array([[0.61661391, 0.60851789],
       [0.59828832, 0.71833817],
       [0.53472013, 0.51904142]])
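
If it is unclear whether two arrays refer to the same underlying data, `np.shares_memory` can check. A minimal sketch with illustrative names `view` and `copied`:

```python
import numpy as np

data = np.zeros((3, 2))

view = data[:, 0]           # a slice: shares memory with data
copied = data[:, 0].copy()  # an explicit copy: independent memory

print(np.shares_memory(data, view))    # True
print(np.shares_memory(data, copied))  # False

# Writing through the view changes data; writing to the copy does not.
view[1] = 5
copied[2] = 7
print(data[1, 0], data[2, 0])  # 5.0 0.0
```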

# Read/Write a NumPy array from/to file

In [35]:
data = np.random.random((2,3))
                 
data

array([[0.02101152, 0.96046833, 0.06756613],
       [0.48304444, 0.6445569 , 0.59771609]])

In [36]:
np.save("data.npy", data)  # also see np.savetxt

In [37]:
data2 = np.load("data.npy")  # also see np.loadtxt

data2

array([[0.02101152, 0.96046833, 0.06756613],
       [0.48304444, 0.6445569 , 0.59771609]])
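
For a human-readable text file, `np.savetxt` and `np.loadtxt` work in a similar way. The sketch below writes to a temporary directory so that it leaves no files behind; the file name `data.csv` and the array values are illustrative.

```python
import os
import tempfile

import numpy as np

data = np.array([[1.5, 2.0, 3.25], [4.0, 5.5, 6.0]])

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "data.csv")  # hypothetical file name

    np.savetxt(path, data, delimiter=",")    # write as comma-separated text
    loaded = np.loadtxt(path, delimiter=",") # read it back as a float array

print(np.allclose(data, loaded))  # True
```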

# Learning Goals

* You will be able to import and use the numpy module.
* You will be able to manipulate multi-dimensional data arrays using NumPy.
* You will be able to time your code.
* You will appreciate that without NumPy, Python would NOT be a very useful language for data analysis.