# About me

My name is Ricardo Cruz. I am doing a PhD in Informatics, and I am working with machine learning and computer vision. https://rpmcruz.github.io/

If you are interested in neural networks and related topics, I recommend you do your Master's thesis with **Professor Jaime S. Cardoso**.

<img src="https://drive.google.com/uc?export=view&id=1wv592v9tLQz6P1MVyOn3Aly43xuXjvit">

# Python Packages

We will focus today on packages for scientific programming.

<img src="https://drive.google.com/uc?export=view&id=1mfqhkkfNZQSW0rjhqp_mhrWqQndpbPA6">

\* the ones to be discussed today

# NumPy

We have been creating matrices using lists of lists.

For example, we would write $$M=\begin{pmatrix}1&2&3\\4&5&6\end{pmatrix}$$

as

In [0]:
M = [
    [1, 2, 3],
    [4, 5, 6]
]
M

In [0]:
M = [
    [1, 2, 3],
    [4, 5, 6]
]

What do I do if I want to select the **first row**?

In [0]:
# demonstration

In [0]:
M = [
    [1, 2, 3],
    [4, 5, 6]
]

What do I do if I want to select the **third column**?

<img width="200" height="200" src="https://drive.google.com/uc?export=view&id=1_dtTWsVYnwHV1b4vTT6Wa30XcB8B9sYg">

It isn't as easy, huh?

In [0]:
# demonstration
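One possible way the two demonstrations above might go, as a minimal sketch using only plain Python lists. The variable names `first_row` and `third_column` are chosen here for illustration.

```python
M = [
    [1, 2, 3],
    [4, 5, 6]
]

# A row is simply one of the inner lists.
first_row = M[0]

# A column has to be collected element by element from every row.
third_column = [row[2] for row in M]

print(first_row)     # [1, 2, 3]
print(third_column)  # [3, 6]
```

The row comes for free, but the column needs a loop (here a list comprehension) over all rows.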

# NumPy to the rescue

In [0]:
M = [
    [1, 2, 3],
    [4, 5, 6]
]

# demonstration
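A minimal sketch of how the same selections might look once the list of lists is converted to a NumPy array. The names `first_row` and `third_column` are illustrative.

```python
import numpy as np

M = np.array([
    [1, 2, 3],
    [4, 5, 6]
])

first_row = M[0]        # same as M[0, :]
third_column = M[:, 2]  # every row, column index 2

print(first_row)     # [1 2 3]
print(third_column)  # [3 6]
```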

# YES !!

<img src="https://drive.google.com/uc?export=view&id=1ZoGkLqZoTQwYlRQcvXsUuvMJscdDjEpn">

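Furthermore, copying is simpler: `M.copy()` copies all the data, whereas copying a list of lists only copies the outer list. Aliasing has not disappeared entirely, though: a slice of a NumPy array is a view that shares memory with the original.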
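To make the copying behaviour concrete, here is a small sketch comparing a shallow list copy, a NumPy `copy()`, and a NumPy slice. All variable names are illustrative.

```python
import numpy as np

# Plain lists: .copy() duplicates only the outer list, so rows are shared.
nested = [[1, 2, 3], [4, 5, 6]]
shallow = nested.copy()
shallow[0][0] = 99
print(nested[0][0])  # 99, the original changed too

# NumPy: .copy() duplicates the data itself.
A = np.array([[1, 2, 3], [4, 5, 6]])
B = A.copy()
B[0, 0] = 99
print(A[0, 0])  # 1, the original is untouched

# A slice, however, is a view on the same memory.
view = A[0]
view[1] = -1
print(A[0, 1])  # -1
```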

# NumPy
## How to create arrays

| Type | Example |
|:-|:-|
| From lists | `np.array(M)` |
| Zeros  | `np.zeros((2, 3))` |
| Ones   | `np.ones((2, 3))` |
| Random uniform | `np.random.rand(2, 3)` |
| Random gaussian | `np.random.randn(2, 3)` |

(Why does NumPy call them *arrays*?)

In [0]:
# demonstration
# (show .shape)
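A sketch of what this demonstration might contain: each constructor from the table, followed by the resulting `.shape` and `.dtype`. The variable names are illustrative.

```python
import numpy as np

M = [[1, 2, 3], [4, 5, 6]]

from_list = np.array(M)
zeros = np.zeros((2, 3))          # note: the shape is passed as a tuple
ones = np.ones((2, 3))
uniform = np.random.rand(2, 3)    # uniform samples from [0, 1)
gaussian = np.random.randn(2, 3)  # samples from the standard normal

for name, arr in [('from_list', from_list), ('zeros', zeros), ('ones', ones),
                  ('uniform', uniform), ('gaussian', gaussian)]:
    print(name, arr.shape, arr.dtype)
```

Note that `np.zeros` and `np.ones` take the shape as a single tuple, while `np.random.rand` and `np.random.randn` take each dimension as a separate argument.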

# NumPy
## Indexing & slicing

The difference between slicing a Python list and a NumPy array is that, with an array, we can index several axes at the same time:

In [0]:
a = np.zeros((5, 10, 2, 7))
a.shape

In [0]:
b = a[:, 3:10, -1, :]
b.shape

Notice that indexing using negative values `a[-2]` is supported like in Python lists.

Let's create a matrix of zeros with a square of ones:

$$\begin{pmatrix}0&0&0&0&0\\0&1&1&1&0\\0&1&1&1&0\\0&1&1&1&0\\0&0&0&0&0\\\end{pmatrix}$$

In [0]:
# demonstration
M = ...
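One way this could be done, as a sketch: start from zeros and assign to a slice that covers the inner 3×3 block.

```python
import numpy as np

M = np.zeros((5, 5))
M[1:4, 1:4] = 1  # rows 1..3 and columns 1..3 at once

print(M)
```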

In [0]:
import matplotlib.pyplot as plt
plt.imshow(M)
plt.show()

Let's create a circle:

In [0]:
# demonstration
M = ...
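A possible sketch for the circle: build a grid of coordinates and keep the pixels whose distance to the center is at most the radius. The image size, radius and center below are arbitrary values chosen for illustration.

```python
import numpy as np

size = 101          # image is size x size pixels (arbitrary choice)
radius = 30         # circle radius in pixels (arbitrary choice)
center = size // 2

# xx[i, j] == j (column index) and yy[i, j] == i (row index)
xx, yy = np.meshgrid(np.arange(size), np.arange(size))

# A pixel is inside the circle when (x - cx)^2 + (y - cy)^2 <= r^2
inside = (xx - center)**2 + (yy - center)**2 <= radius**2
M = inside.astype(float)  # 1.0 inside the circle, 0.0 outside

print(M.shape)
```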

In [0]:
import matplotlib.pyplot as plt
plt.imshow(M)
plt.show()

## What is an image?

An image is simply three matrices, one for each color channel: <b><span style="color:red">R</span><span style="color:green">G</span><span style="color:blue">B</span></b>.

```python
R = ...
G = ...
B = ...
```

Let's draw a red circle.

In [0]:
# demonstration (use np.stack)
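A sketch of how the red circle might be assembled with `np.stack`: the circle mask goes into the red channel and the other two channels stay at zero. The size and radius are arbitrary illustrative values.

```python
import numpy as np

size, radius = 101, 30  # arbitrary illustrative values
center = size // 2

xx, yy = np.meshgrid(np.arange(size), np.arange(size))
circle = ((xx - center)**2 + (yy - center)**2 <= radius**2).astype(float)

R = circle                  # red where the circle is
G = np.zeros_like(circle)   # no green anywhere
B = np.zeros_like(circle)   # no blue anywhere

# Stack the three matrices along a new last axis: (height, width, 3)
image = np.stack([R, G, B], axis=-1)
print(image.shape)  # (101, 101, 3)
```

`plt.imshow(image)` accepts an array of this shape with float values between 0 and 1.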

# NumPy
## Aggregate operations

| Type | Example |
|:-|:-|
| Sum | `np.sum(array, axis)` |
| Average  | `np.mean(array, axis)` |
| Standard deviation   | `np.std(array, axis)` |

In [0]:
# demonstration
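A sketch of what this demonstration might show: the same function gives different results depending on `axis`. The example matrix and variable names are illustrative.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])

total = np.sum(A)               # no axis: sum over everything
col_sums = np.sum(A, axis=0)    # collapse the rows: one value per column
row_means = np.mean(A, axis=1)  # collapse the columns: one value per row
row_stds = np.std(A, axis=1)    # population standard deviation per row

print(total)      # 21
print(col_sums)   # [5 7 9]
print(row_means)  # [2. 5.]
print(row_stds)
```

The `axis` argument names the axis that disappears, which is why `axis=0` yields one value per column.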

# NumPy
## Arithmetic operations

| Type | Example |
|:-|:-|
| element-wise addition | `A+B` |
| element-wise product | `A*B` |
| matrix multiplication | `np.dot(A, B)` |

In [0]:
# demonstration
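A sketch contrasting the three operations from the table on two small matrices chosen for illustration.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[10, 20],
              [30, 40]])

added = A + B             # element-wise addition
elementwise = A * B       # element-wise product
matrix = np.dot(A, B)     # matrix multiplication (rows times columns)

print(added)        # [[11 22] [33 44]]
print(elementwise)  # [[ 10  40] [ 90 160]]
print(matrix)       # [[ 70 100] [150 220]]
```

For 2-D arrays, `A @ B` gives the same result as `np.dot(A, B)`.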

## Arithmetic broadcasting

NumPy supports arithmetic broadcasting:

In [0]:
A = np.array([[1, 2, 3], [4, 5, 6]])
B = np.array([[1, 2, 3]])
print('A:', A.shape, 'B:', B.shape)

In [0]:
A*B

## Arithmetic broadcasting

NumPy compares the shapes of A and B axis by axis, starting from the last one. Two axes are compatible when they have the same size or when one of them has size 1; NumPy then automatically (implicitly) repeats the size-1 axis to match the other. Here, B of shape (1, 3) is repeated along the first axis to match A of shape (2, 3).

This is a time-saver but can also be a source of bugs. "Explicit is better than implicit." Not in NumPy. :P

*Other languages:* MATLAB R2016b added this feature, which Octave already supported. Julia supports broadcasting but requires an explicit annotation (the dot syntax, as in `A .* B`).
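To illustrate when broadcasting applies, here is a small sketch with one row-shaped operand, one column-shaped operand, and one incompatible operand. The arrays `C` and `explicit` and the flag `failed` are names introduced only for this example.

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6]])  # shape (2, 3)
B = np.array([[1, 2, 3]])             # shape (1, 3)
C = np.array([[10], [20]])            # shape (2, 1)

# B is implicitly repeated along axis 0 ...
implicit = A * B
# ... which matches repeating it by hand.
explicit = A * np.repeat(B, 2, axis=0)

# C is implicitly repeated along axis 1.
shifted = A + C

# Shapes (2, 3) and (2,) are not compatible: the last axes are 3 and 2.
try:
    A * np.array([1, 2])
    failed = False
except ValueError:
    failed = True

print(implicit)  # [[ 1  4  9] [ 4 10 18]]
print(shifted)   # [[11 12 13] [24 25 26]]
print(failed)    # True
```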

# NumPy application

<img src="https://drive.google.com/uc?export=view&id=1Be2pZIIovo0pCogsiEX_EFrtZdXGIeFE">

1. Face recognition?
1. Background replacement?

[change slide]

The main Python package for computer vision is **OpenCV**:
* It is quite old (2000)
* The primary languages are C and C++
* Python is the main secondary language
* OpenCV is a bit archaic but it has many features.

In [0]:
import cv2
import numpy as np

cap = cv2.VideoCapture(0)

while True:
    _, frame = cap.read()

    ...
    ...
    
    cv2.imshow('frame', frame)
    if cv2.waitKey(1) == 27:  # Esc
        break

cap.release()
cv2.destroyAllWindows()
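OpenCV returns each frame as a NumPy array of shape (height, width, 3) in BGR order, so everything from the NumPy slides applies to it. As an illustration only, and not necessarily what goes in the elided lines above, here is a sketch of a very naive background replacement on a synthetic frame: pixels that exactly match a known background color are swapped for a new background. The frame, the colors, and all variable names are made up for the example, and no camera is needed.

```python
import numpy as np

# Synthetic 4x6 "frame" standing in for what cap.read() returns (BGR, uint8).
green = np.array([0, 255, 0], dtype=np.uint8)
frame = np.tile(green, (4, 6, 1))   # all-green background
frame[1:3, 2:4] = [200, 200, 200]   # a grey "subject" in the middle

# Replacement background: solid blue (BGR order).
new_background = np.zeros_like(frame)
new_background[..., 0] = 255

# True where all three channels equal the background color: shape (4, 6)
is_background = np.all(frame == green, axis=-1)

# Add a trailing axis so the mask broadcasts over the color channels.
result = np.where(is_background[..., np.newaxis], new_background, frame)

print(result.shape)
```

A real webcam image never has a perfectly uniform background, so an exact color match like this would need to be replaced by something more tolerant.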

# Pandas

<img src="https://drive.google.com/uc?export=view&id=17zVEbtr5m_tdbZ9m_bWZMFXnSxkJFRqQ">

<img src="https://drive.google.com/uc?export=view&id=1mfqhkkfNZQSW0rjhqp_mhrWqQndpbPA6">

Working with matrices in NumPy is better than using lists, right?

But it could be even better. Pandas builds on NumPy and Matplotlib:

1. You can easily import/export to CSV or Excel
1. You can access columns and rows using names
1. It has functions to merge matrices according to a certain column
1. It has functions to group and aggregate by categories
1. It has functions to work with time series

Matrices in Pandas are called **data frames** like in R and other statistical languages.
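A small sketch of a few of the features listed above, using two made-up tables. The column names `student`, `grade` and `class` and all values are invented for illustration.

```python
import pandas as pd

grades = pd.DataFrame({
    'student': ['ana', 'bruno', 'carla', 'diogo'],
    'grade': [15.0, 12.0, 18.0, 9.0],
})
classes = pd.DataFrame({
    'student': ['ana', 'bruno', 'carla', 'diogo'],
    'class': ['A', 'B', 'A', 'B'],
})

# Access a column by name.
print(grades['grade'])

# Access a row by name, after choosing which column labels the rows.
by_student = grades.set_index('student')
carla_grade = by_student.loc['carla', 'grade']

# Merge two tables on a shared column.
merged = grades.merge(classes, on='student')

# Export to CSV (returned as a string when no file name is given).
csv_text = merged.to_csv(index=False)
print(csv_text)
```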

Why use Python when other languages exist for statistics? Python is a general-purpose programming language, so it is easier to integrate with other software and to deploy in real systems.

# Pandas
## Analyze students' grades

First, let us get your grades: https://moodle.up.pt/grade/report/grader/index.php?id=2126

In [0]:
from pandas_ods_reader import read_ods
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [0]:
df = read_ods('FEUP-EIC0005-20192020-1S Pauta.ods', 1)

In [0]:
df.columns

In [0]:
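# Keep only the columns we need (.copy() so later assignments modify
# this new frame rather than a view of the original one)
df = df[['Endereço de e-mail', 'Total da unidade curricular (Real)']].copy()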

In [0]:
df

In [0]:
df.columns = ['email', 'nota']
df

In [0]:
df['nota']

In [0]:
np.mean(df['nota'])

In [0]:
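# '-' marks a missing grade: turn anything non-numeric into NaN
# and make the column numeric
df['nota'] = pd.to_numeric(df['nota'], errors='coerce')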

In [0]:
np.mean(df['nota'])
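To see why the mean only makes sense after the cleaning step, here is a sketch on a small made-up series that mixes numbers and `'-'` placeholders. The values are invented and `numeric` is just an illustrative name.

```python
import numpy as np
import pandas as pd

nota = pd.Series([12.5, '-', 17.0, '-', 9.5])
print(nota.dtype)  # object: the '-' strings prevent a numeric column

# Non-numeric entries become NaN and the column becomes float64.
numeric = pd.to_numeric(nota, errors='coerce')
print(numeric.dtype)

# pandas skips NaN by default when averaging ...
print(numeric.mean())  # 13.0
# ... while plain NumPy needs the NaN-aware variant.
print(np.nanmean(numeric.to_numpy()))  # 13.0
```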

In [0]:
df

In [0]:
# LE=10%
# RE=10%
# PE=50%
# TE=30%
df['nota'] = df['nota'] / 0.7

In [0]:
df

In [0]:
df['nota'].plot(kind='hist')

In [0]:
plt.hist(df['nota'].dropna())  # drop missing grades before plotting

In [0]:
df.boxplot('nota')

<img src="https://drive.google.com/uc?export=view&id=1N8fn2wpFSVX8a4qliVJdN8YOVo8zXJCf">

I suggest we do one of the following:

1. Break grades by class
1. Break grades by gender
1. Break grades by student year.
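As a sketch of the first option, assuming the grades table had an extra column identifying each student's class: the column name `turma`, the class labels, the e-mail addresses and the grades below are all hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'email': ['a@example.com', 'b@example.com', 'c@example.com',
              'd@example.com', 'e@example.com', 'f@example.com'],
    'nota': [14.0, 10.0, np.nan, 16.0, 13.0, 10.0],
    'turma': ['T1', 'T1', 'T1', 'T2', 'T2', 'T2'],  # hypothetical column
})

# Group the rows by class, then aggregate the grades within each group.
# NaN grades are ignored by both 'mean' and 'count'.
summary = df.groupby('turma')['nota'].agg(['mean', 'count'])
print(summary)
```

The same pattern would work for gender or student year by grouping on a different column.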

# Conclusion

We have learned about:
1. NumPy
1. Pandas
1. Matplotlib
1. OpenCV

<img src="https://drive.google.com/uc?export=view&id=1pLbwUJwCBQV9M1oZRX8DaQ79RooWAUQ6">

### Questions?