___
<h1> Machine Learning </h1>
<h2> M. Sc. in Electrical and Computer Engineering </h2>
<h3> Instituto Superior de Engenharia / Universidade do Algarve </h3>

[LESTI](https://ise.ualg.pt/curso/1941) / [ISE](https://ise.ualg.pt) / [UAlg](https://www.ualg.pt)

Pedro J. S. Cardoso (pcardoso@ualg.pt)

___

# Up and running?

Let us check if everything is up and running.

Try to run the following cell. If it works, you are ready to go!


In [None]:
import numpy as np
import matplotlib
import pandas as pd
import seaborn as sns
import sklearn
import scipy

print("numpy version: ", np.__version__)
print("matplotlib version: ", matplotlib.__version__)
print("pandas version: ", pd.__version__)
print("seaborn version: ", sns.__version__)
print("sklearn version: ", sklearn.__version__)
print("scipy version: ", scipy.__version__)

If anything is missing, try to install it using the following command:

In [None]:
!pip install numpy matplotlib pandas seaborn scikit-learn scipy IPython

So, if you are here, you are ready to go!


# Matplotlib
During this course, we will use the [matplotlib](https://matplotlib.org/) library to plot our data. For example, let us plot a simple function, $f(x) = x^2$.

In [None]:
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(-10, 10, 100)
y = x**2

plt.plot(x, y)

# Add labels
plt.xlabel('x')
plt.ylabel('y')

# Add title
plt.title('Plot of $f(x) = x^2$')

# Show plot - this is not needed in Jupyter notebooks
plt.show()
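
When several curves share a plot, it can help to use matplotlib's object-oriented interface and give each curve a `label`. The following is a minimal sketch with two illustrative functions, $x^2$ and $x^3$, chosen here only as an example; the names `fig` and `ax` are the usual convention, not a requirement.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(-2, 2, 100)

# Create a figure and a single set of axes explicitly
fig, ax = plt.subplots()

# The label given to each curve is the text shown in the legend
ax.plot(x, x**2, label='$x^2$')
ax.plot(x, x**3, label='$x^3$')

# Labels, title and legend are set on the axes object
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_title('Two curves on the same axes')
ax.legend()
```

In a Jupyter notebook the figure is displayed when the cell finishes; in a plain script, a call to `plt.show()` would be needed.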

## Exercise 1

Plot the function $f(x) = \sin(x)$ in the interval $[-\pi, \pi]$. Remember to add labels and a title to the plot.

In [None]:
# Your code here

## Exercise 2

Plot the function $f(x) = \sin(x)$ in the interval $[-\pi, \pi]$ and the function $g(x) = \cos(x)$ in the same plot. Remember to add legend, labels and a title to the plot.

In [None]:
# Your code here

# Numpy

Numpy is a library for scientific computing. It provides a high-performance multidimensional array object, and tools for working with these arrays. For a quick introduction to numpy, see [here](https://numpy.org/devdocs/user/quickstart.html).

With numpy, we can easily create arrays. For example, let us create a 1-dimensional array with 10 elements, all equal to 0.

In [None]:
import numpy as np

a = np.zeros(10)
print(a)

We can also create a 2-dimensional array with 10 rows and 5 columns, all equal to 1.

In [None]:
b = np.ones((10, 5))
print(b)
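
Besides `zeros` and `ones`, a few other constructors and attributes are often handy. The sketch below uses `arange` and `reshape` and inspects the resulting array; the variable names `r` and `m` are just illustrative.

```python
import numpy as np

# Evenly spaced integers: start (inclusive), stop (exclusive), step
r = np.arange(0, 10, 2)
print(r)          # [0 2 4 6 8]

# Reshape a flat array of 6 elements into 2 rows and 3 columns
m = np.arange(6).reshape(2, 3)
print(m)
print(m.shape)    # (2, 3)
print(m.ndim)     # 2
print(m.dtype)    # an integer type (platform dependent)
```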


Operations with arrays are very easy. For example, let us create two arrays, $a$ and $b$, and compute $c = a + b$.

In [None]:
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
c = a + b
print(c)

This is called element-wise addition. We can also compute the dot product of two arrays, $a$ and $b$, using the `dot` function.

In [None]:
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
c = np.dot(a, b)
print(c)
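
For 2-dimensional arrays, `np.dot` computes the matrix product, which can also be written with the `@` operator. Note that `*` is still element-wise. A small sketch, with two example matrices `P` and `Q` chosen only for illustration:

```python
import numpy as np

P = np.array([[1, 2], [3, 4]])
Q = np.array([[0, 1], [1, 0]])

# Element-wise product: each entry of P times the matching entry of Q
print(P * Q)          # [[0 2]
                      #  [3 0]]

# Matrix product: rows of P times columns of Q
print(P @ Q)          # [[2 1]
                      #  [4 3]]

# For 2-dimensional arrays, np.dot gives the same result as @
print(np.dot(P, Q))
```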

Computing the inverse of a matrix is also very easy. For example, let us compute the inverse of the following matrix:

$$
A = \begin{bmatrix}
2 & 2 & 3 \\
4 & 54 & 6 \\
7 & 8 & 19
\end{bmatrix}
$$


In [None]:
A = np.array([[2, 2, 3], [4, 54, 6], [7, 8, 19]])
A_inv = np.linalg.inv(A)
print(A_inv)

And checking if the inverse is correct:

In [None]:
print(np.dot(A, A_inv))
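
Because of floating-point rounding, the product is usually only approximately the identity matrix. One possible way to check this programmatically, sketched here with the same matrix as in the code cell above, is `np.allclose`, which compares two arrays up to a small tolerance:

```python
import numpy as np

A = np.array([[2, 2, 3], [4, 54, 6], [7, 8, 19]])
A_inv = np.linalg.inv(A)

# A non-zero determinant means the matrix is invertible
print(np.linalg.det(A))                    # approximately 850

# Compare A @ A_inv with the 3x3 identity matrix, up to a small tolerance
print(np.allclose(A @ A_inv, np.eye(3)))   # True
```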

To print with a given precision, we can use the following:

In [None]:
np.set_printoptions(precision=2)
print(np.dot(A, A_inv))

We can also compute the transpose of a matrix:

In [None]:
print(A.T)
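
Transposing swaps the two dimensions of a matrix, which matters when multiplying: the number of columns of the left factor must equal the number of rows of the right factor. A small sketch with an illustrative $2 \times 3$ matrix `M` (not one of the matrices used elsewhere in this notebook):

```python
import numpy as np

M = np.arange(6).reshape(2, 3)

print(M.shape)            # (2, 3)
print(M.T.shape)          # (3, 2)

# (2, 3) @ (3, 2) is defined and gives a (2, 2) matrix
print((M @ M.T).shape)    # (2, 2)

# (2, 3) @ (2, 3) is not defined, so numpy raises a ValueError
try:
    M @ M
except ValueError as err:
    print("Incompatible shapes:", err)
```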

## Exercise 3

Define the following matrices in numpy:

$$
A = \begin{bmatrix}
1 & 2 & 3 \\
4 & 5 & 6
\end{bmatrix}
$$

$$
B = \begin{bmatrix}
1 & 2 \\
3 & 4 \\
5 & 6
\end{bmatrix}
$$

Now, compute the following matrix operations (some of these products may not be defined; if so, explain why):
1. $C = A B$
2. $D = A^T B$
3. $E = A B^T$
4. $F = A^T B^T$
5. $G = A^T A$



# Pandas

Pandas is a library that provides data structures and data analysis tools. One of the data structures provided by pandas is the DataFrame, which is a 2-dimensional labeled data structure with columns of potentially different types. You can think of it like a spreadsheet or SQL table, or a dict of Series objects. VERY USEFUL!

For a quick introduction to pandas, see [here](https://pandas.pydata.org/pandas-docs/stable/getting_started/10min.html).

Let us start by creating a DataFrame from a dictionary of lists.

In [None]:
import pandas as pd

data = {'Name': ['Pedro', 'Ana', 'João', 'Maria'],
        'Age': [20, 30, 40, 50],
        'Height': [1.8, 1.6, 1.7, 1.9]}
df = pd.DataFrame(data)
df
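
Once a DataFrame exists, a few attributes and methods give a quick overview of its contents. The sketch below rebuilds the same small table under the illustrative name `people`:

```python
import pandas as pd

people = pd.DataFrame({'Name': ['Pedro', 'Ana', 'João', 'Maria'],
                       'Age': [20, 30, 40, 50],
                       'Height': [1.8, 1.6, 1.7, 1.9]})

print(people.shape)           # (4, 3): 4 rows, 3 columns
print(people.dtypes)          # type of each column
print(people['Age'].mean())   # 35.0
print(people.describe())      # summary statistics of the numeric columns
```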

We can also easily load data from a CSV file. For example, let us use the world's population data from the https://github.com/datasets/population/ repository. The data can be downloaded from [here](https://raw.githubusercontent.com/datasets/population/main/data/population.csv).

In [None]:
df = pd.read_csv('https://raw.githubusercontent.com/datasets/population/main/data/population.csv')
df

Save the DataFrame to a CSV file.

In [None]:
df.to_csv('population.csv')
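
By default, `to_csv` also writes the row index as an extra, unnamed first column; passing `index=False` leaves it out. The sketch below illustrates the difference on a tiny made-up DataFrame (`small`, with toy values) and writes to a string instead of a file, which `to_csv` does when no path is given:

```python
import io
import pandas as pd

# Toy data, for illustration only
small = pd.DataFrame({'Year': [2017, 2018], 'Value': [10, 20]})

# Default: the index is written as an extra, unnamed first column
with_index = small.to_csv()
# index=False writes only the data columns
without_index = small.to_csv(index=False)

print(with_index.splitlines()[0])      # ,Year,Value
print(without_index.splitlines()[0])   # Year,Value

# Reading the text back recovers the original columns
restored = pd.read_csv(io.StringIO(without_index))
print(restored)
```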

We can also easily access the data in a given column. For example, let us access the data in the `Country Name` column.

In [None]:
df['Country Name']

We can also access the data in a given row. For example, let us access the data in the first row.

In [None]:
df.iloc[0]
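
Note that `iloc` selects by position, while `loc` selects by label. The two coincide only when the index labels happen to be 0, 1, 2, and so on. A small sketch, using an illustrative DataFrame `people` with letters as index labels:

```python
import pandas as pd

people = pd.DataFrame({'Name': ['Pedro', 'Ana', 'João', 'Maria'],
                       'Age': [20, 30, 40, 50]},
                      index=['a', 'b', 'c', 'd'])

# By position: the first row
print(people.iloc[0]['Name'])              # Pedro

# By label: the row labelled 'c', column 'Name'
print(people.loc['c', 'Name'])             # João

# Positional slices exclude the end position, as in Python lists
print(people.iloc[1:3]['Name'].tolist())   # ['Ana', 'João']
```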

Get the population of Portugal through the years.

In [None]:
query = df['Country Name'] == 'Portugal'
df[query]

Get the population of Portugal in 2018.

In [None]:
query = (df['Country Name'] == 'Portugal') & (df['Year'] == 2018)
df[query]
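
The query above is a boolean Series (a mask) with one `True`/`False` entry per row. Conditions are combined with `&` (and) or `|` (or), and each condition needs its own parentheses. The sketch below shows the idea on a toy DataFrame with made-up values and the same column names; `toy`, `mask` and `value` are illustrative names.

```python
import pandas as pd

# Toy data with made-up values, for illustration only
toy = pd.DataFrame({'Country Name': ['A', 'A', 'B', 'B'],
                    'Year': [2017, 2018, 2017, 2018],
                    'Value': [10, 11, 20, 22]})

# One boolean per row: True where both conditions hold
mask = (toy['Country Name'] == 'A') & (toy['Year'] == 2018)
print(mask.tolist())     # [False, True, False, False]

# Select the 'Value' column of the matching rows and take the first match
value = toy.loc[mask, 'Value'].iloc[0]
print(value)             # 11
```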

Plot the population of Portugal through the years.

In [None]:
query = df['Country Name'] == 'Portugal'
df[query].plot(x='Year', y='Value')

This is just a (hyper) quick introduction to pandas. We will use it a lot more during this course.

## Exercise 4

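What was the world's population in 2018?

Hint: use the `sum` function. Keep in mind that the data may also contain aggregate rows, such as `World`, which should not be counted twice.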

In [None]:
# Your code here

## Exercise 5

Plot the population of Portugal and Spain through the years.

In [None]:
# Your code here

## Exercise 6

Which country had the largest population in 2018?

Hint: use the `max` function to get the maximum value. You may need to discard rows that are not countries, such as those with `World` in the `Country Name` column.

In [None]:
# Your code here