### 📘 Lesson 2: Introduction to NumPy and Pandas

<div style="display: flex; align-items: center; justify-content: space-between;">
  <div>
    <h3>Notebook Developers</h3>
    <ul>
      <li><strong>Dr. Fabrizio Finozzi</strong> - Big Data Software Developer</li>
      <li><strong>Priyesh Gosai</strong> - Energy Systems Modeler and Training Coordinator</li>
    </ul>
  </div>
  <div>
    <a href="https://openenergytransition.org/index.html">
      <img src="https://openenergytransition.org/assets/img/oet-logo-red-n-subtitle.png" height="60" alt="OET">
    </a>
  </div>
</div>


##### 🎯 Learning Objectives  

* Understand what NumPy and Pandas are, what they are for, and where they are applied.
* Learn to use key functions of both libraries.
* Apply NumPy and Pandas tools in practical exercises.



### Introduction to NumPy



---

**What is NumPy**

NumPy is a Python package that brings the computational power of languages like C
and Fortran to Python. NumPy is the backbone of many other Python packages that span several applications, as shown below.

<img src="img/numpy_applications.png" width="500">

**How to import NumPy**

In [2]:
import numpy as np

**Why NumPy**

NumPy (short for **Num**erical **Py**thon) is a multi-dimensional array library: data can be stored in one-, two- or n-dimensional arrays. The array object in NumPy is called `ndarray`. Operations on NumPy arrays are typically much faster than the same operations on Python lists, because the core of NumPy is implemented in C. `ndarrays` also use memory more efficiently: a Python list may hold objects of several types at the same time, whereas all elements of an `ndarray` share a single data type (`dtype`), so they can be stored compactly. Finally, NumPy operations are *vectorized*: a calculation is applied to the whole array at once in compiled code, without a Python `for` loop that cycles through the array and processes each element.
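
A minimal sketch of the single-data-type behavior of an `ndarray`, compared with a Python list. The variable names are illustrative and not part of the lesson.

```python
import numpy as np

# A Python list can hold objects of different types side by side
mixed_list = [1, 2.5, "three"]
print([type(item).__name__ for item in mixed_list])

# An ndarray has one dtype: mixing integers and floats upcasts everything to float64
numeric_array = np.array([1, 2.5, 3])
print(numeric_array.dtype)

# Non-numerical data is allowed, but then every element is stored as a string
string_array = np.array(["a", "bc", "def"])
print(string_array.dtype.kind)  # "U" stands for a unicode string dtype
```

The `dtype` attribute is a quick way to check what an array actually stores before doing arithmetic on it.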


With respect to terminology:
- a **scalar** can be viewed as a 0-dimensional `ndarray`, so its shape is the empty tuple `()`
- a **vector** can be viewed as a 1-dimensional `ndarray`, so it has shape `(n,)`
- a **matrix** can be viewed as a 2-dimensional `ndarray`, so it has shape `(n, m)`

and so forth.

In [3]:
scalar = np.array(4)
vector = np.array([1,2,3])
matrix = np.array([[1,2,3], [4,5,6]])

for arr in [scalar, vector, matrix]:
    print("dimensions:", arr.ndim, "| shape:", arr.shape)

dimensions: 0 | shape: ()
dimensions: 1 | shape: (3,)
dimensions: 2 | shape: (2, 3)


In [4]:
matrix.ndim

2

This code snippet uses the attributes `ndim` and `shape`, which return, respectively, an integer with the number of dimensions and a tuple of integers with the size of the array along each dimension. The full list of `ndarray` attributes and methods is available at this [link](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.html).

**NumPy overview**

#### Slicing
`ndarray` slicing uses the same `start:stop:step` syntax as list slicing.

In [5]:
vector = np.array([1,2,3,4,5,6,7])
print(vector)

[1 2 3 4 5 6 7]


In [6]:
# the first and the last elements are given by
print(vector[0], vector[-1])

1 7


In [7]:
# a slice from the second to the fourth element (included) is given by
vector[1:4]

array([2, 3, 4])

In [8]:
# a slice from the first to the last element in steps of three is given by
vector[::3]

array([1, 4, 7])

Slicing works in the same way for higher-dimensional `ndarrays`, with one index or slice per dimension, separated by commas. Let us consider the matrix

In [9]:
matrix = np.array([[1,2,3,4,5,6,7], [8,9,10,11,12,13,14]])
print(matrix)

[[ 1  2  3  4  5  6  7]
 [ 8  9 10 11 12 13 14]]


In [10]:
matrix[1,2]

10

In [11]:
# a slice that returns the first row is given by
matrix[0, :]

array([1, 2, 3, 4, 5, 6, 7])

In [12]:
# a slice that returns the second row is given by
matrix[1, :]

array([ 8,  9, 10, 11, 12, 13, 14])

In [13]:
# a slice that returns the third through the fifth columns is instead given by
matrix[:, 2:5]

array([[ 3,  4,  5],
       [10, 11, 12]])

Slicing is also instrumental when replacing elements of an `ndarray`. For example, the code to replace the third through the fifth columns with **ones** is

In [14]:
matrix[:, 2:5] = [[1, 1, 1], [1, 1, 1]]
matrix

array([[ 1,  2,  1,  1,  1,  6,  7],
       [ 8,  9,  1,  1,  1, 13, 14]])
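
One difference from lists is worth keeping in mind when assigning through slices: a basic slice of an `ndarray` is a *view* on the same data, while a slice of a list is a new list. The sketch below illustrates this with hypothetical variable names.

```python
import numpy as np

values_list = [1, 2, 3, 4, 5]
values_array = np.array(values_list)

list_slice = values_list[1:4]     # a new, independent list
array_slice = values_array[1:4]   # a view sharing memory with values_array

list_slice[0] = 99
array_slice[0] = 99

print(values_list)    # the original list is unchanged
print(values_array)   # the original array now contains 99

# .copy() gives an independent array when a view is not wanted
independent = values_array[1:4].copy()
independent[0] = -1
print(values_array[1])  # still 99
```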

#### Array manipulation

NumPy provides methods to re-organize and re-shape arrays.

The functions *vstack* and *hstack* stack existing `ndarrays` vertically and horizontally.

In [15]:
vector_one = np.array([1,2,3,4,5,6,7])
vector_two = np.array([8,9,10,11,12,13,14])

In [16]:
# vertical stacking
print(np.vstack([vector_one,vector_two]))

[[ 1  2  3  4  5  6  7]
 [ 8  9 10 11 12 13 14]]


In [17]:
# horizontal stacking
print(np.hstack([vector_one,vector_two]))

[ 1  2  3  4  5  6  7  8  9 10 11 12 13 14]


The method *reshape* instead returns an array with the same elements and a different **shape**; the original `ndarray` is left unchanged

In [18]:
original_matrix = np.vstack([vector_one,vector_two])
original_matrix.shape

(2, 7)

In [19]:
original_matrix.reshape((7,2))

array([[ 1,  2],
       [ 3,  4],
       [ 5,  6],
       [ 7,  8],
       [ 9, 10],
       [11, 12],
       [13, 14]])

In [20]:
# reshape does not modify the array in place; uncomment the line below to keep the reshaped result
# original_matrix = original_matrix.reshape((7,2))
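
As a small sketch of how `reshape` behaves, the example below rebuilds the same two-by-seven matrix with `np.arange` (an assumption made to keep the block self-contained) and uses `-1` to let NumPy infer one dimension.

```python
import numpy as np

# Same values as the stacked matrix above: the numbers 1..14 in two rows
original = np.arange(1, 15).reshape((2, 7))

# -1 asks NumPy to infer the missing dimension from the number of elements
reshaped = original.reshape((7, -1))

print(original.shape)  # the original keeps its shape
print(reshaped.shape)

# The total number of elements must match, otherwise a ValueError is raised
try:
    original.reshape((4, 4))
except ValueError as error:
    print("reshape failed:", error)
```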

#### Zeros and ones

The following code snippets can be used to create `ndarrays` containing only **zeros** or **ones**. Namely:

In [21]:
# zeros vector
vector_zeros = np.zeros(5)
print(vector_zeros)

[0. 0. 0. 0. 0.]


In [22]:
# zeros matrix
matrix_zeros = np.zeros((5,7))
print(matrix_zeros)

[[0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0.]]


In [23]:
# ones vector
vector_ones = np.ones(5)
print(vector_ones)

[1. 1. 1. 1. 1.]


In [24]:
# ones matrix
matrix_ones = np.ones((5,7))
print(matrix_ones)

[[1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1.]]


In [25]:
# np.array expects a sequence such as a list; a bare generator would be stored as a single object
vector_range = np.array([i for i in range(5)])

In [26]:
vector_range

array([0, 1, 2, 3, 4])

#### Linear algebra

A NumPy `ndarray` supports *vectorized* operations. For example, each element of an `ndarray` can be incremented, decremented, multiplied or divided by a scalar using the following compact form (**Note**: this is **not** possible with lists)

In [28]:
matrix_ones = np.ones((5,7))
print(matrix_ones + 2)

[[3. 3. 3. 3. 3. 3. 3.]
 [3. 3. 3. 3. 3. 3. 3.]
 [3. 3. 3. 3. 3. 3. 3.]
 [3. 3. 3. 3. 3. 3. 3.]
 [3. 3. 3. 3. 3. 3. 3.]]


In [29]:
print(matrix_ones * 4)

[[4. 4. 4. 4. 4. 4. 4.]
 [4. 4. 4. 4. 4. 4. 4.]
 [4. 4. 4. 4. 4. 4. 4.]
 [4. 4. 4. 4. 4. 4. 4.]
 [4. 4. 4. 4. 4. 4. 4.]]


In [30]:
print(matrix_ones - 2)

[[-1. -1. -1. -1. -1. -1. -1.]
 [-1. -1. -1. -1. -1. -1. -1.]
 [-1. -1. -1. -1. -1. -1. -1.]
 [-1. -1. -1. -1. -1. -1. -1.]
 [-1. -1. -1. -1. -1. -1. -1.]]


In [31]:
print(matrix_ones / 4)

[[0.25 0.25 0.25 0.25 0.25 0.25 0.25]
 [0.25 0.25 0.25 0.25 0.25 0.25 0.25]
 [0.25 0.25 0.25 0.25 0.25 0.25 0.25]
 [0.25 0.25 0.25 0.25 0.25 0.25 0.25]
 [0.25 0.25 0.25 0.25 0.25 0.25 0.25]]


Without vectorization, each of the code snippets above would need a `for` loop that cycles through the elements of the `ndarray`.

NumPy also provides a function to perform [matrix multiplication](https://en.wikipedia.org/wiki/Matrix_multiplication).
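
For comparison, here is a sketch of what the loop-based version of `matrix_ones + 2` could look like. The names `looped` and `vectorized` are illustrative.

```python
import numpy as np

matrix_ones = np.ones((5, 7))

# Explicit version: visit every element with nested for loops
looped = np.empty_like(matrix_ones)
for i in range(matrix_ones.shape[0]):
    for j in range(matrix_ones.shape[1]):
        looped[i, j] = matrix_ones[i, j] + 2

# Vectorized version: one expression, executed in compiled code
vectorized = matrix_ones + 2

print(np.array_equal(looped, vectorized))
```

Both versions give the same result; the vectorized form is shorter and avoids the per-element overhead of the Python interpreter.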

In [32]:
matrix_ones = np.ones((5,7))
matrix_threes = (matrix_ones + 3).transpose()

np.matmul(matrix_ones, matrix_threes)

array([[28., 28., 28., 28., 28.],
       [28., 28., 28., 28., 28.],
       [28., 28., 28., 28., 28.],
       [28., 28., 28., 28., 28.],
       [28., 28., 28., 28., 28.]])

The `@` operator can be used as a shorthand for `np.matmul` on `ndarrays`.
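
A short sketch of the `@` shorthand, reusing the same arrays as the cell above. It also contrasts `@` with `*`, which multiplies element by element rather than performing a matrix product.

```python
import numpy as np

matrix_ones = np.ones((5, 7))
matrix_threes = (matrix_ones + 3).transpose()  # shape (7, 5), every entry equal to 4

product_function = np.matmul(matrix_ones, matrix_threes)
product_operator = matrix_ones @ matrix_threes

print(np.array_equal(product_function, product_operator))
print(product_operator.shape)

# The * operator is element-wise and requires compatible shapes
elementwise = matrix_ones * matrix_ones
print(elementwise.shape)
```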

#### Other useful operations

The important *statistics* of an `ndarray` can be determined with the following methods

In [33]:
# minimum of an ndarray
vector.min()

1

In [34]:
# maximum of an ndarray
vector.max()

7

In [35]:
# mean of an ndarray
vector.mean()

4.0

These methods can also be applied to n-dimensional `ndarrays`, optionally along a given axis. For example

In [36]:
# maximum element of the matrix for each column
matrix.max(axis=0)

array([ 8,  9,  1,  1,  1, 13, 14])

In [37]:
# maximum element of the matrix for each row
matrix.max(axis=1)

array([ 7, 14])

Finally, the functions **all** and **any** return `True` if, respectively, all or at least one of the elements of the `ndarray` fulfill a given condition, and `False` otherwise. For example

In [38]:
np.any(vector==9)

False

In [39]:
np.any(vector==1)

True

In [40]:
np.all(vector==9)

False

In [41]:
np.all(vector==1)

False

In [42]:
np.all(vector_ones==1)

True

These functions can also be applied along one axis of an n-dimensional `ndarray`. For example

In [43]:
print("Are all the elements by column of the matrix greater than 0?", np.all(matrix > 0, axis=0))
print("Are all the elements by row of the matrix greater than 0?", np.all(matrix > 0, axis=1))
print("Is any of the elements along the columns of the matrix greater than 6?", np.any(matrix > 6, axis=0))
print("Is any of the elements along the rows of the matrix greater than 6?", np.any(matrix > 6, axis=1))

Are all the elements by column of the matrix greater than 0? [ True  True  True  True  True  True  True]
Are all the elements by row of the matrix greater than 0? [ True  True]
Is any of the elements along the columns of the matrix greater than 6? [ True  True False False False  True  True]
Is any of the elements along the rows of the matrix greater than 6? [ True  True]


### Introduction to Pandas


---

**What is Pandas**

Pandas is a fast, powerful, flexible and easy-to-use open source data analysis and manipulation tool,
built on top of the Python programming language (primarily on top of NumPy). It also makes it possible to process data much as one would with SQL.

**How to import Pandas**

In [44]:
#!pip install pandas
import pandas as pd

**Pandas overview**



A Pandas `DataFrame` is a two-dimensional, size-mutable and potentially heterogeneous (that is, its columns can contain different types of data) tabular data structure. DataFrames can be built from dictionaries, where the dictionary keys are the column names and the dictionary values are lists that contain the column values

In [45]:
df_from_dict = pd.DataFrame({"Name": ["Tom", "Paul", "John", "Sarah"], "Age": [31, 42, 12, 56], "Shoe-size": [35, 42, 36, 31]})

In [46]:
df_from_dict

Unnamed: 0,Name,Age,Shoe-size
0,Tom,31,35
1,Paul,42,42
2,John,12,36
3,Sarah,56,31


An alternative is to use only lists

In [47]:
data_list = [["Tom", 31, 35], ["Paul", 42, 42], ["John", 12, 36], ["Sarah", 56, 31]]
data_column_name = ["Name", "Age", "Shoe-size"]
df_from_list = pd.DataFrame(data_list, columns=data_column_name)

Data can be fed into a Pandas `DataFrame` from a file. Pandas supports several file formats, which can be found at this [link](https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html). For example, the syntax to read from an Excel file would be `df_read_from_file = pd.read_excel(file_name, sheet_name=sheet_name)`.
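
The same pattern applies to other formats. As a sketch, the example below reads CSV data with `pd.read_csv`; the data is held in an in-memory string (via `io.StringIO`) purely so that the block runs without a file on disk, and the contents are made up for illustration.

```python
import io

import pandas as pd

# Illustrative CSV content; in practice this would be a path such as "my_file.csv"
csv_text = "Name,Age,Shoe-size\nTom,31,35\nPaul,42,42\n"

df_read_from_csv = pd.read_csv(io.StringIO(csv_text))

print(df_read_from_csv.shape)
print(df_read_from_csv.dtypes)  # column types are inferred from the data
```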

**Pandas `DataFrame` attributes and methods**

Pandas offers built-in methods and attributes of the `DataFrame` class which are useful to describe the data contained in the dataframe. Given a `DataFrame` called `dataframe_name`:
- `dataframe_name.shape`: the attribute returns a tuple representing the dimensionality (number of rows and columns) of the DataFrame
- `dataframe_name.head(n)`: the method displays the first `n` rows of the dataframe
- `dataframe_name.tail(n)`: the method displays the last `n` rows of the dataframe
- `dataframe_name.sample(n)`: the method returns another dataframe, containing a random sample of `n` rows from the original dataframe. If no argument is passed, the method returns a dataframe with just one row. One can also execute `dataframe_name.sample(frac=0.1)`. This will return a dataframe with `10 %` of the rows from the original dataframe
- `dataframe_name.describe()`: the method returns a summary dataframe with 8 rows and one column per numeric column of `dataframe_name`. For each numeric column it reports `count` (number of non-null values), `mean`, `std`, `min`, the 25th, 50th (the median) and 75th percentiles, and `max`
- `dataframe_name.isna()`: the method returns a boolean dataframe of the same size, indicating if the values are `NA`. `NA` values, such as `None` or `numpy.nan`, get mapped to `True`. Everything else gets mapped to `False`. Values such as empty strings `""` or `numpy.inf` are **not** considered `NA` values. The snippet `dataframe_name.isna().sum()` returns, for each column, the number of rows where `NA` values are present
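
Since the example `DataFrame` of this lesson has no missing values, here is a small sketch with a made-up `DataFrame` (`df_with_gaps`, an illustrative name) that shows which values `isna()` treats as `NA`.

```python
import numpy as np
import pandas as pd

df_with_gaps = pd.DataFrame({
    "Name": ["Tom", "Paul", None, ""],   # None is NA, the empty string is not
    "Age": [31, np.nan, 12, np.inf],     # nan is NA, inf is not
})

# Number of NA values per column
na_counts = df_with_gaps.isna().sum()
print(na_counts)
```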

In [48]:
df_from_list

Unnamed: 0,Name,Age,Shoe-size
0,Tom,31,35
1,Paul,42,42
2,John,12,36
3,Sarah,56,31


In [49]:
df_from_list.shape

(4, 3)

In [51]:
df_from_list.head()

Unnamed: 0,Name,Age,Shoe-size
0,Tom,31,35
1,Paul,42,42
2,John,12,36
3,Sarah,56,31


In [52]:
df_from_list.tail(2)

Unnamed: 0,Name,Age,Shoe-size
2,John,12,36
3,Sarah,56,31


In [53]:
df_from_list.sample(1)

Unnamed: 0,Name,Age,Shoe-size
2,John,12,36


In [54]:
df_from_list.describe()

Unnamed: 0,Age,Shoe-size
count,4.0,4.0
mean,35.25,36.0
std,18.571932,4.546061
min,12.0,31.0
25%,26.25,34.0
50%,36.5,35.5
75%,45.5,37.5
max,56.0,42.0


In [55]:
df_from_list.isna()

Unnamed: 0,Name,Age,Shoe-size
0,False,False,False
1,False,False,False
2,False,False,False
3,False,False,False


**Accessing the columns of a Pandas `DataFrame`**

The column `column_name` of a DataFrame `dataframe_name` can be accessed as `dataframe_name["column_name"]`. For example

In [56]:
df_from_list["Age"]

0    31
1    42
2    12
3    56
Name: Age, dtype: int64

In [57]:
print(type(df_from_list["Age"]))

<class 'pandas.core.series.Series'>


The snippet above returns a Pandas object called a `Series`. Where the `DataFrame` is a two-dimensional object, a `Series` is instead a one-dimensional object. One can say that a `DataFrame` is a *container* for several `Series`, which correspond to the `DataFrame` columns.

**Please note**: the `Series` contains the data of the column together with the row index. It has no header row; the column name is kept in its `name` attribute (shown as `Name: Age` in the output above).

#### Taking a slice of a Pandas `DataFrame`

It is also possible to extract one or more columns from a `DataFrame` and get a `DataFrame` as a result, by passing a list of column names inside the brackets

In [58]:
df_age_from_list = df_from_list[["Age"]]

In [59]:
df_age_from_list

Unnamed: 0,Age
0,31
1,42
2,12
3,56


In [None]:
print(type(df_age_from_list))

**Adding or dropping columns in a `DataFrame`**

The snippet below can be used to add a new column to an existing `DataFrame`

In [60]:
df_from_list["Country"]=["US", "DE", "UK", "IT"]
df_from_list

Unnamed: 0,Name,Age,Shoe-size,Country
0,Tom,31,35,US
1,Paul,42,42,DE
2,John,12,36,UK
3,Sarah,56,31,IT


The snippet below can instead be used to drop a column

In [61]:
df_from_list = df_from_list.drop("Country", axis=1)
df_from_list

Unnamed: 0,Name,Age,Shoe-size
0,Tom,31,35
1,Paul,42,42
2,John,12,36
3,Sarah,56,31


The same `drop` method can also be used to drop a row: pass the index label of the row and set `axis=0`.
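
A minimal sketch of dropping a row with `axis=0`, using a small illustrative `DataFrame` (`df_people` is not part of the lesson data).

```python
import pandas as pd

df_people = pd.DataFrame({"Name": ["Tom", "Paul", "John"], "Age": [31, 42, 12]})

# Drop the row whose index label is 1; drop returns a new DataFrame
df_without_paul = df_people.drop(1, axis=0)

print(df_without_paul)
print(df_people.shape)  # the original DataFrame still has all three rows
```

Note that the remaining rows keep their original index labels (`0` and `2`); `reset_index(drop=True)` can be used to renumber them.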

**Further `DataFrame` methods**

Further useful methods for a `DataFrame` named `dataframe_name` and for its columns are listed below. Several of them can be used to create new columns:
- `dataframe_name["column_name"].map()`: given a mapping dictionary, the method maps each element of a `Series` (such as a column) to a new value; elements missing from the dictionary become `NaN`

In [62]:
df_from_list["Country"]=["US", "DE", "UK", "IT"]
df_from_list["Country_numeric_map"] = df_from_list["Country"].map({"US": 0.0, "DE": 1.0, "UK": 2.0})
df_from_list

Unnamed: 0,Name,Age,Shoe-size,Country,Country_numeric_map
0,Tom,31,35,US,0.0
1,Paul,42,42,DE,1.0
2,John,12,36,UK,2.0
3,Sarah,56,31,IT,


- `dataframe_name["column_name"].replace()`: it works like the `map` method, with the difference that a category of the `Series` that is missing from the mapping dictionary is carried over unchanged to the new column (instead of being replaced by `NaN`)

In [63]:
df_from_list["Country_numeric_replace"] = df_from_list["Country"].replace({"US": 0.0, "DE": 1.0, "UK": 2.0})
df_from_list

Unnamed: 0,Name,Age,Shoe-size,Country,Country_numeric_map,Country_numeric_replace
0,Tom,31,35,US,0.0,0.0
1,Paul,42,42,DE,1.0,1.0
2,John,12,36,UK,2.0,2.0
3,Sarah,56,31,IT,,IT


- `dataframe_name.apply()`: the method applies a function along an axis of a `DataFrame` (to each column by default, or to each row with `axis=1`); on a `Series` it applies the function to each value. The applied function can be either built-in or custom
- `dataframe_name.applymap()`: the method applies a built-in or custom function element-wise to a `DataFrame` object. Since pandas 2.1 this method is deprecated in favor of `dataframe_name.map()`
- `dataframe_name.iloc[]` and `dataframe_name.loc[]`: these indexers (used with square brackets, not called like methods) slice a `DataFrame` by rows and/or columns. `iloc` selects by integer position, while `loc` selects by label

In [64]:
# iloc - filter the second row
df_from_list.iloc[[1]]

Unnamed: 0,Name,Age,Shoe-size,Country,Country_numeric_map,Country_numeric_replace
1,Paul,42,42,DE,1.0,1.0


Please note that `df_from_list.iloc[1]` would have returned a Pandas `Series` instead of a Pandas `DataFrame`. Furthermore

In [66]:
df_from_list

Unnamed: 0,Name,Age,Shoe-size,Country,Country_numeric_map,Country_numeric_replace
0,Tom,31,35,US,0.0,0.0
1,Paul,42,42,DE,1.0,1.0
2,John,12,36,UK,2.0,2.0
3,Sarah,56,31,IT,,IT


In [65]:
# iloc - filter the second column
df_from_list.iloc[:, 1]

0    31
1    42
2    12
3    56
Name: Age, dtype: int64

In [67]:
# iloc - filter just the second, the fourth and the fifth columns
df_from_list.iloc[:, [1,3,4]]

Unnamed: 0,Age,Country,Country_numeric_map
0,31,US,0.0
1,42,DE,1.0
2,12,UK,2.0
3,56,IT,


In [68]:
# loc - filter the second row
df_from_list.loc[[1]]

Unnamed: 0,Name,Age,Shoe-size,Country,Country_numeric_map,Country_numeric_replace
1,Paul,42,42,DE,1.0,1.0


Please note that `df_from_list.loc[1]` would have returned a Pandas `Series` instead of a Pandas `DataFrame`. Furthermore

In [69]:
# loc - filter the second column
df_from_list.loc[:, "Age"]

0    31
1    42
2    12
3    56
Name: Age, dtype: int64

In [70]:
# loc - filter just the second, the fourth and the fifth columns
df_from_list.loc[:, ["Age", "Country", "Country_numeric_map"]]

Unnamed: 0,Age,Country,Country_numeric_map
0,31,US,0.0
1,42,DE,1.0
2,12,UK,2.0
3,56,IT,


- `dataframe_name.rename()`: the method renames columns by means of a mapping dictionary and returns a new `DataFrame`; the original is unchanged unless the result is assigned back

In [71]:
df_from_list.rename(columns={"Age": "Age_renamed"})

Unnamed: 0,Name,Age_renamed,Shoe-size,Country,Country_numeric_map,Country_numeric_replace
0,Tom,31,35,US,0.0,0.0
1,Paul,42,42,DE,1.0,1.0
2,John,12,36,UK,2.0,2.0
3,Sarah,56,31,IT,,IT
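
The `apply` method listed earlier has no example in this lesson, so here is a minimal sketch on a `Series`, on columns and on rows. The `DataFrame` `df_people` and the new column names are illustrative.

```python
import pandas as pd

df_people = pd.DataFrame({
    "Name": ["Tom", "Paul", "John", "Sarah"],
    "Age": [31, 42, 12, 56],
    "Shoe-size": [35, 42, 36, 31],
})

# Series.apply: the function is called once per value
df_people["Is_adult"] = df_people["Age"].apply(lambda age: age >= 18)

# DataFrame.apply (default axis=0): the function receives each column as a Series
column_ranges = df_people[["Age", "Shoe-size"]].apply(lambda column: column.max() - column.min())

# DataFrame.apply with axis=1: the function receives each row as a Series
df_people["Age_plus_size"] = df_people.apply(lambda row: row["Age"] + row["Shoe-size"], axis=1)

print(df_people)
print(column_ranges)
```

For simple arithmetic like the last line, the vectorized form `df_people["Age"] + df_people["Shoe-size"]` is usually preferable; `apply` with `axis=1` is mainly useful when the logic per row is more involved.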


**Merging and aggregating `DataFrame` objects**

The `merge` function in Pandas performs on `DataFrame` objects the equivalent of an SQL join. Given two dataframes, the merge happens along one or more columns that exist in both of them. The basic syntax is `pd.merge(df_a, df_b, how="type_of_join", on="col_name")`. The full documentation is available at this [link](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.merge.html).
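
A minimal sketch of an inner and a left join with `pd.merge`. The two small dataframes are made up for illustration and share the column `Country`.

```python
import pandas as pd

df_people = pd.DataFrame({"Name": ["Tom", "Paul", "John"], "Country": ["US", "DE", "IT"]})
df_countries = pd.DataFrame({"Country": ["US", "DE", "UK"], "Continent": ["America", "Europe", "Europe"]})

# Inner join: keep only the rows whose Country appears in both dataframes
inner = pd.merge(df_people, df_countries, how="inner", on="Country")

# Left join: keep every row of df_people; unmatched rows get NaN in the new columns
left = pd.merge(df_people, df_countries, how="left", on="Country")

print(inner)
print(left)
```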

Pandas also supports `groupby` operations. They involve grouping the rows by the values of a column, applying an aggregating function to each group and combining the results. Given the following `DataFrame`

In [72]:
data_dictionary = {
    'Name': ['John', 'Paul', 'Tom', 'Bob', 'Ronan',
                    'Kirby', 'Sarah', 'Joe', 'Donald', 'Jeffrey'],
    'Department': ['Administration', 'Marketing', 'Technical', 'Technical', 'Marketing',
                          'Administration', 'Technical', 'Marketing', 'Technical', 'Administration'],
    'Employment Type': ['Full-time', 'Intern', 'Intern', 'Part-time', 'Part-time',
                               'Full-time', 'Full-time', 'Intern', 'Intern', 'Full-time'],
    'Salary': [12, 50, 700, 700, 550,
                      12, 1250, 600, 111, 12]}

# Create the DataFrame
df = pd.DataFrame(data_dictionary)
df

Unnamed: 0,Name,Department,Employment Type,Salary
0,John,Administration,Full-time,12
1,Paul,Marketing,Intern,50
2,Tom,Technical,Intern,700
3,Bob,Technical,Part-time,700
4,Ronan,Marketing,Part-time,550
5,Kirby,Administration,Full-time,12
6,Sarah,Technical,Full-time,1250
7,Joe,Marketing,Intern,600
8,Donald,Technical,Intern,111
9,Jeffrey,Administration,Full-time,12


the snippet below groups column `Salary` by column `Department` and applies a `sum`

In [73]:
df.groupby("Department")["Salary"].sum()

Department
Administration      36
Marketing         1200
Technical         2761
Name: Salary, dtype: int64

or the snippet below groups column `Department` by column `Employment Type` and counts the occurrences

In [74]:
df.groupby("Employment Type")["Department"].count()

Employment Type
Full-time    4
Intern       4
Part-time    2
Name: Department, dtype: int64

In [79]:
# a forward slash avoids the invalid "\L" escape sequence and works on every operating system
pd.read_excel("data/Lesson4_model.xlsx", sheet_name="generators")



Unnamed: 0,name,bus,carrier,p_nom,marginal_cost,efficiency,p_nom_extendable,p_min_pu,p_max_pu,ramp_limit_up,ramp_limit_down,committable,min_up_time,min_down_time,start_up_cost,shut_down_cost


### Additional Resources


- [NumPy website](https://numpy.org/)
- [Pandas website](https://pandas.pydata.org/)

### Exercises


**Exercises on NumPy**

**Exercise 1** - write a program that:

- creates a *vector* (one-dimensional `ndarray`) called `vector_1`, containing five ones
- multiplies `vector_1` by the *scalar* three
- adds the *scalar* two to `vector_1`
- creates a *vector* (one-dimensional `ndarray`) called `vector_2`, containing all the numbers from 1 to 5 (with the extremes included)
- vertically stacks `vector_1` and `vector_2`
- horizontally stacks `vector_1` and `vector_2`

In [None]:
# please provide your code here

In [None]:
# please provide your code here

In [None]:
# please provide your code here

In [None]:
# please provide your code here

In [None]:
# please provide your code here

In [None]:
# please provide your code here

**Exercise 2** - using the snippet below, create a five-by-five matrix (two-dimensional `ndarray`) of randomly generated numbers. Afterwards, write code that:
- prints the first row
- prints the third and fourth columns
- prints the last element of the last row

In [None]:
# Run this cell for the exercise
# This snippet creates a five-by-five matrix (two-dimensional ndarray) of randomly generated numbers
# Please, do NOT modify this cell
matrix_five_five = np.random.randint(1, 100, size = (5,5))
matrix_five_five

In [None]:
# please provide your code here

In [None]:
# please provide your code here

In [None]:
# please provide your code here

**Exercise on Pandas**

**Exercise 3** - use the snippet below to create a Pandas `DataFrame` called `df_exercise`, then write code that:
- checks the *shape* of the `DataFrame`
- prints the first two rows
- prints the last two rows
- samples two rows from the `DataFrame`
- describes the data
- checks if the DataFrame contains `NaN` values

In [None]:
# Run this cell for the exercise
# This snippet creates a Pandas DataFrame
# Please, do NOT modify this cell
data_list_exercise = [["Lagos", 8, "Nigeria"], ["Santiago de Chile", 6, "Chile"], ["Sydney", 5, "Australia"], ["Hanoi", 8, "Vietnam"]]
data_column_name_exercise = ["City", "Population (Millions)", "Country"]
df_exercise = pd.DataFrame(data_list_exercise, columns=data_column_name_exercise)
df_exercise

In [None]:
# please provide your code here

In [None]:
# please provide your code here

In [None]:
# please provide your code here

In [None]:
# please provide your code here

In [None]:
# please provide your code here

In [None]:
# please provide your code here

**Exercise 4** - write code that modifies `df_exercise` by:
- adding a column `Continent` that maps `Nigeria -> Africa`, `Chile -> America`, `Australia -> Oceania` and `Vietnam -> Asia`
- and that selects the second row of the `DataFrame`

In [None]:
# please provide your code here

In [None]:
# please provide your code here
