# 9. Exercises

This notebook provides three additional blocks of exercises covering different areas. You can pick the topics you are most interested in and start with the corresponding block.

1. **Elevations in Switzerland**: We make use of **NumPy** and **Matplotlib** to analyse the elevation profile of Switzerland.
2. **Printing Patterns**: We make slightly more creative use of **loops** and **conditional statements** to print particular patterns.
3. **Analysing the Tips Dataset**: We have a look at the tips dataset again and work with **Pandas** and **Seaborn**.

***
## 1. Elevations in Switzerland

In the following, you are provided with a NumPy array (stored as a ```.npy``` file in ```data```) containing data on the elevations of Switzerland, normalised to the range [0,1], and visualised below. We will make use of **NumPy** and **Matplotlib** to analyse this dataset a little bit further.

<center><img src="images/tiled_switzerland.png" alt="Tiled Switzerland" width="400"/></center>

The next cell imports the required modules, loads the dataset and defines the maximum elevation (in meters), which is needed to rescale the normalised values.

In [None]:
import numpy as np
import matplotlib.pyplot as plt

max_value = 4632.51

ch_tiled = np.load("data/switzerland.npy")

#### (1.1) Inspect array
Inspect the shape of the array (put your solution in the next cell).

Then inspect an example tile, e.g. tile 4, which has index 3 (put your solution in the next cell).

#### (1.2) Rescale values
Undo the normalisation to the [0,1] range by using ```max_value``` and store the result in ```ch_tiled```. 
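A sketch on made-up values, assuming the normalisation was a division by ```max_value``` (this maps the largest normalised value 1.0 back to ```max_value```): undoing it is then an element-wise multiplication. Background pixels stored as ```nan``` stay ```nan```, because any arithmetic with ```nan``` yields ```nan```.

```python
import numpy as np

max_value = 4632.51

# Made-up normalised tile (assumption: values were divided by max_value);
# nan marks background outside Switzerland
normalised = np.array([[np.nan, 0.0],
                       [0.5, 1.0]])

# Multiplying by a scalar is applied to every element of the array
rescaled = normalised * max_value
print(rescaled)
```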

Put your solution here: 

#### (1.3) Identify tile with highest elevation
First, identify the maximum value in each tile and store the result in ```max_per_tile```, such that the output looks like

```
In: print(max_per_tile)  
Out: [          nan           nan           nan 1816.67058824  944.66870588
           nan           nan 2125.50458824 2270.83823529 2234.50482353
 2706.83917647           nan 2252.67152941 2579.67223529 2924.83964706
 3560.67435294 3742.34141176 3905.84176471 2634.17235294 3215.50694118
 4360.00941176 4033.00870588 3905.84176471 3905.84176471 2652.33905882
 3815.00823529 4505.34305882 3851.34164706 3887.67505882 4251.00917647
           nan 4396.34282353 4596.17658824 2034.67105882 2688.67247059
           nan]
```

Second, identify the index of the tile with the largest value, i.e. the highest elevation, and store it in ```max_tile_index```.

Hints:
* Make use of ```np.nanmax``` and ```np.nanargmax```, which ignore ```nan``` values, to find the maximum value and its index, respectively.
* Check out the effect of ```axis=0``` and ```axis=(1,2)``` when identifying the maximum value.
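To illustrate the ```axis``` argument, here is a sketch on a small made-up stack of three 2 x 2 tiles (tiles along the first axis, as assumed for ```ch_tiled```), one of which is pure background. With ```axis=(1,2)``` the maximum is taken over the rows and columns of each tile, leaving one value per tile; with ```axis=0``` it is taken across tiles, pixel by pixel. An all-```nan``` tile yields ```nan``` and NumPy emits a ```RuntimeWarning```, which is silenced here.

```python
import warnings
import numpy as np

# Made-up stack of 3 tiles (axis 0), each 2 x 2 pixels (axes 1 and 2)
tiles = np.array([
    [[np.nan, np.nan], [np.nan, np.nan]],  # tile 0: background only
    [[1.0, 5.0], [np.nan, 2.0]],           # tile 1
    [[7.0, np.nan], [3.0, 4.0]],           # tile 2
])

with warnings.catch_warnings():
    # nanmax warns "All-NaN slice encountered" for tile 0
    warnings.simplefilter("ignore", category=RuntimeWarning)
    per_tile = np.nanmax(tiles, axis=(1, 2))  # one maximum per tile
    per_pixel = np.nanmax(tiles, axis=0)      # one maximum per pixel position

print(per_tile)
print(per_pixel)

# nanargmax skips nan entries, so the all-background tile is ignored
best_tile = np.nanargmax(per_tile)
print(best_tile)  # 2
```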

Put your solution for ```max_per_tile``` here:

Put your solution for ```max_tile_index``` here:

#### (1.4) Which tiles are all empty?
Identify the tiles which contain only background, i.e. ```nan```. 

Hint: Make use of the functions ```np.all``` and ```np.isnan```. You might want to check out the following documentation:

In [None]:
help(np.all)

In [None]:
help(np.isnan)
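The two functions combine naturally: ```np.isnan``` returns a boolean array of the same shape, and ```np.all``` with ```axis=(1,2)``` reduces each tile to a single ```True```/```False```. The sketch below uses a made-up stack of tiles; ```np.flatnonzero``` (not mentioned in the exercise) is one way to list the indices where the result is ```True```.

```python
import numpy as np

# Made-up stack of 3 tiles, each 2 x 2 pixels
tiles = np.array([
    [[np.nan, np.nan], [np.nan, np.nan]],  # background only
    [[1.0, np.nan], [np.nan, np.nan]],     # one land pixel
    [[np.nan, np.nan], [np.nan, np.nan]],  # background only
])

is_background = np.isnan(tiles)              # boolean array, shape (3, 2, 2)
empty = np.all(is_background, axis=(1, 2))   # True where a whole tile is nan
print(empty)                  # [ True False  True]
print(np.flatnonzero(empty))  # [0 2]
```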

Put your solution here:

#### (1.5) Flatten array into vector
Flatten ```ch_tiled``` into a vector and store the resulting vector in ```ch_flat```.
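NumPy offers several ways to turn an n-dimensional array into a vector. The sketch below compares ```.flatten()``` (always returns a copy), ```.ravel()``` (returns a view when possible) and ```.reshape(-1)``` on a made-up array; all three yield the same values in row-major order.

```python
import numpy as np

# Made-up 3D array: 2 tiles of 2 x 3 pixels
tiles = np.arange(12, dtype=float).reshape(2, 2, 3)

flat_copy = tiles.flatten()        # always a new, independent array
flat_view = tiles.ravel()          # a view of tiles when possible
flat_reshaped = tiles.reshape(-1)  # -1 lets NumPy infer the length

print(flat_copy.shape)                           # (12,)
print(np.array_equal(flat_copy, flat_view))      # True
print(np.array_equal(flat_copy, flat_reshaped))  # True

# Modifying the copy leaves the original array untouched
flat_copy[0] = -1.0
print(tiles[0, 0, 0])  # 0.0
```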

Put your solution here:

#### (1.6) Plot histogram of the different elevation levels
Plot a histogram of ```ch_flat``` with 50 bins.
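Matplotlib's ```plt.hist``` computes its bin counts with NumPy's histogram routine, so calling ```np.histogram``` directly shows what the 50 bins contain. This sketch uses made-up elevation values and drops the ```nan``` entries first with a boolean mask, because ```np.histogram``` cannot determine a bin range from data containing ```nan```.

```python
import numpy as np

# Made-up flattened elevations (in meters); nan marks background pixels
values = np.array([np.nan, 200.0, 450.0, 450.0, 1200.0, np.nan, 3000.0, 4600.0])

land = values[~np.isnan(values)]  # keep only real elevation values

counts, edges = np.histogram(land, bins=50)
print(counts.sum())  # 6 values were binned
print(len(edges))    # 51 bin edges delimit 50 bins
```

Passing the same ```land``` array to ```plt.hist(land, bins=50)``` should draw bars with these counts.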

Put your solution here:

#### Bonus exercise (1.7): Recreate the elevation plot
As a bonus exercise, try to recreate the plot shown at the beginning of this exercise as closely as possible.

Hints: Make use of ```fig, ax = plt.subplots(...)``` and ```.imshow(tile, vmin=0, vmax=max_value)```

You can put your solution here:

***
## 2. Printing Patterns

In this exercise, we review **loops** and **conditional statements** and make a slightly 
more creative use of them to print the following patterns.

General hint: You can use ```range``` in descending order like this
```Python
for i in range(5,0,-1):
    print(i)
```

```
Out: 
5
4
3
2
1
```
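Two further building blocks can help with the patterns in this exercise: multiplying a string by an integer repeats it, and an ascending and a descending ```range``` can be chained to obtain the row lengths. The sketch uses ```rows = 3``` for brevity and is only one possible approach.

```python
rows = 3

print('*' * 3)  # ***

# Row lengths going up to rows and back down again
lengths = list(range(1, rows + 1)) + list(range(rows - 1, 0, -1))
print(lengths)  # [1, 2, 3, 2, 1]
```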

#### (2.1) Print pattern
Recreate the following pattern:


```
*
**
***
****
*****
****
***
**
*

#
##
###
####
#####
####
###
##
#

x
xx
xxx
xxxx
xxxxx
xxxx
xxx
xx
x

```

Use

In [None]:
symbol = ['*','#','x']
rows = 5

and put your solution here:

#### (2.2) Print pattern

Recreate the following pattern:

```
*
**
***
****
xxxxx
****
***
**
*
```

Put your solution here:

#### (2.3) Print pattern

Recreate the following pattern:

```
0 1 2 3 4 
1 1 2 3 4 
2 2 2 3 4 
3 3 3 3 4 
4 4 4 4 4 
```

Hint: You can use the argument ```end=' '``` in the ```print``` function
in the following way

```Python
for i in range(5):
    print(i, end=' ')
```

```
Out: 0 1 2 3 4 
```

In other words, the line break that ```print``` adds after each value by default is replaced by a space. Compare this with the default behaviour:

```Python
for i in range(5):
    print(i)
```

```
Out: 
0
1
2
3
4
```
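Nested loops combine the two behaviours: the inner loop prints one row with ```end=' '```, and an empty ```print()``` afterwards ends the row. The sketch below prints a small grid of ```i + j``` values, which differs from the pattern in 2.3; the function name ```print_grid``` is hypothetical.

```python
def print_grid(n):
    """Print an n x n grid where the entry in row i, column j is i + j."""
    for i in range(n):
        for j in range(n):
            print(i + j, end=' ')  # stay on the same line
        print()  # finish the row with a line break

print_grid(3)
```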


Put your solution here:

***
## 3. Analysing the Tips Dataset

Here, we reconsider the ```tips``` dataset encountered in the ```6-Numpy_Pandas``` notebook and work with **Pandas** and **Seaborn**.

Each row corresponds to an individual visit to a restaurant and records the **day** and **time** of the visit, the **size** of the group, whether there was a **smoker** in the group, the **total_bill** and **tip** in dollars, and the **sex** of the person who paid.

In [None]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

In [None]:
tips = sns.load_dataset("tips")
tips.head(5)

#### (3.1) Dataset overview
Use ```describe()``` to get an overview of the dataset,

and then check out the additional argument ```include=['category']```, which summarises the categorical columns instead of the numerical ones.
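To keep the sketch self-contained, it builds a tiny made-up DataFrame with the kinds of columns we assume ```tips``` has (numerical columns plus ```category``` columns) instead of loading the dataset. By default ```describe()``` summarises only the numerical columns; with ```include=['category']``` it reports the count, the number of unique values, the most frequent value (```top```) and its frequency (```freq```).

```python
import pandas as pd

# Made-up rows in the style of the tips dataset
df = pd.DataFrame({
    'total_bill': [16.99, 10.34, 21.01, 23.68],
    'tip': [1.01, 1.66, 3.50, 3.31],
    'sex': pd.Categorical(['Female', 'Male', 'Male', 'Male']),
    'day': pd.Categorical(['Sun', 'Sun', 'Sat', 'Sat']),
    'size': [2, 3, 3, 2],
})

print(df.describe())                      # numerical columns only
print(df.describe(include=['category']))  # categorical columns only
```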

#### (3.2) Unique Categories

For each column containing a categorical variable, identify
the unique categories. You might want to make use of ```.unique()```.
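One way to find the categorical columns without listing them by hand is ```select_dtypes(include='category')```; looping over its columns and calling ```.unique()``` on each then lists the categories. The DataFrame is a made-up stand-in for ```tips```.

```python
import pandas as pd

# Made-up stand-in for the tips dataset
df = pd.DataFrame({
    'tip': [1.01, 1.66, 3.50],
    'sex': pd.Categorical(['Female', 'Male', 'Male']),
    'time': pd.Categorical(['Dinner', 'Lunch', 'Dinner']),
})

# Loop over the names of all columns stored with the 'category' dtype
for col in df.select_dtypes(include='category').columns:
    print(col, list(df[col].unique()))
```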

#### (3.3) Mean values by groups
Group by categorical variables and obtain the mean values for the
numerical variables. E.g. group by ```'day'``` and obtain the mean
values for ```'total_bill'```, ```'tip'```, ```'size'```. 
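A sketch of the group-by pattern on made-up data: select the numerical columns after ```groupby``` and call ```.mean()```. Passing ```observed=True``` restricts the result to categories that actually occur; recent pandas versions warn about a changing default when this argument is omitted for a ```category``` column.

```python
import pandas as pd

# Made-up visits: two on Sunday, two on Saturday
df = pd.DataFrame({
    'day': pd.Categorical(['Sun', 'Sun', 'Sat', 'Sat']),
    'total_bill': [10.0, 20.0, 30.0, 50.0],
    'tip': [1.0, 3.0, 4.0, 6.0],
    'size': [2, 4, 3, 3],
})

# Mean of each numerical column within each day
means = df.groupby('day', observed=True)[['total_bill', 'tip', 'size']].mean()
print(means)
```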

#### (3.4) Pairplot

Try to create a [pairplot](https://seaborn.pydata.org/generated/seaborn.pairplot.html)
(check out the documentation in the link) in which you differentiate between male and female, i.e.
use ```'sex'``` to show the two groups in different colours.

#### (3.5) Category Plots

Try to create a [catplot](https://seaborn.pydata.org/generated/seaborn.catplot.html)
(check out the documentation in the link), which plots the ```'size'``` on the x-axis and the **count** of different group sizes on the y-axis (i.e. a histogram), and differentiate between male and female, i.e.
use ```'sex'``` to show the two groups in different colours.

Now, try to create a [catplot](https://seaborn.pydata.org/generated/seaborn.catplot.html)
(check out the documentation in the link), which plots the ```'day'``` on the x-axis and the **count** of visits on each day on the y-axis (i.e. a histogram). This time, create two subplots, one for female and one for male, i.e. use ```'sex'``` for the plot **columns**. 

#### (3.6) Correlation

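In this last exercise, we compute the correlation between the numerical variables.
To this end, use the method ```corr()``` and put your solution in the following cell (in pandas 2.0 and later, pass ```numeric_only=True``` or select the numerical columns first, since ```corr()``` no longer drops the categorical columns automatically):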

Try to create a [regplot](https://seaborn.pydata.org/generated/seaborn.regplot.html)
(check out the documentation in the link), which plots the ```'total_bill'``` on the x-axis and the ```'tip'``` on the y-axis. 
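For the correlation part of this exercise, here is a sketch on a made-up DataFrame that, like ```tips```, mixes numerical and categorical columns. It keeps only the numerical columns with ```select_dtypes(include='number')``` before calling ```corr()```, which works across pandas versions; by default ```corr()``` computes Pearson correlation coefficients.

```python
import pandas as pd

# Made-up data: tip rises with total_bill, size falls
df = pd.DataFrame({
    'total_bill': [10.0, 20.0, 30.0, 40.0],
    'tip': [1.0, 2.0, 3.0, 4.0],
    'size': [4, 3, 2, 1],
    'sex': pd.Categorical(['Female', 'Male', 'Male', 'Female']),
})

# Keep only numerical columns, then compute pairwise Pearson correlations
corr = df.select_dtypes(include='number').corr()
print(corr)
```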