# MGT-499 Statistics and Data Science - Individual Assignment

In [2]:
# Import here what you need

import pandas as pd
import numpy as np


This notebook contains the individual assignment for the class MGT-499 Statistics and Data Science. Important information:
- **Content**: the assignment is divided into two main parts, namely data cleaning (2 datasets) and Exploratory Data Analysis, for a total of 13 main questions (see table of contents). Some of these main questions are divided into sub-questions. In the first part, the questions are very specific, while in the second part they are more open.
- **Deadline**: Tuesday 8th of November at 23:59. 
- **Final Output**: a Jupyter notebook, which we (teachers) can run. 
- **Answering the Questions**: you will find the questions in markdown cells below. Under each of these cells, you will find one or more cells for answers. Type your answer there. For the answer to be correct, the cell with the answer must run without error (unless specified). You can use markdown cells for the answers that require text.
- **Submission**: submit the assignment on Moodle, under [Individual Assignment](https://moodle.epfl.ch/mod/assign/view.php?id=1222846)

## Content
- [Polity5 Dataset](#polity5)  
    - [Question 1: Import the data and get a first glance](#question1)
    - [Question 2: Select some variables](#question2)
    - [Question 3: Missing Values](#question3)
    - [Question 4: Check Polity2](#question4)
- [Quality of Government (QOG) Environmental Indicators Dataset](#qog)  
    - [Question 5: Import the data and do a few fixes](#question5)
    - [Question 6: Merge QOG and Polity5 ... first attempt](#question6)
    - [Question 7: Merge QOG and Polity5 ... second attempt](#question7)
    - [Question 8: Clean the merged dataframe](#question8)
- [Exploratory Data Analysis](#eda)
    - [Question 9: Selecting the ingredients for the recipe (how I select the variables)](#question9)  
    - [Question 10: Picking the right quantity of each ingredient (how I select my sample)](#question10)
    - [Question 11: Tasting and preparing the ingredients (univariate analysis)](#question11)
    - [Question 12: Cooking the ingredients together (bivariate analysis)](#question12)
    - [Question 13: Tasting the new recipe (conclusion)](#question13)

## Polity5 data <a class="anchor" id="polity5"></a>

Polity5 is a widely used democracy scale. The raw data as well as the codebook are available [here](http://www.systemicpeace.org/inscrdata.html). For this assignment, we have slightly modified the original version; for example, we have added the iso3 code for each country to save you time. You can find the modified version [here](https://raw.githubusercontent.com/edoardochiarotti/class_datascience/main/Notebooks/Assignment/individual_assignment/data/polity2_iso3.csv).

### Question 1: Import the data and get a first glance <a class="anchor" id="question1"></a>

1a) Import the csv 'polity2_iso3.csv' (file provided in the link [here](https://raw.githubusercontent.com/edoardochiarotti/class_datascience/main/Notebooks/Assignment/individual_assignment/data/polity2_iso3.csv)) as a pandas dataframe (ignore the warning message) **(1 point)**

In [3]:
# Answer 1a

polity_data = pd.read_csv('https://raw.githubusercontent.com/edoardochiarotti/class_datascience/main/Notebooks/Assignment/individual_assignment/data/polity2_iso3.csv', low_memory=False)


1b) Display the first 10 rows **(1 point)**

In [4]:
# Answer 1b
polity_data.head(10)

Unnamed: 0,iso3,year,p5,cyear,ccode,scode,country,flag,fragment,democ,...,interim,bmonth,bday,byear,bprec,post,change,d5,sf,regtrans
0,,1800,0,2711800,271,WRT,Wuerttemburg,0,,0,...,,1.0,1.0,1800.0,1.0,-7.0,88.0,1.0,,
1,,1800,0,7301800,730,KOR,Korea,0,,5,...,,1.0,1.0,1800.0,1.0,1.0,88.0,1.0,,
2,,1800,0,2451800,245,BAV,Bavaria,0,,0,...,,1.0,1.0,1800.0,1.0,-10.0,88.0,1.0,,
3,,1801,0,7301801,730,KOR,Korea,0,,5,...,,,,,,,,,,
4,,1801,0,2711801,271,WRT,Wuerttemburg,0,,0,...,,,,,,,,,,
5,,1801,0,2451801,245,BAV,Bavaria,0,,0,...,,,,,,,,,,
6,,1802,0,7301802,730,KOR,Korea,0,,5,...,,,,,,,,,,
7,,1802,0,2711802,271,WRT,Wuerttemburg,0,,0,...,,,,,,,,,,
8,,1802,0,2451802,245,BAV,Bavaria,0,,0,...,,,,,,,,,,
9,,1803,0,7301803,730,KOR,Korea,0,,5,...,,,,,,,,,,


1c) Display the data types of all the variables included in the data **(1 point)**

In [5]:
# Answer 1c
data_type  = polity_data.dtypes
data_type.to_frame()

Unnamed: 0,0
iso3,object
year,int64
p5,int64
cyear,int64
ccode,int64
scode,object
country,object
flag,int64
fragment,float64
democ,int64


1d) By looking at your answer in 1c, what is the difference between the different types of variables? Why is the type of some variables defined as object? **(1 point)**

Answer 1d:

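We can see three different data types: `object`, `float64` and `int64`. `int64` holds whole numbers and `float64` holds decimal numbers; pandas also uses `float64` for numeric columns that contain missing values, because `NaN` is itself a float. `object` is the generic fallback type: pandas assigns it to columns that contain text (such as `iso3`, `scode` and `country`) or a mix of types.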
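To make the distinction concrete, here is a minimal sketch on a toy dataframe. The values are made up for illustration and are not taken from the Polity5 file. It shows how a missing value pushes a numeric column to `float64`, while a complete column of whole numbers stays `int64`. The dtype reported for the text column depends on the pandas version (`object` in the version used in this notebook; newer releases may report a dedicated string dtype), so it is only printed.

```python
import numpy as np
import pandas as pd

# Toy data with hypothetical values, for illustration only
toy = pd.DataFrame({
    "year": [1800, 1801, 1802],                 # whole numbers, no gaps -> int64
    "polity2": [-7.0, np.nan, 2.0],             # a missing value forces float64
    "country": ["Korea", "Bavaria", "Saxony"],  # text -> object (or a string dtype in newer pandas)
})

# Collect the dtype names in a plain dict so they are easy to read
toy_dtypes = toy.dtypes.astype(str).to_dict()
print(toy_dtypes)
```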

### Question 2: Select some variables <a class="anchor" id="question2"></a>

2a) Create a subset dataframe that contains the variables 'iso3', 'country', 'year', 'polity2' and display it **(1 point)**

In [6]:
# Answer 2a
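# .copy() gives an independent dataframe and avoids SettingWithCopyWarning later on
polity_data_subset = polity_data[['iso3', 'country', 'year', 'polity2']].copy()
polity_data_subset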

2b) Display the type of the variable "year" **(1 point)**

In [7]:
# Answer 2b
polity_data_subset['year'].dtypes

dtype('int64')

2c) Convert the variable "year" to string **(1 point)**
<br>
Hint: if you get a warning message of the type "SettingWithCopyWarning", it is because you did not subset the data in the right way. Go back to your class notes and check the different ways to subset a dataframe, and try again. If you do it correctly, you will not get the warning message.

In [8]:
# Answer 2c
polity_data_subset['year'] = polity_data_subset['year'].astype('string')
polity_data_subset['year'].dtypes

string[python]

### Question 3: Missing Values <a class="anchor" id="question3"></a>

3a) Subset the rows that have iso3 missing and display **(1 point)**

In [9]:
# Answer 3a
iso_missing = polity_data_subset['iso3'].isna()

# Rows marked `True` have a missing (NaN) iso3
iso_missing

#polity_data_subset.isna().sum()

0         True
1         True
2         True
3         True
4         True
         ...  
17569    False
17570    False
17571    False
17572    False
17573    False
Name: iso3, Length: 17574, dtype: bool
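The output above is the boolean mask, not yet the subset of rows. A minimal sketch of the next step, passing the mask back to the dataframe, is shown below on a toy frame with hypothetical rows. The same pattern should apply to `polity_data_subset`.

```python
import numpy as np
import pandas as pd

# Toy frame with hypothetical rows, mimicking the columns of the subset
toy = pd.DataFrame({
    "iso3": [np.nan, "FRA", np.nan, "CHE"],
    "country": ["Bavaria", "France", "Saxony", "Switzerland"],
    "year": ["1800", "1800", "1806", "1848"],
})

# Step 1: boolean mask, True where iso3 is missing
mask = toy["iso3"].isna()

# Step 2: use the mask to keep only the rows with a missing iso3
rows_missing_iso3 = toy[mask]
print(rows_missing_iso3)
```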

3b) Display the countries that have missing iso3. What can you tell by looking at them? Any similarities? **(1 point)**

In [10]:
# Answer 3b
iso_missing_countries = polity_data_subset[polity_data_subset['iso3'].isna()]['country']
print(f'They seem to be old countries, or old country names.\n{iso_missing_countries.unique()}')


They seem to be old countries, or old country names.
['Wuerttemburg' 'Korea' 'Bavaria' 'Saxony' 'Parma' 'Tuscany' 'Sardinia'
 'Modena' 'Two Sicilies' 'Baden' 'Gran Colombia' 'United Province CA'
 'Serbia' 'Orange Free State' 'Yemen North' 'Czechoslovakia' 'USSR'
 'Germany West' 'Germany East' 'Pakistan' 'South Vietnam' 'Yemen South'
 'Vietnam' 'Yugoslavia' 'Ethiopia' 'Serbia and Montenegro' 'Montenegro'
 'Sudan-North']


3c) Display the countries with missing iso3 from 2011. **(1 point)**

In [11]:
# Answer 3c
# Interpreting the question as: countries still missing iso3 from 2011 onwards.
# 'year' is now a string; comparing 4-digit year strings lexicographically matches numeric order.

df_missing_iso3_2011 = polity_data_subset[(polity_data_subset['iso3'].isna()) & (polity_data_subset['year'] >= '2011')]
sorted_df_clean_list = sorted(df_missing_iso3_2011['country'].unique())
sorted_df_clean_list

['Ethiopia', 'Montenegro', 'Serbia', 'Sudan-North', 'Vietnam']

3d) Display the rows for which the column "country" contains the word "Serbia". By looking at the result, can you tell what happened to Serbia in 2006? **(1 point)**
<br>
Hint: the most general way of doing this is to use a combination of re.search and list comprehension. To display the full subset, you can use print(df.to_string()).

In [12]:
# Answer 3d

# Exact matching is case-sensitive, so spelling variants (e.g. a lowercase first letter) would be missed. Here every country name starts with a capital letter.
polity_data_subset_serbia = polity_data_subset[polity_data_subset['country'] == 'Serbia']
print(polity_data_subset_serbia.to_string())

# The country referred to as Serbia reappears in 2006. This follows the declarations of independence of Montenegro and Serbia, which dissolved their union, the `State Union of Serbia and Montenegro`, previously called the `Federal Republic of Yugoslavia`.


     iso3 country  year  polity2
224   NaN  Serbia  1830     -7.0
230   NaN  Serbia  1831     -7.0
252   NaN  Serbia  1832     -7.0
261   NaN  Serbia  1833     -7.0
272   NaN  Serbia  1834     -7.0
286   NaN  Serbia  1835     -7.0
295   NaN  Serbia  1836     -7.0
301   NaN  Serbia  1837     -7.0
318   NaN  Serbia  1838      2.0
333   NaN  Serbia  1839      2.0
344   NaN  Serbia  1840      2.0
357   NaN  Serbia  1841      2.0
363   NaN  Serbia  1842      2.0
369   NaN  Serbia  1843      2.0
387   NaN  Serbia  1844      2.0
394   NaN  Serbia  1845      2.0
410   NaN  Serbia  1846      2.0
420   NaN  Serbia  1847      2.0
429   NaN  Serbia  1848      2.0
439   NaN  Serbia  1849      2.0
450   NaN  Serbia  1850      2.0
467   NaN  Serbia  1851      2.0
475   NaN  Serbia  1852      2.0
482   NaN  Serbia  1853      2.0
492   NaN  Serbia  1854      2.0
502   NaN  Serbia  1855      2.0
523   NaN  Serbia  1856      2.0
533   NaN  Serbia  1857      2.0
542   NaN  Serbia  1858     -9.0
553   NaN 

3e) Write a function that does the operation in 3d and use it to display the subset that has the word "sudan" (all lower cap) in country. Then do the same for the word "vietnam" (all lower cap). **(1 point)**
<br>
Hint: options of functions can be very useful.

In [13]:
# Answer 3e
def find_country(df:pd.DataFrame, country:str):
	try:
		return df[df['country'] == country]
	except TypeError as e:
		print(f'Not a valid datatype: {e}.\nUse {pd.DataFrame} and {str} as option type parameters')

df_sudan = find_country(polity_data_subset, 'sudan')
df_vietnam = find_country(polity_data_subset, 'vietnam')

print(df_sudan)
print(df_vietnam)

# Both are empty: the exact match is case-sensitive, and no country name is written in all lowercase.


Empty DataFrame
Columns: [iso3, country, year, polity2]
Index: []
Empty DataFrame
Columns: [iso3, country, year, polity2]
Index: []
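The empty results come from the exact, case-sensitive comparison. A minimal sketch of the approach suggested in the hint of 3d (`re.search` combined with a list comprehension), with an option to ignore case, could look as follows. The function name `find_country_regex`, its `ignore_case` option and the toy rows are hypothetical and only serve the illustration.

```python
import re
import pandas as pd

def find_country_regex(df, word, ignore_case=True):
    """Return the rows whose 'country' value contains `word`."""
    flags = re.IGNORECASE if ignore_case else 0
    # re.escape treats `word` as literal text rather than as a regex pattern
    pattern = re.escape(word)
    keep = [bool(re.search(pattern, str(name), flags)) for name in df["country"]]
    return df[keep]

# Toy frame with hypothetical rows
toy = pd.DataFrame({
    "country": ["Sudan", "Sudan-North", "South Sudan",
                "Vietnam", "South Vietnam", "Serbia"],
})

sudan_rows = find_country_regex(toy, "sudan")
vietnam_rows = find_country_regex(toy, "vietnam")
strict_rows = find_country_regex(toy, "sudan", ignore_case=False)

print(sudan_rows)
print(vietnam_rows)
```

With `ignore_case=False` the function falls back to the case-sensitive behaviour, which is why a lowercase search word then finds nothing in this toy frame.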


3f) Replace nan values in iso3 with correct iso3 for the 5 countries found in 3c from 2011 onwards, and display the subset with the fixed values to check that everything worked. **(1 point)**
<br>
Hint: the correct iso3 for these 5 countries are "ETH","MNE","SRB","SDN","VNM".

In [105]:
# Answer 3f

# Storing the correct iso3 values in a tuple, in the same (alphabetical) order as the country list from 3c
iso3_code_missing = ('ETH', 'MNE', 'SRB', 'SDN', 'VNM')

# Creating a dict to look up the correct iso3 value for each country in the list from 3c
iso3_dict = dict(zip(sorted_df_clean_list, iso3_code_missing))

# Defining a function for the repeated task; only the 'iso3' column is modified
def replace(df, dct, replaced):
    for key, item in dct.items():
        mask = (df['country'] == key) & (df['year'] >= '2011')
        df.loc[mask, 'iso3'] = df.loc[mask, 'iso3'].replace(replaced, item)

# Calling the function and telling it which value I want to replace
replace(polity_data_subset, iso3_dict, np.nan)

# Using the index labels of the subset created in 3c
polity_data_subset.loc[df_missing_iso3_2011.index]

Unnamed: 0,iso3,country,year,polity2
1230,MNE,Montenegro,2011,9.0
1231,SRB,Serbia,2011,8.0
1233,ETH,Ethiopia,2011,-3.0
1234,SDN,Sudan-North,2011,-4.0
1235,VNM,Vietnam,2011,-7.0
1236,MNE,Montenegro,2012,9.0
1238,SRB,Serbia,2012,8.0
1239,ETH,Ethiopia,2012,-3.0
1240,VNM,Vietnam,2012,-7.0
1241,SDN,Sudan-North,2012,-4.0


3g) Drop the remaining rows which have nan in "iso3" and display the new number of rows of the dataframe (how many are they?) **(1 point)**

In [97]:
# Answer 3g

# Subset with all NaNs in the iso3 column
subset_with_remaining_nans = polity_data_subset[polity_data_subset['iso3'].isna()]

# Indexes with NaN stored
nans_indexes = subset_with_remaining_nans.index

# Dropping row indexes with NaN
df_clean = polity_data_subset.drop(index=nans_indexes)

# Number of rows
len(df_clean.index)


16344

### Question 4: Check Polity2 <a class="anchor" id="question4"></a>

4a) Display the first and last year included in the dataset **(1 point)**

In [110]:
# Answer 4a

# Assuming the assignment is asking for the first and last of the original dataset
print(polity_data_subset.iloc[[0,-1]]['year'])

0        1800
17573    2018
Name: year, dtype: string


4b) What do the values in "polity2" represent? **(1 point)**

Answer 4b: 

`Polity2` is a revised version of the Polity score, which places a regime's authority characteristics on a `21-point` scale ranging from `-10` (hereditary monarchy) to `+10` (consolidated democracy). 

4c) Do we have weird values for polity2? If yes, why? What should we do about them? Transform the data accordingly. **(1 point)**

Answer 4c:

In [169]:
# Answer 4c

# We have NaNs in polity2
# Since we know that the NaNs from polity2 are the only ones left, we can just drop all NaNs.
print(df_clean.isna().sum())

df_cleaned = df_clean.dropna()

# In addition, we have values of -88 and -66. Since the scale is supposed to range from -10 to 10, we remove these as well.
print(df_cleaned['polity2'].unique())

# Let's find the rows with those values
polity66 = df_cleaned[df_cleaned['polity2'] == -66].index
polity88 = df_cleaned[df_cleaned['polity2'] == -88].index

df_cleaned = df_cleaned.drop(index=polity66)
df_cleaned = df_cleaned.drop(index=polity88)

# Now we can see that only valid values remain
print(df_cleaned['polity2'].unique())

# The rows of the dataframe are no longer in a tidy order. We can fix this with the sort_values method

df_cleaned = df_cleaned.sort_values(['country', 'year']) 


iso3         0
country      0
year         0
polity2    233
dtype: int64
[  9.   8.   0.  -3.  -4.  -7.   1.  -6.  -8. -10.  -1.  -2.  -9.  -5.
   3.   5.   7.   2.   6.  10.   4. -88. -66.]
[  9.   8.   0.  -3.  -4.  -7.   1.  -6.  -8. -10.  -1.  -2.  -9.  -5.
   3.   5.   7.   2.   6.  10.   4.]


4d) Make a map that shows the number of observations of polity2 by country **(1 point)**

In [172]:
# Answer 4d

# Array of unique country names
unique_countries = df_cleaned.country.unique()

for country in unique_countries:
	observations = df_cleaned[df_cleaned['country'] == country].polity2.count()
	print(f'{country}: has {observations} polity2 observations')


Afghanistan: has 196 polity2 observations
Albania: has 105 polity2 observations
Algeria: has 57 polity2 observations
Angola: has 44 polity2 observations
Argentina: has 194 polity2 observations
Armenia: has 28 polity2 observations
Australia: has 118 polity2 observations
Austria: has 218 polity2 observations
Azerbaijan: has 28 polity2 observations
Bahrain: has 48 polity2 observations
Bangladesh: has 47 polity2 observations
Belarus: has 28 polity2 observations
Belgium: has 183 polity2 observations
Benin: has 59 polity2 observations
Bhutan: has 112 polity2 observations
Bolivia: has 193 polity2 observations
Bosnia: has 3 polity2 observations
Botswana: has 53 polity2 observations
Brazil: has 195 polity2 observations
Bulgaria: has 139 polity2 observations
Burkina Faso: has 59 polity2 observations
Burundi: has 57 polity2 observations
Cambodia: has 56 polity2 observations
Cameroon: has 59 polity2 observations
Canada: has 152 polity2 observations
Cape Verde: has 44 polity2 observations
Central A
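The printed list gives the counts but not yet a map. As a first step towards one, the counts can be collected in a single table keyed on `iso3`, which is the kind of input a choropleth function keyed on ISO-3 codes typically expects. The sketch below uses a toy frame with hypothetical values; the column name `n_obs` is an assumption made for the illustration.

```python
import pandas as pd

# Toy frame with hypothetical values
toy = pd.DataFrame({
    "iso3": ["AFG", "AFG", "ALB", "ALB", "ALB"],
    "polity2": [-6.0, -6.0, 5.0, None, 9.0],
})

# count() ignores missing values, so only actual polity2 observations are counted
obs_per_country = (
    toy.groupby("iso3")["polity2"]
       .count()
       .reset_index(name="n_obs")
)
print(obs_per_country)
```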

4e) Store the final dataframe (the one you obtained after 4d) in an object called df_pol **(1 point)**

In [171]:
# Answer 4e
df_pol = df_cleaned

## Quality of Government Environmental Indicators <a class="anchor" id="qog"></a>

The QoG Environmental Indicators dataset (QoG-EI) (Povitkina, Marina, Natalia Alvarado Pachon & Cem Mert Dalli. 2021. The Quality of Government Environmental Indicators Dataset, version Sep21. University of Gothenburg: The Quality of Government Institute, https://www.gu.se/en/quality-government) is a compilation of indicators measuring countries' environmental performance over time, including the presence and stringency of environmental policies, environmental outcomes (emissions, deforestation, etc.), and public opinion on the environment. Codebook and data are available [here](https://www.gu.se/en/quality-government/qog-data/data-downloads/environmental-indicators-dataset).

### Question 5: Import the data and do a few fixes <a class="anchor" id="question5"></a>

5a) Import data from the Quality of Government Environmental Indicators Dataset and display the variable types and the number of rows **(1 point)**
<br>
Hint: When you go on the webpage of the Environmental Indicators Dataset, you can directly import from a URL by copying the link address of the dataset! 

In [20]:
# Answer 5a


5b) Rename the variable "ccodealp" to "iso3" **(1 point)**

In [21]:
# Answer 5b


5c) Check that the variables "year" and "iso3" are of type string; if not, convert them to string **(1 point)**

In [22]:
# Answer 5c


### Question 6: Merge QOG and Polity5 ... issues with QOG? <a class="anchor" id="question6"></a>

6a) Get a subset of the dataframe that includes the variables "cname", "iso3", "year" and "cckp_temp", and display the number of rows. **(1 point)**

In [23]:
# Answer 6a


6b) Merge this subset (left) and the clean version of the polity data (right), using the argument how="left". Was the merge successful? If yes, how many rows does the merged dataframe have? Is it the same number of rows as the subset in 6a? **(1 point)**

In [24]:
# Answer 6b


6c) Do the same by adding the argument validate="one_to_one". Can you make some hypotheses on why you get an error? **(1 point)**

In [25]:
# Answer 6c
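To illustrate what the `validate` argument checks, here is a minimal sketch on two toy dataframes with hypothetical values. The left one contains a duplicated iso3-year key, so the plain left merge goes through, while the validated merge raises a `MergeError`. Whether the same cause applies to the real QOG subset is what 6d to 6f investigate.

```python
import pandas as pd
from pandas.errors import MergeError

# Toy data with hypothetical values; the key ("USA", "2012") appears twice on the left
left = pd.DataFrame({
    "iso3": ["USA", "USA", "FRA"],
    "year": ["2012", "2012", "2012"],
    "cckp_temp": [None, 9.1, 11.5],
})
right = pd.DataFrame({
    "iso3": ["USA", "FRA"],
    "year": ["2012", "2012"],
    "polity2": [10.0, 9.0],
})

# Without validation the merge runs and keeps every left row
merged = left.merge(right, on=["iso3", "year"], how="left")

# With validation pandas checks that the keys are unique on both sides
try:
    left.merge(right, on=["iso3", "year"], how="left", validate="one_to_one")
    merge_failed = False
except MergeError as err:
    merge_failed = True
    print(f"MergeError: {err}")
```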


6d) Consider the subset of the QOG you obtained in 6a and write code to (i) count the number of observations for the variable "cckp_temp" for each combination of iso3 and year, (ii) store the results in a dataframe. For example, the combination "USA-2012" should have 1 observation for "cckp_temp", so the result of your code should be 1. The code should do this for all iso3-year combinations of your subset dataframe, and store the results in a dataframe. **(1 point)**
<br>
Hint: it should not take you more than 2 lines of code.

In [26]:
# Answer 6d
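One possible way to get such a count is a `groupby` on the two key columns followed by `count()`. The sketch below runs on a toy frame with hypothetical values rather than on the real QOG subset; the column name `n_obs` is an assumption made for the illustration.

```python
import numpy as np
import pandas as pd

# Toy subset with hypothetical values; FRA-2012 is duplicated, SDN-2012 has one missing value
toy = pd.DataFrame({
    "iso3": ["USA", "FRA", "FRA", "SDN", "SDN"],
    "year": ["2012", "2012", "2012", "2012", "2012"],
    "cckp_temp": [9.1, 11.5, 11.5, np.nan, 27.0],
})

# count() skips NaN, so it returns the number of actual observations per iso3-year pair
counts = (
    toy.groupby(["iso3", "year"])["cckp_temp"]
       .count()
       .reset_index(name="n_obs")
)
print(counts)
```

Any iso3-year pair with `n_obs` greater than 1 is a candidate for the duplicates discussed in 6e and 6f.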


6e) Use the code in 6d to write a function that displays all rows of the dataframe obtained in 6a that have more than one observation of "cckp_temp" for each iso3-year combination, and check if it works. **(1 point)**

In [27]:
# Answer 6e


6f) Which countries have more than one observation for each iso3-year combination? Deal with these countries in the subset dataframe created in 6a to make sure you no longer have double observations for iso3-year combinations, and check that after your fix this is actually the case. **(1 point)**
<br>
Hint: should we keep a country with all missing values?

In [28]:
# Answer 6f


6g) If your check went well, you can now perform the same operation directly in the QOG dataframe (not in the subset dataframe created in 6a). How many rows does the QOG dataframe have now? **(1 point)**

In [29]:
# Answer 6g


### Question 7: Merge QOG and Polity5 ... issues with Polity5? <a class="anchor" id="question7"></a>

7a) Merge the cleaned QOG dataframe (left) and the Polity dataframe (right) using the options how="left" and validate="one_to_one". Does it work? Why? **(1 point)**

In [30]:
# Answer 7a


7b) Use the function you wrote in 6e to check what's wrong in the "clean" version of Polity **(1 point)**

In [31]:
# Answer 7b


7c) Drop or fix the countries that cause trouble directly in the "clean" version of Polity and motivate your choices. **(1 point)**

In [32]:
# Answer 7c


7d) Now try to merge the "clean-clean" versions of QOG and Polity (the ones you obtained in 6g and 7c), again using the options how="left" and validate="one_to_one". Does it work, and why? How many rows does the resulting merged dataframe have? **(1 point)**

In [33]:
# Answer 7d


### Question 8: Clean the merged dataframe <a class="anchor" id="question8"></a>

8a) In the merged dataframe, order the columns so that you have the "index" variables first and the variables with actual values last. **(1 point)**
<br>
Hint: index variables are "iso3", "year" and other similar variables you can find, and the variables with actual values are "polity2", "cckp_temp" and other similar variables you can find.

In [34]:
# Answer 8a


8b) Rename "cname" as "country" and "country" as "country_polity". **(1 point)**

In [35]:
# Answer 8b


8c) Save the clean merged dataframe as a csv in a subfolder called "clean_data" in your working directory **(1 point)**

In [36]:
# Answer 8c
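A minimal sketch of the saving step is given below. It creates the subfolder if it does not exist yet and then writes the csv. To keep the example self-contained, a temporary directory stands in for the working directory, and the dataframe and the file name `df_merged_toy.csv` are hypothetical.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Toy merged dataframe with hypothetical values
toy = pd.DataFrame({
    "iso3": ["USA", "FRA"],
    "year": ["2012", "2012"],
    "polity2": [10.0, 9.0],
})

# A temporary directory stands in for the working directory (assumption for this sketch)
workdir = Path(tempfile.mkdtemp())
out_dir = workdir / "clean_data"
out_dir.mkdir(parents=True, exist_ok=True)  # no error if the folder already exists

out_path = out_dir / "df_merged_toy.csv"    # hypothetical file name
toy.to_csv(out_path, index=False)           # index=False avoids an extra unnamed column

# Reading the file back; 'year' is read as string to match the cleaned data
reloaded = pd.read_csv(out_path, dtype={"year": str})
print(reloaded)
```

In the notebook itself, `Path("clean_data")` relative to the working directory would replace the temporary directory.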


## Exploratory Data Analysis <a class="anchor" id="eda"></a>

In this section you will define a research question and perform a preliminary Exploratory Data Analysis (EDA) to address - or better, start addressing - the question at hand. This exercise will be done along the lines of the analysis done by our own Quentin Gallea in "*A recipe to empirically answer any question quickly*" ([Towards Data Science, 2022](https://towardsdatascience.com/a-recipe-to-empirically-answer-any-question-quickly-22e48c867dd5)). In this article, Quentin shows the first steps of an EDA that aims to explore whether heat waves have pushed governments to implement regulations against climate change (causal link). The logic is that, as it gets hotter and hotter, governments become more aware of climate change and the problems it can cause to society, and start addressing it. In Quentin's analysis, heat waves (proxied by temperature) are the "main explanatory variable", rainfall is the "explanatory variable for heterogeneity", and regulations against climate change (proxied by the Environmental Policy Stringency Index) are the "outcome variable". He finds that countries with relatively high temperatures have indeed implemented more regulations against climate change. This is especially true when rainfall levels are low: when it does not rain, the damage of extreme heat is more evident to legislators, who therefore apply stricter regulations against these phenomena.
<br>
<br>
In this exercise, you will be asked to do a similar analysis on a research question of your choice, using at least two of the variables of the dataset we have created in the former questions (QOG + Polity). For example, "what is the average temperature in 2010?" is not a valid research question (univariate), while "what is the impact of high temperatures on the stringency of climate regulations?" is a valid research question (at least bivariate). As before, we will ask you some (this time more general and open) questions, and you should report your answer in the cells below each question. Use a mix of markdown and code cells to answer (markdown for text and code for graphs and tables). We should be able to run all the graphs, i.e. screenshots of graphs are not accepted. Note that for now we have put only one markdown cell and one code cell for the answer, but feel free to add as many cells as you need.
<br>
Beyond the Python code, we will grade the interpretations of the results and the coding decisions you make.
<br>
<br>
Let your creativity guide you and let's have some fun!

### Question 9: Selecting the ingredients (how I select the variables) <a class="anchor" id="question9"></a>
We have saved the clean merged data that resulted from the previous questions in "clean_data_prepared_EDA" (it should be the same as the one you saved in "clean_data"). Import the clean merged data from "clean_data_prepared_EDA" using this [link](https://raw.githubusercontent.com/edoardochiarotti/class_datascience/main/Notebooks/Assignment/individual_assignment/clean_data_prepared_EDA/df_qog_polity_merged.csv). Explore the variables in the newly obtained dataframe by checking the documentation of QOG and Polity. Then, define a research question that addresses a causal link between at least two of these variables. Describe the research question, why you are addressing it and the variables of interest (outcome variable, main explanatory variable and explanatory variable for heterogeneity). **(3 points)**

Answer 9:

In [37]:
# Answer 9:

### Question 10: Picking the right quantity of each ingredient (how I select my sample) <a class="anchor" id="question10"></a>
Explore the data availability of your variables of interest and select a clean sample for the analysis. Describe this sample with the help of summary-statistics tables and maps. **(3 points)**

Answer 10:

In [38]:
# Answer 10:

### Question 11: Tasting and preparing the ingredients (univariate analysis) <a class="anchor" id="question11"></a>
Do a univariate analysis for each variable you have chosen (outcome variable, main explanatory variable and explanatory variable for heterogeneity):
- Prepare the variable, for example see if you need to transform the data further, i.e. log-transform, define a categorical variable, deal with outliers, etc.
- Understand the nature of the variable, i.e. continuous, categorical, binary, etc., which then allows to pick the right statistical tool in the bivariate analysis.
- Get an idea of the variable's behaviour across time and space.

Describe these steps and the conclusions you can draw with the help of histograms, tables, maps and line graphs. **(3 points)**

Answer 11:

In [39]:
# Answer 11:

### Question 12: Cooking the ingredients together (bivariate analysis) <a class="anchor" id="question12"></a>

Considering the "nature" of your variables (continuous, categorical, binary, etc.), pick the right tool / tools for a preliminary bivariate analysis, i.e. correlation tables, bar/line graphs, scatter plots, etc. Use these tools to describe your preliminary bivariate analysis and your findings. **(3 points)**

Answer 12:

In [40]:
# Answer 12:

### Question 13: Tasting the new recipe (conclusion) <a class="anchor" id="question13"></a>

Explain what you learned, the problems you faced, and what you would do next (for example, you can suggest other data you would like to have). **(2 points)**

Answer 13: