# Homework 3

We first define the paths and import the libraries we need for the rest of the homework.

In [None]:
# Used to look for the saved data
IMAGE_FOLDER = 'Data/Images/'
DATA_EUROPE = 'Data/une_rt_a.xls'
DATA_SWITZERLAND = 'Data/chomage.xlsx'
DATA_SWITZERLAND_BY_AGE = 'Data/chomage_age.xlsx'
DATA_SWITZERLAND_BY_NATIONALITY = 'Data/chomage_nationalite.xlsx'

In [None]:
swiss_geo_path = r'topojson/ch-cantons.topojson.json'
europe_geo_path = r'topojson/europe.topojson.json'

In [None]:
# Importing libraries
import os #might not be useful
import json #might not be useful
import folium
import pandas as pd
import pickle as pkl
import seaborn as sns #might not be useful
from IPython.display import Image #allows showing images

In [None]:
#Verifying we have the right version
folium.__version__ == '0.5.0'

## Unemployment in Europe

The first thing to decide is which data to use. Eurostat offers many kinds of information, both per country and per province or state of some countries (known as NUTS 2 regions). We therefore have to determine what kind of information, and how much of it, we want to show.

First, we take the **yearly** average of the unemployment rate rather than the monthly values, which means using data from 2016 and not from 2017. Monthly rates would give us information for the beginning of this year, but we decided against them. Yearly averages also smooth out fluctuations due to seasonal workers, who would not be counted as unemployed for part of the year.
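To illustrate why a yearly figure is less sensitive to seasonal effects, here is a minimal sketch. The monthly values are made up for illustration and are not Eurostat data, and a plain mean is only an assumption about how a yearly figure could be derived.

```python
from statistics import mean

# Hypothetical monthly unemployment rates (%) for one country and one year.
# These numbers are invented for illustration; they are not Eurostat values.
monthly_rates = [5.4, 5.3, 5.1, 4.9, 4.7, 4.6, 4.6, 4.7, 4.8, 4.9, 5.1, 5.3]

# A single yearly figure hides the month-to-month swings.
yearly_average = mean(monthly_rates)
seasonal_spread = max(monthly_rates) - min(monthly_rates)

print(f"Yearly average: {yearly_average:.2f}%")
print(f"Spread between highest and lowest month: {seasonal_spread:.1f} points")
```

Picking a single month would place the country anywhere within that spread, whereas the yearly value does not depend on which month is chosen.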

Secondly, since the idea is to compare the unemployment rate of Europe to that of Switzerland, it is more appropriate to take the data by country rather than by NUTS 2 region, all the more so because Switzerland is split into several regions in that dataset. To compare against Switzerland as a whole, we need Switzerland as a single entity.

We also need data that includes Switzerland, which is not the case for every Eurostat file on unemployment. We therefore had to dig a bit deeper and use the data on unemployment by age, sex and nationality (which contains rates for Switzerland, unlike the data in the main indicators). This dataset is restricted to people aged 15 to 74. This is actually what we want, as those are the "adults" who could potentially be working already; people who are younger or older should not count in our rates, as they are not part of the workforce.

Note that, before downloading the data, we also chose to remove all the columns we do not need (years other than 2016) and the data for the EU as a whole, which were the first four rows.

In [None]:
Image(IMAGE_FOLDER + 'euro01.png')

We also had the option to download the data in several file formats, such as HTML, CSV, TSV or XLS. We chose the XLS format.

In [None]:
Image(IMAGE_FOLDER + 'euro02.png')

In [None]:
europe_unemployed = pd.read_excel(DATA_EUROPE, header=10)

In [None]:
europe_unemployed.head()

As we can see, we have the name of the country and the rate for 2016. The default column names are a bit confusing, so we rename them.

In [None]:
europe_unemployed.columns = ['Country', 'Rate']

We also need to change the names of two countries, as we need to have "Germany" and "The former Yugoslav Republic of Macedonia" instead of "Germany (until 1990 former territory of the FRG)" and "Former Yugoslav Republic of Macedonia, the".

In [None]:
europe_unemployed = europe_unemployed.replace({"Germany (until 1990 former territory of the FRG)": "Germany",
                     "Former Yugoslav Republic of Macedonia, the": "The former Yugoslav Republic of Macedonia"})

In [None]:
europe_unemployed.head()
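The renames above matter because the map joins the table to the geometry by country name. A quick way to spot such mismatches is to compare the two lists of names. The sketch below uses short hypothetical excerpts; in the notebook, the real lists would come from the `Country` column and from the `NAME` property of each feature in the TopoJSON file. The helper `find_unmatched` is introduced here for illustration only.

```python
def find_unmatched(data_names, geo_names):
    """Return the data names that have no counterpart in the geometry file."""
    geo_set = set(geo_names)
    return sorted(name for name in data_names if name not in geo_set)

# Hypothetical excerpts, not the full lists from the notebook.
data_names = [
    "Germany (until 1990 former territory of the FRG)",
    "France",
    "Switzerland",
]
geo_names = ["Germany", "France", "Switzerland", "Italy"]

# Any name printed here would need a rename before drawing the map.
print(find_unmatched(data_names, geo_names))
```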

Now that this is done, we create a map of Europe with the tiles already set. Since Switzerland lies roughly in the middle of Europe, we center the map on the coordinates of the middle of Switzerland.

In [None]:
central_swiss_coord = [46.484, 8.1336] 
m_europe = folium.Map(location=central_swiss_coord, zoom_start=4, tiles='cartodbpositron')

In [None]:
m_europe.choropleth(open(europe_geo_path),
             data=europe_unemployed,
             columns=('Country', 'Rate'),
             legend_name='Unemployment rate (%)',
             fill_color='RdYlGn',
             fill_opacity= 0.8,
             key_on = 'feature.properties.NAME',
             topojson='objects.europe')
m_europe

As we can see, the countries we have no data on are drawn in a vivid red, the color chosen by default. It is thus easy to see that the countries we have no information about are those that are neither in the EU nor in Schengen.

Here we can see that the unemployment rate in Switzerland is about the same as in many nearby countries (Germany, Austria, Poland, and so on), but Switzerland fares a lot better than two of its direct neighbors: France and Italy.

Thus, in terms of unemployment, Switzerland is one of the countries with the lowest rate, but not the only one to fare so well.

It is also important to note that the colors are a little misleading because of the way choropleth maps group values into threshold bins. Any country with a rate between 3 and 6 has the same color as Switzerland. Since the rate in Switzerland is 5%, a country with a lower rate in that range gets the same color, yet fares better than Switzerland (which is the case for Germany or Iceland).
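The binning effect can be reproduced with a small sketch. The bin edges below are an assumption chosen so that one bin covers 3 to 6, as on the map, and the lower rate is a made-up value; how folium treats values that fall exactly on an edge is not reproduced here.

```python
from bisect import bisect_right

# Hypothetical bin edges in percent; the legend of the map shows the real ones.
thresholds = [0, 3, 6, 9, 12, 15]

def bin_index(rate, edges):
    """Return the index i of the bin [edges[i], edges[i + 1]) that contains rate."""
    return bisect_right(edges, rate) - 1

switzerland = 5.0  # rate quoted in the text
lower_rate = 4.0   # made-up lower rate for comparison

# Both rates fall into the same bin, so they would share one color.
print(bin_index(switzerland, thresholds), bin_index(lower_rate, thresholds))
```

Narrower bins around the low rates would separate such countries, at the cost of less contrast elsewhere on the map.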

## Unemployment in Switzerland

In this exercise, we need several elements from the online data provided on the [amstat website](https://www.amstat.ch/v2/index.jsp). Each piece of information we retrieve must be available for every Swiss canton. To explain our methodology, we include a screenshot of each step we performed. As these steps are self-explanatory, we do not explain them further (however, the use of the retrieved data is discussed in each question).

In [None]:
Image(IMAGE_FOLDER + 'amstat01.png')

In [None]:
Image(IMAGE_FOLDER + 'amstat02.png')

In [None]:
Image(IMAGE_FOLDER + 'amstat03.png', retina = True)

In [None]:
Image(IMAGE_FOLDER + 'amstat04.png')

Once we have retrieved our data as an XLS file (easier to work with), we still need to clean it before we can draw our map:
- Deleting useless columns (and rows)
- Getting rid of the MultiIndex
- Mapping each canton to its code (vital to draw our map)

In [None]:
swiss_unemployment = pd.read_excel(DATA_SWITZERLAND, header=[0, 1])
del swiss_unemployment['Mois'] #Only NaN values
del swiss_unemployment['Total'] #Same as 'Septembre 2017' as we only choose one month
swiss_unemployment.drop([26], inplace=True) #We only need the values of each canton, not nationally

In [None]:
swiss_unemployment.columns = [c[-1] for c in swiss_unemployment.columns]
swiss_unemployment.reset_index(inplace=True)
swiss_unemployment.rename(columns={'index': 'Région linguistique', 'Unnamed: 0_level_1':'Canton'}, inplace=True)
swiss_unemployment.head()
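The flattening step above relies on the fact that iterating over two-level column labels yields tuples, from which we keep the last element. A minimal sketch with plain tuples standing in for the column labels follows; `'Chômeurs inscrits'` is an assumed column name used only for illustration.

```python
# Hypothetical two-level column labels, written as the tuples obtained
# when iterating over two-level columns. 'Chômeurs inscrits' is an
# assumed name, not necessarily present in the downloaded file.
two_level_columns = [
    ("Septembre 2017", "Taux de chômage"),
    ("Septembre 2017", "Chômeurs inscrits"),
]

# Keep only the innermost label of each pair.
flat_columns = [labels[-1] for labels in two_level_columns]
print(flat_columns)
```

This only works cleanly when the innermost labels are unique; otherwise the flattened names would collide.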

In [None]:
dico = pkl.load(open('Data/map_cantons.pkl', 'rb'))
swiss_unemployment.insert(1, 'Code', swiss_unemployment['Canton'].map(dico))
swiss_unemployment.head()
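Since the map is keyed on the canton code, a canton whose name is missing from the dictionary would end up without a code and therefore without data on the map. A small check along the following lines can catch this. The dictionary excerpt and the list of names are hypothetical; the real dictionary is the one loaded from the pickle file.

```python
# Hypothetical excerpt of the name-to-code dictionary.
canton_codes = {"Zurich": "ZH", "Berne": "BE", "Vaud": "VD"}

def unmapped(names, mapping):
    """Return the names that have no code in the mapping."""
    return [name for name in names if name not in mapping]

# Hypothetical canton names as they might appear in the table.
canton_names = ["Zurich", "Berne", "Vaud", "Genève"]

# An empty list means every canton can be placed on the map.
missing = unmapped(canton_names, canton_codes)
print(missing)
```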

In [None]:
### Unemployment rate of each Swiss canton

In the first question of Exercise 2, we simply plot the unemployment rate provided by the website (that is, the rate based on the number of people looking for a job, whether or not they currently have one).

In [None]:
swiss_unemployment_map = folium.Map(location=[46.85, 8.23], zoom_start=8)
swiss_unemployment_map.choropleth(open(swiss_geo_path), data=swiss_unemployment,
             columns=['Code', 'Taux de chômage'],
             key_on='feature.id',
             topojson = 'objects.cantons',
             fill_color='BuPu', fill_opacity=0.7, line_opacity=0.2,
             legend_name='Percentage of unemployment (%)')
swiss_unemployment_map

Analysis (in progress)

### Unemployment rate (of people without a job) of each Swiss canton

In this question we will focus on the "real" unemployment rate of people who currently do not have a job (which seems more sound than looking at people who do have a job but are simply looking for a new one). However, to be able to do that, we first need to determine the current active population of Switzerland using the provided data.

In [None]:
real_swiss_unemployment = swiss_unemployment.copy()  # copy so later changes do not alter the original table
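The computation described above can be sketched as follows: recover the active population from a published rate and the count it is based on, then apply it to the count of people without a job. All numbers are made up, the function names are introduced here for illustration, and which columns of the downloaded file play which role is an assumption to verify against the data.

```python
def active_population(count, rate_percent):
    """Invert rate = count / active_population * 100."""
    if rate_percent <= 0:
        raise ValueError("rate must be positive to recover the active population")
    return count * 100 / rate_percent

def rate_for(count, active):
    """Rate in percent of count relative to the active population."""
    return count / active * 100

# Made-up values for a single canton.
published_rate = 3.0      # rate given by the website, in percent
count_behind_rate = 6000  # number of people that rate is based on
jobless_only = 4500       # people who currently have no job

active = active_population(count_behind_rate, published_rate)
real_rate = rate_for(jobless_only, active)
print(f"Active population: {active:.0f}, recomputed rate: {real_rate:.2f}%")
```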

In [None]:
real_swiss_unemployment_map = folium.Map(location=[47, 8], zoom_start=7)
real_swiss_unemployment_map.choropleth(open(swiss_geo_path), data=real_swiss_unemployment,
             columns=['Code', 'Taux de chômage'],
             key_on='feature.id',
             topojson = 'objects.cantons',
             fill_color='BuPu', fill_opacity=0.7, line_opacity=0.2,
             legend_name='Percentage of unemployment (%)')
real_swiss_unemployment_map

Analysis (in progress)

### Differences in unemployment between foreigners and natives

Here we again have two questions: one about the difference in unemployment by nationality, and one about the difference by age. For both, we proceed as above, but select the option "Classes d'âge 15-24, 25-49, 50 ans et plus" for one and "Nationalité" for the other.

In [None]:
Image(IMAGE_FOLDER + 'amstat05.png')

In [None]:
Image(IMAGE_FOLDER + 'amstat06.png')

To start off, we import the data set downloaded previously and clean it as necessary.
In our case, we need to flatten some of the structure.

In [None]:
swiss_foreign_unemployed = pd.read_excel(DATA_SWITZERLAND_BY_NATIONALITY, header=[0,1]) 

In [None]:
del swiss_foreign_unemployed['Mois'] #remove empty column
del swiss_foreign_unemployed['Total'] #only interested in September

In [None]:
swiss_foreign_unemployed.dropna(inplace=True) #final 'total' column is NA
swiss_foreign_unemployed.reset_index(inplace=True) #add index
swiss_foreign_unemployed.columns

In [None]:
swiss_foreign_unemployed.rename(columns={'index':'Région linguistique'}, inplace=True)
swiss_foreign_unemployed.columns.name = None #we don't want a super column
swiss_foreign_unemployed.head()

In [None]:
swiss_foreign_unemployed.columns = swiss_foreign_unemployed.columns.droplevel(1)

In [None]:
swiss_foreign_unemployed.iloc[:,0:4]

### The Röstigraben