# Homework 3

We first import the libraries we need and define the paths used in the rest of the homework.

In [1]:
# Used to look for the saved data sheets
IMAGE_FOLDER = 'Data/Images/'
DATA_EUROPE = 'Data/une_rt_a.xls'
DATA_SWITZERLAND = 'Data/chomage.xlsx'
DATA_SWITZERLAND_BY_AGE = 'Data/chomage_age.xlsx'
DATA_SWITZERLAND_BY_NATIONALITY = 'Data/chomage_nationalite.xlsx'

In [2]:
swiss_geo_path = r'topojson/ch-cantons.topojson.json'
europe_geo_path = r'europe.topojson.json'

In [3]:
# Importing libraries
import os
import json
import folium
import pandas as pd
import pickle as pkl
import seaborn as sns #might not be useful
from IPython.display import Image #allows showing images

In [4]:
#Verifying we have the right version
folium.__version__ == '0.5.0'

True

## Unemployment in Europe

The first decision is which data to use. Eurostat publishes many kinds of indicators, both per country and per sub-national region (the NUTS 2 regions), so we have to decide what kind of information, and how much of it, we want to show.

We take the **yearly** average unemployment rates rather than the monthly ones, which means using data from 2016 and not from 2017. Monthly rates would give us figures for the beginning of this year, but we decided against them. Yearly averages also smooth out fluctuations caused by seasonal workers, who would not be counted as unemployed for part of the year.

Secondly, since the goal is to compare unemployment in Europe with that of Switzerland, it is more appropriate to take the data by country rather than by NUTS 2 region, especially since Switzerland is split into several regions in that dataset. To compare against Switzerland as a whole, we need Switzerland as a single entity.

We therefore also need a dataset that includes Switzerland, which is not the case for every Eurostat file on unemployment. We had to dig a bit deeper and use the data on unemployment by age, sex and nationality (which, unlike the main indicators, contains rates for Switzerland). This restricts the population to people aged 15 to 74, which is what we want: those are the "adults" who could potentially be working, whereas younger or older people are not part of the labour force and should not count in our rates.

Note that, before downloading the data, we also removed all the columns we do not need (years other than 2016) and the data for the EU as a whole, which were the first 4 rows.

In [None]:
Image(IMAGE_FOLDER + 'euro01.png')

The data could be downloaded in several file formats, such as HTML, CSV, TSV or XLS. We chose the XLS format.

In [None]:
Image(IMAGE_FOLDER + 'euro02.png')

In [None]:
europe_unemployed = pd.read_excel(DATA_EUROPE, header=10)

In [None]:
europe_unemployed.head()

## Unemployment in Switzerland

Here, our data must cover several points, as the question has several parts. First, we need to see each canton individually, but we also need statistics on people who already have a job and are looking for another one.

First, on amstat, we go to the category "Chômeurs" and then look at the details.

In [None]:
Image(IMAGE_FOLDER + 'amstat01.png')

Then, we choose "Taux de chômage".

In [None]:
Image(IMAGE_FOLDER + 'amstat02.png')

We only want the current month, so we selected "Mois sous revue" to see the latest statistics, even if they might show a few inconsistencies compared to the whole year. This is because one of the questions asks about the current month and not a whole year.

In [None]:
Image(IMAGE_FOLDER + 'amstat03.png')

We take both "Demandeur d'emploi" and "Demandeur d'emploi non chômeur" so as to know who is and who is not actually unemployed.

Next, for the geography part, we select both levels (language region and canton), mainly so that we can reuse this dataset later for the Röstigraben question.

In [None]:
Image(IMAGE_FOLDER + 'amstat04.png')

Now we can select the file format. We chose the XLS variant "Excel avec texte clair", as the other XLS variants did not seem appropriate to us because of a few problems with the collapsing of rows (for the regions).

In [None]:
Image(IMAGE_FOLDER + 'amstat07.png')

In [5]:
swiss_unemployed = pd.read_excel(DATA_SWITZERLAND, header=[0,1])
del swiss_unemployed['Mois'] #Only NaN values
del swiss_unemployed['Total'] #Same as Septembre 2017 as we only chose one month

In [6]:
swiss_unemployed.columns = [c[-1] for c in swiss_unemployed.columns]
swiss_unemployed.reset_index(inplace=True)
swiss_unemployed.rename(columns={'index': 'Région linguistique', 'Unnamed: 0_level_1':'Canton'}, inplace=True)
swiss_unemployed

Unnamed: 0,Région linguistique,Canton,Taux de chômage,Coefficients de variation,Chômeurs inscrits,Demandeurs d'emploi,Demandeurs d'emploi non chômeurs
0,Suisse alémanique,Zurich,3.3,A,27225,34156,6931
1,Suisse alémanique,Berne,2.4,A,13658,18385,4727
2,Suisse alémanique,Lucerne,1.7,A,3885,6756,2871
3,Suisse alémanique,Uri,0.6,C,112,257,145
4,Suisse alémanique,Schwyz,1.7,A,1455,2229,774
5,Suisse alémanique,Obwald,0.7,B,153,319,166
6,Suisse alémanique,Nidwald,1.0,B,248,436,188
7,Suisse alémanique,Glaris,1.8,B,416,713,297
8,Suisse alémanique,Zoug,2.3,B,1543,2615,1072
9,Suisse alémanique,Soleure,2.6,A,3801,6628,2827
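
As a quick sanity check on how the three count columns relate, the sketch below copies three rows from the output above and verifies that the job seekers who are not unemployed equal the job seekers minus the registered unemployed. The `sample` frame and the variable names are ours, introduced only for illustration.

```python
import pandas as pd

# Three rows copied by hand from the table above (not read from the Excel file).
sample = pd.DataFrame({
    "Canton": ["Zurich", "Berne", "Lucerne"],
    "Chômeurs inscrits": [27225, 13658, 3885],
    "Demandeurs d'emploi": [34156, 18385, 6756],
    "Demandeurs d'emploi non chômeurs": [6931, 4727, 2871],
})

# Recompute the non-unemployed job seekers from the two other columns.
recomputed = sample["Demandeurs d'emploi"] - sample["Chômeurs inscrits"]

# True when every row satisfies:
# job seekers = registered unemployed + non-unemployed job seekers.
columns_consistent = bool(
    (recomputed == sample["Demandeurs d'emploi non chômeurs"]).all()
)
print(columns_consistent)
```

The same comparison could be run on the full `swiss_unemployed` frame, assuming the count columns were parsed as numbers.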


In [7]:
# Load the canton name -> canton code dictionary, closing the file afterwards
with open('Data/map_cantons.pkl', 'rb') as f:
    dico = pkl.load(f)
swiss_unemployed.drop([26], inplace=True)
swiss_unemployed['Code'] = swiss_unemployed['Canton'].map(dico)
swiss_unemployed

Unnamed: 0,Région linguistique,Canton,Taux de chômage,Coefficients de variation,Chômeurs inscrits,Demandeurs d'emploi,Demandeurs d'emploi non chômeurs,Code
0,Suisse alémanique,Zurich,3.3,A,27225,34156,6931,ZH
1,Suisse alémanique,Berne,2.4,A,13658,18385,4727,BE
2,Suisse alémanique,Lucerne,1.7,A,3885,6756,2871,LU
3,Suisse alémanique,Uri,0.6,C,112,257,145,UR
4,Suisse alémanique,Schwyz,1.7,A,1455,2229,774,
5,Suisse alémanique,Obwald,0.7,B,153,319,166,OW
6,Suisse alémanique,Nidwald,1.0,B,248,436,188,NW
7,Suisse alémanique,Glaris,1.8,B,416,713,297,GL
8,Suisse alémanique,Zoug,2.3,B,1543,2615,1072,ZG
9,Suisse alémanique,Soleure,2.6,A,3801,6628,2827,SO
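
The empty `Code` for Schwyz in the output above suggests that the dictionary has no entry for that exact name. A minimal sketch of how such gaps could be listed automatically, using a small hypothetical mapping (`name_to_code` is not the real pickle) and a toy frame:

```python
import pandas as pd

# Hypothetical, deliberately incomplete name-to-code mapping (not the real pickle).
name_to_code = {"Zurich": "ZH", "Uri": "UR"}

cantons = pd.DataFrame({"Canton": ["Zurich", "Uri", "Schwyz"]})
cantons["Code"] = cantons["Canton"].map(name_to_code)

# Names absent from the dictionary are mapped to NaN; list them for manual fixing.
unmapped = cantons.loc[cantons["Code"].isna(), "Canton"].tolist()
print(unmapped)
```

Listing the unmapped names first avoids having to spot the gaps by eye in the printed table.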


In [8]:
# Set the codes by hand for rows 4 (Schwyz) and 15
swiss_unemployed.loc[4, 'Code'] = 'SZ'
swiss_unemployed.loc[15, 'Code'] = 'SG'
swiss_unemployed

Unnamed: 0,Région linguistique,Canton,Taux de chômage,Coefficients de variation,Chômeurs inscrits,Demandeurs d'emploi,Demandeurs d'emploi non chômeurs,Code
0,Suisse alémanique,Zurich,3.3,A,27225,34156,6931,ZH
1,Suisse alémanique,Berne,2.4,A,13658,18385,4727,BE
2,Suisse alémanique,Lucerne,1.7,A,3885,6756,2871,LU
3,Suisse alémanique,Uri,0.6,C,112,257,145,UR
4,Suisse alémanique,Schwyz,1.7,A,1455,2229,774,SZ
5,Suisse alémanique,Obwald,0.7,B,153,319,166,OW
6,Suisse alémanique,Nidwald,1.0,B,248,436,188,NW
7,Suisse alémanique,Glaris,1.8,B,416,713,297,GL
8,Suisse alémanique,Zoug,2.3,B,1543,2615,1072,ZG
9,Suisse alémanique,Soleure,2.6,A,3801,6628,2827,SO


In [12]:
geo_swiss_geo_path = r'Data/ch-cantons.topojson.geojson'

In [14]:
swiss_map = folium.Map(location=[47, 8], zoom_start=7)
swiss_map.choropleth(geo_data=geo_swiss_geo_path, data=swiss_unemployed,
             columns=['Code', 'Taux de chômage'],
             key_on='feature.properties.id',
             fill_color='BuPu', fill_opacity=0.7, line_opacity=0.2,
             legend_name='Percentage of unemployment (%)')
swiss_map
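
The choropleth joins the `Code` column to the map shapes through the `key_on` path, so a code that matches no feature cannot be joined to a shape. The sketch below shows one way to check this before drawing. It uses a tiny hypothetical feature collection in place of the real canton file, a made-up code `XX`, and a helper `lookup` that we define ourselves; none of these come from folium.

```python
import pandas as pd

# Hypothetical miniature feature collection; the real file has one feature per canton.
geo = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"id": "ZH"}, "geometry": None},
        {"type": "Feature", "properties": {"id": "BE"}, "geometry": None},
    ],
}


def lookup(feature, key_on):
    """Follow a dotted path such as 'feature.properties.id' inside one feature."""
    value = feature
    for part in key_on.split(".")[1:]:  # skip the leading 'feature'
        value = value[part]
    return value


geo_ids = {lookup(f, "feature.properties.id") for f in geo["features"]}

# Toy data: 'XX' is an invented code that matches no feature.
data = pd.DataFrame({"Code": ["ZH", "BE", "XX"], "Taux de chômage": [3.3, 2.4, 1.0]})

# Codes present in the data but absent from the map.
missing = sorted(set(data["Code"]) - geo_ids)
print(missing)
```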

### Differences in unemployment between foreigners and natives

Here we again have two different questions: one about the difference in unemployment by nationality, and one by age. For both, we proceed as above, but select the option "Nationalité" for the first and "Classes d'âge 15-24, 25-49, 50 ans et plus" for the second.

In [None]:
Image(IMAGE_FOLDER + 'amstat05.png')

In [None]:
Image(IMAGE_FOLDER + 'amstat06.png')

To start off, we import the data set downloaded previously and clean it as necessary.
In our case, we need to flatten some of the structure.
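
To illustrate what flattening means here, the sketch below builds a toy frame with a two-level column header and keeps only the innermost level. The header labels mimic the export but are typed by hand, and this is only one possible way of flattening, not necessarily the one used in the cells that follow.

```python
import pandas as pd

# Hypothetical two-level header: a month label above the individual measures.
columns = pd.MultiIndex.from_tuples([
    ("Septembre 2017", "Taux de chômage"),
    ("Septembre 2017", "Chômeurs inscrits"),
])
toy = pd.DataFrame([[3.3, 27225], [2.4, 13658]], columns=columns)

# Keep only the innermost level of each column label.
toy.columns = toy.columns.get_level_values(-1)
flat_names = list(toy.columns)
print(flat_names)
```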

In [None]:
swiss_foreign_unemployed = pd.read_excel(DATA_SWITZERLAND_BY_NATIONALITY, header=[0,1]) 

In [None]:
del swiss_foreign_unemployed['Mois'] #remove empty column
del swiss_foreign_unemployed['Total'] #only interested in September

In [None]:
swiss_foreign_unemployed.dropna(inplace=True) #drop rows containing NA (the final 'total' row)
swiss_foreign_unemployed.reset_index(inplace=True) #add index
swiss_foreign_unemployed.columns

In [None]:
swiss_foreign_unemployed.rename(columns={'index':'Région linguistique'}, inplace=True)
swiss_foreign_unemployed.columns.name = None #we don't want a super column
swiss_foreign_unemployed.head()

In [None]:
swiss_foreign_unemployed.columns = swiss_foreign_unemployed.columns.droplevel(1)

In [None]:
swiss_foreign_unemployed.iloc[:,0:4]

### The Röstigraben