In [1]:
import os
import pandas as pd
import numpy as np
import json
import folium
import glob
import csv
import string

# Q2 : Canton wise unemployment rate

For the second task we have downloaded from the amstat website a dataset that includes the rates of unemployed people for the last 15 years, as well as other information. Specifically:
* Registered unemployed
* Registered job seekers
* Job seekers with a job 

For our analysis we choose to keep only the fraction of people who strictly belong to the definition of unemployment of the Bureau of Labor Statistics https://www.bls.gov/cps/cps_htgm.htm: "People who are jobless, looking for a job, and available for work". That is, the number of registered job seekers minus the ones who have already a job. We can easily see that this number corresponds to the registered unemployed people. We consider important to check this difference to exclude the possbility that the "registered unemployed" class includes jobless people that are not looking for a job (and that thus don't belong to our definition of unemployment). Given the premise, since the information about "Registered job seekers" is redundant, we decide to exclude it from our dataset.

Moreover, we are interested in the unemployment rate. According to the Bureau of Labor Statistics, it is the number of unemployed people divided by the number of people in the labor force. Neverthless, in our dataset it seems to be computed as the number of unemployed people over the employed ones.


In [2]:
#Import
ch_geo = os.path.join('./topojson/', 'ch-cantons.topojson.json')
ch_rate = os.path.join('./ch_rate.xlsx')

#Reading
ch_rate_data = pd.read_excel(ch_rate, index=False)

#Clean 1st dataset
ch_rate_data = ch_rate_data.iloc[1:,:]

In [3]:
ch_rate_data.head()

Unnamed: 0,Kanton,Januar 2002,Januar 2002.1,Januar 2002.2,Februar 2002,Februar 2002.1,Februar 2002.2,März 2002,März 2002.1,März 2002.2,...,Oktober 2016.2,November 2016,November 2016.1,November 2016.2,Dezember 2016,Dezember 2016.1,Dezember 2016.2,Gesamt,Gesamt.1,Gesamt.2
1,Zürich,2.6,18757,5989,2.7,19279,6266,2.7,19617,6622,...,6923,3.7,30651,7069,3.8,31570,7021,3.4,4642209,1329157
2,Bern,1.6,8517,4281,1.7,8656,4476,1.6,8261,4892,...,4948,2.8,15753,5208,3.0,16636,5001,2.4,2308491,1042472
3,Luzern,1.8,3378,1370,1.8,3467,1380,1.8,3393,1425,...,2952,2.0,4429,3120,2.2,4883,2988,2.3,840599,463094
4,Uri,0.7,117,111,0.6,106,111,0.6,107,124,...,149,1.1,218,158,1.3,242,165,1.1,36644,31181
5,Schwyz,1.0,740,578,1.1,797,574,1.1,753,596,...,773,1.8,1557,818,1.9,1683,829,1.7,239129,155084


The first objective is to group the different categories by year and rename the data properly.

In [4]:
for year in np.arange(0,15,1):
    rate_data = pd.DataFrame({'rate_ue_{}'.format(year+2002): ch_rate_data.iloc[:,(1+36*year):(1+36*(year+1)-1):3].apply(pd.to_numeric).mean(axis=1)})
    reg_ue_data = pd.DataFrame({'reg_ue_{}'.format(year+2002): ch_rate_data.iloc[:,(2+36*year):(2+36*(year+1)-1):3].apply(pd.to_numeric).mean(axis=1)})
    js_with_job = pd.DataFrame({'js_with_job_{}'.format(year+2002): ch_rate_data.iloc[:,(3+36*year):(3+36*(year+1)-1):3].apply(pd.to_numeric).mean(axis=1)})
    ch_rate_data = pd.concat((ch_rate_data,rate_data,reg_ue_data,js_with_job),axis=1)

In [5]:
beginning_of_dataset = ch_rate_data.iloc[:,:1]
end_of_dataset = ch_rate_data.iloc[:,-45:]
ch_rate_data = pd.concat((beginning_of_dataset,end_of_dataset),axis=1)

In [6]:
ch_rate_data.head()

Unnamed: 0,Kanton,rate_ue_2002,reg_ue_2002,js_with_job_2002,rate_ue_2003,reg_ue_2003,js_with_job_2003,rate_ue_2004,reg_ue_2004,js_with_job_2004,...,js_with_job_2013,rate_ue_2014,reg_ue_2014,js_with_job_2014,rate_ue_2015,reg_ue_2015,js_with_job_2015,rate_ue_2016,reg_ue_2016,js_with_job_2016
1,Zürich,2.991667,21595.75,7283.416667,4.516667,32574.333333,9444.666667,4.483333,32402.166667,10368.5,...,5790.833333,3.175,26013.166667,5923.583333,3.416667,27985.333333,6232.916667,3.65,30083.916667,6702.0
2,Bern,1.75,9162.916667,5155.666667,2.833333,14860.416667,6651.333333,2.916667,15212.583333,7452.5,...,4514.916667,2.35,13189.333333,4866.0,2.5,14116.416667,4922.666667,2.708333,15283.166667,4801.083333
3,Luzern,2.0,3780.0,1560.583333,3.116667,5881.666667,2334.833333,3.141667,5955.333333,2676.666667,...,2489.75,1.866667,4147.083333,2435.916667,1.975,4383.333333,2643.916667,2.058333,4601.75,2864.833333
4,Uri,0.716667,124.416667,169.5,1.066667,188.0,181.5,1.183333,208.833333,200.666667,...,182.75,1.05,202.916667,148.083333,1.033333,199.666667,163.833333,1.041667,201.5,161.916667
5,Schwyz,1.225,878.166667,730.5,2.108333,1490.5,1034.25,2.283333,1630.333333,1019.666667,...,773.0,1.525,1344.083333,794.25,1.583333,1388.333333,758.666667,1.766667,1537.166667,752.416667


Our next step is to use folium to visualize Swiss cantons and assign the corresponding unemployment rates.
Notice that we have downloaded the data in German in order to better match the names of the cantons with the ones provided in the geographical data.

In [7]:
#Loading
ch_geo = open('geojson/switzerland.geojson')
ch_geo = ch_geo.read()
ch_geo = json.loads(ch_geo)

#Assign geoJson
ch_map = folium.Map(location=[46.8,8.1], zoom_start=7.5)
folium.GeoJson(ch_geo, name='ch_geo').add_to(ch_map)

#Visualize
ch_map

In [8]:
ch_map.choropleth(geo_data=ch_geo, data=ch_rate_data,
             columns=['Kanton', 'rate_ue_2008'],
             key_on='feature.properties.name',
             fill_color='OrRd', fill_opacity=0.7, line_opacity=0.2,
             legend_name='Unemployment rate (%)')
folium.LayerControl().add_to(ch_map)
ch_map

In the previous analysis we considered only the unemployed people looking for a job. If we want to look at the overall population looking for a job, we have to consider also those job seekers who already have a job.
Since the rates for this group are not provided, we will calculate them as the number of job seekers over the employed population. We assume that the "Unemployment rate" has been computed from the "Registered unemployed" data and thus we will use this information to derive the number of employed people for each canton.

unemployment rate = n. unemployed/n. employed 

=> n. employed = n. unemployed/unemployed rate

In [9]:
# Compute labor force  2016
ch_rate_data['emp_2016'] = ch_rate_data['reg_ue_2016'] / ch_rate_data['rate_ue_2016'] 

Now we can calculate the rate of job seekers who are already employed on the overall labor force.

In [10]:
# Compute job seekers / labor force  2016
ch_rate_data['rate_js_with_job_2016'] = ch_rate_data['js_with_job_2016'] / ch_rate_data['emp_2016']

In [11]:
ch_rate_data.head()

Unnamed: 0,Kanton,rate_ue_2002,reg_ue_2002,js_with_job_2002,rate_ue_2003,reg_ue_2003,js_with_job_2003,rate_ue_2004,reg_ue_2004,js_with_job_2004,...,reg_ue_2014,js_with_job_2014,rate_ue_2015,reg_ue_2015,js_with_job_2015,rate_ue_2016,reg_ue_2016,js_with_job_2016,emp_2016,rate_js_with_job_2016
1,Zürich,2.991667,21595.75,7283.416667,4.516667,32574.333333,9444.666667,4.483333,32402.166667,10368.5,...,26013.166667,5923.583333,3.416667,27985.333333,6232.916667,3.65,30083.916667,6702.0,8242.16895,0.813135
2,Bern,1.75,9162.916667,5155.666667,2.833333,14860.416667,6651.333333,2.916667,15212.583333,7452.5,...,13189.333333,4866.0,2.5,14116.416667,4922.666667,2.708333,15283.166667,4801.083333,5643.015385,0.850801
3,Luzern,2.0,3780.0,1560.583333,3.116667,5881.666667,2334.833333,3.141667,5955.333333,2676.666667,...,4147.083333,2435.916667,1.975,4383.333333,2643.916667,2.058333,4601.75,2864.833333,2235.668016,1.281422
4,Uri,0.716667,124.416667,169.5,1.066667,188.0,181.5,1.183333,208.833333,200.666667,...,202.916667,148.083333,1.033333,199.666667,163.833333,1.041667,201.5,161.916667,193.44,0.837038
5,Schwyz,1.225,878.166667,730.5,2.108333,1490.5,1034.25,2.283333,1630.333333,1019.666667,...,1344.083333,794.25,1.583333,1388.333333,758.666667,1.766667,1537.166667,752.416667,870.09434,0.864753


In [12]:
ch_map.choropleth(geo_data=ch_geo, data=ch_rate_data,
             columns=['Kanton', 'rate_js_with_job_2016'],
             key_on='feature.properties.name',
             fill_color='OrRd', fill_opacity=0.7, line_opacity=0.2,
             legend_name='Unemployment rate (%)')
folium.LayerControl().add_to(ch_map)
ch_map

## TBD LATER : Calculate the same but by using other categories like maybe young people

# Q3. Calculate rate by using `ch_nationality` and `ch_age datasets`, use the same preprocessing as above (rates are available)

For this task we have downloaded different datasets that include the unemployment rates in Switzerland divided by cathegories, in particular:

* Swiss VS foreign workers
* Three age groups: 

Let's first import the data.

In [33]:
#Import
ch_nationality = os.path.join('./ch_nationality.xlsx')
ch_age = os.path.join('./ch_age.xlsx')
ch_age_nat_total = os.path.join('./ch_age_nat_total.xlsx')

#Reading
ch_nationality_data = pd.read_excel(ch_nationality, index=False)
ch_age_data = pd.read_excel(ch_age, index=False)
ch_age_nat_total_data = pd.read_excel(ch_age_nat_total, index=False)

#Clean 1st dataset
ch_nationality_data = ch_nationality_data.iloc[1:,:]
ch_nationality_data.head()

Unnamed: 0,Kanton,Nationalität,Januar 2002,Januar 2002.1,Januar 2002.2,Februar 2002,Februar 2002.1,Februar 2002.2,März 2002,März 2002.1,...,Oktober 2016.2,November 2016,November 2016.1,November 2016.2,Dezember 2016,Dezember 2016.1,Dezember 2016.2,Gesamt,Gesamt.1,Gesamt.2
1,Zürich,Ausländer,4.8,8573,2935,5.0,8828,3031,5.1,8985,...,3295,6.1,13988,3336,6.4,14614,3343,5.7,1997108,591683
2,Zürich,Schweizer,1.9,10184,3054,1.9,10451,3235,1.9,10632,...,3628,2.8,16663,3733,2.9,16956,3678,2.6,2645101,737474
3,Bern,Ausländer,4.6,3390,1628,4.7,3475,1724,4.5,3351,...,2154,6.9,6083,2237,7.3,6488,2016,5.7,809568,366157
4,Bern,Schweizer,1.1,5127,2653,1.2,5181,2752,1.1,4910,...,2794,2.0,9670,2971,2.1,10148,2985,1.8,1498923,676315
5,Luzern,Ausländer,4.6,1477,622,4.7,1506,629,4.6,1475,...,1405,4.6,1861,1457,5.1,2100,1367,5.4,344364,194170


In [30]:
for year in np.arange(0,7,1):
    rate_data = pd.DataFrame({'rate_ue_{}'.format(year+2002): ch_nationality_data.iloc[:,(2+36*year):(2+36*(year+1)-1):3].apply(pd.to_numeric).mean(axis=1)})
    reg_ue_data = pd.DataFrame({'reg_ue_{}'.format(year+2002): ch_nationality_data.iloc[:,(3+36*year):(3+36*(year+1)-1):3].apply(pd.to_numeric).mean(axis=1)})
    js_with_job = pd.DataFrame({'js_with_job_{}'.format(year+2002): ch_nationality_data.iloc[:,(4+36*year):(4+36*(year+1)-1):3].apply(pd.to_numeric).mean(axis=1)})
    ch_nationality_data = pd.concat((ch_nationality_data,rate_data,reg_ue_data,js_with_job),axis=1)
    ch_nationality_data

In [31]:
beginning_of_dataset = ch_nationality_data.iloc[:,:2]
end_of_dataset = ch_nationality_data.iloc[:,-45:]
ch_nationality_data = pd.concat((beginning_of_dataset,end_of_dataset),axis=1)

In [32]:
ch_nationality_data.head()

Unnamed: 0,Kanton,Nationalität,rate_ue_2003,reg_ue_2003,js_with_job_2003,rate_ue_2004,reg_ue_2004,js_with_job_2004,rate_ue_2005,reg_ue_2005,...,js_with_job_2005,rate_ue_2006,reg_ue_2006,js_with_job_2006,rate_ue_2007,reg_ue_2007,js_with_job_2007,rate_ue_2008,reg_ue_2008,js_with_job_2008
1,Zürich,Ausländer,7.633333,13554.833333,4063.916667,7.441667,13207.166667,4346.583333,6.766667,11990.583333,...,4285.416667,5.741667,10207.666667,4084.166667,4.575,8116.416667,3512.416667,4.316667,7679.583333,2973.916667
2,Zürich,Schweizer,3.483333,19019.5,5380.75,3.525,19195.0,6021.916667,3.133333,17051.5,...,5913.916667,2.575,13966.916667,5511.25,2.0,10915.25,4406.416667,1.841667,10010.916667,3596.583333
3,Bern,Ausländer,7.041667,5236.666667,2253.833333,6.875,5111.666667,2377.416667,6.475,4816.666667,...,2303.333333,5.8,4312.0,2194.416667,4.775,3544.0,1937.166667,4.4,3279.166667,1814.5
4,Bern,Schweizer,2.15,9623.75,4397.5,2.233333,10100.916667,5075.083333,2.216667,9993.583333,...,5163.416667,1.95,8734.0,4722.833333,1.533333,6880.333333,4002.583333,1.308333,5882.0,3426.166667
5,Luzern,Ausländer,7.291667,2337.916667,904.75,7.083333,2269.166667,994.833333,6.875,2203.5,...,1031.666667,6.075,1944.166667,1074.333333,5.075,1625.083333,1016.666667,4.925,1577.5,954.666667


# IMP : 
You have to find out the total number of each category by using the `ch_total` dataset and calculate the rates like that. 

By using the language region information, calculate an average unemployment rate across the two major linguistic regions and comment on the Röstigraben

Comment on the general trends of all the previous questions. Example : in times of crisis, foreign workers are usually kicked out. 