This notebook aims to explore the situation in Italy by looking at its demographic composition and labor status. The data covers the period between 1996 and 2018, with quarterly observations by region (or territory). 

We will try to grasp from the data how the gender and the geographical disparities have evolved in the past 22 years. We will see how one of these disparities has been reduced in this period, while the other didn't show any improvement.

A word of caution before proceeding: this analysis has 2 big limitations (that I can point out; I can't exclude there are more):

* While seeing an increase or a decrease in the number of employed can tell us something, it can't tell us about working conditions and prospects.
* The population is here broken down into a few age categories and, unfortunately, the size of these categories is not ideal to point out potential generational issues.

Moreover, even though the data come from the Italian Statistical Institute (ISTAT) and are as official as they can be, we will find some anomalies, introduced either at the source or during the upload here on Kaggle. For this reason, please keep in mind that, as always, this analysis is limited by the quality of the data behind it.

In [None]:
import numpy as np 
import pandas as pd 

import matplotlib.pyplot as plt
import matplotlib.lines as mlines
from matplotlib.ticker import AutoMinorLocator
import matplotlib.gridspec as gridspec
import seaborn as sns
%matplotlib inline

# full option names: the short forms are ambiguous in recent pandas versions
pd.set_option('display.max_columns', 200)
pd.set_option('display.max_rows', 100)

Let's start by having a look at the data.

In [None]:
df = pd.read_csv('../input/cusersmarildownloadspopolazionecsv/popolazione.csv', delimiter=';', 
                 encoding="ISO-8859-1", dtype={'value': np.int64}, thousands='.')

df.head()

We have some problematic columns, `value` above all (being the most important information of this dataset). We can see this in the fourth row, where the original data clearly misses a trailing 0. We propose a simple cleaning procedure.

In [None]:
df[['year', 'quarter']] = df['time'].str.split('-', expand=True)
df['year'] = df.year.astype(int)
df['quarter'] = df.quarter.fillna('total')
del df['time']  # we already have 2 columns about it
del df['seleziona periodo']  # redundant
del df['tipo dato']  # only one value
del df['tipo_dato_fol']  # only one value
del df['itter107']  # only a code for territory (territorio)

df.head()

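The issue with `value` mentioned above can be tackled with a groupby: within each group (same territory, gender, age class, professional condition, and year) we take the largest order of magnitude as the reference and pad every smaller value with the missing trailing zeros.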

In [None]:
tmp = df.copy()
tmp['magnitude'] = np.log10(tmp['value']).astype(int)

tmp['max_value'] = tmp.groupby(['territorio', 'sesso', 'eta1', 
                                       'condizione_prof', 'condizione_prof_eu', 'year']).magnitude.transform('max')

tmp['difference'] = tmp['max_value'] - tmp['magnitude']  # this goes as high as 4

tmp['value_new'] = tmp['value'] * 10**tmp['difference'] 

df['value'] = tmp['value_new']

df.head()

Initially, there were years with more people in the 15-64 age category than in the 15-74 one. Luckily, this small fix of the trailing 0's solved the issue.
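To make the fix above easier to follow, here is a minimal sketch on a toy table. The data and the column names `group`, `max_magnitude`, and `value_fixed` are made up for illustration; only the logic mirrors the cell above.

```python
import numpy as np
import pandas as pd

# Toy data (made up): in group A the third value lost a trailing zero
toy = pd.DataFrame({
    'group': ['A', 'A', 'A', 'B', 'B'],
    'value': [1200, 1250, 130, 45000, 47000],
})

# Order of magnitude of each value: floor(log10(value))
toy['magnitude'] = np.floor(np.log10(toy['value'])).astype(int)

# Largest order of magnitude observed within each group
toy['max_magnitude'] = toy.groupby('group')['magnitude'].transform('max')

# Pad each value with as many zeros as it is missing
toy['value_fixed'] = toy['value'] * 10 ** (toy['max_magnitude'] - toy['magnitude'])

print(toy[['group', 'value', 'value_fixed']])
```

Keep in mind that this is a heuristic: it assumes that all the values in a group share the same order of magnitude, so a value that is genuinely ten times smaller than the others would be inflated by mistake.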

I am now curious about the difference between professional condition (condizione professionale) and European professional condition (condizione professionale europea), as I can't find the definition on the ISTAT website.

In [None]:
pd.crosstab(df['condizione professionale'], df['condizione professionale europea'])

It doesn't make much sense to me to have both, so I am keeping only the total (totale) value of `condizione professionale europea` (I promise I will rename the columns later for a more international audience).

In [None]:
df = df[df['condizione professionale europea'] == 'totale'].copy()
del df['condizione professionale europea']
del df['condizione_prof_eu']

To sense-check our cleaning, I want to see if the population size matches the known value (about 61 million Italians this year).

In [None]:
df[(df.quarter == 'total') & 
   (df.sesso == 'totale') & 
   (df['condizione professionale'] == 'totale')].groupby(['year','eta1','condizione_prof'], 
                                                                 as_index=False).value.sum()

So in 2018 we have about 53M people 15+ years old and about 8M people in the 0-14 class. It all makes sense.

Before moving on, some more data wrangling is needed to get a better view of the Italian population. Namely, we can appropriately combine age groups to get more granular data.
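The idea is that the age classes in the data are nested (15-64 is contained in 15-74, which is contained in 15+), so disjoint bands can be obtained by subtraction. A minimal sketch with made-up counts, just to show the arithmetic:

```python
import pandas as pd

# Toy counts (made up) for the nested age classes found in the data
nested = pd.DataFrame(
    {'Y15-64': [100, 110], 'Y15-74': [130, 145], 'Y-GE15': [150, 170]},
    index=pd.Index([1996, 2018], name='year'),
)

# Disjoint bands obtained as differences of nested classes
bands = pd.DataFrame({
    'Y15-64': nested['Y15-64'],
    'Y65-74': nested['Y15-74'] - nested['Y15-64'],
    'Y-GE75': nested['Y-GE15'] - nested['Y15-74'],
})

# The disjoint bands must add back up to the 15+ total
assert (bands.sum(axis=1) == nested['Y-GE15']).all()
print(bands)
```

A negative value in one of the derived bands would be a sign that the nested classes are inconsistent with each other in the source data.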

In [None]:
tmp = df.copy()
tmp['eta1'] = tmp['eta1'].str.replace('_', '-')
tmp['condizione_prof'] = tmp['condizione_prof'].str.replace('_', '-')
prof_cond = tmp[['condizione_prof', 'condizione professionale']].drop_duplicates()

# Start manipulating
tmp = pd.pivot_table(tmp, index=['year', 'territorio'], 
                     columns=['sesso', 'eta1', 'condizione_prof', 'quarter'], values='value', fill_value=0) 
tmp.columns = ['_'.join(col).strip() for col in tmp.columns.values]

for gender in ['femmine', 'maschi', 'totale']:
    for con_prof in ['3-4', '99', '2', '1-2', '1', '3D', '3A-3B-3C']:
        for quarter in ['total', 'Q1', 'Q2', 'Q3', 'Q4']:
            tmp[gender+'_Y65-74_'+con_prof+'_'+quarter] = tmp[gender+'_Y15-74_'+con_prof+'_'+quarter] - tmp[gender+'_Y15-64_'+con_prof+'_'+quarter]
            tmp[gender+'_Y-GE75_'+con_prof+'_'+quarter] = tmp[gender+'_Y-GE15_'+con_prof+'_'+quarter] - tmp[gender+'_Y15-74_'+con_prof+'_'+quarter]
            tmp[gender+'_Y-GE65_'+con_prof+'_'+quarter] = tmp[gender+'_Y-GE15_'+con_prof+'_'+quarter] - tmp[gender+'_Y15-64_'+con_prof+'_'+quarter]
            try:
                tmp[gender+'_Total_'+con_prof+'_'+quarter] = tmp[gender+'_Y-GE15_'+con_prof+'_'+quarter] + tmp[gender+'_Y0-14_'+con_prof+'_'+quarter]
            except KeyError:
                tmp[gender+'_Total_'+con_prof+'_'+quarter] = tmp[gender+'_Y-GE15_'+con_prof+'_'+quarter]

tmp = tmp.reset_index()
tmp = pd.melt(tmp, id_vars=['year', 'territorio'])
tmp[['sesso', 'eta1', 'condizione_prof', 'quarter']] = tmp['variable'].str.split('_', expand=True)
del tmp['variable']

df_cleaned = pd.merge(tmp, prof_cond, on='condizione_prof', how='left')

df_cleaned.head()

In [None]:
df_cleaned[(df_cleaned.eta1=='Total') & 
           (df_cleaned.sesso=='totale') & 
           (df_cleaned.quarter == 'total') & 
           (df_cleaned.condizione_prof == '99')].groupby('year', as_index=False).value.sum()

As they are still incomplete in this dataset, we will ignore the entries from 2019.

# Population trends in Italy

The first thing we can do is to see how the population evolved over the years.

In [None]:
by_year = df[(df.quarter == 'total') & 
         (df['condizione professionale'] == 'totale')].groupby(['year','eta1', 'sesso'], as_index=False).value.sum()

fig, ax = plt.subplots(1,2, figsize=(16, 6), facecolor='#f7f7f7')

men = by_year[(by_year.sesso == 'maschi') & (by_year.eta1 == 'Y_GE15')]
women = by_year[(by_year.sesso == 'femmine') & (by_year.eta1 == 'Y_GE15')]

ax[0].plot(men.year, men.value, label='Men', color='green')
ax[0].plot(women.year, women.value, label='Women', color='red')
ax[0].set_title('Population 15+', fontsize=14)
ax[0].legend()

men = by_year[(by_year.sesso == 'maschi') & (by_year.eta1 == 'Y0-14')]
women = by_year[(by_year.sesso == 'femmine') & (by_year.eta1 == 'Y0-14')]

ax[1].plot(men.year, men.value, label='Men', color='green')
ax[1].plot(women.year, women.value, label='Women', color='red')
ax[1].set_title('Population 0-14', fontsize=14)
ax[1].legend()

fig.suptitle('Population in Italy 1996-2018', fontsize=18)

plt.show()

Something really odd happens in 2010; it appears to be a problem that comes from the data source.

However, we can see how in the late '90s the 0-14 population started decreasing, which resulted in a lower rate of increase in the 15+ population a few years later. The worrisome trend is the one that started around 2013, but I guess we will see its effects in a few years.

This first graph suggests that we need a bit more granularity to have a better view of the Italian population.
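As a side note, an anomaly like the one in 2010 could also be flagged programmatically by looking at year-over-year changes. This is only a sketch: the series is made up and the 5% threshold is an arbitrary assumption of mine, not something derived from the data.

```python
import pandas as pd

# Toy yearly series (made up) with an artificial dip in 2010
population = pd.Series(
    [4.00e6, 4.02e6, 4.03e6, 3.20e6, 4.05e6, 4.06e6],
    index=pd.Index([2007, 2008, 2009, 2010, 2011, 2012], name='year'),
)

# Relative change with respect to the previous year
yoy = population.pct_change()

# Hypothetical threshold: a jump larger than 5% in one year looks implausible
threshold = 0.05
suspicious = yoy[yoy.abs() > threshold]
print(suspicious)
```

Note that both the dip and the following recovery get flagged, so an isolated bad year shows up as two consecutive suspicious entries.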

In [None]:
piv_year = pd.pivot_table(by_year, index='year', columns=['eta1', 'sesso'], values='value')
piv_year.columns = ['_'.join(col).strip() for col in piv_year.columns.values]

for gender in ['femmine', 'maschi', 'totale']:
    piv_year['Y65-74_'+gender] = piv_year['Y15-74_'+gender] - piv_year['Y15-64_'+gender]
    piv_year['Y_GE75_'+gender] = piv_year['Y_GE15_'+gender] - piv_year['Y15-74_'+gender]
    piv_year['All_'+gender] = piv_year['Y_GE15_'+gender] + piv_year['Y0-14_'+gender]

piv_year.head()

In [None]:
fig, ax = plt.subplots(5,1, figsize=(16, 24), facecolor='#f7f7f7')
fig.subplots_adjust(top=0.95)

years = piv_year.index

i = 0
for sel in ['All', 'Y0-14', 'Y15-64', 'Y65-74', 'Y_GE75']:
    piv_year[sel + '_maschi'].plot(ax=ax[i], label='Men', color='green')
    piv_year[sel + '_femmine'].plot(ax=ax[i], label='Women', color='red')
    ax[i].legend()
    ax[i].set_xticks(years)
    i += 1

ax[0].set_title('Total population', fontsize=14)
ax[1].set_title('Population 0-14', fontsize=14)
ax[2].set_title('Population 15-64', fontsize=14)
ax[3].set_title('Population 65-74', fontsize=14)
ax[4].set_title('Population 75+', fontsize=14)


fig.suptitle('Population in Italy 1996-2018', fontsize=18)

plt.show()

We then see again that in 2010 there is an error for 0-14 females but that, overall, we can still form our first ideas about the Italian population.

* The population remained stable in the late '90s and saw a steady increase from 2003 to 2013.
* Since 2013, the Italian population has been slightly decreasing.
* The decrease involves the population up to 65 years of age.
* The population 65+ kept increasing, and the stability of the 75+ is undeniable.
* This is in line with the common knowledge that the Italian population is getting older.

The increase in population could be attributed to the increase in the number of immigrants of the mid-2000s but this is not something we can extract from this dataset (other sources on the ISTAT website might help on this matter). 

(*However, if this were the case, it would debunk a fairly popular myth in Italy that all immigrants are males, as we can clearly see here that the increase is identical for males and females*). 

We will try to further expand on this point once the appropriate data become available.

Next, we notice that the split by gender does not give extra insights (as the lines are nearly parallel to one another), so we drop it for now and focus instead on how the population changed from 1996 (in blue) to 2018 (in red).

In [None]:
def newline(ax, p1, p2, color='black'):
    l = mlines.Line2D([p1[0],p2[0]], [p1[1],p2[1]], color=color)
    ax.add_line(l)
    return ax

def plot_pop_change(data, pop, title, range_pop='Full', prof_cond='totale'):
    fig, ax = plt.subplots(1,1,figsize=(14,14), facecolor='#f7f7f7')
    fig.subplots_adjust(top=0.95)

    by_year = data[(data.quarter == 'total') & (data.sesso == 'totale') & (data.eta1 == pop) &
             (data['condizione professionale'] == prof_cond)].groupby(['year','eta1', 'territorio'], as_index=False).value.sum()

    y_1996 = by_year[by_year.year == 1996].reset_index(drop=True)
    y_2018 = by_year[by_year.year == 2018].reset_index(drop=True)

    ax.scatter(y=y_1996['territorio'], x=y_1996['value'], s=80, color='#0e668b', alpha=0.5)
    ax.scatter(y=y_2018['territorio'], x=y_2018['value'], s=160, color='#ff0000', alpha=0.6)
    
    fig.suptitle(title, fontsize=18)

    for i, p1, p2 in zip(y_1996['territorio'], y_1996['value'], y_2018['value']):
        ax = newline(ax, [p1, i], [p2, i])
    
    if range_pop == 'Full':
        ax.set(xlim=(0,10500000), xlabel='Population')
        ax.vlines(x=2000000, ymin='Abruzzo', ymax='Veneto', color='black', alpha=1, linewidth=1, linestyles='dotted')
        ax.vlines(x=6000000, ymin='Abruzzo', ymax='Veneto', color='black', alpha=1, linewidth=1, linestyles='dotted')
        ax.set_xticks([0, 2000000, 4000000, 6000000, 8000000, 10000000])
        ax.set_xticklabels(['0', '2M', '4M', '6M', '8M', '10M'])
    elif range_pop == 'Reduced':
        ax.set(xlim=(0,1200000), xlabel='Population')
        ax.vlines(x=300000, ymin='Abruzzo', ymax='Veneto', color='black', alpha=1, linewidth=1, linestyles='dotted')
        ax.vlines(x=700000, ymin='Abruzzo', ymax='Veneto', color='black', alpha=1, linewidth=1, linestyles='dotted')
    elif range_pop == 'Employed':
        ax.set(xlim=(0,5000000), xlabel='Population')
        ax.vlines(x=1000000, ymin='Abruzzo', ymax='Veneto', color='black', alpha=1, linewidth=1, linestyles='dotted')
        ax.vlines(x=3000000, ymin='Abruzzo', ymax='Veneto', color='black', alpha=1, linewidth=1, linestyles='dotted')
    
    plt.show()

plot_pop_change(df_cleaned, 'Total', 'Variation in the Italian population by region, 1996 vs 2018')

In [None]:
plot_pop_change(df_cleaned, 'Y15-64', 'Variation in the 15-64 population by region, 1996 vs 2018')

In [None]:
plot_pop_change(df_cleaned, 'Y65-74', 'Variation in the 65-74 population by region, 1996 vs 2018', range_pop='Reduced')

In [None]:
plot_pop_change(df_cleaned, 'Y-GE75', 'Variation in the 75+ population by region, 1996 vs 2018', range_pop='Reduced')

This gives us a better view of the previous insights.

* The population grew in very few regions: Lombardia, Lazio, Emilia Romagna, and Veneto.
* The remaining regions essentially kept their population intact for 22 years.
* However, if we focus on the population 15-64, we observe that this growth disappears in almost all the regions.
* Lazio is the only exception, with a visible growth.
* Liguria and Piemonte had a visible decrease in the 15-64 population. This is surprising as these regions (especially Piemonte) have historically been the destination of many Italians looking for jobs in big industries. These industries, however, have experienced a quite strong crisis since the late '90s.
* All the growth in population, in any region, should be largely attributed to growth in the 65+ population.

To summarize, the Italian population has been increasing in the past 22 years, an increase that was evident in the 2003-2013 period. However, the increase comes from a higher life expectancy rather than an increase in births. As a result, the 0-64 population has been substantially stable, while the 65+ has been steadily increasing.

With this knowledge, we can head to the main subject of this analysis.

# Is there a job crisis in Italy?

First of all, we should translate to English and broaden our audience.

In [None]:
cond_prof = {'forze lavoro': 'workforce',  # the sum of employed and unemployed
             'occupati': 'employed', 
             'disoccupati': 'unemployed', 
             'inattivi': 'inactive',  # whoever is not workforce, people that can't work or are not looking for a job
             'totale': 'total',  # the sum of workforce and inactive
             'non cercano e non disponibili a lavorare': 'not_looking',  # part of the inactives
             "zona grigia dell'inattività": 'inactive_greyzone'}  # honestly don't know, but it is the other half of the inactive

df_cleaned['prof_cond'] = df_cleaned['condizione professionale'].map(cond_prof)

Next, we will focus on the age range that better reflects the workforce: 15-64. We could focus on the 15+ population as well but, since we know that **the population grew older in the past 22 years, every trend would appear artificially negative** as more and more people would fall into the `inactive` category as they reach retirement age.
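A quick arithmetic sketch of why this matters. The counts are made up and the function names are hypothetical: nobody loses a job between the two scenarios, yet the share of employed over the population drops just because there are more inactive people (think of retirees).

```python
def employed_share(employed, unemployed, inactive):
    """Employed over the whole population of the group."""
    return employed / (employed + unemployed + inactive)


def unemployed_over_workforce(employed, unemployed):
    """Unemployed over workforce (employed + unemployed)."""
    return unemployed / (employed + unemployed)


# Made-up counts: same employed and unemployed, more inactive in the second case
before = employed_share(employed=60, unemployed=10, inactive=30)
after = employed_share(employed=60, unemployed=10, inactive=50)

# The ratio over the workforce does not depend on the inactive at all
wf_ratio = unemployed_over_workforce(employed=60, unemployed=10)

print(f'employed share before: {before:.2f}, after: {after:.2f}')
print(f'unemployed over workforce: {wf_ratio:.3f}')
```

The ratios computed over the workforce are not affected by ageing in this toy example, while every ratio computed over the population is.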

In [None]:
def group_year_profcond(data, gender, age='Y15-64', regions=False):
    
    if regions:
        by_year = data[(data.quarter == 'total') & 
           (data.sesso == gender) & 
           (data.eta1 == age)].groupby(['year', 'territorio', 'prof_cond']).value.sum().unstack(2).reset_index()
    else:
        by_year = data[(data.quarter == 'total') & 
           (data.sesso == gender) & 
           (data.eta1 == age)].groupby(['year', 'prof_cond']).value.sum().unstack()
    
    # some entries are inconsistent when we consider the gender split, we fix them here
    by_year['workforce'] = by_year['employed'] + by_year['unemployed']
    try:
        by_year['total'] = by_year['workforce'] + by_year['inactive']
        by_year['inactive_perc'] = by_year['inactive'] / by_year['total']
        by_year['unemployed_overInac'] = by_year['unemployed'] / by_year['inactive']
    except KeyError:
        pass
    
    
    # calculation of rates
    by_year['employed_overWF'] = by_year['employed'] / by_year['workforce']
    by_year['employed_perc'] = by_year['employed'] / by_year['total']
    by_year['unemployed_overWF'] = by_year['unemployed'] / by_year['workforce']
    by_year['unemployed_perc'] = by_year['unemployed'] / by_year['total']
    
    try:
        del by_year['inactive_greyzone']
        del by_year['not_looking']
        #del by_year['total']
    except KeyError:
        pass
    
    return by_year

total_year = group_year_profcond(df_cleaned, 'totale')
males_year = group_year_profcond(df_cleaned, 'maschi')
females_year = group_year_profcond(df_cleaned, 'femmine')

total_year.head()

In [None]:
def multiple_proportions(total_year, males_year, females_year, title, inactives=True):
    if inactives:
        fig, ax = plt.subplots(3,2, figsize=(15, 18), facecolor='#f7f7f7')
    else:
        fig, ax = plt.subplots(2,2, figsize=(15, 12), facecolor='#f7f7f7')
    fig.subplots_adjust(top=0.95)

    def plot_proportions(data, label, color, inactives=True):

        data['employed_perc'].plot(ax=ax[0][0], label=label, color=color)
        data['unemployed_perc'].plot(ax=ax[0][1], label=label, color=color)
        data['employed_overWF'].plot(ax=ax[1][0], label=label, color=color)
        data['unemployed_overWF'].plot(ax=ax[1][1], label=label, color=color)
        if inactives:
            data['inactive_perc'].plot(ax=ax[2][0], label=label, color=color)
            data['unemployed_overInac'].plot(ax=ax[2][1], label=label, color=color)

        return fig, ax

    fig, ax = plot_proportions(total_year, 'Total', 'black', inactives)
    fig, ax = plot_proportions(males_year, 'Males', 'green', inactives)
    fig, ax = plot_proportions(females_year, 'Females', 'red', inactives)

    ax[0][0].set_ylim((0, 1))
    ax[0][1].set_ylim((0, 1))
    ax[1][0].set_ylim((0, 1))
    ax[1][1].set_ylim((0, 1))
    ax[0][0].legend()
    ax[0][1].legend()
    ax[1][0].legend()
    ax[1][1].legend() 
    ax[0][0].set_title('Employed over Population', fontsize=14)
    ax[0][1].set_title('Unemployed over Population', fontsize=14)
    ax[1][0].set_title('Employed over Workforce', fontsize=14)
    ax[1][1].set_title('Unemployed over Workforce', fontsize=14)
    
    if inactives:
        ax[2][0].set_ylim((0, 1))
        ax[2][1].set_ylim((0, 1))
        ax[2][0].legend()
        ax[2][1].legend()
        ax[2][0].set_title('Inactives over Population', fontsize=14)
        ax[2][1].set_title('Unemployed over Inactives', fontsize=14)
        
    fig.suptitle(title, fontsize=18)

    plt.show()
    
    
multiple_proportions(total_year, males_year, females_year, 'Employed, Unemployed, and Inactives (15-64)')

We observe a stable growth in employment from 1996 to 2008, the year of the first economic crisis. Italy then bounced back but felt the second crisis of 2013, which was also longer. After that, the growth in employment resumed.

The employment rate is historically higher for Males, but we can observe a positive trend in the Females' employment rate. This has to be attributed to a decrease in the proportion of Inactives among women 15-64 since 1996, a trend that appears to have reversed since 2015. In other words, more women started to look for a job over the years and they also found one. In 2013 and 2014 there appears to be a massive increase in women's employment, coming out of the inactivity status, but this was reversed in 2015. (*Note: we can't exclude that this is an error in the data for those years, as the information provided about the 2 genders is less accurate*)

On the other hand, the employment rate has seen a significant drop for men since 2013 (the year of the last economic crisis). Given that the proportion of Inactives among men has been substantially stable, this means that we have more and more unemployed among men, bringing the unemployment rates for men and women to be nearly identical.

The next chart shows better how the increase in employment among Italians is due to a larger proportion of women coming out of the inactive category (which is very broad, going from students to people that stop looking for a job) and finding a job.

In [None]:
fig, ax = plt.subplots(3,2, figsize=(15, 15), facecolor='#f7f7f7')
fig.subplots_adjust(top=0.92)

ax[0][0].pie(total_year[total_year.index==1996][['employed_perc', 'unemployed_perc', 'inactive_perc']].values[0], 
        labels=['Employed', 'Unemployed', 'Inactive'], autopct='%.0f%%')
ax[0][1].pie(total_year[total_year.index==2018][['employed_perc', 'unemployed_perc', 'inactive_perc']].values[0], 
        labels=['Employed', 'Unemployed', 'Inactive'], autopct='%.0f%%')
ax[1][0].pie(males_year[males_year.index==1996][['employed_perc', 'unemployed_perc', 'inactive_perc']].values[0], 
        labels=['Employed', 'Unemployed', 'Inactive'], autopct='%.0f%%')
ax[1][1].pie(males_year[males_year.index==2018][['employed_perc', 'unemployed_perc', 'inactive_perc']].values[0], 
        labels=['Employed', 'Unemployed', 'Inactive'], autopct='%.0f%%')
ax[2][0].pie(females_year[females_year.index==1996][['employed_perc', 'unemployed_perc', 'inactive_perc']].values[0], 
        labels=['Employed', 'Unemployed', 'Inactive'], autopct='%.0f%%')
ax[2][1].pie(females_year[females_year.index==2018][['employed_perc', 'unemployed_perc', 'inactive_perc']].values[0], 
        labels=['Employed', 'Unemployed', 'Inactive'], autopct='%.0f%%')

ax[0][0].set_title('Total Population - 1996', fontsize=14)
ax[0][1].set_title('Total Population - 2018', fontsize=14)
ax[1][0].set_title('Males - 1996', fontsize=14)
ax[1][1].set_title('Males - 2018', fontsize=14)
ax[2][0].set_title('Females - 1996', fontsize=14)
ax[2][1].set_title('Females - 2018', fontsize=14)

fig.suptitle('Employed, Unemployed, and Inactive - 15-64 - 1996 vs 2018', fontsize=18)

plt.show()

## Does the government matter?

We can now kick the wasp nest and overlay the evolution of the employment ratio with the changes that happened in the parliament. By oversimplifying because there are not enough colors to cover all the shades of Italian politics, we will split the governments into 3 categories: Right, Left, Center. 

On the right, we have 3 Berlusconi governments together with other parties, like Lega, which will show up later in the center. On the left, we have what is now called the Democratic Party, first with a multitude of smaller parties, then alone (it will also show up in the center at the end). At the center, we find the technical government of Monti, which followed the economic and political crisis of the last Berlusconi government; the government with both the Democratic Party and Berlusconi (although the Democratic Party had the majority of the seats); and the 2 governments where the 5 Star Movement has the majority of the seats (one with Lega, one with the Democratic Party), which will show up when we extend the data to after 2018.

It is a fairly arbitrary division, but a necessary one to have some order. In red the left, in blue the right, in white the center.

*Note: probably in a future version we will do it quarterly for a better view of the government change*
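Since the same shading is needed on more than one chart, it could be factored into a small helper. This is just a sketch: `GOVERNMENT_SPANS` and `shade_governments` are names I am making up here, and the years simply mirror the (arbitrary) split described above. The helper only relies on the `axvspan` and `axvline` methods of a matplotlib Axes.

```python
# (start_year, end_year, colour) for each period.
# None stands for the "center" category, which is left unshaded (white).
GOVERNMENT_SPANS = [
    (1996, 2001, 'r'),
    (2001, 2006, 'b'),
    (2006, 2008, 'r'),
    (2008, 2011, 'b'),
    (2011, 2012, None),
    (2012, 2016, 'r'),
    (2016, 2018, None),
]


def shade_governments(ax, spans=GOVERNMENT_SPANS, alpha=0.2):
    """Shade each government period on a matplotlib Axes and mark where it ends."""
    for start, end, colour in spans:
        if colour is not None:
            ax.axvspan(start, end, facecolor=colour, alpha=alpha)
        # dashed vertical line at the end of every period, shaded or not
        ax.axvline(x=end, linestyle='--')
    return ax
```

With something like this, each chart would only need a call such as `shade_governments(axes)` inside the loop over the axes, and a change in the periods would have to be made in one place only.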

In [None]:
fig, ax = plt.subplots(3,1, figsize=(18, 16), facecolor='#f7f7f7')

total_year['employed_perc'].plot(ax=ax[0], color='k')
total_year['unemployed_perc'].plot(ax=ax[1], color='k')
total_year['inactive_perc'].plot(ax=ax[2], color='k')

ax[0].set_title('Employment', fontsize=14)
ax[1].set_title('Unemployment', fontsize=14)
ax[2].set_title('Inactive', fontsize=14)

for axes in ax:
    axes.axvspan(1996, 2001, facecolor='r', alpha=0.2)  # 3 governments + 1 technical government
    axes.axvline(x=2001, linestyle='--')
    axes.axvspan(2001, 2006, facecolor='b', alpha=0.2)  # 2 governments
    axes.axvline(x=2006, linestyle='--')
    axes.axvspan(2006, 2008, facecolor='r', alpha=0.2)  # 1 government
    axes.axvline(x=2008, linestyle='--')
    axes.axvspan(2008, 2011, facecolor='b', alpha=0.2)  # 1 government
    axes.axvline(x=2011, linestyle='--')
    axes.axvline(x=2012, linestyle='--')
    axes.axvspan(2012, 2016, facecolor='r', alpha=0.2)  # 2 governments
    axes.axvline(x=2016, linestyle='--')
    axes.axvline(x=2018, linestyle='--')
    axes.set_yticklabels(['{:,.0%}'.format(x) for x in axes.get_yticks()])

We can see that dramatic drops in the employment ratio lead to a change in the government (although not always to a change in the Parliament). The 2 financial crises happened during 2 different governments of the Democratic Party, which, one way or the other, has also been a protagonist of the technical or center governments. We also see how the economic crises lead to an increase in the inactives rather than in the unemployed. 

To answer the question, yes, it seems to matter. However, this analysis is incapable of excluding macroeconomic factors (like financial crises) and therefore, as often happens in the political debate, we have to leave the interpretation on the governments' role to the reader.

# Regional disparity

We can now try to zoom in again and focus on how the labor status changed in the various Italian regions. Again, we need to simplify the discussion a bit and we focus on macro-regions first, defined in the next cell.

In [None]:
macro_regions = {'Abruzzo': 'South', 
                 'Basilicata': 'South', 
                 'Calabria': 'South', 
                 'Campania': 'South', 
                 'Emilia-Romagna': 'North-East', 
                 'Friuli-Venezia Giulia': 'North-East', 
                 'Lazio': 'Center', 
                 'Liguria': 'North-West', 
                 'Lombardia': 'North-West', 
                 'Marche': 'Center', 
                 'Molise': 'South', 
                 'Piemonte': 'North-West', 
                 'Provincia Autonoma Bolzano / Bozen': 'North-East', 
                 'Provincia Autonoma Trento': 'North-East', 
                 'Puglia': 'South', 
                 'Sardegna': 'Islands', 
                 'Sicilia': 'Islands', 
                 'Toscana': 'Center', 
                 'Trentino Alto Adige / Südtirol': 'North-East', 
                 'Umbria': 'Center', 
                 "Valle d'Aosta / Vallée d'Aoste": 'North-West', 
                 'Veneto': 'North-East'}
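Before using a mapping like this one, it may be worth checking that every territory in the data has a macro-region, because `.map()` turns unknown names into NaN and a groupby then silently drops them. A minimal sketch, where both the reduced mapping and the territory list (including the name 'Italia') are hypothetical:

```python
import pandas as pd

# Hypothetical subset of the mapping and of the territory column, for illustration
mapping = {'Abruzzo': 'South', 'Lazio': 'Center', 'Veneto': 'North-East'}
territories = pd.Series(['Abruzzo', 'Lazio', 'Veneto', 'Italia'])

# Names that would become NaN after .map() and then vanish from a groupby
unmapped = sorted(set(territories.unique()) - set(mapping))
print(unmapped)
```

A related caveat that this check does not cover: the dictionary above contains both 'Trentino Alto Adige / Südtirol' and its two autonomous provinces. If all three are present in the data, the North-East would count them twice, so this is something to verify on the actual dataset.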

In [None]:
total_regions = group_year_profcond(df_cleaned, 'totale', regions=True)

total_regions['macro_region'] = total_regions.territorio.map(macro_regions)

macros = total_regions.groupby(['year', 
                             'macro_region'], as_index=False)[['employed', 
                                                               'unemployed', 
                                                               'inactive', 
                                                               'total']].sum().set_index('year')

macros['employed_perc'] = macros['employed'] / macros['total']
macros['unemployed_perc'] = macros['unemployed'] / macros['total']
macros['inactive_perc'] = macros['inactive'] / macros['total']


fig, ax = plt.subplots(3,1, figsize=(18, 16), facecolor='#f7f7f7')

for region in macros.macro_region.unique():
    macros[macros.macro_region == region]['employed_perc'].plot(ax=ax[0], label=region, alpha=0.7)
    macros[macros.macro_region == region]['unemployed_perc'].plot(ax=ax[1], label=region, alpha=0.7)
    macros[macros.macro_region == region]['inactive_perc'].plot(ax=ax[2], label=region, alpha=0.7)
    
total_year['employed_perc'].plot(ax=ax[0], color='k', label='Total')
total_year['unemployed_perc'].plot(ax=ax[1], color='k', label='Total')
total_year['inactive_perc'].plot(ax=ax[2], color='k', label='Total')

ax[0].set_title('Employment', fontsize=14)
ax[1].set_title('Unemployment', fontsize=14)
ax[2].set_title('Inactive', fontsize=14)

for axes in ax:
    axes.legend()
    axes.set_yticklabels(['{:,.0%}'.format(x) for x in axes.get_yticks()])

We thus see how the North-West of the country kept increasing the fraction of the employed population, substantially unaffected by any crisis or change in government. On the other hand, a similarly rich region like the North-East appears to have dramatically felt the crises, which raises concerns also on the integrity of our data.

The South and the Islands have historically had a lower employment level and display a somewhat cyclical pattern in the unemployment level. Interestingly, if we overlay the government changes we notice how every decline in employment in the Islands comes after a government in the left category.

In [None]:
fig, ax = plt.subplots(2,1, figsize=(18, 10), facecolor='#f7f7f7')
fig.subplots_adjust(top=0.92)

for region in ['South', 'Islands']:
    macros[macros.macro_region == region]['employed_perc'].plot(ax=ax[0], label=region, alpha=0.7)
    macros[macros.macro_region == region]['unemployed_perc'].plot(ax=ax[1], label=region, alpha=0.7)
    
total_year['employed_perc'].plot(ax=ax[0], color='k', label='Total')
total_year['unemployed_perc'].plot(ax=ax[1], color='k', label='Total')

ax[0].set_title('Employment', fontsize=14)
ax[1].set_title('Unemployment', fontsize=14)

for axes in ax:
    axes.axvspan(1996, 2001, facecolor='r', alpha=0.2)  # 3 governments + 1 technical government
    axes.axvline(x=2001, linestyle='--')
    axes.axvspan(2001, 2006, facecolor='b', alpha=0.2)  # 2 governments
    axes.axvline(x=2006, linestyle='--')
    axes.axvspan(2006, 2008, facecolor='r', alpha=0.2)  # 1 government
    axes.axvline(x=2008, linestyle='--')
    axes.axvspan(2008, 2011, facecolor='b', alpha=0.2)  # 1 government
    axes.axvline(x=2011, linestyle='--')
    axes.axvline(x=2012, linestyle='--')
    axes.axvspan(2012, 2016, facecolor='r', alpha=0.2)  # 2 governments
    axes.axvline(x=2016, linestyle='--')
    axes.axvline(x=2018, linestyle='--')
    axes.set_yticklabels(['{:,.0%}'.format(x) for x in axes.get_yticks()])
    axes.legend()
    
fig.suptitle('Cyclic decline in Employment for southern regions and government changes', fontsize=18)
plt.show()

All that being said, the disparity between North and South is evident and does not look like it is going to shrink any time soon. We indeed see that the only regions where the number of employed increased in the past 22 years are in the North or Center and, considering that the 15-64 population essentially remained stable in this period, we can conclude that the southern regions not only started from a lower employed ratio, but have also seen the gap with the rest of the country increase.
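The gap itself can be made explicit by putting the employed ratio of each area side by side and taking the difference. The sketch below uses made-up counts and two hypothetical labels ('North' and 'South') rather than the five macro-regions defined earlier; it only illustrates the calculation, not the actual figures.

```python
import pandas as pd

# Made-up counts for two areas in two years, for illustration only
toy = pd.DataFrame({
    'year':         [1996, 1996, 2018, 2018],
    'macro_region': ['North', 'South', 'North', 'South'],
    'employed':     [60, 40, 66, 41],
    'total':        [100, 100, 100, 100],
})

toy['employed_perc'] = toy['employed'] / toy['total']

# One column per area, one row per year
wide = toy.pivot(index='year', columns='macro_region', values='employed_perc')

# Gap in the employed ratio between the two areas, per year
gap = wide['North'] - wide['South']
print(gap)
```

In this toy case both areas improve, but the gap still widens because the first one improves faster, which is the kind of pattern described above.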

In [None]:
plot_pop_change(df_cleaned, 'Y15-64', 'Variation in the number of Employed by region, 1996 vs 2018', 
                prof_cond='occupati', range_pop='Employed')

This is of course more evident if we look at the macro regions.

In [None]:
def plot_pop_sidebyside(data, title, prof_cond='occupati'):
    fig, ax = plt.subplots(1,2,figsize=(16,6), facecolor='#f7f7f7')
    fig.subplots_adjust(top=0.87)

    by_year = data.groupby(['year', 'macro_region'], as_index=False).sum(numeric_only=True)  # skip text columns such as territorio

    y_1996 = by_year[by_year.year == 1996].reset_index(drop=True)
    y_2018 = by_year[by_year.year == 2018].reset_index(drop=True)

    ax[0].scatter(y=y_1996['macro_region'], x=y_1996['total'], s=80, color='#0e668b', alpha=0.5)
    ax[0].scatter(y=y_2018['macro_region'], x=y_2018['total'], s=160, color='#ff0000', alpha=0.6)
    ax[1].scatter(y=y_1996['macro_region'], x=y_1996[prof_cond], s=80, color='#0e668b', alpha=0.5)
    ax[1].scatter(y=y_2018['macro_region'], x=y_2018[prof_cond], s=160, color='#ff0000', alpha=0.6)

    fig.suptitle(title, fontsize=18)

    for i, p1, p2 in zip(y_1996['macro_region'], y_1996['total'], y_2018['total']):
        ax[0] = newline(ax[0], [p1, i], [p2, i])
    for i, p1, p2 in zip(y_1996['macro_region'], y_1996[prof_cond], y_2018[prof_cond]):
        ax[1] = newline(ax[1], [p1, i], [p2, i])
        
    ax[0].set(xlim=(0,10500000), xlabel='Population')
    ax[0].set_title('Population', fontsize=14)
    ax[1].set(xlim=(0,10500000), xlabel=prof_cond.capitalize())
    ax[1].set_title(f'{prof_cond.capitalize()}', fontsize=14)
    for axes in ax:
        axes.set_xticks([0, 2000000, 4000000, 6000000, 8000000, 10000000])
        axes.set_xticklabels(['0', '2M', '4M', '6M', '8M', '10M'])

    plt.show()
    
plot_pop_sidebyside(total_regions, 'Variation by Macro Regions (15-64), 1996 vs 2018', prof_cond='employed')

In [None]:
by_year = total_regions.groupby(['year', 'macro_region'], as_index=False).sum()

by_year['employed_perc'] = by_year['employed'] / by_year['total']
by_year['unemployed_perc'] = by_year['unemployed'] / by_year['total']
by_year['inactive_perc'] = by_year['inactive'] / by_year['total']

fig, ax = plt.subplots(5,2, figsize=(15, 25), facecolor='#f7f7f7')
fig.subplots_adjust(top=0.95)

for axes, region in zip(ax[:,0], ['South', 'North-West', 'North-East', 'Islands', 'Center']):
    axes.pie(by_year[(by_year.year==1996) & 
                     (by_year.macro_region==region)][['employed_perc', 'unemployed_perc', 'inactive_perc']].values[0], 
        labels=['Employed', 'Unemployed', 'Inactive'], autopct='%.0f%%')
    axes.set_title(f'{region} - 1996', fontsize=14)

for axes, region in zip(ax[:,1], ['South', 'North-West', 'North-East', 'Islands', 'Center']):
    axes.pie(by_year[(by_year.year==2018) & 
                     (by_year.macro_region==region)][['employed_perc', 'unemployed_perc', 'inactive_perc']].values[0], 
        labels=['Employed', 'Unemployed', 'Inactive'], autopct='%.0f%%')
    axes.set_title(f'{region} - 2018', fontsize=14)

fig.suptitle('Employed, Unemployed, and Inactive - 15-64 - 1996 vs 2018', fontsize=18)

plt.show()

We thus see how every macro-region, except the Center, saw a **reduction in the 15-64 population but only the Center and the North had an increase in the Employed population**. Once again, we see that the biggest change can be found in the proportion of inactive. This can be because in the past it was possible to retire relatively young and receive a good pension, while now it is much rarer to see someone under 65 years of age receiving a pension. Another reason can be that in the past it was much more common to work without a contract (for official purposes, if you don't declare that you are working and you are not searching for a job, you are inactive, even if you have a job), and this phenomenon has been fought as a measure to reduce tax evasion and strengthen workers' rights. The only macro-region not showing this trend is the Islands (Sicilia and Sardegna), while the South is the only one that remained pretty much unchanged in the past 22 years.
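
The pie charts can also be summarised as a change in percentage points for each labour status. The following is only a sketch: it assumes the columns `year`, `macro_region`, `employed`, `unemployed`, and `inactive`, it computes the total as the sum of the three statuses, and both the helper `share_change` and the `toy` numbers are invented for illustration.

```python
import pandas as pd

# Toy numbers for illustration only (NOT the ISTAT figures).
toy = pd.DataFrame({
    'year': [1996, 2018],
    'macro_region': ['Center', 'Center'],
    'employed': [500, 600],
    'unemployed': [50, 70],
    'inactive': [450, 330],
})


def share_change(data, start, end):
    """Hypothetical helper: change in the share of each labour status,
    in percentage points, between the years `start` and `end`."""
    cols = ['employed', 'unemployed', 'inactive']
    totals = data.groupby(['macro_region', 'year'])[cols].sum()
    # Divide every row by its own total to get the shares
    shares = totals.div(totals.sum(axis=1), axis=0)
    return (shares.xs(end, level='year') - shares.xs(start, level='year')) * 100


change = share_change(toy, 1996, 2018)
print(change.round(1))
```

Since the three shares always add up to 100%, the changes in each row add up to zero: a drop in the inactive share has to show up as employed or unemployed.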

# Italy, regions, local governments, and employment

After getting a general idea about the macro-regions of Italy, we conclude this analysis by zooming into the regions that show a peculiar pattern. Unlike the national government, local governments tend to change color less often and, even though their influence on the employment of the population is somewhat limited, we will try to study potentially interesting situations.

## North-West

We have already seen the stability of this macro-region in the past 22 years, with a steady increase in employment. We can see this by looking at the biggest of the 4 regions: Lombardia.

In [None]:
def region_overview(data, region):
    total_year = group_year_profcond(data[data.territorio==region], 'totale', regions=True)
    males_year = group_year_profcond(data[data.territorio==region], 'maschi', regions=True)
    females_year = group_year_profcond(data[data.territorio==region], 'femmine', regions=True)
    
    total_year = total_year[total_year.year < 2019]
    males_year = males_year[males_year.year < 2019]
    females_year = females_year[females_year.year < 2019]

    fig = plt.figure(figsize=(18, 10), facecolor='#f7f7f7') 
    fig.suptitle(region, fontsize=18)

    spec = fig.add_gridspec(ncols=3, nrows=2, height_ratios=[5, 3])

    ax0 = fig.add_subplot(spec[0,:])
    ax1 = fig.add_subplot(spec[1,0])
    ax2 = fig.add_subplot(spec[1,1])
    ax3 = fig.add_subplot(spec[1,2])

    ax0.set_title('Population trend by labour status (15-64)', fontsize=14)
    ax1.set_title('Employed, Unemployed, Inactive (2018)', fontsize=12)
    ax2.set_title('Males (15-64)', fontsize=12)
    ax3.set_title('Females (15-64)', fontsize=12)
    
    total_year.set_index('year').employed_perc.plot(ax=ax0, label='Employed')
    total_year.set_index('year').unemployed_perc.plot(ax=ax0, label='Unemployed')
    total_year.set_index('year').inactive_perc.plot(ax=ax0, label='Inactive')
    
    ax1.pie(total_year[total_year.year==2018][['employed_perc', 'unemployed_perc', 'inactive_perc']].values[0], 
            labels=['Employed', 'Unemployed', 'Inactive'], autopct='%.0f%%')
    
    males_year.set_index('year').employed.plot(ax=ax2, label='Empl.')
    males_year.set_index('year').unemployed.plot(ax=ax2, label='Unempl.')
    males_year.set_index('year').inactive.plot(ax=ax2, label='Inact.')
    females_year.set_index('year').employed.plot(ax=ax3, label='Empl.')
    females_year.set_index('year').unemployed.plot(ax=ax3, label='Unempl.')
    females_year.set_index('year').inactive.plot(ax=ax3, label='Inact.')
    
    for axes in [ax0, ax2, ax3]:
        axes.set_xticks([1997, 2002, 2007, 2012, 2017])
        
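    # Show the legend on every line plot, since labels are set on all of them
    for axes in [ax0, ax2, ax3]:
        axes.legend()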
    ax0.set_ylim((0,1))
    ax0.set_ylabel('Percentage')
    ax0.set_yticklabels(['{:,.0%}'.format(x) for x in ax0.get_yticks()])

    plt.show()

In [None]:
region_overview(df_cleaned, "Lombardia")

Besides a few anomalies in the data for the unemployed when split by gender, we see a steady decrease in inactive Females as well as an increase in employment. The local government has been essentially the same since 1995, with a coalition of what we can call Berlusconi's party and the old-fashioned Lega Nord (now called just Lega).
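
Anomalies like these can be flagged automatically instead of being spotted by eye. The sketch below is one possible approach, not the cleaning procedure used in this notebook: it marks the values that deviate from a centered rolling median by more than a chosen fraction. The helper `flag_anomalies`, its default `window` and `threshold`, and the `toy` series are all assumptions made for illustration.

```python
import pandas as pd


def flag_anomalies(series, window=5, threshold=0.5):
    """Hypothetical helper: True where a value deviates from the centered
    rolling median by more than `threshold` (as a fraction of the median)."""
    median = series.rolling(window, center=True, min_periods=1).median()
    relative_dev = (series - median).abs() / median
    return relative_dev > threshold


# Toy series: the value for 2003 looks like it lost a trailing digit.
toy = pd.Series([100, 102, 101, 10, 103, 104, 105], index=range(2000, 2007))
flags = flag_anomalies(toy)
print(toy[flags])
```

The median is used instead of the mean because a single broken value barely moves it, so the neighbours of the anomaly are not flagged by mistake.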

## North-East

Here, we have seen that the financial crises had quite a strong impact on the employment rate and, although we should consider that this could be a data anomaly, it is worth investigating. Since we have just seen a region that never changed government colors in these 22 years, we can start from Emilia-Romagna, where parties from the left have always been in charge since Italy became a Republic.

In [None]:
region_overview(df_cleaned, "Emilia-Romagna")

Unfortunately, we again see some anomalies, but also the increase in employment among Women. The change is less strong than in the Lombardia case, but we have to consider that in 1996 the majority of women in Emilia-Romagna were not considered inactive, i.e. the situation was already a bit better.

The first major difference, however, concerns the financial crisis of 2011. Here, we can observe a visible effect, and the employment ratio is only barely back at its pre-crisis level now.

On the opposite side (so to speak) of the political spectrum, we find Veneto, which, like Lombardia, never changed government colors.

In [None]:
region_overview(df_cleaned, "Veneto")

The region shows much more instability in the employment ratio, in particular around the 2 financial crises of 2008 and 2011.

To find more stability we can look at Friuli-Venezia Giulia, which went through a lot of local presidents from the right and one from the left (2013-2018) while showing the same steady improvement in employment. Maybe where working is not a problem, work is not a factor at the local elections. Or maybe the fact that this is an autonomous region (it has different legislation, different administration, and different taxes, to some extent) is guaranteeing the stability.

In [None]:
region_overview(df_cleaned, "Friuli-Venezia Giulia")

## Center

Again, good stability. We have regions that never changed colors (like Toscana, which has always had a government from the left) but the interesting one, I think, is Lazio.

We have seen above that it was the region with the largest increase in the 15-64 population and, to make it more interesting, it changed colors several times:

* Left until 2000
* Right until 2005
* Left until 2010
* Right until 2013
* Left since then

At least under these metrics, the color of the local government does not matter, as in the region of the capital Rome there is a steady increase in employment. Unlike in other regions, we can also observe a slight increase in unemployment since 2011 that has not yet come back down.
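
One way to check a statement like this is to label each year with the coalition in charge and compare the average yearly change in the employment ratio. The sketch below uses the periods listed above; assigning each boundary year to the outgoing government is an assumption, the helpers `coalition_for` and `mean_change_by_coalition` are hypothetical, and the `toy` series is an artificial steady increase, not the data for Lazio.

```python
import pandas as pd

# (last year of the period, coalition), following the list above.
# Assumption: a boundary year belongs to the outgoing government.
LAZIO_PERIODS = [(2000, 'Left'), (2005, 'Right'), (2010, 'Left'),
                 (2013, 'Right'), (2018, 'Left')]


def coalition_for(year, periods=LAZIO_PERIODS):
    """Hypothetical helper: coalition in charge in a given year."""
    for last_year, coalition in periods:
        if year <= last_year:
            return coalition
    return periods[-1][1]


def mean_change_by_coalition(ratio_by_year):
    """Average year-over-year change of a yearly series, grouped by coalition."""
    change = ratio_by_year.diff().dropna()
    # groupby with a function applies it to every index label (the year)
    return change.groupby(coalition_for).mean()


# Toy series: an artificial steady increase (NOT the ISTAT figures).
years = list(range(1996, 2019))
toy = pd.Series([0.50 + 0.005 * (y - 1996) for y in years], index=years)
result = mean_change_by_coalition(toy)
print(result)
```

If the averages for the two coalitions are close, as they are by construction in the toy series, the trend is not distinguishable by government color under this metric.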

In [None]:
region_overview(df_cleaned, "Lazio")

## South

Here the situation, as we have already seen, is worse in terms of the employment ratio, and we can now observe that, even when considering individual regions, it has not changed enough since 1996. For example, for the first time in this analysis, we find a region that did not improve the situation of inactive women: Abruzzo.

In [None]:
region_overview(df_cleaned, "Abruzzo")

Or Campania, which sadly has more inactive than employed, especially among women.

In [None]:
region_overview(df_cleaned, "Campania")

And the same can be seen in all the other regions of the South.

## Islands

Both regions here are autonomous, like Friuli-Venezia Giulia above, but we have seen that they do not show the same stability.

The interesting periodic behavior observed before was due to the contribution of Sardegna, which also shows the improvement in the labor status for Women. It is also the region where we find a visible change (in other regions it was hidden by data anomalies) in the number of employed Males since 2007.
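
A periodic behavior can be checked numerically by looking at the autocorrelation of the yearly changes. This is a rough sketch under stated assumptions: the helper `best_cycle_length` is hypothetical, the series is detrended with a simple first difference, and the `toy` series is a synthetic 6-year cycle rather than the data for Sardegna.

```python
import numpy as np
import pandas as pd


def best_cycle_length(series, max_lag):
    """Hypothetical helper: lag (in years) with the highest autocorrelation
    of the year-over-year changes, plus the score of every lag tested."""
    detrended = series.diff().dropna()
    scores = {lag: detrended.autocorr(lag=lag) for lag in range(2, max_lag + 1)}
    return max(scores, key=scores.get), scores


# Toy series: a synthetic 6-year cycle (NOT the ISTAT figures).
years = np.arange(1996, 2019)
toy = pd.Series(0.40 + 0.02 * np.sin(2 * np.pi * (years - 1996) / 6), index=years)
best, scores = best_cycle_length(toy, max_lag=8)
print(best)
```

With only 23 yearly observations, a result of this kind is an indication at best, so it should be read together with the plot rather than as a proof of a cycle.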

The local government changed quite a lot too, basically switching between left and right every 2-5 years.

In [None]:
region_overview(df_cleaned, "Sardegna")

The other region is Sicilia, which is, not only geographically, much closer to the South, with more inactive than employed, a big gender gap that does not look like it is closing, and a decrease in employment among Males.
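
Whether a gender gap is closing can be measured as the difference between the male and the female employment ratio, year by year. The sketch below assumes two tables with the columns `year`, `employed`, `unemployed`, and `inactive`; the helper `gender_gap` and the toy numbers are invented for illustration and are not the figures for Sicilia.

```python
import pandas as pd


def gender_gap(males, females):
    """Hypothetical helper: male minus female employment ratio per year,
    in percentage points."""
    cols = ['employed', 'unemployed', 'inactive']
    m = males.set_index('year')
    f = females.set_index('year')
    m_ratio = m['employed'] / m[cols].sum(axis=1)
    f_ratio = f['employed'] / f[cols].sum(axis=1)
    return (m_ratio - f_ratio) * 100


# Toy numbers for illustration only (NOT the ISTAT figures).
toy_males = pd.DataFrame({'year': [1996, 2018], 'employed': [600, 580],
                          'unemployed': [50, 70], 'inactive': [350, 350]})
toy_females = pd.DataFrame({'year': [1996, 2018], 'employed': [300, 310],
                            'unemployed': [50, 60], 'inactive': [650, 630]})

gap_pp = gender_gap(toy_males, toy_females)
print(gap_pp.round(1))
```

Note that in the toy data most of the reduction comes from fewer employed males rather than from more employed females, which is why the gap alone should not be read as an improvement.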

In [None]:
region_overview(df_cleaned, "Sicilia")

# Conclusions

In 1995, Italy was the 5th largest economy by GDP and the 7th by GDP per person. 

In 2005, it was the 7th largest economy by GDP and the 10th by GDP per person.

In 2015, it was the 8th largest economy by GDP and it fell out of the top 10 for GDP per person.

In that span of time, the population got older and older, many big industries (from automotive, to chemicals, to steel) went through several crises, and the economy relied more and more on tourism and luxury products.

The historical gap between men and women got smaller with time but, unfortunately, only in some regions, accentuating even more the disparities between the North and South of the country. Although we can observe an overall improvement in the proportion of the population with a job, this was not uniform across the country. Moreover, some regions appear to be more prepared than others to absorb the effects of financial crises on a global scale.

Local governments seem to be irrelevant in altering any employment trend, and the influence of the national government appears to be limited to, at best, short-term solutions.

The situation doesn't look promising at all, but it is not as grim as one may think. The main problem here, in my opinion, is the disparity, both between genders and between regions. The first one has deep roots in Italian culture and we can expect that closing it is going to take time, but we can't deny that steps in the right direction were made. It is more difficult to explain how an economic difference that dates back to the creation of the country (1861) is still visible in a nation so small.

From this analysis, which is limited to the observation of the workforce, we can see that every improvement came from reducing one of these inequalities, confirming that it is always better to do things together. 