# First Laboratory
### Group "Trap" - Data Science for Business 2022

In this notebook we're importing three datasets describing created, renewed and ceased job contracts in Lombardy (Italy). The purpose is to analyze the job market before and during the Covid emergency in order to investigate if there are categories that were more affected than others by the pandemic.

The three datasets share the same structure, making working with them a bit easier.


For the notebook to work without changes, the datasets must be downloaded from the following links. You may then need to update `data_path` in the next cell, which is where all data files are expected to be found.

|   | Filename                         | Link |
|---|----------------------------------|------|
| 1 | Rapporti_di_lavoro_attivati.csv  | [Download](https://www.dati.lombardia.it/api/views/qbau-cyuc/rows.csv?accessType=DOWNLOAD)    |
| 2 | Rapporti_di_lavoro_cessati.csv   | [Download](https://www.dati.lombardia.it/api/views/nwz3-p6vm/rows.csv?accessType=DOWNLOAD)     |
| 3 | Rapporti_di_lavoro_prorogati.csv | [Download](https://www.dati.lombardia.it/api/views/chng-cman/rows.csv?accessType=DOWNLOAD)     |
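
As a small optional sketch (not part of the original notebook), the file paths can also be built with `pathlib`, which avoids having to remember the trailing slash. The folder `~/data` and the names `data_dir`, `csv_paths` and `missing` are assumptions made for illustration.

```python
from pathlib import Path

# Hypothetical folder holding the three downloaded CSV files.
data_dir = Path("~/data").expanduser()

filenames = [
    "Rapporti_di_lavoro_attivati.csv",
    "Rapporti_di_lavoro_cessati.csv",
    "Rapporti_di_lavoro_prorogati.csv",
]

# The "/" operator inserts the separator, so no trailing slash is needed.
csv_paths = {name: data_dir / name for name in filenames}

# Report which files still have to be downloaded.
missing = [name for name, path in csv_paths.items() if not path.exists()]
print("Missing files:", missing)
```

Each value in `csv_paths` can be passed directly to `pd.read_csv`.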

In [1]:
# Remember the trailing slash
data_path = "~/data/"

## Importing the datasets

1. We load the datasets with _pandas_.
2. We convert the columns __DATA__ to the _datetime_ type.
3. We sort the DataFrame by the __DATA__ column.

#### NOTE

The _ceased_ dataset has one row with an incorrect date in the year 2600: such a date is too large to be parsed correctly, so we use the option "_coerce_" to parse incorrect dates as _NaT_. We decided to filter out this record later, since we can't be sure what the real date is supposed to be.
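
The effect of `errors="coerce"` can be sketched on a toy column. The values below are made up, and a malformed string is used in place of the year-2600 record, because how an out-of-range year is handled may depend on the pandas version.

```python
import pandas as pd

# Toy column: two valid day-first dates and one malformed entry (made up).
raw = pd.Series(["05/01/2020", "12/03/2021", "not a date"])

# errors="coerce" turns anything that cannot be parsed into NaT instead of raising.
parsed = pd.to_datetime(raw, format="%d/%m/%Y", errors="coerce")

# Number of rows that could not be parsed.
print(parsed.isna().sum())

# Dropping the NaT rows keeps only records with a usable date.
valid = parsed.dropna()
print(valid)
```

In the notebook the NaT rows are not dropped explicitly: they disappear later, when the rows are filtered by date.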


In [2]:
import pandas as pd

def prepare_dataset(path): 
    ds = pd.read_csv(path)
    ds["DATA"] = pd.to_datetime(ds["DATA"], dayfirst=True, errors="coerce")
    return ds.sort_values("DATA", ignore_index=True)

# Load Created, Ceased and Renewed contracts
created = prepare_dataset(data_path + "Rapporti_di_lavoro_attivati.csv")
ceased = prepare_dataset(data_path + "Rapporti_di_lavoro_cessati.csv")
renewed = prepare_dataset(data_path + "Rapporti_di_lavoro_prorogati.csv")

## A first look into the data

Our goal is to analyze a specific period of time, so let's look into the datasets to check whether the data for that period is present. The datasets have already been sorted by date in the previous step.

All the datasets start well before the period we decided to analyze, so that's good! As for the most recent data, all the datasets appear to have been updated until December 9th 2021. December may seem incomplete, but in reality it looks like data is only recorded on the first days of each month __(*)__.

This will give us two periods for comparison (note that they are not equally sized: the first spans three years, the second two):
* January 2017 - December 2019 - pre-pandemic period
* January 2020 - December 2021 - corresponding, more or less, with the pandemic

#### NOTE
There is a record with an incorrect date in the _created_ dataset (2201-09-06). We decided to filter out this record since we can't be sure what the real date is supposed to be. Moreover, the datasets have millions of records, so this would hardly make any difference.

In [3]:
#created
ceased
#renewed

Unnamed: 0,DATA,GENERE,ETA,SETTOREECONOMICODETTAGLIO,TITOLOSTUDIO,CONTRATTO,MODALITALAVORO,PROVINCIAIMPRESA,ITALIANO
0,1988-01-05,F,28,Attività di famiglie e convivenze come datori ...,NESSUN TITOLO DI STUDIO,LAVORO DOMESTICO A TEMPO INDETERMINATO,TEMPO PARZIALE ORIZZONTALE,MILANO,ITALIA
1,1988-12-07,F,19,Attività di famiglie e convivenze come datori ...,NESSUN TITOLO DI STUDIO,LAVORO DOMESTICO A TEMPO INDETERMINATO,TEMPO PARZIALE ORIZZONTALE,BERGAMO,ITALIA
2,1989-10-04,F,37,Attività di famiglie e convivenze come datori ...,NESSUN TITOLO DI STUDIO,LAVORO DOMESTICO A TEMPO INDETERMINATO,TEMPO PARZIALE ORIZZONTALE,MILANO,ITALIA
3,1990-06-04,F,42,Attività di famiglie e convivenze come datori ...,NESSUN TITOLO DI STUDIO,LAVORO DOMESTICO A TEMPO INDETERMINATO,TEMPO PARZIALE ORIZZONTALE,COMO,ITALIA
4,1992-09-04,F,33,Attività di famiglie e convivenze come datori ...,NESSUN TITOLO DI STUDIO,LAVORO DOMESTICO A TEMPO INDETERMINATO,TEMPO PARZIALE ORIZZONTALE,LECCO,ITALIA
...,...,...,...,...,...,...,...,...,...
3741389,2021-12-09,F,38,Servizi logistici relativi alla distribuzione ...,LICENZA MEDIA,LAVORO A TEMPO INDETERMINATO,TEMPO PARZIALE MISTO,MILANO,ROMANIA
3741390,2021-12-09,M,60,Fabbricazione di altri prodotti cartotecnici,LICENZA MEDIA,LAVORO A TEMPO INDETERMINATO,TEMPO PIENO,BERGAMO,ITALIA
3741391,2021-12-09,F,38,Servizi logistici relativi alla distribuzione ...,LICENZA MEDIA,LAVORO A TEMPO INDETERMINATO,TEMPO PARZIALE MISTO,MILANO,ROMANIA
3741392,2021-12-09,F,36,Servizi logistici relativi alla distribuzione ...,LICENZA MEDIA,LAVORO A TEMPO INDETERMINATO,TEMPO PARZIALE MISTO,MILANO,ROMANIA


### (*) 
In the next cell we can see that dates never go past the 12th of each month.

This indicates that the day is probably not relevant and it might be better to reason in terms of year and/or month.
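
A minimal sketch of reasoning by month or year instead of by day, on made-up dates. The column name `MONTH` and the variables `per_month` and `per_year` are hypothetical and are not used elsewhere in the notebook.

```python
import pandas as pd

# Toy frame standing in for one of the datasets (values are made up).
toy = pd.DataFrame({
    "DATA": pd.to_datetime(["2020-01-03", "2020-01-11", "2020-02-05", "2021-02-09"]),
})

# Collapse each date to its calendar month, ignoring the day.
toy["MONTH"] = toy["DATA"].dt.to_period("M")

# Count records per month and per year.
per_month = toy.groupby("MONTH").size()
per_year = toy.groupby(toy["DATA"].dt.year).size()

print(per_month)
print(per_year)
```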

In [4]:
# Show possible days for each dataset
print(f"""
created:  {created['DATA'].dt.day.unique()}
ceased:   {ceased['DATA'].dt.day.unique()}
renewed:  {renewed['DATA'].dt.day.unique()}
""")


created:  [ 2 11  1 10  8  4  3  7 12  6  5  9]
ceased:   [ 5.  7.  4.  6. 12. 10.  8.  1.  3.  2.  9. 11. nan]
renewed:  [ 3  4  6  7  8  9 10 11 12  5  1  2]



## Filtering by date

Now we filter the rows in our datasets by date, keeping only the period we're interested in.
We reassign the result to the same variables, since the original datasets take a lot of memory.

We also add a new column with the kind of record. This will be useful later when we join the datasets.
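
The same kind of filter can be sketched on toy data with `Series.between`, which is inclusive on both ends by default. This is an alternative formulation, not the code used in the next cell; the frame `toy` and its values are made up.

```python
import pandas as pd

# Toy stand-in for one dataset; the dates are made up, including one missing value.
toy = pd.DataFrame({
    "DATA": pd.to_datetime(["2016-12-05", "2017-01-02", "2021-12-09", None]),
    "ETA": [30, 41, 25, 52],
})

# NaT never satisfies the comparison, so rows with an unparsed date
# are dropped by the same filter.
in_range = toy["DATA"].between("2017-01-01", "2021-12-31")

# assign() returns a new frame with the KIND label attached.
toy_filtered = toy[in_range].reset_index(drop=True).assign(KIND="created")
print(toy_filtered)
```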

In [5]:
created = created[(created["DATA"] >= "2017/01/01") & (created["DATA"] <= "2021/12/31")].reset_index(drop=True)
created["KIND"] = "created"

ceased = ceased[(ceased["DATA"] >= "2017/01/01") & (ceased["DATA"] <= "2021/12/31")].reset_index(drop=True)
ceased["KIND"] = "ceased"

renewed = renewed[(renewed["DATA"] >= "2017/01/01") & (renewed["DATA"] <= "2021/12/31")].reset_index(drop=True)
renewed["KIND"] = "renewed"

## Joining into a single dataset

We concatenate the three datasets into the _jobs_ dataset.
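
One detail worth knowing, sketched here on two made-up frames: by default `pd.concat` keeps the index of each input, so index labels repeat in the result. Passing `ignore_index=True` is one way to get a fresh index, if unique labels are needed.

```python
import pandas as pd

# Two toy frames with their own 0-based index, like the filtered datasets.
a = pd.DataFrame({"ETA": [30, 41], "KIND": "created"})
b = pd.DataFrame({"ETA": [25, 52], "KIND": "ceased"})

# Default concat keeps each frame's index, so labels 0 and 1 appear twice.
stacked = pd.concat([a, b])
print(stacked.index.is_unique)

# ignore_index=True builds a fresh 0..n-1 index instead.
renumbered = pd.concat([a, b], ignore_index=True)
print(renumbered.index.tolist())
```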

In [6]:
jobs = pd.concat([created, ceased, renewed])

In [7]:
# Saving the dataset in CSV format (if needed)...
#jobs.to_csv("Jobs.csv")

# ...or resume from previously created *jobs* dataset
#import pandas as pd
#jobs = pd.read_csv("Jobs.csv", index_col=[0])
#jobs["DATA"] = pd.to_datetime(jobs["DATA"], errors="coerce")

In [8]:
jobs

Unnamed: 0,DATA,GENERE,ETA,SETTOREECONOMICODETTAGLIO,TITOLOSTUDIO,CONTRATTO,MODALITALAVORO,PROVINCIAIMPRESA,ITALIANO,KIND
0,2017-01-01,M,50,Trasporto di merci su strada,TITOLO DI ISTRUZIONE SECONDARIA SUPERIORE (SCO...,LAVORO A TEMPO INDETERMINATO,TEMPO PIENO,BERGAMO,ITALIA,created
1,2017-01-01,M,64,"Attività di produzione cinematografica, di vid...",DIPLOMA DI ISTRUZIONE SECONDARIA SUPERIORE CH...,LAVORO AUTONOMO NELLO SPETTACOLO,NON DEFINITO,MILANO,ITALIA,created
2,2017-01-01,M,52,Coltivazioni agricole associate all'allevament...,LICENZA MEDIA,LAVORO A TEMPO DETERMINATO,TEMPO PIENO,PAVIA,ITALIA,created
3,2017-01-01,M,44,Alberghi,LICENZA MEDIA,LAVORO A TEMPO DETERMINATO,TEMPO PIENO,MILANO,ITALIA,created
4,2017-01-01,F,39,Attività di famiglie e convivenze come datori ...,NESSUN TITOLO DI STUDIO,LAVORO DOMESTICO,TEMPO PIENO,MILANO,GEORGIA,created
...,...,...,...,...,...,...,...,...,...,...
1889983,2021-12-09,F,27,Pulizia generale (non specializzata) di edifici,DIPLOMA DI ISTRUZIONE SECONDARIA SUPERIORE CH...,LAVORO A TEMPO DETERMINATO,TEMPO PARZIALE ORIZZONTALE,MILANO,ALBANIA,renewed
1889984,2021-12-09,M,23,"Commercio al dettaglio di articoli sportivi, b...",DIPLOMA DI ISTRUZIONE SECONDARIA SUPERIORE CH...,LAVORO A TEMPO DETERMINATO,TEMPO PARZIALE MISTO,BRESCIA,ITALIA,renewed
1889985,2021-12-09,M,32,Commercio al dettaglio ambulante di prodotti o...,LICENZA MEDIA,LAVORO A TEMPO DETERMINATO,TEMPO PIENO,BERGAMO,ITALIA,renewed
1889986,2021-12-09,M,27,Movimento merci relativo ad altri trasporti te...,LICENZA MEDIA,LAVORO A TEMPO DETERMINATO,TEMPO PARZIALE ORIZZONTALE,MILANO,LIBIA,renewed


## Cleaning the data

By checking unique values for each column we get these results:

1. __GENERE__: all rows are either _M_ or _F_, so nothing to do here
2. __ETA__: a few rows have really low values, like 0, 1, 2. These are probably errors, so we remove all rows with ETA < 15 from the dataset, since 15 is the legal age for working in Italy.
3. __SETTOREECONOMICODETTAGLIO__: This column is the "Codice Ateco" description for the activity. There are 1211 different values in this column, so it could be possible to use [a dataset of the Ateco codes](https://indicepa.gov.it/ipa-dati/dataset/codici-ateco) to group them into more generic categories. Nonetheless, this column is not relevant for our investigation (which is more focused on personal characteristics), so we will leave it untouched for the moment. This column also has some null values, which we ignore for now.
4. __TITOLOSTUDIO__: Since a few of the categories overlap it could be a good idea to group them when doing stats. Also, a big chunk of the rows have the value "NESSUN TITOLO DI STUDIO", which is pretty unrealistic, so it is probably better to skip these rows when running statistics involving this column.
5. __CONTRATTO__: Here too, a few values could be grouped into macro-categories.
6. __MODALITALAVORO__: It's probably worth grouping the various kinds of "TEMPO PARZIALE" rows. We should also ignore the _NON DEFINITO_ rows when doing stats on this column.
7. __PROVINCIAIMPRESA__: All good here
8. __ITALIANO__: It's probably better to rename the column to something more meaningful (_CITTADINANZA_). Also, for the purpose of this project it might be better to group rows into _ITALIANA/STRANIERA_.

In the next cells we'll make these changes, obtaining the _cleaned_jobs_ dataset.
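
As a hedged illustration of what grouping values into macro-categories means, here is a sketch on a toy column using a plain dictionary with `Series.map`. It is an alternative to the helper defined in the next cell, not the notebook's own method, and the value "SOME RARE CONTRACT" is invented for the example.

```python
import pandas as pd

# Toy column with a few CONTRATTO-like values (a made-up subset).
contratto = pd.Series([
    "LAVORO A TEMPO DETERMINATO",
    "LAVORO A TEMPO INDETERMINATO",
    "TIROCINIO",
    "LAVORO A TEMPO DETERMINATO",
    "SOME RARE CONTRACT",
])

# Each detailed value points to exactly one macro-category.
macro = {
    "LAVORO A TEMPO DETERMINATO": "TEMPO DETERMINATO",
    "LAVORO A TEMPO INDETERMINATO": "TEMPO INDETERMINATO",
    "TIROCINIO": "TIROCINI E APPRENDISTATI",
}

# map() yields NaN for values missing from the dict; fillna() sends them to a catch-all.
grouped = contratto.map(macro).fillna("ALTRI")
print(grouped.value_counts())
```

Because every detailed value is a dictionary key, it can only be assigned to one macro-category, which rules out accidental double assignments.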

In [26]:
# Filtering by age
cleaned_jobs = jobs[jobs["ETA"] >= 15].copy(deep=True)

# "Groups" the categories of a column: replace_dict maps each new name (the key)
# to the list of existing values that it replaces.
# When invert=True, renames all values NOT in the list instead.
#
# Rules are applied one after the other on the same column, so a later rule
# can match values written by an earlier one: make separate calls when
# that matters.
def replace_categories(dframe, column, replace_dict, invert=False):
    for new_name, old_values in replace_dict.items():
        dframe.loc[dframe[column].isin(old_values) ^ invert, column] = new_name
        
# Renaming TITOLOSTUDIO to ISTRUZIONE and grouping into wider categories
cleaned_jobs.rename(columns={"TITOLOSTUDIO": "ISTRUZIONE"}, inplace=True)

replace_categories(cleaned_jobs, "ISTRUZIONE", {
    "ELEMENTARE": [
        "LICENZA ELEMENTARE"],
    "SECONDARIA INFERIORE": [
        "LICENZA MEDIA"],
    "SECONDARIA SUPERIORE": [
        "DIPLOMA DI ISTRUZIONE SECONDARIA SUPERIORE  CHE PERMETTE L'ACCESSO ALL'UNIVERSITA",
        "TITOLO DI ISTRUZIONE SECONDARIA SUPERIORE (SCOLASTICA ED EXTRA-SCOLASTICA) CHE NON PERMETTE L'ACCESSO ALL'UNIVERSITÀ ()"],
    "SCONOSCIUTO": ["NESSUN TITOLO DI STUDIO"]
})

replace_categories(cleaned_jobs, "ISTRUZIONE", {
    "TERZIARIA": [
        "ELEMENTARE",
        "SECONDARIA INFERIORE",
        "SECONDARIA SUPERIORE",
        "SCONOSCIUTO"
    ]
}, invert=True)

# Grouping CONTRATTO
replace_categories(cleaned_jobs, "CONTRATTO", {
    "TEMPO DETERMINATO": [
        "LAVORO A TEMPO DETERMINATO",
        "LAVORO A TEMPO DETERMINATO  PER SOSTITUZIONE",
        "CONTRATTO DI AGENZIA",
        "Lavoro a tempo determinato con piattaforma"],
    # "CONTRATTO DI AGENZIA" is already mapped to "TEMPO DETERMINATO" above,
    # so listing it again here would never match.
    "TEMPO INDETERMINATO": [
        "LAVORO A TEMPO INDETERMINATO",
        "Lavoro a tempo indeterminato con piattaforma"],
    "LAVORO DOMESTICO": [
        "LAVORO DOMESTICO A TEMPO INDETERMINATO"],
    "TIROCINI E APPRENDISTATI": [   
        "TIROCINIO", 
        "APPRENDISTATO PROFESSIONALIZZANTE O CONTRATTO DI MESTIERE",
        "APPRENDISTATO PER LA QUALIFICA E PER IL DIPLOMA PROFESSIONALE, IL DIPLOMA DI ISTRUZIONE SECONDARIA SUPERIORE E IL CERTIFICATO DI SPECIALIZZAZIONE TECNICA SUPERIORE",
        "CONTRATTI DI BORSA LAVORO E ALTRE WORK EXPERIENCES",
        "APPRENDISTATO DI ALTA FORMAZIONE E RICERCA",
        "CONTRATTO DI FORMAZIONE LAVORO (SOLO PUBBLICA AMMINISTRAZIONE)"],
    "AUTONOMO": [   
        "COLLABORAZIONE COORDINATA E CONTINUATIVA", 
        "LAVORO AUTONOMO NELLO SPETTACOLO"]
})

replace_categories(cleaned_jobs, "CONTRATTO", {
    "ALTRI": [
        "TEMPO DETERMINATO", 
        "TEMPO INDETERMINATO",
        "TIROCINI E APPRENDISTATI",
        "LAVORO INTERMITTENTE",
        "AUTONOMO",
        "LAVORO DOMESTICO"]
}, invert=True)

# Grouping MODALITALAVORO
replace_categories(cleaned_jobs, "MODALITALAVORO", {
    "TEMPO PARZIALE": [
        "TEMPO PARZIALE ORIZZONTALE",
        "TEMPO PARZIALE MISTO",
        "TEMPO PARZIALE VERTICALE"]
})

# Renaming and grouping CITTADINANZA (ex ITALIANO)
cleaned_jobs.rename(columns={"ITALIANO": "CITTADINANZA"}, inplace=True)
cleaned_jobs.loc[cleaned_jobs["CITTADINANZA"] != "ITALIA", "CITTADINANZA"] = "STRANIERA"
cleaned_jobs.loc[cleaned_jobs["CITTADINANZA"] == "ITALIA", "CITTADINANZA"] = "ITALIANA"

# Adding the column PERIOD, which groups records from the pre-pandemic and pandemic periods.
# A single .loc call on the DataFrame avoids chained assignment
# (and the SettingWithCopyWarning it triggers).
cleaned_jobs["PERIOD"] = ""
cleaned_jobs.loc[cleaned_jobs["DATA"] < "2020-01-01", "PERIOD"] = "pre-pandemic"
cleaned_jobs.loc[cleaned_jobs["DATA"] >= "2020-01-01", "PERIOD"] = "pandemic"


In [27]:
cleaned_jobs

Unnamed: 0,DATA,GENERE,ETA,SETTOREECONOMICODETTAGLIO,ISTRUZIONE,CONTRATTO,MODALITALAVORO,PROVINCIAIMPRESA,CITTADINANZA,KIND,PERIOD
0,2017-01-01,M,50,Trasporto di merci su strada,SECONDARIA SUPERIORE,TEMPO INDETERMINATO,TEMPO PIENO,BERGAMO,ITALIANA,created,pre-pandemic
1,2017-01-01,M,64,"Attività di produzione cinematografica, di vid...",SECONDARIA SUPERIORE,AUTONOMO,NON DEFINITO,MILANO,ITALIANA,created,pre-pandemic
2,2017-01-01,M,52,Coltivazioni agricole associate all'allevament...,SECONDARIA INFERIORE,TEMPO DETERMINATO,TEMPO PIENO,PAVIA,ITALIANA,created,pre-pandemic
3,2017-01-01,M,44,Alberghi,SECONDARIA INFERIORE,TEMPO DETERMINATO,TEMPO PIENO,MILANO,ITALIANA,created,pre-pandemic
4,2017-01-01,F,39,Attività di famiglie e convivenze come datori ...,SCONOSCIUTO,LAVORO DOMESTICO,TEMPO PIENO,MILANO,STRANIERA,created,pre-pandemic
...,...,...,...,...,...,...,...,...,...,...,...
1889983,2021-12-09,F,27,Pulizia generale (non specializzata) di edifici,SECONDARIA SUPERIORE,TEMPO DETERMINATO,TEMPO PARZIALE,MILANO,STRANIERA,renewed,pandemic
1889984,2021-12-09,M,23,"Commercio al dettaglio di articoli sportivi, b...",SECONDARIA SUPERIORE,TEMPO DETERMINATO,TEMPO PARZIALE,BRESCIA,ITALIANA,renewed,pandemic
1889985,2021-12-09,M,32,Commercio al dettaglio ambulante di prodotti o...,SECONDARIA INFERIORE,TEMPO DETERMINATO,TEMPO PIENO,BERGAMO,ITALIANA,renewed,pandemic
1889986,2021-12-09,M,27,Movimento merci relativo ad altri trasporti te...,SECONDARIA INFERIORE,TEMPO DETERMINATO,TEMPO PARZIALE,MILANO,STRANIERA,renewed,pandemic


## Extracting some stats

### Gender

By calculating the % variation in created, ceased and renewed contracts we can see that gender didn't play an important role, as men and women performed similarly: the decrease was slightly stronger for women in new contracts and slightly stronger for men in renewed contracts.

There's a more significant drop for men in ceased contracts. It is worth remembering that during the pandemic period in Italy terminating contracts was not allowed. Could this depend on men being more likely to have a "TEMPO DETERMINATO" contract? __(**)__
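
The % variation used throughout this section is `pandemic / pre-pandemic * 100 - 100`. As a hedged sketch on made-up data, the block below computes it on raw totals and also on average monthly volumes, which may be a fairer comparison whenever the two periods cover a different number of months. The per-month normalization is an assumption added for illustration and is not part of the original analysis.

```python
import pandas as pd

# Toy records: 3 pre-pandemic months with 4 contracts each,
# 2 pandemic months with 3 contracts each (made-up numbers).
dates = (["2019-10-01", "2019-11-01", "2019-12-01"] * 4
         + ["2020-01-01", "2020-02-01"] * 3)
toy = pd.DataFrame({"DATA": pd.to_datetime(dates)})
toy["PERIOD"] = "pre-pandemic"
toy.loc[toy["DATA"] >= "2020-01-01", "PERIOD"] = "pandemic"

# Total records and number of distinct calendar months in each period.
counts = toy.groupby("PERIOD").size()
months = toy.groupby("PERIOD")["DATA"].apply(lambda s: s.dt.to_period("M").nunique())

# Raw variation compares totals; monthly variation compares average monthly volumes.
raw_variation = counts["pandemic"] / counts["pre-pandemic"] * 100 - 100
monthly_rate = counts / months
monthly_variation = monthly_rate["pandemic"] / monthly_rate["pre-pandemic"] * 100 - 100

print(raw_variation, monthly_variation)
```

In this toy case the two measures differ noticeably, because the pre-pandemic period covers one more month than the pandemic one.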

In [28]:
gender = cleaned_jobs.groupby(by=["PERIOD", "KIND", "GENERE"]).size()

# Calculating the % variation in the pandemic period (2020-21)
# compared to the pre-pandemic period (2017-19)
gender["pandemic"]/gender["pre-pandemic"]*100-100

KIND     GENERE
ceased   F        -45.655545
         M        -48.082576
created  F        -52.111360
         M        -51.577395
renewed  F        -47.170643
         M        -49.711109
dtype: float64

#### (**) 

Yes, proportionally it seems men are slightly more likely to have a "TEMPO DETERMINATO" contract than women.

In [29]:
gender_total = cleaned_jobs.groupby(by=["GENERE"]).size()
gender_contract = cleaned_jobs.groupby(by=["GENERE", "CONTRATTO"]).size()
                                       
gender_contract/gender_total*100

GENERE  CONTRATTO               
F       ALTRI                        0.115497
        AUTONOMO                     4.312276
        LAVORO DOMESTICO             6.518930
        LAVORO INTERMITTENTE         6.380834
        TEMPO DETERMINATO           61.406806
        TEMPO INDETERMINATO         15.035328
        TIROCINI E APPRENDISTATI     6.230331
M       ALTRI                        0.166998
        AUTONOMO                     3.893115
        LAVORO DOMESTICO             1.139799
        LAVORO INTERMITTENTE         4.234507
        TEMPO DETERMINATO           65.632533
        TEMPO INDETERMINATO         19.629895
        TIROCINI E APPRENDISTATI     5.303153
dtype: float64

### Age

There is a noticeable shift in the mean age for ceased contracts (about 1.3 years higher), meaning that older workers were more likely to have their contract ended or not renewed.

In [30]:
age = cleaned_jobs.groupby(by=["PERIOD", "KIND"])["ETA"].mean()
age["pandemic"]-age["pre-pandemic"]

KIND
ceased     1.289837
created    0.428215
renewed   -0.110842
Name: ETA, dtype: float64

### Studies

A higher education level seems to have granted a better chance of getting a job, or of renewing a contract, during the pandemic. At the same time it was also easier to lose the job (maybe due to the higher cost for companies?).

It's also worth noticing that the lowest education level (ELEMENTARE) had the second best performance (hypothesis: were those workers more protected by the "blocco dei licenziamenti" during the pandemic?).

In [31]:
studies = cleaned_jobs.groupby(by=["PERIOD", "KIND", "ISTRUZIONE"]).size()
studies["pandemic"]/studies["pre-pandemic"]*100-100

KIND     ISTRUZIONE                           
ceased   DIPLOMA DI SPECIALIZZAZIONE             -50.207125
         DIPLOMA TERZIARIO EXTRA-UNIVERSITARIO   -33.673469
         DIPLOMA UNIVERSITARIO                   -39.770018
         ELEMENTARE                              -55.664695
         LAUREA - Vecchio o nuovo ordinamento    -39.806382
         MASTER UNIVERSITARIO DI PRIMO LIVELLO   -36.500396
         SCONOSCIUTO                             -47.455658
         SECONDARIA INFERIORE                    -50.056818
         SECONDARIA SUPERIORE                    -46.205055
         TITOLO DI DOTTORE DI RICERCA            -49.148418
         TITOLO DI STUDIO POST-LAUREA            -41.864891
created  DIPLOMA DI SPECIALIZZAZIONE             -54.007363
         DIPLOMA TERZIARIO EXTRA-UNIVERSITARIO   -60.358451
         DIPLOMA UNIVERSITARIO                   -47.923028
         ELEMENTARE                              -50.113081
         LAUREA - Vecchio o nuovo ordinamento    -41.

### City

- As for new contracts, Milano and Varese seem to be the most affected, while Monza was the best province.
- As for renewed contracts, Cremona was greatly affected, while provinces like Lodi, Sondrio and Monza performed better.
- As for ceased contracts, Sondrio had the worst performance, while Lodi was the best.

Overall, Lodi and Monza seem to have held up better than the other provinces, which performed worse, though none stood out in particular.

In [32]:
city = cleaned_jobs.groupby(by=["PERIOD", "KIND", "PROVINCIAIMPRESA"]).size()
city["pandemic"]/city["pre-pandemic"]*100-100

KIND     PROVINCIAIMPRESA
ceased   BERGAMO            -44.356178
         BRESCIA            -44.566288
         COMO               -46.786492
         CREMONA            -44.992774
         LECCO              -45.484797
         LODI               -55.627191
         MANTOVA            -46.323020
         MILANO             -49.321266
         MONZA E BRIANZA    -45.648289
         PAVIA              -43.475967
         SONDRIO            -32.577702
         VARESE             -45.980680
created  BERGAMO            -48.064499
         BRESCIA            -48.494797
         COMO               -47.806536
         CREMONA            -51.395704
         LECCO              -46.515418
         LODI               -45.893791
         MANTOVA            -47.882514
         MILANO             -55.100249
         MONZA E BRIANZA    -43.907401
         PAVIA              -48.059575
         SONDRIO            -48.026165
         VARESE             -53.401973
renewed  BERGAMO            -48.597980

### Citizenship

It is clear from the data that foreign workers were the most affected by the pandemic. The only exception is ceased contracts, which seem to have affected Italians more (hypothesis: could this also be related to the "blocco dei licenziamenti", as for studies? __(**)__)

In [33]:
citizenship = cleaned_jobs.groupby(by=["PERIOD", "KIND", "CITTADINANZA"]).size()
citizenship["pandemic"]/citizenship["pre-pandemic"]*100-100

KIND     CITTADINANZA
ceased   ITALIANA       -46.566577
         STRANIERA      -48.228046
created  ITALIANA       -52.860843
         STRANIERA      -48.428674
renewed  ITALIANA       -49.726889
         STRANIERA      -45.728076
dtype: float64

#### (**)

The composition of the foreign workforce, with regard to studies, seems to support this hypothesis.
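
The share of each education level within a citizenship group can be sketched on toy data as follows. The records are made up; the sketch uses `Series.div` with the `level` argument to state explicitly which index level the group totals are matched on, which is a slightly more explicit spelling of a plain division.

```python
import pandas as pd

# Toy records (made up): education level and citizenship group per contract.
toy = pd.DataFrame({
    "ISTRUZIONE": ["TERZIARIA", "SCONOSCIUTO", "SCONOSCIUTO",
                   "TERZIARIA", "SCONOSCIUTO", "SCONOSCIUTO"],
    "CITTADINANZA": ["ITALIANA", "ITALIANA", "STRANIERA",
                     "ITALIANA", "STRANIERA", "ITALIANA"],
})

by_both = toy.groupby(["ISTRUZIONE", "CITTADINANZA"]).size()
by_citizenship = toy.groupby("CITTADINANZA").size()

# Divide each (education, citizenship) count by the total of its citizenship group.
share = by_both.div(by_citizenship, level="CITTADINANZA") * 100
print(share)
```

Within each citizenship group the resulting percentages add up to 100.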

In [34]:
citizenship_studies = cleaned_jobs.groupby(by=["ISTRUZIONE", "CITTADINANZA"]).size()
studies_total = cleaned_jobs.groupby(by=["CITTADINANZA"]).size()

citizenship_studies/studies_total*100

ISTRUZIONE                             CITTADINANZA
DIPLOMA DI SPECIALIZZAZIONE            ITALIANA         0.562037
                                       STRANIERA        0.215178
DIPLOMA TERZIARIO EXTRA-UNIVERSITARIO  ITALIANA         0.161819
                                       STRANIERA        0.077216
DIPLOMA UNIVERSITARIO                  ITALIANA         0.900215
                                       STRANIERA        0.355030
ELEMENTARE                             ITALIANA         0.499678
                                       STRANIERA        1.773528
LAUREA - Vecchio o nuovo ordinamento   ITALIANA        12.668499
                                       STRANIERA        2.034873
MASTER UNIVERSITARIO DI PRIMO LIVELLO  ITALIANA         0.160263
                                       STRANIERA        0.064634
SCONOSCIUTO                            ITALIANA        16.216425
                                       STRANIERA       54.306631
SECONDARIA INFERIORE                  

In [19]:
cleaned_jobs.groupby(by=["ISTRUZIONE"]).size()

ISTRUZIONE
ELEMENTARE                59512
SCONOSCIUTO             1871180
SECONDARIA INFERIORE    2084575
SECONDARIA SUPERIORE    2336128
TERZIARIA                837399
dtype: int64