# [DOCUMENTATION] MASHUP PHASE (b)
#### **[WHAT]**
This Jupyter Notebook analyses the mashup datasets for "EmpowerItaly", an open data project that analyses the presence of foreign workers in Italy. <br>

This is the last part of the divergent phase of the Double Diamond. We had already made some assumptions while scraping the data; in this phase the main objective is to provide a potential answer to our research question. <br>

- **DEFINITION OF *ACTIVITY RATE***: [...]
- **DEFINITION OF *UNEMPLOYMENT RATE***: [...]
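The notebook's own definitions are elided above. As a minimal sketch only, assuming the standard ILO-style definitions that ISTAT also uses (labour force = employed + unemployed), the two rates could be computed as follows. The function and argument names are hypothetical and not part of the project.

```python
def activity_rate(employed: float, unemployed: float, population: float) -> float:
    """Assumed definition: labour force as a percentage of the reference (working-age) population."""
    if population <= 0:
        raise ValueError("population must be positive")
    labour_force = employed + unemployed
    return 100.0 * labour_force / population


def unemployment_rate(employed: float, unemployed: float) -> float:
    """Assumed definition: unemployed persons as a percentage of the labour force."""
    labour_force = employed + unemployed
    if labour_force <= 0:
        raise ValueError("labour force must be positive")
    return 100.0 * unemployed / labour_force


# Toy numbers, not project data: 60 employed, 10 unemployed, 100 people in the reference population
print(activity_rate(60, 10, 100))
print(unemployment_rate(60, 10))
```

Note that the two rates have different denominators (population vs. labour force), so a rise in activity does not mechanically imply a fall in unemployment.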


#### **[HOW]**

In this phase we try to verify whether there is a correlation between ACTIVITY RATE and UNEMPLOYMENT RATE, for both foreign and native citizens.
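The notebook explores this relation visually. As a hedged numeric complement, a minimal sketch could compute a Pearson correlation between the two rates separately for each citizenship group. Pearson is our choice here, not necessarily the project's method, and the helper name `correlation_by_group` is hypothetical. With one territory and five yearly observations per group, any coefficient is very fragile and should be read as descriptive only.

```python
import pandas as pd


def correlation_by_group(
    df: pd.DataFrame,
    x: str = "% Activity",
    y: str = "% Unemployment",
    group: str = "Citizenship",
) -> pd.Series:
    """Pearson correlation between columns x and y, computed separately for each group."""
    # Series.corr uses the Pearson method by default
    return pd.Series({name: sub[x].corr(sub[y]) for name, sub in df.groupby(group)})


# Hypothetical usage, once dunnoDf has been loaded and its total columns renamed:
# correlation_by_group(dunnoDf)
```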

#### **[WHY]**

This hypothesis arose when we first tried to visualize the data for this specific purpose: analyzing the relation between a person's activity and their actual condition on the labour market (without gender distinction).

#### install packages

```python
# install packages (%pip installs into the environment of the running kernel)
%pip install plotly chart_studio
```



```python
# import packages
import pandas as pd
import numpy as np
import scipy as sp
import plotly.express as px
import chart_studio.plotly as py
import plotly.graph_objects as go
```

## ANALYZING POTENTIAL CORRELATION BETWEEN ACTIVITY RATE AND UNEMPLOYMENT RATE

### 1: TOTAL ACTIVITY RATE X UNEMPLOYMENT RATE (BY YEAR)

```python
# WHAT: ANALYSIS OF TOTAL UNEMPLOYMENT X ACTIVITY RATE (i.e. WITHOUT CONSIDERING THE EDUCATIONAL LEVEL)
# INSIGHT: the total activity rate starts at roughly the same level as the second educational level.
# The higher the educational level, the more the span of the unemployment rate along the line appears bounded from above.
# Furthermore, in 2022 the unemployment rate decreases and the activity rate rises noticeably.

dunnoDf = pd.read_csv('https://raw.githubusercontent.com/openaccesstoimmigrants/openaccesstoimmigrants/main/2.VISUALIZE/mashupVizEnvironment/dunnoMashup.csv')
# dunnoDf = dunnoDf.replace('italian', 0)
# dunnoDf = dunnoDf.replace('foreign', 1)

# round the total percentages to one decimal place
dunnoDf["total_x"] = dunnoDf["total_x"].round(1)
dunnoDf["total_y"] = dunnoDf["total_y"].round(1)

# rename the total columns with a "%" label for readability in the charts
dunnoDf.rename(columns={"total_y": "% Activity", "total_x": "% Unemployment"}, inplace=True)
dunnoDf
```

| Unnamed: 0.1 | Unnamed: 0 | Territory | Citizenship | Year | UNEMP_ED_1 | UNEMP_ED_3 | % Unemployment | UNEMP_ED_2 | ACT_ED_1 | ACT_ED_3 | % Activity | ACT_ED_2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | Centro (I) | foreign | 2018 | 15.155627 | 10.804081 | 14.5 | 14.83671 | 70.985618 | 77.12419 | 73.6 | 75.708602 |
| 1 | 1 | Centro (I) | foreign | 2019 | 13.568437 | 11.434287 | 14.8 | 17.514611 | 70.615605 | 75.93708 | 73.5 | 76.776958 |
| 2 | 2 | Centro (I) | foreign | 2020 | 13.153746 | 8.624535 | 12.8 | 13.542032 | 66.39947 | 68.671068 | 68.0 | 70.110227 |
| 3 | 3 | Centro (I) | foreign | 2021 | 13.241853 | 18.1026 | 15.2 | 16.586291 | 66.442788 | 71.376057 | 69.6 | 73.098578 |
| 4 | 4 | Centro (I) | foreign | 2022 | 12.723967 | 12.505329 | 13.0 | 13.3772 | 68.150298 | 74.444015 | 70.8 | 72.925446 |
| 5 | 5 | Centro (I) | italian | 2018 | 11.356605 | 4.986898 | 8.7 | 9.37772 | 52.889158 | 83.546888 | 67.8 | 71.418533 |
| 6 | 6 | Centro (I) | italian | 2019 | 9.827092 | 5.102061 | 7.7 | 8.193194 | 52.15033 | 83.869037 | 67.6 | 71.001206 |
| 7 | 7 | Centro (I) | italian | 2020 | 10.372889 | 5.024784 | 7.6 | 7.821124 | 49.99157 | 82.418189 | 66.1 | 69.852723 |
| 8 | 8 | Centro (I) | italian | 2021 | 11.100709 | 4.150024 | 7.7 | 8.114135 | 50.522212 | 83.899395 | 66.8 | 70.399843 |
| 9 | 9 | Centro (I) | italian | 2022 | 8.462975 | 3.369947 | 6.1 | 6.500167 | 51.609045 | 83.923159 | 68.0 | 71.828835 |
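The 2022 remark in the cell comment can be checked directly on the totals. A minimal sketch, using only the rows displayed above (Centro (I), 2018-2022) and a hypothetical helper `year_over_year_change`, computes each rate's change in percentage points versus the previous year, separately for each citizenship group:

```python
import pandas as pd

# Totals copied from the dunnoDf output above (Centro (I) rows only)
data = pd.DataFrame({
    "Citizenship": ["foreign"] * 5 + ["italian"] * 5,
    "Year": [2018, 2019, 2020, 2021, 2022] * 2,
    "% Unemployment": [14.5, 14.8, 12.8, 15.2, 13.0, 8.7, 7.7, 7.6, 7.7, 6.1],
    "% Activity": [73.6, 73.5, 68.0, 69.6, 70.8, 67.8, 67.6, 66.1, 66.8, 68.0],
})


def year_over_year_change(df, cols=("% Unemployment", "% Activity"), group="Citizenship"):
    """Change of each rate versus the previous year (percentage points), per group."""
    ordered = df.sort_values([group, "Year"])
    # diff() is computed within each group, so each group's first year is NaN
    changes = ordered.groupby(group)[list(cols)].diff()
    return ordered[[group, "Year"]].join(changes)


print(year_over_year_change(data))
```

On these rows, 2022 shows a drop in unemployment (2.2 points for foreign citizens, 1.6 for Italians) and a 1.2-point rise in activity for both groups, which is consistent with the comment; whether the change is statistically significant is a separate question.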


```python
# one DataFrame per year (variable names are the Italian words for 2018 to 2022)
duemiladiciotto = dunnoDf[dunnoDf["Year"] == 2018]
duemiladiciannove = dunnoDf[dunnoDf["Year"] == 2019]
duemilaventi = dunnoDf[dunnoDf["Year"] == 2020]
duemilaventuno = dunnoDf[dunnoDf["Year"] == 2021]
duemilaventidue = dunnoDf[dunnoDf["Year"] == 2022]
```
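The five filters above can also be produced in a single pass with `groupby`. A minimal sketch, with a hypothetical helper name `split_by_year`:

```python
import pandas as pd


def split_by_year(df: pd.DataFrame, year_col: str = "Year") -> dict:
    """Return {year: sub-DataFrame}, one entry per distinct value of year_col."""
    return {year: group for year, group in df.groupby(year_col)}


# Hypothetical usage: split_by_year(dunnoDf)[2018] should hold the same rows as duemiladiciotto
```

A dictionary keyed by year avoids one variable per year and keeps working unchanged if more years are added to the mashup.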

```python
fig2018 = px.scatter(
    duemiladiciotto,        # dataframe for 2018
    x="Territory",          # macro regions
    y="% Activity",         # activity rate
    size="% Unemployment",  # bubble size scales with the unemployment rate
    color="Citizenship",    # one color per citizenship (foreign/italian)
)
fig2018.update_layout(
    xaxis_tickangle=30,                    # angle of the x-axis tick labels
    xaxis_tickfont=dict(size=9),           # x-axis tick font size
    yaxis_tickfont=dict(size=9),           # y-axis tick font size
    margin=dict(l=500, r=20, t=50, b=20),  # margins in pixels
    paper_bgcolor="LightSteelblue",        # background color of the chart
    title="Activity x Unemployment Rate by Macro Region (2018)"
)
```

```python
fig2019 = px.scatter(
    duemiladiciannove,      # dataframe for 2019
    x="Territory",          # macro regions
    y="% Activity",         # activity rate
    size="% Unemployment",  # bubble size scales with the unemployment rate
    color="Citizenship",    # one color per citizenship (foreign/italian)
)
fig2019.update_layout(
    xaxis_tickangle=30,                    # angle of the x-axis tick labels
    xaxis_tickfont=dict(size=9),           # x-axis tick font size
    yaxis_tickfont=dict(size=9),           # y-axis tick font size
    margin=dict(l=500, r=20, t=50, b=20),  # margins in pixels
    paper_bgcolor="LightSteelblue",        # background color of the chart
    title="Activity x Unemployment Rate by Macro Region (2019)"
)
```

```python
fig2020 = px.scatter(
    duemilaventi,           # dataframe for 2020
    x="Territory",          # macro regions
    y="% Activity",         # activity rate
    size="% Unemployment",  # bubble size scales with the unemployment rate
    color="Citizenship",    # one color per citizenship (foreign/italian)
)
fig2020.update_layout(
    xaxis_tickangle=30,                    # angle of the x-axis tick labels
    title="Activity x Unemployment Rate by Macro Region (2020)",
    xaxis_tickfont=dict(size=9),           # x-axis tick font size
    yaxis_tickfont=dict(size=9),           # y-axis tick font size
    margin=dict(l=500, r=20, t=50, b=20),  # margins in pixels
    paper_bgcolor="LightSteelblue",        # background color of the chart
)
```

```python
fig2021 = px.scatter(
    duemilaventuno,         # dataframe for 2021
    x="Territory",          # macro regions
    y="% Activity",         # activity rate
    size="% Unemployment",  # bubble size scales with the unemployment rate
    color="Citizenship",    # one color per citizenship (foreign/italian)
)
fig2021.update_layout(
    xaxis_tickangle=30,                    # angle of the x-axis tick labels
    title="Activity x Unemployment Rate by Macro Region (2021)",
    xaxis_tickfont=dict(size=9),           # x-axis tick font size
    yaxis_tickfont=dict(size=9),           # y-axis tick font size
    margin=dict(l=500, r=20, t=50, b=20),  # margins in pixels
    paper_bgcolor="LightSteelblue",        # background color of the chart
)
```

```python
fig2022 = px.scatter(
    duemilaventidue,        # dataframe for 2022
    x="Territory",          # macro regions
    y="% Activity",         # activity rate
    size="% Unemployment",  # bubble size scales with the unemployment rate
    color="Citizenship",    # one color per citizenship (foreign/italian)
)
fig2022.update_layout(
    xaxis_tickangle=30,                    # angle of the x-axis tick labels
    title="Activity x Unemployment Rate by Macro Region (2022)",
    xaxis_tickfont=dict(size=9),           # x-axis tick font size
    yaxis_tickfont=dict(size=9),           # y-axis tick font size
    margin=dict(l=500, r=20, t=50, b=20),  # margins in pixels
    paper_bgcolor="LightSteelblue",        # background color of the chart
)
```

```python
figTot = px.scatter(
    dunnoDf,                # whole dataframe (2018-2022)
    x="Territory",          # macro regions
    y="% Activity",         # activity rate
    size="% Unemployment",  # bubble size scales with the unemployment rate
    color="Citizenship",    # one color per citizenship (foreign/italian)
)
figTot.update_layout(
    xaxis_tickangle=30,                    # angle of the x-axis tick labels
    title="Activity x Unemployment Rate by Macro Region (2018-2022)",
    xaxis_tickfont=dict(size=9),           # x-axis tick font size
    yaxis_tickfont=dict(size=9),           # y-axis tick font size
    margin=dict(l=500, r=20, t=50, b=20),  # margins in pixels
    paper_bgcolor="LightSteelblue",        # background color of the chart
)
```