# Hours worked

Time horizon: 2019-2022

Average annual hours worked is defined as the total number of hours actually worked per year divided by the average number of people in employment per year. Actual hours worked include regular work hours of full-time, part-time and part-year workers, paid and unpaid overtime, hours worked in additional jobs, and exclude time not worked because of public holidays, annual paid leave, own illness, injury and temporary disability, maternity leave, parental leave, schooling or training, slack work for technical or economic reasons, strike or labour dispute, bad weather, compensation leave and other reasons. The data cover employees and self-employed workers. This indicator is measured in terms of hours per worker per year. The data are published with the following health warning: The data are intended for comparisons of trends over time; they are unsuitable for comparisons of the level of average annual hours of work for a given year, because of differences in their sources and method of calculation.
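As a minimal sketch of the definition above, the indicator is a plain ratio of total hours to average employment. The numbers below are invented for illustration and are not taken from the dataset; `average_annual_hours` is a hypothetical helper name.

```python
def average_annual_hours(total_hours_worked: float, average_employment: float) -> float:
    """Total hours actually worked per year divided by average employment."""
    if average_employment <= 0:
        raise ValueError("average_employment must be positive")
    return total_hours_worked / average_employment


# Illustrative numbers only: 60 billion hours worked by 40 million people
hours_per_worker = average_annual_hours(60_000_000_000, 40_000_000)
print(hours_per_worker)  # 1500.0
```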

Variables: LOCATION, SUBJECT, MEASURE, TIME, Value


### Setup

In [2]:
import pandas as pd
import altair as alt

In [3]:
# Disable the row limit (n=5000)
# Needed if the dataset has more than 5000 rows

alt.data_transformers.disable_max_rows()

DataTransformerRegistry.enable('default')

## Data

Data Import

#### Dataset Average annual hours actually worked, 2022 or latest

In [4]:
# If the file is stored locally, enter the path to the CSV; running 'pwd' in the terminal shows the current path
LINK = '/Users/Lea/Desktop/dst-projekt/hoursworked_20192022.csv'

df = pd.read_csv(LINK)

In [5]:
df
# with 50 rows, all of them are displayed
# use df.head() for a shorter preview

Unnamed: 0,LOCATION,INDICATOR,SUBJECT,MEASURE,FREQUENCY,TIME,Value,Flag Codes
0,AUT,HRWKD,TOT,HR_WKD,A,2019,1509.591660,
1,AUT,HRWKD,TOT,HR_WKD,A,2020,1400.786923,
2,AUT,HRWKD,TOT,HR_WKD,A,2021,1439.093920,
3,AUT,HRWKD,TOT,HR_WKD,A,2022,1443.720369,
4,BEL,HRWKD,TOT,HR_WKD,A,2019,1577.095522,
...,...,...,...,...,...,...,...,...
114,ROU,HRWKD,TOT,HR_WKD,A,2022,1808.232842,
115,EU27,HRWKD,TOT,HR_WKD,A,2019,1592.987477,
116,EU27,HRWKD,TOT,HR_WKD,A,2020,1505.611235,
117,EU27,HRWKD,TOT,HR_WKD,A,2021,1560.286783,


In [6]:
df.info()



<class 'pandas.core.frame.DataFrame'>
RangeIndex: 119 entries, 0 to 118
Data columns (total 8 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   LOCATION    119 non-null    object 
 1   INDICATOR   119 non-null    object 
 2   SUBJECT     119 non-null    object 
 3   MEASURE     119 non-null    object 
 4   FREQUENCY   119 non-null    object 
 5   TIME        119 non-null    int64  
 6   Value       119 non-null    float64
 7   Flag Codes  0 non-null      float64
dtypes: float64(2), int64(1), object(5)
memory usage: 7.6+ KB


### Eliminate Clutter

In [7]:
df.drop('FREQUENCY', axis=1, inplace=True)

In [8]:
df.drop('Flag Codes', axis=1, inplace=True)

In [9]:
df.drop('INDICATOR', axis=1, inplace=True)
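The three separate `drop` calls can also be written as one call. The sketch below uses a one-row stand-in frame, `df_demo` (a hypothetical name), with the same column names as the dataset and the values of the first row shown above.

```python
import pandas as pd

# One-row stand-in with the dataset's column names (first row of the table above)
df_demo = pd.DataFrame({
    'LOCATION': ['AUT'],
    'INDICATOR': ['HRWKD'],
    'SUBJECT': ['TOT'],
    'MEASURE': ['HR_WKD'],
    'FREQUENCY': ['A'],
    'TIME': [2019],
    'Value': [1509.591660],
    'Flag Codes': [float('nan')],
})

# Drop all three unneeded columns at once; drop() returns a new DataFrame
df_demo = df_demo.drop(columns=['FREQUENCY', 'Flag Codes', 'INDICATOR'])
print(df_demo.columns.tolist())
# ['LOCATION', 'SUBJECT', 'MEASURE', 'TIME', 'Value']
```

Assigning the result back instead of using `inplace=True` keeps the cell safe to re-run on a fresh copy of the data.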

In [10]:
#df.drop('SUBJECT', axis=1, inplace=True)

In [11]:
# Convert only the text columns to the category dtype
list_cat = ['LOCATION', 'SUBJECT', 'MEASURE']
df[list_cat] = df[list_cat].astype('category')

In [12]:
# Time as Year
df['TIME'] = pd.to_datetime(df['TIME'], format='%Y').dt.year


In [13]:
df.rename(columns={'Value': 'VALUE'}, inplace=True)


In [14]:
# Keep only integer VALUE entries (astype(int) truncates the decimals)
df['VALUE'] = df['VALUE'].astype(int)
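Note that `astype(int)` cuts off the decimals rather than rounding. A small sketch of the difference, using the first three Austrian values from the table above (`values`, `truncated`, and `rounded` are illustrative names):

```python
import pandas as pd

# Austria 2019-2021, copied from the raw table above
values = pd.Series([1509.591660, 1400.786923, 1439.093920])

truncated = values.astype(int)        # drops the decimals
rounded = values.round().astype(int)  # rounds to the nearest integer first

print(truncated.tolist())  # [1509, 1400, 1439]
print(rounded.tolist())    # [1510, 1401, 1439]
```

Whether truncation or rounding is preferable depends on the analysis; for trend charts the difference of at most one hour is unlikely to matter.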

In [15]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 119 entries, 0 to 118
Data columns (total 5 columns):
 #   Column    Non-Null Count  Dtype   
---  ------    --------------  -----   
 0   LOCATION  119 non-null    category
 1   SUBJECT   119 non-null    category
 2   MEASURE   119 non-null    category
 3   TIME      119 non-null    int32   
 4   VALUE     119 non-null    int64   
dtypes: category(3), int32(1), int64(1)
memory usage: 3.4 KB


In [16]:
#df['TIME'] = pd.to_datetime(df['TIME'], format='%Y')


Focus on selected countries and eliminate double listing

In [17]:
df_selectedlocations = df[df['LOCATION'].isin(['OECD', 'EU27', 'DEU', 'GBR'])]

df_selectedlocations

Unnamed: 0,LOCATION,SUBJECT,MEASURE,TIME,VALUE
23,DEU,TOT,HR_WKD,2019,1372
24,DEU,TOT,HR_WKD,2020,1319
25,DEU,TOT,HR_WKD,2021,1340
26,DEU,TOT,HR_WKD,2022,1340
71,GBR,TOT,HR_WKD,2019,1537
72,GBR,TOT,HR_WKD,2020,1364
73,GBR,TOT,HR_WKD,2021,1498
74,GBR,TOT,HR_WKD,2022,1531
83,OECD,TOT,HR_WKD,2019,1766
84,OECD,TOT,HR_WKD,2020,1687


In [18]:
df_selectedlocations.count()

LOCATION    16
SUBJECT     16
MEASURE     16
TIME        16
VALUE       16
dtype: int64
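One possible way to check for double listings is to count rows per location and to look for repeated LOCATION/TIME pairs. The sketch below uses a small stand-in frame, `sample` (a hypothetical name), built from values shown above; the last row is duplicated on purpose so that the check has something to find.

```python
import pandas as pd

sample = pd.DataFrame({
    'LOCATION': ['DEU', 'DEU', 'DEU', 'DEU', 'GBR', 'GBR', 'GBR', 'GBR', 'GBR'],
    'TIME':     [2019, 2020, 2021, 2022, 2019, 2020, 2021, 2022, 2022],
    'VALUE':    [1372, 1319, 1340, 1340, 1537, 1364, 1498, 1531, 1531],
})

# Rows per location: more rows than years hints at a double listing
rows_per_location = sample.groupby('LOCATION').size()
print(rows_per_location.to_dict())  # {'DEU': 4, 'GBR': 5}

# Flag repeated (LOCATION, TIME) pairs, then keep the first occurrence of each
is_duplicate = sample.duplicated(subset=['LOCATION', 'TIME'])
deduplicated = sample.drop_duplicates(subset=['LOCATION', 'TIME'])
print(int(is_duplicate.sum()), len(deduplicated))  # 1 8
```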

In [19]:
# Rename the location codes; assign() returns a new DataFrame, which avoids
# pandas' SettingWithCopyWarning on the filtered frame
df_selectedlocations = df_selectedlocations.assign(
    LOCATION=df_selectedlocations['LOCATION'].cat.rename_categories(
        {'EU27': 'European Union', 'DEU': 'Germany', 'GBR': 'United Kingdom'}
    )
)


### Data Exploration

In [20]:
df_selectedlocations.describe().astype(int)

Unnamed: 0,TIME,VALUE
count,16,16
mean,2020,1529
std,1,153
min,2019,1319
25%,2019,1370
50%,2020,1534
75%,2021,1615
max,2022,1766
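The summary above pools levels across locations, which the source's health warning advises against comparing. A trend-oriented alternative is to index each location to its own 2019 value. The following is a sketch under that assumption, using the Germany and United Kingdom values shown earlier; `trend`, `base`, and `INDEX_2019` are illustrative names, not part of the dataset.

```python
import pandas as pd

trend = pd.DataFrame({
    'LOCATION': ['Germany'] * 4 + ['United Kingdom'] * 4,
    'TIME': [2019, 2020, 2021, 2022] * 2,
    'VALUE': [1372, 1319, 1340, 1340, 1537, 1364, 1498, 1531],
})

# Base-year (2019) value per location
base = trend[trend['TIME'] == 2019].set_index('LOCATION')['VALUE']

# Each value relative to its own location's 2019 level (2019 = 100)
trend['INDEX_2019'] = (trend['VALUE'] / trend['LOCATION'].map(base) * 100).round(1)
print(trend)
```

With this index, each series starts at 100, so the chart or table shows relative change over time instead of absolute levels.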


In [21]:
#Color Scale

colors = alt.Scale(
    range=['#003f5c','#58508d','#bc5090','#ff6361','#ffa600']
)
colors


Scale({
  range: ['#003f5c', '#58508d', '#bc5090', '#ff6361', '#ffa600']
})

In [22]:
df_selectedlocations

Unnamed: 0,LOCATION,SUBJECT,MEASURE,TIME,VALUE
23,Germany,TOT,HR_WKD,2019,1372
24,Germany,TOT,HR_WKD,2020,1319
25,Germany,TOT,HR_WKD,2021,1340
26,Germany,TOT,HR_WKD,2022,1340
71,United Kingdom,TOT,HR_WKD,2019,1537
72,United Kingdom,TOT,HR_WKD,2020,1364
73,United Kingdom,TOT,HR_WKD,2021,1498
74,United Kingdom,TOT,HR_WKD,2022,1531
83,OECD,TOT,HR_WKD,2019,1766
84,OECD,TOT,HR_WKD,2020,1687


In [23]:
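# Note: df_selectedlocations is a plain table without geometry, so mark_geoshape
# is not expected to draw any shapes here; the next cell loads country outlines.
location_list = df_selectedlocations['LOCATION'].tolist()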

alt.Chart(df_selectedlocations).mark_geoshape(
    fill='lightgray',
    stroke='white'
).project(
    "equirectangular"
).properties(
    width=500,
    height=300
)

In [24]:
from vega_datasets import data

source = alt.topo_feature(data.world_110m.url, 'countries')

#input_dropdown = alt.binding_select
param_projection = alt.param(value="equalEarth")

alt.Chart(source).mark_geoshape(
    fill='lightgray',
    stroke='gray'
).project(
    type=alt.expr(param_projection.name)
).add_params(param_projection).properties(
    width=800,
    height=400
)

In [25]:
linechart1 = alt.Chart(df_selectedlocations).mark_line().encode(
    x=alt.X('TIME:O', title='Year').axis(
        titleAnchor='start',
        labelAngle=0,
        ),
    y=alt.Y('VALUE').scale(domain=(1000,1800)).axis(
        title='Number of hours',
        titleAnchor='end',
        grid=False,
        ),
    strokeWidth=alt.value(4), 
    color=alt.Color('LOCATION', scale=colors),
    tooltip=['LOCATION']
).properties(
    title='Hours Worked'
)


In [26]:
# Unique location names, used as the color domain for the labels
location_list = df_selectedlocations['LOCATION'].unique().tolist()

linechart_labels = alt.Chart(df_selectedlocations).mark_text(align='left', dx=3).encode(
    alt.X('TIME:O', aggregate='max'),
    # Place each label at the VALUE of the latest year, i.e. at the end of its line
    alt.Y('VALUE:Q', aggregate={'argmax': 'TIME'}, scale=alt.Scale(domain=(1000,1800))),
    alt.Text('LOCATION'),
    alt.Color('LOCATION:N', legend=None, scale=alt.Scale(domain=location_list, type='ordinal')),
).properties(
    width=800,
    height=500,    
)

In [27]:
linechart1_final = alt.layer(linechart1, linechart_labels).configure_view(
    strokeWidth=0
).configure_title(
    fontSize=20,
    anchor='start',
    fontWeight='bold',
).configure_axis(
    labelFontSize = 11,
    titleFontSize = 12,
    titleFontWeight= 'normal',
    titleColor='grey'
).configure_text(
    fontWeight='bold',
    fontSize = 12
)

linechart1_final

In [28]:
alt.Chart(df_selectedlocations).mark_bar().encode(
    x=alt.X('LOCATION').axis(
        title='Location',
        titleAnchor='start',
        labelAngle=0,
        grid=False,
        labelColor='black',
        tickColor='grey'),
    y=alt.Y('VALUE').axis(
        title='Value',
        titleAnchor='end',
        grid=False,
        labelColor='black',
        tickColor='black'),
    color=alt.Color('TIME:O', scale=colors),
    tooltip=['TIME']
).properties(
    width=600,
    height=400
)#.interactive()


Histogram

In [31]:
histogram = alt.Chart(df_selectedlocations).mark_bar().encode(
    x=alt.X('VALUE', bin=True),
    y='count()'
).properties(
    width=800,
    height=500,
)
histogram

In [29]:
df_selectedlocations.to_csv('df_selectedlocations.csv', index=False)


### Finding Connections

In [None]:
#df_selectedlocations.corr()
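One way to look for connections between locations is to reshape the long table so that each location becomes a column and then correlate the columns. This is only a sketch: it uses the Germany and United Kingdom values shown earlier, `long_df` and `wide` are hypothetical names, and four data points per series are far too few for firm conclusions.

```python
import pandas as pd

long_df = pd.DataFrame({
    'LOCATION': ['Germany'] * 4 + ['United Kingdom'] * 4,
    'TIME': [2019, 2020, 2021, 2022] * 2,
    'VALUE': [1372, 1319, 1340, 1340, 1537, 1364, 1498, 1531],
})

# One row per year, one column per location
wide = long_df.pivot(index='TIME', columns='LOCATION', values='VALUE')

# Pearson correlation between the yearly series of the locations
corr = wide.corr()
print(corr.round(2))
```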