# Dataset: Absenteeism at work

Source: UCI Machine Learning Repository 

URL: https://archive.ics.uci.edu/ml/datasets/Absenteeism+at+work

### Dataset description 

The data set allows for new combinations of attributes, attribute exclusions, or changes to an attribute's type (categorical, integer, or real), depending on the purpose of the research. The data set (Absenteeism at work - Part I) was used in academic research at the Universidade Nove de Julho, Postgraduate Program in Informatics and Knowledge Management.


### Categorical data information 

The reason for absence includes the following categories that have no ICD code (written as CID in the original description): patient follow-up (22), medical consultation (23), blood donation (24), laboratory examination (25), unjustified absence (26), physiotherapy (27), and dental consultation (28).
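The codes listed above can be kept in a small lookup table. This is a minimal sketch in plain Python; the names `NON_ICD_REASONS` and `describe_reason` are hypothetical, and any code outside 22 to 28 is simply reported as `None` because its meaning is not given here.

```python
# Reason codes without an ICD category, exactly as listed in the description.
NON_ICD_REASONS = {
    22: "patient follow-up",
    23: "medical consultation",
    24: "blood donation",
    25: "laboratory examination",
    26: "unjustified absence",
    27: "physiotherapy",
    28: "dental consultation",
}


def describe_reason(code):
    """Return the label for a non-ICD reason code, or None for any other code."""
    return NON_ICD_REASONS.get(code)


print(describe_reason(26))  # unjustified absence
print(describe_reason(7))   # None
```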

The data set has the following attributes:

1. Individual identification (ID)
2. Reason for absence (ICD)
3. Month of absence
4. Day of the week (Monday (2), Tuesday (3), Wednesday (4), Thursday (5), Friday (6))
5. Seasons (summer (1), autumn (2), winter (3), spring (4))
6. Transportation expense
7. Distance from Residence to Work (kilometers)
8. Service time
9. Age
10. Work load Average/day
11. Hit target
12. Disciplinary failure (yes=1; no=0)
13. Education (high school (1), graduate (2), postgraduate (3), master and doctor (4))
14. Son (number of children)
15. Social drinker (yes=1; no=0)
16. Social smoker (yes=1; no=0)
17. Pet (number of pets)
18. Weight
19. Height
20. Body mass index
21. Absenteeism time in hours (target)
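Several attributes in the list above are stored as integer codes. As a rough sketch, the codes can be turned into readable labels with pandas `Series.map`. The dictionary names and the toy rows are assumptions made for illustration; the column names follow the underscore convention used for the table in this notebook.

```python
import pandas as pd

# Code-to-label mappings copied from the attribute list.
DAY_LABELS = {2: "Monday", 3: "Tuesday", 4: "Wednesday", 5: "Thursday", 6: "Friday"}
SEASON_LABELS = {1: "summer", 2: "autumn", 3: "winter", 4: "spring"}
EDUCATION_LABELS = {1: "high school", 2: "graduate", 3: "postgraduate", 4: "master and doctor"}

# Toy rows for illustration only; they are not taken from the dataset.
toy = pd.DataFrame(
    {"Day_of_the_week": [3, 4, 6], "Seasons": [1, 1, 3], "Education": [1, 3, 2]}
)

# Replace each integer code with its label; unknown codes would become NaN.
decoded = toy.assign(
    Day_of_the_week=toy["Day_of_the_week"].map(DAY_LABELS),
    Seasons=toy["Seasons"].map(SEASON_LABELS),
    Education=toy["Education"].map(EDUCATION_LABELS),
)
print(decoded)
```

Keeping the mappings in dictionaries leaves the stored integer codes untouched, so SQL queries against the coded columns still work.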


#### Load the data

In [1]:
import requests
import zipfile
import io
import pandas as pd
from sqlalchemy.engine import create_engine

url = "https://archive.ics.uci.edu/ml/machine-learning-databases/00445/Absenteeism_at_work_AAA.zip"

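# download the ZIP file and fail early on an HTTP error
response = requests.get(url)
response.raise_for_status()

# open the ZIP archive in memory and read the semicolon-separated CSV,
# using the first column (ID) as the index
zf = zipfile.ZipFile(io.BytesIO(response.content))
df = pd.read_csv(zf.open("Absenteeism_at_work.csv"), sep=";", index_col=0)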

# Replace spaces with underscores and "/" with "_per_" in the column names
df.columns = [c.replace(" ", "_").replace("/","_per_") for c in df.columns]
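
The renaming step can be tried on plain strings without downloading anything. In this sketch, `normalize` is a hypothetical helper that applies the same two replacements; the trailing space in the raw header is an assumption inferred from the trailing underscore in the column name `Work_load_Average_per_day_` that appears in the query output.

```python
def normalize(name):
    """Apply the same two replacements used for the DataFrame columns."""
    return name.replace(" ", "_").replace("/", "_per_")


# Assumed raw header: the trailing space is inferred from the resulting
# column name, not read from the file here.
raw = "Work load Average/day "

print(normalize(raw))                   # Work_load_Average_per_day_
print(normalize("Reason for absence"))  # Reason_for_absence
```

Underscored names matter for the SQL cells: a column name containing spaces or a slash would have to be quoted in every query.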

#### Store the data into a SQLite instance

In [2]:
engine = create_engine("sqlite://")

df.to_sql("absenteeism", engine)

740
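
Because the CSV was read with `index_col=0`, `ID` is the DataFrame index, and `to_sql` writes the index as a regular column by default. The sketch below shows that behaviour on a toy frame. It uses a plain `sqlite3` in-memory connection rather than the SQLAlchemy engine from the cell above, and the table name `toy` is hypothetical.

```python
import sqlite3

import pandas as pd

# Toy frame for illustration only; "ID" is the index, as with index_col=0.
toy = pd.DataFrame({"ID": [11, 36], "Age": [33, 50]}).set_index("ID")

# A plain in-memory SQLite connection (an assumption made for this sketch).
con = sqlite3.connect(":memory:")

# index=True is the default, so the "ID" index is stored as a column.
toy.to_sql("toy", con)

stored = pd.read_sql_query("SELECT * FROM toy", con)
print(list(stored.columns))  # ['ID', 'Age']
con.close()
```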

#### Load Engine

In [3]:
%load_ext sql
%sql engine

#### Use JupySQL to perform the queries and answer the questions.

In [4]:
%%sql 
SELECT *
FROM absenteeism 
LIMIT 5

*  sqlite://
Done.


ID,Reason_for_absence,Month_of_absence,Day_of_the_week,Seasons,Transportation_expense,Distance_from_Residence_to_Work,Service_time,Age,Work_load_Average_per_day_,Hit_target,Disciplinary_failure,Education,Son,Social_drinker,Social_smoker,Pet,Weight,Height,Body_mass_index,Absenteeism_time_in_hours
11,26,7,3,1,289,36,13,33,239.554,97,0,1,2,1,0,1,90,172,30,4
36,0,7,3,1,118,13,18,50,239.554,97,1,1,1,1,0,0,98,178,31,0
3,23,7,4,1,179,51,18,38,239.554,97,0,1,0,1,0,0,89,170,31,2
7,7,7,5,1,279,5,14,39,239.554,97,0,1,2,1,1,0,68,168,24,4
11,23,7,5,1,289,36,13,33,239.554,97,0,1,2,1,0,1,90,172,30,2


#### Question 1 (Easy):
How many records are there in the 'absenteeism' table? 

In [8]:
%%sql
-- your answer here


#### Question 2 (Medium):
On which days of the week does the average absenteeism time exceed 4 hours? 


In [None]:
%%sql
-- your answer here

#### Question 3 (Hard):
Which months have a higher total absenteeism time than the previous month?  


In [None]:
%%sql
-- your answer here

### References   

Martiniano, A., Ferreira, R. P., Sassi, R. J., & Affonso, C. (2012). Application of a neuro fuzzy network in prediction of absenteeism at work. In 7th Iberian Conference on Information Systems and Technologies (CISTI) (pp. 1-4). IEEE.

### Acknowledgements

Professor Gary Johns, for contributing to the selection of relevant research attributes.

Professor Emeritus of Management, Honorary Concordia University Research Chair in Management, John Molson School of Business, Concordia University, Montreal, Quebec, Canada

Adjunct Professor, OB/HR Division, Sauder School of Business, University of British Columbia, Vancouver, British Columbia, Canada