# Problem: Predict Absenteeism From Work

Today's highly competitive business environment exerts increased pressure on employees, potentially resulting in unattainable business goals and a heightened risk of job insecurity, which, in turn, can elevate stress levels. Prolonged exposure to these stressors can negatively impact an individual's health, potentially leading to minor illnesses or even long-term conditions like depression. However, our focus is on addressing this issue from the standpoint of enhancing company productivity.

Specifically, we aim to predict absenteeism from the workplace, providing insights into whether an employee is likely to be absent for a certain number of hours during a workday. Anticipating such absences can enhance decision-making and enable us to reorganize workflow efficiently, thus preventing productivity gaps and improving overall work quality.

Defining Absenteeism: Absenteeism refers to the absence from work during regular working hours, resulting in temporary incapacity to perform standard work activities.

Throughout our analysis, we'll address key questions, such as the data on which we base our absenteeism predictions, how we measure absenteeism, and whether we should focus on predicting excessive absenteeism. Our ultimate objective is to determine whether individuals with specific characteristics are likely to be absent from work and, if so, for how many hours. This information will help us make informed decisions related to workforce management, considering factors like employees' proximity to the workplace, family size, educational background, and more.

The dataset we'll be working with is based on the dataset from an existing study on predicting absenteeism at work. We will use both primary and secondary data sources, since we create some of the data in this lab. We will use Python, SQL, and Tableau to clean, analyze, and visualize our findings.

# Preprocessing The Data

In [16]:
import pandas as pd
pd.options.display.max_rows = 10 

In [17]:
raw_data_csv= pd.read_csv('/Users/zakariefarah/Downloads/Absenteeism_data (1).csv')

In [18]:
raw_data_csv

Unnamed: 0,ID,Reason for Absence,Date,Transportation Expense,Distance to Work,Age,Daily Work Load Average,Body Mass Index,Education,Children,Pets,Absenteeism Time in Hours
0,11,26,07/07/2015,289,36,33,239.554,30,1,2,1,4
1,36,0,14/07/2015,118,13,50,239.554,31,1,1,0,0
2,3,23,15/07/2015,179,51,38,239.554,31,1,0,0,2
3,7,7,16/07/2015,279,5,39,239.554,24,1,2,0,4
4,11,23,23/07/2015,289,36,33,239.554,30,1,2,1,2
...,...,...,...,...,...,...,...,...,...,...,...,...
695,17,10,23/05/2018,179,22,40,237.656,22,2,2,0,8
696,28,6,23/05/2018,225,26,28,237.656,24,1,1,2,3
697,18,10,24/05/2018,330,16,28,237.656,25,2,0,0,8
698,25,23,24/05/2018,235,16,32,237.656,25,3,0,0,2


In [19]:
df = raw_data_csv.copy()  # Work on a copy so the raw data stays untouched

In [20]:
df

Unnamed: 0,ID,Reason for Absence,Date,Transportation Expense,Distance to Work,Age,Daily Work Load Average,Body Mass Index,Education,Children,Pets,Absenteeism Time in Hours
0,11,26,07/07/2015,289,36,33,239.554,30,1,2,1,4
1,36,0,14/07/2015,118,13,50,239.554,31,1,1,0,0
2,3,23,15/07/2015,179,51,38,239.554,31,1,0,0,2
3,7,7,16/07/2015,279,5,39,239.554,24,1,2,0,4
4,11,23,23/07/2015,289,36,33,239.554,30,1,2,1,2
...,...,...,...,...,...,...,...,...,...,...,...,...
695,17,10,23/05/2018,179,22,40,237.656,22,2,2,0,8
696,28,6,23/05/2018,225,26,28,237.656,24,1,1,2,3
697,18,10,24/05/2018,330,16,28,237.656,25,2,0,0,8
698,25,23,24/05/2018,235,16,32,237.656,25,3,0,0,2


In [21]:
pd.options.display.max_columns=None
pd.options.display.max_rows=None

In [72]:
df.head(15)

Unnamed: 0,Reason_1,Reason_2,Reason_3,Reason_4,Date,Transportation Expense,Distance to Work,Age,Daily Work Load Average,Body Mass Index,Education,Children,Pets,Absenteeism Time in Hours
0,0,0,0,1,07/07/2015,289,36,33,239.554,30,1,2,1,4
1,0,0,0,0,14/07/2015,118,13,50,239.554,31,1,1,0,0
2,0,0,0,1,15/07/2015,179,51,38,239.554,31,1,0,0,2
3,1,0,0,0,16/07/2015,279,5,39,239.554,24,1,2,0,4
4,0,0,0,1,23/07/2015,289,36,33,239.554,30,1,2,1,2
5,0,0,0,1,10/07/2015,179,51,38,239.554,31,1,0,0,2
6,0,0,0,1,17/07/2015,361,52,28,239.554,27,1,1,4,8
7,0,0,0,1,24/07/2015,260,50,36,239.554,23,1,4,0,4
8,0,0,1,0,06/07/2015,155,12,34,239.554,25,1,2,0,40
9,0,0,0,1,13/07/2015,235,11,37,239.554,29,3,1,1,8


In [23]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 700 entries, 0 to 699
Data columns (total 12 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   ID                         700 non-null    int64  
 1   Reason for Absence         700 non-null    int64  
 2   Date                       700 non-null    object 
 3   Transportation Expense     700 non-null    int64  
 4   Distance to Work           700 non-null    int64  
 5   Age                        700 non-null    int64  
 6   Daily Work Load Average    700 non-null    float64
 7   Body Mass Index            700 non-null    int64  
 8   Education                  700 non-null    int64  
 9   Children                   700 non-null    int64  
 10  Pets                       700 non-null    int64  
 11  Absenteeism Time in Hours  700 non-null    int64  
dtypes: float64(1), int64(10), object(1)
memory usage: 65.8+ KB


# Exploratory Data Analysis

An overview of the dataset shows no missing values and no columns with mixed data types.

We are trying to predict absenteeism using regression analysis. "Absenteeism Time in Hours" will be our dependent variable, since it is the target we are trying to predict, while all other columns will be independent variables.
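
The split between dependent and independent variables can be sketched as follows. This is a minimal illustration on a toy frame, not the notebook's own code: the three rows reuse a few values from the first rows shown above, and the names `toy`, `target_name`, `targets`, and `inputs` are assumptions made for the example.

```python
import pandas as pd

# Toy frame: column names come from the dataset, values from its first rows.
toy = pd.DataFrame({
    "Age": [33, 50, 38],
    "Pets": [1, 0, 0],
    "Absenteeism Time in Hours": [4, 0, 2],
})

target_name = "Absenteeism Time in Hours"
targets = toy[target_name]                 # dependent variable
inputs = toy.drop(columns=[target_name])   # every other column is an input

print(list(inputs.columns))
```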

In [24]:
df= df.drop(["ID"], axis= 1)

In [25]:
df.head()

Unnamed: 0,Reason for Absence,Date,Transportation Expense,Distance to Work,Age,Daily Work Load Average,Body Mass Index,Education,Children,Pets,Absenteeism Time in Hours
0,26,07/07/2015,289,36,33,239.554,30,1,2,1,4
1,0,14/07/2015,118,13,50,239.554,31,1,1,0,0
2,23,15/07/2015,179,51,38,239.554,31,1,0,0,2
3,7,16/07/2015,279,5,39,239.554,24,1,2,0,4
4,23,23/07/2015,289,36,33,239.554,30,1,2,1,2


In [26]:
df['Reason for Absence'].min()

0

In [27]:
df['Reason for Absence'].max()

28

In [28]:
df['Reason for Absence'].unique()

array([26,  0, 23,  7, 22, 19,  1, 11, 14, 21, 10, 13, 28, 18, 25, 24,  6,
       27, 17,  8, 12,  5,  9, 15,  4,  3,  2, 16])

In [29]:
len(df['Reason for Absence'].unique())

28

In [30]:
sorted(df['Reason for Absence'].unique())  # Reason number 20 is missing

[0,
 1,
 2,
 3,
 4,
 5,
 6,
 7,
 8,
 9,
 10,
 11,
 12,
 13,
 14,
 15,
 16,
 17,
 18,
 19,
 21,
 22,
 23,
 24,
 25,
 26,
 27,
 28]

The numeric codes in "Reason for Absence" are nominal labels, not quantities, so before any quantitative analysis we need to encode them in a form a regression can use. One way to do this is to turn them into dummy variables. In regression analysis, a dummy variable is a binary explanatory variable that equals 1 if a certain categorical effect is present and 0 if it is absent.

We will also check that every row (each of the 700 rows is one absence entry) has exactly one reason recorded. We do this by summing each row of the dummy table into a new column called "Check":

- 0: the reason is missing for that entry
- 1: a single reason (hurray!)
- Greater than 1: multiple reasons
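
The check described above can be sketched on a small example. The Series below is a made-up stand-in for `df["Reason for Absence"]`, and `dtype=int` is passed only so the output is 0/1 regardless of the pandas version.

```python
import pandas as pd

# Toy stand-in for df["Reason for Absence"]; the values are illustrative.
reasons = pd.Series([26, 0, 23, 7, 23], name="Reason for Absence")

# One indicator column per distinct reason code.
dummies = pd.get_dummies(reasons, dtype=int)

# Row sums: 0 means a missing reason, 1 a single reason, more than 1 multiple reasons.
check = dummies.sum(axis=1)
print(check.tolist())
```

Every row sums to 1 here because `get_dummies` is applied to a single column, where each row holds exactly one code.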

In [31]:
reason_columns=pd.get_dummies(df["Reason for Absence"])

In [32]:
reason_columns.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,21,22,23,24,25,26,27,28
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0
1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0
3,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0


In [33]:
reason_columns['Check'] = reason_columns.sum(axis=1)
reason_columns.head()  # New column to check that there is exactly 1 reason per absence

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,21,22,23,24,25,26,27,28,Check
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1
1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1
3,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1


In [34]:
reason_columns['Check'].sum(axis=0)  # A total of 700 is consistent with one reason per row

700

In [35]:
reason_columns['Check'].unique()  # Only the value 1 appears: no missing or multiple reasons

array([1])

In [36]:
reason_columns = reason_columns.drop(['Check'], axis=1)
reason_columns.head()  # The Check column has done its job, so we drop it

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,21,22,23,24,25,26,27,28
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0
1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0
3,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0


In [37]:
# drop_first=True drops the first dummy column (reason 0) to avoid multicollinearity
reason_columns = pd.get_dummies(df["Reason for Absence"], drop_first=True)
reason_columns.head()

Unnamed: 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,21,22,23,24,25,26,27,28
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0
3,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0


# Classification: Group the Reasons for Absence

In [38]:
df.columns.values

array(['Reason for Absence', 'Date', 'Transportation Expense',
       'Distance to Work', 'Age', 'Daily Work Load Average',
       'Body Mass Index', 'Education', 'Children', 'Pets',
       'Absenteeism Time in Hours'], dtype=object)

In [39]:
reason_columns.columns.values

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
       18, 19, 21, 22, 23, 24, 25, 26, 27, 28])

If we add these dummy variables to our current DataFrame, we duplicate information: the categorical variable "Reason for Absence" conveys the same information as the 27 dummy columns derived from it. In econometrics, statistics, and data analytics, this kind of redundancy among explanatory variables is known as multicollinearity and should generally be avoided, so we drop the original column.
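
To see why the original column is redundant, here is a minimal sketch that rebuilds the reason codes from the dummies alone. The toy Series and the name `recovered` are assumptions for illustration; the idea is that multiplying each 0/1 column by its own label and summing across the row gives back the code, with an all-zero row standing for the dropped reason 0.

```python
import pandas as pd

reasons = pd.Series([26, 0, 23, 7, 23])  # illustrative reason codes
dummies = pd.get_dummies(reasons, drop_first=True, dtype=int)

# Weight each dummy column by its label, then sum across the row.
# A row of zeros (the dropped baseline) sums to 0.
recovered = (dummies * dummies.columns.to_numpy()).sum(axis=1)
print(recovered.tolist())
```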

In [40]:
# errors='ignore' makes this cell safe to re-run: without it, a second run
# raises a KeyError because the column has already been dropped.
df = df.drop(['Reason for Absence'], axis=1, errors='ignore')
df.head()

Unnamed: 0,Date,Transportation Expense,Distance to Work,Age,Daily Work Load Average,Body Mass Index,Education,Children,Pets,Absenteeism Time in Hours
0,07/07/2015,289,36,33,239.554,30,1,2,1,4
1,14/07/2015,118,13,50,239.554,31,1,1,0,0
2,15/07/2015,179,51,38,239.554,31,1,0,0,2
3,16/07/2015,279,5,39,239.554,24,1,2,0,4
4,23/07/2015,289,36,33,239.554,30,1,2,1,2


In [41]:
# .loc is the label-based indexer in pandas: it selects rows and columns by label or by a boolean array.
reason_columns.loc[:, 1:14].head()

Unnamed: 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,1,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [42]:
# Collapse the dummy columns into 4 groups, each stored as its own Series
reason_type_1 = reason_columns.loc[:, 1:14].max(axis=1)
reason_type_2 = reason_columns.loc[:, 15:17].max(axis=1)
reason_type_3 = reason_columns.loc[:, 18:21].max(axis=1)
reason_type_4 = reason_columns.loc[:, 22:].max(axis=1)

The .max(axis=1) operation takes the maximum across the columns of each row (axis=1), collapsing each label range (1 to 14, 15 to 17, 18 to 21, and 22 to 28) into a single column.

The result for each reason_type is a new Series, indexed by the original 700 rows, that holds the maximum value within the specified range for each row. Because each row contains at most one 1, that maximum is 1 if the absence falls in the group and 0 otherwise. Each group therefore shrinks from its original width (14, 3, 3, and 7 columns, respectively; reason 20 never occurs, so the 18 to 21 range holds only three columns) to a single column.

These new Series can be merged into the main DataFrame (df) more easily, because each one summarizes the information in its column range.

At this point we have our original DataFrame df; reason_columns, which holds the 27 dummy variables with binary values (0s and 1s); and 4 individual Series for the 4 reason groups, which we will soon merge into df.
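
The grouping step can be sketched on a toy table. The frame `toy` below is made up to mimic `reason_columns`: it has integer column labels and, like the real data, no column labelled 20.

```python
import pandas as pd

# Toy dummy table; each row has at most one 1.
toy = pd.DataFrame(
    [[1, 0, 0, 0, 0],
     [0, 0, 1, 0, 0],
     [0, 0, 0, 0, 1],
     [0, 0, 0, 0, 0]],
    columns=[7, 18, 19, 21, 23],
)

# .loc slices by label and includes both endpoints, so 18:21 selects
# the columns labelled 18, 19 and 21.
group_3 = toy.loc[:, 18:21].max(axis=1)
print(group_3.tolist())
```

Only the second row has its 1 inside the 18 to 21 range, so it is the only row flagged for this group.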

# Concatenate Column Values

In [43]:
# The four Series are unnamed, so concat labels their columns 0, 1, 2, 3
df = pd.concat([df, reason_type_1, reason_type_2, reason_type_3, reason_type_4], axis=1)
df.head()

Unnamed: 0,Date,Transportation Expense,Distance to Work,Age,Daily Work Load Average,Body Mass Index,Education,Children,Pets,Absenteeism Time in Hours,0,1,2,3
0,07/07/2015,289,36,33,239.554,30,1,2,1,4,0,0,0,1
1,14/07/2015,118,13,50,239.554,31,1,1,0,0,0,0,0,0
2,15/07/2015,179,51,38,239.554,31,1,0,0,2,0,0,0,1
3,16/07/2015,279,5,39,239.554,24,1,2,0,4,1,0,0,0
4,23/07/2015,289,36,33,239.554,30,1,2,1,2,0,0,0,1


In [44]:
df.columns.values

array(['Date', 'Transportation Expense', 'Distance to Work', 'Age',
       'Daily Work Load Average', 'Body Mass Index', 'Education',
       'Children', 'Pets', 'Absenteeism Time in Hours', 0, 1, 2, 3],
      dtype=object)

In [45]:
columns_names= ['Date', 'Transportation Expense', 'Distance to Work', 'Age',
       'Daily Work Load Average', 'Body Mass Index', 'Education',
       'Children', 'Pets', 'Absenteeism Time in Hours', 'Reason_1', 'Reason_2', 'Reason_3', 'Reason_4']

In [46]:
df.columns=columns_names

In [47]:
df.head()

Unnamed: 0,Date,Transportation Expense,Distance to Work,Age,Daily Work Load Average,Body Mass Index,Education,Children,Pets,Absenteeism Time in Hours,Reason_1,Reason_2,Reason_3,Reason_4
0,07/07/2015,289,36,33,239.554,30,1,2,1,4,0,0,0,1
1,14/07/2015,118,13,50,239.554,31,1,1,0,0,0,0,0,0
2,15/07/2015,179,51,38,239.554,31,1,0,0,2,0,0,0,1
3,16/07/2015,279,5,39,239.554,24,1,2,0,4,1,0,0,0
4,23/07/2015,289,36,33,239.554,30,1,2,1,2,0,0,0,1


# Reorder Columns

In [48]:
# Reorder the columns to put the reasons on the left
column_names_reordered = ['Reason_1', 'Reason_2', 'Reason_3', 'Reason_4', 'Date', 'Transportation Expense', 'Distance to Work', 'Age',
       'Daily Work Load Average', 'Body Mass Index', 'Education',
       'Children', 'Pets', 'Absenteeism Time in Hours']

In [49]:
df = df[column_names_reordered]  # Select the columns in the new order and assign the result back to df
df.head()

Unnamed: 0,Reason_1,Reason_2,Reason_3,Reason_4,Date,Transportation Expense,Distance to Work,Age,Daily Work Load Average,Body Mass Index,Education,Children,Pets,Absenteeism Time in Hours
0,0,0,0,1,07/07/2015,289,36,33,239.554,30,1,2,1,4
1,0,0,0,0,14/07/2015,118,13,50,239.554,31,1,1,0,0
2,0,0,0,1,15/07/2015,179,51,38,239.554,31,1,0,0,2
3,1,0,0,0,16/07/2015,279,5,39,239.554,24,1,2,0,4
4,0,0,0,1,23/07/2015,289,36,33,239.554,30,1,2,1,2


# Create a Checkpoint

In [50]:
# Checkpoint: continue the project on a copy of the DataFrame, so that a
# major mistake only sends us back to this point instead of to the start.
df_reason_mod = df.copy()
df_reason_mod.head()

Unnamed: 0,Reason_1,Reason_2,Reason_3,Reason_4,Date,Transportation Expense,Distance to Work,Age,Daily Work Load Average,Body Mass Index,Education,Children,Pets,Absenteeism Time in Hours
0,0,0,0,1,07/07/2015,289,36,33,239.554,30,1,2,1,4
1,0,0,0,0,14/07/2015,118,13,50,239.554,31,1,1,0,0
2,0,0,0,1,15/07/2015,179,51,38,239.554,31,1,0,0,2
3,1,0,0,0,16/07/2015,279,5,39,239.554,24,1,2,0,4
4,0,0,0,1,23/07/2015,289,36,33,239.554,30,1,2,1,2


In [51]:
type(df_reason_mod['Date'][0])  # The column's dtype is object; its first value shows the dates are stored as strings

str

In [52]:
# format tells pandas explicitly that the strings are day/month/year; %Y (capital) is the 4-digit year
df_reason_mod['Date'] = pd.to_datetime(df_reason_mod['Date'], format='%d/%m/%Y')
df_reason_mod['Date'].head()

0   2015-07-07
1   2015-07-14
2   2015-07-15
3   2015-07-16
4   2015-07-23
Name: Date, dtype: datetime64[ns]
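
A short sketch of why the explicit format matters: a string such as 06/07/2015 is a valid date whether it is read day-first or month-first, so the format string decides which date you get. The names `raw`, `parsed`, and `month_first` are assumptions for the example.

```python
import pandas as pd

raw = pd.Series(["07/07/2015", "06/07/2015", "23/05/2018"])

# Day first, then month, then 4-digit year, as in the dataset.
parsed = pd.to_datetime(raw, format="%d/%m/%Y")
print(parsed.dt.month.tolist())

# The same string read month-first is a different date (June 7 rather than July 6).
month_first = pd.to_datetime("06/07/2015", format="%m/%d/%Y")
print(month_first.month, month_first.day)
```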

# Extract Months and Days of the Week

In [53]:
df_reason_mod['Date'][0]

Timestamp('2015-07-07 00:00:00')

In [54]:
df_reason_mod['Date'][0].month#Extract month

7

In [55]:
df_reason_mod.shape

(700, 14)

In [56]:
list_months=[]
for i in range(df_reason_mod.shape[0]):  # Iterate over the 700 rows (shape[0]), not the 14 columns
    list_months.append(df_reason_mod['Date'][i].month)

In [57]:
len(list_months)

700

In [58]:
df_reason_mod['Month Value']= list_months
df_reason_mod.head()

Unnamed: 0,Reason_1,Reason_2,Reason_3,Reason_4,Date,Transportation Expense,Distance to Work,Age,Daily Work Load Average,Body Mass Index,Education,Children,Pets,Absenteeism Time in Hours,Month Value
0,0,0,0,1,2015-07-07,289,36,33,239.554,30,1,2,1,4,7
1,0,0,0,0,2015-07-14,118,13,50,239.554,31,1,1,0,0,7
2,0,0,0,1,2015-07-15,179,51,38,239.554,31,1,0,0,2,7
3,1,0,0,0,2015-07-16,279,5,39,239.554,24,1,2,0,4,7
4,0,0,0,1,2015-07-23,289,36,33,239.554,30,1,2,1,2,7


In [59]:
df_reason_mod['Date'][699].weekday()

3

In [60]:
def date_to_weekday(date_value):  # Return the day of the week (Monday = 0, ..., Sunday = 6)
    return date_value.weekday()

In [61]:
df_reason_mod['Day of the week'] = df_reason_mod['Date'].apply(date_to_weekday)

In [62]:
df_reason_mod.head()

Unnamed: 0,Reason_1,Reason_2,Reason_3,Reason_4,Date,Transportation Expense,Distance to Work,Age,Daily Work Load Average,Body Mass Index,Education,Children,Pets,Absenteeism Time in Hours,Month Value,Day of the week
0,0,0,0,1,2015-07-07,289,36,33,239.554,30,1,2,1,4,7,1
1,0,0,0,0,2015-07-14,118,13,50,239.554,31,1,1,0,0,7,1
2,0,0,0,1,2015-07-15,179,51,38,239.554,31,1,0,0,2,7,2
3,1,0,0,0,2015-07-16,279,5,39,239.554,24,1,2,0,4,7,3
4,0,0,0,1,2015-07-23,289,36,33,239.554,30,1,2,1,2,7,3
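
As an alternative to the loop and the helper function, the same two columns can be derived with the vectorized `.dt` accessor of pandas. This is a sketch on three dates taken from the first rows of the data, not a replacement for the cells above; the names `dates`, `month_value`, and `day_of_week` are assumptions for the example.

```python
import pandas as pd

dates = pd.Series(
    pd.to_datetime(["07/07/2015", "14/07/2015", "15/07/2015"], format="%d/%m/%Y")
)

month_value = dates.dt.month     # 1 = January, ..., 12 = December
day_of_week = dates.dt.weekday   # 0 = Monday, ..., 6 = Sunday

print(month_value.tolist(), day_of_week.tolist())
```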


In [63]:
df_reason_mod.drop(['Date'], axis=1, inplace=True)

In [64]:
# Define the desired column order
column_order = ['Reason_1', 'Reason_2', 'Reason_3', 'Reason_4', 'Month Value', 'Day of the week', 'Transportation Expense', 'Distance to Work', 'Age', 'Daily Work Load Average', 'Body Mass Index', 'Education', 'Children', 'Pets', 'Absenteeism Time in Hours']

# Reorder the columns
df_reason_mod = df_reason_mod[column_order]

In [65]:
df_reason_date_mod = df_reason_mod.copy()
df_reason_date_mod.head()

Unnamed: 0,Reason_1,Reason_2,Reason_3,Reason_4,Month Value,Day of the week,Transportation Expense,Distance to Work,Age,Daily Work Load Average,Body Mass Index,Education,Children,Pets,Absenteeism Time in Hours
0,0,0,0,1,7,1,289,36,33,239.554,30,1,2,1,4
1,0,0,0,0,7,1,118,13,50,239.554,31,1,1,0,0
2,0,0,0,1,7,2,179,51,38,239.554,31,1,0,0,2
3,1,0,0,0,7,3,279,5,39,239.554,24,1,2,0,4
4,0,0,0,1,7,3,289,36,33,239.554,30,1,2,1,2


# Analysis of the Next 5 columns in the Dataframe

In [66]:
df_reason_date_mod['Education'].value_counts()  # Inspect Education before turning it into a dummy variable

1    583
3     73
2     40
4      4
Name: Education, dtype: int64

In [67]:
# Map 1 to 0 and group 2, 3 and 4 together as 1 (post-secondary education)
df_reason_date_mod['Education'] = df_reason_date_mod['Education'].map({1: 0, 2: 1, 3: 1, 4: 1})

In [68]:
df_reason_date_mod['Education'].value_counts()

0    583
1    117
Name: Education, dtype: int64

In [69]:
df_reason_date_mod['Education'].unique()  # We modified the column successfully

array([0, 1])
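
One detail worth keeping in mind with `map` and a dictionary: any value that is not a key in the dictionary becomes NaN. The sketch below uses made-up education codes, including a hypothetical code 5 that does not occur in the dataset, to show both the normal case and that edge case.

```python
import pandas as pd

mapping = {1: 0, 2: 1, 3: 1, 4: 1}

# Codes that all appear in the mapping are converted cleanly.
education = pd.Series([1, 3, 2, 4, 1])
binary = education.map(mapping)
print(binary.tolist())

# A code missing from the mapping (5 is hypothetical) turns into NaN.
with_unknown = pd.Series([1, 5]).map(mapping)
print(with_unknown.isna().tolist())
```

Checking `value_counts()` or `unique()` after the mapping, as done above, is a simple way to confirm that no code was left unmapped.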

# Final Checkpoint

In [70]:
df_preprocessed=df_reason_date_mod.copy()
df_preprocessed.head(10)

Unnamed: 0,Reason_1,Reason_2,Reason_3,Reason_4,Month Value,Day of the week,Transportation Expense,Distance to Work,Age,Daily Work Load Average,Body Mass Index,Education,Children,Pets,Absenteeism Time in Hours
0,0,0,0,1,7,1,289,36,33,239.554,30,0,2,1,4
1,0,0,0,0,7,1,118,13,50,239.554,31,0,1,0,0
2,0,0,0,1,7,2,179,51,38,239.554,31,0,0,0,2
3,1,0,0,0,7,3,279,5,39,239.554,24,0,2,0,4
4,0,0,0,1,7,3,289,36,33,239.554,30,0,2,1,2
5,0,0,0,1,7,4,179,51,38,239.554,31,0,0,0,2
6,0,0,0,1,7,4,361,52,28,239.554,27,0,1,4,8
7,0,0,0,1,7,4,260,50,36,239.554,23,0,4,0,4
8,0,0,1,0,7,0,155,12,34,239.554,25,0,2,0,40
9,0,0,0,1,7,0,235,11,37,239.554,29,1,1,1,8
