# Process Mining Module - PDEng program Data Science

This notebook is part of the Process Mining module of the JADS PDEng program on Data Science. It accompanies Assignment 1 on *Event Logs and Process Visualization*. 
The collection of notebooks is a *living document* and subject to change. 

# Hands-On 1 - 'Event Logs and Process Visualization' (R / bupaR)

* **Responsible Lecturer**: Dr. Felix Mannhardt, [@fmannhardt](https://twitter.com/fmannhardt)
* **Last Update**: 21st April 2021

## Setup

<img src="http://bupar.net/images/logo_text.PNG" alt="bupaR" style="width: 200px;"/>

In this notebook, we are going to need the `tidyverse`, `bupaR`, and `processanimateR` packages.

In [23]:
## Perform the commented out commands below in a separate R session
# install.packages("tidyverse")
# install.packages("bupaR")
# install.packages("processanimateR")

In [24]:
# for larger and readable plots
options(jupyter.plot_scale=1.25)

In [25]:
# the initial execution of these may give you warnings that you can safely ignore
library(tidyverse)
library(bupaR)
library(processanimateR)

## Assignment

In this hands-on session, you are going to explore a real-life dataset and apply what was presented in the lecture about event logs and basic process mining visualizations.
The objective is to explore your dataset as an event log, with the process mining visualizations from the lecture in mind.

* Analyse basic properties of the process (business process or other process) that generated the dataset.
    * What are possible case notions? Which attribute or attributes could serve as case identifier?
    * What are the activities? Are all activities on the same abstraction level?
    * Can activities or actions be derived from other (non-activity) data?
* Discover a map of the process (or a sub-process) behind it.
    * Are there multiple processes that can be discovered?
    * What is the effect of taking a subset of the data (by incident type, …)? 

You may use this notebook to conduct the analysis.

## Dataset

The proposed real-life dataset to investigate is the *BPI Challenge 2014* dataset. The dataset was captured from the ITIL process of Rabobank Group ICT and was the subject of the annual BPI Challenge in 2014. Here is more information on the dataset and download links to the data files:

* [Overview](https://www.win.tue.nl/bpi/doku.php?id=2014:challenge)
* [Dataset](http://dx.doi.org/10.4121/uuid:c3e5d162-0cfd-4bb0-bd82-af5268819c35)
* [Quick Reference](https://www.win.tue.nl/bpi/lib/exe/fetch.php?media=2014:quick_reference_bpi_challenge_2014.pdf)

The BPI Challenge 2014 website above also hosts several reports that describe and analyze the dataset in detail. We suggest exploring the dataset yourself first, before reading the reports.

## Data Loading

To simplify the data loading task, here are the initial steps:

In [26]:
# some warnings are expected here
interaction_data <- read_csv2("https://data.4tu.nl/ndownloader/files/24031670")
incident_data <- read_csv2("https://data.4tu.nl/ndownloader/files/24031637")
activity_log_incidents <- read_csv2("https://data.4tu.nl/ndownloader/files/24060575")
change_data <- read_csv2("https://data.4tu.nl/ndownloader/files/24073421")

[36mi[39m Using [34m[34m','[34m[39m as decimal and [34m[34m'.'[34m[39m as grouping mark. Use [30m[47m[30m[47m`read_delim()`[47m[30m[49m[39m for more control.


[36m--[39m [1m[1mColumn specification[1m[22m [36m------------------------------------------------------------------------------------------------[39m
cols(
  `CI Name (aff)` = [31mcol_character()[39m,
  `CI Type (aff)` = [31mcol_character()[39m,
  `CI Subtype (aff)` = [31mcol_character()[39m,
  `Service Comp WBS (aff)` = [31mcol_character()[39m,
  `Interaction ID` = [31mcol_character()[39m,
  Status = [31mcol_character()[39m,
  Impact = [32mcol_double()[39m,
  Urgency = [32mcol_double()[39m,
  Priority = [32mcol_double()[39m,
  Category = [31mcol_character()[39m,
  `KM number` = [31mcol_character()[39m,
  `Open Time (First Touch)` = [31mcol_character()[39m,
  `Close Time` = [31mcol_character()[39m,
  `Closure Code` = [31mcol_character()[39m,
  `First Call Resolution` = [31mc

In [27]:
interaction_data %>% head()

| CI Name (aff) | CI Type (aff) | CI Subtype (aff) | Service Comp WBS (aff) | Interaction ID | Status | Impact | Urgency | Priority | Category | KM number | Open Time (First Touch) | Close Time | Closure Code | First Call Resolution | Handle Time (secs) | Related Incident |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<dbl>` | `<dbl>` | `<dbl>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<dbl>` | `<chr>` |
| SBA000243 | application | Server Based Application | WBS000125 | SD0000001 | Closed | 5 | 4 | 4 | incident | KM0000987 | 9-9-2011 9:23 | 14-2-2014 9:05 | Other | N | 239 | IM0000001 |
| SUB000443 | subapplication | Web Based Application | WBS000125 | SD0000002 | Closed | 4 | 4 | 4 | request for information | KM0000989 | 29-9-2011 14:59 | 13-12-2013 16:27 | Software | N | 406 | IM0000001 |
| LAP000110 | computer | Laptop | WBS000187 | SD0000003 | Closed | 4 | 4 | 4 | incident | KM0000317 | 13-10-2011 15:47 | 21-10-2013 5:01 | Software | N | 738 |  |
| DTA000110 | application | Desktop Application | WBS000256 | SD0000004 | Closed | 4 | 4 | 4 | incident | KM0000057 | 1-12-2011 15:39 | 21-10-2013 5:02 | Unknown | N | 787 |  |
| SBA000855 | application | Server Based Application | WBS000054 | SD0000005 | Closed | 4 | 4 | 4 | incident | KM0000652 | 23-12-2011 16:23 | 21-10-2013 5:02 | Software | N | 459 | IM0000003 |
| SUB000424 | subapplication | Web Based Application | WBS000073 | SD0000006 | Closed | 4 | 4 | 4 | incident | KM0000702 | 16-1-2012 14:09 | 21-10-2013 5:03 | Other | N | 412 |  |
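
The `Open Time (First Touch)` and `Close Time` columns are read as character (`<chr>`), so they need to be parsed before they can serve as timestamps. The following is a minimal sketch, assuming the values follow a day-month-year order (as values such as `29-9-2011 14:59` suggest). The names `interaction_sample`, `interaction_times`, `open_time`, and `close_time` are illustrative and not part of the dataset; the two rows are retyped from the output above so that the sketch runs without the download.

```r
library(dplyr)

# Two rows retyped from head(interaction_data), reduced to the time columns.
interaction_sample <- tibble::tibble(
  `Interaction ID` = c("SD0000001", "SD0000002"),
  `Open Time (First Touch)` = c("9-9-2011 9:23", "29-9-2011 14:59"),
  `Close Time` = c("14-2-2014 9:05", "13-12-2013 16:27")
)

# Assumption: day-month-year order with hours and minutes, parsed as UTC.
interaction_times <- interaction_sample %>%
  mutate(
    open_time = lubridate::dmy_hm(`Open Time (First Touch)`),
    close_time = lubridate::dmy_hm(`Close Time`)
  )

interaction_times %>% select(`Interaction ID`, open_time, close_time)
```

The same `mutate()` step could be applied to the full `interaction_data`; the time zone of the original records is not stated in the data shown here, so the default of UTC is only a placeholder.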


In [28]:
incident_data %>% head()

| CI Name (aff) | CI Type (aff) | CI Subtype (aff) | Service Component WBS (aff) | Incident ID | Status | Impact | Urgency | Priority | Category | ... | X69 | X70 | X71 | X72 | X73 | X74 | X75 | X76 | X77 | X78 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<dbl>` | `<dbl>` | `<dbl>` | `<chr>` | ... | `<lgl>` | `<lgl>` | `<lgl>` | `<lgl>` | `<lgl>` | `<lgl>` | `<lgl>` | `<lgl>` | `<lgl>` | `<lgl>` |
| SUB000508 | subapplication | Web Based Application | WBS000162 | IM0000004 | Closed | 4 | 4 | 4 | incident | ... |  |  |  |  |  |  |  |  |  |  |
| WBA000124 | application | Web Based Application | WBS000088 | IM0000005 | Closed | 3 | 3 | 3 | incident | ... |  |  |  |  |  |  |  |  |  |  |
| DTA000024 | application | Desktop Application | WBS000092 | IM0000006 | Closed | 3 | 3 | 3 | request for information | ... |  |  |  |  |  |  |  |  |  |  |
| WBA000124 | application | Web Based Application | WBS000088 | IM0000011 | Closed | 4 | 4 | 4 | incident | ... |  |  |  |  |  |  |  |  |  |  |
| WBA000124 | application | Web Based Application | WBS000088 | IM0000012 | Closed | 4 | 4 | 4 | incident | ... |  |  |  |  |  |  |  |  |  |  |
| WBA000124 | application | Web Based Application | WBS000088 | IM0000013 | Closed | 4 | 4 | 4 | incident | ... |  |  |  |  |  |  |  |  |  |  |


In [29]:
activity_log_incidents %>% head()

| Incident ID | DateStamp | IncidentActivity_Number | IncidentActivity_Type | Assignment Group | KM number | Interaction ID |
|---|---|---|---|---|---|---|
| `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` |
| IM0000004 | 07-01-2013 08:17:17 | 001A3689763 | Reassignment | TEAM0001 | KM0000553 | SD0000007 |
| IM0000004 | 04-11-2013 13:41:30 | 001A5852941 | Reassignment | TEAM0002 | KM0000553 | SD0000007 |
| IM0000004 | 04-11-2013 13:41:30 | 001A5852943 | Update from customer | TEAM0002 | KM0000553 | SD0000007 |
| IM0000004 | 04-11-2013 12:09:37 | 001A5849980 | Operator Update | TEAM0003 | KM0000553 | SD0000007 |
| IM0000004 | 04-11-2013 12:09:37 | 001A5849979 | Assignment | TEAM0003 | KM0000553 | SD0000007 |
| IM0000004 | 04-11-2013 13:41:30 | 001A5852942 | Assignment | TEAM0002 | KM0000553 | SD0000007 |


In [30]:
change_data %>% head()

| CI Name (aff) | CI Type (aff) | CI Subtype (aff) | Service Component WBS (aff) | Change ID | Change Type | Risk Assessment | Emergency Change | CAB-approval needed | Planned Start | ... | Scheduled Downtime Start | Scheduled Downtime End | Actual Start | Actual End | Requested End Date | Change record Open Time | Change record Close Time | Originated from | # Related Interactions | # Related Incidents |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | ... | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<dbl>` | `<dbl>` |
| HMD000002 | hardware | MigratieDummy | WBS000195 | C00000003 | Release Type 11 | Minor Change | N | N | 30-8-2012 7:00 | ... |  |  | 18-12-2013 14:00 | 18-12-2013 16:15 | 1-6-2012 0:00 | 1-9-2011 9:13 | 18-12-2013 16:16 | Problem |  |  |
| SUB000494 | subapplication | Web Based Application | WBS000162 | C00000005 | Release Type 13 | Business Change | N | Y | 4-3-2014 9:00 | ... |  |  | 4-3-2014 17:52 | 4-3-2014 17:52 | 7-6-2012 12:00 | 6-10-2011 15:54 | 5-3-2014 7:03 | Problem |  |  |
| OVR000012 | no type | no subtype | WBS000256 | C00000006 | Release Type 11 | Minor Change | N | N | 1-6-2011 7:00 | ... |  |  | 17-4-2013 14:00 | 13-12-2013 17:00 | 31-3-2012 17:00 | 7-10-2011 10:06 | 30-12-2013 9:40 | Problem |  |  |
| ASW000010 | software | Automation Software | WBS000284 | C00000007 | Standard Change Type 93 | Minor Change | N | N | 21-6-2013 9:00 | ... |  |  |  |  | 2-9-2013 18:00 | 14-11-2011 17:17 | 10-10-2013 10:16 | Problem |  |  |
| ASW000010 | software | Automation Software | WBS000284 | C00000008 | Standard Change Type 93 | Minor Change | N | N | 21-10-2013 9:00 | ... |  |  | 24-10-2013 0:00 | 25-10-2013 23:00 | 8-11-2013 18:00 | 30-11-2011 14:59 | 27-10-2013 14:52 | Problem |  |  |
| STA000026 | application | Standard Application | WBS000284 | C00000008 | Standard Change Type 93 | Minor Change | N | N | 21-10-2013 9:00 | ... |  |  | 24-10-2013 0:00 | 25-10-2013 23:00 | 8-11-2013 18:00 | 30-11-2011 14:59 | 27-10-2013 14:52 | Problem |  |  |


## Event Log

Have a look at the excellent `bupaR` documentation: http://bupar.net/creating_eventlogs.html
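
As a starting point, here is a minimal sketch of one possible way to turn the incident activity records into a `bupaR` event log. It assumes that `Incident ID` is the case identifier, `IncidentActivity_Type` the activity, `Assignment Group` the resource, and that `DateStamp` is in day-month-year order. These are only one choice among several possible case notions. The names `activity_sample`, `incident_log`, and the renamed columns are illustrative; the six rows are retyped from the `head(activity_log_incidents)` output so that the sketch runs without the download.

```r
library(dplyr)
library(bupaR)

# Six rows retyped from head(activity_log_incidents), reduced to the columns used.
activity_sample <- tibble::tribble(
  ~`Incident ID`, ~DateStamp, ~IncidentActivity_Type, ~`Assignment Group`,
  "IM0000004", "07-01-2013 08:17:17", "Reassignment", "TEAM0001",
  "IM0000004", "04-11-2013 13:41:30", "Reassignment", "TEAM0002",
  "IM0000004", "04-11-2013 13:41:30", "Update from customer", "TEAM0002",
  "IM0000004", "04-11-2013 12:09:37", "Operator Update", "TEAM0003",
  "IM0000004", "04-11-2013 12:09:37", "Assignment", "TEAM0003",
  "IM0000004", "04-11-2013 13:41:30", "Assignment", "TEAM0002"
)

incident_log <- activity_sample %>%
  # Rename to syntactic column names (illustrative names, not from the dataset).
  rename(
    incident_id = `Incident ID`,
    activity = IncidentActivity_Type,
    assignment_group = `Assignment Group`
  ) %>%
  # Assumption: DateStamp is day-month-year with seconds.
  mutate(timestamp = lubridate::dmy_hms(DateStamp)) %>%
  # simple_eventlog() treats every row as one atomic event.
  simple_eventlog(
    case_id = "incident_id",
    activity_id = "activity",
    timestamp = "timestamp",
    resource_id = "assignment_group"
  )

incident_log %>% summary()
```

Several events in this sample share the same timestamp, so their relative order in the log depends on the row order of the input. Applied to the full `activity_log_incidents`, the resulting object can then be passed on to the visualization functions of the `bupaR` ecosystem, for example `processmapR::process_map()`.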