# Process Mining Module - PDEng program Data Science

This notebook is part of the Process Mining module of the JADS PDEng program on Data Science. It accompanies Assignment 1 on *Event Logs and Process Visualization*. 
The collection of notebooks is a *living document* and subject to change. 

# Hands-On 1 - 'Event Logs and Process Visualization' (Python / PM4Py)

* **Responsible Lecturer**: Dr. Felix Mannhardt, [@fmannhardt](https://twitter.com/fmannhardt)
* **Last Update**: 21st April 2021

## Setup

<img src="https://pm4py.fit.fraunhofer.de/static/assets/images/pm4py-site-logo-padded.png" alt="PM4Py" style="width: 200px;"/>

In this notebook, we are using the [PM4Py library](https://pm4py.fit.fraunhofer.de/) in combination with several standard Python data science libraries:

* [pandas](https://pandas.pydata.org/)
* [plotnine](https://plotnine.readthedocs.io/en/stable/)

In [1]:
## Uncomment and run the commands below to install the dependencies
# %pip install pandas
# %pip install matplotlib
# %pip install plotnine
# %pip install pm4py

In [2]:
import pandas as pd
import pm4py
import plotnine
from plotnine import ggplot, geom_point, aes, theme_bw, coord_flip, scale_y_discrete, theme, element_text, geom_bin2d

## Assignment

In this hands-on session, you are going to explore a real-life dataset and apply what was presented in the lecture about event logs and basic process mining visualizations.
The objective is to explore your dataset as an event log, with the process mining visualizations from the lecture in mind.

* Analyse basic properties of the process (business process or other process) that generated the dataset.
    * What are possible case notions, that is, what is (or what are) the case identifier(s)?
    * What are the activities? Are all activities on the same abstraction level?
    * Can activities or actions be derived from other (non-activity) data?
* Discover a map of the process (or a sub-process) behind it.
    * Are there multiple processes that can be discovered?
    * What is the effect of taking a subset of the data (by incident type, …)? 

You may use this notebook to conduct the analysis.

## Dataset

The proposed real-life dataset to investigate is the *BPI Challenge 2014* dataset. The dataset was captured from the ITIL process of Rabobank Group ICT and was the subject of the yearly BPI Challenge in 2014. Here is more information on the dataset, along with download links to the data files:

* [Overview](https://www.win.tue.nl/bpi/doku.php?id=2014:challenge)
* [Dataset](http://dx.doi.org/10.4121/uuid:c3e5d162-0cfd-4bb0-bd82-af5268819c35)
* [Quick Reference](https://www.win.tue.nl/bpi/lib/exe/fetch.php?media=2014:quick_reference_bpi_challenge_2014.pdf)

On the BPI Challenge 2014 website above, there are also several reports that describe and analyze the dataset in detail. We suggest that you first explore the dataset without reading the reports.

## Data Loading

To simplify the data loading task, here are the initial steps:

In [3]:
# some warnings are expected here
interaction_data = pd.read_csv("https://data.4tu.nl/ndownloader/files/24031670", sep=';')
incident_data = pd.read_csv("https://data.4tu.nl/ndownloader/files/24031637", sep=';')
activity_log_incidents = pd.read_csv("https://data.4tu.nl/ndownloader/files/24060575", sep=';')
change_data = pd.read_csv("https://data.4tu.nl/ndownloader/files/24073421", sep=';')
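
The following is a minimal sketch of two optional `read_csv` settings that may make the loaded tables easier to work with. It assumes that the `Unnamed: 0` column in the outputs comes from an unnamed index column in the files, and that the expected warnings are pandas `DtypeWarning`s about mixed column types; neither is confirmed by the dataset description. The inline text is a hypothetical excerpt that only mimics the file layout.

```python
import io

import pandas as pd

# Hypothetical excerpt mimicking the semicolon-separated layout of the files:
# the first header cell is empty, which pandas would otherwise name "Unnamed: 0".
raw = io.StringIO(
    ";Incident ID;Status;Priority\n"
    "0;IM0000004;Closed;4\n"
    "1;IM0000005;Closed;3\n"
)

# index_col=0 uses the unnamed first column as the row index.
# low_memory=False reads each column in one pass, which avoids mixed-type
# warnings on large files at the cost of more memory.
sample = pd.read_csv(raw, sep=";", index_col=0, low_memory=False)
print(sample)
```

The same two arguments could be passed to the four `pd.read_csv` calls above; the outputs shown in this notebook were produced without them.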



In [4]:
interaction_data.head()

Unnamed: 0,CI Name (aff),CI Type (aff),CI Subtype (aff),Service Comp WBS (aff),Interaction ID,Status,Impact,Urgency,Priority,Category,KM number,Open Time (First Touch),Close Time,Closure Code,First Call Resolution,Handle Time (secs),Related Incident
0,SBA000243,application,Server Based Application,WBS000125,SD0000001,Closed,5,4,4,incident,KM0000987,9-9-2011 9:23,14-2-2014 9:05,Other,N,239,IM0000001
1,SUB000443,subapplication,Web Based Application,WBS000125,SD0000002,Closed,4,4,4,request for information,KM0000989,29-9-2011 14:59,13-12-2013 16:27,Software,N,406,IM0000001
2,LAP000110,computer,Laptop,WBS000187,SD0000003,Closed,4,4,4,incident,KM0000317,13-10-2011 15:47,21-10-2013 5:01,Software,N,738,
3,DTA000110,application,Desktop Application,WBS000256,SD0000004,Closed,4,4,4,incident,KM0000057,1-12-2011 15:39,21-10-2013 5:02,Unknown,N,787,
4,SBA000855,application,Server Based Application,WBS000054,SD0000005,Closed,4,4,4,incident,KM0000652,23-12-2011 16:23,21-10-2013 5:02,Software,N,459,IM0000003
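
The time columns are loaded as plain strings, so they need to be parsed before they can be used for ordering or durations. Here is a small sketch using three rows copied from the output above. The day-month-year format is an assumption, supported by values such as `29-9-2011`, and the `Open Duration` column name is introduced only for illustration.

```python
import pandas as pd

# Sample rows copied from the interaction_data.head() output.
interactions = pd.DataFrame({
    "Interaction ID": ["SD0000001", "SD0000002", "SD0000003"],
    "Open Time (First Touch)": ["9-9-2011 9:23", "29-9-2011 14:59", "13-10-2011 15:47"],
    "Close Time": ["14-2-2014 9:05", "13-12-2013 16:27", "21-10-2013 5:01"],
})

# Assumption: day-month-year order without seconds.
time_format = "%d-%m-%Y %H:%M"
for column in ["Open Time (First Touch)", "Close Time"]:
    interactions[column] = pd.to_datetime(interactions[column], format=time_format)

# Illustrative derived column: time between first touch and closure.
interactions["Open Duration"] = (
    interactions["Close Time"] - interactions["Open Time (First Touch)"]
)
print(interactions[["Interaction ID", "Open Duration"]])
```

Passing an explicit `format` keeps pandas from guessing month-first for ambiguous dates.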


In [5]:
incident_data.head()

Unnamed: 0,CI Name (aff),CI Type (aff),CI Subtype (aff),Service Component WBS (aff),Incident ID,Status,Impact,Urgency,Priority,Category,...,Unnamed: 68,Unnamed: 69,Unnamed: 70,Unnamed: 71,Unnamed: 72,Unnamed: 73,Unnamed: 74,Unnamed: 75,Unnamed: 76,Unnamed: 77
0,SUB000508,subapplication,Web Based Application,WBS000162,IM0000004,Closed,4.0,4,4.0,incident,...,,,,,,,,,,
1,WBA000124,application,Web Based Application,WBS000088,IM0000005,Closed,3.0,3,3.0,incident,...,,,,,,,,,,
2,DTA000024,application,Desktop Application,WBS000092,IM0000006,Closed,3.0,3,3.0,request for information,...,,,,,,,,,,
3,WBA000124,application,Web Based Application,WBS000088,IM0000011,Closed,4.0,4,4.0,incident,...,,,,,,,,,,
4,WBA000124,application,Web Based Application,WBS000088,IM0000012,Closed,4.0,4,4.0,incident,...,,,,,,,,,,
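
The incident table ends with a run of `Unnamed: NN` columns that show only missing values in this preview. One possible clean-up, sketched on a reduced copy of the rows above, is to drop columns that are empty in every row and then look at a subset by `Category`. Whether those columns are empty in the full file is an assumption; `how="all"` only removes a column if that is actually the case.

```python
import numpy as np
import pandas as pd

# Reduced copy of the incident_data.head() output (only a few columns kept).
incidents = pd.DataFrame({
    "Unnamed: 0": [0, 1, 2, 3, 4],
    "Incident ID": ["IM0000004", "IM0000005", "IM0000006", "IM0000011", "IM0000012"],
    "Category": ["incident", "incident", "request for information", "incident", "incident"],
    "Unnamed: 68": [np.nan] * 5,
    "Unnamed: 69": [np.nan] * 5,
})

# Drop only those columns in which every value is missing.
cleaned = incidents.dropna(axis="columns", how="all")

# Example subset: keep a single category.
incidents_only = cleaned[cleaned["Category"] == "incident"]

print(cleaned.columns.tolist())
print(cleaned["Category"].value_counts())
```

Comparing a process map built from `incidents_only` with one built from all rows is one way to approach the subset question in the assignment.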


In [6]:
activity_log_incidents.head()

Unnamed: 0,Incident ID,DateStamp,IncidentActivity_Number,IncidentActivity_Type,Assignment Group,KM number,Interaction ID
0,IM0000004,07-01-2013 08:17:17,001A3689763,Reassignment,TEAM0001,KM0000553,SD0000007
1,IM0000004,04-11-2013 13:41:30,001A5852941,Reassignment,TEAM0002,KM0000553,SD0000007
2,IM0000004,04-11-2013 13:41:30,001A5852943,Update from customer,TEAM0002,KM0000553,SD0000007
3,IM0000004,04-11-2013 12:09:37,001A5849980,Operator Update,TEAM0003,KM0000553,SD0000007
4,IM0000004,04-11-2013 12:09:37,001A5849979,Assignment,TEAM0003,KM0000553,SD0000007
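
Note that the rows above are not in chronological order and that several events share the same timestamp. The sketch below parses `DateStamp`, orders the events within each incident, and counts events per case and per activity type. It assumes a day-month-year format and uses `IncidentActivity_Number` as a tie-breaker for equal timestamps; both are assumptions to verify against the quick reference.

```python
import pandas as pd

# Sample rows copied from the activity_log_incidents.head() output.
activity_log = pd.DataFrame({
    "Incident ID": ["IM0000004"] * 5,
    "DateStamp": [
        "07-01-2013 08:17:17", "04-11-2013 13:41:30", "04-11-2013 13:41:30",
        "04-11-2013 12:09:37", "04-11-2013 12:09:37",
    ],
    "IncidentActivity_Number": [
        "001A3689763", "001A5852941", "001A5852943", "001A5849980", "001A5849979",
    ],
    "IncidentActivity_Type": [
        "Reassignment", "Reassignment", "Update from customer",
        "Operator Update", "Assignment",
    ],
})

# Assumption: day-month-year order with seconds.
activity_log["DateStamp"] = pd.to_datetime(
    activity_log["DateStamp"], format="%d-%m-%Y %H:%M:%S"
)

# Order events per incident; the activity number breaks timestamp ties (assumption).
ordered = activity_log.sort_values(
    ["Incident ID", "DateStamp", "IncidentActivity_Number"]
).reset_index(drop=True)

events_per_case = ordered.groupby("Incident ID").size()
activity_counts = ordered["IncidentActivity_Type"].value_counts()
print(ordered[["DateStamp", "IncidentActivity_Type"]])
```

Here `Incident ID` plays the role of the case identifier and `IncidentActivity_Type` the role of the activity, which is one possible case notion among several.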


In [7]:
change_data.head()

Unnamed: 0,CI Name (aff),CI Type (aff),CI Subtype (aff),Service Component WBS (aff),Change ID,Change Type,Risk Assessment,Emergency Change,CAB-approval needed,Planned Start,...,Scheduled Downtime Start,Scheduled Downtime End,Actual Start,Actual End,Requested End Date,Change record Open Time,Change record Close Time,Originated from,# Related Interactions,# Related Incidents
0,HMD000002,hardware,MigratieDummy,WBS000195,C00000003,Release Type 11,Minor Change,N,N,30-8-2012 7:00,...,,,18-12-2013 14:00,18-12-2013 16:15,1-6-2012 0:00,1-9-2011 9:13,18-12-2013 16:16,Problem,,
1,SUB000494,subapplication,Web Based Application,WBS000162,C00000005,Release Type 13,Business Change,N,Y,4-3-2014 9:00,...,,,4-3-2014 17:52,4-3-2014 17:52,7-6-2012 12:00,6-10-2011 15:54,5-3-2014 7:03,Problem,,
2,OVR000012,no type,no subtype,WBS000256,C00000006,Release Type 11,Minor Change,N,N,1-6-2011 7:00,...,,,17-4-2013 14:00,13-12-2013 17:00,31-3-2012 17:00,7-10-2011 10:06,30-12-2013 9:40,Problem,,
3,ASW000010,software,Automation Software,WBS000284,C00000007,Standard Change Type 93,Minor Change,N,N,21-6-2013 9:00,...,,,,,2-9-2013 18:00,14-11-2011 17:17,10-10-2013 10:16,Problem,,
4,ASW000010,software,Automation Software,WBS000284,C00000008,Standard Change Type 93,Minor Change,N,N,21-10-2013 9:00,...,,,24-10-2013 0:00,25-10-2013 23:00,8-11-2013 18:00,30-11-2011 14:59,27-10-2013 14:52,Problem,,
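
The change table has no activity column, but it holds several timestamp columns per change. One way to derive activities from such non-activity data is to turn each timestamp column into an event, as sketched below on two rows copied from the output above. The choice of columns, the `Activity` and `Timestamp` names, and the day-month-year format are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Two rows copied from the change_data.head() output (timestamp columns only).
changes = pd.DataFrame({
    "Change ID": ["C00000003", "C00000007"],
    "Actual Start": ["18-12-2013 14:00", np.nan],
    "Actual End": ["18-12-2013 16:15", np.nan],
    "Change record Open Time": ["1-9-2011 9:13", "14-11-2011 17:17"],
    "Change record Close Time": ["18-12-2013 16:16", "10-10-2013 10:16"],
})

timestamp_columns = [
    "Actual Start", "Actual End",
    "Change record Open Time", "Change record Close Time",
]

# One row per (change, timestamp column); the column name becomes the activity.
events = changes.melt(
    id_vars="Change ID", value_vars=timestamp_columns,
    var_name="Activity", value_name="Timestamp",
).dropna(subset=["Timestamp"])

# Assumption: day-month-year order without seconds.
events["Timestamp"] = pd.to_datetime(events["Timestamp"], format="%d-%m-%Y %H:%M")
events = events.sort_values(["Change ID", "Timestamp"]).reset_index(drop=True)
print(events)
```

Changes without an actual start or end simply contribute fewer events, so traces can differ in length.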


## Event Log

Have a look at the excellent `PM4Py` documentation: https://pm4py.fit.fraunhofer.de/documentation#importing
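
As a starting point, the sketch below prepares a data frame with the standard XES column names (`case:concept:name`, `concept:name`, `time:timestamp`) and counts directly-follows pairs with plain pandas. It uses sample rows from the `activity_log_incidents.head()` output; the choice of `Incident ID` as the case identifier, the day-month-year format, and the tie-breaking by `IncidentActivity_Number` are assumptions. PM4Py offers its own helpers for these steps (for example `pm4py.format_dataframe` and `pm4py.discover_dfg`); check the documentation for the signatures in your installed version.

```python
import pandas as pd

# Sample rows from the activity_log_incidents.head() output.
activity_log = pd.DataFrame({
    "Incident ID": ["IM0000004"] * 5,
    "DateStamp": [
        "07-01-2013 08:17:17", "04-11-2013 13:41:30", "04-11-2013 13:41:30",
        "04-11-2013 12:09:37", "04-11-2013 12:09:37",
    ],
    "IncidentActivity_Number": [
        "001A3689763", "001A5852941", "001A5852943", "001A5849980", "001A5849979",
    ],
    "IncidentActivity_Type": [
        "Reassignment", "Reassignment", "Update from customer",
        "Operator Update", "Assignment",
    ],
})

# Rename to the standard XES attribute names.
event_log = activity_log.rename(columns={
    "Incident ID": "case:concept:name",
    "IncidentActivity_Type": "concept:name",
    "DateStamp": "time:timestamp",
})
event_log["time:timestamp"] = pd.to_datetime(
    event_log["time:timestamp"], format="%d-%m-%Y %H:%M:%S"  # assumed format
)
event_log = event_log.sort_values(
    ["case:concept:name", "time:timestamp", "IncidentActivity_Number"]
)

# Directly-follows pairs: each activity paired with its successor in the same case.
event_log["next_activity"] = (
    event_log.groupby("case:concept:name")["concept:name"].shift(-1)
)
directly_follows = (
    event_log.dropna(subset=["next_activity"])
    .groupby(["concept:name", "next_activity"])
    .size()
)
print(directly_follows)
```

The resulting counts are the edge weights of a simple process map; on the full log they are only meaningful once the event ordering within each case has been checked.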