# Applied Process Mining Module

This notebook is part of an Applied Process Mining module. The collection of notebooks is a *living document* and subject to change. 

# Lecture 1 - 'Event Logs and Process Visualization' (R / bupaR)

## Setup

<img src="http://bupar.net/images/logo_text.PNG" alt="bupaR" style="width: 200px;"/>

In this notebook, we are going to need the `tidyverse`, `bupaR`, and `processanimateR` packages.

In [None]:
## Run the commented-out commands below once, in a separate R session, to install the packages
# install.packages("tidyverse")
# install.packages("bupaR")
# install.packages("processanimateR")

In [None]:
# for larger and readable plots
options(jupyter.plot_scale=1.25)

In [None]:
# the initial execution of these may give you warnings that you can safely ignore
library(tidyverse)
library(bupaR)
library(processanimateR)

## Event Logs

This part introduces event logs and their unique properties, which provide the basis for any Process Mining method. Several event logs are distributed together with `bupaR` and can be loaded without further processing.
In this lecture we are going to use the following datasets:

* Patients, a synthetically generated example event log in a hospital setting.
* Sepsis, a real-life event log from a Dutch hospital. It is publicly available at https://doi.org/10.4121/uuid:915d2bfb-7e84-49ad-a286-dc35f063a460 and has been used in many Process Mining publications.
* Traffic fines, a real-life event log of a road traffic fine management process, used in Challenge 2.
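To see how `bupaR` interprets one of these logs, you can ask for its mapping, i.e., which columns play the role of case identifier, activity, timestamp, and so on, together with a few summary counts. A minimal sketch, assuming the datasets are provided by the `eventdataR` package (the data package of the bupaR ecosystem) and that it is installed:

```r
library(bupaR)
library(eventdataR)  # assumption: provides the patients, sepsis and traffic_fines logs

# Show which columns are mapped to case, activity, activity instance,
# lifecycle, resource and timestamp
mapping(patients)

# Basic size of the log
n_cases(patients)       # number of cases (patients)
n_activities(patients)  # number of distinct activity labels
n_events(patients)      # number of events
```

Run the cell in the notebook to see the printed mapping; in a plain script, wrap the calls in `print()`.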

### Exploring Event Data

Let us first explore the event data without any prior knowledge about event log structure or properties. We convert the `patients` event log below to a standard `tibble` (https://tibble.tidyverse.org/) and inspect the first rows.

In [None]:
patients %>%
    as_tibble() %>%
    head()

The most important ingredient of an event log is the timestamp column `time`: it allows us to order the events and thus establish a sequence of events.
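The role of the timestamp can be illustrated without `bupaR`: sorting events by `time` within each case turns an unordered set of rows into one trace per case. A minimal sketch using `dplyr` from the `tidyverse`, on a small hypothetical log (`toy_log` and all of its values are made up for illustration):

```r
library(dplyr)

# Hypothetical toy log: five events of two patients, stored out of order
toy_log <- tibble(
  patient  = c("p1", "p2", "p1", "p2", "p1"),
  handling = c("Triage", "Registration", "Registration", "Triage", "X-Ray"),
  time     = as.POSIXct(c("2017-01-02 10:30", "2017-01-02 09:00",
                          "2017-01-02 10:00", "2017-01-02 09:45",
                          "2017-01-02 11:15"), tz = "UTC")
)

# Sort by timestamp within each patient and collapse the activities into a trace
traces <- toy_log %>%
  arrange(patient, time) %>%
  group_by(patient) %>%
  summarise(trace = paste(handling, collapse = " -> "), .groups = "drop")

traces
```

Without the `time` column, the same five rows would admit many orderings; the timestamp is what fixes one.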

In [None]:
patients %>% 
  filter(time < '2017-01-31') %>% 
  ggplot(aes(time, "Event")) + 
  geom_point() + 
  theme_bw()

In [None]:
patients %>%
    as_tibble() %>% 
    distinct(handling)

In [None]:
patients %>%
    as_tibble() %>% 
    distinct(patient)  %>% 
    head()

In [None]:
patients %>%
    as_tibble() %>% 
    count(patient) %>% 
    head()

In [None]:
patients %>% 
  filter(time < '2017-01-31') %>% 
  ggplot(aes(time, patient, color = handling)) + 
  geom_point() + 
  theme_bw()

In [None]:
patients %>% 
  as_tibble() %>% 
  arrange(patient, time) %>% 
  head()

### Further resources

* [XES Standard](http://xes-standard.org/)
* [Creating event logs from CSV files in bupaR](http://bupar.net/creating_eventlogs.html)
* [Changing the case and activity notions in bupaR](http://bupar.net/mapping.html)
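As a rough companion to the CSV tutorial linked above, the sketch below builds an `eventlog` object from a plain data frame with `bupaR::eventlog()`, mapping each column to its role. The data frame and its column names (`case`, `activity`, `instance`, `status`, `ts`, `resource`) are hypothetical; in practice the data would come from a file, e.g. via `readr::read_csv()`.

```r
library(bupaR)

# Hypothetical raw data: two activities of one case, each with a start and a complete event
raw <- data.frame(
  case     = c("c1", "c1", "c1", "c1"),
  activity = c("Registration", "Registration", "Triage", "Triage"),
  instance = c("1", "1", "2", "2"),
  status   = c("start", "complete", "start", "complete"),
  ts       = as.POSIXct(c("2017-01-02 09:00", "2017-01-02 09:10",
                          "2017-01-02 09:20", "2017-01-02 09:40"), tz = "UTC"),
  resource = c("r1", "r1", "r2", "r2")
)

# Map each column to its role in the event log
toy_eventlog <- eventlog(
  raw,
  case_id              = "case",
  activity_id          = "activity",
  activity_instance_id = "instance",
  lifecycle_id         = "status",
  timestamp            = "ts",
  resource_id          = "resource"
)

mapping(toy_eventlog)
```

The `activity_instance_id` column is what ties a `start` event to its matching `complete` event, which relates to the reflection question on `handling_id` below.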

### Reflection Questions

* What could be the reason a column `.order` is included in this dataset?
* How could the column `employee` be used?
* What is the use of the column `handling_id` and in which situation is it required?

## Basic Process Visualization

### Set of Traces

In [None]:
patients %>% 
  trace_explorer(coverage = 1.0, .abbreviate = TRUE) # abbreviated here due to poor Jupyter notebook output scaling
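`trace_explorer()` groups cases by their trace (the ordered sequence of activities) and, with `coverage = 1.0`, shows traces until all cases are covered. The sketch below reproduces the underlying idea with `dplyr` on a hypothetical toy log: trace frequencies and the cumulative share of cases they cover. It illustrates the concept only and is not the `bupaR` implementation.

```r
library(dplyr)

# Hypothetical toy log: four cases; only the order of activities per case matters here
toy_log <- tibble(
  case     = c("c1", "c1", "c2", "c2", "c3", "c3", "c4"),
  activity = c("A", "B", "A", "B", "A", "C", "A"),
  time     = as.POSIXct("2017-01-01", tz = "UTC") + (1:7) * 3600
)

variants <- toy_log %>%
  arrange(case, time) %>%
  group_by(case) %>%
  summarise(trace = paste(activity, collapse = ","), .groups = "drop") %>%
  count(trace, name = "cases", sort = TRUE) %>%  # number of cases per trace
  mutate(relative   = cases / sum(cases),        # share of all cases
         cumulative = cumsum(relative))          # coverage reached so far

variants
```

The most frequent trace `A,B` alone covers half of the cases; all three traces together reach a coverage of 1.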

### Dotted Chart

In [None]:
patients %>%
    filter(time < '2017-01-31') %>% 
    dotted_chart(add_end_events = TRUE)

In [None]:
patients %>%    
    dotted_chart("relative", add_end_events = TRUE)

We can also use `plotly` to get an interactive visualization:

In [None]:
patients %>%    
    plotly_dotted_chart("relative", add_end_events = TRUE)

In [None]:
sepsis %>% 
    dotted_chart("relative_day",
                 sort = "start_day", 
                 units = "hours")
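The `"relative"` dotted charts place every case on a common time axis that starts at the case's first event, which makes cases that started on different days comparable. A minimal sketch of that transformation with `dplyr` and `ggplot2`, on a hypothetical toy log; it approximates the idea and is not `bupaR`'s own code:

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy log: two cases that start on different days
toy_log <- tibble(
  case     = c("c1", "c1", "c1", "c2", "c2"),
  activity = c("Registration", "Triage", "X-Ray", "Registration", "Triage"),
  time     = as.POSIXct(c("2017-01-02 09:00", "2017-01-02 10:30", "2017-01-02 12:00",
                          "2017-01-05 14:00", "2017-01-05 14:45"), tz = "UTC")
)

# Relative time: hours elapsed since the first event of the same case
relative_log <- toy_log %>%
  group_by(case) %>%
  mutate(relative_hours = as.numeric(difftime(time, min(time), units = "hours"))) %>%
  ungroup()

p <- ggplot(relative_log, aes(relative_hours, case, color = activity)) +
  geom_point() +
  labs(x = "Hours since case start", y = "Case") +
  theme_bw()
```

Printing `p` in a notebook cell draws the chart; both cases now start at 0 on the x-axis.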

Check out other process visualization options using bupaR:

* [Further Dotted Charts](http://bupar.net/dotted_chart.html)
* [Exploring Time, Resources, Structuredness](http://bupar.net/exploring.html)

## Process Map Visualization

In [None]:
patients %>% 
    precedence_matrix() %>% 
    plot()
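The precedence matrix counts how often one activity is directly followed by another within the same case, with artificial `Start` and `End` activities marking the beginning and end of each trace. A sketch of this directly-follows count with `dplyr` on a hypothetical toy log, assuming one event per activity instance (the `bupaR` implementation may differ in details):

```r
library(dplyr)

# Hypothetical toy log with one event per activity instance
toy_log <- tibble(
  case     = c("c1", "c1", "c1", "c2", "c2", "c2"),
  activity = c("A", "B", "C", "A", "C", "B"),
  time     = as.POSIXct("2017-01-01", tz = "UTC") + (1:6) * 3600
)

ordered <- toy_log %>%
  arrange(case, time) %>%
  group_by(case)

# Directly-follows pairs inside each case, with "Start" before the first activity ...
within_case <- ordered %>%
  mutate(antecedent = lag(activity, default = "Start"),
         consequent = activity)

# ... and "End" after the last activity of each case
case_ends <- ordered %>%
  slice_tail(n = 1) %>%
  mutate(antecedent = activity,
         consequent = "End")

dfg <- bind_rows(within_case, case_ends) %>%
  ungroup() %>%
  count(antecedent, consequent, name = "n")

dfg
```

Each row is one arc of the process map drawn by `process_map()`, and `n` is the frequency shown on that arc.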

In [None]:
patients %>% 
    process_map()

In [None]:
patients %>% 
    process_map(type = performance(units = "hours"))
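The performance map annotates the process map with time statistics instead of frequencies; with `units = "hours"`, the values are expressed in hours. One such statistic, the mean processing time of an activity, can be sketched by pairing the `start` and `complete` events of each activity instance. The toy log below is hypothetical and the sketch uses `tidyr::pivot_wider()`; it conveys the idea and does not replicate `performance()` exactly.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy log: three activity instances, each with a start and a complete event
toy_log <- tibble(
  activity    = c("Triage", "Triage", "Triage", "Triage", "X-Ray", "X-Ray"),
  instance_id = c(1, 1, 2, 2, 3, 3),
  lifecycle   = c("start", "complete", "start", "complete", "start", "complete"),
  time        = as.POSIXct(c("2017-01-02 09:00", "2017-01-02 10:00",
                             "2017-01-03 09:00", "2017-01-03 12:00",
                             "2017-01-02 11:00", "2017-01-02 11:30"), tz = "UTC")
)

processing_times <- toy_log %>%
  # one row per activity instance, with columns `start` and `complete`
  pivot_wider(id_cols = c(activity, instance_id),
              names_from = lifecycle, values_from = time) %>%
  mutate(hours = as.numeric(difftime(complete, start, units = "hours"))) %>%
  group_by(activity) %>%
  summarise(mean_hours = mean(hours), .groups = "drop")

processing_times
```

Here `Triage` takes 1 and 3 hours in its two instances, so its mean processing time is 2 hours.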

#### Challenge 1
Show some other attribute of the `patients` dataset on the process map.

In [None]:
#patients %>% 
#    process_map(type = custom(...))

In [None]:
patients %>% 
    animate_process(mode = "relative")

#### Challenge 2
Reproduce the example shown on the lecture slides by animating some other attribute from the `traffic_fines` dataset.

In [None]:
traffic_fines %>% 
  head()

In [None]:
traffic_fines %>% 
    # WARNING: do not animate the full log in Jupyter (especially in Firefox); the library does not scale well and will slow down your browser considerably
    bupaR::sample_n(1000) %>%
    edeaR::filter_trace_frequency(percentage=0.95) %>%
    animate_process(mode = "relative")
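Sampling and filtering reduce the log before animation. Sampling should operate on whole cases rather than individual events, otherwise traces get cut into fragments; `bupaR::sample_n()` applied to an event log is meant to sample cases. A minimal sketch of case-level sampling with `dplyr` on a hypothetical toy log:

```r
library(dplyr)

set.seed(42)  # make the sample reproducible

# Hypothetical toy log: 10 cases with the same three events each
toy_log <- tibble(
  case     = rep(sprintf("c%02d", 1:10), each = 3),
  activity = rep(c("Create Fine", "Send Fine", "Payment"), times = 10)
)

# Draw 4 case identifiers, then keep all events of those cases
sampled_cases <- sample(unique(toy_log$case), size = 4)
sampled_log <- toy_log %>%
  filter(case %in% sampled_cases)

count(sampled_log, case)  # every sampled case still has all 3 events
```

Sampling rows directly (e.g. with `dplyr::slice_sample()`) would instead drop random events and leave incomplete traces.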

In [None]:
# traffic_fines %>% 

## Real-life Processes

In [None]:
sepsis %>% 
  precedence_matrix() %>% 
  plot()

# Exercises - 1st Hands-on Session

In the first hands-on session, you are going to explore a real-life dataset (see the Assignment notebook) and apply what was presented in the lecture about event logs and basic process mining visualizations. The objective is to explore your dataset as an event log, with the learned process mining visualizations in mind.

* Analyse basic properties of the process (business process or other process) that has generated it.
    * What are possible case notions, i.e., which column or columns could serve as case identifiers?
    * What are the activities? Are all activities on the same abstraction level?
    * Can activities or actions be derived from other (non-activity) data?
* Discover a map of the process (or a sub-process) behind it.
    * Are there multiple processes that can be discovered?
    * What is the effect of taking a subset of the data?

*Hint*: You may use/copy the code from this notebook to have a starting point. 