This notebook is part of a course on Applied Process Mining. The collection of notebooks is a *living document* and subject to change.

# Lecture 2 - 'Process Discovery with the Heuristics Miner' (R / bupaR)

## Setup

<img src="http://bupar.net/images/logo_text.PNG" alt="bupaR" style="width: 200px;"/>

In this notebook, we need the `tidyverse` and `bupaR` packages, along with the bupaR extensions `xesreadR`, `processanimateR`, and `heuristicsmineR` that are loaded below.

In [None]:
## Run the commented-out commands below in a separate R session
# install.packages("tidyverse")
# install.packages("bupaR")
# install.packages(c("xesreadR", "processanimateR", "heuristicsmineR"))

In [None]:
# for larger and readable plots
options(jupyter.plot_scale=1.25)

In [None]:
# the initial execution of these may give you warnings that we can safely ignore
suppressPackageStartupMessages(library(tidyverse)) 
suppressPackageStartupMessages(library(bupaR))
library(xesreadR)
library(processanimateR)
suppressPackageStartupMessages(library(heuristicsmineR))

## Process Discovery

### Process Models 🚧

bupaR does not provide an option to load BPMN models yet. Please have a look at the PM4Py instructions.

### Quality Dimensions

The quality dimensions `fitness`, `precision`, `simplicity`, and `generalisation` are best illustrated by using a small example event log.
We are using an example event log in XES format that is used in the book *Process Mining - Data Science in Action* by Wil van der Aalst. The log is expected in the `../data` directory and is read with the code below:

In [None]:
# ignore the warnings, the package needs to be updated and no 'activity instance identifier' is required in this example
example_log <- xesreadR::read_xes("../data/Lfull.xes")

Let us have a look at the event log in tabular form. The mapping of the activity labels to actual activities is:

* a = register request, 
* b = examine thoroughly, 
* c = examine casually, 
* d = check ticket, 
* e = decide, 
* f = reinitiate request, 
* g = pay compensation, and 
* h = reject request.

In [None]:
example_log %>% head(10)

Now let us discover a process map as we have seen in Lecture 1:

In [None]:
example_log %>% process_map()

The directly-follows-based process map is not very insightful.

### Heuristics Miner

#### L_heur_1 Event Log
We are using an example event log that is suited to introduce the Heuristics Miner algorithm. This event log is already included with the `heuristicsmineR` package in bupaR.

In [None]:
L_heur_1 %>% head(9)

The naive process map drawing reveals some weird behaviour between the activities `b` and `c`. There seems to be a loop between both activities even though they never occur more than once in each trace.

In [None]:
# relabel the activities b and c as a single activity "cb" and redraw the process map
L_heur_1 %>%
    as_tibble() %>%
    mutate(activity_id = as.character(activity_id)) %>%
    mutate(activity_id = if_else(activity_id %in% c("b", "c"), "cb", activity_id)) %>%
    # rebuild an event log object from the relabelled table
    simple_eventlog(case_id = "CASE_concept_name", activity_id = "activity_id", timestamp = "timestamp") %>%
    process_map()

In [None]:
L_heur_1 %>% process_map()
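
One possible explanation for the apparent loop is that `b` and `c` occur in either order across cases. The following is a minimal base R sketch with two hypothetical traces (they are an assumption for illustration, not the actual contents of `L_heur_1`). It shows how such interleaving yields directly-follows arcs in both directions, although no trace repeats an activity:

```r
# Hypothetical traces: one string per case, one letter per activity
traces <- c("abce", "acbe")

# Collect every adjacent (from, to) pair of activities within each trace
pairs <- do.call(rbind, lapply(strsplit(traces, ""), function(acts) {
  data.frame(from = head(acts, -1), to = tail(acts, -1))
}))

# Cross-tabulate the pairs: rows are antecedents, columns are consequents
dfg <- table(from = pairs$from, to = pairs$to)
dfg

# No trace contains the same activity twice ...
repeats <- sapply(strsplit(traces, ""), function(acts) anyDuplicated(acts) > 0)
any(repeats)

# ... yet both b -> c and c -> b are present in the counts
dfg["b", "c"]
dfg["c", "b"]
```

A directly-follows map draws both arcs, which looks like a loop even though the two activities are merely executed in varying order.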

#### Dependency Graphs

In [None]:
L_heur_1 %>% precedence_matrix(type = "absolute") %>% plot()

Based on the precedence matrix, we can follow the formula for the dependency relation:

In [None]:
mat_pre <- L_heur_1 %>% precedence_matrix(type = "absolute") %>% as.matrix()
mat_pre

Since we want to compute how often activities follow each other in either direction, we also need the transposed matrix:

In [None]:
t_mat_pre <- t(mat_pre)
t_mat_pre

The rest is basic arithmetic:

In [None]:
(mat_pre - t_mat_pre) / (mat_pre + t_mat_pre + 1)
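
Written out, the computation above corresponds to the dependency measure between two different activities $a$ and $b$, where $|a >_W b|$ denotes how often $a$ is directly followed by $b$ in the event log $W$ (the symbols are notation introduced here for illustration):

$$
a \Rightarrow_W b = \frac{|a >_W b| - |b >_W a|}{|a >_W b| + |b >_W a| + 1} \qquad (a \neq b)
$$

The value always lies strictly between $-1$ and $1$. A value close to $1$ suggests that $b$ depends on $a$, while a value close to $0$ suggests that both orders occur about equally often.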

Of course, this has already been implemented in the `heuristicsmineR` package. The algorithm also includes further details, such as detecting loops and making sure that all activities are connected to each other. Please consult the original [Heuristics Miner paper](https://is.ieis.tue.nl/staff/aweijters/WP334_FHMv3.pdf) and the documentation of `heuristicsmineR` for more details.
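
As an illustration of the loop handling mentioned above, here is a minimal base R sketch of how length-one loops can be scored. For an activity that directly follows itself, the regular formula always yields 0, so the measure $|a > a| / (|a > a| + 1)$ is used on the diagonal instead. The count matrix is hypothetical, and the code is a sketch rather than the implementation used by `heuristicsmineR`:

```r
# Hypothetical directly-follows counts (rows: antecedent, columns: consequent)
acts <- c("a", "b", "c")
counts <- matrix(c(0, 10, 0,
                   1,  4, 8,
                   0,  0, 0),
                 nrow = 3, byrow = TRUE, dimnames = list(acts, acts))

# Regular dependency measure for pairs of different activities
dep <- (counts - t(counts)) / (counts + t(counts) + 1)

# Length-one loops: overwrite the diagonal with |a > a| / (|a > a| + 1)
diag(dep) <- diag(counts) / (diag(counts) + 1)

round(dep, 2)
```

Here `b` follows itself four times, so its loop score is 4 / 5 = 0.8, whereas the regular formula would have reported 0 for that cell.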

In [None]:
L_heur_1 %>% dependency_matrix(threshold = 0) %>% plot()

A dependency graph can be rendered from the dependency matrix:

In [None]:
L_heur_1 %>% 
   dependency_matrix(threshold = 0.8) %>% 
   render_dependency_matrix()

Have a look at the parameters (via `?dependency_matrix`) and try to change some of them to see what happens.

In [None]:
L_heur_1 %>% 
   dependency_matrix(threshold = 0.9) %>% 
   render_dependency_matrix()

In [None]:
sepsis %>% precedence_matrix() %>% plot()

In [None]:
sepsis %>% 
  dependency_matrix(threshold = 0.7) %>% 
  render_dependency_matrix()

In [None]:
sepsis %>% 
  dependency_matrix(threshold = 0.9) %>% 
  render_dependency_matrix()

### Causal Nets

In [None]:
L_heur_1 %>% 
  causal_net(threshold = 0.8) %>%
  render_causal_net()

In [None]:
sepsis %>%
  act_unite(Release = c("Release A", "Release B", "Release C", "Release D", "Release E")) %>%
  causal_net(all_connected = TRUE) %>%
  render_causal_net()

In [None]:
example_log %>% 
  causal_net() %>% 
  render_causal_net()

#### Visualise / Convert as BPMN 🚧

bupaR currently has no support for BPMN visualizations. However, a causal net can be converted into a Petri net. For simple process models, the mapping between BPMN and Petri nets is easy to understand, so we use Petri nets here.

In [None]:
L_heur_1 %>% 
    causal_net() %>%
    as.petrinet() %>%
    petrinetR::render_PN()

In [None]:
example_log %>% 
    causal_net() %>%
    as.petrinet() %>%
    petrinetR::render_PN()

**TODO** we could use the discovered Petri net with PM4Py to do further processing 🚧