
In this notebook, I am learning process mining by working through a **healthcare** industry example.

In [2]:
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# IPython display helpers
from IPython.display import display, Markdown

In [3]:
# Function definition: render a Markdown string in the cell output
def printmd(string):
  display(Markdown(string))

In [None]:
#https://gitlab.com/healthcare2/process-mining-tutorial
#https://medium.com/@c3_62722/process-mining-with-python-tutorial-a-healthcare-application-part-2-4cf57053421f

In [5]:
data_path = 'https://gitlab.com/healthcare2/process-mining-tutorial/-/raw/master/ArtificialPatientTreatment.csv'
events = pd.read_csv(data_path)
events.head()

Unnamed: 0,patient,action,org:resource,DateTime
0,patient 0,First consult,Dr. Anna,2017-01-02 11:40:11
1,patient 0,Blood test,Lab,2017-01-02 12:47:33
2,patient 0,Physical test,Nurse Jesse,2017-01-02 12:53:50
3,patient 0,Second consult,Dr. Anna,2017-01-02 16:21:06
4,patient 0,Surgery,Dr. Charlie,2017-01-05 13:23:09


In [6]:
# Rename the columns to shorter, consistent names
events.columns = ['patient', 'action', 'resources', 'datetime']

# Parse the event timestamps as datetimes
events['datetime'] = pd.to_datetime(events['datetime'])
events.head()

Unnamed: 0,patient,action,resources,datetime
0,patient 0,First consult,Dr. Anna,2017-01-02 11:40:11
1,patient 0,Blood test,Lab,2017-01-02 12:47:33
2,patient 0,Physical test,Nurse Jesse,2017-01-02 12:53:50
3,patient 0,Second consult,Dr. Anna,2017-01-02 16:21:06
4,patient 0,Surgery,Dr. Charlie,2017-01-05 13:23:09


In [7]:
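# Get the case start and end time for each patient
case_start_ends = events.pivot_table(index='patient', aggfunc={'datetime':['min', 'max']})
case_start_ends = case_start_ends.reset_index()
# The aggregated columns come back in the order max, min (see the output below),
# so the end time ('casened') is named before the start time ('casestart')
case_start_ends.columns = ['patient', 'casened', 'casestart']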
case_start_ends.head()



Unnamed: 0,patient,casened,casestart
0,patient 0,2017-01-09 08:29:28,2017-01-02 11:40:11
1,patient 1,2017-01-06 16:49:21,2017-01-02 12:50:35
2,patient 10,2017-01-30 11:19:19,2017-01-17 14:13:17
3,patient 11,2017-02-02 10:13:13,2017-01-19 13:35:20
4,patient 12,2017-01-27 11:18:57,2017-01-20 11:43:38
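
The column order returned by `pivot_table` is easy to get backwards when renaming by position. As a hedged alternative, named aggregation with `groupby(...).agg(...)` ties each output column to its aggregate by name. The sketch below runs on a small hand-typed subset of the patient 0 rows shown earlier, so its end time is the last row of that subset rather than the real case end. `toy_events` and `case_bounds` are illustrative names that do not appear in the notebook; the `casened` / `casestart` column names are kept as the notebook spells them.

```python
import pandas as pd

# Toy data: three patient 0 rows copied from the output above (illustrative subset).
toy_events = pd.DataFrame({
    'patient': ['patient 0', 'patient 0', 'patient 0'],
    'action': ['First consult', 'Blood test', 'Surgery'],
    'datetime': pd.to_datetime([
        '2017-01-02 11:40:11',
        '2017-01-02 12:47:33',
        '2017-01-05 13:23:09',
    ]),
})

# Named aggregation: each output column is bound to its aggregate by name,
# so the result does not depend on how pandas orders the columns.
case_bounds = (
    toy_events.groupby('patient')['datetime']
    .agg(casestart='min', casened='max')
    .reset_index()
)
print(case_bounds)
```

With this form there is no positional rename step, so swapping the start and end labels by accident is harder.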


In [8]:
# Merge the two dataframes so every event row carries its patient's case start and end time
events = events.merge(case_start_ends, on='patient')
# Time elapsed since the start of the patient's case
events['relativetime'] = events['datetime'] - events['casestart']
# Remove stray leading/trailing whitespace from the action labels
events['action'] = events['action'].str.strip()
events.head()

Unnamed: 0,patient,action,resources,datetime,casened,casestart,relativetime
0,patient 0,First consult,Dr. Anna,2017-01-02 11:40:11,2017-01-09 08:29:28,2017-01-02 11:40:11,0 days 00:00:00
1,patient 0,Blood test,Lab,2017-01-02 12:47:33,2017-01-09 08:29:28,2017-01-02 11:40:11,0 days 01:07:22
2,patient 0,Physical test,Nurse Jesse,2017-01-02 12:53:50,2017-01-09 08:29:28,2017-01-02 11:40:11,0 days 01:13:39
3,patient 0,Second consult,Dr. Anna,2017-01-02 16:21:06,2017-01-09 08:29:28,2017-01-02 11:40:11,0 days 04:40:55
4,patient 0,Surgery,Dr. Charlie,2017-01-05 13:23:09,2017-01-09 08:29:28,2017-01-02 11:40:11,3 days 01:42:58
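
As a possible shortcut, the time since case start can also be computed without the intermediate merge, by using `groupby(...).transform('min')` to broadcast each patient's first timestamp back onto that patient's rows. This is a minimal sketch on the five patient 0 rows shown above; `toy_events` and `case_start` are illustrative names, and this is an alternative to the merge, not the approach the notebook takes.

```python
import pandas as pd

# Toy data: the five patient 0 rows copied from the output above.
toy_events = pd.DataFrame({
    'patient': ['patient 0'] * 5,
    'action': ['First consult', 'Blood test', 'Physical test',
               'Second consult', 'Surgery'],
    'datetime': pd.to_datetime([
        '2017-01-02 11:40:11',
        '2017-01-02 12:47:33',
        '2017-01-02 12:53:50',
        '2017-01-02 16:21:06',
        '2017-01-05 13:23:09',
    ]),
})

# transform('min') returns a Series aligned with the original rows,
# holding each patient's earliest timestamp on every one of their events.
case_start = toy_events.groupby('patient')['datetime'].transform('min')
toy_events['relativetime'] = toy_events['datetime'] - case_start
print(toy_events[['action', 'relativetime']])
```

The merge used in the notebook has the advantage of also keeping the case end time on every row, which the shortcut does not provide.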


In [9]:
# Create a column for action sequence
delimiter = '___'

# Join a patient's actions into a single delimited string
def nameEventString(x):
  return delimiter.join(x)

# Count the number of events in a case
def numEvents(x):
  return len(x)

# caselogs = events.pivot
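
The last cell stops at a truncated `caselogs` line, so the following is only a guess at how the two helpers might be applied, not the author's code. Assuming the goal is one row per patient holding the joined action sequence and the event count, a sketch using `groupby(...).agg(...)` on the five patient 0 rows shown earlier could look like this. `toy_events`, `case_log`, `join_actions`, `trace`, and `n_events` are hypothetical names.

```python
import pandas as pd

delimiter = '___'

# Toy data: the five patient 0 rows copied from the earlier output.
toy_events = pd.DataFrame({
    'patient': ['patient 0'] * 5,
    'action': ['First consult', 'Blood test', 'Physical test',
               'Second consult', 'Surgery'],
    'datetime': pd.to_datetime([
        '2017-01-02 11:40:11',
        '2017-01-02 12:47:33',
        '2017-01-02 12:53:50',
        '2017-01-02 16:21:06',
        '2017-01-05 13:23:09',
    ]),
})

def join_actions(actions):
    """Join one patient's actions into a single delimited trace string."""
    return delimiter.join(actions)

# Sort by time first so each trace lists the actions in the order they happened,
# then build one row per patient with the trace and the number of events.
case_log = (
    toy_events.sort_values('datetime')
    .groupby('patient')['action']
    .agg(trace=join_actions, n_events='count')
    .reset_index()
)
print(case_log)
```

Patients with identical `trace` strings followed the same path through the process, which is one common way to group cases into variants.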