## Convert XES format data into CSV for efficient import

The original event log files were downloaded in XES format. The PM4Py importer is extremely slow on large event logs, so each log is converted to CSV once for faster import afterwards.

This notebook reads an event log in XES format, converts it into a pandas dataframe, and exports it as a CSV file. No other operations are performed.
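To make the XES-to-table step concrete, here is a minimal standard-library sketch of the flattening idea: one row per event, with trace-level attributes repeated on every row under a `case:` prefix. It is not how PM4Py or `ordinor` implement the conversion. The helper `flatten_xes` and the tiny XES snippet are hypothetical, and the snippet omits the XML namespace and nested attributes that real XES files may contain.

```python
import csv
import io
import xml.etree.ElementTree as ET

# A tiny hand-written XES-like document (one trace, two events).
# Assumption: no XML namespace and only flat key/value attributes.
XES = """<log xes.version="1.0">
  <trace>
    <string key="concept:name" value="case-10011"/>
    <string key="channel" value="Internet"/>
    <event>
      <string key="concept:name" value="Confirmation of receipt"/>
      <string key="org:resource" value="Resource21"/>
      <date key="time:timestamp" value="2011-10-11T13:45:40.276+00:00"/>
    </event>
    <event>
      <string key="concept:name" value="T02 Check confirmation of receipt"/>
      <string key="org:resource" value="Resource10"/>
      <date key="time:timestamp" value="2011-10-12T08:26:25.398+00:00"/>
    </event>
  </trace>
</log>
"""


def flatten_xes(xml_text):
    """Return one dict per event, with trace attributes copied in as 'case:*'."""
    rows = []
    root = ET.fromstring(xml_text)
    for trace in root.iter("trace"):
        # Direct children of <trace> other than <event> are case attributes.
        case_attrs = {
            f"case:{child.get('key')}": child.get("value")
            for child in trace
            if child.tag != "event"
        }
        for event in trace.findall("event"):
            row = {child.get("key"): child.get("value") for child in event}
            row.update(case_attrs)
            rows.append(row)
    return rows


rows = flatten_xes(XES)

# Write the flattened rows as CSV text (to a string buffer here, not a file).
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```

The `case:` prefix follows the column naming visible in the dataframe output of this notebook (for example `case:concept:name` and `case:channel`).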

In [1]:
from os.path import join as path_join

import pandas as pd
import pm4py

import ordinor.constants as const
from ordinor.utils.converter import convert_to_dataframe

In [2]:
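LOGNAME = 'wabo'

# Path to the gzip-compressed XES file
fn = path_join('./data/DATA_RAW', f'{LOGNAME}.xes.gz')

# Parse the XES log with PM4Py, then convert it to a pandas dataframe
log = pm4py.read_xes(fn)
log = convert_to_dataframe(log)
log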

parsing log, completed traces ::   0%|          | 0/1434 [00:00<?, ?it/s]

Unnamed: 0,org:group,concept:instance,org:resource,concept:name,time:timestamp,lifecycle:transition,case:startdate,case:responsible,case:enddate_planned,case:department,case:group,case:concept:name,case:deadline,case:channel,case:enddate
0,Group 1,task-42933,Resource21,Confirmation of receipt,2011-10-11 13:45:40.276000+00:00,complete,2011-10-11 13:42:22.688000+00:00,Resource21,2011-12-06 13:41:31.788000+00:00,General,Group 2,case-10011,2011-12-06 13:41:31.788000+00:00,Internet,NaT
1,Group 4,task-42935,Resource10,T02 Check confirmation of receipt,2011-10-12 08:26:25.398000+00:00,complete,2011-10-11 13:42:22.688000+00:00,Resource21,2011-12-06 13:41:31.788000+00:00,General,Group 2,case-10011,2011-12-06 13:41:31.788000+00:00,Internet,NaT
2,Group 1,task-42957,Resource21,T03 Adjust confirmation of receipt,2011-11-24 15:36:51.302000+00:00,complete,2011-10-11 13:42:22.688000+00:00,Resource21,2011-12-06 13:41:31.788000+00:00,General,Group 2,case-10011,2011-12-06 13:41:31.788000+00:00,Internet,NaT
3,Group 4,task-47958,Resource21,T02 Check confirmation of receipt,2011-11-24 15:37:16.553000+00:00,complete,2011-10-11 13:42:22.688000+00:00,Resource21,2011-12-06 13:41:31.788000+00:00,General,Group 2,case-10011,2011-12-06 13:41:31.788000+00:00,Internet,NaT
4,EMPTY,task-43021,Resource30,Confirmation of receipt,2011-10-18 13:46:39.679000+00:00,complete,2011-10-11 01:06:40.020000+00:00,Resource04,2011-12-06 01:06:40.010000+00:00,General,Group 5,case-10017,2011-12-06 01:06:40+00:00,Internet,2011-10-18 13:56:55.943000+00:00
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
8572,Group 4,task-43560,Resource06,T02 Check confirmation of receipt,2011-10-18 09:04:48.732000+00:00,complete,2011-10-06 01:06:40.020000+00:00,Resource06,2011-12-01 01:06:40.010000+00:00,General,Group 5,case-9997,2011-12-01 01:06:40+00:00,Internet,2011-10-20 14:19:44.448000+00:00
8573,Group 3,task-43562,Resource06,T04 Determine confirmation of receipt,2011-10-18 09:05:12.359000+00:00,complete,2011-10-06 01:06:40.020000+00:00,Resource06,2011-12-01 01:06:40.010000+00:00,General,Group 5,case-9997,2011-12-01 01:06:40+00:00,Internet,2011-10-20 14:19:44.448000+00:00
8574,Group 2,task-43563,Resource06,T05 Print and send confirmation of receipt,2011-10-18 09:05:30.196000+00:00,complete,2011-10-06 01:06:40.020000+00:00,Resource06,2011-12-01 01:06:40.010000+00:00,General,Group 5,case-9997,2011-12-01 01:06:40+00:00,Internet,2011-10-20 14:19:44.448000+00:00
8575,Group 1,task-43561,Resource06,T06 Determine necessity of stop advice,2011-10-18 09:06:01.468000+00:00,complete,2011-10-06 01:06:40.020000+00:00,Resource06,2011-12-01 01:06:40.010000+00:00,General,Group 5,case-9997,2011-12-01 01:06:40+00:00,Internet,2011-10-20 14:19:44.448000+00:00


In [3]:
fnout = path_join('./data/DATA_csv', f'{LOGNAME}.csv')

# Note: to_csv() also writes the dataframe index as an unnamed first column
log.to_csv(fnout)
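When the exported file is loaded again, two details are easy to miss: the saved index comes back as an `Unnamed: 0` column unless it is declared as the index, and timestamp columns come back as plain strings. The sketch below shows one way to handle both. It uses two toy rows modelled on the output above rather than the real file, and the choice of which columns to parse as dates is an assumption.

```python
import io

import pandas as pd

# Toy CSV text in the shape DataFrame.to_csv() produces by default: the
# first header cell is empty because the index is written without a name.
csv_text = (
    ",concept:name,time:timestamp,case:concept:name,case:enddate\n"
    "0,Confirmation of receipt,2011-10-11 13:45:40.276000+00:00,case-10011,\n"
    "1,Confirmation of receipt,2011-10-18 13:46:39.679000+00:00,case-10017,"
    "2011-10-18 13:56:55.943000+00:00\n"
)

# index_col=0 restores the saved index instead of creating "Unnamed: 0".
df = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Convert the timestamp columns explicitly. utc=True yields timezone-aware
# values, and empty cells become NaT.
for col in ("time:timestamp", "case:enddate"):
    df[col] = pd.to_datetime(df[col], utc=True)

print(df.dtypes)
```

For the real export, `io.StringIO(csv_text)` would be replaced by the path held in `fnout`.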