---
#### WARNING: This notebook only supports transforming CSV files into XES files within the specifications of our project.
#### If you'd like to use it to convert different data, you will most likely have to edit this notebook as well as both csv_*.py modules located in src (mostly to change the files that are imported)
---
Step 1: Declare imports, set up the dictionaries and sets to save data into, and open the files

In [1]:
import src.csv_filehandlingmodule as fhm
from lxml import etree as lxmletree
import os

fileManager = fhm.FileManager()

brands = dict()
channels = dict()
devices = dict()
sims = dict()
tariffs = dict()
existing_case_ids = set()
events = set()
main_files = [fileManager.f0, fileManager.f1, fileManager.f2, fileManager.f3, fileManager.f4, fileManager.f5,
              fileManager.f6, fileManager.f7]

File 0 opening succeeded
File 1 opening succeeded
File 2 opening succeeded
File 3 opening succeeded
File 4 opening succeeded
File 5 opening succeeded
File 6 opening succeeded
File 7 opening succeeded
File brands opening succeeded
File channels opening succeeded
File devices opening succeeded
File sim opening succeeded
File tariffs opening succeeded


Step 2: Read files and add auxiliary data to events

In [2]:
# read all auxiliary data into the proper dictionaries
fileManager.read_aux_data(fileManager.fbrands, brands)
fileManager.read_aux_data(fileManager.fchannels, channels)
fileManager.read_aux_data(fileManager.fdevices, devices)
fileManager.read_aux_data(fileManager.fsim, sims)
fileManager.read_aux_data(fileManager.ftariffs, tariffs)

# read main files data
for fcounter, file in enumerate(main_files):
    fileManager.read_data(file, events, existing_case_ids, brands, channels, devices, sims, tariffs)
    print("File no. {} read".format(fcounter))
print("Files read successfully")

# close files
fileManager.close_all()
print("Closed all files")

226400
File no. 0 read
227471
File no. 1 read
227749
File no. 2 read
227370
File no. 3 read
227154
File no. 4 read
227288
File no. 5 read
226812
File no. 6 read
228291
File no. 7 read
Files read successfully
Closed all files
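The helper `read_aux_data` lives in the project's csv_filehandlingmodule and is not shown here. To give a rough idea of what loading one lookup table into a dictionary involves, here is a minimal sketch using Python's `csv` module. The semicolon delimiter, the header row and the two-column ID/label layout are assumptions, not the project's actual file format.

```python
import csv
import io
from typing import Dict, TextIO


def read_aux_data(handle: TextIO, target: Dict[str, str], delimiter: str = ";") -> None:
    """Fill `target` with ID -> label pairs from a two-column CSV lookup file.

    Hypothetical stand-in for FileManager.read_aux_data: assumes a header row,
    the ID in the first column and the descriptive label in the second.
    """
    reader = csv.reader(handle, delimiter=delimiter)
    next(reader, None)  # skip the header row (assumed to exist)
    for row in reader:
        if len(row) < 2:
            continue  # ignore blank or malformed lines
        target[row[0].strip()] = row[1].strip()


# usage with an in-memory file instead of a real lookup CSV
brands = {}
read_aux_data(io.StringIO("id;brand\n1;Acme\n2;Globex\n"), brands)
print(brands)  # {'1': 'Acme', '2': 'Globex'}
```

Keying the dictionaries by ID is what allows the main-file pass to attach auxiliary data to each event with a constant-time lookup.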


Step 3: Create the XML structure as defined by the XES standard

In [3]:
# build the XML tree
print("Creating XML structure")
# root element (log)
root = lxmletree.Element("log", {"xes.version": "2.0", "xes.features": "",
                           "xmlns": "http://www.xes-standard.org/"})
# string "creator"
strCreator = lxmletree.SubElement(root, "string", {"key": "creator", "value": "PSE_HDA_Bohling_Ehlers"})
# extension "concept"
extConcept = lxmletree.SubElement(root, "extension", {"name": "Concept", "prefix": "concept",
                                                "uri": "http://www.xes-standard.org/concept.xesext"})
# extension "time"
extTime = lxmletree.SubElement(root, "extension", {"name": "Time", "prefix": "time",
                                             "uri": "http://www.xes-standard.org/time.xesext"})
# global event definitions (required fields)
globalEvent = lxmletree.SubElement(root, "global", {"scope": "event"})
strEventTraceID = lxmletree.SubElement(globalEvent, "string", {"key": "concept:instance", "value": "0"})
strEventName = lxmletree.SubElement(globalEvent, "string", {"key": "concept:name", "value": ""})
# the default value must be a valid xs:dateTime, the format XES uses for date attributes
dateEventTimestamp = lxmletree.SubElement(globalEvent, "date", {"key": "time:timestamp",
                                                          "value": "1970-01-01T00:00:00.000+00:00"})
# classifier (composed key to identify events)
classifierEvent = lxmletree.SubElement(root, "classifier", {"name": "activity", "keys": "concept:instance concept:name"})
# XES requires events to sit inside a trace, so all events go into a single trace element;
# this program does not group them by trace (case) ID
trace = lxmletree.SubElement(root, "trace")
# now create all events
for event in events:
    event.xmlify(trace)
print(len(events))
print("XML structure creation complete")

Creating XML structure
1405638
XML structure creation complete
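Each event is turned into XML by its own `xmlify(trace)` method, defined in the src modules and not shown here. The sketch below is a hypothetical stand-in: an immutable `Event` dataclass (frozen, hence hashable, which is what a set such as `events` requires and how it drops exact duplicates) that appends an `<event>` element carrying the three attributes declared in the global block. It uses the standard library's `xml.etree.ElementTree`, whose `Element`/`SubElement` calls match the lxml ones used above; the field names and the ISO 8601 timestamp format are assumptions.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    """Hypothetical event record; frozen so instances are hashable and can live in a set."""
    case_id: str
    activity: str
    timestamp: datetime

    def xmlify(self, trace: ET.Element) -> ET.Element:
        """Append this event as an XES <event> child of `trace` and return it."""
        event = ET.SubElement(trace, "event")
        ET.SubElement(event, "string", {"key": "concept:instance", "value": self.case_id})
        ET.SubElement(event, "string", {"key": "concept:name", "value": self.activity})
        # XES date attributes use the xs:dateTime format, e.g. 2019-03-01T08:15:00
        ET.SubElement(event, "date", {"key": "time:timestamp",
                                      "value": self.timestamp.isoformat()})
        return event


events = {
    Event("42", "order placed", datetime(2019, 3, 1, 8, 15)),
    Event("42", "order placed", datetime(2019, 3, 1, 8, 15)),  # exact duplicate, dropped by the set
    Event("42", "sim activated", datetime(2019, 3, 2, 9, 0)),
}
trace = ET.Element("trace")
for event in events:
    event.xmlify(trace)
print(len(trace))  # 2
```

Because a set has no defined order, the events end up in the trace in arbitrary order; the `time:timestamp` attribute is what lets downstream tools restore the chronology.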


Step 4: Create the output file, write the data into it, and close it
Be aware that the output file is roughly 15 times larger than the CSV input data!

In [4]:
output_path = "data/xes/events.xes"
print("Removing old output file if it exists and creating new one")
if os.path.isfile(output_path):
    os.remove(output_path)
# the with-block closes the file even if writing fails
with open(output_path, "wb") as newfile:
    print("Writing to output file")
    newfile.write(lxmletree.tostring(root, pretty_print=True, xml_declaration=True, encoding='UTF-8'))
    print("Closing output file")
print("Output file closed")

Removing old output file if it exists and creating new one
Writing to output file
Closing output file
Output file closed
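To sanity-check the written file without loading all of it back into memory at once, one option is to stream it with `xml.etree.ElementTree.iterparse` and count the `<event>` elements. This is a minimal sketch, not part of the project's modules; run on data/xes/events.xes, the count should match the number printed in Step 3.

```python
import io
import xml.etree.ElementTree as ET
from typing import BinaryIO, Union


def count_xes_events(source: Union[str, BinaryIO]) -> int:
    """Stream through an XES file and count its <event> elements.

    Tags are compared by local name, so the count works whether or not the
    XES default namespace (http://www.xes-standard.org/) is present.
    """
    count = 0
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag.rsplit("}", 1)[-1] == "event":
            count += 1
            elem.clear()  # drop the finished event's children to limit memory use
    return count


# usage with a tiny in-memory XES document; pass "data/xes/events.xes" for the real file
sample = (b'<?xml version="1.0" encoding="UTF-8"?>\n'
          b'<log xmlns="http://www.xes-standard.org/"><trace><event/><event/></trace></log>')
print(count_xes_events(io.BytesIO(sample)))  # 2
```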


The finished file is written to data/xes/events.xes, relative to the notebook's working directory.
Please be aware that NO data manipulation whatsoever takes place in this notebook.
The events are NOT grouped by their actual trace, but are put into a single "default" trace node.
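Grouping the events into proper traces would mean bucketing them by case ID, ordering each bucket by time, and writing one `<trace>` per bucket. The sketch below shows only the grouping step; it is not part of this notebook, and the `Event` fields are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, List, NamedTuple


class Event(NamedTuple):
    """Hypothetical flat event: the case ID decides which trace it belongs to."""
    case_id: str
    activity: str
    timestamp: datetime


def group_into_traces(events: Iterable[Event]) -> Dict[str, List[Event]]:
    """Bucket events by case ID and order each bucket chronologically."""
    traces: Dict[str, List[Event]] = defaultdict(list)
    for event in events:
        traces[event.case_id].append(event)
    for case_events in traces.values():
        case_events.sort(key=lambda e: e.timestamp)
    return dict(traces)


events = [
    Event("A", "sim activated", datetime(2019, 3, 2, 9, 0)),
    Event("B", "order placed", datetime(2019, 3, 1, 10, 0)),
    Event("A", "order placed", datetime(2019, 3, 1, 8, 15)),
]
for case_id, case_events in sorted(group_into_traces(events).items()):
    print(case_id, [e.activity for e in case_events])
# A ['order placed', 'sim activated']
# B ['order placed']
```

Each resulting list could then be written into its own `<trace>` element carrying a `concept:name` string with the case ID, instead of the single default trace used here.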

