# Importing Your First Event Log
*This is a compressed tutorial, based on https://pm4py.fit.fraunhofer.de/getting-started-page*

Process mining exploits event logs to generate knowledge about a process.
A wide variety of information systems, e.g., SAP, Oracle, and Salesforce, allow us to extract, in one way or another,
event logs similar to the example event logs.
All the examples we show in this notebook and all algorithms implemented in pm4py assume that we have already extracted
the event data into an appropriate event log format.
Hence, the core of pm4py does not support any data extraction features.

In order to support interoperability between different process mining tools and libraries, two standard data formats are
used to capture event logs: Comma Separated Value (CSV) files and eXtensible Event Stream (XES) files.
CSV files resemble the example tables shown in the previous section, i.e., Table 1 and Table 2. Each line in such a file
describes an event that occurred. The columns represent the same type of data, as shown in the examples, e.g., the case
for which the event occurred, the activity, the timestamp, the resource executing the activity, etc.
The XES file format is an XML-based format that allows us to describe process behavior.
We will not go into specific details w.r.t. the format of XES files; instead, we refer to http://xes-standard.org/ for an
overview.
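
To make the CSV layout concrete, here is a minimal sketch that reads a few events with Python's standard `csv` module. The three rows are copied from the sample log used in this tutorial, and the in-memory string `raw` is only a stand-in for a real file; the semicolon delimiter is an assumption that matches how the sample file is loaded.

```python
import csv
import io

# One line per event, one column per event attribute.
# `raw` stands in for a CSV file on disk (an assumption for this sketch).
raw = """case:concept:name;concept:name;time:timestamp;org:resource
3;register request;2010-12-30 14:32:00+01:00;Pete
3;examine casually;2010-12-30 15:06:00+01:00;Mike
2;register request;2010-12-30 11:32:00+01:00;Mike
"""

# DictReader uses the header line as the keys of each event.
events = list(csv.DictReader(io.StringIO(raw), delimiter=";"))

for event in events:
    print(event["case:concept:name"], event["concept:name"], event["time:timestamp"])
```

Each dictionary corresponds to exactly one recorded event, which is the property that makes CSV a convenient exchange format for event logs.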

In this tutorial, we will use a frequently used dummy example event log to explain the basic process mining operations.
The process that we are considering is a simplified process related to customer complaint handling, taken from the
book of van der Aalst (https://www.springer.com/de/book/9783662498507). The process, and the event data we are going to
use, look as follows.

![Running example BPMN-based process model describing the behavior of the simple process that we use in this tutorial](files/bpmn_running_example.png)

Let’s get started!
We have prepared a small sample event log, containing behavior similar to the process model in Figure 3.
You can find the sample event log [here](running-example.csv).

We are going to load the event data and count how many cases are present in the event log, as well as
the number of events. Note that, for all this, we are effectively using a third-party library called pandas.
We do so because pandas is the de facto standard for loading and manipulating CSV-based data.
Hence, any process mining algorithm implemented in pm4py that takes an event log as input can work directly with a
pandas DataFrame!


In [3]:
import pandas
log = pandas.read_csv('running-example.csv', sep=';')
log

| Unnamed: 0 | case:concept:name | concept:name | time:timestamp | costs | org:resource |
|---|---|---|---|---|---|
| 0 | 3 | register request | 2010-12-30 14:32:00+01:00 | 50 | Pete |
| 1 | 3 | examine casually | 2010-12-30 15:06:00+01:00 | 400 | Mike |
| 2 | 3 | check ticket | 2010-12-30 16:34:00+01:00 | 100 | Ellen |
| 3 | 3 | decide | 2011-01-06 09:18:00+01:00 | 200 | Sara |
| 4 | 3 | reinitiate request | 2011-01-06 12:18:00+01:00 | 200 | Sara |
| 5 | 3 | examine thoroughly | 2011-01-06 13:06:00+01:00 | 400 | Sean |
| 6 | 3 | check ticket | 2011-01-08 11:43:00+01:00 | 100 | Pete |
| 7 | 3 | decide | 2011-01-09 09:55:00+01:00 | 200 | Sara |
| 8 | 3 | pay compensation | 2011-01-15 10:45:00+01:00 | 200 | Ellen |
| 9 | 2 | register request | 2010-12-30 11:32:00+01:00 | 50 | Mike |
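
The counting step mentioned earlier can be sketched as follows. To keep the example runnable without the file, the ten rows displayed above are embedded as a string; this is an assumption of the sketch, so the counts describe this excerpt only, not the full `running-example.csv`.

```python
import io
import pandas

# The ten displayed rows, embedded so the sketch runs without the file.
# The empty first header field is what pandas shows as "Unnamed: 0".
raw = """;case:concept:name;concept:name;time:timestamp;costs;org:resource
0;3;register request;2010-12-30 14:32:00+01:00;50;Pete
1;3;examine casually;2010-12-30 15:06:00+01:00;400;Mike
2;3;check ticket;2010-12-30 16:34:00+01:00;100;Ellen
3;3;decide;2011-01-06 09:18:00+01:00;200;Sara
4;3;reinitiate request;2011-01-06 12:18:00+01:00;200;Sara
5;3;examine thoroughly;2011-01-06 13:06:00+01:00;400;Sean
6;3;check ticket;2011-01-08 11:43:00+01:00;100;Pete
7;3;decide;2011-01-09 09:55:00+01:00;200;Sara
8;3;pay compensation;2011-01-15 10:45:00+01:00;200;Ellen
9;2;register request;2010-12-30 11:32:00+01:00;50;Mike
"""
log = pandas.read_csv(io.StringIO(raw), sep=";")

# Every row is one event; every distinct case identifier is one case.
num_events = len(log)
num_cases = log["case:concept:name"].nunique()

print(f"Number of events: {num_events}")  # prints: Number of events: 10
print(f"Number of cases: {num_cases}")    # prints: Number of cases: 2
```

With the real file, replacing `io.StringIO(raw)` by `'running-example.csv'` should give the counts for the complete log.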


Let's inspect the small event log.
The first line (i.e., row) specifies the name of each column (i.e., event attribute).
Observe that, apart from the unnamed index column (displayed as *Unnamed: 0*), the data table described by the file has
5 columns: *case:concept:name*, *concept:name*, *time:timestamp*, *costs* and *org:resource*.
The first of these (*case:concept:name*) represents the *case identifier*, which allows us to identify which activity
has been logged in the context of which instance of the process.
The second column (*concept:name*) shows the activity that has been performed.
The third column shows at what point in time the activity was recorded (*time:timestamp*).
In this example data, additional information is present as well.
In this case, the fourth column tracks the costs of the activity (*costs* attribute), whereas the fifth column tracks which
resource has performed the activity (*org:resource*).
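
As a small illustration of working with these columns, the sketch below converts *time:timestamp* from text into proper datetimes, since `read_csv` does not parse it automatically unless asked to. The two embedded rows are taken from the sample log; converting to UTC via `utc=True` is a choice made for this sketch, not something the tutorial prescribes.

```python
import io
import pandas

# Two events from the sample log, embedded so the sketch is self-contained.
raw = """case:concept:name;concept:name;time:timestamp;costs;org:resource
3;register request;2010-12-30 14:32:00+01:00;50;Pete
3;examine casually;2010-12-30 15:06:00+01:00;400;Mike
"""
log = pandas.read_csv(io.StringIO(raw), sep=";")

# Turn the timestamp strings into timezone-aware datetimes (normalized to UTC).
log["time:timestamp"] = pandas.to_datetime(log["time:timestamp"], utc=True)

# With real datetimes we can compute the time between two events.
elapsed = log["time:timestamp"].iloc[1] - log["time:timestamp"].iloc[0]

print(log.dtypes)
print(elapsed)
```

Once the column holds datetimes, sorting events chronologically and computing durations become ordinary pandas operations.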

Observe that rows 2-10 (counting the header as row 1) show the events that have been recorded for the process instance
identified by case identifier 3.
We observe that first a register request activity was performed, followed by the examine casually, check ticket, decide,
reinitiate request, examine thoroughly, check ticket, decide, and finally, pay compensation activities.
Note that, in this case, the recorded process instance behaves as described by the model depicted in Figure 3.
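
The sequence of activities for case 3 can also be recovered programmatically. The following is a minimal sketch, assuming the displayed rows are embedded as a string (only the three columns needed here are kept): it filters the events of one case, orders them by timestamp, and lists the activities.

```python
import io
import pandas

# Events from the displayed excerpt, reduced to case, activity, and timestamp.
raw = """case:concept:name;concept:name;time:timestamp
3;register request;2010-12-30 14:32:00+01:00
3;examine casually;2010-12-30 15:06:00+01:00
3;check ticket;2010-12-30 16:34:00+01:00
3;decide;2011-01-06 09:18:00+01:00
3;reinitiate request;2011-01-06 12:18:00+01:00
3;examine thoroughly;2011-01-06 13:06:00+01:00
3;check ticket;2011-01-08 11:43:00+01:00
3;decide;2011-01-09 09:55:00+01:00
3;pay compensation;2011-01-15 10:45:00+01:00
2;register request;2010-12-30 11:32:00+01:00
"""
log = pandas.read_csv(io.StringIO(raw), sep=";")
log["time:timestamp"] = pandas.to_datetime(log["time:timestamp"], utc=True)

# Select the events of case 3 and order them chronologically.
case_3 = log[log["case:concept:name"] == 3].sort_values("time:timestamp")

# The ordered list of activities is the trace of this case.
trace = case_3["concept:name"].tolist()
print(" -> ".join(trace))
```

Sorting by timestamp before reading off the activities matters in general, because a CSV file is not guaranteed to list the events of a case in the order they happened.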
