# Trace Graphs Theory
June 1, 2021 - jgil@eso.org

## Definitions

An alphabet $\Sigma$ is a finite set of symbols. 

A **trace** $T$ is a finite sequence of symbols $e \in \Sigma$ and can be written as $T=(e_i)_{i=N}$, where the length of $T$ is $|T|=N$. An empty trace, where $|T|=0$, is denoted as $\varepsilon$. A **log** is a set of traces.

Note: In formal languages, a trace is a *word* and a log is a *language*. See Ref 1.

## Trace operations
### Concatenation

The concatenation of $T=(t_i)_{i=N}$ after $S=(s_i)_{i=M}$ simply appends $T$ after $S$:  $U = ST = (s_i)_{i=M}(t_i)_{i=N}$, so $|U| = M + N$.

The n-concatenation of a trace $S$ with itself is denoted as $S^n=SS \dots SS$ ($n$ copies of $S$), $n \ge 0$, with $S^0=\varepsilon$.
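
As a minimal sketch of these two operations, the snippet below models a trace as a Python string in which each character is one symbol. That representation and the helper names `concat` and `power` are assumptions made for illustration only. It also assumes that $S^0$ is the empty trace, which is consistent with $n \ge 0$.

```python
EPSILON = ""  # the empty trace (assumed representation: a string of symbols)


def concat(s: str, t: str) -> str:
    """Return the concatenation ST: the symbols of s followed by those of t."""
    return s + t


def power(s: str, n: int) -> str:
    """Return the n-concatenation S^n; S^0 is taken to be the empty trace."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return s * n


print(concat("ab", "cde"))        # abcde
print(power("ab", 3))             # ababab
print(power("ab", 0) == EPSILON)  # True
```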

### Remotion

The **remotion** of a trace $S$ from a trace $T$, written $T / S$, is the trace $T$ with every occurrence of each symbol $s \in S$ removed. Symbols of $S$ that do not occur in $T$ have no effect.

Examples:

1. $T=aBCde$, $S=BC$, $T/S=ade$
1. $T=aBCde$, $S=BCxyz$, $T/S=ade$
1. $T=abcabcabc$, $S=abc$, $T/S=\varepsilon$
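
The three examples above can be reproduced with a short sketch. Traces are again modelled as Python strings with one character per symbol, and the function name `remotion` is a hypothetical helper, not part of any existing library.

```python
def remotion(t: str, s: str) -> str:
    """Return T / S: trace t with every symbol that occurs in s removed."""
    drop = set(s)  # only the set of symbols of s matters, not their order
    return "".join(e for e in t if e not in drop)


print(remotion("aBCde", "BC"))           # ade
print(remotion("aBCde", "BCxyz"))        # ade
print(remotion("abcabcabc", "abc") == "")  # True
```

Because only the set of symbols in $S$ is used, the order and multiplicity of symbols in $S$ do not change the result.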

### Mixing of traces

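The **mix** of two traces, $T_1 * T_2$, is the set of all traces obtained by interleaving an arbitrary number of repetitions of $T_1$ with an arbitrary number of repetitions of $T_2$ (either number may be zero), while preserving the original order of the symbols of each trace. Equivalently, it is the set of all interleavings of $T_1^n$ and $T_2^m$ for $n, m \ge 0$.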

Examples:

For $T_1=ade$, $T_2=BC$, all the following are valid mixings:

$\begin{split}
\varepsilon& \in T_1 * T_2 \\
BC         & \in T_1 * T_2 \\
adeBC      & \in T_1 * T_2 \\
aBdCe      & \in T_1 * T_2 \\
adeBCBCBC  & \in T_1 * T_2 \\
BCadeade   & \in T_1 * T_2 \\
aBdCeadeBC & \in T_1 * T_2
\end{split}$

**Property** For disjoint traces $S$ and $T$ (they have no common symbols) and $\forall U \in (T * S)$, $\exists n \ge 0$ such that $U / S = T^n$.
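
A membership test for the mix can be sketched as follows. The sketch assumes that a mixing is an interleaving of whole repetitions of each trace, which matches every example listed above, and it models traces as Python strings. The names `in_mix`, `remotion`, and `is_power` are hypothetical helpers. The last loop checks that removing the symbols of one trace from a mixing of two disjoint traces leaves whole repetitions of the other trace.

```python
def in_mix(u: str, t1: str, t2: str) -> bool:
    """Check whether u interleaves t1^n and t2^m for some n, m >= 0."""
    # A state (i, j) records the position inside the current copy of t1 and t2.
    states = {(0, 0)}
    for e in u:
        nxt = set()
        for i, j in states:
            if t1 and t1[i] == e:  # e continues the current copy of t1
                nxt.add(((i + 1) % len(t1), j))
            if t2 and t2[j] == e:  # e continues the current copy of t2
                nxt.add((i, (j + 1) % len(t2)))
        states = nxt
        if not states:
            return False
    # Accept only if every started copy has been completed.
    return (0, 0) in states


def remotion(t: str, s: str) -> str:
    """Return t with every symbol that occurs in s removed."""
    drop = set(s)
    return "".join(e for e in t if e not in drop)


def is_power(x: str, t: str) -> bool:
    """Check whether x == t^n for some n >= 0 (t must be non-empty)."""
    reps, rest = divmod(len(x), len(t))
    return rest == 0 and x == t * reps


T1, T2 = "ade", "BC"
examples = ["", "BC", "adeBC", "aBdCe", "adeBCBCBC", "BCadeade", "aBdCeadeBC"]
for u in examples:
    assert in_mix(u, T1, T2)
    assert is_power(remotion(u, T2), T1)  # only repetitions of T1 remain

print(in_mix("aBC", T1, T2))  # False: the copy of T1 is never completed
```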


## Sequences in a Trace

A **sequence** $S=(e_i)_{i=N}$ is a trace where $|S|>0$ and all its symbols are different: $e_i \neq e_j$ for $i \neq j$.

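$S$ **is a sequence in a trace** $T$ if it is contained in the trace, that is, if keeping only the symbols of $S$ in $T$ leaves one or more repetitions of $S$: $\exists m \ge 1$ such that $T / (T / S) = S^m$. Here $T / S$ holds exactly the symbols of $T$ that are not in $S$, so removing them from $T$ keeps only the symbols of $S$.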

For example, if $T=abcXYabc$, the following are all sequences in $T$: $abc$, $ab$, $ac$, $XY$, $X$, $Y$
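
The example can be checked with a small sketch. It reads "contained in the trace" as: keeping only the symbols of $S$ in $T$ leaves one or more whole repetitions of $S$. This reading is an assumption, chosen because it reproduces the example above. Traces are modelled as Python strings, and `is_sequence` and `is_sequence_in` are hypothetical helper names.

```python
def is_sequence(s: str) -> bool:
    """A sequence is a non-empty trace whose symbols are all different."""
    return len(s) > 0 and len(set(s)) == len(s)


def is_sequence_in(s: str, t: str) -> bool:
    """Check whether the sequence s is contained in the trace t."""
    if not is_sequence(s):
        return False
    keep = set(s)
    projected = "".join(e for e in t if e in keep)  # drop symbols outside s
    reps, rest = divmod(len(projected), len(s))
    return reps >= 1 and rest == 0 and projected == s * reps


T = "abcXYabc"
for s in ["abc", "ab", "ac", "XY", "X", "Y"]:
    assert is_sequence_in(s, T)

print(is_sequence_in("ba", T))  # False: the order of a and b is reversed
print(is_sequence_in("aX", T))  # False: "aXa" is not a repetition of "aX"
```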

A sequence in $T$ is **maximal** if it cannot be obtained by mixing other sequences. The set of maximal sequences of a trace is denoted as $\mathcal{S}(T)$.

If all sequences in $\mathcal{S}(T)$ are pairwise disjoint (no symbols in common), $T$ is a **serial trace**. Otherwise, $T$ is a **concurrent trace**.

### Maximal sequences example

In the examples below, all $T$ are serial.

1. No sequences
  * $T=\varepsilon$, $\mathcal{S}(T)= \emptyset$
1. Singleton
  * $T=a$, $\mathcal{S}(T)= \{a\}$
  * $T=aaa$, $\mathcal{S}(T)= \{a\}$
1. Repetitions
  * $T=ab$, $\mathcal{S}(T)= \{ab\}$
  * $T=ababX$, $\mathcal{S}(T)= \{ab, X\}$
  * $T=abcXYabc$, $\mathcal{S}(T)= \{abc, XY\}$
1. Destructive sequences:
  * $T=abba$, $\mathcal{S}(T)= \emptyset$

The following trace is concurrent:
* $T=aXYbaYXb$, $\mathcal{S}(T)= \{aXb, aYb\}$
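
Given a set of maximal sequences, the serial or concurrent classification reduces to a pairwise disjointness check. The sketch below assumes the set $\mathcal{S}(T)$ has already been computed by some other means and is passed in as a collection of Python strings. The function name `is_serial` is hypothetical.

```python
from itertools import combinations


def is_serial(maximal_sequences) -> bool:
    """Return True if no two maximal sequences share a symbol."""
    return all(
        set(a).isdisjoint(b) for a, b in combinations(maximal_sequences, 2)
    )


print(is_serial({"abc", "XY"}))    # True:  T=abcXYabc is serial
print(is_serial({"aXb", "aYb"}))   # False: T=aXYbaYXb is concurrent
```

With zero or one maximal sequence there are no pairs to compare, so the trace is classified as serial.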

**Property / Lemma** A trace can be generated by mixing its sequences

## Sequences in a Log

## Work in progress...

- Pair graph
	- A pair is a sequence of length two
	- Sum of pair graphs using the adjacency matrix
	- Sequences from pair graphs
	- Basis of the trace using cliques
	- Serialization into an equivalent trace
	- Basis of the equivalent trace
- Conformance checking
	- With the basis of the trace
	- With the basis of the equivalent trace
	- Smoothing
		- Truncated sequences
		- Inferring concurrency
		- Simplification of contiguous sets in the basis
- Experiments
	- Synthetic logs
	- Observatory data
- Future work
	- Predecessor graphs and successor graphs
		- Use the basis of the trace to simplify those graphs
	- Include every node with frequency one
	- Assign probabilities to each node
	- Infer a Markov model

# References

1. https://en.wikipedia.org/wiki/Formal_language