# Trace Graphs Theory
June 1, 2021 - jgil@eso.org

## Definitions

An alphabet $\Sigma$ is a finite set of symbols. 

A **trace** $T \in \Sigma^*$ is a finite sequence of symbols $e \in \Sigma$ and can be written as $T=(e_i)_{i=N}$, where the length of $T$ is $|T|=N$. The empty trace, with $|T|=0$, is denoted $\varepsilon$. A **log** is a set of traces.

**Symbols in a trace**: The set of symbols that appear in a trace $T$ is denoted by $\|T\|$, where $\|T\| \subseteq \Sigma$. We say that two traces $X$ and $Y$ are disjoint if $\|X\| \cap \|Y\| = \emptyset$.
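
As a quick illustration, the symbol set and the disjointness test can be sketched in Python. The sketch assumes, for illustration only, that a trace is stored as a string with one character per symbol; the helper names `symbols` and `are_disjoint` are hypothetical.

```python
def symbols(trace: str) -> frozenset:
    """Return ||T||, the set of symbols that appear in the trace."""
    return frozenset(trace)


def are_disjoint(x: str, y: str) -> bool:
    """Two traces are disjoint when their symbol sets do not intersect."""
    return symbols(x).isdisjoint(symbols(y))


print(sorted(symbols("abcabc")))   # ['a', 'b', 'c']
print(are_disjoint("ade", "BC"))   # True
print(are_disjoint("abc", "cd"))   # False
```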

Note: In formal languages, a trace is a *word* and a log is a *language*. See Ref 1.

## Trace operations

**Concatenation**: The concatenation of $T=(t_i)_{i=N}$ after $S=(s_i)_{i=M}$ simply appends $T$ after $S$: $U = ST = (s_i)_{i=M}(t_i)_{i=N}$. The n-concatenation of a trace $S$ with itself is denoted as $S^n=SS...SS$, $n \ge 0$. The 0-concatenation of any trace is $T^0=\varepsilon$.
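
If traces are represented as Python strings (an assumption made here only for illustration, one character per symbol), concatenation and n-concatenation map directly onto the built-in `+` and `*` operators:

```python
S = "ab"
T = "cd"

U = S + T            # concatenation ST: T appended after S
print(U)             # abcd
print(S * 3)         # S^3: ababab
print(repr(T * 0))   # T^0 is the empty trace: ''

# The length of a concatenation is the sum of the lengths.
assert len(U) == len(S) + len(T)
```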

**Restriction to symbols**: If $T$ is a trace and $P \subseteq \Sigma$, $T \sqcap P$ is the restriction of trace $T$ to the symbols in $P$: the trace obtained by keeping only the symbols of $T$ that belong to $P$, in their original order.
It follows from the notation that if $X$ is a trace, $T \sqcap \|X\|$ is the restriction of trace $T$ to the symbols in trace $X$.

Examples:

$\begin{array}{lll}
T=(bc)^n, & P=\{c, b\}, & T \sqcap P = (bc)^n \\
T=(bc)^n, & X=bc, & T \sqcap \|X\| = (bc)^n \\
T=cbd, & X=bc, & T \sqcap \|X\| = cb \\
T=bdc, & X=bce, & T \sqcap \|X\| = bc \\
T=bc, & X=de, & T \sqcap \|X\| = \varepsilon
\end{array}$
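
The examples above can be reproduced with a minimal Python sketch, assuming traces are strings with one character per symbol. The function name `restrict` is hypothetical; it accepts either a set of symbols $P$ or a trace $X$, in which case it uses $\|X\|$.

```python
def restrict(trace: str, keep) -> str:
    """Return T ⊓ P: the symbols of `trace` that belong to `keep`, in order.

    `keep` may be a set of symbols or a trace (its symbol set is used).
    """
    allowed = set(keep)
    return "".join(e for e in trace if e in allowed)


print(restrict("bcbcbc", {"c", "b"}))  # bcbcbc
print(restrict("cbd", "bc"))           # cb
print(restrict("bdc", "bce"))          # bc
print(repr(restrict("bc", "de")))      # ''
```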

**Removal**: $T \setminus Y$ is the **removal** of the symbols of trace $Y$ from trace $T$: every occurrence of a symbol in $\|Y\|$ is deleted from $T$.

Examples:

$\begin{array}{lll}
T=aBCde, & Y=BC, & T \setminus Y=ade \\
T=aBCde, & Y=BCxyz, & T \setminus Y=ade \\
T=(abc)^n, & Y=abc, & T \setminus Y=\varepsilon
\end{array}$
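
A minimal Python sketch of this operation, assuming traces are strings with one character per symbol (the name `remove` is hypothetical), reproduces the three examples:

```python
def remove(trace: str, y: str) -> str:
    """Return T \\ Y: delete from `trace` every symbol that appears in `y`."""
    drop = set(y)
    return "".join(e for e in trace if e not in drop)


print(remove("aBCde", "BC"))         # ade
print(remove("aBCde", "BCxyz"))      # ade
print(repr(remove("abcabc", "abc"))) # ''
```

Symbols of `y` that never occur in `trace`, such as `x`, `y`, `z` in the second example, have no effect.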

**Mixes of traces**: The mixes of a set of traces is the set of all finite traces obtained by interleaving, in any order, the symbols of arbitrarily many repetitions of each trace. For $ab$ and $cd$, the mixes include every permutation of the symbols $a, b, c, d$.

**Ordered mixes of traces**: The ordered mixes of a set of $n$ traces are those mixes in which the original order of the symbols within each trace is preserved. The set of ordered mixes is denoted $T_1 * ... * T_n$.

Examples of ordered mixes with $T_1=ade$, $T_2=BC$:

$\begin{split}
\varepsilon& \in T_1 * T_2 \\
BC         & \in T_1 * T_2 \\
CB         & \notin T_1 * T_2 \\
BCBC         & \in T_1 * T_2 \\
adeBC      & \in T_1 * T_2 \\
CBeda      & \notin T_1 * T_2 \\
aBdCeadeBC & \in T_1 * T_2
\end{split}$
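
The membership examples above can be checked with a small Python sketch. It assumes traces are strings with one character per symbol and that the mixed traces are pairwise disjoint (as $ade$ and $BC$ are). Under that assumption, a trace $U$ is an ordered mix exactly when it uses no foreign symbols and its restriction to each $\|T_i\|$ is a power of $T_i$. The function names are hypothetical.

```python
def restrict(trace: str, keep) -> str:
    """T ⊓ P: keep only the symbols of `trace` that are in `keep`, in order."""
    allowed = set(keep)
    return "".join(e for e in trace if e in allowed)


def is_power(u: str, t: str) -> bool:
    """True if u = t^n for some n >= 0."""
    if not u:
        return True    # n = 0 gives the empty trace
    if not t:
        return False   # a non-empty trace is never a power of the empty trace
    n, rest = divmod(len(u), len(t))
    return rest == 0 and u == t * n


def in_ordered_mix(u: str, traces) -> bool:
    """Membership test for T_1 * ... * T_n.

    Assumption: the traces are pairwise disjoint; the test is not valid
    for traces that share symbols.
    """
    traces = list(traces)
    alphabet = set().union(*(set(t) for t in traces))
    if not set(u) <= alphabet:
        return False   # u contains a symbol that no trace provides
    return all(is_power(restrict(u, t), t) for t in traces)


for u in ["", "BC", "CB", "BCBC", "adeBC", "CBeda", "aBdCeadeBC"]:
    print(repr(u), in_ordered_mix(u, ["ade", "BC"]))
```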

**Properties of ordered mixes**

1. $\varepsilon * T = \{\varepsilon, T, T^2, T^3, ... \}$
1. $T \in \varepsilon * T$
1. $X * Y = Y * X$

**Trace generation**: If $T$, $X$, $Y$ are traces, we say that $\{X, Y\}$ generates $T$ if $X*Y = \varepsilon*T$.

If $X$ and $Y$ are disjoint and generate $T$, then the following properties hold:

1. $\{\|X\|, \|Y\|\}$ is a partition of $\|T\|$
1. $\exists n\ge0$ such that $X^n = T \sqcap \|X\| = T \setminus Y$
1. $\exists m\ge0$ such that $Y^m = T \sqcap \|Y\| = T \setminus X$
1. $(T \sqcap \|X\|) * (T \sqcap \|Y\|) = (T \setminus X) * (T \setminus Y) = X*Y = \varepsilon*T$

## Sequences in a Trace

A **sequence** $S=(e_i)_{i=N}$ is a trace where $|S|>0$ and all its symbols are different: $e_i \neq e_j$ for $i \neq j$. An $n$-sequence $S$ has $|S|=n$.

$S$ **is a sequence in a trace** $T$ if it is contained in the trace: $\exists m \ge 1$ such that $T \sqcap \|S\| = S^m$.

For example, if $T=abcXYabc$, the following are sequences in $T$: $abc$, $ab$, $ac$, $XY$, $X$, $Y$. Note that all singleton traces $a, b, c, X, Y$ are also sequences in $T$.
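
The definition can be tried out with a short Python sketch, assuming traces are strings with one character per symbol. It requires at least one repetition of $S$ inside $T$, which is how the examples read, so symbols that never occur in $T$ do not count. The function names are hypothetical.

```python
def restrict(trace: str, keep) -> str:
    """T ⊓ P: keep only the symbols of `trace` that are in `keep`, in order."""
    allowed = set(keep)
    return "".join(e for e in trace if e in allowed)


def is_sequence(s: str) -> bool:
    """A sequence is a non-empty trace whose symbols are all different."""
    return len(s) > 0 and len(set(s)) == len(s)


def is_sequence_in(s: str, t: str) -> bool:
    """True if s is a sequence and T ⊓ ||S|| = S^m for some m >= 1."""
    if not is_sequence(s):
        return False
    r = restrict(t, s)
    m, rest = divmod(len(r), len(s))
    return m >= 1 and rest == 0 and r == s * m


T = "abcXYabc"
for s in ["abc", "ab", "ac", "XY", "X", "Y", "ba", "aX"]:
    print(s, is_sequence_in(s, T))
```

The first six candidates are accepted. `ba` is rejected because the order is wrong, and `aX` because $T \sqcap \{a, X\} = aXa$ is not a repetition of $aX$.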

A sequence in $T$ is **maximal** if it is not a subsequence of any other sequence in $T$.

The set of **maximal sequences** of a trace is denoted as $\mathcal{S}(T)$.

The **base** of $T$, denoted $\mathcal{B}(T)$, is a set of sequences $S_i$ in $T$ whose symbol sets induce a partition of $\|T\|$: $\|S_i\| \cap \|S_j\| = \emptyset$ for all $i \neq j$, and $\|T\| = \bigcup_{S_i \in \mathcal{B}(T)} {\|S_i\|}$.

If all maximal sequences in $\mathcal{S}(T)$ are pairwise disjoint, then $T$ is a **serial trace**. Otherwise, $T$ is a **concurrent trace**. This implies that if $T$ is serial, then its base is its set of maximal sequences: $\mathcal{B}(T) = \mathcal{S}(T)$.

### Maximal sequences example

In the examples below, all $T$ are serial.

1. No sequences
    * $T=\varepsilon$, $\mathcal{S}(T)=\mathcal{B}(T)= \emptyset$
1. Singleton
    * $T=a$, $\mathcal{S}(T)=\mathcal{B}(T)= \{a\}$
    * $T=aaa$, $\mathcal{S}(T)=\mathcal{B}(T)= \{a\}$
1. Repetitions
    * $T=ab$, $\mathcal{S}(T)=\mathcal{B}(T)= \{ab\}$
    * $T=ababX$, $\mathcal{S}(T)=\mathcal{B}(T)= \{ab, X\}$
    * $T=abcXYabc$, $\mathcal{S}(T)=\mathcal{B}(T)= \{abc, XY\}$
1. Destructive sequences
    * $T=abba$, $\mathcal{S}(T)=\mathcal{B}(T)= \{a, b\}$

The following trace is concurrent; note that its base is not its set of maximal sequences:
* $T=aXYbaYXb$
* $\mathcal{S}(T)= \{aXb, aYb\}$
* $\mathcal{B}(T)= \{ab, X, Y\}$
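
For small traces, the maximal sequences and the serial/concurrent classification can be checked by brute force. The Python sketch below assumes traces are strings with one character per symbol and requires at least one repetition of a sequence inside the trace. It tries every ordering of every subset of $\|T\|$, so it is only practical for short alphabets. It is not the pair-graph method these notes aim at, and it does not compute the base. All function names are hypothetical.

```python
from itertools import permutations


def restrict(trace: str, keep) -> str:
    """T ⊓ P: keep only the symbols of `trace` that are in `keep`, in order."""
    allowed = set(keep)
    return "".join(e for e in trace if e in allowed)


def is_sequence_in(s: str, t: str) -> bool:
    """True if s has distinct symbols and T ⊓ ||S|| = S^m for some m >= 1."""
    if not s or len(set(s)) != len(s):
        return False
    r = restrict(t, s)
    m, rest = divmod(len(r), len(s))
    return m >= 1 and rest == 0 and r == s * m


def is_subsequence(a: str, b: str) -> bool:
    """True if a can be obtained from b by deleting symbols."""
    it = iter(b)
    return all(e in it for e in a)


def maximal_sequences(t: str) -> set:
    """Brute force: test every ordering of every non-empty subset of ||T||."""
    alphabet = sorted(set(t))
    found = {
        "".join(p)
        for k in range(1, len(alphabet) + 1)
        for p in permutations(alphabet, k)
        if is_sequence_in("".join(p), t)
    }
    # Keep only the sequences that are not a subsequence of another one.
    return {s for s in found
            if not any(s != o and is_subsequence(s, o) for o in found)}


def is_serial(t: str) -> bool:
    """Serial when the maximal sequences are pairwise disjoint."""
    seqs = list(maximal_sequences(t))
    return all(set(a).isdisjoint(b)
               for i, a in enumerate(seqs) for b in seqs[i + 1:])


for trace in ["abba", "ababX", "abcXYabc", "aXYbaYXb"]:
    print(trace, sorted(maximal_sequences(trace)), is_serial(trace))
```

On these four traces the sketch returns the maximal sequences listed above and classifies only $aXYbaYXb$ as concurrent.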

**Property / Lemma**: A trace can be generated by mixing its maximal sequences: if $\mathcal{S}(T) = \{S_1, ..., S_N\}$, then $S_1 * ... * S_N = \varepsilon * T$.

## Serialization of a concurrent trace

Note: this is a technique; I'm not sure whether it can be applied to every kind of concurrent trace.

## Sequences in a Log

## Work in progress...

- Pair graph
	- A pair is a sequence of length two
	- Sum of pair graphs using adjacency matrices
	- Sequences from pair graphs
	- Base of the trace using cliques
	- Serialization into an equivalent trace
	- Base of the equivalent trace
- Conformance checking
	- With the base of the trace
	- With the base of the equivalent trace
	- Smoothing
		- Trimmed sequences
		- Inferring concurrency
		- Simplification of contiguous sets in the base
- Experiments
	- Synthetic logs
	- Observatory data
- Future work
	- Predecessor graphs and successor graphs
		- Use the base of the trace to simplify those graphs
	- Include every node of frequency one
	- Assign probabilities to each node
	- Infer a Markov model
    
Main ideas

- A sequence is a non-empty trace whose elements are all distinct from one another
- The sequences of a trace can generate that trace
- The disjoint sequences of a trace are a base
- A trace is serial if its maximal sequences are a base. Otherwise it is concurrent.
- The above applies to a log of traces
- The base of a base is the same set. Therefore it is an invariant.
- Canonical form of a log
- All these properties hold in a pair graph.
- In a serial graph no cliques are needed; sorting by degree is enough
- so, write that

# References

1. https://en.wikipedia.org/wiki/Formal_language