# Display results for _semi-deterministic_ benchmarks
The cells `[10]` and `[11]` were used to produce the left part of Table 2 in the CAV paper.

In [1]:
from ltlcross_wrapper import ResAnalyzer, gather_cumulative, gather_mins
import pandas as pd
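# Use the fully qualified option name; the bare "precision" is ambiguous in recent pandas.
pd.set_option("display.precision", 0)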
import spot
spot.setup()
from spot.jupyter import display_inline

For each benchmark, we list the cumulative number of states for each tool. The best value for each benchmark is highlighted with a green background. The benchmarks consist of `random` formulas or formulas from the `literature`. The suffix `_det` indicates that `ltl2tgba` produced automata that are already deterministic; `_sd` stands for semi-deterministic (but not deterministic).
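
As a rough illustration of what "cumulative number of states" means here, the following sketch sums per-formula state counts for each tool and picks the smallest total. The data frame is hypothetical toy data, and the code is not taken from `ltlcross_wrapper`; it only mirrors the idea.

```python
import pandas as pd

# Hypothetical per-formula state counts; the real values come from data/*.csv.
states = pd.DataFrame(
    {
        "yes.seminator#def": [3, 2, 5],
        "yes.owl#best": [4, 2, 6],
        "no.owl#best": [5, 3, 6],
    },
    index=["f1", "f2", "f3"],
)

# Cumulative number of states: one sum per tool over all formulas.
cumulative = states.sum()

# The smallest cumulative value is the one that would be highlighted.
best_tool = cumulative.idxmin()
print(cumulative)
print("best:", best_tool)
```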

We first distinguish the `_det` and `_sd` categories and then present the merged results, which were used in the paper, in the section [Merged results](#Merged-results).

The considered tools are:
 * `owl#best` : `ltl2ldgba` from the [Owl library](https://owl.model.in.tum.de/); the `#best` suffix indicates the _best of Owl_ approach, where we run `ltl2ldgba` twice and choose the better result.
 * `seminator-1-1` is the last presented version of Seminator.
 * `seminator#def` is the default setting of Seminator 2.
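
The _best of Owl_ idea can be sketched with plain pandas. The example assumes that "better" means fewer states and uses hypothetical toy numbers; it is not the implementation of `compute_best` from `ltlcross_wrapper`.

```python
import pandas as pd

# Hypothetical state counts of the two Owl runs for three formulas.
states = pd.DataFrame(
    {"yes.owl#s": [4, 7, 3], "yes.owl#a": [5, 6, 3]},
    index=["f1", "f2", "f3"],
)

# Assumption: "better" means fewer states, so take the row-wise minimum.
states["yes.owl#best"] = states[["yes.owl#s", "yes.owl#a"]].min(axis=1)
print(states)
```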

`yes` in a tool name means that Spot's simplifications were applied to the tool's results (for `seminator`, that they were not disabled); `no` means the opposite.

The list of displayed tools can be controlled in cell `[3]`. If you want to see numbers where Spot's simplifications were disabled, change the `yes` prefix to `no`. The following additional tools are available (always in both `yes.` and `no.` versions):

 * Owl without the _best of Owl_ approach; you can replace `#best` with `#a` or `#s` where `#a` stands for `ltl2ldgba -a` and analogously for `#s`.
 * `seminator-1-2` which implemented the SCC-aware optimization.
 * Seminator 2 set to use only one pipeline; you can replace `#def` with `#tgba`, `#tba`, or `#sba` to see the results of `seminator --via-tgba` etc.
 
 You can display all results by changing cell `[3]` to
 ```python
 tool_set = None
 ```
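
The naming scheme described above can also be spelled out programmatically. The sketch below only assembles names from the prefixes and suffixes mentioned in this notebook; that every generated name is actually present in the CSV files is an assumption, and `example_tool_set` is a hypothetical variable name.

```python
from itertools import product

prefixes = ["yes", "no"]
tools = (
    [f"owl#{v}" for v in ("best", "a", "s")]
    + ["seminator-1-1", "seminator-1-2"]
    + [f"seminator#{v}" for v in ("def", "tgba", "tba", "sba")]
)

# Assumption: every tool is available in a `yes.` and a `no.` version.
all_tools = [f"{p}.{t}" for p, t in product(prefixes, tools)]

# A custom selection, e.g. all Seminator 2 pipelines with simplifications enabled.
example_tool_set = [t for t in all_tools if t.startswith("yes.seminator#")]
print(example_tool_set)
```

A list built this way could be assigned to `tool_set` in cell `[3]` in place of the default selection.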
 
Please note that all seminator configurations basically only run `ltl2tgba -D` and check the result for semi-determinism. Thus, their results are equal.

In [2]:
sd_benchmarks = {}
for name in ["literature_sd","literature_det","random_sd","random_det"]:
    b = ResAnalyzer(f"data/{name}.csv", cols=["states","time","acc","transitions"])
    sd_benchmarks[name] = b
    b.compute_best(["yes.owl#s","yes.owl#a"],"yes.owl#best")
    b.compute_best(["no.owl#s","no.owl#a"],"no.owl#best")

In [3]:
tool_set = ["yes.seminator#def","yes.owl#best", "no.owl#best"]

In [4]:
gather_cumulative(sd_benchmarks, tool_set=tool_set)

| tool | literature_sd | literature_det | random_sd | random_det |
|---|---|---|---|---|
| no.owl#best | 306 | 786 | 3497 | 2838 |
| yes.owl#best | 272 | 706 | 3005 | 2528 |
| yes.seminator#def | 207 | 580 | 2562 | 2385 |


### Minimal automata

The following table shows for how many formulas each tool produces an automaton with the smallest number of states. The minimum ranges over the tools selected by `tool_set` in cell `[3]`. The column **min hits** shows how many times the tool matched the size of the smallest automaton. The column **unique min hits** counts only the cases where the given tool is the only one that produced such a small automaton.
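
The two columns can be illustrated on toy data. This is a minimal sketch with hypothetical state counts, not the code behind `gather_mins`.

```python
import pandas as pd

# Hypothetical state counts; rows are formulas, columns are tools.
states = pd.DataFrame(
    {
        "yes.seminator#def": [3, 2, 5, 4],
        "yes.owl#best": [4, 2, 6, 3],
        "no.owl#best": [5, 3, 6, 3],
    },
    index=["f1", "f2", "f3", "f4"],
)

# True where a tool matches the smallest automaton for that formula.
is_min = states.eq(states.min(axis=1), axis=0)

# min hits: how often each tool reaches the minimum.
min_hits = is_min.sum()

# unique min hits: only formulas where exactly one tool reaches the minimum.
unique_min_hits = is_min[is_min.sum(axis=1) == 1].sum()

print(pd.DataFrame({"unique min hits": unique_min_hits, "min hits": min_hits}))
```

In this toy example, formula `f2` is a min hit for two tools at once, so it counts towards **min hits** for both but towards **unique min hits** for neither.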

In [5]:
gather_mins(sd_benchmarks, tool_set=tool_set)

| tool | literature_sd: unique min hits | literature_sd: min hits | literature_det: unique min hits | literature_det: min hits | random_sd: unique min hits | random_sd: min hits | random_det: unique min hits | random_det: min hits |
|---|---|---|---|---|---|---|---|---|
| no.owl#best | 0 | 16 | 0 | 86 | 0 | 147 | 0 | 307 |
| yes.owl#best | 0 | 19 | 0 | 120 | 21 | 275 | 2 | 449 |
| yes.seminator#def | 30 | 49 | 32 | 152 | 225 | 465 | 51 | 498 |


### Cross-comparison
The cross-comparison for a benchmark shows, in cell (`row`, `column`), in how many cases the tool in `row` produces an automaton that is better than the one produced by the tool in `column`. The last column (`V`) sums the numbers in each row, while the green highlighting fills a space proportional to how well the tool in `row` competed against the tool in `column` (proportional across columns).
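
A cross-comparison of this kind can be sketched as follows. The example assumes that "better" means strictly fewer states and uses hypothetical toy data; it is not the implementation of `cross_compare`.

```python
import pandas as pd

# Hypothetical state counts; rows are formulas, columns are tools.
states = pd.DataFrame(
    {
        "yes.seminator#def": [3, 2, 5, 4],
        "yes.owl#best": [4, 2, 6, 3],
        "no.owl#best": [5, 3, 6, 3],
    },
    index=["f1", "f2", "f3", "f4"],
)

tools = list(states.columns)
# The diagonal stays NaN: a tool is not compared with itself.
cross = pd.DataFrame(index=tools, columns=tools, dtype=float)
for row in tools:
    for col in tools:
        if row != col:
            # Count formulas where `row` yields a strictly smaller automaton.
            cross.loc[row, col] = (states[row] < states[col]).sum()

# `V` sums each row: the total number of wins of the tool in `row`.
cross["V"] = cross.sum(axis=1)
print(cross)
```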

In [6]:
for n, b in sd_benchmarks.items():
    print(n)
    display(b.cross_compare(tool_set=tool_set))

literature_sd


| row \ column | yes.seminator#def | yes.owl#best | no.owl#best | V |
|---|---|---|---|---|
| yes.seminator#def | | 32 | 40 | 72 |
| yes.owl#best | 2 | | 38 | 40 |
| no.owl#best | 2 | 0 | | 2 |


literature_det


| row \ column | yes.seminator#def | yes.owl#best | no.owl#best | V |
|---|---|---|---|---|
| yes.seminator#def | | 32 | 66 | 98 |
| yes.owl#best | 0 | | 66 | 66 |
| no.owl#best | 0 | 0 | | 0 |


random_sd


| row \ column | yes.seminator#def | yes.owl#best | no.owl#best | V |
|---|---|---|---|---|
| yes.seminator#def | | 250 | 402 | 652 |
| yes.owl#best | 135 | | 423 | 558 |
| no.owl#best | 67 | 0 | | 67 |


random_det


| row \ column | yes.seminator#def | yes.owl#best | no.owl#best | V |
|---|---|---|---|---|
| yes.seminator#def | | 53 | 197 | 250 |
| yes.owl#best | 4 | | 195 | 199 |
| no.owl#best | 0 | 0 | | 0 |


### Running times and timeouts
The `#a` variant of Owl reached the 30s timeout in one case; according to the error counts below, this happened in the `literature_sd` benchmark. Otherwise, most execution times were below 1s for all tools.

In [7]:
for name, b in sd_benchmarks.items():
    print(name)
    display(b.get_error_counts())
    display(b.values.time.max().loc[tool_set])

literature_sd


| tool | timeout | parse error | incorrect | crash | no output |
|---|---|---|---|---|---|
| no.owl#a | 1 | 0 | 0 | 0 | 0 |
| yes.owl#a | 1 | 0 | 0 | 0 | 0 |


tool
yes.seminator#def    7e-02
yes.owl#best         2e+00
no.owl#best          1e+00
dtype: float64

literature_det


| tool | timeout | parse error | incorrect | crash | no output |
|---|---|---|---|---|---|


tool
yes.seminator#def    9e-02
yes.owl#best         3e+00
no.owl#best          2e-01
dtype: float64

random_sd


| tool | timeout | parse error | incorrect | crash | no output |
|---|---|---|---|---|---|


tool
yes.seminator#def    1e-01
yes.owl#best         2e-01
no.owl#best          2e-01
dtype: float64

random_det


| tool | timeout | parse error | incorrect | crash | no output |
|---|---|---|---|---|---|


tool
yes.seminator#def    8e-02
yes.owl#best         9e-02
no.owl#best          5e-02
dtype: float64

# Merged results
We now merge the results of the two categories (`_det` and `_sd`) into one file for each of the random and literature benchmarks.
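
The merge keeps a single header line and appends the data rows of the second file to the first. For readers without a Unix shell, the same idea can be sketched in pandas. The file contents and the column names `formula` and `states` below are hypothetical stand-ins written to a temporary directory, not the real benchmark data.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Stand-ins for data/random_det.csv and data/random_sd.csv (hypothetical content).
tmp = Path(tempfile.mkdtemp())
pd.DataFrame({"formula": ["Fa", "Gb"], "states": [2, 1]}).to_csv(
    tmp / "random_det.csv", index=False
)
pd.DataFrame({"formula": ["FGa"], "states": [3]}).to_csv(
    tmp / "random_sd.csv", index=False
)

# Concatenate the rows of both files; the header is written only once.
parts = [pd.read_csv(tmp / f"random_{kind}.csv") for kind in ("det", "sd")]
merged = pd.concat(parts, ignore_index=True)
merged.to_csv(tmp / "random_sd_merged.csv", index=False)

print(len(merged))
```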

In [8]:
!cp data/random_det.csv data/random_sd_merged.csv
!tail -n +2 data/random_sd.csv >> data/random_sd_merged.csv
!wc data/random_sd_merged.csv
!cp data/literature_det.csv data/literature_sd_merged.csv
!tail -n +2 data/literature_sd.csv >> data/literature_sd_merged.csv
!wc data/literature_sd_merged.csv

  16001 1224283 7507354 data/random_sd_merged.csv
   3217  309869 1936151 data/literature_sd_merged.csv


In [9]:
m_benchmarks = {}
for name in ["literature_sd_merged","random_sd_merged"]:
    b = ResAnalyzer(f"data/{name}.csv", cols=["states","time","acc","transitions"])
    m_benchmarks[name] = b
    b.compute_best(["yes.owl#s","yes.owl#a"],"yes.owl#best")
    b.compute_best(["no.owl#s","no.owl#a"],"no.owl#best")

In [10]:
gather_cumulative(m_benchmarks, tool_set=tool_set)

| tool | literature_sd_merged | random_sd_merged |
|---|---|---|
| no.owl#best | 1092 | 6335 |
| yes.owl#best | 978 | 5533 |
| yes.seminator#def | 787 | 4947 |


In [11]:
gather_mins(m_benchmarks, tool_set=tool_set)

| tool | literature_sd_merged: unique min hits | literature_sd_merged: min hits | random_sd_merged: unique min hits | random_sd_merged: min hits |
|---|---|---|---|---|
| no.owl#best | 0 | 102 | 0 | 454 |
| yes.owl#best | 0 | 139 | 23 | 724 |
| yes.seminator#def | 62 | 201 | 276 | 963 |


In [12]:
b = m_benchmarks["random_sd_merged"]

# Scatter plots
We compare Owl, both with and without Spot's simplifications, to Seminator (`ltl2tgba`). We did not include these graphs in the paper.

In [13]:
b.bokeh_scatter_plot("yes.owl#best","yes.seminator#def", include_equal=True)

In [14]:
b.bokeh_scatter_plot("no.owl#best","yes.seminator#def", include_equal=True)

In [15]:
b.cross_compare(tool_set)

| row \ column | yes.seminator#def | yes.owl#best | no.owl#best | V |
|---|---|---|---|---|
| yes.seminator#def | | 303 | 599 | 902 |
| yes.owl#best | 139 | | 618 | 757 |
| no.owl#best | 67 | 0 | | 67 |
