Reliable extraction of counterexample traces #58

Closed
Tracked by #65
andrey-kuprianov opened this issue Aug 16, 2021 · 22 comments
Labels
enhancement New (user-facing) feature or request

Comments

@andrey-kuprianov
Contributor

As demonstrated by #56, our extraction of traces from counterexamples is incomplete.

The way it works now is via an ad-hoc TLA+ parser that is specific to the simple variant of TLA+ encountered in counterexamples.

Strategically, the requirement is to support both Apalache and TLC in the near future, though for the nearest future (a couple of weeks to a month) supporting only Apalache might be enough.

We need to weigh a number of alternatives for how to proceed:

  1. Extend / harden modelator's current ad-hoc parser (a rough sketch of the kind of extraction involved is below). The advantage is that it can support both Apalache and TLC. The disadvantage is that it is fragile with respect to minor format changes, so if we follow this route the parser should be made as tolerant of input variations as possible.
  2. Rework modelator's TLA+ Jsonnet library: it was written for the older Apalache JSON format, and needs to be adapted to Apalache's new JSON format. Then, hopefully, TLC output can be rerouted via Apalache? Not sure about this...
  3. Use some third-party solution, e.g. Tla2Json, which Dan has used previously (he can elaborate more on that). This tool is specific to TLC, and as far as I can tell from a quick check, it doesn't support the output produced by Apalache.

Any further ideas are welcome! We should keep in mind that we can ask the Apalache team to implement some features if needed, but we can't easily do the same for TLC.
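
For concreteness, here is a minimal sketch (in Python, purely illustrative) of the kind of relaxed extraction option 1 implies: it only pulls flat var = value conjuncts out of StateN definitions and understands ground integers, strings and booleans; records, sets and functions, which real counterexamples also contain, are left as raw text.

# Illustrative only: a deliberately relaxed extractor for the flat
# "var = value" conjuncts in counterexample State definitions. The State0/
# State1 names follow the Apalache counterexample output discussed later in
# this thread; anything this sketch does not understand stays as raw TLA+ text.
import re

STATE_DEF = re.compile(r"^State(\d+)\s*==", re.MULTILINE)
ASSIGN = re.compile(r"/\\\s*(\w+)\s*=\s*([^/\n]+)")

def parse_value(text: str):
    """Parse the few ground values this sketch understands."""
    text = text.strip()
    if text in ("TRUE", "FALSE"):
        return text == "TRUE"
    if re.fullmatch(r"-?\d+", text):
        return int(text)
    if text.startswith('"') and text.endswith('"'):
        return text[1:-1]
    return text  # records, sets, functions, ... are left as raw text

def extract_states(counterexample_tla: str):
    """Return a list of {variable: value} dicts, one per StateN definition."""
    chunks = STATE_DEF.split(counterexample_tla)[1:]
    states = []
    for _idx, body in zip(chunks[0::2], chunks[1::2]):
        states.append({var: parse_value(val) for var, val in ASSIGN.findall(body)})
    return states

example = "State0 == /\\ x = 0 /\\ flag = FALSE\nState1 == /\\ x = 1 /\\ flag = TRUE\n"
print(extract_states(example))  # [{'x': 0, 'flag': False}, {'x': 1, 'flag': True}]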

andrey-kuprianov added the enhancement label on Aug 16, 2021
@shonfeder

I'd be strongly in favor of (2), for the following reasons:

  • IIUC, the updates to the Jsonnet library need to happen in any case if you want to make use of continued Apalache developments (e.g., the forthcoming server), and the new JSON format in Apalache is much more standard and robust than the previous one.
  • Apalache already has lots of code dedicated to parsing TLA+, which has undergone years of refinement and fixes. It seems like it would be wasteful to not take advantage of that.
  • There should be good mutualistic benefits to sharing the same code for this, and to committing to a common data exchange format between Modelator and Apalache. We should be more responsive to your needs than any other project, and your use will help us catch bugs and defects earlier.

@andrey-kuprianov
Contributor Author

andrey-kuprianov commented Aug 16, 2021

Thank you for the thoughtful comments, @shonfeder!

I agree, we should reuse as much of the common functionality as possible. And, as this case shows, building ad-hoc solutions (especially for TLA+) is not a good idea.

A good approach for now looks to me as follows:

  • adopt the Apalache JSON format as Modelator's internal representation for TLA+. Whatever transformations we do on TLA+ (like auto-introducing history) should be built on top of that format.
  • for counterexamples specifically, because they have a simpler representation than general TLA+, implement a transformation from the general TLA+ JSON format into a simplified version of JSON (a rough sketch follows below). This transformation can be done in Jsonnet, but not necessarily so; we'll need to discuss the alternatives.
  • for handling TLC counterexamples, extract the TLA+ representation from the MC.out file and pass this TLA+ to Apalache's parse command to obtain its JSON representation. This step can be done with a simple script in whatever language.

The only complication I see on this path is that Apalache will complain about types, so the generated TLA+ will need to extend the original model... What would be nice to have on the Apalache side, by the way, is the ability to pass untyped TLA+ into the parse command, so that its TLA+ -> JSON transformation can be reused for untyped specs.
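
As a rough sketch of the second bullet above (in Python rather than Jsonnet, purely to illustrate the shape of the transformation), assuming the Apalache IR JSON layout shown further down in this thread, where each StateN declaration is an AND of variable = value equalities and values are only ints, strings, booleans and records:

# Hedged sketch: simplify an Apalache counterexample.json (general TLA+ JSON)
# into a list of per-state dicts. It relies on the IR shape shown later in
# this thread and handles only ValEx literals and RECORD constructors.
import json

def decode_value(expr):
    kind = expr["kind"]
    if kind == "ValEx":
        return expr["value"]["value"]        # TlaInt / TlaStr / TlaBool payload
    if kind == "OperEx" and expr["oper"] == "RECORD":
        args = expr["args"]                  # alternating key, value expressions
        keys = [decode_value(a) for a in args[0::2]]
        vals = [decode_value(a) for a in args[1::2]]
        return dict(zip(keys, vals))
    raise ValueError(f"unsupported expression: {kind}/{expr.get('oper')}")

def simplify(counterexample_json: str):
    """Turn an Apalache counterexample.json into a list of {var: value} states."""
    module = json.loads(counterexample_json)["modules"][0]
    states = []
    for decl in module["declarations"]:
        if not decl["name"].startswith("State"):
            continue                         # skip ConstInit and other decls
        state = {}
        for eq in decl["body"]["args"]:      # body is OperEx AND of EQ conjuncts
            name_ex, value_ex = eq["args"]
            state[name_ex["name"]] = decode_value(value_ex)
        states.append(state)
    return states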

@shonfeder

I think we should be able to adapt to untyped specs, but I wonder what the downside is of requiring typed specs on the modelator side?

IIUC, at stake in this decision is whether we push Apalache to accommodate a less-disciplined TLA or whether we push a requirement for greater discipline into the specs which Modelator can consume. I am generally in favor of pursuing the latter kind of adjustment, since it ends up propagating more robust structure up through the ecosystem. If it's not too out of band here, could you give some pointers on why Modelator must support untyped specs?

@andrey-kuprianov
Contributor Author

The main point is the increased user base. While we all agree Apalache is great, the user base of TLC is still much larger. So if we offer services that work transparently for both Apalache and TLC, we will be able to win users gradually, by allowing them to use their specs as they are, rather than rejecting them right away because the specs are untyped. And later, when they see the benefits of typing, they will want the types themselves.

Apalache does require types to do model checking, of course, no question. But maybe there is a possibility to do parsing without types? This would also serve as a low-effort entry point to Apalache for TLC users -- if they only want to transform to JSON, for example.

@shonfeder

shonfeder commented Aug 16, 2021

My main questions on this account would be:

  • Are there any external projects that currently use Modelator with untyped TLA+ specs?
    • If not, I think we'll be wiser to focus on building up velocity with a robust and coherent tool chain, rather than trying to account for possible complications we might eventually encounter.
  • Is it currently possible to use Modelator on untouched TLA+ specs of a system? Or does it require customizing the specs to generate the needed traces?
    • If users need to touch up their specs anyhow, then I think asking them to add a few type annotations isn't a big lift, and it is actually to their benefit.

I think what I'd like to understand is whether support for parsing untyped TLA+ is something Modelator needs in order to satisfy its current and imminent usage, or if this is an "optimization" for an anticipated benefit which isn't actually pressing.

I'd be happy to sync up on this at some point when the other involved stakeholders are available.

To be clear: I'm not questioning here because it seems impossible or too difficult to accommodate in Apalache. Rather, I want to make sure we correctly evaluate the tradeoffs between principled practice and compromises for the sake of short-term convenience.

@andrey-kuprianov
Contributor Author

It is actually a usage scenario of our own. All current instantiations of MBT use untyped specs, but with ad-hoc, project-specific tooling. Upgrading those to use modelator as the tooling should be relatively easy; but requiring the specs to be typed introduces an additional step. It might be that we need to take this step anyway, but it is a substantial additional time investment upfront, compared to a gradual transition.

@shonfeder

I'd be willing to take on the work of annotating the specs.

@danwt
Contributor

danwt commented Aug 17, 2021

I agree with most of the points made by both sides above, somehow, but wanted to highlight this point made by @shonfeder

If users need to touch up their specs anyhow, then I think asking them to add a few type annotations isn't a big lift, and it is actually to their benefit.

I think it would be great if people used typed specs and were able to introduce types to existing untyped specs very easily in all cases; however, I don't think that's realistic. From my own experience so far, I think that certain kinds of TLC-TLA specs that use very 'dynamic' implicit typing and polymorphism will be quite hard for people to retrofit types onto.

@andrey-kuprianov
Contributor Author

The outcome of today's discussion with @rnbguy and @danwt: it seems that the best way to move forward is this:

  • There is a TLC JSON format for counterexamples; see Json.tla, Json.java; this is already part of the standard TLC distribution. This format is very simple, and although it doesn't support full-fledged TLA+, it is very well suited for deserializing counterexamples, which can contain only ground formulas. As far as I remember, modelator is actually using exactly this format internally.
  • It would be great if Apalache, besides outputting counterexamples in the Apalache JSON format (which covers general TLA+), also output them in the TLC JSON format. This would also simplify the interaction of other external tools, besides modelator, with Apalache.

The plan is this:

  1. Ranadeep will do some exploratory testing of the TLC JSON format, to see how it looks on several examples, and document the format in a human-readable way.
  2. Someone (an Apalache dev, or Ranadeep/Dan) then extends Apalache with the ability to output counterexamples in this format.

The benefits are multiple: we get rid of ad-hoc parsing, standardise the output between model checkers, and simplify the whole toolchain.

@shonfeder, @Kukovec, it would be great to hear your opinions on this.

@shonfeder

shonfeder commented Aug 17, 2021

My initial thought is that we do not want two different ways of serializing TLA+, especially if one is incomplete. Why couldn't we just convert the TLC counterexamples to Apalache's Json serialization format?

@shonfeder

shonfeder commented Aug 17, 2021

Tho if we can reuse the linked tlatools code, then it would just be a matter of some basic plumbing, and we wouldn't have to maintain two different serialization formats.

@andrey-kuprianov
Contributor Author

The point of the simplified format is that it is directly deserializable into (a sequence of) structs corresponding to the states of a TLA+ model. This makes interaction with model checkers very straightforward for tools that are interested only in counterexamples, compared to importing the full Apalache TLA+ JSON.

@shonfeder

shonfeder commented Aug 17, 2021

Prior to a final decision, could we interpose an evaluation here where we compare TLC's output of its counterexamples with the output from Apalache, with an eye to how much of an obstacle the serialization would be? Apalache's JSON counterexamples also give an array of states, and it might be good to compare the relative difficulty of ser/de with concrete examples.

This is also with an eye towards interop by RPC with Apalache's planned server mode. I expect it would be nice from a maintenance and design perspective not to have to maintain two parallel, but somewhat different, encodings.

@andrey-kuprianov
Contributor Author

Yes, sure, @rnbguy is now working on that comparison (see point 1 above).

Ranadeep, it would indeed be great to compare the Apalache CE JSON and the TLC CE JSON side by side, as Shon suggests; could you please do that?

@shonfeder

Thanks! Let me know how I can help.

@rnbguy
Member

rnbguy commented Aug 19, 2021

Sorry for the late reply. Here is the comparison using an Example.tla spec.

---- MODULE Example ----
EXTENDS Integers

VARIABLES
    \* @type: [val: Int];
    x

Init == x = [val |-> 0]

Next == x' = [x EXCEPT !.val = x.val + 1]

Inv == x.val < 5
====

Apalache

Apalache's counterexample.json is basically a JSON serialization of the counterexample.tla file.

{
  "name": "ApalacheIR",
  "version": "1.0",
  "description": "https://apalache.informal.systems/docs/adr/005adr-json.html",
  "modules": [
    {
      "kind": "TlaModule",
      "name": "counterexample",
      "declarations": [
        {
          "type": "Untyped",
          "kind": "TlaOperDecl",
          "name": "ConstInit",
          "formalParams": [
            
          ],
          "isRecursive": false,
          "body": {
            "type": "Untyped",
            "kind": "ValEx",
            "value": {
              "kind": "TlaBool",
              "value": true
            }
          }
        },
        {
          "type": "Untyped",
          "kind": "TlaOperDecl",
          "name": "State0",
          "formalParams": [
            
          ],
          "isRecursive": false,
          "body": {
            "type": "Untyped",
            "kind": "OperEx",
            "oper": "AND",
            "args": [
              {
                "type": "Untyped",
                "kind": "OperEx",
                "oper": "EQ",
                "args": [
                  {
                    "type": "Untyped",
                    "kind": "NameEx",
                    "name": "x"
                  },
                  {
                    "type": "[val: Int]",
                    "kind": "OperEx",
                    "oper": "RECORD",
                    "args": [
                      {
                        "type": "Str",
                        "kind": "ValEx",
                        "value": {
                          "kind": "TlaStr",
                          "value": "val"
                        }
                      },
                      {
                        "type": "Int",
                        "kind": "ValEx",
                        "value": {
                          "kind": "TlaInt",
                          "value": 0
                        }
                      }
                    ]
                  }
                ]
              }
            ]
          }
        },
        ...
      ]
    }
  ]
}

tla2tools

Using tla2tools, one has to add some extra operators around the invariant.

---- MODULE JsonTrace ----
EXTENDS TLC, TLCExt, Json, Example

JsonInv ==
    \/ RealInv:: Inv \* The original invariant to check 
    \/ Export:: /\ JsonSerialize("trace.json", Trace)
                /\ TLCSet("exit", TRUE)

(* execute
java -jar tla2tools.jar JsonTrace.tla
*)

====

Running tla2tools with the JsonInv invariant, the resulting trace.json is much more succinct than Apalache's output.

[
{"x":{"val":0}},
{"x":{"val":1}},
{"x":{"val":2}},
{"x":{"val":3}},
{"x":{"val":4}},
{"x":{"val":5}}
]
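
For what it's worth, a trace in this shape can be loaded directly into per-state structs, which is the property discussed above. A minimal sketch in Python, with the State dataclass matching this particular Example spec (a single variable x with a val field):

# Minimal sketch: deserialize the TLC trace.json above into typed structs.
# The State shape is specific to the Example spec; a real consumer would be generic.
import json
from dataclasses import dataclass

@dataclass
class State:
    val: int

def load_trace(path: str) -> list[State]:
    with open(path) as f:
        raw = json.load(f)                 # a JSON array, one object per state
    return [State(val=entry["x"]["val"]) for entry in raw]

# load_trace("trace.json") -> [State(val=0), State(val=1), ..., State(val=5)]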

@rnbguy
Member

rnbguy commented Aug 19, 2021

To add a bit more to this, tla2tools' JSON serialization seems very simple; a small illustration follows the list.

  • Records are serialized to JSON objects
  • Sets and sequences are serialized to JSON arrays
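
A hedged illustration of that mapping (the TLA+ value is hypothetical; the expected JSON just follows the two rules above). One consequence worth noting: sets and sequences both become arrays, so they cannot be distinguished again on deserialization without consulting the spec or its type annotations.

# Hypothetical TLA+ record (as text) and the JSON this mapping would produce.
tla_value = '[vals |-> {1, 2, 3}, log |-> <<"a", "b">>]'

expected_json = {
    "vals": [1, 2, 3],   # set      -> JSON array (element order not guaranteed)
    "log": ["a", "b"],   # sequence -> JSON array (indistinguishable from a set)
}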

@danwt
Contributor

danwt commented Aug 19, 2021

I found this comment made by @konnov: "The way TLC serializes to Json is even a greater hack. It heavily relies on the evaluation order, which is super weird for a logical language." It sounds like some degree of caution is needed if using the TLC JSON dump.

@andrey-kuprianov
Contributor Author

@rnbguy thanks! What we also need is to test the TLC behaviour on various kinds of examples:

  • functions (what about functions where the domain is something weird, e.g. itself a function?)
  • sets (also nested, sets of functions, etc.)
  • big numbers (such that they do not fit into a JSON number; see the note below)
  • etc.
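
On the big-numbers bullet: whether a consumer survives this depends both on what TLC emits and on the JSON parser used. Python's json module round-trips arbitrary-precision integers losslessly, but consumers that map JSON numbers to IEEE-754 doubles (e.g. JavaScript's JSON.parse) lose precision above 2^53, so it matters whether large TLA+ Ints end up as JSON numbers or strings. A small illustration:

# Python round-trips big integers through JSON losslessly...
import json
big = 2**80
assert json.loads(json.dumps({"x": big}))["x"] == big
# ...but a double-based consumer cannot tell big and big + 1 apart.
assert float(big) == float(big + 1)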

@shonfeder

shonfeder commented Aug 19, 2021

Given the need to write special specs for TLC to get JSON serialization, it seems like it might indeed be handy if we could just run TLC counterexamples through Apalache's serialization. Tho I don't readily know whether that would give consistently structured data between Apalache and TLC counterexamples.

@andrey-kuprianov
Contributor Author

For the time being we decided to rework the ad-hoc parser and to generalise it a bit (see #67); this is the fastest way to move forward. The more general approach is to be addressed later.

@andrey-kuprianov
Contributor Author

Closed via #67.

We will discuss the generalised solution with the Apalache team.
