Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Print runnable scenario in case of test failure #204

Merged
merged 6 commits into from
Mar 18, 2024

Conversation

n-osborne
Copy link
Collaborator

This PR is the continuation of, and is based on, #202.
Please consider only the six last commits.

This PR makes the failure message print a runnable scenario with information about returned and expected values.

It uses the same mechanism as introduced in #202 for communicating with Qcheck.fail_reportf.

For printing the commands with their sut argument in the right place, only the generated show_cmd has been modified.

Printing the user-provided value for init_sut is quite straightforward, we just have to pass the information to the runtime.

Printing the expected value is a bit more complicated.
First, we look at the Gospel specifications to see if there is enough information to compute the expected returned value (some functions may be specified in a way that their contract doesn't include a computation of the returned value).
If there is, we wrap this value in a STM.res type and send it to the runtime through ortac_postond along all the other information.
It there is not, we send a dummy value, to inform the printer that the expected returned value is unknown.
A warning will be displayed to the user in the second case, but this will not affect whether the function is tested or not.

This PR only deals with normal behaviour, meaning that the excpetional branches in ortac_postond don't send an expected returned value to the runtime for now.
I believe that handling exceptional behaviour can be done in another PR (this is me trying to keep PRs short).

The ortac_runtime now handle this extra information to print a runnable scenario to the user in case of test failure.

For now, I've made the choice to print the function calls in an ignore and to print the returned value and the optional expected returned value in comments.
We've discussed elsewhere about placing the function calls in an assert, but we don't always have an expected returned value

If we introduce a bug in plugins/qcheck-stm/test/array.ml, we have now:

λ dune build @launchtests
File "plugins/qcheck-stm/test/dune.inc", line 49, characters 0-75:
49 | (rule
50 |  (alias launchtests)
51 |  (action
52 |   (run %{dep:array_stm_tests.exe} -v)))
random seed: 515405686
generated error fail pass / total     time test name
[✗]    1    0    1    0 / 1000     0.0s Array STM tests (generating)

--- Failure --------------------------------------------------------------------

Test Array STM tests failed (14 shrink steps):

   set sut (-2603392982468686691) '\190'
   to_list sut


+++ Messages ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Messages for test Array STM tests:

Gospel specification violation in function to_list

  File "array.mli", line 36, characters 12-26:
    l = t.contents
  
when executing the following sequence of operations:

  let sut = make 16 'a' in
  ignore (set sut (-2603392982468686691) '\190'); (* returned Error (Invalid_argument("index ou t of bounds")) *)
  ignore (to_list sut); (* returned ['a'; 'a'; 'a'; 'a'; 'a'; 'a'; 'a'; '\190'; 'a'; 'a'; 'a'; 'a'; 'a'; 'a'; 'a'; 'a']
                           expected ['a'; 'a'; 'a'; 'a'; 'a'; 'a'; 'a'; 'a'; 'a'; 'a'; 'a'; 'a' ; 'a'; 'a'; 'a'; 'a'] *)
  

================================================================================
failure (1 tests failed, 0 tests errored, ran 1 tests)
                                       
[1]

@n-osborne n-osborne force-pushed the print-runnable-scenario branch 3 times, most recently from 0d75a32 to efddee7 Compare February 23, 2024 09:33
@jmid
Copy link

jmid commented Feb 23, 2024

Nice work Nicolas 😃🎉 I particularly liked the (aligned) returned and expected results! ❤️

I suppose we could save a few characters (the shorter output the better IMO) by using

  • top-level let-forms

  • let _ = instead of ignore( ... ):

    let sut = make 16 'a'
    let _ = set sut (-2603392982468686691) '\190' (* returned Error (Invalid_argument("index ou t of bounds")) *)
    let _ = to_list sut (* returned ['a'; 'a'; 'a'; 'a'; 'a'; 'a'; 'a'; '\190'; 'a'; 'a'; 'a'; 'a'; 'a'; 'a'; 'a'; 'a']
                           expected ['a'; 'a'; 'a'; 'a'; 'a'; 'a'; 'a'; 'a'; 'a'; 'a'; 'a'; 'a' ; 'a'; 'a'; 'a'; 'a'] *)

As a runnable program, the example seems to place itself between two chairs.
It is both informative and runnable, yet requires manual changes:

  • it will stop with an exception on line 2 before reaching the model-mismatch in line 3.
  • the mismatch in line 3 needs, e.g., an assert to express the expectation as runnable code,
    otherwise it will just succeed silently.
  • as a very minor nit I prefixed it with open Array to directly copy-paste it into a top-level

Short-circuit booleans means the underlying QCheck property always stops at the first model-mismatch.
I think this means that generally

  • we can ignore all results up until the last line (annotating them as comments is however very informative!)
  • exception raising calls underway need some kind of try ... with or protect
  • only the last line needs an assert, e.g., by outputting a two-line thing:
    let r = to_list sut
    let _ = assert(r = ['a'; 'a'; 'a'; 'a'; 'a'; 'a'; 'a'; 'a'; 'a'; 'a'; 'a'; 'a' ; 'a'; 'a'; 'a'; 'a']

These are just my immediate thoughts, YMMV. 🤷
We can also leave further improvements for the summer... (hint hint 😃)

@jmid
Copy link

jmid commented Feb 23, 2024

Sorry if this didn't come across clearly above:
The output is already a nice improvement over the default qcheck-stm one! 👍

@n-osborne n-osborne force-pushed the print-runnable-scenario branch 2 times, most recently from a39c209 to 25a80af Compare February 26, 2024 13:56
@n-osborne
Copy link
Collaborator Author

Thank you for your comments :-)

I've modified the PR so that it includes an open of the tested module at the beginning of the ouptuted scenario.

Regarding the use of an assert for the failing function call, it is worth noting that we don't always have an expected value to show: specifications can be written in a way that we can know when something goes wrong but we can't compute the expected value.

That simply means that we can't guarantee to the user that we output a failing scenario for a failing test.

Copy link
Contributor

@shym shym left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I must say this took quite some time to review, maybe some comments could help.
I wrote my comments on specific lines except for:

  • commit “Print cmds with their sut argument”: the message could mention that it also adds open X at the beginning of the generated log
  • commit “Collect term containing a computation of the returned value”: I must admit I had to read the code to understand the commit message; I would suggest adding a small comment to the function returned_value (maybe it would be better named returned_value_specification?)

match (ty.ptyp_desc, args) with
| Ptyp_arrow (_, l, r), xs when Cfg.is_sut config l ->
let* fmt, pps = aux r xs in
ok ("%s" :: fmt, estring "sut" :: pps)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If fmt directly contained sut instead of %s, we could keep pp_args unchanged.
On the other hand, maybe your proposition is an altogether better approach, to build fmt and pp_args at the same time.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can indeed spare the %s and estring "sut"

| _, [] -> ok ([], [])
| _, _ ->
failwith
"shouldn't happen (list of arguments should be consistent with \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we shouldn’t create an error value ShouldNotHappen (in a separate PR if at all), as it appears in many already monadic contexts.

plugins/qcheck-stm/src/ir_of_gospel.ml Outdated Show resolved Hide resolved
plugins/qcheck-stm/src/ir_of_gospel.ml Show resolved Hide resolved
in
let ty_show = to_option @@ exp_of_core_type value.inst ty_ret in
match (ty_show, ret_val) with
| Some ty_show, Some ret_values ->
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ret_values: this is not a list, shouldn't this be singular?

ok @@ esome dummy
| Some e -> ok @@ esome e
in
let wrap_check ?(normal = false) t e =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does normal really mean here? I see it set to true for (normal) postconditions and invariants, but anyway, I don’t have a clear understanding of the impact of setting it to true.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added some comments to explain.
Thanks!

This allow to output a runnable scenario in case of test failure.
The runnable scenario will be of the form:
```ocaml
open Module_under_test
let sut = init_sut
...
```
These descriptions will be used to print the expected behaviour when
possible.

Functions returning tuples are still not supported, but in the same way
that we've prepared the way by storing a list a returned values, we
store a list of list of computation of returned value.
Also adapt the runtime so that the commit build.
When a returned value can't be computed based on the information
contained in the Gospel specification, a warning is displayed to the
user at generation and a dummy one is placed to inform the message
printer that the expected returned value is unknown.
@n-osborne
Copy link
Collaborator Author

Green light to merge has been given in another chat.

@n-osborne n-osborne merged commit 83db7d2 into ocaml-gospel:main Mar 18, 2024
3 checks passed
@n-osborne n-osborne deleted the print-runnable-scenario branch March 18, 2024 09:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants