
Group examples by kind and result #58

Merged

Conversation

mristin
Contributor

@mristin mristin commented Feb 1, 2021

This patch groups the examples by kind (PEP 316, icontract) and outcome
(expected success, expected failure).

Additionally, the patch introduces a script to run the functional tests
and verify that the captured output and the exit code of the check
command do not deviate from the expected output and exit code.
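A minimal sketch of how such a runner might look. The `.out` suffix for expected-output files, the per-case expected exit code, and the invocation via `python -m crosshair check` are assumptions for illustration, not the actual layout of the repository's run_functional_tests.py:

```python
import pathlib
import subprocess
import sys


def outputs_match(actual_stdout: str, actual_exit: int,
                  expected_stdout: str, expected_exit: int) -> bool:
    # Both the captured output and the exit code must agree exactly.
    return actual_stdout == expected_stdout and actual_exit == expected_exit


def run_case(example: pathlib.Path, expected_exit: int) -> bool:
    # Hypothetical convention: the expected output lives next to the
    # example in a file with the same stem and a ".out" suffix.
    expected_pth = example.with_suffix(".out")
    expected_stdout = expected_pth.read_text() if expected_pth.exists() else ""

    proc = subprocess.run(
        [sys.executable, "-m", "crosshair", "check", str(example)],
        capture_output=True, text=True,
    )
    return outputs_match(proc.stdout, proc.returncode,
                         expected_stdout, expected_exit)
```

A runner like this would walk the example directories, call `run_case` once per file, and report any case whose output or exit code drifted from the recorded expectation.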

@mristin
Contributor Author

mristin commented Feb 1, 2021

@pschanely could you please have a look and call run_functional_tests.py on your machine? Some of the examples with the expected "success" outcome (no error detected) return a failure, while a single failure example (with icontract) seems not to report an error.

In case the functionality is missing so that some examples fail, I'd suggest actually removing the examples (storing them in a work_in_progress directory?) and keeping at least a subset for which we know that the functional tests pass. You can then keep adding examples as the functionality is supported, so you always know where the tool stands.

As a side note, the functional tests are probably too slow to run on every commit, but you can run them nightly or before a release.

@mristin mristin force-pushed the Group-examples-by-kind-and-result branch from f420d1d to 704c5cb on February 1, 2021 at 19:13
@mristin
Contributor Author

mristin commented Feb 1, 2021

Let's first see what to do with the current tests. I'll add more tests involving icontract afterwards.

@pschanely
Owner

This is so cool! I probably should clarify that the original idea with examples/ was just to show various things you might do with CrossHair, so not all of them are expected to pass.

In addition, some are expected to fail but only with suitably long timeouts. (tic_tac_toe.py for example)

could you please have a look and call run_functional_tests.py on your machine?

This is what I get presently. Maybe we can do something about the varying path slashes. All of the PEP316 failures in that gist are supposed to fail, I believe.

Some of the examples with the expected "success" outcome (no error detected) return a failure, while a single failure example (with icontract) seem not to report an error.

Indeed, some of the PEP316 ones should fail. (do your failures match mine?)

And, as for the icontract tests, it looks to me like the one in fail/ should fail and does fail?

In case the functionality is missing so that some examples fail, I'd suggest to actually remove the examples (store them in work_in_progress directory?) and have at least a subset for which we know that the functional tests pass. You can then keep adding the examples as the functionality is supported so you always know where the tool stands.

Maybe we start with those in fail/, and I can do a pass afterwards and make some tweaks (either remove examples or re-categorize them or something). Some example files like showcase.py are better split up under this model, as some parts are expected to fail and some to pass.

As a side note, the functional tests are probably too slow to run on every commit, but you can run them nightly or before a release.

I might like to try and get them into the test suite eventually (it's already ~8min on my machine), but maybe we don't start with that right away.

@mristin
Contributor Author

mristin commented Feb 1, 2021

@pschanely thanks for the clarification :) I'll split the examples into pass/fail_fast/fail_slow. Could you suggest better naming? Pass/fail insinuates some kind of testing. What about true_negative (instead of pass) and true_positive_fast/_slow (instead of fail)?

Later you might also want to add false positive/negative, and also make sure these are appropriately documented, so in light of that this naming makes sense to me.

@pschanely
Owner

Someone on the internet suggested "false alarm" (for false positive) and "missed bugs" (for false negative), which I use in the GitHub issue tags. I wonder whether similarly intuitive naming could apply to the true cases? I would like people to very quickly understand what's in there. Open to ideas, and true_positive/true_negative is also fine by me if we don't have a better idea.

Maybe skip the fast/slow distinction for now - I'll remove, optimize, or tweak settings to keep things simple.

@mristin
Contributor Author

mristin commented Feb 2, 2021

@pschanely can you come up with a couple of examples in which side effects in the code are safe to analyse? (Example: a shutil call in a function called from a condition? A shutil call in the body of a function? A shutil call in a function called from the body of a function?)

Maybe this merits a separate kind of functional test? Or do we just pass in TMP_DIR as an environment variable? How can we verify that shutil was not executed?
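One plain-Python way to check the last point would be to wrap the suspect function in a spy during analysis. This is a sketch using `unittest.mock`; `analyse` is a hypothetical stand-in for invoking the checker on an example file, not a real function of the project:

```python
import shutil
from unittest import mock


def analyse(example: str) -> None:
    # Hypothetical stand-in for running the checker on an example file.
    # A well-behaved example must not reach into the filesystem.
    pass


# Wrap shutil.rmtree in a spy; the real function would still run if called,
# but every call is recorded on the mock.
with mock.patch.object(shutil, "rmtree", wraps=shutil.rmtree) as spy:
    analyse("examples/some_example.py")

# The analysis should not have triggered the side effect.
assert not spy.called
```

The same pattern works for any other side-effecting entry point one wants to guard against during a functional-test run.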

You can write examples here and I'll incorporate them in the directories, or just push a commit, whatever is easier for you.

@pschanely
Owner

Ah! Your comments highlight how much we need better documentation here!

You don't want to perform side effects anywhere: the conditions, the body, or even functions called by the body. (CrossHair may "short circuit" subroutine calls, but often doesn't)

It's best to imagine it just like a hypothesis test: your code will run and take whatever actions that code takes, given arbitrary inputs.
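A minimal plain-Python illustration of that point (no CrossHair involved; the names are made up): code referenced by a condition really executes, so any side effect inside it really happens for every analysed input.

```python
calls = []


def nonneg(x: int) -> bool:
    calls.append(x)  # a side effect: fires every time the condition is evaluated
    return x >= 0


def checked_identity(x: int) -> int:
    # Imagine `nonneg` as a precondition: the checker evaluates it for real.
    assert nonneg(x)
    return x


checked_identity(3)
assert calls == [3]  # the "contract" ran real code and mutated state
```

Replace `calls.append` with a filesystem or network call and the hazard becomes obvious: the analysis itself would perform the side effect.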

@mristin
Contributor Author

mristin commented Feb 2, 2021

Sorry, my bad. I actually meant this shortcircuiting:

(CrossHair may "short circuit" subroutine calls, but often doesn't)

It's best to imagine it just like a hypothesis test: your code will run and take whatever actions that code takes, given arbitrary inputs.

(Follow this thread in issue #61.)
(Let's discuss this issue in a separate issue? I'll finish the examples now so that they can be merged in.)

@pschanely
Owner

Let's discuss this issue in a separate issue? I'll finish the examples now so that they can be merged in

Yes, great. Just LMK when you'd like me to take another look!

@mristin mristin force-pushed the Group-examples-by-kind-and-result branch from d0eabf2 to 1a4f101 on February 3, 2021 at 04:15
@mristin
Contributor Author

mristin commented Feb 3, 2021

@pschanely the pull request is now review-ready.

@mristin mristin mentioned this pull request Feb 3, 2021
else:
return -1

expected_stdout = expected_stdout_pth.read_text()
Owner


Can we normalize the path slashes in the output?
Or cut off the leading directories, e.g.?:

expected = re.compile(r'^.*[/\\]([_\w]+\.py)').sub(r'\1', expected)
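For illustration, here is the suggested substitution applied to a single line of output (a sketch; real output spans multiple lines, so the `^` anchor would additionally need `re.MULTILINE` to fire on every line):

```python
import re

# A Windows-style path in the captured output...
line = r"C:\Users\dev\crosshair\examples\tic_tac_toe.py:12: error: ..."
normalized = re.compile(r"^.*[/\\]([_\w]+\.py)").sub(r"\1", line)
assert normalized == "tic_tac_toe.py:12: error: ..."

# ...and a POSIX-style path normalize to the same string:
posix = "/home/dev/crosshair/examples/tic_tac_toe.py:12: error: ..."
assert re.compile(r"^.*[/\\]([_\w]+\.py)").sub(r"\1", posix) == normalized
```

Cutting off the leading directories this way sidesteps the slash-direction problem entirely, since only the bare filename survives into the comparison.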

Owner


Merging so you can get that formatting PR through and we can iterate on this separately. (I'll want to tweak some of the examples myself too)

Contributor Author


Can we normalize the path slashes in the output?

Silly me, thanks for spotting it! .as_posix() does the trick. Let me create a small PR to fix this.

Contributor Author


Sorry, as_posix was a dummy idea. I implemented your suggestion with a regular expression; see #64.

@pschanely pschanely merged commit 9f6cea3 into pschanely:master Feb 3, 2021
@mristin mristin deleted the Group-examples-by-kind-and-result branch February 3, 2021 15:37