
Group examples by kind and result #58

Merged

Conversation

mristin
Contributor

@mristin mristin commented Feb 1, 2021

This patch groups the examples by kind (PEP 316, icontract) and outcome
(expected success, expected failure).

Additionally, the patch introduces a script to run the functional tests
and verify that the captured output and the exit code of the check
command do not deviate from the expected output and exit code.
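A minimal sketch of how such a runner might look. The `.out` suffix for expected-output files, the per-case expected exit code, and the invocation via `python -m crosshair check` are assumptions for illustration, not the actual layout of the repository's run_functional_tests.py:

```python
import pathlib
import subprocess
import sys


def outputs_match(actual_stdout: str, actual_exit: int,
                  expected_stdout: str, expected_exit: int) -> bool:
    # Both the captured output and the exit code must agree exactly.
    return actual_stdout == expected_stdout and actual_exit == expected_exit


def run_case(example: pathlib.Path, expected_exit: int) -> bool:
    # Hypothetical convention: the expected output lives next to the
    # example in a file with the same stem and a ".out" suffix.
    expected_pth = example.with_suffix(".out")
    expected_stdout = expected_pth.read_text() if expected_pth.exists() else ""

    proc = subprocess.run(
        [sys.executable, "-m", "crosshair", "check", str(example)],
        capture_output=True, text=True,
    )
    return outputs_match(proc.stdout, proc.returncode,
                         expected_stdout, expected_exit)
```

A runner like this would walk the example directories, call `run_case` once per file, and report any case whose output or exit code drifted from the recorded expectation.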

@mristin
Contributor Author

mristin commented Feb 1, 2021

@pschanely could you please have a look and call run_functional_tests.py on your machine? Some of the examples with the expected "success" outcome (no error detected) return a failure, while a single failure example (with icontract) seems not to report an error.

In case the functionality is missing so that some examples fail, I'd suggest actually removing the examples (storing them in a work_in_progress directory?) and keeping at least a subset for which we know that the functional tests pass. You can then keep adding examples as the functionality is supported, so you always know where the tool stands.

As a side note, the functional tests are probably too slow to run on every commit, but you can run them nightly or before a release.

@mristin mristin force-pushed the Group-examples-by-kind-and-result branch from f420d1d to 704c5cb on February 1, 2021 at 19:13
@mristin
Contributor Author

mristin commented Feb 1, 2021

Let's first see what to do with the current tests. I'll add more tests involving icontract afterwards.

@pschanely
Owner

This is so cool! I probably should clarify that the original idea with examples/ was just to show various things you might do with CrossHair, so not all of them are expected to pass.

In addition, some are expected to fail but only with suitably long timeouts. (tic_tac_toe.py for example)

could you please have a look and call run_functional_tests.py on your machine?

This is what I get presently. Maybe we can do something about the varying path slashes. All of the PEP316 failures in that gist are supposed to fail, I believe.

Some of the examples with the expected "success" outcome (no error detected) return a failure, while a single failure example (with icontract) seem not to report an error.

Indeed, some of the PEP316 ones should fail. (do your failures match mine?)

And, as for the icontract tests, it looks to me like the one in fail/ should fail and does fail?

In case the functionality is missing so that some examples fail, I'd suggest to actually remove the examples (store them in work_in_progress directory?) and have at least a subset for which we know that the functional tests pass. You can then keep adding the examples as the functionality is supported so you always know where the tool stands.

Maybe we start with those in fail/, and I can do a pass afterwards and make some tweaks (either remove examples or re-categorize them or something). Some example files like showcase.py are better split up under this model, as some parts are expected to fail and some to pass.

As a side note, the functional tests are probably too slow to run on every commit, but you can run them nightly or before a release.

I might like to try and get them into the test suite eventually (it's already ~8min on my machine), but maybe we don't start with that right away.

@mristin
Contributor Author

mristin commented Feb 1, 2021

@pschanely thanks for the clarification :) I'll split the examples into pass/fail_fast/fail_slow. Could you suggest better naming? Pass/fail insinuates some kind of testing. What about true_negative (instead of pass) and true_positive_fast/_slow (instead of fail)?

Later you might also want to add false positive/negative, and also make sure these are appropriately documented, so in light of that this naming makes sense to me.

@pschanely
Owner

Someone on the internet suggested "false alarm" (for false positive) and "missed bugs" (for false negative), which I use in the GitHub issue tags. I wonder whether similarly intuitive naming could apply to the true cases? I would like people to very quickly understand what's in there. Open to ideas, and true_positive/true_negative is also fine by me if we don't have a better idea.

Maybe skip the fast/slow distinction for now - I'll remove, optimize, or tweak settings to keep things simple.

@mristin
Contributor Author

mristin commented Feb 2, 2021

@pschanely can you come up with a couple of examples in which side effects in the code are safe to analyse? (Example: a shutil call in a function called from a condition? A shutil call in the body of a function? A shutil call in a function called from the body of a function?)

Maybe this merits a separate kind of functional test? Or do we just pass in TMP_DIR as an environment variable? How can we verify that shutil was not executed?
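One plain-Python way to check the last point would be to wrap the suspect function in a spy during analysis. This is a sketch using `unittest.mock`; `analyse` is a hypothetical stand-in for invoking the checker on an example file, not a real function of the project:

```python
import shutil
from unittest import mock


def analyse(example: str) -> None:
    # Hypothetical stand-in for running the checker on an example file.
    # A well-behaved example must not reach into the filesystem.
    pass


# Wrap shutil.rmtree in a spy; the real function would still run if called,
# but every call is recorded on the mock.
with mock.patch.object(shutil, "rmtree", wraps=shutil.rmtree) as spy:
    analyse("examples/some_example.py")

# The analysis should not have triggered the side effect.
assert not spy.called
```

The same pattern works for any other side-effecting entry point one wants to guard against during a functional-test run.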

You can write examples here and I'll incorporate them in the directories, or just push a commit, whatever is easier for you.

@pschanely
Owner

Ah! Your comments highlight how much we need better documentation here!

You don't want to perform side effects anywhere: the conditions, the body, or even functions called by the body. (CrossHair may "short circuit" subroutine calls, but often doesn't)

It's best to imagine it just like a hypothesis test: your code will run and take whatever actions that code takes, given arbitrary inputs.
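A minimal plain-Python illustration of that point (no CrossHair involved; the names are made up): code referenced by a condition really executes, so any side effect inside it really happens for every analysed input.

```python
calls = []


def nonneg(x: int) -> bool:
    calls.append(x)  # a side effect: fires every time the condition is evaluated
    return x >= 0


def checked_identity(x: int) -> int:
    # Imagine `nonneg` as a precondition: the checker evaluates it for real.
    assert nonneg(x)
    return x


checked_identity(3)
assert calls == [3]  # the "contract" ran real code and mutated state
```

Replace `calls.append` with a filesystem or network call and the hazard becomes obvious: the analysis itself would perform the side effect.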

@mristin
Contributor Author

mristin commented Feb 2, 2021

Sorry, my bad. I actually meant this shortcircuiting:

(CrossHair may "short circuit" subroutine calls, but often doesn't)

It's best to imagine it just like a hypothesis test: your code will run and take whatever actions that code takes, given arbitrary inputs.

(Follow this thread in issue #61.)
(Let's discuss this issue in a separate issue? I'll finish the examples now so that they can be merged in.)

@pschanely
Owner

Let's discuss this issue in a separate issue? I'll finish the examples now so that they can be merged in

Yes, great. Just LMK when you'd like me to take another look!

@mristin mristin force-pushed the Group-examples-by-kind-and-result branch from d0eabf2 to 1a4f101 on February 3, 2021 at 04:15
@mristin
Contributor Author

mristin commented Feb 3, 2021

@pschanely the pull request is now review-ready.

@mristin mristin mentioned this pull request Feb 3, 2021
else:
return -1

expected_stdout = expected_stdout_pth.read_text()
Owner


Can we normalize the path slashes in the output?
Or cut off the leading directories, e.g.?:

expected = re.compile(r'^.*[/\\]([_\w]+\.py)').sub(r'\1', expected)
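For illustration, here is the suggested substitution applied to a single line of output (a sketch; real output spans multiple lines, so the `^` anchor would additionally need `re.MULTILINE` to fire on every line):

```python
import re

# A Windows-style path in the captured output...
line = r"C:\Users\dev\crosshair\examples\tic_tac_toe.py:12: error: ..."
normalized = re.compile(r"^.*[/\\]([_\w]+\.py)").sub(r"\1", line)
assert normalized == "tic_tac_toe.py:12: error: ..."

# ...and a POSIX-style path normalize to the same string:
posix = "/home/dev/crosshair/examples/tic_tac_toe.py:12: error: ..."
assert re.compile(r"^.*[/\\]([_\w]+\.py)").sub(r"\1", posix) == normalized
```

Cutting off the leading directories this way sidesteps the slash-direction problem entirely, since only the bare filename survives into the comparison.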

Owner


Merging so you can get that formatting PR through and we can iterate on this separately. (I'll want to tweak some of the examples myself too)

Contributor Author


Can we normalize the path slashes in the output?

Silly me, thanks for spotting it! .as_posix() does the trick. Let me create a small PR to fix this.

Contributor Author


Sorry, as_posix was a dummy idea. I implemented your suggestion with a regular expression; see #64.

@pschanely pschanely merged commit 9f6cea3 into pschanely:master Feb 3, 2021
@mristin mristin deleted the Group-examples-by-kind-and-result branch February 3, 2021 15:37