Group examples by kind and result #58
Conversation
@pschanely could you please have a look? In case the functionality is missing so that some examples fail, I'd suggest to actually remove the examples (store them in …). As a side note, the functional tests are probably too slow to run on every commit, but you can run them nightly or before a release.
Force-pushed from f420d1d to 704c5cb.
Let's first see what to do with the current tests. I'll add more tests involving icontract afterwards.
This is so cool! I probably should clarify that the original idea with examples/ was just to show various things you might do with CrossHair, so not all of them are expected to pass. In addition, some are expected to fail but only with suitably long timeouts. (tic_tac_toe.py for example)
This is what I get presently. Maybe we can do something about the varying path slashes. All of the PEP316 failures in that gist are supposed to fail, I believe.
Indeed, some of the PEP316 ones should fail. (do your failures match mine?) And, as for the icontract tests, it looks to me like the one in fail/ should fail and does fail?
Maybe we start with those in fail/ and I can do a pass afterwards and make some tweaks (either remove examples or re-categorize them or something). Some example files (like …) I might like to try to get into the test suite eventually (it's already ~8min on my machine), but maybe we don't start with that right away.
@pschanely thanks for the clarification :) I'll split the examples into pass/fail_fast/fail_slow. Could you suggest a better naming? Pass/fail insinuates some kind of testing. What about true_negative (instead of pass) and true_positive_fast/_slow (instead of fail)? Later you might also want to add false positives/negatives and make sure these are appropriately documented, so in light of that this naming makes sense to me.
Someone on the internet suggested "false alarm" (for false positive) and "missed bugs" (for false negative), which I use in the GitHub issue tags. I wonder whether some similar intuitive naming could apply for the true cases? I would like people to very quickly understand what's in there. Open to ideas, and true_positive/true_negative is also fine by me if we don't have a better idea. Maybe skip the fast/slow distinction for now - I'll remove, optimize, or tweak settings to keep things simple.
@pschanely can you come up with a couple of examples in which side effects in the code are safe to be analysed? (Example: a shutil call in a function called from a condition? A shutil call in the body of a function? A shutil call in a function called from the body of a function?) Maybe this merits a separate kind of functional tests? Or do we just pass in TMP_DIR as an environment variable? How can we verify that shutil was not executed? You can write examples here and I'll incorporate them in the directories, or just push a commit, whatever is easier for you.
Ah! Your comments highlight how much we need better documentation here! You don't want to perform side effects anywhere: the conditions, the body, or even functions called by the body. (CrossHair may "short circuit" subroutine calls, but often doesn't) It's best to imagine it just like a hypothesis test: your code will run and take whatever actions that code takes, given arbitrary inputs.
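To make that pitfall concrete, here is a hypothetical sketch (not from this PR; the contract syntax mimics PEP 316 docstrings) of a contracted function whose body mutates external state. Because CrossHair actually executes the body while exploring inputs, the side effect fires for real:

```python
# Hypothetical illustration: any side effect in the body runs for real
# during analysis. A list stands in for a destructive action like
# shutil.rmtree so the effect is observable.
deleted_paths = []

def clean_scratch_dir(path: str) -> int:
    """
    pre: len(path) > 0
    post: _ >= 1
    """
    deleted_paths.append(path)  # side effect: fires on every explored input
    return len(deleted_paths)

# Even one ordinary call mutates the shared state:
clean_scratch_dir("/tmp/scratch")
assert deleted_paths == ["/tmp/scratch"]
```

This is why passing a TMP_DIR and checking afterwards that nothing was touched is a reasonable way to verify that a real shutil call did not execute.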
Sorry, my bad. I actually meant this shortcircuiting:
(Follow this thread in issue #61.)
Yes, great. Just LMK when you'd like me to take another look!
This patch groups the examples by kind (PEP 316, icontract) and outcome (expected success, expected failure). Additionally, the patch introduces a script to run the functional tests and verify that the captured output and the exit code of the `check` command do not deviate from the expected output and exit code.
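The core of such a functional-test script can be sketched as below. The function name and the stand-in command are hypothetical (the real harness runs `crosshair check <example.py>` and reads the expectations from recorded files), but the comparison logic is the part the PR description refers to:

```python
import subprocess
import sys

def matches_expected(cmd, expected_stdout, expected_exit_code):
    """Run *cmd*; report whether its stdout and exit code match the
    recorded expectations (hypothetical helper for illustration)."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return (proc.stdout == expected_stdout
            and proc.returncode == expected_exit_code)

# Stand-in command so the sketch is self-contained; the real harness
# would invoke ``crosshair check`` on each example file instead.
ok = matches_expected([sys.executable, "-c", "print('hello')"], "hello\n", 0)
```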
Force-pushed from d0eabf2 to 1a4f101.
@pschanely the pull request is now review-ready.
```python
else:
    return -1
```

```python
expected_stdout = expected_stdout_pth.read_text()
```
Can we normalize the path slashes in the output?
Or cut off the leading directories, e.g.:
expected = re.compile(r'^.*[/\\]([_\w]+\.py)').sub(r'\1', expected)
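For illustration, here is that substitution applied to hypothetical output lines (the paths and the failure message are made up), showing that Windows-style and POSIX-style paths reduce to the same basename-prefixed form:

```python
import re

# Hypothetical captured lines: one Windows-style, one POSIX-style path.
win = r"C:\work\crosshair\examples\tic_tac_toe.py:32:error:false when calling winner"
posix = "/home/user/crosshair/examples/tic_tac_toe.py:32:error:false when calling winner"

# Strip everything up to and including the last path separator.
strip_dirs = re.compile(r'^.*[/\\]([_\w]+\.py)')

# Both lines normalize to the same string:
assert strip_dirs.sub(r'\1', win) == strip_dirs.sub(r'\1', posix)
assert strip_dirs.sub(r'\1', win).startswith("tic_tac_toe.py:32")
```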
Merging so you can get that formatting PR through and we can iterate on this separately. (I'll want to tweak some of the examples myself too)
Can we normalize the path slashes in the output?
Silly me, thanks for spotting it! `.as_posix()` does the trick. Let me create a small PR to fix this.
Sorry, `as_posix` was a dummy idea. I implemented your suggestion with a regular expression; see #64.