-
Notifications
You must be signed in to change notification settings - Fork 21.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
More Timer refinement #46023
More Timer refinement #46023
Conversation
Test breakage is due to where I put |
I moved the benchmark utils tests into a separate file, as they are now non-trivial. However I realize this makes it more difficult to review. The changes apart from the move are:
|
Also CC @heitorschueroff |
Codecov Report
@@ Coverage Diff @@
## master #46023 +/- ##
==========================================
+ Coverage 68.33% 68.37% +0.04%
==========================================
Files 410 411 +1
Lines 53795 53937 +142
==========================================
+ Hits 36760 36881 +121
- Misses 17035 17056 +21
Continue to review full report at Codecov.
|
💊 CI failures summary and remediationsAs of commit 931d85b (more details on the Dr. CI page):
1 failure not recognized by patterns:
This comment was automatically generated by Dr. CI (expand for details).Follow this link to opt-out of these comments for your Pull Requests.Please report bugs/suggestions on the GitHub issue tracker or post in the (internal) Dr. CI Users group. This comment has been revised 74 times. |
@@ -0,0 +1,1187 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure you actually want to check this in? Need to keep reading to find out why this is used, but this seems a bit fragile.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's to hermetically mock a collected Callgrind so some of the supporting functionality can be tested as a fast unit test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see now that this is a test fixture.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would have been nice if this fixture were smaller. Not sure if this is actually possible, most of the lines are coming from Python.
You say that it is strict safe, but I don't see adjustments to the mypy config to ensure these keep getting checked as strict. CI typechecking is important for ensuring people continue to keep things strict. If there are upstream type problems I suggest suppressing them. |
"counter()", | ||
globals={"counter": pickle.loads(pickle.dumps(counter))} | ||
).timeit(20) | ||
print(counter.value) # Still 10 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I must admit, I'm not terribly convinced by this argument. Yes, side effects run inside the timer may get disregarded. But there isn't really any reason to presuppose that the user cared about the side effects at all in the first place. After all, they're calling a timer on a piece of code in a loop. It's incredibly unlikely that they actually wanted to call the operation 10 times; if they are doing a side-effectful operation, they're just doing it to exercise some piece of code that they're interested in timing.
If the user doesn't care about the side effects, then blacklisting Tensor seems like it's just unnecessarily making people's life harder when they have some nontrivial setup that they don't want to have to handle inside Timer.
A better reason to block globals is if the serialization/deserialization process perturbs the representation of a tensor in such a way that would result in the timing to be different. This is not an idle concern; for example, if a tensor lives in pinned CPU memory, I'm reasonably certain this wouldn't get preserved by a dump, and that will change the performance of certain CUDA operations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't have to do this in this PR, but some more thoughts: it seems better and more explicit to make the user say that they are doing some sort of serialization. Allow the transfer of primitive types (where the serialization is well defined) but then make the user actually do serialization/deserialization if they want to. It will make it more obvious that something is going on (and if there is a bug in the user's serialization code, it will save them a lot of heartbreak).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated the language, and added a CopyIfCallgrind
wrapper for users to declare that they're willing to have their classes serialized. I realized that there's a slight chicken and egg problem. You might want setup
to execute before globals are loaded so you can setup the environment for unpickle to succeed, but you would want the reverse if you plan to use setup
to revive bytes in globals
. timeit
has the latter behavior (I thought it was the former so I need to switch the codegen order), but that seems to imply that CopyIfCallgrind
has to also allow an optional per-variable setup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"model": benchmark_utils.CopyIfCallgrind(
MyModule(),
setup=f"""\
import sys
sys.path.append({repr(os.path.split(os.path.abspath(__file__))[0])})
from test_benchmark_utils import MyModule
"""
)
Isn't pickle fun?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Also because pickle recursively unpickles I don't think we can automagically generate the CopyIfCallgrind.setup
code)
} | ||
|
||
with open("/tmp/test_callgrind_artifacts.json", "wt") as f: | ||
json.dump(artifacts, f, indent=4) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fixture for generating the test data should be actual code and be executed in CI (with some basic sanity test on the output) to prevent it from bitrotting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's now a proper function, though unlike expecttest
it doesn't auto-regen since the diff can be quite large due to changed build dir prefixes.
stats_no_data, stats_with_data = load_test_example() | ||
|
||
self.assertEqual(stats_no_data.counts(), 8869966) | ||
self.assertEqual(stats_no_data.counts(denoise=True), 8728096) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should have instructions for how to update the test numbers (and check that they're right!) if you refresh the fixture
("pass", 8e-9), | ||
("cheap_fn()", 4e-6), | ||
("expensive_fn()", 20e-6), | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't you just define this as a dictionary in the first place?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm always wary of adding mutable class fields, so it's just habit at this point.
I added a |
@ezyang I updated the way testing works. I have a sneaking suspicion that you'll either love it or hate it. (Hopefully the former...) The TL;DR is that for all the "string check heavy" tests, the test and regeneration pass are 99% the same, just swapping out This of course all needs to be documented in the test file itself, but I figured I'd give you a chance to digest the high level approach while I write docstrings and type annotations. |
Done. I had to change |
Test failure appears unrelated. |
Clang-tidy build is failing due to an unrelated issue. #46315 seems like it should fix it, it just hasn't been picked up by |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@robieta has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
torch/utils/benchmark/utils/valgrind_wrapper/*.py | ||
|
||
[mypy-torch.utils.benchmark.utils.*] | ||
follow_imports = normal |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this follow_imports
line really necessary? (If it is, does that mean we also need it for tools.codegen.gen
too?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is because of the
[mypy-torch.*]
follow_imports = skip
block. I added a comment explaining why.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@robieta has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Thanks for the reviews! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@robieta has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@robieta has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
This PR just adds more polish to the benchmark utils:
common.py
,timer.py
, andvalgrind_wrapper/timer_interface.py
are now MyPy strict compliant. (except for three violations due to external deps.) Compare and Fuzzer will be covered in a future PR.CallgrindStats
now usesTaskSpec
rather than accepting the individual fields which brings it closer toMeasurement
.__repr__
logic has been moved intoTaskSpec
(whichMeasurement
andCallgrindStats
use in their own__repr__
s) for a more unified feel and less horrible f-string hacking, and the repr's have been given a cleanup pass.Tuple[FunctionCount, ...]
has been formalized as theFunctionCounts
class, which has a much nicer__repr__
than just the raw tuple, as well as some convenience methods (__add__
,__sub__
,filter
,transform
) for easier DIY stat exploration. (I find myself using the latter two a lot now.) My personal experience is that manipulatingFunctionCounts
is massively more pleasant than the raw tuples ofFunctionCount
. (Though it's still possible to get at the raw data if you want.)stmt
andsetup
.globals
incollect_callgrind
. This should make it easier to benchmark JIT models. (CC @ZolotukhinM)MKL_THREADING_LAYER
when run in Jupyter. (Pytorch 1.5.0 (installed from conda) errors with complaints about incompatibility between MKL and libgomp when using Pytorch's multiprocessing #37377)Test plan: changes should be covered by existing and new unit tests.