
More Timer refinement #46023

Closed
wants to merge 27 commits into master from gh/taylorrobie/timer_papercuts

Conversation

@robieta commented Oct 8, 2020

This PR just adds more polish to the benchmark utils:

  1. common.py, timer.py, and valgrind_wrapper/timer_interface.py are now mypy strict compliant (except for three violations due to external deps). Compare and Fuzzer will be covered in a future PR.
  2. CallgrindStats now uses TaskSpec rather than accepting the individual fields which brings it closer to Measurement.
  3. Some __repr__ logic has been moved into TaskSpec (which Measurement and CallgrindStats use in their own __repr__s) for a more unified feel and less horrible f-string hacking, and the __repr__s have been given a cleanup pass.
  4. Tuple[FunctionCount, ...] has been formalized as the FunctionCounts class, which has a much nicer __repr__ than just the raw tuple, as well as some convenience methods (__add__, __sub__, filter, transform) for easier DIY stat exploration (a usage sketch follows at the end of this description). (I find myself using the latter two a lot now.) My personal experience is that manipulating FunctionCounts is massively more pleasant than the raw tuples of FunctionCount. (Though it's still possible to get at the raw data if you want.)
  5. Better support for multi-line stmt and setup.
  6. Compare now also supports rowwise coloring, which is often the more natural layout for A/B testing.
  7. Limited support for globals in collect_callgrind. This should make it easier to benchmark JIT models. (CC @ZolotukhinM)
  8. More unit tests, including extensive tests for the Callgrind stats manipulation APIs.
  9. Mitigate an issue with MKL_THREADING_LAYER when run in Jupyter. (See #37377: PyTorch 1.5.0 installed from conda errors with complaints about incompatibility between MKL and libgomp when using PyTorch's multiprocessing.)

Test plan: changes should be covered by existing and new unit tests.
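A rough sketch of the FunctionCounts workflow from item 4 (a sketch only; stmt/setup values are illustrative, exact signatures may differ slightly from what this PR lands, and collect_callgrind requires a valgrind install):

```python
from torch.utils.benchmark import Timer

# Collect instruction counts for a trivial statement (illustrative values).
stats = Timer(
    stmt="x + 1",
    setup="import torch; x = torch.ones((8, 8))",
).collect_callgrind()

counts = stats.stats(inclusive=False)   # FunctionCounts rather than a raw tuple
torch_only = counts.filter(lambda name: "torch" in name)

# `transform` rewrites function names and merges entries that collide,
# which is handy for stripping file/line prefixes before comparing runs.
merged = counts.transform(lambda name: name.split(":")[-1])

# FunctionCounts supports arithmetic, so A/B deltas are just subtraction:
#     delta = counts_a - counts_b
print(torch_only)  # compact, readable __repr__
```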

@robieta requested review from ezyang and ngimel on October 8, 2020 at 07:36
@robieta (Author) commented Oct 8, 2020

Test breakage is due to where I put test_callgrind_artifacts.json. I'll sort it out.

@robieta (Author) commented Oct 8, 2020

I moved the benchmark utils tests into a separate file, as they are now non-trivial. However, I realize this makes it more difficult to review. The changes apart from the move are:

  • Add a multi-line test to test_timer (see the sketch after this list)
  • Add a JIT'd function and an int to globals in test_collect_callgrind to make sure artifact transfer works properly.
  • test_manipulate_callgrind_stats is all new.
  • Add test for rowwise coloring in test_compare
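As a quick sketch of the multi-line stmt/setup usage the new test exercises (illustrative values, assuming the dedent behavior this PR adds; not the actual test code):

```python
from torch.utils.benchmark import Timer

# Multi-line stmt and setup are dedented and run as blocks, so small
# helper statements can live inline (values here are illustrative).
timer = Timer(
    stmt="""
        y = x + 1
        z = y.sum()
    """,
    setup="""
        import torch
        x = torch.ones((64, 64))
    """,
)
print(timer.timeit(100))  # returns a Measurement
```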

@robieta (Author) commented Oct 8, 2020

Also CC @heitorschueroff
I'm starting to flesh out the API docstrings, and a couple bits of this PR will allow us to formulate the latter parts of our recipe a little more elegantly.

@codecov bot commented Oct 9, 2020

Codecov Report

Merging #46023 into master will increase coverage by 0.04%.
The diff coverage is 67.84%.


@@            Coverage Diff             @@
##           master   #46023      +/-   ##
==========================================
+ Coverage   68.33%   68.37%   +0.04%     
==========================================
  Files         410      411       +1     
  Lines       53795    53937     +142     
==========================================
+ Hits        36760    36881     +121     
- Misses      17035    17056      +21     
Impacted Files Coverage Δ
...enchmark/utils/valgrind_wrapper/compat_bindings.py 0.00% <0.00%> (ø)
...enchmark/utils/valgrind_wrapper/timer_interface.py 48.90% <58.92%> (+23.11%) ⬆️
torch/utils/benchmark/utils/timer.py 85.88% <83.33%> (+7.05%) ⬆️
torch/utils/benchmark/utils/common.py 98.63% <97.43%> (+2.23%) ⬆️
torch/utils/benchmark/__init__.py 100.00% <100.00%> (ø)
torch/utils/benchmark/utils/compare.py 97.96% <100.00%> (+0.62%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@dr-ci bot commented Oct 9, 2020

💊 CI failures summary and remediations

As of commit 931d85b (more details on the Dr. CI page):


  • 1/1 failures introduced in this PR

1 failure not recognized by patterns:

Job: CircleCI binary_linux_libtorch_3_7m_cpu_devtoolset7_shared-with-deps_build
Step: Build
Action: 🔁 rerun

test_callgrind_artifacts.json
@@ -0,0 +1,1187 @@
{
Contributor:

Are you sure you actually want to check this in? Need to keep reading to find out why this is used, but this seems a bit fragile.

Author (robieta):

It's to hermetically mock a collected Callgrind run so that some of the supporting functionality can be tested as a fast unit test.

Contributor:

I see now that this is a test fixture.

Contributor:

It would have been nice if this fixture were smaller. Not sure if this is actually possible; most of the lines are coming from Python.

@ezyang (Contributor) commented Oct 9, 2020

You say that it is strict safe, but I don't see adjustments to the mypy config to ensure these keep getting checked as strict. CI typechecking is important for ensuring people continue to keep things strict. If there are upstream type problems I suggest suppressing them.

"counter()",
globals={"counter": pickle.loads(pickle.dumps(counter))}
).timeit(20)
print(counter.value) # Still 10
Contributor:

I must admit, I'm not terribly convinced by this argument. Yes, side effects run inside the timer may get disregarded. But there isn't really any reason to presuppose that the user cared about the side effects at all in the first place. After all, they're calling a timer on a piece of code in a loop. It's incredibly unlikely that they actually wanted to call the operation 10 times; if they are doing a side-effectful operation, they're just doing it to exercise some piece of code that they're interested in timing.

If the user doesn't care about the side effects, then blacklisting Tensor seems like it's just unnecessarily making people's lives harder when they have some nontrivial setup that they don't want to have to handle inside Timer.

A better reason to block globals is if the serialization/deserialization process perturbs the representation of a tensor in such a way that the timing would be different. This is not an idle concern; for example, if a tensor lives in pinned CPU memory, I'm reasonably certain this wouldn't get preserved by a dump, and that will change the performance of certain CUDA operations.

Contributor:

You don't have to do this in this PR, but some more thoughts: it seems better and more explicit to make the user say that they are doing some sort of serialization. Allow the transfer of primitive types (where the serialization is well defined) but then make the user actually do serialization/deserialization if they want to. It will make it more obvious that something is going on (and if there is a bug in the user's serialization code, it will save them a lot of heartbreak).

Author (robieta):

I've updated the language, and added a CopyIfCallgrind wrapper for users to declare that they're willing to have their classes serialized. I realized that there's a slight chicken-and-egg problem. You might want setup to execute before globals are loaded so you can set up the environment for unpickle to succeed, but you would want the reverse if you plan to use setup to revive bytes in globals. timeit has the latter behavior (I thought it was the former, so I need to switch the codegen order), but that seems to imply that CopyIfCallgrind also has to allow an optional per-variable setup.

Author (robieta):

"model": benchmark_utils.CopyIfCallgrind(
    MyModule(),
    setup=f"""\
    import sys
    sys.path.append({repr(os.path.split(os.path.abspath(__file__))[0])})
    from test_benchmark_utils import MyModule
    """
)

Isn't pickle fun?

Author (robieta):

(Also, because pickle recursively unpickles, I don't think we can automagically generate the CopyIfCallgrind.setup code.)

}

with open("/tmp/test_callgrind_artifacts.json", "wt") as f:
    json.dump(artifacts, f, indent=4)
Contributor:

The fixture for generating the test data should be actual code and be executed in CI (with some basic sanity test on the output) to prevent it from bitrotting.

Author (robieta):

It's now a proper function, though unlike expecttest it doesn't auto-regen since the diff can be quite large due to changed build dir prefixes.

stats_no_data, stats_with_data = load_test_example()

self.assertEqual(stats_no_data.counts(), 8869966)
self.assertEqual(stats_no_data.counts(denoise=True), 8728096)
Contributor:

Should have instructions for how to update the test numbers (and check that they're right!) if you refresh the fixture

("pass", 8e-9),
("cheap_fn()", 4e-6),
("expensive_fn()", 20e-6),
)
Contributor:

Can't you just define this as a dictionary in the first place?

Author (robieta):

I'm always wary of adding mutable class fields, so it's just habit at this point.
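For context, a quick illustration of the pitfall that motivates that habit (a hypothetical class, not code from this PR):

```python
# Mutable class-level fields are shared across instances, so a mutation
# made through one instance silently leaks into all of the others.
class Config:
    overrides = {}  # shared dict: every Config instance sees the same object

a, b = Config(), Config()
a.overrides["threads"] = 4
print(b.overrides)  # {'threads': 4} even though b was never touched
```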

@robieta (Author) commented Oct 9, 2020

I added a load_inline path for the binding macros so folks can back-test on older versions of PyTorch. Just drop the entire torch.utils.benchmark folder into the old version and it should just work. It's a hack, but should be a useful shim for the transition period. (After which we can easily rip it out.)

CC @ailzhang @albanD
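For readers unfamiliar with the trick, here is a rough sketch of the load_inline idea (not the actual compat_bindings.py code; names are illustrative, and it assumes valgrind headers plus a working compiler are available at runtime):

```python
from torch.utils import cpp_extension

# Compile the Callgrind control macros on the fly when pre-built bindings
# are unavailable (e.g. when the folder is dropped into an older PyTorch).
source = """
#include <valgrind/callgrind.h>

void toggle_collect() { CALLGRIND_TOGGLE_COLLECT; }
void zero_stats() { CALLGRIND_ZERO_STATS; }
"""

bindings = cpp_extension.load_inline(
    name="callgrind_compat_bindings",
    cpp_sources=[source],
    functions=["toggle_collect", "zero_stats"],
)
bindings.toggle_collect()  # toggle Callgrind collection around the timed region
```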

@robieta (Author) commented Oct 13, 2020

@ezyang I updated the way testing works. I have a sneaking suspicion that you'll either love it or hate it. (Hopefully the former...) The TL;DR is that for all of the "string check heavy" tests, the test and regeneration passes are 99% the same, just swapping a store-to-golden-file step for a check-against-golden-file step. One of the properties of the current tests that I wanted to preserve was the ability to read the tests to sanity check them. To that end, the generation pass emits two artifacts per test: foo.json, which the unit test actually runs against, and foo.txt, which is formatted for human consumption. (They are emitted together so there's no drift.) This also (hopefully) has the added benefit that the diffs for the human-readable files will make it easier to review any future changes.

This of course all needs to be documented in the test file itself, but I figured I'd give you a chance to digest the high level approach while I write docstrings and type annotations.
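A minimal sketch of the store-vs-check pattern described above (the helper name, paths, and JSON layout are illustrative, not the actual test code):

```python
import json
import os

GOLDEN_DIR = "callgrind_artifacts"  # illustrative location for golden files

def check_or_regenerate(name: str, actual: str, regenerate: bool = False) -> None:
    """Check `actual` against a golden file, or rewrite the golden artifacts."""
    json_path = os.path.join(GOLDEN_DIR, f"{name}.json")
    txt_path = os.path.join(GOLDEN_DIR, f"{name}.txt")
    if regenerate:
        # Emit the machine-readable and human-readable copies together
        # so they can never drift apart.
        with open(json_path, "wt") as f:
            json.dump({"result": actual}, f, indent=4)
        with open(txt_path, "wt") as f:
            f.write(actual)
    else:
        with open(json_path, "rt") as f:
            expected = json.load(f)["result"]
        assert actual == expected, f"{name}: output does not match the golden file"
```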

@robieta (Author) commented Oct 14, 2020

> You say that it is strict safe, but I don't see adjustments to the mypy config to ensure these keep getting checked as strict. CI typechecking is important for ensuring people continue to keep things strict. If there are upstream type problems I suggest suppressing them.

Done. I had to change mypy-strict.ini to exclude torch (and numpy) to avoid transitive failures.

@robieta (Author) commented Oct 14, 2020

Test failure appears unrelated.

@robieta (Author) commented Oct 14, 2020

Clang-tidy build is failing due to an unrelated issue. #46315 seems like it should fix it; it just hasn't been picked up by fbcode/warm yet.

@facebook-github-bot left a comment

@robieta has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

torch/utils/benchmark/utils/valgrind_wrapper/*.py

[mypy-torch.utils.benchmark.utils.*]
follow_imports = normal
Contributor:

Is this follow_imports line really necessary? (If it is, does that mean we also need it for tools.codegen.gen?)

Author (robieta):

This is because of the

[mypy-torch.*]
follow_imports = skip

block. I added a comment explaining why.

@facebook-github-bot left a comment

@robieta has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@robieta (Author) commented Oct 15, 2020

Thanks for the reviews!

@facebook-github-bot left a comment

@robieta has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

robieta pushed a commit that referenced this pull request Oct 15, 2020
@robieta (Author) commented Oct 15, 2020

binary_linux_libtorch_3_7m_cpu_devtoolset7_shared-with-deps_build and docker-pytorch-linux-xenial-py3-clang5-android-ndk-r19c failures are unrelated. (I observed that they happen on other PRs)

@facebook-github-bot left a comment

@robieta has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
@robieta merged this pull request in dda95e6.

@robieta deleted the gh/taylorrobie/timer_papercuts branch on January 11, 2021 at 21:09