Generic test parametrization functionality #60753
Conversation
💊 CI failures summary and remediations

As of commit 1358d20 (more details on the Dr. CI page):

🕵️ 1 new failure recognized by patterns. The following CI failures do not appear to be due to upstream breakages.
test/test_blah.py (outdated)

```diff
@@ -0,0 +1,46 @@
+import torch
```
Is there an existing test that could be used as an example instead of these samples?
Line 4301 in c74c0c5:

```python
def test_gradcheck_forward_ad(self):
```

Lines 3915 to 3923 in c74c0c5:

```python
def test_gradcheck_single_input(self):
    def check(fast_mode):
        def f(inp):
            return inp.mul(5)
        gradcheck(f, torch.rand(10, dtype=torch.float64, requires_grad=True), fast_mode=fast_mode)
        gradgradcheck(f, torch.rand(10, dtype=torch.float64, requires_grad=True), fast_mode=fast_mode)
    check(fast_mode=True)
    check(fast_mode=False)
```
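For reference, a hedged sketch of how the second excerpt above might look rewritten with this PR's decorator (imports per the PR description; the enclosing TestCase class is elided):

```python
import torch
from torch.autograd import gradcheck, gradgradcheck
from torch.testing._internal.common_utils import parametrize

@parametrize("fast_mode", [True, False])
def test_gradcheck_single_input(self, fast_mode):
    # One test is generated per fast_mode value instead of looping manually.
    def f(inp):
        return inp.mul(5)
    gradcheck(f, torch.rand(10, dtype=torch.float64, requires_grad=True), fast_mode=fast_mode)
    gradgradcheck(f, torch.rand(10, dtype=torch.float64, requires_grad=True), fast_mode=fast_mode)
```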
To add a data point... I like having the samples in test_blah as test cases, because if something starts failing, it is clearer that the failure is due to the parametrize functionality and not, e.g., a bug in PyTorch operator code.
We can add tests to test_testing.py (there are some examples) if we want to preserve testing the decorator without a specific test
Yes, I definitely agree on adding tests to test_testing.py for these decorators. To @zou3519's point, being able to distinguish parametrization failures vs. real test failures is crucial. Just dumped them in test_blah.py for the sake of demonstration, but they should definitely be moved to the proper permanent location, and test_testing.py seems right :)
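For illustration, a minimal sketch of what such a test could look like in test_testing.py, assuming the final parametrize/instantiate_parametrized_tests API and the default naming scheme from the PR description (the class and test names here are hypothetical):

```python
from torch.testing._internal.common_utils import (
    TestCase, parametrize, instantiate_parametrized_tests, run_tests)

class TestTestParametrization(TestCase):
    def test_default_names_are_generated(self):
        class TestParametrized(TestCase):
            @parametrize("x", range(3))
            def test_bar(self, x):
                pass

        instantiate_parametrized_tests(TestParametrized)
        # Default naming assumed: <template>_<arg>_<value>.
        for name in ('test_bar_x_0', 'test_bar_x_1', 'test_bar_x_2'):
            self.assertTrue(hasattr(TestParametrized, name))

if __name__ == '__main__':
    run_tests()
```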
test/test_sparse.py (outdated)

```diff
@@ -738,7 +738,7 @@ def test_cross_device(x1, x2):
         self.assertEqual(None, x1.grad)

     @onlyCUDA
-    def test_cuda_empty(self, _):
+    def test_cuda_empty(self):
```
These tests should accept a device argument and use it. The argument is not a device type but a string representing an actual device, like "cuda:0" or "cuda:1"; these tests, however, are ignoring the test device in favor of "cuda:0".
This has been fixed in the previous parametrization PR; once I rebase, the change will show up here too.
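For clarity, a minimal sketch (hypothetical test body) of the shape being asked for: use the passed-in device string rather than hardcoding "cuda:0".

```python
@onlyCUDA
def test_cuda_empty(self, device):
    # `device` is a concrete device string such as "cuda:0" or "cuda:1".
    x = torch.empty(0, device=device)
    self.assertEqual(0, x.numel())
```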
```python
class _TestParametrizer(object):
    """
    Decorator class for parametrizing a test function, yielding a set of new tests spawned
```
For language consistency, the decorated function is the test "template", which is instantiated into the actual tests.
Following up on this: this is really the base class to inherit from if writing such a decorator, right? A pointer in the docs here to an example of a class that inherits from this would be neat, too.
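As a pointer in that direction, a toy sketch of a subclass (the exact signature and yield contract of `_parametrize_test` are assumed from the snippets in this diff, not verified against it):

```python
class repeat_test(_TestParametrizer):
    """Toy parametrizer: instantiates the test template `times` times."""
    def __init__(self, times):
        self.times = times

    def _parametrize_test(self, test, generic_cls, device_cls):
        for i in range(self.times):
            # Assumed contract: yield an instantiated test plus a name suffix.
            def instantiated_test(self, test=test, **kwargs):
                return test(self, **kwargs)
            yield (instantiated_test, '_{}'.format(i))
```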
```python
    Decorator for applying generic test parametrizations.

    Args:
        arg_str (str): String of arg names separated by commas (e.g. "x,y")
```
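For illustration, the usage shape this docstring describes, taken from the PR description's examples (inside a TestCase subclass):

```python
# "x,y" names two test args; each tuple supplies one (x, y) case.
@parametrize("x,y", [(1, 2), (3, 4)])
def test_foo(self, x, y):
    ...
```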
Does pytest allow for naming of the variants?
agree that variant naming is nice to have (and I'd consider adding it even if pytest doesn't have it)
So my understanding is that pytest's parametrize only works with pytest tests and not unittest (see here). What this means is that pytest has an internal way of doing variant naming. I don't believe it adds e.g. a test_func_variant_a() function to the test class for every variant, in contrast to unittest's modus operandi. So I think we'll have to add some custom mechanism for doing the naming to stay compatible with unittest.
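As a minimal sketch (not the PR's actual implementation) of the unittest-compatible approach being described: generate one real method per variant on the class, then remove the template.

```python
import unittest

def instantiate_variants(cls, base_name, values):
    """Attach one real test method per value, named <base_name>_<i>."""
    template = getattr(cls, base_name)
    for i, value in enumerate(values):
        # Bind the current value via a default argument to avoid the
        # late-binding closure pitfall.
        def variant(self, _value=value, _template=template):
            _template(self, _value)
        setattr(cls, '{}_{}'.format(base_name, i), variant)
    delattr(cls, base_name)  # don't let the bare template run
    return cls

class TestFoo(unittest.TestCase):
    def test_square(self, x=0):
        self.assertEqual(x * x, x ** 2)

instantiate_variants(TestFoo, 'test_square', [1, 2, 3])
# unittest discovery now sees test_square_0, test_square_1, test_square_2.
```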
If people name the variants I think it's straightforward to generate a test for each of them. If people don't name the variants but the variants are a sequence with a fixed length (like a list or tuple) then we can probably name them _0 through _n. Where naming gets tricky is if we have a variable-length variant generator (like sample inputs), because our test template instantiation happens at "compile time" and we probably don't want to enumerate arbitrary generators just to discover how many elements they have. If people are really keen on using generators maybe we can have them pass the length as a compile-time constant?
Added some updates that allow for variant naming. There are two mechanisms for doing so, and whichever is most convenient can be used:

- Specify `name_mappings` (open to other names), which is a list of either dicts or callables that map a parameter value to its string representation. The test suffix is formed by joining the parameter string representations with underscores.
- Specify `suffix_fn`, which is a callable that takes in the parameter values and returns a test suffix. This can be used when full control over the suffix is needed and it shouldn't just be an underscore-separated string.

If neither is specified, the default test suffix consists of the lowercase string representation of each parameter value, separated by underscores.

Check out the updated PR description for examples!
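For illustration, a sketch of the `suffix_fn` mechanism described above (argument names as proposed at this point in the review; the thread later converges on `name_fn` and `subtest` instead):

```python
# Full control over the suffix: generates test_foo_case_1_2 and test_foo_case_3_4.
@parametrize("x,y", [(1, 2), (3, 4)],
             suffix_fn=lambda x, y: 'case_{}_{}'.format(x, y))
def test_foo(self, x, y):
    ...
```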
```python
    def _parametrize_test(self, test, generic_cls, device_cls):
        # Build a single composite test that tests all the parametrized cases.
```
How important is each variant being instantiated in its own test, @zou3519?
It's pretty important but I wouldn't block this PR on this. FWIW pytest also generates a test case per variant.
The main argument for each variant getting its own test is because it makes it easier to tell exactly what variant a test case failed for. For example, if I have some autograd tests that are parametrized over being for "reverse-mode" or "forward-mode" AD (this happens in some tests in test/test_autograd.py) if a test fails I want to be able to know if reverse-mode is broken, if forward-mode is broken, or if both are broken.
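A hedged sketch of that autograd scenario under this PR's decorator (hypothetical test body; `name_fn` per the API this PR lands):

```python
@parametrize("mode", ["reverse", "forward"], name_fn=lambda m: m)
def test_some_grad_behavior(self, mode):
    ...  # exercise reverse-mode or forward-mode AD depending on `mode`

# Generates test_some_grad_behavior_reverse and test_some_grad_behavior_forward,
# so a failure report names the broken mode directly.
```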
Definitely need variants as individual tests, I agree! See my comment above for how pytest does this, from what I can tell. Given that, there's a UX discussion to be had on the right way to allow devs to name their variants.
This is really cool; I made a few comments inline.
Throwing an error when multiple decorators are used sounds good, because pytest does allow multiple parametrize decorators to be used and takes their cross-product, so that's a scenario that users might reasonably expect to work the same way.
Definitely agree, don't want this confusion. One alternative is that we also do the cross-product here? I have to fully think through how composing would work.

@mruberry How useful do you think being able to compose multiple @parametrize decorators would be?
I pushed an update that does a test per variant, but the naming is currently pretty bad (it uses concatenated lower-case string representations of the param values):

```python
class TestFoo(unittest.TestCase):
    @parametrize("x,y", [(1, 2), (3, 4)])
    def test_bar(self, x, y):
        ...
```

results in these test names: `test_bar_1_2` and `test_bar_3_4`.
Sure! We could also cut this PR's features for code velocity and create a follow-up issue with tracking tasks.
I think there are cases where this could be useful but wouldn't require this as a v0 feature.
```python
        arg_values (iterable): Iterable of arg values (e.g. range(10)) or
            tuples of arg values (e.g. [(1, 2), (3, 4)])
    """
    def __init__(self, arg_str, arg_values):
```
How about having arg_values=None to indicate a boolean parameter? In this case, two versions of the test would be created, and instead of appending the value to the test name, arg_str would be appended only for the True case. This could be great for generating versions based on some boolean flag parameter.
Good idea - I think it might be cool to have some magic for the very common case of boolean flags.
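A sketch of what that boolean-flag magic could look like (hypothetical; the arg_values=None shorthand is proposed here, not implemented by this PR):

```python
# Proposed shorthand: omit arg_values for a boolean parameter.
@parametrize("fast_mode")
def test_gradcheck(self, fast_mode):
    ...

# Would generate two tests, appending the arg name only for the True case:
#   test_gradcheck            (fast_mode=False)
#   test_gradcheck_fast_mode  (fast_mode=True)
```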
```python
        raise NotImplementedError

    def __call__(self, fn):
        fn.parametrize_fn = self._parametrize_test
```
Would this allow for multiple @parametrize decorators to test the product of some parameters?
Currently, no. I'm looking into two types of composability for @parametrize:

- Product-wise composability as you mentioned (each @parametrize instance specifies a subset of args and tests are generated from the product)
- Additive composability (each @parametrize instance specifies all args, adding to the tests that are run)

Technically, it might be possible to have both, but I'm leaning towards only providing the latter. I think if a true product is desired, it can be generated inline manually. Do you have a strong opinion either way?
Update here: I ended up adding product-wise composability since several people I talked to expected it.
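For illustration, product-wise composition as it works in the final version (example taken from the PR description):

```python
@parametrize("x", [1, 2, 3])
@parametrize("y", [4, 5, 6])
def test_two_things_composition(self, x, y):
    ...

# Generates the 3 x 3 cross-product, from test_two_things_composition_x_1_y_4
# through test_two_things_composition_x_3_y_6.
```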
Suggestion: if the case list has an additional element (3 instead of 2), then we interpret the first element as the "test variant name" (or the suffix). Otherwise, the user is required to specify a naming function.

Alternatively, if the case list is a Dict[str, Tuple] instead of a List[Tuple], then we interpret the key as the "test variant name".
While this does work when there are multiple parameters, there's some ambiguity for the single parameter case:

```python
# Should we use custom names OR parametrize where each x is a (str, int)? Either could be valid.
@parametrize("x", [('foo', 1), ('bar', 2), ('baz', 3)])
```

I like this better, but what if you want to mark 'foo' as an expected failure, for example? You'd need to rewrite the whole thing to use the subtest structure.
Good point. Maybe we should just use the subtest structure. The only downside is that it could be verbose, but it gives us a good mix of things we want (as you noted, (1) it makes it easier to add expected failures and (2) it satisfies our constraint of keeping the variant name with the case).
Pushed an update converging on the "subtest" terminology and removing most of the ways to specify subtest names.
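For illustration, the subtest structure being converged on (usage shapes taken from the PR description): the variant name and any per-case decorators travel with the case itself.

```python
import unittest
from torch.testing._internal.common_utils import parametrize, subtest

@parametrize("x", [subtest(0, decorators=[unittest.expectedFailure]),
                   subtest(1, name='one'),
                   subtest(2, name='two')])
def test_foo(self, x):
    ...
```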
Good feedback from @albanD posted here for posterity: make sure it's easy to ctrl-f for test names within the code. Generated test names make this harder; it could help to make very clear what is the searchable base name part and what is the variant name part.

I think so. In fact, I think @heitorschueroff has a PR where he'd like to do exactly this.
```python
        self.arg_values = arg_values
        self.name_fn = name_fn

    def _formatted_str_repr(self, name, value):
```
Naming is hard
This looks really good, @jbschlosser - I just added a few inline comments for your review
Codecov Report

```
@@            Coverage Diff             @@
##           master   #60753      +/-   ##
==========================================
- Coverage   66.69%   66.42%   -0.27%
==========================================
  Files         718      724       +6
  Lines       92693    93271     +578
==========================================
+ Hits        61817    61958     +141
- Misses      30876    31313     +437
```
@jbschlosser has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Hey @jbschlosser! This is super cool (I encourage you to do a post on it and update some docs to let people know they have a parametrize decorator). I left just a few super minor comments about possible opportunities to improve the docs, but nothing blocking.
@jbschlosser merged this pull request in b7ec7d7.
Generic test parametrization functionality (#60753)

Summary:
This PR plays around with implementation & usage of a `parametrize` decorator for test parametrization similar to `pytest.mark.parametrize`, based on previous work introducing a `_TestParametrizer` class. It works with the internal `DeviceTest` hierarchy & composes with `dtype`, `skip*`, and other decorators. Basic usage is demonstrated in `test/test_blah.py`:

```python
import unittest
from itertools import product

from torch.testing._internal.common_device_type import (
    instantiate_device_type_tests, deviceCountAtLeast, ops)
from torch.testing._internal.common_methods_invocations import op_db
from torch.testing._internal.common_utils import (
    TestCase, run_tests, parametrize, instantiate_parametrized_tests, subtest)


class TestBlah(TestCase):
    @parametrize("x", range(5))
    def test_default_names(self, x):
        print('Passed in:', x)

    # Use default names but add an expected failure.
    @parametrize("x", [subtest(0, decorators=[unittest.expectedFailure]),
                       *range(1, 5)])
    def test_default_names_expected_failure(self, x):
        if x == 0:
            raise RuntimeError('Boom')
        print('Passed in:', x)

    @parametrize("bias", [False, True],
                 name_fn=lambda b: 'bias' if b else 'no_bias')
    def test_custom_names(self, bias):
        print('Passed in:', bias)

    @parametrize("bias", [subtest(True, name='bias'),
                          subtest(False, name='no_bias')])
    def test_custom_names_alternate(self, bias):
        print('Passed in:', bias)

    @parametrize("x,y", [(1, 2), (1, 3), (1, 4)])
    def test_two_things_default_names(self, x, y):
        print('Passed in:', x, y)

    @parametrize("x", [1, 2, 3])
    @parametrize("y", [4, 5, 6])
    def test_two_things_composition(self, x, y):
        print('Passed in:', x, y)

    @parametrize("x", [subtest(0, decorators=[unittest.expectedFailure]),
                       *range(1, 3)])
    @parametrize("y", [4, 5, subtest(6, decorators=[unittest.expectedFailure])])
    def test_two_things_composition_expected_failure(self, x, y):
        if x == 0 or y == 6:
            raise RuntimeError('Boom')
        print('Passed in:', x, y)

    @parametrize("x", [1, 2])
    @parametrize("y", [3, 4])
    @parametrize("z", [5, 6])
    def test_three_things_composition(self, x, y, z):
        print('Passed in:', x, y, z)

    @parametrize("x", [1, 2], name_fn=str)
    @parametrize("y", [3, 4], name_fn=str)
    @parametrize("z", [5, 6], name_fn=str)
    def test_three_things_composition_custom_names(self, x, y, z):
        print('Passed in:', x, y, z)

    @parametrize("x,y", product(range(2), range(3)))
    def test_two_things_product(self, x, y):
        print('Passed in:', x, y)

    @parametrize("x,y", [subtest((1, 2), name='double'),
                         subtest((1, 3), name='triple'),
                         subtest((1, 4), name='quadruple')])
    def test_two_things_custom_names(self, x, y):
        print('Passed in:', x, y)

    @parametrize("x,y", [(1, 2), (1, 3), (1, 4)],
                 name_fn=lambda x, y: '{}_{}'.format(x, y))
    def test_two_things_custom_names_alternate(self, x, y):
        print('Passed in:', x, y)


class TestDeviceBlah(TestCase):
    @parametrize("x", range(10))
    def test_default_names(self, device, x):
        print('Passed in:', device, x)

    @parametrize("x,y", [(1, 2), (3, 4), (5, 6)])
    def test_two_things(self, device, x, y):
        print('Passed in:', device, x, y)

    @deviceCountAtLeast(1)
    def test_multiple_devices(self, devices):
        print('Passed in:', devices)

    @ops(op_db)
    @parametrize("flag", [False, True],
                 lambda f: 'flag_enabled' if f else 'flag_disabled')
    def test_op_parametrized(self, device, dtype, op, flag):
        print('Passed in:', device, dtype, op, flag)


instantiate_parametrized_tests(TestBlah)
instantiate_device_type_tests(TestDeviceBlah, globals())

if __name__ == '__main__':
    run_tests()
```

Generated tests:

```
TestBlah.test_custom_names_alternate_bias
TestBlah.test_custom_names_alternate_no_bias
TestBlah.test_custom_names_bias
TestBlah.test_custom_names_no_bias
TestBlah.test_default_names_expected_failure_x_0
TestBlah.test_default_names_expected_failure_x_1
TestBlah.test_default_names_expected_failure_x_2
TestBlah.test_default_names_expected_failure_x_3
TestBlah.test_default_names_expected_failure_x_4
TestBlah.test_default_names_x_0
TestBlah.test_default_names_x_1
TestBlah.test_default_names_x_2
TestBlah.test_default_names_x_3
TestBlah.test_default_names_x_4
TestBlah.test_three_things_composition_custom_names_1_3_5
TestBlah.test_three_things_composition_custom_names_1_3_6
TestBlah.test_three_things_composition_custom_names_1_4_5
TestBlah.test_three_things_composition_custom_names_1_4_6
TestBlah.test_three_things_composition_custom_names_2_3_5
TestBlah.test_three_things_composition_custom_names_2_3_6
TestBlah.test_three_things_composition_custom_names_2_4_5
TestBlah.test_three_things_composition_custom_names_2_4_6
TestBlah.test_three_things_composition_x_1_y_3_z_5
TestBlah.test_three_things_composition_x_1_y_3_z_6
TestBlah.test_three_things_composition_x_1_y_4_z_5
TestBlah.test_three_things_composition_x_1_y_4_z_6
TestBlah.test_three_things_composition_x_2_y_3_z_5
TestBlah.test_three_things_composition_x_2_y_3_z_6
TestBlah.test_three_things_composition_x_2_y_4_z_5
TestBlah.test_three_things_composition_x_2_y_4_z_6
TestBlah.test_two_things_composition_expected_failure_x_0_y_4
TestBlah.test_two_things_composition_expected_failure_x_0_y_5
TestBlah.test_two_things_composition_expected_failure_x_0_y_6
TestBlah.test_two_things_composition_expected_failure_x_1_y_4
TestBlah.test_two_things_composition_expected_failure_x_1_y_5
TestBlah.test_two_things_composition_expected_failure_x_1_y_6
TestBlah.test_two_things_composition_expected_failure_x_2_y_4
TestBlah.test_two_things_composition_expected_failure_x_2_y_5
TestBlah.test_two_things_composition_expected_failure_x_2_y_6
TestBlah.test_two_things_composition_x_1_y_4
TestBlah.test_two_things_composition_x_1_y_5
TestBlah.test_two_things_composition_x_1_y_6
TestBlah.test_two_things_composition_x_2_y_4
TestBlah.test_two_things_composition_x_2_y_5
TestBlah.test_two_things_composition_x_2_y_6
TestBlah.test_two_things_composition_x_3_y_4
TestBlah.test_two_things_composition_x_3_y_5
TestBlah.test_two_things_composition_x_3_y_6
TestBlah.test_two_things_custom_names_alternate_1_2
TestBlah.test_two_things_custom_names_alternate_1_3
TestBlah.test_two_things_custom_names_alternate_1_4
TestBlah.test_two_things_custom_names_double
TestBlah.test_two_things_custom_names_quadruple
TestBlah.test_two_things_custom_names_triple
TestBlah.test_two_things_default_names_x_1_y_2
TestBlah.test_two_things_default_names_x_1_y_3
TestBlah.test_two_things_default_names_x_1_y_4
TestBlah.test_two_things_product_x_0_y_0
TestBlah.test_two_things_product_x_0_y_1
TestBlah.test_two_things_product_x_0_y_2
TestBlah.test_two_things_product_x_1_y_0
TestBlah.test_two_things_product_x_1_y_1
TestBlah.test_two_things_product_x_1_y_2
TestDeviceBlahCPU.test_default_names_x_0_cpu
TestDeviceBlahCPU.test_default_names_x_1_cpu
TestDeviceBlahCPU.test_default_names_x_2_cpu
TestDeviceBlahCPU.test_default_names_x_3_cpu
TestDeviceBlahCPU.test_default_names_x_4_cpu
TestDeviceBlahCPU.test_default_names_x_5_cpu
TestDeviceBlahCPU.test_default_names_x_6_cpu
TestDeviceBlahCPU.test_default_names_x_7_cpu
TestDeviceBlahCPU.test_default_names_x_8_cpu
TestDeviceBlahCPU.test_default_names_x_9_cpu
TestDeviceBlahCPU.test_multiple_devices_cpu
TestDeviceBlahCPU.test_op_parametrized_<opname>_<variant>_cpu_uint8_flag_enabled_cpu
TestDeviceBlahCPU.test_two_things_x_1_y_2_cpu
TestDeviceBlahCPU.test_two_things_x_3_y_4_cpu
TestDeviceBlahCPU.test_two_things_x_5_y_6_cpu
TestDeviceBlahMETA.test_default_names_x_0_meta
TestDeviceBlahMETA.test_default_names_x_1_meta
TestDeviceBlahMETA.test_default_names_x_2_meta
TestDeviceBlahMETA.test_default_names_x_3_meta
TestDeviceBlahMETA.test_default_names_x_4_meta
TestDeviceBlahMETA.test_default_names_x_5_meta
TestDeviceBlahMETA.test_default_names_x_6_meta
TestDeviceBlahMETA.test_default_names_x_7_meta
TestDeviceBlahMETA.test_default_names_x_8_meta
TestDeviceBlahMETA.test_default_names_x_9_meta
TestDeviceBlahMETA.test_multiple_devices_meta
TestDeviceBlahMETA.test_op_parametrized_<opname>_<variant>_meta_uint8_flag_enabled_meta
TestDeviceBlahMETA.test_two_things_x_1_y_2_meta
TestDeviceBlahMETA.test_two_things_x_3_y_4_meta
TestDeviceBlahMETA.test_two_things_x_5_y_6_meta
```

Caveats:

* `parametrize` decorators cannot be "stacked" yet; each one overwrites the previous. This will change to either:
  * Allow stacking of multiple decorators
  * Error out with a nice error message if multiple decorators are specified

The PR introduces `instantiate_parametrized_tests()` in addition to `instantiate_device_type_tests()`. The former should be used for non-device-specific tests, and the latter should be used for device-specific tests, as usual. Both of these support the `parametrize` decorator. Only the latter supports the `ops` decorator (no change here; this was already the case).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60753

Reviewed By: saketh-are

Differential Revision: D30606615

Pulled By: jbschlosser

fbshipit-source-id: a34f36d643f68a6e221f419d9bb3e1ae1d84dd65
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65042 Reviewed By: jbschlosser Differential Revision: D30960554 Pulled By: ejguan fbshipit-source-id: 55ac12714b4b0964b48c3617b79a7a345d40ebce * Forward fix SkipInfo missing mypy (#65063) Summary: Fixes #{issue number} Pull Request resolved: https://github.com/pytorch/pytorch/pull/65063 Reviewed By: malfet Differential Revision: D30961556 Pulled By: janeyx99 fbshipit-source-id: 9618e12ba873fb48fe5c846a48d4560ad521eb3e * [Static Runtime] Check if outputs of a node do not overlap with each other (#63013) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63013 This change enhances the current memory overlapping check to include outputs: the enhancement enforces a constraint that all outputs of a node should NOT overlap with each other since they are supposed to be update by a node at the same time, holding the node's outputs. This check will detect a problem like T97393697 immediately in debug mode. Test Plan: - Added a unittest `ProcessedNode.VerifyMemoryOverlapWithOverlappingOutputs` - Ran `inline_cvr` on ./buck-out/opt/gen/caffe2/caffe2/fb/predictor/ptvsc2_predictor_bench with this diff and confirmed that the checking condition holds true during the run. Reviewed By: hlu1 Differential Revision: D30211705 fbshipit-source-id: 994d8dace2422e2498e504eb61452a55739238c0 * [quant] Removing unnecessary import from torch/quantization/quantize.py (#64910) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64910 This bled through from the original location. Removing it is not just refactoring, but also prevents potential recursive imports. ghstack-source-id: 138112663 Test Plan: `buck test mode/dev //caffe2/test:quantization` Reviewed By: vkuzo Differential Revision: D30882924 fbshipit-source-id: 8652a334a5186c635761ea5e50f978d1f1078c12 * [PyTorch] Avoid extra std::vector in parseSchemaOrName (#64678) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64678 We know we only want one declaration, so let's not create an excess std::vector (and thus a heap allocation) for that. ghstack-source-id: 138036978 Test Plan: CI Reviewed By: dhruvbird, tugsbayasgalan Differential Revision: D30813785 fbshipit-source-id: c67e0100cdef5d894282939fb6d39a57309bc240 * [PyTorch][easy] Add cbegin/cend to SmallVector (#64682) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64682 Looks like it was forked from llvm before cbegin and cend existed. ghstack-source-id: 138036981 Test Plan: CI Reviewed By: dhruvbird Differential Revision: D30814434 fbshipit-source-id: 9740fa8d3df1c90b77298a95ab9f1d0cf8c90320 * [PyTorch] remove string_view::operator[] bounds check (#64670) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64670 Bounds checking is not required for `std::string_view`, and the checking hoses performance for the following performance prototype diff. ghstack-source-id: 138037531 Test Plan: CI Reviewed By: ezyang, bhosmer Differential Revision: D30747515 fbshipit-source-id: 1f4374415a82dfdccce76ea2c6885c13cb93d369 * Port `all` and `any` full reductions to structured kernels. (#64642) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64642 Tracking issue: #55070 This PR creates out overloads for both `all` and `any` kernels (full reduction overload), and ports them to structured kernels. 
Test Plan: Imported from OSS Reviewed By: ngimel Differential Revision: D30867354 Pulled By: ezyang fbshipit-source-id: 46bccaf6c94a09ed77cc6c724d1183c82f801751 * [ROCm] Update CI images for ROCm 4.3.1 (#64610) Summary: Signed-off-by: Kyle Chen <kylechen@amd.com> reference: https://github.com/pytorch/pytorch/issues/58017 jithunnair-amd jeffdaily arindamroy-eng cc jeffdaily sunway513 jithunnair-amd ROCmSupport Pull Request resolved: https://github.com/pytorch/pytorch/pull/64610 Reviewed By: seemethere Differential Revision: D30964582 Pulled By: malfet fbshipit-source-id: a8335d3d32d7f1557d3cf6cb055ad0f9c49ef7aa * Starter Task 1 (#64927) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64927 Mypy error corrections Test Plan: Corrected mypy errors to make code less prone to bugs by modifying types or adding lines that avoid special undesired cases e.g. asserting a variable to not None. Reviewed By: wushirong Differential Revision: D30901654 fbshipit-source-id: daae8692603b8b38203a98f673c455749c2fb855 * [CircleCI] Disable pytorch_linux_xenial_cuda10_2 test jobs (#65071) Summary: As all of them has been migrated to GHA: - pytorch_linux_pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_distributed_test -> "linux-xenial-cuda11.3-py3.6-gcc7 / test (distributed, 1, 1, linux.8xlarge.nvidia.gpu)" - pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_test1 -> "linux-xenial-cuda10.2-py3.6-gcc7 / test (default, 1, 2, linux.8xlarge.nvidia.gpu)" - pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_test2 -> "linux-xenial-cuda10.2-py3.6-gcc7 / test (default, 2, 2, linux.8xlarge.nvidia.gpu)" - pytorch_linux_xenial_cuda10_2_cudnn7_py3_multigpu_test -> "linux-xenial-cuda10.2-py3.6-gcc7 / test (multigpu, 1, 1, linux.16xlarge.nvidia.gpu)" - pytorch_linux_xenial_cuda10_2_cudnn7_py3_nogpu_NO_AVX2_test -> "linux-xenial-cuda10.2-py3.6-gcc7 / test (nogpu_NO_AVX2, 1, 1, linux.2xlarge)" - pytorch_linux_xenial_cuda10_2_cudnn7_py3_nogpu_NO_AVX_test -> "linux-xenial-cuda10.2-py3.6-gcc7 / test (nogpu_NO_AVX, 1, 1, linux.2xlarge)" - pytorch_linux_xenial_cuda10_2_cudnn7_py3_slow_test -> "linux-xenial-cuda10.2-py3.6-gcc7 / test (slow, 1, 1, linux.8xlarge.nvidia.gpu)" "pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_build" is still a holdout due to slow gradchecks Pull Request resolved: https://github.com/pytorch/pytorch/pull/65071 Reviewed By: driazati, seemethere, janeyx99 Differential Revision: D30963413 Pulled By: malfet fbshipit-source-id: d9a5188ce7eb2f60547b91b854a5db83af2b10e7 * To add state_dict and load_state_dict to SequentialLR (#65035) Summary: To add state_dict() and load_state_dict() methods to SequentialLR Pull Request resolved: https://github.com/pytorch/pytorch/pull/65035 Reviewed By: prabhat00155, nateanl Differential Revision: D30958204 Pulled By: datumbox fbshipit-source-id: 65114e1b07146526ae2680233f5cd42b2534d67a * Dispatch.h: Avoid including ivalue (#64165) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64165 Test Plan: Imported from OSS Reviewed By: gchanan Differential Revision: D30728587 Pulled By: ezyang fbshipit-source-id: d0d2e97491d9d5e2d2fc2d6e51420a4467c1bba4 * Remove `run_functional_checks` from `test_autograd` and create necessary OpInfos (#64993) Summary: OpInfo tracker: https://github.com/pytorch/pytorch/issues/54261 - Eliminate duplicated testing logic in test_autograd - Moved tests that rely on this testing logic to use OpInfos - `cat` already has OpInfo (no action needed) - Created OpInfo for `block_diag` and `broadcast_tensors` Running into some FX errors. 
Added op to skip-list and created an issue here: https://github.com/pytorch/pytorch/issues/64997 Both `block_diag` and `broadcast_tensors` are variadic, so skipping `test_variant_consistency_jit` (from comments on other OpInfos, it looks like JIT does not support variadic tensors) Pull Request resolved: https://github.com/pytorch/pytorch/pull/64993 Reviewed By: jbschlosser Differential Revision: D30961736 Pulled By: soulitzer fbshipit-source-id: e169305384a683acae1178c4e12e9e214a67226a * (torch.distributed.elastic) properly format traceback on error (#65041) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65041 Fixes a bug introduced in https://github.com/pytorch/pytorch/pull/64036 where the traceback of the error handler is printed out rather than the traceback of the actual exception. Fixes https://github.com/pytorch/pytorch/issues/60910 Closes https://github.com/pytorch/pytorch/issues/60910 BEFORE (note that the `py_callstack` is NOT the traceback of the RuntimeError): ``` ************************************************************************************************************************************************************************************************************************************************** run_script_path FAILED ================================================================================================================================================================================================================================================== Root Cause: [0]: time: 2021-09-14_22:01:06 rank: 0 (local_rank: 0) exitcode: 1 (pid: 1092727) error_file: /tmp/torchelastic_aeyvjbpe/none_8zuih7tj/attempt_0/0/error.json msg: { "message": "RuntimeError: rasing error since --throw was specified", "extraInfo": { "py_callstack": [ " File \"<string>\", line 1, in <module>\n", " File \"/usr/local/fbcode/platform009/lib/python3.8/multiprocessing/spawn.py\", line 116, in spawn_main\n exitcode = _main(fd, parent_sentinel)\n", " File \"/usr/local/fbcode/platform009/lib/python3.8/multiprocessing/spawn.py\", line 129, in _main\n return self._bootstrap(parent_sentinel)\n", " File \"/usr/local/fbcode/platform009/lib/python3.8/multiprocessing/process.py\", line 315, in _bootstrap\n self.run()\n", " File \"/usr/local/fbcode/platform009/lib/python3.8/multiprocessing/process.py\", line 108, in run\n self._target(*self._args, **self._kwargs)\n", " File \"/data/users/kiuk/fbsource/fbcode/buck-out/dev/gen/caffe2/run#link-tree/torch/multiprocessing/spawn.py\", line 59, in _wrap\n fn(i, *args)\n", " File \"/data/users/kiuk/fbsource/fbcode/buck-out/dev/gen/caffe2/run#link-tree/torch/distributed/elastic/multiprocessing/api.py\", line 382, in _wrap\n ret = record(fn)(*args_)\n", " File \"/data/users/kiuk/fbsource/fbcode/buck-out/dev/gen/caffe2/run#link-tree/torch/distributed/elastic/multiprocessing/errors/__init__.py\", line 373, in wrapper\n error_handler.record_exception(e)\n", " File \"/data/users/kiuk/fbsource/fbcode/buck-out/dev/gen/caffe2/run#link-tree/torch/distributed/elastic/multiprocessing/errors/error_handler.py\", line 86, in record_exception\n _write_error(e, self._get_error_file_path())\n", " File \"/data/users/kiuk/fbsource/fbcode/buck-out/dev/gen/caffe2/run#link-tree/torch/distributed/elastic/multiprocessing/errors/error_handler.py\", line 26, in _write_error\n \"py_callstack\": traceback.format_stack(),\n" ], "timestamp": "1631682066" } } 
AFTER (note the traceback is the traceback of the RuntimeError):
```
********************************************************************************
run_script_path FAILED
================================================================================
Root Cause:
[0]:
  time: 2021-09-14_21:49:25
  rank: 0 (local_rank: 0)
  exitcode: 1 (pid: 1014681)
  error_file: /tmp/torchelastic_q0zods2c/none_qwmz5dgj/attempt_0/0/error.json
  msg: Traceback (most recent call last):
    File "/data/users/kiuk/fbsource/fbcode/buck-out/dev/gen/caffe2/run#link-tree/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 361, in wrapper
      return f(*args, **kwargs)
    File "/data/users/kiuk/fbsource/fbcode/buck-out/dev/gen/caffe2/run#link-tree/torch/distributed/run.py", line 671, in run_script_path
      runpy.run_path(sys.argv[0], run_name="__main__")
    File "/usr/local/fbcode/platform009/lib/python3.8/runpy.py", line 265, in run_path
      return _run_module_code(code, init_globals, run_name,
    File "/usr/local/fbcode/platform009/lib/python3.8/runpy.py", line 97, in _run_module_code
      _run_code(code, mod_globals, init_globals,
    File "/usr/local/fbcode/platform009/lib/python3.8/runpy.py", line 87, in _run_code
      exec(code, run_globals)
    File "/home/kiuk/tmp/test.py", line 55, in <module>
      main()
    File "/data/users/kiuk/fbsource/fbcode/buck-out/dev/gen/caffe2/run#link-tree/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 361, in wrapper
      return f(*args, **kwargs)
    File "/home/kiuk/tmp/test.py", line 25, in main
      raise RuntimeError("rasing error since --throw was specified")
  RuntimeError: rasing error since --throw was specified
================================================================================
Other Failures:
  <NO_OTHER_FAILURES>
********************************************************************************
```
Test Plan: (see summary for before and after)
`test.py` contents:
```
import argparse
import os
import sys

import torch
import torch.distributed as dist
import torch.nn.functional as F

from torch.distributed.elastic.multiprocessing.errors import record


def parse_args(argv):
    parser = argparse.ArgumentParser(description="test script")
    parser.add_argument("--init_method", type=str, default="env://")
    parser.add_argument("--backend", type=str, default="gloo")
    parser.add_argument("--throw", action="store_true", default=False)
    parser.add_argument("--exit", action="store_true", default=False)
    return parser.parse_args()


@record
def main():
    args = parse_args(sys.argv[1:])

    if args.throw:
        raise RuntimeError("rasing error since --throw was specified")
    if args.exit:
        sys.exit(1)

    init_method = args.init_method
    backend = args.backend
    world_size = int(os.environ["WORLD_SIZE"])
    rank = int(os.environ["RANK"])

    print(f"initializing `{backend}` process group with rank={rank}, world_size={world_size} at {init_method}")
    dist.init_process_group(
        backend=backend,
        init_method=init_method,
        world_size=world_size,
        rank=rank)
    print(f"successfully initialized process group with rank={dist.get_rank()}, world_size={dist.get_world_size()}")

    t = F.one_hot(torch.tensor(rank), num_classes=world_size)
    dist.all_reduce(t)
    derived_world_size = torch.sum(t).item()
    if derived_world_size != world_size:
        raise RuntimeError(f"derived world size: {derived_world_size} != actual world size: {world_size}")
    else:
        print(f"sucessfully derived world size: {derived_world_size} (expected: {world_size}). Exiting")


if __name__ == "__main__":
    main()
```
run it as:
```
$ python -m torch.distributed.run --nproc_per_node 2 test.py --throw
```
Reviewed By: cbalioglu
Differential Revision: D30953731
fbshipit-source-id: bbea04c59c2aec58969cf44d8e3723d5f8abe8a8
* [Static Runtime] Move MemoryPlanner out into memory_planner.cpp (#65011)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65011
This change moves `MemoryPlanner` out of impl.cpp into memory_planner.cpp. `MemoryPlanner` performs an independent sub-task of static analysis of a graph: creating the memory plan and allocating/deallocating managed Tensors. This change will reduce merge conflicts as I work on `MemoryPlanner` more actively for output Tensor support.
Test Plan: N/A
Reviewed By: mikeiovine
Differential Revision: D30883290
fbshipit-source-id: a37570f8d9430224a6987d2190bcf81cf875043d

* [ONNX] Enhance shape (two changes merged) (#64585)
Summary: Enhanced shape inference by introducing typeReliableMap.
[ONNX] exporter changes for torch hub models (https://github.com/pytorch/pytorch/issues/62856)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64585
Reviewed By: ezyang
Differential Revision: D30870418
Pulled By: msaroufim
fbshipit-source-id: 87a294799cb87d649d1d13b6114a5cfbac9be15c
Co-authored-by: jiafatom <jiafa@microsoft.com>

* To add state dict and load_dict for Chained Scheduler (#65034)
Summary: Adds state_dict() and load_state_dict() methods to ChainedScheduler.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65034
Reviewed By: prabhat00155, nateanl
Differential Revision: D30958207
Pulled By: datumbox
fbshipit-source-id: 1a587a330d34e0548e891a39f8fb5a3d251b71fa

* Add retries to ECR login step (#65013)
Summary: Switches the retry mode from `legacy` to `standard` (https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-retries.html#cli-usage-retries-configure) and ups the number of retries.
Fixes #{issue number}
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65013
Reviewed By: zhouzhuojie, mruberry
Differential Revision: D30943292
Pulled By: driazati
fbshipit-source-id: 0a21e9b4eacbb77e6aca22f9256d94cd591b23cd

* [quant][refactor] Change the structure of the ao migration tests (#64912)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64912
The test naming was confusing and ambiguous. The file was changed to reflect the framework that is being migrated ("quantization" instead of "quantize").
Also, the common testing class was extracted out.
ghstack-source-id: 138157450
Test Plan: `buck test mode/dev //caffe2/test:quantization -- TestAOMigrationQuantization`
Reviewed By: vkuzo
Differential Revision: D30898214
fbshipit-source-id: 017f95995271d35bcdf6ff6a1b3974b837543e84

* Add Maxpool to shape analysis / Opinfo (#63530)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63530
How to review: pretty much just check that the generated inputs are a good representation of the op semantics; that should be sufficient for correctness. As a bonus, you can also double-check the op size semantics by going to https://codebrowser.bddppq.com/pytorch/pytorch/, typing in native::{op_name}, and looking at the op implementation.
Test Plan: Imported from OSS
Reviewed By: driazati
Differential Revision: D30738147
Pulled By: eellison
fbshipit-source-id: cf52339e572ee04e0d6167fd95d8a82d58ea7706

* Max Pool with indices (#64121)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64121
Adds support for aten operators which return multiple outputs.
Test Plan: Imported from OSS
Reviewed By: driazati
Differential Revision: D30738142
Pulled By: eellison
fbshipit-source-id: 0d7e51187bd5e3e9b43f0fdb5178366a97aec943
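As a concrete instance of the "operators which return multiple outputs" case handled above, a small example (not from the PR) of max pooling with indices:

```python
# Max pooling with return_indices=True yields a (values, indices) pair --
# the two-output case the shape analysis needs to propagate sizes for.
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 8, 8)
out, indices = F.max_pool2d(x, kernel_size=2, return_indices=True)
print(out.shape)      # torch.Size([1, 3, 4, 4])
print(indices.shape)  # torch.Size([1, 3, 4, 4]) -- same shape as the output
```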
* Add embedding shape analysis (#64323)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64323
Test Plan: Imported from OSS
Reviewed By: driazati
Differential Revision: D30738145
Pulled By: eellison
fbshipit-source-id: be12408330d671bc65cf645aa2c20fafd954e6a9

* nvfuser update (#63745)
Summary: Syncing the nvfuser code base from the devel branch. A few highlights of our development since the last sync:
- Extends support to normalization and reduction kernels.
- Multiple kernel launches for a single `CudaFusionGroup`. The hierarchical caching system has been updated to cache graph segmentation.
- profile_ivalue is enabled to convert dynamic scalars into compile-time constants, which are required by the codegen (e.g. reduction axes).
To keep this PR simple and relatively review-free, we stripped most external changes and submitted them as separate PRs, so this gigantic PR is easier to handle.
Internal updates are in the following files:
1. updates in nvfuser codegen: `torch/csrc/jit/codegen/cuda`
2. added nvfuser-specific benchmarks: `benchmarks/cpp/nvfuser`
3. nvfuser jit cpp tests: `test/cpp/jit/test_gpu.cpp`, `test/cpp/jit/test_gpu_shift.cpp`, `test/cpp/jit/test_gpu_validator.h`
Updates affecting integration:
1. profile_ivalue enabled for nvfuser; related changes are in `torch/csrc/jit/runtime/*`
2. exposed a few more symbols in `aten/src/ATen/core/*` used by codegen
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63745
Reviewed By: saketh-are
Differential Revision: D30752939
Pulled By: malfet
fbshipit-source-id: ce122e80f01bcd3865f5bd3c4dfde660665fd84c

* Use RDS for build size tracking (#64303)
Summary: This adds two utilities: `register_rds_table` and `rds_write`. `register_rds_table` needs to be called once with the schema for the data that `rds_write` will write. These go to a lambda called `rds-proxy`, which will write to/read from the DB as necessary. This data can then be arbitrarily queried via `rds-proxy` (for use in CI) or on metrics.pytorch.org (for analysis). It also hooks these up for build size tracking (which previously was not working on GHA).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64303
Reviewed By: mruberry
Differential Revision: D30941182
Pulled By: driazati
fbshipit-source-id: 12c5575ddd29902477464fc989ad76a052306b9b

* [Caffe2] Don't pass vector by value in SqueezeOp (#64400)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64400
There appears to be no need to copy this vector.
ghstack-source-id: 138033020
Test Plan: CI
Reviewed By: smacke
Differential Revision: D30711014
fbshipit-source-id: b9fcf3d496a663b8478aa22d52b2c41f8f85e90f

* [Caffe2][easy] Avoid spurious vector copy in TransposeOp (#64403)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64403
No need to copy to the heap here.
ghstack-source-id: 138033019
Test Plan: CI
Reviewed By: smacke
Differential Revision: D30712506
fbshipit-source-id: 5f4131b2569ebb1f5092262aaddb17215dea88f1

* [quant] Removing hardcoded "torch.quantization.observer" for migration (#64981)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64981
The hardcoded string would have caused errors when observer.py was moved to ao (see D30391189).
ghstack-source-id: 138118430
Test Plan:
buck test mode/opt //caffe2/test:quantization -- --exact 'caffe2/test:quantization - test_dynamic_quant_multi_uses (quantization.jit.test_quantize_jit.TestQuantizeDynamicJitPasses)'
buck test mode/opt //caffe2/test:quantization -- --exact 'caffe2/test:quantization - test_save_load_state_dict_script (quantization.core.test_workflow_module.TestObserver)'
Reviewed By: supriyar
Differential Revision: D30432008
fbshipit-source-id: 754727a89c78f6ceada6f8ff92c304f3953f38fc

* Revert D30883290: [Static Runtime] Move MemoryPlanner out into memory_planner.cpp
Test Plan: revert-hammer
Differential Revision: D30883290 (https://github.com/pytorch/pytorch/commit/0e11454d19e106ba6d5819c1147ca540cbce2943)
Original commit changeset: a37570f8d943
fbshipit-source-id: 65c57a2b0d2e3c7006765195dd519e8cf2472f72

* Replace windows 10.2 smoke tests on PRs to be 11.3 (#65090)
Summary: As we default to Linux CUDA 11.3 on PRs, we should do the same thing with Windows (instead of having 10.2 be the default). This means that 10.2 will now be master-only, and 11.3 Windows smoke tests will run on every PR. This also copies over the "run smoke tests only" config; removing that will be a separate PR once there's more certain decision making.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65090
Reviewed By: seemethere
Differential Revision: D30968382
Pulled By: janeyx99
fbshipit-source-id: c73f9a2cc800b678909365c4d80627d29fc09f94
* CI: Upgrade windows 10.1 jobs to 10.2 (#65080)
Summary: These are the first two steps of the following task:
1. Upgrade 10.1 to 10.2
2. Migrate the force_on_cpu job to GHA
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65080
Test Plan: https://github.com/pytorch/pytorch/pull/65086
Reviewed By: seemethere
Differential Revision: D30973655
Pulled By: janeyx99
fbshipit-source-id: 67ab69ea99ff9e0336400a7173efef6d7daac07c

* ci: Disable jit legacy on circleci, enable on gha (#65106)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65106
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
cc ezyang seemethere malfet lg20987 pytorch/pytorch-dev-infra
Test Plan: Imported from OSS
Reviewed By: malfet, janeyx99
Differential Revision: D30976186
Pulled By: seemethere
fbshipit-source-id: 8958f821eab9aa284496c57915894ed70f6b2fff

* .github: Enable only specific workflows for canary (#65099)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65099
Utilizes ciflow to enable only specific workflows for pytorch/pytorch-canary, to reduce noise on that repository.
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Test Plan: Imported from OSS
Reviewed By: jbschlosser
Differential Revision: D30973691
Pulled By: seemethere
fbshipit-source-id: 371765535b42a00bd72c2551c4faebf733d759f0

* [TensorExpr] Add a method for sanitizing Var and Buf names in Stmt. (#65010)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65010
This pass ensures all names are legal and not duplicated. Fixes #52727.
Test Plan: Imported from OSS
Reviewed By: bertmaher, navahgar
Differential Revision: D30939717
Pulled By: ZolotukhinM
fbshipit-source-id: 7dbe7f937de41f22ad49137a5e067d698443ed63

* [quant] AO migration of the `fuse_modules.py` (phase 1) (#64913)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64913
The AO team is migrating the existing torch.quantization into torch.ao.quantization. We are doing it one file at a time to make sure that the internal callsites are updated properly. This migrates fuse_modules.py from torch.quantization to torch.ao.quantization. At this point both locations will be supported. Eventually torch.quantization will be deprecated.
Test Plan: `buck test mode/dev //caffe2/test:quantization`
Reviewed By: vkuzo
Differential Revision: D30882819
fbshipit-source-id: 1926ad6aa49136aceb5b625dcef4bfde3a2860d4

* [quant] AO migration of the `quant_types.py` (phase 1) (#64916)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64916
The AO team is migrating the existing torch.quantization into torch.ao.quantization. We are doing it one file at a time to make sure that the internal callsites are updated properly. This migrates quant_type.py from torch.quantization to torch.ao.quantization. At this point both locations will be supported. Eventually torch.quantization will be deprecated.
Test Plan: `buck test mode/dev //caffe2/test:quantization -- TestAOMigrationQuantization`
Reviewed By: vkuzo
Differential Revision: D30898422
fbshipit-source-id: 3e6126b49f0565a4136d6928cea9eb25368927ff
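A small sketch (mine, not from the commits) of what phase 1 of the AO migration means in practice: both import locations resolve during the transition window, with `torch.ao.quantization` as the eventual home.

```python
# Both locations work during the migration; the torch.quantization path is
# slated for eventual deprecation in favor of torch.ao.quantization.
from torch.quantization import fuse_modules as fuse_modules_legacy  # old location
from torch.ao.quantization import fuse_modules                      # new location

# The two names refer to the same underlying functionality.
print(fuse_modules, fuse_modules_legacy)
```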
* Revert D30752939: [pytorch][PR] nvfuser update
Test Plan: revert-hammer
Differential Revision: D30752939 (https://github.com/pytorch/pytorch/commit/cfaecaf40bd6cabd3f4e0ef0d8c7252655349b61)
Original commit changeset: ce122e80f01b
fbshipit-source-id: 57685df8f9946032a06eff1de8a3d1498500d2d2

* .github: GHA add retry for docker run in chown workspace step (#65104)
Summary: This should help prevent further errors in GHA workflows during the Chown Workspace step, such as https://github.com/pytorch/pytorch/runs/3614067053. I did not add retries to other steps with docker run.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65104
Reviewed By: seemethere
Differential Revision: D30976330
Pulled By: janeyx99
fbshipit-source-id: e403008548aa01c9a0a4ccebe56df0e889dd045c

* .circleci/.jenkins: Remove 9.2 references in CI (#65024)
Summary: Removes 9.2 references in CI scripts and configs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65024
Reviewed By: driazati
Differential Revision: D30945948
Pulled By: janeyx99
fbshipit-source-id: 77890a00520c61500a934a90a74e3fcca84c09b5

* [quant] AO migration of the `_correct_bias.py`, `_equalize.py`, and `_learnable_fake_quantize.py` (#64917)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64917
The AO team is migrating the existing torch.quantization into torch.ao.quantization. We are doing it one file at a time to make sure that the internal callsites are updated properly. This migrates the following files from torch.quantization to torch.ao.quantization:
- `_correct_bias.py`
- `_equalize.py`
- `_learnable_fake_quantize.py`
**Note:** These files are migrated completely without any warning. The old location is thus silently deprecated.
Test Plan: `buck test mode/dev //caffe2/test:quantization -- TestBiasCorrection`
Reviewed By: vkuzo
Differential Revision: D30898565
fbshipit-source-id: 1d39be2539dd1adfcb42e16bdcc0daf5c8316bbd

* Add NNC AOT Compiler executable (#63994)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63994
Test Plan: Imported from OSS
Reviewed By: bertmaher
Differential Revision: D30582149
Pulled By: priyaramani
fbshipit-source-id: 3bbf085428824c3cb308e006c18bb0a57f50fef6

* [acc_ops] Add support for torch variants of squeeze and mul (#65037)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65037
att Te…
This PR plays around with the implementation & usage of a `@parametrize` decorator for test parametrization, similar to `@pytest.mark.parametrize` and based on previous work introducing a `_TestParametrizer` class. It works with the internal `DeviceTest` hierarchy & composes with `@dtype`, `@skip*`, and other decorators. Basic usage is demonstrated in `test/test_blah.py`; a sketch of that usage and the test names it generates follows below.
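A hedged sketch of the intended usage (the import location and the exact generated names are assumptions based on the surrounding description, not a verbatim copy of `test/test_blah.py`):

```python
from torch.testing._internal.common_utils import (
    TestCase, run_tests, parametrize, instantiate_parametrized_tests)

class TestFoo(TestCase):
    @parametrize("x", range(3))
    def test_default_names(self, x):
        self.assertIsInstance(x, int)

    @parametrize("bias", [False, True])
    def test_bias(self, bias):
        # One test is generated per value, e.g.:
        #   TestFoo.test_bias_bias_False
        #   TestFoo.test_bias_bias_True
        # (exact naming may differ in this iteration of the PR)
        self.assertIn(bias, (False, True))

instantiate_parametrized_tests(TestFoo)

if __name__ == "__main__":
    run_tests()
```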
Caveats:
- `@parametrize` decorators cannot be "stacked" yet; each one overwrites the previous. This behavior is slated to change in a future iteration.

The PR introduces `instantiate_parametrized_tests()` in addition to `instantiate_device_type_tests()`. The former should be used for non-device-specific tests, and the latter should be used for device-specific tests, as usual. Both of these support the `@parametrize` decorator. Only the latter supports the `@ops` decorator (no change here; this was already the case). A side-by-side sketch of the two instantiation paths follows below.
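A hedged sketch contrasting the two instantiation helpers (import locations assumed):

```python
from torch.testing._internal.common_utils import (
    TestCase, parametrize, instantiate_parametrized_tests)
from torch.testing._internal.common_device_type import (
    instantiate_device_type_tests)

class TestPlain(TestCase):
    # Non-device-specific: instantiated via instantiate_parametrized_tests().
    @parametrize("n", [1, 2, 4])
    def test_sizes(self, n):
        ...

class TestPerDevice(TestCase):
    # Device-specific: receives a `device` argument and is instantiated once
    # per device type via instantiate_device_type_tests(); @parametrize composes.
    @parametrize("inplace", [False, True])
    def test_op(self, device, inplace):
        ...

instantiate_parametrized_tests(TestPlain)
instantiate_device_type_tests(TestPerDevice, globals())
```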