Skip to content

Conversation

ezyang
Copy link
Contributor

@ezyang ezyang commented Feb 26, 2020

Stack from ghstack:

Then, nullopt denotes catch all, whereas everything else is specific to
a DispatchKey. I can delete the second copy of methods when I do this.
This refactor should be pushed all the way to the frontend but I am doing
it one step at a time.

Signed-off-by: Edward Z. Yang ezyang@fb.com

Differential Revision: D20125163

…al<DispatchKey>

Then, nullopt denotes catch all, whereas everything else is specific to
a DispatchKey.  I can delete the second copy of methods when I do this.
This refactor should be pushed all the way to the frontend but I am doing
it one step at a time.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

[ghstack-poisoned]
ezyang added a commit that referenced this pull request Feb 26, 2020
…al<DispatchKey>

Then, nullopt denotes catch all, whereas everything else is specific to
a DispatchKey.  I can delete the second copy of methods when I do this.
This refactor should be pushed all the way to the frontend but I am doing
it one step at a time.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

ghstack-source-id: 85a6be7
Pull Request resolved: #33817
@ezyang ezyang requested review from smessmer and bhosmer February 26, 2020 17:08
@dr-ci
Copy link

dr-ci bot commented Feb 26, 2020

💊 CircleCI build failures summary and remediations

As of commit 70ec5ef (more details on the Dr. CI page):


None of the build failures appear to be your fault 💚



❄️ 2 tentatively flaky failures

2 failures tentatively classified as flaky but have not launched reruns to confirm:

See CircleCI build pytorch_linux_xenial_cuda10_1_cudnn7_py3_multigpu_test (1/2)

Step: "Test" (full log | pattern match details) ❄️

Mar 03 23:12:38 RuntimeError: Error downloading resource!
Mar 03 23:12:38  
Mar 03 23:12:38 During handling of the above exception, another exception occurred: 
Mar 03 23:12:38  
Mar 03 23:12:38 Traceback (most recent call last): 
Mar 03 23:12:38   File "tools/download_mnist.py", line 87, in <module> 
Mar 03 23:12:38     main() 
Mar 03 23:12:38   File "tools/download_mnist.py", line 80, in main 
Mar 03 23:12:38     download(path, url, options.quiet) 
Mar 03 23:12:38   File "tools/download_mnist.py", line 41, in download 
Mar 03 23:12:38     raise RuntimeError('Error downloading resource!') 
Mar 03 23:12:38 RuntimeError: Error downloading resource! 
Mar 03 23:12:39 + cleanup 
Mar 03 23:12:39 + retcode=1 
Mar 03 23:12:39 + set +x 
Mar 03 23:12:39 =================== sccache compilation log =================== 
Mar 03 23:12:39 =========== If your build fails, please take a look at the log above for possible reasons =========== 
Mar 03 23:12:39 Compile requests                 0 
Mar 03 23:12:39 Compile requests executed        0 
Mar 03 23:12:39 Cache hits                       0 
Mar 03 23:12:39 Cache misses                     0 
Mar 03 23:12:39 Cache timeouts                   0 

See CircleCI build pytorch_linux_xenial_cuda10_1_cudnn7_py3_NO_AVX_NO_AVX2_test (2/2)

Step: "Test" (full log | pattern match details) ❄️

Mar 04 00:23:48 RuntimeError: Error downloading resource!
Mar 04 00:23:48  
Mar 04 00:23:48 During handling of the above exception, another exception occurred: 
Mar 04 00:23:48  
Mar 04 00:23:48 Traceback (most recent call last): 
Mar 04 00:23:48   File "tools/download_mnist.py", line 87, in <module> 
Mar 04 00:23:48     main() 
Mar 04 00:23:48   File "tools/download_mnist.py", line 80, in main 
Mar 04 00:23:48     download(path, url, options.quiet) 
Mar 04 00:23:48   File "tools/download_mnist.py", line 41, in download 
Mar 04 00:23:48     raise RuntimeError('Error downloading resource!') 
Mar 04 00:23:48 RuntimeError: Error downloading resource! 
Mar 04 00:23:48 + cleanup 
Mar 04 00:23:48 + retcode=1 
Mar 04 00:23:48 + set +x 
Mar 04 00:23:48 =================== sccache compilation log =================== 
Mar 04 00:23:48 =========== If your build fails, please take a look at the log above for possible reasons =========== 
Mar 04 00:23:48 Compile requests                19 
Mar 04 00:23:48 Compile requests executed        0 
Mar 04 00:23:48 Cache hits                       0 
Mar 04 00:23:48 Cache misses                     0 
Mar 04 00:23:48 Cache timeouts                   0 

This comment was automatically generated by Dr. CI (expand for details).Follow this link to opt-out of these comments for your Pull Requests.

Please report bugs/suggestions on the GitHub issue tracker.

This comment has been revised 6 times.

Copy link

@bhosmer bhosmer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooo nice simplification!

I can't see how pushing the catchall/non-catchall distinction down costs us anything important - can't imagine the fatter keys will ever make a noticeable difference, and it's reversible if we ever need to hoist the distinction back out to the top level for some reason.

One question about a std::move removal inline and definitely worth @smessmer putting the magnifying glass to whatever seething nest of implicit conversions I'm probably glossing over

RegistrationHandleRAII Dispatcher::registerKernel(const OperatorHandle& op, DispatchKey dispatch_key, KernelFunction kernel) {
// note: this doesn't need the mutex to protect the iterator because write operations on the list keep iterators intact.
return op.operatorIterator_->op.registerKernel(std::move(dispatch_key), std::move(kernel));
return op.operatorIterator_->op.registerKernel(dispatch_key, std::move(kernel));
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No longer need the move bc now implicitly converting to optional? or never needed the move?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We never needed the move, dispatch keys are small

Copy link
Contributor

@smessmer smessmer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure this is a move in the right direction. Shouldn't we start using CompoundOpKey and use that to get rid of catchall kernels entirely instead of merging the catchall kernel API into the regular kernel API?

@ezyang
Copy link
Contributor Author

ezyang commented Feb 26, 2020

Not sure this is a move in the right direction. Shouldn't we start using CompoundOpKey and use that to get rid of catchall kernels entirely instead of merging the catchall kernel API into the regular kernel API?

Two reasons:

  1. The highest priority refactor I have in mind right now is enabling schema-less impl-calls (i.e., this is on the schema merge path). I don't really care about this change per se, but it is very difficult to work on the dispatcher because a lot of codepaths are duplicated. This removes some of the duplication, making further refactors easier to do.
  2. There's a yak in the way of CompoundOp, which is coming up with a plan for kernels that use "catch all" in conjunction with Variable. This is something we should do, but it doesn't seem like it is required for the highest priority refactor (described above). I can't fix everything, so I have to pick what I'm going to kill first. I choose not to kill catchall.

@bhosmer
Copy link

bhosmer commented Feb 26, 2020

Not sure this is a move in the right direction. Shouldn't we start using CompoundOpKey and use that to get rid of catchall kernels entirely instead of merging the catchall kernel API into the regular kernel API?

If IIUC and CompoundOpKey is a kind of DispatchKey - so the new API would be unitary in the same way as it is here, just trading CompoundOpKey for nullopt - then I'm not sure I see why this isn't a move in the right direction...

@smessmer
Copy link
Contributor

Ok I guess I'm fine with this. We'd have to revert the API changes introduced here once we start using CompoundOpKey, but that's not much effort.

@bhosmer
Copy link

bhosmer commented Feb 26, 2020

Ok I guess I'm fine with this. We'd have to revert the API changes introduced here once we start using CompoundOpKey, but that's not much effort.

Just to make sure I'm not missing anything though, we'd only have to revert the change to registerKernal to take an optional, right? We wouldn't have to reintroduce the parallel catchall API. (I.e., that work would amortize in the move to the final state)

@smessmer
Copy link
Contributor

Ok I guess I'm fine with this. We'd have to revert the API changes introduced here once we start using CompoundOpKey, but that's not much effort.

Just to make sure I'm not missing anything though, we'd only have to revert the change to registerKernal to take an optional, right? We wouldn't have to reintroduce the parallel catchall API. (I.e., that work would amortize in the move to the final state)

yes that's correct

…p on optional<DispatchKey>"

Then, nullopt denotes catch all, whereas everything else is specific to
a DispatchKey.  I can delete the second copy of methods when I do this.
This refactor should be pushed all the way to the frontend but I am doing
it one step at a time.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: [D20125163](https://our.internmc.facebook.com/intern/diff/D20125163)

[ghstack-poisoned]
ezyang added a commit that referenced this pull request Feb 26, 2020
…al<DispatchKey>

Then, nullopt denotes catch all, whereas everything else is specific to
a DispatchKey.  I can delete the second copy of methods when I do this.
This refactor should be pushed all the way to the frontend but I am doing
it one step at a time.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

ghstack-source-id: 6d2a7ae
Pull Request resolved: #33817
@ezyang
Copy link
Contributor Author

ezyang commented Feb 26, 2020

I updated this to plumb the change all the way through, turns out it was only a few more lines.

…p on optional<DispatchKey>"

Then, nullopt denotes catch all, whereas everything else is specific to
a DispatchKey.  I can delete the second copy of methods when I do this.
This refactor should be pushed all the way to the frontend but I am doing
it one step at a time.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: [D20125163](https://our.internmc.facebook.com/intern/diff/D20125163)

[ghstack-poisoned]
ttumiel pushed a commit to ttumiel/pytorch that referenced this pull request Mar 4, 2020
…al<DispatchKey> (pytorch#33817)

Summary:
Pull Request resolved: pytorch#33817

Then, nullopt denotes catch all, whereas everything else is specific to
a DispatchKey.  I can delete the second copy of methods when I do this.
This refactor should be pushed all the way to the frontend but I am doing
it one step at a time.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D20125163

Pulled By: ezyang

fbshipit-source-id: 026075a4bab81b0bd88b07f0800f6e6bbeb2166a
@facebook-github-bot
Copy link
Contributor

@ezyang merged this pull request in 22506ae.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants