
Conversation

@skrah (Contributor) commented May 19, 2019

No description provided.

@pytorchbot added the module: build, module: cpu, module: cuda, module: internals, module: nn, and module: operators labels May 19, 2019
@skrah added the module: porting and triaged labels and removed the bot-added module labels (except module: nn) May 19, 2019
@skrah closed this May 20, 2019
@skrah reopened this May 20, 2019
@pytorchbot re-added the module labels May 20, 2019; @skrah removed them again, and @pytorchbot then added module: build
check_dim_size(gradOutput, ndim, ndim-2, outputHeight);
check_dim_size(gradOutput, ndim, ndim-1, outputWidth);

if (cuda) {
Contributor

I... would believe you if you told me this is what the code did before, but this special case is kind of shocking XD

Contributor Author

It is shocking, but it's indeed what the old code did. :) The CUDA code here always uses 4 dimensions in the kernel for output, then resizes back after the kernel:

    if(input->dim() == 3)
      THCTensor_(resize3d)(state, output, nInputPlane, nOutputRows, nOutputCols);

The C code uses 3 or 4 dimensions for the kernel.

Perhaps the shape check should be after all resizes and just before the kernel, but the old code checked right at the top of the function.
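For illustration only, this is roughly how the check could sit after the resizes (the helper and names come from the excerpt above; the placement itself is an assumption about the refactor, not code from this PR):

    // Hypothetical placement: validate gradOutput only after the 3-d -> 4-d
    // resize, immediately before launching the kernel, so a single check
    // covers both the CUDA path and the C path.
    check_dim_size(gradOutput, ndim, ndim - 2, outputHeight);
    check_dim_size(gradOutput, ndim, ndim - 1, outputWidth);
    // ... launch the pooling kernel ...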

Contributor

Yeah, I think that's the correct fix. No need to block this PR on fixing it, would be a nice readability improvement though.

@ezyang (Contributor) commented May 21, 2019

> I've deduplicated the shape checks in the original code, which in turn led to renaming the Row/Col convention in the CUDA code to Height/Width for compatibility with the C++ code.

That is much appreciated, thank you!

"padW = ", padW, ", padH = ", padH, ", kW = ", kW, ", kH = ", kH);

if (outputWidth < 1 || outputHeight < 1) {
AT_ERROR("Given input size: (",
Contributor

nit: TORCH_CHECK(outputWidth >= 1 && outputHeight >= 1, msg)
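As a sketch of the suggested form (the condition comes from the diff above; the message arguments are placeholders, since only the start of the error string appears in this excerpt, and nInputPlane/inputHeight/inputWidth are assumed names from the surrounding pooling code):

    TORCH_CHECK(outputWidth >= 1 && outputHeight >= 1,
        "Given input size: (", nInputPlane, "x", inputHeight, "x", inputWidth,
        "). Calculated output size is too small.");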

IntArrayRef dilation,
bool ceil_mode)
{
// XXX JIT: Pooling.cpp allows stride.empty().
Contributor

This sounds like another misannotated function.

@skrah (Contributor Author) May 22, 2019

The annotation Optional[BroadcastingList2[int]] looks correct, but there's some explicit handling of stride here:

if stride is None:

I wonder if that could just be:

if stride is None:
    stride = kernel_size

The drawback is that one could crash the C++ code by modifying the Python code, but that is also the case for the annotation.

(Edit) "crash" here means triggering an exception, unless it is changed to a real assert() in the future.

Contributor Author

The other special case is here:

x = torch::max_pool2d(conv1->forward(x), {2, 2}).relu();

It ends up calling max_pool2d_with_indices with all default parameters.

I didn't look very hard, but I couldn't find, in the .cpp files, an overload of torch::max_pool2d(conv1->forward(x), {2, 2}) that works with just two parameters.
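For what it's worth, the two-argument call most likely compiles because of default arguments on the generated declaration rather than a separate overload. Roughly (reconstructed from the usual ATen convention, not copied from this PR):

    // Approximate generated declaration; the defaults are the interesting
    // part: an omitted stride arrives as an empty IntArrayRef, which is
    // exactly the stride.empty() case discussed above.
    // Tensor max_pool2d(const Tensor& self, IntArrayRef kernel_size,
    //                   IntArrayRef stride = {}, IntArrayRef padding = 0,
    //                   IntArrayRef dilation = 1, bool ceil_mode = false);
    auto y = torch::max_pool2d(x, {2, 2});  // stride, padding, dilation, ceil_mode defaulted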

Contributor

> The drawback is that one could crash the C++ code by modifying the Python code, but that is also the case for the annotation.

I think, for now, we should consider these annotations as part of the TCB. I do wonder a little if they can't be checked for consistency with the main annotations, but... well... someone would have to figure that out :>

Contributor

> I wonder if that could just be:

Then we'd have to explain why it was permissible for all the other sites to also default things to an empty list. Maybe they're all wrong.

@ailzhang, you helped us resolve the annotation last time in #20306. Do you know what's going on with these Optional[BroadcastingList2] things?

Contributor

Hi @skrah @ezyang, yeah, this annotation looks fine, but I guess the JIT should have instantiated it to int[2] before passing it into ATen. I'm looking into it. To double-check the repro process:

  1. Check out this PR.
  2. Running IntegrationTest.MNIST should trigger stride.empty() here, right?

Thanks!

Contributor Author

Hi @ailzhang, I think the stride.empty() case is also triggered by the regular Python tests if that is more convenient.

IntegrationTest.MNIST triggers all of stride.empty() && padding.size()==1 && dilation.size()==1.

Contributor Author

And (for the sake of completeness) this PR already contains a workaround, so an assert() will need to be added to reproduce it.
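A minimal sketch of what such a workaround could look like (purely illustrative; kH, kW, dH, dW follow the naming style of the pooling code, and this is not a copy of the actual change in the PR):

    // If the JIT hands us an empty stride list, fall back to the documented
    // default of stride == kernel_size instead of indexing into an empty array.
    const int dH = stride.empty() ? kH : safe_downcast<int, int64_t>(stride[0]);
    const int dW = stride.empty() ? kW : safe_downcast<int, int64_t>(stride[1]);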

Contributor

@ezyang I have a commit to fix this (and other similar ones). Once this PR is merged into master, I will rebase and send it out. :D


template <typename dest_t, typename src_t>
static inline dest_t
safe_downcast(src_t v)
Contributor

Nice!
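For readers without the full diff: only the signature appears in the excerpt above. A body along these lines would match the apparent intent (an illustrative sketch using TORCH_CHECK and <limits>, not necessarily the exact code that was merged):

    template <typename dest_t, typename src_t>
    static inline dest_t safe_downcast(src_t v)
    {
      // Refuse to silently truncate, e.g. when narrowing int64_t sizes to int
      // for the kernels; out-of-range values raise an error instead.
      TORCH_CHECK(std::numeric_limits<dest_t>::min() <= v &&
                  v <= std::numeric_limits<dest_t>::max(),
                  "integer out of range");
      return static_cast<dest_t>(v);
    }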

@facebook-github-bot (Contributor) left a comment

@ezyang is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@ezyang (Contributor) commented May 22, 2019

You forgot to delete aten/src/THCUNN/SpatialDilatedMaxPooling.cu, and it is thus failing internal contbuilds. You didn't allow maintainers to edit your branch so I can't push the fix. Can you please fix this? :)

@ezyang (Contributor) left a comment

Delete file

@skrah (Contributor Author) commented May 22, 2019

> You didn't allow maintainers to edit your branch so I can't push the fix. Can you please fix this? :)

Done. I'll leave the "allow" button on next time; it's just a habit of mine from other projects. :)

@pytorchbot re-added the module: build, module: cpu, module: cuda, module: internals, and module: operators labels May 22, 2019
@skrah requested a review from ezyang May 22, 2019 20:04
@facebook-github-bot (Contributor) left a comment

@ezyang is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@skrah (Contributor Author) commented May 23, 2019

@pytorchbot retest this please.

zdevito pushed a commit to zdevito/ATen that referenced this pull request May 23, 2019
Summary: Pull Request resolved: pytorch/pytorch#20691

Differential Revision: D15435960

Pulled By: ezyang

fbshipit-source-id: 548b7cc42e52ad2c641ec7d9cf78028d9411d02e
@facebook-github-bot (Contributor)

@ezyang merged this pull request in ec57d1f.
