DOC Improves shape documentation for *Flatten #60980

Closed
wants to merge 8 commits

Conversation

@thomasjpfan
Contributor

Fixes #60841

@thomasjpfan thomasjpfan added labels module: nn (Related to torch.nn) and triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module) Jun 29, 2021
@thomasjpfan thomasjpfan requested a review from jbschlosser June 29, 2021 19:58
@thomasjpfan thomasjpfan requested a review from albanD as a code owner June 29, 2021 19:58
@facebook-github-bot
Contributor

facebook-github-bot commented Jun 29, 2021

💊 CI failures summary and remediations

As of commit cad4578 (more details on the Dr. CI page and at hud.pytorch.org/pr/60980):


  • 1/1 failures introduced in this PR

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_linux_xenial_py3_clang5_asan_test2 (1/1)

Step: "Run tests"

Jul 05 15:34:54 SUMMARY: UndefinedBehaviorSanit.../jenkins/workspace/aten/src/ATen/Utils.cpp:20:3 in
Jul 05 15:34:54     #9 0x55c4718698f2 in PyEval_EvalCode /home/builder/ktietz/cos6/ci_cos6/python_1622833237666/work/Python/ceval.c:731
Jul 05 15:34:54     #10 0x55c4718d1cd5 in run_mod /home/builder/ktietz/cos6/ci_cos6/python_1622833237666/work/Python/pythonrun.c:1025
Jul 05 15:34:54     #11 0x55c4718d3d5d in PyRun_StringFlags /home/builder/ktietz/cos6/ci_cos6/python_1622833237666/work/Python/pythonrun.c:949
Jul 05 15:34:54     #12 0x55c4718d3dbb in PyRun_SimpleStringFlags /home/builder/ktietz/cos6/ci_cos6/python_1622833237666/work/Python/pythonrun.c:445
Jul 05 15:34:54     #13 0x55c4718d4926 in run_command /home/builder/ktietz/cos6/ci_cos6/python_1622833237666/work/Modules/main.c:301
Jul 05 15:34:54     #14 0x55c4718d4926 in Py_Main /home/builder/ktietz/cos6/ci_cos6/python_1622833237666/work/Modules/main.c:749
Jul 05 15:34:54     #15 0x55c47180e196 in main /home/builder/ktietz/cos6/ci_cos6/python_1622833237666/work/Programs/python.c:69
Jul 05 15:34:54     #16 0x7fc3fb0c083f in __libc_start_main /build/glibc-S7Ft5T/glibc-2.23/csu/../csu/libc-start.c:291
Jul 05 15:34:54     #17 0x55c47189e33d in _start (/opt/conda/bin/python3.6+0x1a733d)
Jul 05 15:34:54 
Jul 05 15:34:54 SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /var/lib/jenkins/workspace/aten/src/ATen/Utils.cpp:20:3 in 
Jul 05 15:34:54 + retcode=1
Jul 05 15:34:54 + set -e
Jul 05 15:34:54 + return 1
Jul 05 15:34:54 + [[ pytorch-linux-xenial-py3-clang5-asan-test2 == *-NO_AVX-* ]]
Jul 05 15:34:54 + [[ pytorch-linux-xenial-py3-clang5-asan-test2 == *-NO_AVX2-* ]]
Jul 05 15:34:54 + '[' -n https://github.com/pytorch/pytorch/pull/60980 ']'
Jul 05 15:34:54 + [[ pytorch-linux-xenial-py3-clang5-asan-test2 != *coverage* ]]
Jul 05 15:34:54 ++ mktemp
Jul 05 15:34:54 + DETERMINE_FROM=/tmp/tmp.pjAmaL36eo
Jul 05 15:34:54 + file_diff_from_base /tmp/tmp.pjAmaL36eo

Preview docs built from this PR


@thomasjpfan thomasjpfan changed the title DOC Improves documentation for Unflatten DOC Improves shape documentation for Unflatten Jun 29, 2021
Contributor

@jbschlosser jbschlosser left a comment
Thanks for the update! Mind making the analogous change in Flatten for consistency?

- Input: :math:`(N, *dims)`
- Output: :math:`(N, C_{\text{out}}, H_{\text{out}}, W_{\text{out}})`
- Input: :math:`(*)`. Input can be any shape.
- Output: :math:`(*, *unflattened\_size, *)`.
Contributor
Any way to indicate that *unflattened_size is expanded at dim? Maybe something like (*, dim, *) before and (*, *unflattened\_size, *) after (where * represents any number of dimensions, including none). Wdyt?

Contributor Author
(*, dim, *) would be inconsistent with how the size is normally specified in other Shape descriptions.

What do you think of using dim as a subscript: (*, S_{dim}, *), where S_{dim} is the "size at dimension dim".

Contributor
Cool, I agree with you that using dim as a subscript is more accurate. We could even explicitly point out that S_{dim} = \prod *unflattened\_size must hold - thoughts on that?

@thomasjpfan thomasjpfan changed the title DOC Improves shape documentation for Unflatten DOC Improves shape documentation for *Flatten Jul 1, 2021
Contributor

@jbschlosser jbschlosser left a comment
Looking good! Not to spend too much time on this, but since a bunch of doc updates are coming down the pipe soon, probably good to get a consistent set of rules in place. What are your opinions on:

  • Defining explicitly what S_{i} represents (e.g. where S_{i} is the size at dimension i) - does this add anything or is it obvious?
  • Using N_{i} instead of S_{i} here - does N_{i} fit with other docs or is it confusing?
  • Defining explicitly what * represents - should we do this in the shape section of every doc?
  • Explicitly indicating that the rest of output matches the input shape like the Linear docs do - does this add anything or is it obvious?
  • Maybe there should be a section somewhere in the PyTorch docs defining the notation used in Shape sections? vs. defining everything in-place.

I guess in general there is a spectrum of notational rigor from loose + possibly implicit to tightly defined + explicit, with the end goal of having docs that are readable, understandable, and unambiguous to at least a reasonable extent. I think the strategy up until now has been taking it on a case-by-case basis, and this unfortunately has led to some inconsistencies throughout the docs.

- Input: :math:`(N, *dims)`
- Output: :math:`(N, \prod *dims)` (for the default case).
- Input: :math:`(*, S_{start}, *, S_{end}, *)`
- Output: :math:`(*, \prod_{i=start}^{end} S_{i}, *)` (for the default case).
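For concreteness, the proposed Flatten shape rule can be sketched in plain Python. `flatten_shape` below is a hypothetical helper written only for this illustration (it is not a torch API); it computes the documented output shape (*, \prod_{i=start}^{end} S_{i}, *) from an input shape:

```python
from math import prod

def flatten_shape(shape, start_dim=1, end_dim=-1):
    # Documented rule: (*, S_start, ..., S_end, *) -> (*, prod(S_start..S_end), *)
    end_dim = end_dim % len(shape)  # normalize a negative end_dim such as -1
    flattened = prod(shape[start_dim:end_dim + 1])
    return shape[:start_dim] + (flattened,) + shape[end_dim + 1:]

# Default case, mirroring nn.Flatten() on a (32, 1, 5, 5) input:
print(flatten_shape((32, 1, 5, 5)))  # -> (32, 25)
```

With explicit dims, `flatten_shape((2, 3, 4, 5), 0, 1)` gives `(6, 4, 5)`, matching the product over dimensions start through end.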
Contributor
nit: I don't think "(for the default case)" is needed now since the more precise edit should cover all cases I believe

@thomasjpfan
Contributor Author

Looking good! Not to spend too much time on this, but since a bunch of doc updates are coming down the pipe soon, probably good to get a consistent set of rules in place.

I think it's good to discuss now and spend a little time on this. Looks like we are going to end up working on #59566 for the "Shape" sections, while updating for no-batch.

Defining explicitly what S_{i} represents (e.g. where S_{i} is the size at dimension i) - does this add anything or is it obvious?

I prefer explicitly stating it.

Using N_{i} instead of S_{i} here - does N_{i} fit with other docs or is it confusing?

N is commonly used for "batch". But as we move forward with "no batch", we would be consistently removing N and replacing it with a *. This means N_{i} can be used. I proposed "S" because of "size" and "tensor.size(1)".

Defining explicitly what * represents - should we do this in the shape section of every doc?

I think we need to do this. Sometimes * can mean "any number of dimensions", but other times it could mean "only one dimension or none". For example, ReflectionPad2d would be (*, C, H_{in}, W_{in}) where * is one dimension or none.
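That distinction can be made concrete with a small sketch. `reflection_pad2d_shape` is a hypothetical pure-Python illustration (not torch code) in which the leading * is restricted to zero or one batch dimensions:

```python
def reflection_pad2d_shape(shape, padding):
    # Shape is (*, C, H_in, W_in) where * is one dimension (the batch) or none.
    if len(shape) not in (3, 4):
        raise ValueError("expected (C, H, W) or (N, C, H, W)")
    left, right, top, bottom = padding
    *batch, c, h, w = shape
    return (*batch, c, h + top + bottom, w + left + right)

print(reflection_pad2d_shape((3, 8, 8), (1, 1, 2, 2)))     # -> (3, 12, 10)
print(reflection_pad2d_shape((2, 3, 8, 8), (1, 1, 2, 2)))  # -> (2, 3, 12, 10)
```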

Maybe there should be a section somewhere in the PyTorch docs defining the notation used in Shape sections? vs. defining everything in-place.

Having a section defining the notation is good for developers regardless of whether we also define it in-place. I do like having definitions in-place for users.

Overall I think the "Shape" section should be more on the explicit side.

@thomasjpfan
Contributor Author

Explicitly indicating that the rest of output matches the input shape like the Linear docs do - does this add anything or is it obvious?

For the Linear case, I think it is obvious because of how the * lines up. I think we can restrict * to only appear at the start and end?

Something like this:

[Screenshot of the proposed Shape section formatting, 2021-07-01]

@jbschlosser
Contributor

For the Linear case, I think it is obvious because of how the * lines up. I think we can restrict * to only be used to be at the start and ends?

Something like this:

[Screenshot of the proposed Shape section formatting, 2021-07-01]

Ah yes, I like this a lot!

Contributor

@jbschlosser jbschlosser left a comment
Flatten update looks good to me! Unflatten is pretty good and I do agree with your thoughts that explicit is better in general.

Only thing I'm wondering is whether a U_{i} notation would help readability, wdyt? Something like:

Input: (*, S_{dim}, *), where S_{dim} is the size at dimension dim
Output: (*, U_1, ..., U_n, *), where U = unflattened_size and \prod_i^n U_i = S_{dim}
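The proposed rule and its product constraint can be checked with a small sketch. `unflatten_shape` is a hypothetical pure-Python helper written for this illustration (not the torch API):

```python
from math import prod

def unflatten_shape(shape, dim, unflattened_size):
    # Proposed rule: (*, S_dim, *) -> (*, U_1, ..., U_n, *)
    # with the constraint prod_i U_i == S_dim.
    dim = dim % len(shape)  # normalize a negative dim
    if prod(unflattened_size) != shape[dim]:
        raise ValueError("prod(unflattened_size) must equal the size at `dim`")
    return shape[:dim] + tuple(unflattened_size) + shape[dim + 1:]

# Mirrors nn.Unflatten(1, (2, 5, 5)) applied to a (32, 50) input:
print(unflatten_shape((32, 50), 1, (2, 5, 5)))  # -> (32, 2, 5, 5)
```

A size that violates the constraint, such as `unflatten_shape((32, 50), 1, (3, 5, 5))`, is rejected, which is exactly the S_{dim} = \prod U_i condition discussed above.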

Contributor

@jbschlosser jbschlosser left a comment
LGTM!

@facebook-github-bot
Contributor

@jbschlosser has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Collaborator

@ssnl ssnl left a comment
Nice! Thanks for fixing. Btw, the formatting will look slightly nicer if you use \text{start} and \text{end}. No worries if you want to merge this as-is, since this is already a drastic improvement.

@facebook-github-bot
Contributor

@jbschlosser has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@jbschlosser merged this pull request in 5503a4a.

Labels
cla signed, Merged, module: nn (Related to torch.nn), open source, triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
Development

Successfully merging this pull request may close these issues.

[doc] nn.Unflatten output size is wrong
5 participants