
Conversation

@lezcano (Collaborator) commented Mar 15, 2021

  • Corrected a few errata in the SVD docs
  • Made the notation more uniform (refer to Vh in linalg.svd, always use double backticks...)
  • Wrote a better explanation about why the gradients of U and V are not well-defined when the input is complex or real but has repeated singular values. The previous one pointed to a somewhat obscure post on gauge theory.

@lezcano added the module: docs label (Related to our documentation, both in docs/ and docblocks) Mar 15, 2021
@lezcano requested a review from IvanYashchuk March 15, 2021 14:27
@facebook-github-bot (Contributor) commented Mar 15, 2021

💊 CI failures summary and remediations

As of commit 37e2f4b (more details on the Dr. CI page):


  • 3/3 failures possibly* introduced in this PR
    • 1/3 non-scanned failure(s)

🕵️ 2 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_linux_xenial_py3_6_gcc5_4_build (1/2)

Step: "(Optional) Merge target branch" (full log | diagnosis details | 🔁 rerun)

Automatic merge failed; fix conflicts and then commit the result.
CONFLICT (add/add): Merge conflict in .circleci/docker/build.sh
Auto-merging .circleci/docker/build.sh
CONFLICT (add/add): Merge conflict in .circleci/config.yml
Auto-merging .circleci/config.yml
CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/windows_build_definitions.py
Auto-merging .circleci/cimodel/data/windows_build_definitions.py
CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/simple/docker_definitions.py
Auto-merging .circleci/cimodel/data/simple/docker_definitions.py
CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/dimensions.py
Auto-merging .circleci/cimodel/data/dimensions.py
Automatic merge failed; fix conflicts and then commit the result.


Exited with code exit status 1

See CircleCI build pytorch_xla_linux_bionic_py3_6_clang9_build (2/2)

Step: "(Optional) Merge target branch" (full log | diagnosis details | 🔁 rerun)

Automatic merge failed; fix conflicts and then commit the result.
CONFLICT (add/add): Merge conflict in .circleci/docker/build.sh
Auto-merging .circleci/docker/build.sh
CONFLICT (add/add): Merge conflict in .circleci/config.yml
Auto-merging .circleci/config.yml
CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/windows_build_definitions.py
Auto-merging .circleci/cimodel/data/windows_build_definitions.py
CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/simple/docker_definitions.py
Auto-merging .circleci/cimodel/data/simple/docker_definitions.py
CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/dimensions.py
Auto-merging .circleci/cimodel/data/dimensions.py
Automatic merge failed; fix conflicts and then commit the result.


Exited with code exit status 1


ci.pytorch.org: 1 failed


This comment was automatically generated by Dr. CI.

@jbschlosser requested a review from mruberry March 15, 2021 14:43
@jbschlosser added the triaged label (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module) Mar 15, 2021
@vadimkantorov (Contributor) commented:

Maybe worth preserving the link to the gauge theory blog in some Notes section?

@lezcano (Collaborator, Author) commented Mar 15, 2021

I explained the problems in simpler terms in the last ``.. note::`` block. In my opinion, the post was too abstract for a person not used to the notation of differential geometry / physics.

Review comment (Collaborator):

For torch.linalg.svd we should clarify what actually happens. When we say the gradient is only "well-defined" in certain cases, we're not actually telling the reader what torch.svd() or torch.linalg.svd() does. Does it compute the gradient when the gradient is well-defined? What does it compute when the gradient is not well-defined? Does it throw a runtime error?

Reply (Collaborator Author):

That is a great question. The answer is: I do not know.

I have been trying to understand this problem better, which led to the discussion in #47599 (comment).
It happens to be a surprisingly tricky problem, and I do not understand it well enough just yet. I will come back to this once I do.

Reply (Collaborator):

Appreciating what we don't know is a great start.

Maybe we can unblock this PR by:

  • leaving this issue open for future resolution
  • saying that produced gradients are undefined unless the grad passed to svd has certain properties (this would require some wordsmithing)

Review comment (Collaborator):

What does "unstable" mean?

Reply (Collaborator Author):

Close to singular. Basically, the gradient has a ``1. / (s_i - s_j)`` term, where the ``s_i`` are the singular values.
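To make this concrete, here is a minimal sketch (the function and numbers are invented for illustration, not code from this PR) showing the gradient through U growing as the gap between two singular values shrinks:

```python
import torch

# Sketch: the SVD backward contains 1 / (s_i - s_j)-style factors, so the
# gradient flowing through U blows up as two singular values approach
# each other.
def grad_norm_through_u(eps):
    # Diagonal matrix whose two largest singular values differ by eps.
    a = torch.diag(torch.tensor([1.0 + eps, 1.0, 0.5])).requires_grad_(True)
    u, s, v = torch.svd(a)
    # A generic cotangent stands in for a downstream loss that depends on U.
    u.backward(torch.ones_like(u))
    return a.grad.norm().item()

for eps in (1e-1, 1e-3, 1e-6):
    print(f"eps={eps:g}  grad norm={grad_norm_through_u(eps):.3e}")
```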

Reply (Collaborator):

Can we clarify that in the documentation? Or maybe we should consider adding a glossary to the linalg docs?

Reply (Collaborator Author):

I've added a better explanation now, as is done in the documentation of symeig. I will open an issue about making the documentation of torch.linalg more consistent, covering both the style and things like this one.

@mruberry (Collaborator) commented:

I like a lot of what this PR does and think it does make the writing clearer. The mathematical formatting is very jarring, however. Check out: https://11545664-65600975-gh.circle-artifacts.com/0/docs/linalg.html. Maybe we can make the inline mathematical expressions more natural with the surrounding text?

There are a few places where I would like our docs to be clearer, but this PR doesn't need to improve upon them.

@IvanYashchuk (Collaborator) commented:

Links to rendered documentation from this PR:
torch.linalg.svd
torch.svd

Screenshots of the main text:
Before: [screenshot of the rendered docs before this PR]
After: [screenshot of the rendered docs after this PR]
I have concerns about the rendering of equations and code highlighting.

@lezcano (Collaborator, Author) commented Mar 15, 2021

About the code rendering, fair enough. I will change it back to double backticks and Unicode. This being said, it would be good to know how to write decent maths that uses variables (e.g. inputs / outputs of the function) with MathJax. In this case, the formula is simple enough that it can be written in Unicode, but more non-trivial formulae may not be. For example, the formulae in Conv2d use \text{} to refer to these variables. I do not have a preference, but it would be good to settle on one.

About the code highlighting, @IvanYashchuk raised the point that when you write equations such as ``(m, n)``, the generated code (not in this comment but in the docs) has an odd white space in the middle. I believe that this is not a problem coming from PyTorch but from Sphinx, so there is not much to do about it other than hope that the rendering from Sphinx gets better in the future.

I think it is better to unify everything that's a variable / formula to have double backticks. I believe this is important as, for example, in this SVD documentation, we sometimes write input to refer to the argument, and sometimes we use it as a noun. This clearly disambiguates both usages. I also prefer it over single backticks, as a single backtick sometimes goes unnoticed, and it is also not consistent with the highlighting given by :attr:. What do you think @mruberry @IvanYashchuk ?

@mruberry (Collaborator) commented:

> About the code rendering, fair enough. I will change it back to double backticks and Unicode. This being said, it would be good to know how to write decent maths that uses variables (e.g. inputs / outputs of the function) with MathJax. […]

It would be nice to have more guidelines for how to write our documentation. cc @brianjo

> About the code highlighting, @IvanYashchuk raised the point that when you write equations such as ``(m, n)``, the generated code has an odd white space in the middle. […]

Maybe we can tweak the rendering with an add-on or setting? But without pursuing that I think we should try to make the docs look reasonable given our current system.

> I think it is better to unify everything that's a variable / formula to have double backticks. […]

:attr:`input` is the best we have, I think. Maybe we should just be more careful about not using the word "input" except to refer to the input argument?

@IvanYashchuk (Collaborator) commented Mar 16, 2021

My suggestions for torch.linalg documentation formatting (a docstring sketch applying them follows the list):

  • Use of TeX:
    • use Unicode instead of inline TeX math for consistent font and font-size
    • TeX math mode can be used for separate blocks of complex mathematical expressions (for example torch.kron), not for the inline math text
  • Use of code highlighting (``code_here``):
    • for inline text, use it only if it's really necessary
    • use it only for real code (for example torch.linalg.tensorsolve), preferably without spaces (or fix the breaking of code highlighting somewhere in the Sphinx settings)
    • for all function arguments use :attr:`arg`
    • use `True`, `False`, `None` instead of ``True``, ``False``, ``None``
    • use `A` for mathematical variables instead of ``A``
  • Use of technical references:
    • If the resource is not widely known (not a Wikipedia article), then it's better to link to resources that explain the relevant concepts and facts of linear algebra in more detail (in this case I'd keep the link about the gauge problem in automatic differentiation)
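Putting these together, a hypothetical docstring fragment (the function and argument names below are invented, not from the PR) might look like:

```python
# Sketch of a docstring applying the suggestions above.
def decompose(input, compute_uv=True):
    r"""Computes the singular value decomposition of :attr:`input`.

    The decomposition satisfies `input = U diag(S) Vᴴ`, written in
    Unicode rather than inline TeX. Mathematical variables such as `U`,
    `S` and `V` take single backticks, arguments are written as
    :attr:`input` and :attr:`compute_uv`, and constants as `True` and
    `None`. A complex expression still gets its own TeX block:

    .. math::
        A = U \operatorname{diag}(S) V^{\mathrm{H}}
    """
    ...
```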

@lezcano (Collaborator, Author) commented Mar 16, 2021

I agree about the "Use of TeX" part.

About the code highlighting, in my opinion, it's better to always go for double backticks.
Pros:

  • It's uniform. If we use single `A` for variables and :attr:`arg` for arguments, then we have a mix of the two. This gets even worse if we want to write a formula involving `A` and :attr:`arg`, where half of the formula has a grey background and the other half is in italics. I argue that this mix of styles is much more distracting than evening them all out to use the grey background.

Cons:

  • It has spaces. I think this should be solved by configuring Sphinx. For example, writing things like ``(m, n)`` on GitHub does not produce any awkward space, so there's some hope there.

About the technical references, I agree. That being said, I do not think that the link to those formulas on gauges is very useful here. I think that the set of people who do not know that the SVD is not unique, but do know what a gauge is and would not be scared by the following formula, is basically empty:

[screenshot of the formula from the gauge post]

I believe that this link should go as a comment in svd_backward, as a reference for the implementation followed.

Surely you can explain this lack of uniqueness as a gauge-invariance problem, as the Stiefel manifold is a principal bundle over the Grassmannian and the SVD is a map onto a product of Grassmannians, but this is overkill. I think it is better to explain that what's happening is that the SVD chooses subspaces, not bases of orthonormal vectors; to represent those subspaces we need to choose a basis, but we can choose any basis. In particular, we can choose a basis and rotate it inside this subspace, and that will give another basis of the subspace. This is even explained in the Wikipedia article on the SVD:

https://en.wikipedia.org/wiki/Singular_value_decomposition#Singular_values,_singular_vectors,_and_their_relation_to_the_SVD

> In general, the SVD is unique up to arbitrary unitary transformations applied uniformly to the column vectors of both U and V spanning the subspaces of each singular value, and up to arbitrary unitary transformations on vectors of U and V spanning the kernel and cokernel, respectively, of M.
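A small PyTorch sketch (the matrix is invented for illustration) of exactly this freedom:

```python
import torch

# With a repeated singular value, rotating the corresponding singular
# vectors inside their subspace gives a different but equally valid SVD.
a = torch.diag(torch.tensor([2.0, 1.0, 1.0]))  # singular value 1.0 repeated
u, s, v = torch.svd(a)

# Rotation acting only on the 2-dimensional subspace of the repeated value.
theta = torch.tensor(0.3)
r = torch.eye(3)
r[1, 1] = r[2, 2] = torch.cos(theta)
r[1, 2], r[2, 1] = -torch.sin(theta), torch.sin(theta)

u2, v2 = u @ r, v @ r  # same rotation applied to U and V

# Both factorizations reconstruct a, so the SVD is not unique.
print(torch.allclose(u @ torch.diag(s) @ v.t(), a))    # True
print(torch.allclose(u2 @ torch.diag(s) @ v2.t(), a))  # True
```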

This all boils down to choosing a level of mathematical background that we assume of the reader, and writing to cater to that level. When it comes to linear algebra, I think it is safer to stay on the conservative side.

@mruberry (Collaborator) commented:

> About the code highlighting, in my opinion, it's better to always go for double backticks. […]

I welcome the debate on how best to format the docs but would like to suggest we move it to its own issue and not block this PR. That leaves the question, then, of how do we format this PR. Would it be OK, @lezcano, if we try to minimize the disruption of weird whitespace for now, follow-up fixing that issue, create a standard, and then possibly revisit this to impose that standard?

> About the technical references, I agree. That being said, I do not think that the link to those formulas on gauges is very useful here. […] When it comes to linear algebra, I think it is safer to stay on the conservative side.

I agree with you on the complexity of the "Gauge-based" explanation vs. one using bases, which are familiar to anyone who's taken a course in linear algebra.

@lezcano (Collaborator, Author) commented Mar 17, 2021

Sure, let's move this discussion somewhere else.

That being said, I do not know when to use the backticks. I see these cases:

  • If the variable is an attribute: use :attr:`var`
  • If the variable is not an attribute, such as return variables, or it's a constant like True or False: use single backticks.
  • If we have a maths formula: use single backticks
  • If we have some short code:
    • If it just involves non-input variables: use double backticks
    • If it also involves input variables (such as compute_uv = True): what do we do here?

For the last case, should it be:

  • :attr:`compute_uv` `= True` // This would use two different fonts one after the other
  • :attr:`compute_uv` = `True` // This would use three different fonts one after the other. This was the way it was done before.
  • :attr:`compute_uv` ``=True`` // This would use just one font, but the font of True would be different to the usual font given by rule 2. above.

Again, I'm happy to format this PR in whichever way you see fit. I just do not know what the formatting should be.
Edit. cc @mruberry @IvanYashchuk

@lezcano (Collaborator, Author) commented Mar 29, 2021

Sorry for that #nolint issue. It should be good now.
I'll create an issue in a second to discuss the docs of torch.linalg in more generality. @mruberry

@lezcano (Collaborator, Author) commented Mar 29, 2021

I have not been able to split the long line. I really hope that flake8 does not complain. If it does, I do not know how to solve it. I have found this SO post, but I have not managed to replicate the behaviour that they show there.

Edit. It looks like flake8 does not complain.

I've also added a warning making explicit the only cases in which the backward is correct, and I've polished the explanation of why it fails in the other cases a bit.

@lezcano (Collaborator, Author) commented Mar 29, 2021

Moving the discussion about the formatting here: #54878

@mruberry (Collaborator) left a review:

Cool! Always great to have better docs.

@facebook-github-bot (Contributor) commented:

@mruberry has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@mruberry (Collaborator) commented:

What do you think of the test failure, @lezcano?

test_doc_template - TestTorch
test_torch.py
Traceback (most recent call last):
  File "test_torch.py", line 134, in test_doc_template
    doc_strs = f.read()
  File "c:\jenkins\miniconda3\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 263415: character maps to <undefined>
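The traceback suggests the test reads the file with the platform-default encoding, which is cp1252 on Windows. A hedged sketch of the usual remedy, with doc_file as a stand-in name (I have not checked the actual test code):

```python
# Force UTF-8 instead of the platform default (cp1252 on Windows).
with open(doc_file, encoding="utf-8") as f:
    doc_strs = f.read()
```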

@lezcano (Collaborator, Author) commented Mar 31, 2021

Wow, that's a new one. I'll have a look in a second.

@facebook-github-bot (Contributor) commented:

@mruberry has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot (Contributor) commented:

@mruberry merged this pull request in 36c27fd.

@lezcano deleted the lezcano-docs branch April 1, 2021 17:26

Labels

cla signed · Merged · module: docs (Related to our documentation, both in docs/ and docblocks) · open source · triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
