
Conversation

@lezcano (Collaborator) commented Mar 15, 2021

  • Corrected a few errata in the SVD docs
  • Made the notation more uniform (refer to Vh in linalg.svd, always use double backticks...)
  • Wrote a better explanation about why the gradients of U and V are not well-defined when the input is complex or real but has repeated singular values. The previous one pointed to a somewhat obscure post on gauge theory.

@lezcano added the module: docs label (Related to our documentation, both in docs/ and docblocks) Mar 15, 2021
@lezcano requested a review from IvanYashchuk March 15, 2021 14:27
@facebook-github-bot (Contributor) commented Mar 15, 2021

💊 CI failures summary and remediations

As of commit 37e2f4b (more details on the Dr. CI page):


  • 3/3 failures possibly* introduced in this PR
    • 1/3 non-scanned failure(s)

🕵️ 2 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_linux_xenial_py3_6_gcc5_4_build (1/2)

Step: "(Optional) Merge target branch" (full log | diagnosis details | 🔁 rerun)

Automatic merge failed; fix conflicts and then commit the result.
CONFLICT (add/add): Merge conflict in .circleci/docker/build.sh
Auto-merging .circleci/docker/build.sh
CONFLICT (add/add): Merge conflict in .circleci/config.yml
Auto-merging .circleci/config.yml
CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/windows_build_definitions.py
Auto-merging .circleci/cimodel/data/windows_build_definitions.py
CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/simple/docker_definitions.py
Auto-merging .circleci/cimodel/data/simple/docker_definitions.py
CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/dimensions.py
Auto-merging .circleci/cimodel/data/dimensions.py
Automatic merge failed; fix conflicts and then commit the result.


Exited with code exit status 1

See CircleCI build pytorch_xla_linux_bionic_py3_6_clang9_build (2/2)

Step: "(Optional) Merge target branch" (full log | diagnosis details | 🔁 rerun)

Automatic merge failed; fix conflicts and then commit the result.
CONFLICT (add/add): Merge conflict in .circleci/docker/build.sh
Auto-merging .circleci/docker/build.sh
CONFLICT (add/add): Merge conflict in .circleci/config.yml
Auto-merging .circleci/config.yml
CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/windows_build_definitions.py
Auto-merging .circleci/cimodel/data/windows_build_definitions.py
CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/simple/docker_definitions.py
Auto-merging .circleci/cimodel/data/simple/docker_definitions.py
CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/dimensions.py
Auto-merging .circleci/cimodel/data/dimensions.py
Automatic merge failed; fix conflicts and then commit the result.


Exited with code exit status 1


ci.pytorch.org: 1 failed


This comment was automatically generated by Dr. CI.

@jbschlosser requested a review from mruberry March 15, 2021 14:43
@jbschlosser added the triaged label (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module) Mar 15, 2021
@vadimkantorov (Contributor) commented:

Maybe worth preserving the link to the gauge theory blog in some Notes section?

@lezcano (Collaborator, Author) commented Mar 15, 2021

I explained the problems in simpler terms in the last ``.. note::`` block. In my opinion, the post was too abstract for a person not used to the notation of differential geometry / physics.

Review comment (Collaborator):

For torch.linalg.svd we should clarify what actually happens. When we say the gradient is only "well-defined" in certain cases, we're not actually telling the reader what torch.svd() or torch.linalg.svd() does. Does it compute the gradient when the gradient is well-defined? What does it compute when the gradient is not well-defined? Does it throw a runtime error?

Reply (Collaborator Author):

That is a great question. The answer is: I do not know.

I have been trying to understand this problem better, which led to the discussion in #47599 (comment).
It happens to be a surprisingly tricky problem, and I do not understand it well enough just yet. I will come back to this once I do.

Reply (Collaborator):

Appreciating what we don't know is a great start.

Maybe we can unblock this PR by:

  • leaving this issue open for future resolution
  • saying that produced gradients are undefined unless the grad passed to svd has certain properties (this would require some wordsmithing)

Review comment (Collaborator):

What does "unstable" mean?

Reply (Collaborator Author):

Close to singular. Basically, the gradient has a ``1. / (s_i - s_j)`` term, where the ``s_i`` are the singular values.
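To make this concrete, here is a minimal sketch (the function and numbers are invented for illustration, not code from this PR) showing the gradient through U growing as the gap between two singular values shrinks:

```python
import torch

# Sketch: the SVD backward contains 1 / (s_i - s_j)-style factors, so the
# gradient flowing through U blows up as two singular values approach
# each other.
def grad_norm_through_u(eps):
    # Diagonal matrix whose two largest singular values differ by eps.
    a = torch.diag(torch.tensor([1.0 + eps, 1.0, 0.5])).requires_grad_(True)
    u, s, v = torch.svd(a)
    # A generic cotangent stands in for a downstream loss that depends on U.
    u.backward(torch.ones_like(u))
    return a.grad.norm().item()

for eps in (1e-1, 1e-3, 1e-6):
    print(f"eps={eps:g}  grad norm={grad_norm_through_u(eps):.3e}")
```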

Reply (Collaborator):

Can we clarify that in the documentation? Or maybe we should consider adding a glossary to the linalg docs?

Reply (Collaborator Author):

I've added a better explanation now, as is done in the documentation of symeig. I will open an issue about making the documentation of torch.linalg more consistent, covering both the style and things like this one.

@mruberry (Collaborator) commented:

I like a lot of what this PR does and think it does make the writing clearer. The mathematical formatting is very jarring, however. Check out: https://11545664-65600975-gh.circle-artifacts.com/0/docs/linalg.html. Maybe we can make the inline mathematical expressions more natural with the surrounding text?

There are a few places where I would like our docs to be clearer, but this PR doesn't need to improve upon them.

@IvanYashchuk (Collaborator) commented:

Links to rendered documentation from this PR:
torch.linalg.svd
torch.svd

Screenshots of the main text:
Before: [screenshot of the rendered docs before this PR]
After: [screenshot of the rendered docs after this PR]
I have concerns about the rendering of equations and code highlighting.

@lezcano (Collaborator, Author) commented Mar 15, 2021

About the code rendering, fair enough. I will change it back to double backticks and Unicode. This being said, it would be good to know how to write decent maths that uses variables (e.g. inputs / outputs of the function) with MathJax. In this case, the formula is simple enough that it can be written in Unicode, but more non-trivial formulae may not be. For example, the formulae in Conv2d use \text{} to refer to these variables. I do not have a preference, but it would be good to settle on one.

About the code highlighting, @IvanYashchuk raised the point that when you write equations such as ``(m, n)``, the generated code (not in this comment but in the docs) has an odd white space in the middle. I believe that this is not a problem coming from PyTorch but from Sphinx, so there is not much to do about it other than hope that the rendering from Sphinx gets better in the future.

I think it is better to unify everything that's a variable / formula to have double backticks. I believe this is important as, for example, in this SVD documentation, we sometimes write input to refer to the argument, and sometimes we use it as a noun. This clearly disambiguates both usages. I also prefer it over single backticks, as a single backtick sometimes goes unnoticed, and it is also not consistent with the highlighting given by :attr:. What do you think @mruberry @IvanYashchuk ?

@mruberry (Collaborator) commented:

> About the code rendering, fair enough. I will change it back to double backticks and Unicode. This being said, it would be good to know how to write decent maths that uses variables (e.g. inputs / outputs of the function) with MathJax. […]

It would be nice to have more guidelines for how to write our documentation. cc @brianjo

> About the code highlighting, @IvanYashchuk raised the point that when you write equations such as ``(m, n)``, the generated code has an odd white space in the middle. […]

Maybe we can tweak the rendering with an add-on or setting? But without pursuing that I think we should try to make the docs look reasonable given our current system.

> I think it is better to unify everything that's a variable / formula to have double backticks. […]

:attr:`input` is the best we have, I think. Maybe we should just be more careful about not using the word "input" except to refer to the input argument?

@IvanYashchuk (Collaborator) commented Mar 16, 2021

My suggestions for torch.linalg documentation formatting (a docstring sketch applying them follows the list):

  • Use of TeX:
    • use Unicode instead of inline TeX math for consistent font and font-size
    • TeX math mode can be used for separate blocks of complex mathematical expressions (for example torch.kron), not for the inline math text
  • Use of code highlighting (``code_here``):
    • for inline text, use it only if it's really necessary
    • use it only for real code (for example torch.linalg.tensorsolve), preferably without spaces (or fix the breaking of code highlighting somewhere in the Sphinx settings)
    • for all function arguments use :attr:`arg`
    • use `True`, `False`, `None` instead of ``True``, ``False``, ``None``
    • use `A` for mathematical variables instead of ``A``
  • Use of technical references:
    • If the resource is not widely known (not a Wikipedia article), then it's better to link to resources that explain the relevant concepts and facts of linear algebra in more detail (in this case I'd keep the link about the gauge problem in automatic differentiation)
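Putting these together, a hypothetical docstring fragment (the function and argument names below are invented, not from the PR) might look like:

```python
# Sketch of a docstring applying the suggestions above.
def decompose(input, compute_uv=True):
    r"""Computes the singular value decomposition of :attr:`input`.

    The decomposition satisfies `input = U diag(S) Vᴴ`, written in
    Unicode rather than inline TeX. Mathematical variables such as `U`,
    `S` and `V` take single backticks, arguments are written as
    :attr:`input` and :attr:`compute_uv`, and constants as `True` and
    `None`. A complex expression still gets its own TeX block:

    .. math::
        A = U \operatorname{diag}(S) V^{\mathrm{H}}
    """
    ...
```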

@lezcano (Collaborator, Author) commented Mar 16, 2021

I agree about the "Use of TeX" part.

About the code highlighting, in my opinion, it's better to always go for double backticks.
Pros:

  • It's uniform. If we use single `A` for variables and :attr:`arg` for arguments, then we have a mix of the two. This gets even worse if we want to write a formula involving `A` and :attr:`arg`, where half of the formula has a grey background and the other half is in italics. I argue that this mix of styles is much more distracting than evening them all out to use the grey background.

Cons:

  • It has spaces. I think this should be solved by configuring Sphinx. For example, writing things like ``(m, n)`` on GitHub does not produce any awkward space, so there's some hope there.

About the technical references, I agree. That being said, I do not think that the link to those formulas on gauges is very useful here. I think that the set of people who do not know that the SVD is not unique, but do know what a gauge is and would not be scared by the following formula, is basically empty:

[screenshot of the formula from the gauge post]

I believe that this link should go as a comment in svd_backward, as a reference for the implementation followed.

Surely you can explain this lack of uniqueness as a gauge-invariance problem, as the Stiefel manifold is a principal bundle over the Grassmannian and the SVD is a map onto a product of Grassmannians, but this is overkill. I think it is better to explain that what's happening is that the SVD chooses subspaces, not bases of orthonormal vectors; to represent those subspaces we need to choose a basis, but we can choose any basis. In particular, we can choose a basis and rotate it inside this subspace, and that will give another basis of the subspace. This is even explained in the Wikipedia article on the SVD:

https://en.wikipedia.org/wiki/Singular_value_decomposition#Singular_values,_singular_vectors,_and_their_relation_to_the_SVD

> In general, the SVD is unique up to arbitrary unitary transformations applied uniformly to the column vectors of both U and V spanning the subspaces of each singular value, and up to arbitrary unitary transformations on vectors of U and V spanning the kernel and cokernel, respectively, of M.
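A small PyTorch sketch (the matrix is invented for illustration) of exactly this freedom:

```python
import torch

# With a repeated singular value, rotating the corresponding singular
# vectors inside their subspace gives a different but equally valid SVD.
a = torch.diag(torch.tensor([2.0, 1.0, 1.0]))  # singular value 1.0 repeated
u, s, v = torch.svd(a)

# Rotation acting only on the 2-dimensional subspace of the repeated value.
theta = torch.tensor(0.3)
r = torch.eye(3)
r[1, 1] = r[2, 2] = torch.cos(theta)
r[1, 2], r[2, 1] = -torch.sin(theta), torch.sin(theta)

u2, v2 = u @ r, v @ r  # same rotation applied to U and V

# Both factorizations reconstruct a, so the SVD is not unique.
print(torch.allclose(u @ torch.diag(s) @ v.t(), a))    # True
print(torch.allclose(u2 @ torch.diag(s) @ v2.t(), a))  # True
```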

This all boils down to choosing a level of mathematical background that we assume of the reader, and writing to cater to that level. When it comes to linear algebra, I think it is safer to stay on the conservative side.

@mruberry (Collaborator) commented:

> About the code highlighting, in my opinion, it's better to always go for double backticks. […]

I welcome the debate on how best to format the docs but would like to suggest we move it to its own issue and not block this PR. That leaves the question, then, of how do we format this PR. Would it be OK, @lezcano, if we try to minimize the disruption of weird whitespace for now, follow-up fixing that issue, create a standard, and then possibly revisit this to impose that standard?

> About the technical references, I agree. That being said, I do not think that the link to those formulas on gauges is very useful here. […] When it comes to linear algebra, I think it is safer to stay on the conservative side.

I agree with you on the complexity of the "Gauge-based" explanation vs. one using bases, which are familiar to anyone who's taken a course in linear algebra.

@lezcano (Collaborator, Author) commented Mar 17, 2021

Sure, let's move this discussion somewhere else.

That being said, I do not know when to use the backticks. I see these cases:

  • If the variable is an attribute: use :attr:`var`
  • If the variable is not an attribute, such as return variables, or it's a constant like True or False: use single backticks.
  • If we have a maths formula: use single backticks
  • If we have some short code:
    • If it just involves non-input variables: use double backticks
    • If it also involves input variables (such as compute_uv = True): what do we do here?

For the last case, should it be:

  • :attr:`compute_uv` `= True` // This would use two different fonts one after the other
  • :attr:`compute_uv` = `True` // This would use three different fonts one after the other. This was the way it was done before.
  • :attr:`compute_uv` ``=True`` // This would use just one font, but the font of True would be different to the usual font given by rule 2. above.

Again, I'm happy to format this PR in whichever way you see fit. I just do not know what the formatting should be.
Edit. cc @mruberry @IvanYashchuk

@lezcano (Collaborator, Author) commented Mar 29, 2021

Sorry for that #nolint issue. It should be good now.
I'll create an issue in a second to discuss the docs of torch.linalg in more generality. @mruberry

@lezcano (Collaborator, Author) commented Mar 29, 2021

I have not been able to split the long line. I really hope that flake8 does not complain. If it does, I do not know how to solve it. I have found this SO post, but I have not managed to replicate the behaviour that they show there.

Edit. It looks like flake8 does not complain.

I've also added a warning making explicit the only cases in which the backward is correct, and I've polished the explanation of why it fails in the other cases a bit.

@lezcano (Collaborator, Author) commented Mar 29, 2021

Moving the discussion about the formatting here: #54878

@mruberry (Collaborator) left a review:

Cool! Always great to have better docs.

@facebook-github-bot (Contributor) commented:

@mruberry has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@mruberry (Collaborator) commented:

What do you think of the test failure, @lezcano?

test_doc_template - TestTorch
test_torch.py
Traceback (most recent call last):
  File "test_torch.py", line 134, in test_doc_template
    doc_strs = f.read()
  File "c:\jenkins\miniconda3\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 263415: character maps to <undefined>
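The traceback suggests the test reads the file with the platform-default encoding, which is cp1252 on Windows. A hedged sketch of the usual remedy, with doc_file as a stand-in name (I have not checked the actual test code):

```python
# Force UTF-8 instead of the platform default (cp1252 on Windows).
with open(doc_file, encoding="utf-8") as f:
    doc_strs = f.read()
```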

@lezcano (Collaborator, Author) commented Mar 31, 2021

Wow, that's a new one. I'll have a look in a second.

@facebook-github-bot (Contributor) commented:

@mruberry has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot (Contributor) commented:

@mruberry merged this pull request in 36c27fd.

@lezcano deleted the lezcano-docs branch April 1, 2021 17:26

Labels

cla signed · Merged · module: docs (Related to our documentation, both in docs/ and docblocks) · open source · triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
