Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Intersphinx cannot link to section headers with uppercase names #12008

Closed
goerz opened this issue Feb 25, 2024 · 8 comments · Fixed by #12033
Closed

Intersphinx cannot link to section headers with uppercase names #12008

goerz opened this issue Feb 25, 2024 · 8 comments · Fixed by #12033

Comments

@goerz
Copy link
Contributor

goerz commented Feb 25, 2024

Describe the bug

In

'ref': XRefRole(lowercase=True, innernodeclass=nodes.inline,
warn_dangling=True),
# links to labels of numbered figures, tables and code-blocks
'numref': XRefRole(lowercase=True,
warn_dangling=True),

it is specified that :ref: and :numref: references should lowercase the name of the target before resolving the link. When combined with intersphinx, this makes an implicit assumption that any "name" for a std:label entry in a loaded inventory is lowercase. Typically, the name corresponds to the HTML-anchor of the section header. Since Sphinx generates lowercase anchors by default, this assumption holds. However, other documentation generators may not choose lowercase anchors. For example, Documenter.jl currently uses capitalization in anchors for section titles, and also writes an objects.inv file with names matching those anchors.

It would be unreasonable to require that all inventory files may only contain lowercase names: since the inventory format specifies

If {uri} has an anchor (technically a “fragment identifier,” the portion following the # symbol) and the tail of the anchor is identical to {name}, that tail is replaced with $.

there is a strong incentive for the name to match the anchor. In the case of Documenter-generated inventory files, deviating from this would significantly increase the size of the inventory file.

The correct behavior would be for Sphinx to normalize the names of std:label to lowercase when reading in the objects.inv file, simply by adding

if type == 'std:label':
    name = name.lower()

before line 141 in sphinx/util/inventory.py

How to Reproduce

See the MWE at sphinx-to-documenter-links.

The problematic inventory file is, e.g., http://juliadocs.org/DocumenterInterLinks.jl/stable/objecs.inv from the DocumenterInterLinks project.

The problem is demonstrated in the following line in docs/source/index.rst:

* Referencing a heading is currently not possible due to a bug in Sphinx:
  ``:external+DocumenterInterLinks:std:ref:`Syntax``` does not work because
  Sphinx lowercases the "Syntax"

Environment Information

Platform:              darwin; (macOS-14.3.1-arm64-arm-64bit)
Python version:        3.10.10 (main, Mar 31 2023, 17:38:48) [Clang 14.0.0 (clang-1400.0.29.202)])
Python implementation: CPython
Sphinx version:        7.2.6
Docutils version:      0.20.1
Jinja2 version:        3.1.3
Pygments version:      2.17.2

Sphinx extensions

Not relevant, since the issue is with std:label, not objects in the jl domain, but technically julia_domain.py

Additional context

#8982 seems related

@electric-coder
Copy link

For reference:

Identifier Normalization

Docutils adds a normalization by downcasing

I'm not really sure what you're proposing here? That Sphinx should internally postpone the normalization step to process objects.inv that might not be normalized? But isn't the point of objects.inv files to have links that are URL normalized already?

@goerz
Copy link
Contributor Author

goerz commented Feb 26, 2024

I don't know what Docutils has to do with this or what the context of the "Identifier Normalization" is. Certainly, Sphinx does not generally use lowercase names or lowercase anchors in its URLs. For example, a random line from Sphinx' inventory is

sphinx.builders.dirhtml.DirectoryHTMLBuilder.format py:attribute 1 usage/builders/index.html#$ -

which describes the attribute at https://www.sphinx-doc.org/en/master/usage/builders/index.html#sphinx.builders.dirhtml.DirectoryHTMLBuilder.format (note the uppercase anchor name).

Just for section headers in particular, Sphinx happens to "sluggify" them to lowercase names, and that's how they get written to the inventory. There's nothing in particular that specifies that choice of sluggification method, and other systems (like Documenter) have a different sluggification that preservers case. There is no "normalization" happening when writing inventories.

Sphinx chooses to "normalize" e.g., :ref:`Syntax` to :ref:`syntax` , due to 'ref': XRefRole(lowercase=True, …). The ref and numref roles are the only two that do this. That gets handled correctly for local references, but with Intersphinx, the problem is that it then tries to look up the lowercase "syntax" (for example) directly in the inventory:

if target in inventory[objtype]:
# Case sensitive match, use it
data = inventory[objtype][target]
elif objtype == 'std:term':
# Check for potential case insensitive matches for terms only
target_lower = target.lower()
insensitive_matches = list(filter(lambda k: k.lower() == target_lower,
inventory[objtype].keys()))
if insensitive_matches:
data = inventory[objtype][insensitive_matches[0]]
else:
# No case insensitive match either, continue to the next candidate
continue
else:
# Could reach here if we're not a term but have a case insensitive match.
# This is a fix for terms specifically, but potentially should apply to
# other types.
continue
return _create_element_from_result(domain, inv_name, data, node, contnode)

(line 336). What I'm saying is that the normalization needs to happen in both places: the target, and the inventory[objtype] we're looking it up in.

So you either normalize the keys in inventory["std:label"] when you load the inventory (which this PR does), or you implement a case-insensitive lookup, just like for std:term objtype in the above code (which has a similar problem for other reasons).

I'd be happy to do an alternative PR to modify _resolve_reference_in_domain_by_target if that solution is preferable to normalizing the inventory on read.

But isn't the point of objects.inv files to have links that are URL normalized already?

I'm not sure what you mean. But no, objects.inv files do not have any kind of inherent normalization.

@picnixz
Copy link
Member

picnixz commented Feb 26, 2024

Before anything, I want to understand why the ref and numref roles use a lowercasing in the first place... Could you investigate this one?

One reason that I can think of is because of labels that are auto-generated for sections with autosectionlabel but I'm not sure about it. Also, you could perhaps find the commit or related issues concerning capitalization of section titles + references.

@goerz
Copy link
Contributor Author

goerz commented Feb 26, 2024

XRefRole(lowercase=True, …) was introduced in c02b714

Before that: 'ref': make_xref_role(lowercase_link_func, None, nodes.emphasis), introduced in 957be3b

Before that (f82a4a4):

    elif typ == 'ref':
        # reST label names are always lowercased
        target = ws_re.sub('', target).lower()

Ultimately, it comes down to

commit 2e698fcb0962fc42aae233e4b4495b74cbe0b9b6
Author: Georg Brandl <georg@python.org>
Date:   Fri Jul 4 14:27:25 2008 +0000

    Merged revisions 64642-64643,64698 via svnmerge from
    svn+ssh://pythondev@svn.python.org/doctools/branches/0.4.x

    ........
      r64642 | georg.brandl | 2008-07-01 23:02:35 +0200 (Tue, 01 Jul 2008) | 2 lines

      #3251: label names are case insensitive.
    ........
      r64643 | georg.brandl | 2008-07-01 23:24:55 +0200 (Tue, 01 Jul 2008) | 2 lines

      Add a note about decorated functions.
    ........
      r64698 | georg.brandl | 2008-07-04 12:21:09 +0200 (Fri, 04 Jul 2008) | 2 lines

      Allow setting current module to None.
    ........

 doc/ext/autodoc.rst        | 11 +++++++++++
 sphinx/directives/other.py |  5 ++++-
 sphinx/roles.py            |  3 +++
 3 files changed, 18 insertions(+), 1 deletion(-)

See also https://mail.python.org/pipermail/python-checkins/2008-July.txt

  • Label names in references are now case-insensitive, since reST label
    names are always lowercased.

At that point, we're entering the pre-git era, so I don't think I can track this much further. The https://svn.python.org/projects/doctools/ site is still around if someone wants to dig deeper.

I'm pretty sure all of this predates any serious Intersphinx capabilities, definitely the v2 inventory format.

It's perfectly fine for label names in references to be case-insensitive

since reST label names are always lowercased.

But label names in inventory files that don't originate from .rst files are not necessarily lowercased, which is why I'm normalizing them in this PR.

@goerz
Copy link
Contributor Author

goerz commented Feb 26, 2024

Maybe I should clarify the core of the issue more concisely, since it's quite narrow and specific to intersphinx:

  • Sphinx chooses to normalize (lowercase) the label in :ref:`label` before resolving it (for whatever reason, I suppose people wanted to write references without worrying about case)
  • An inventory is basically a dict of labels to URLs. Note that inventories are external (user-supplied) data, so we can't make too many assumptions about them, apart from that they parse according to the specification. In particular, if they're not originally generated by Sphinx, they might use mixed-case anchors for section titles.
  • Very generally, if we're looking up label in a dict, and we've normalized label to lowercase, we should also normalize the keys in the dict (or tweak the lookup). That's all this PR does.

@picnixz
Copy link
Member

picnixz commented Feb 27, 2024

Thank you for your very precise investigation (I didn't ask for that much actually). Personally, I prefer not changing anything on the inventory side and fix the thing on the resolver side instead as you said:

I'd be happy to do an alternative PR to modify _resolve_reference_in_domain_by_target if that solution is preferable to normalizing the inventory on read.

Maybe we will change that later again if there are still issues.

@goerz
Copy link
Contributor Author

goerz commented Feb 27, 2024

That’s fine! I’ll make a second PR for that within the next couple of day.

goerz added a commit to goerz/sphinx that referenced this issue Mar 2, 2024
Look up `:std:label:` object types case-insensitively in an intersphinx
inventory. This is needed because `ref` and `numref` references (which
both resolve to `:std:label:`) are case insensitive. They lowercase
their target under the assumption that the target name in any inventory
file is also lowercased.

That assumption holds for labels in inventory files generated by Sphinx
from `.rst` files, since it generates lowercase labels. However, other
documentation generators (such as the Julia Documenter.jl, which
also writes `objects.inv` inventory files as of version 1.3) use
mixed-case labels for section title.

With the change in this commit, if the lowercase label cannot be found
in the inventory, also try a case-insensitive match.

See sphinx-doc#12008
@goerz
Copy link
Contributor Author

goerz commented Mar 2, 2024

OK, opened the alternative PR #12033

I agree that this is a better solution, as the problem is really the same one as was resolved earlier for :term: references

goerz added a commit to goerz/sphinx that referenced this issue Mar 2, 2024
Look up `:std:label:` object types case-insensitively in an intersphinx
inventory. This is needed because `ref` and `numref` references (which
both resolve to `:std:label:`) are case insensitive. They lowercase
their target under the assumption that the target name in any inventory
file is also lowercased.

That assumption holds for labels in inventory files generated by Sphinx
from `.rst` files, since it generates lowercase labels. However, other
documentation generators (such as the Julia Documenter.jl, which
also writes `objects.inv` inventory files as of version 1.3) use
mixed-case labels for section title.

With the change in this commit, if the lowercase label cannot be found
in the inventory, also try a case-insensitive match.

See sphinx-doc#12008
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 4, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
3 participants