RFC: unique ids for singlehtml #9652

Open
jdknight wants to merge 1 commit into master

jdknight
Contributor

Feature or Bugfix

  • Bugfix

Purpose

This is a proposed change to address the reuse of tag identifiers when using the singlehtml builder. Consider the following example documentation:

(index.rst)
test
====

.. toctree::

   page1
   page2

(page1.rst)
page
----

contents

(page2.rst)
page
----

contents

When this documentation is built with singlehtml, multiple section tags are generated with the same id, page (generated by docutils). When references to these sections are built (for example, in a TOC), the TOC entries for each page all point to the first section.
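The collision described above can be modeled without Sphinx. The following is only a rough sketch: `simple_id` is a hypothetical, simplified stand-in for how docutils derives an ID from a title, not its actual implementation.

```python
import re
from collections import defaultdict


def simple_id(title: str) -> str:
    """Simplified stand-in for ID generation (assumption: lowercase the
    title and collapse non-alphanumeric runs into hyphens)."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


# Section titles per source file, mirroring the example documentation.
section_titles = {"index": ["test"], "page1": ["page"], "page2": ["page"]}


def find_collisions(titles_by_doc):
    """Return every ID that is generated by more than one document."""
    docs_by_id = defaultdict(list)
    for docname, titles in titles_by_doc.items():
        for title in titles:
            docs_by_id[simple_id(title)].append(docname)
    return {i: docs for i, docs in docs_by_id.items() if len(docs) > 1}


collisions = find_collisions(section_titles)
print(collisions)
```

Once both pages are merged into one HTML file, an anchor such as `#page` can only resolve to the first matching element, which is why both TOC entries end up at the same section.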

To deal with this, the goal is to generate unique IDs across all processed documents (before a single doctree is assembled). Looking at docutils's implementation, there appear to be two settings (id_prefix and auto_id_prefix) that influence how identifiers are generated. Assuming this is a good approach to generating unique identifiers, this change has the singlehtml builder configure docutils with a unique prefix per document. To make this work, an env-prepare event has been added so the builder can hook in when an environment is prepared for a specific document. When the event is triggered, the environment's settings are configured with a prefix value derived from the document's name. When docutils then prepares a document for Sphinx, the builder should have an (almost) unique set of identifiers to use.
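As a rough illustration of the prefixing idea, the sketch below applies a per-document prefix to a set of base IDs. The `make_prefix` scheme is a hypothetical choice for illustration, not what the commit does, and the IDs are plain strings rather than docutils nodes.

```python
def make_prefix(docname: str) -> str:
    """Hypothetical prefix scheme: the document name with path
    separators replaced, followed by a hyphen."""
    return docname.replace("/", "-") + "-"


def apply_prefix(ids_by_doc):
    """Prefix every ID with a value derived from its document name."""
    return {
        docname: [make_prefix(docname) + base_id for base_id in ids]
        for docname, ids in ids_by_doc.items()
    }


# Base IDs as they would collide in the example documentation.
ids_by_doc = {"index": ["test"], "page1": ["page"], "page2": ["page"]}
prefixed = apply_prefix(ids_by_doc)
print(prefixed)

# Flatten and confirm that no ID is repeated after prefixing.
all_ids = [i for ids in prefixed.values() for i in ids]
print(len(all_ids) == len(set(all_ids)))
```

Under this particular scheme the result is only "almost" unique: document names such as `a/b` and `a-b` would map to the same prefix, so a real implementation would need a more careful encoding.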


There is a concern here about the introduction of the env-prepare event. I suspect it may not be desired, given the implications and maintenance associated with introducing a new event. I am open to changing how this is done; however, I do not know enough about the internals of Sphinx to know the best approach. While this may be achievable with a transform, I do not know how complex it would be to replace all detected duplicate identifiers (and the references targeting these IDs) at a later stage in the Sphinx build process.

This is a quickly formed commit created for review. The goal is to
provide unique IDs for an assembled toctree when using the
`singlehtml` builder. If the code is to be updated/integrated into
Sphinx, a proper commit message will be written.

Signed-off-by: James Knight <james.d.knight@live.com>
@tk0miya
Member

tk0miya commented Sep 26, 2021

Sadly, this will break the output of other builders if users build their documents like make singlehtml; make html, because Sphinx tries to build incrementally. The node IDs generated while building singlehtml are cached by the first build and reused on the next build, even for other builders.
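The caching problem can be pictured with a toy model. This is only a sketch: `DoctreeCache` is a hypothetical class that stands in for the incremental build, and it stores a list of IDs where the real build stores a parsed doctree.

```python
class DoctreeCache:
    """Toy model of an incremental build: each document is parsed once
    and the stored result is reused by later builds."""

    def __init__(self):
        self._cache = {}
        self.parse_count = 0

    def get_ids(self, docname, builder_name):
        if docname not in self._cache:
            # Only the first build parses; the prefix is decided here.
            self.parse_count += 1
            prefix = docname + "-" if builder_name == "singlehtml" else ""
            self._cache[docname] = [prefix + "page"]
        # Later builds get the stored IDs, whatever builder is active.
        return self._cache[docname]


cache = DoctreeCache()
first = cache.get_ids("page1", "singlehtml")  # like `make singlehtml`
second = cache.get_ids("page1", "html")       # like `make html` afterwards
print(first, second, cache.parse_count)
```

In this model the html build receives the prefixed ID chosen for singlehtml, because the document was never re-parsed. Any builder-specific ID scheme applied at parse time has this shape of problem.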

@jdknight
Contributor Author

An update: I have been trying to find an alternative (and "proper") way to do this, but no luck yet. I assume there is a point in the process where it would be appropriate to run a transform that replaces identifier values. Based on the above, this would need to happen after Sphinx caches document identifiers (for other builders) but before it becomes too complex to start swapping identifiers. I am not an expert on the internals of Sphinx, so I have not found where this caching occurs (I am probably grepping for the wrong things). I am also assuming that doing this in a post-transform (rather than a regular transform) may be too late for identifier changes, which may be incorrect.

An extremely crude workaround for documentation that focuses on the singlehtml builder (and that does not deal with the ID caching issue) is as follows:

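from docutils import nodes
from docutils.transforms import Transform


class SingleBuilderUniqueIdsTransform(Transform):
    """Rename duplicate node IDs so they are unique across all documents."""

    default_priority = 5
    # Shared across every document processed in this build.
    uids = set()

    def apply(self):
        seen = SingleBuilderUniqueIdsTransform.uids
        # Note: docutils >= 0.18 provides findall() as the replacement
        # for the deprecated traverse().
        for target in self.document.traverse(nodes.Element):
            new_ids = []
            for base_id in target['ids']:
                # Append a numeric suffix until the ID is unused.
                new_id = base_id
                idx = 1
                while new_id in seen:
                    new_id = base_id + str(idx)
                    idx += 1

                new_ids.append(new_id)
                seen.add(new_id)

            # Only the IDs are renamed; references are not updated.
            if new_ids != target['ids']:
                target['ids'] = new_ids


def setup_singlehtml_uid_hack(app):
    # Register the transform only for the singlehtml builder.
    if app.builder.name != 'singlehtml':
        return

    app.add_transform(SingleBuilderUniqueIdsTransform)


def setup(app):
    app.connect('builder-inited', setup_singlehtml_uid_hack)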
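The workaround above renames IDs but leaves anything pointing at them untouched. As a rough sketch of what replacing both identifiers and references might involve, the following models nodes as plain dicts with `ids` and `refid` keys. The `uniquify` helper and the dict layout are hypothetical and are not docutils or Sphinx API.

```python
def uniquify(nodes, seen):
    """Rename duplicate IDs within one document, then rewrite any
    refid in the same document that pointed at a renamed ID."""
    renamed = {}
    for node in nodes:
        if "ids" not in node:
            continue
        new_ids = []
        for base_id in node["ids"]:
            # Same suffixing strategy as the transform above.
            new_id, idx = base_id, 1
            while new_id in seen:
                new_id = f"{base_id}{idx}"
                idx += 1
            seen.add(new_id)
            if new_id != base_id:
                renamed[base_id] = new_id
            new_ids.append(new_id)
        node["ids"] = new_ids

    # Second pass: follow the renames for in-document references.
    for node in nodes:
        if node.get("refid") in renamed:
            node["refid"] = renamed[node["refid"]]
    return renamed


seen = set()
page1 = [{"ids": ["page"]}, {"refid": "page"}]
page2 = [{"ids": ["page"]}, {"refid": "page"}]
first_renames = uniquify(page1, seen)
second_renames = uniquify(page2, seen)
print(first_renames, second_renames, page2)
```

The rename map here is per document, so references that cross documents are not handled. That gap is part of the complexity the comments above are concerned about.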

AA-Turner added this to the "some future version" milestone on Apr 29, 2023