RFC: unique ids for singlehtml #9652

jdknight · 2021-09-19T04:05:10Z

Feature or Bugfix

Bugfix

Purpose

This proposed change addresses the re-use of tag identifiers when using the singlehtml builder. Consider the following example documentation:

(index.rst)
test
====

.. toctree::

   page1
   page2

(page1.rst)
page
----

contents

(page2.rst)
page
----

contents

When this documentation is built with singlehtml, the output contains multiple section tags with the same id, page (generated by docutils). When references to these sections are built (in a TOC), the TOC entry for each page points to the first section with that id.

To deal with this, the idea is to generate unique IDs across all processed documents (before assembling a single doctree). From inspecting docutils's implementation, there appear to be two settings (id_prefix and auto_id_prefix) that influence how identifiers are generated. Assuming this is a good approach to generating unique identifiers, this change has the singlehtml builder configure docutils with a unique prefix for each document. To get this to work, an env-prepare event has been added so the builder can hook in when an environment is prepared for a specific document. When the event is triggered, the environment's settings are configured with a prefix value derived from the document's name. When docutils then prepares a document for Sphinx, there should be an (almost) unique set of identifiers for the builder to use.
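As a rough illustration of the prefix idea, the sketch below models ID generation in plain Python. The `make_id` helper is a simplified, hypothetical stand-in and not docutils's actual normalisation; the prefix format (`docname + "-"`) is likewise an assumption made for the example.

```python
import re


def make_id(title, prefix=""):
    """Simplified stand-in for ID normalisation (not the docutils code)."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return prefix + slug


# docname -> section title, mirroring page1.rst and page2.rst above
docs = {"page1": "page", "page2": "page"}

# Without a prefix, both sections end up with the same ID.
plain_ids = [make_id(title) for title in docs.values()]

# With a per-document prefix, the IDs differ.
prefixed_ids = [
    make_id(title, prefix=docname + "-") for docname, title in docs.items()
]

print(plain_ids)
print(prefixed_ids)
```

In this sketch a collision is still possible, which may be one reason the result is only "almost" unique: a document named `a` with a section titled `b-c` and a document named `a-b` with a section titled `c` both produce `a-b-c`.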

There is a concern here about the introduction of the env-prepare event. I suspect that this may not be desired due to the implications and maintenance cost of introducing a new event. I am open to changing how this is done; however, I do not know enough about the internals of Sphinx to know the best approach. While this may be achievable with a transform, I do not know how complex it would be to replace all detected duplicate identifiers (and the references targeting these IDs) at a later stage in the Sphinx build process.

This is a quickly formed commit created for review. The goal is to provide unique IDs for an assembled toctree when using the `singlehtml` builder. If the code is to be updated and integrated into Sphinx, a proper commit message will be written. Signed-off-by: James Knight <james.d.knight@live.com>

tk0miya · 2021-09-26T17:54:13Z

Sadly, this will break the output of other builders if users build their documents with a sequence like make singlehtml; make html. This is because Sphinx tries to build incrementally: the node IDs generated while building singlehtml are cached on the first build and re-used on the next build, even for other builders.
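The caching concern can be modelled with a toy example. `FakeEnvironment` below is a hypothetical stand-in for Sphinx's cached build state, not its real API; it only shows how an ID computed during the first build is handed back unchanged to the second one.

```python
class FakeEnvironment:
    """Hypothetical cache keyed by docname (not the Sphinx environment)."""

    def __init__(self):
        self.cached_ids = {}

    def read(self, docname, section_id, prefix=""):
        # A document is only "parsed" once; later builds reuse the result.
        if docname not in self.cached_ids:
            self.cached_ids[docname] = prefix + section_id
        return self.cached_ids[docname]


env = FakeEnvironment()

# First build: singlehtml asks for prefixed IDs.
singlehtml_id = env.read("page1", "page", prefix="page1-")

# Second build: html wants the plain ID but receives the cached one.
html_id = env.read("page1", "page")

print(singlehtml_id, html_id)
```

In this model the html build sees `page1-page` rather than `page`, which is the breakage described above.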

jdknight · 2021-11-22T16:03:26Z

An update: I have been trying to find an alternative (and "proper") way to do this, but no luck yet. I assume there is a point in the process where it is appropriate to run a transform that replaces identifier values. Based on the above, this would need to happen after Sphinx caches document identifiers (for other builders) but before it gets too complex to start swapping or replacing identifiers. I am not an expert on the internals of Sphinx, so I have not found where this caching occurs (I'm probably grep'ing for the wrong things). I am also assuming that doing this in a post-transform (rather than a regular transform) may be too late to change identifiers, which may be incorrect.

An extremely crude workaround for documentation that focuses on the singlehtml builder (it does not deal with the ID caching issue) is as follows:

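from docutils import nodes
from docutils.transforms import Transform


class SingleBuilderUniqueIdsTransform(Transform):
    """Rename duplicate node IDs so they are unique across all documents."""

    default_priority = 5
    # Class-level set, shared by every document processed in this build.
    uids = set()

    def apply(self):
        # Newer docutils releases provide findall() as the replacement
        # for traverse().
        for target in self.document.traverse(nodes.Element):
            new_ids = []
            for base_id in target['ids']:
                candidate = base_id
                idx = 1
                # Append a counter until the ID has not been seen before.
                while candidate in SingleBuilderUniqueIdsTransform.uids:
                    candidate = base_id + str(idx)
                    idx += 1

                new_ids.append(candidate)
                SingleBuilderUniqueIdsTransform.uids.add(candidate)

            # References that target a renamed ID are not updated here.
            if new_ids != target['ids']:
                target['ids'] = new_ids


def setup_singlehtml_uid_hack(app):
    # Only register the transform when building with singlehtml.
    if app.builder.name != 'singlehtml':
        return

    app.add_transform(SingleBuilderUniqueIdsTransform)


def setup(app):
    app.connect('builder-inited', setup_singlehtml_uid_hack)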
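The renaming loop in the workaround can be tried without docutils or Sphinx. The sketch below is a standalone rewrite of that loop; the `uniquify` function and the rename map it returns are hypothetical additions for illustration, showing the information a fuller solution would need in order to fix up references as well.

```python
def uniquify(doc_ids, seen):
    """Return (new_ids, renames) for one document's IDs.

    `seen` is the set of IDs already used by earlier documents and is
    updated in place.
    """
    new_ids = []
    renames = {}
    for base_id in doc_ids:
        candidate = base_id
        idx = 1
        # Same strategy as the transform: append a counter until unused.
        while candidate in seen:
            candidate = base_id + str(idx)
            idx += 1
        seen.add(candidate)
        new_ids.append(candidate)
        if candidate != base_id:
            renames[base_id] = candidate
    return new_ids, renames


seen = set()
page1_ids, page1_renames = uniquify(["page"], seen)
page2_ids, page2_renames = uniquify(["page"], seen)

print(page1_ids, page1_renames)
print(page2_ids, page2_renames)
```

With the two example pages, the first document keeps `page` and the second is renamed to `page1`; the second rename map records `page -> page1`, which any reference inside that document would have to follow.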

tk0miya added type:bug builder:html type:proposal a feature suggestion labels Sep 26, 2021

AA-Turner added this to the some future version milestone Apr 29, 2023
