Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Documentation build hangs indefinitely for pdfhtml on master #1308

Closed
choldgraf opened this issue Apr 19, 2021 · 20 comments · Fixed by #1293 · May be fixed by executablebooks/sphinx-panels#61
Closed

Documentation build hangs indefinitely for pdfhtml on master #1308

choldgraf opened this issue Apr 19, 2021 · 20 comments · Fixed by #1293 · May be fixed by executablebooks/sphinx-panels#61
Assignees
Labels
bug Something isn't working

Comments

@choldgraf
Copy link
Collaborator

Describe the bug

In #1283 we re-worked the documentation structure a bit. This seems to have introduced a bug where the singlepage PDF build via HTML now hangs indefinitely.

To Reproduce

Build the book PDF via HTML with

tox -e docs-pdfhtml-update

this should hang indefinitely

@mmcky
Copy link
Contributor

mmcky commented Apr 27, 2021

This seems to be an issue during the html -> pdf step using pyppeteer as

jb build docs --builder=custom --custom-builder=singlehtml

the sphinx build for singlehtml seems to be working as well as the html builder.

However when I run the builder=pdfhtml the html that is used in the _build/html folder doesn't load due to some error so must be an issue in the javascript loading. The left hand index is missing when I open _build/html/intro.html when building with pdfhtml

Screen Shot 2021-04-27 at 11 10 12 am

@choldgraf does the pdfhtml use the singlehtml builder but at the page level? Scrap that it builds into a single file

@choldgraf
Copy link
Collaborator Author

the plot thickens :-) I think you're right that this is probably some kinda infinite recursion kinda thing, or javascript library problem or something

@mmcky
Copy link
Contributor

mmcky commented Apr 27, 2021

So I commented out the tableofcontents directives 42d8934 and the pdfhtml builder executes and produces the pdf

I then tried adding singlehtml=(skip,None) to the tableofcontents node in https://github.com/executablebooks/jupyter-book/tree/test-tests to add it to the singlehtml context and the next issue seems that a node called fontawesome isn't registered with singlehtml builder and writer.

I have tried a few things this afternoon in #1307. I think the main issue is tableofcontents node isn't registered with singlehtml builder causing an unstable build process (which sometimes appears to work just fine). I think we may need to review all the extensions used by jupyter-book and match them up to the builders we use.

@mmcky
Copy link
Contributor

mmcky commented Apr 27, 2021

I also checked out ca79899 (just before the docs rework) which is building fine according to the cli output but taking a closer look at _build/pdf/book.pdf it also appears to be missing content.

book.pdf

@choldgraf
Copy link
Collaborator Author

Hmmm - I didn't realize that singlehtml and html builders needed to be manually added to a node, that's super annoying :-/

@chrisjsewell
Copy link
Contributor

Note tableofcontents doesn't now now need to be "registered" in sphinx-external-toc.
fontawesome is probably from sphinx-panels

@mmcky
Copy link
Contributor

mmcky commented Apr 27, 2021

thanks @chrisjsewell yeah in #1307 I found the fontawesome nodes but then I was left with one I can't find _TabInput.

Note tableofcontents doesn't now now need to be "registered" in sphinx-external-toc

Roger that -- where does that node get registered then? I see they will be registered here in jupyter-book 👍 like in #1307 (?)

@chrisjsewell
Copy link
Contributor

chrisjsewell commented Apr 27, 2021

No the key thing here is that you only need to register a node if they will actually reach the builder.
Since tableofcontents nodes are all now replaced/removed in a transform (that is always applied, as opposed to the current post-transform that is not), they can never reach the builder.
Hence they are not registered anywhere

@mmcky
Copy link
Contributor

mmcky commented Apr 28, 2021

Documenting a list of packages that will need to be updated for singlehtml:

  • sphinx-inline-tabs for _TabInput, _TabLabel nodes
  • jupyter-sphinx for all the nodes associated with jupyter output such as JupyterWidgetViewNode
  • sphinx-panels for fa node to support fontawesome
  • sphinx-external-toc for tableofcontents no longer required

@mmcky
Copy link
Contributor

mmcky commented Apr 28, 2021

@chrisjsewell @choldgraf happy to start coordinating updates for these packages via PR's but this might take some time to resolve for #1293 as these are split across a diverse set of repos that aren't within the executablebooks organisation.

I think this would be the best long term strategy (i.e. adding support for singlehtml in the packages themselves) but wondering if you guys thought a high priority transform might be able to attach html settings to singlehtml if they aren't found from the extension.

Update: Talked this idea through with @AakashGfude. It might not be feasible as the nodes are loaded during the app setup phase so once the singlehtml builder is instantiated and transforms run it will likely have no knowledge of the app.add_node() phase

@mmcky
Copy link
Contributor

mmcky commented Apr 28, 2021

I did a bit of digging and it looks like this could actually be done. I am just not sure if assigning singlehtml with nodes from html is a robust thing to do.

After the sphinx.app is instantiated we do have access to:

app.registry.translation_handlers

which records the handlers for html, and singlehtml (amongst others).

So we could loop through any nodes missing from singlehtml and add the nodes from the html registry.

Then we can go trough all the packages and update them with singlehtml specific methods (if needed).

Update: Bummer -- it looks like sphinx doesn't save the actual node just the node name attribute and leaves the node to be added to docutils

@mmcky
Copy link
Contributor

mmcky commented Apr 28, 2021

The nodes could be recovered from docutils:

            if builder == "singlehtml":
                singlehtml_node_names = app.registry.translation_handlers['singlehtml'].keys()
                html_nodes = app.registry.translation_handlers['html']
                #Recover registered nodes from docutils
                registered_nodes = {}
                from sphinx.util import docutils
                for node in docutils.additional_nodes:
                    registered_nodes[node.__name__] = node
                # Set missing nodes with visit,depart methods from html
                for node_name, methods in html_nodes.items():
                    if node_name not in singlehtml_node_names:
                        try:
                            app.registry.add_translation_handlers(registered_nodes[node_name], singlehtml=methods)
                        except:
                            raise ValueError(f"{node_name} is missing from the docutils registry")

and could be added here but still not sure this is a good idea or not.

This does seem to fix a lot of missing content from the html generation step using singlehtml

The resulting intro.html file is 6.4Mb

@mmcky
Copy link
Contributor

mmcky commented Apr 29, 2021

@chrisjsewell any thoughts on the above proposal to add any missing singlehtml nodes from html registry? I think most of the time they will share visit/depart methods for building the html so it seems like a pretty safe option to me. As packages are updated one-by-one with singlehtml support these nodes wouldn't be replaced so if there are differences it will take precedence.

@mmcky
Copy link
Contributor

mmcky commented Apr 29, 2021

I have checked in with the sphinx-dev community about this idea:

https://groups.google.com/g/sphinx-dev/c/J4if9AvxNaA/m/76yAj4vCDwAJ

@mmcky
Copy link
Contributor

mmcky commented May 2, 2021

@chrisjsewell @choldgraf I heard from Takeshi (on the dev community channel) and we may be able to have a look at adjusting this upstream in sphinx.

@choldgraf
Copy link
Collaborator Author

Great news! Hopefully he is cool putting a fix into 3.x

@mmcky
Copy link
Contributor

mmcky commented May 3, 2021

@choldgraf I am trying to dig in to the original issue here with pdfhtml when you merged #1283 but when using

tox -e docs-pdfhtml-update

I find I get the NotImplementedError: Unknown node: TableOfContentsNode error rather than the hanging behaviour. Any ideas?

@mmcky
Copy link
Contributor

mmcky commented May 3, 2021

I have changed the pyppeteer wait settings for PR #1293 which seems pretty reasonable which resolves part of this issue to get pdfhtml to pass ci. However the missing nodes are still required to get all content included in the pdf file.

There is discussion upstream to add html nodes based on format rather than builder.

sphinx-doc/sphinx#9160

This would however target sphinx>=4.1

@mmcky
Copy link
Contributor

mmcky commented May 4, 2021

With the latest pyppeteer settings the test for pdfhtml is now passing.
I have opened a separate issue regarding the missing content as linked above.

@mmcky mmcky closed this as completed May 4, 2021
@choldgraf
Copy link
Collaborator Author

Think we could tie this issue to that PR so that it's closed when the PR is merged, rather than closing this issue straight away?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
3 participants