Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid sphinx searching on output cells #777

Closed
pfebrer opened this issue Feb 22, 2024 · 10 comments
Closed

Avoid sphinx searching on output cells #777

pfebrer opened this issue Feb 22, 2024 · 10 comments

Comments

@pfebrer
Copy link

pfebrer commented Feb 22, 2024

In our documentation, we have some notebooks rendered by nbsphinx which include plotly plots. The output cells of the notebook include the full plotly javascript library. When we use sphinx's search bar in our documentation, we get hits for plotly's javascript (circled in red):

Screenshot from 2024-02-22 16-08-58

Is there any way to avoid this?

@mgeier
Copy link
Member

mgeier commented Feb 22, 2024

Can you please show the HTML of the affected page?

It looks like HTML tags are supposed to get stripped, see sphinx-doc/sphinx@53ea1cb

It would be ideal if you could reproduce this with the raw directive (without using nbsphinx), then you could raise this as a Sphinx issue.

@pfebrer
Copy link
Author

pfebrer commented Feb 22, 2024

Ok, so what I understand from the code there is that in principle everything inside a script tag is ignored, right?

Thanks, I'll try to dig deeper!

@pfebrer
Copy link
Author

pfebrer commented Feb 22, 2024

This is the page: https://zerothi.github.io/sisl/visualization/viz_module/showcase/GeometryPlot.html#GeometryPlot

(I can't upload html files to github)

And if I grep on that html file:

grep -n "not have a valid GeoJSON geometry" geometry_plot.html | cut -d : -f 1

I get a match on line 208, which is where the plotly library is included inside a script tag.

@mgeier
Copy link
Member

mgeier commented Mar 6, 2024

Thanks for the link!

BTW, the "download ipynb" link is broken: https://raw.githubusercontent.com/zerothi/sisl/main//home/runner/work/sisl/sisl/docs/visualization/viz_module/showcase/GeometryPlot.ipynb

I guess it is meant to be this: https://raw.githubusercontent.com/zerothi/sisl/main/docs/visualization/viz_module/showcase/GeometryPlot.ipynb

However, this doesn't contain the outputs. Can you please provide the .iypnb file with outputs?

@pfebrer
Copy link
Author

pfebrer commented Mar 6, 2024

Yes, I'll send it to you as soon as I get home 👍

Thanks for the broken link report!

@pfebrer
Copy link
Author

pfebrer commented Mar 7, 2024

Here it is: GeometryPlot.zip

@mgeier
Copy link
Member

mgeier commented Mar 7, 2024

Thanks for the notebook file!

Playing around with that, I could reduce this to a pure Sphinx problem: sphinx-doc/sphinx#12052

It looks like the <script> tag is indeed ignored when building the search index, but it is not ignored in the search preview.

Note that in your example the Plotly stuff is only shown because the word "geometry" is also used somewhere else on the page.
If you search for "GeoJSON", you'll find nothing, even though the word is right next to "geometry".

@pfebrer
Copy link
Author

pfebrer commented Mar 8, 2024

Thank you very much! That's an interesting bug 😅

I guess I can close this then 👍

@pfebrer pfebrer closed this as completed Mar 8, 2024
@mgeier
Copy link
Member

mgeier commented Mar 9, 2024

That's an interesting bug

Yes indeed!

It is a dangerous pattern to look out for: there is one piece of data (in our case the HTML source text) and there are two sub-systems handling that data separately (in our case the search index generation and the search preview generation). Those two systems are supposed to have the same behavior, but if they don't, we have a problem.

This reminds me of a vulnerability of the librsvg library I've read recently: https://nvd.nist.gov/vuln/detail/CVE-2023-38633
In that case, the common piece of data was a URL, which was rejected by one sub-system, but not by another, which resulted in a potential exploit.

@zerothi
Copy link

zerothi commented Mar 20, 2024

Love this! Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants