Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HTML Search: Use anchor for search preview #11944

Merged
merged 5 commits into from Feb 27, 2024

Conversation

wlach
Copy link
Contributor

@wlach wlach commented Feb 4, 2024

Subject: Use anchor for generating search previews

Feature or Bugfix

  • Bugfix

Purpose

Attempts to use anchor for generating HTML preview, if available. This ensures that the preview is based on the anchor in the document, for cases where we matched on that title.

Detail

You can reproduce this bug using the procedure @jayaddison mentioned in #11943

sphinx.git $ sphinx-build -b html doc _build 
sphinx.git $ cd _build
sphinx.git $ python -m http.server -b 127.0.0.1

Then search for something like "test"

Before:

image

After:

image

Note how the content in the "after" image actually reflects what you'd be jumping to!

Relates

Closes #11943
Discovered when discussing #11942

@wlach wlach changed the title HTML Search: Use anchor for search previe HTML Search: Use anchor for search preview Feb 4, 2024
const docContent = htmlElement.querySelector('[role="main"]');
let docContent = undefined;
if (anchor && htmlElement.querySelector(anchor)) {
docContent = htmlElement.querySelector(anchor);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a concern here: I think this is going to retrieve the text content within the anchor element?

In practice we may want to retrieve some of the following HTML sibling element text as well.

We do have some test coverage on searchtools.js and I wonder whether we could use that to make some assertions about what we retrieve and summarize (and also safeguard the feature against regressions).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we want the sibling element. The anchor defines a section within the documentation. I don't think we want (or need) any text outside of it: it's (1) likely to be out of context and (2) probably outside of the text we'd take for the snippit except in edge cases.

Here's the section for the 2to3 example you gave earlier:

image

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, brilliant. I'm doing a few more tests locally, and getting around to agreeing with all this.

@wlach
Copy link
Contributor Author

wlach commented Feb 6, 2024

@jayaddison Ready for another look whenever you are. Added some tests while I was here

tests/js/searchtools.js Outdated Show resolved Hide resolved
</html>`;

it("basic case", () => {
expect(Search.htmlToText(testHTML).trim().split(/\s+/)).toEqual([
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some ideas about the reason, but just to check they match reality: why are we testing htmlToText and not makeSearchSummary?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's easier to test the lower-level construct to verify that it doesn't have any bugs. makeSearchSummary has a bunch of magic numbers and heuristics in it which seems much more subject to change.

});

it("will start reading from the anchor", () => {
expect(Search.htmlToText(testHTML, '#other-section').trim().split(/\s+/)).toEqual(['Other', 'Section', 'Other', 'text']);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the trim...split behaviour here to avoid being too-precise about whitespace in the output?

If so, an alternative could be to make a less-restrictive assertion; something like retrievedText.startsWith("Other Section").

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's right, just trying not to test whitespace. I think I prefer to check for all the tokens in the output, feels more comprehensive to me.

Copy link
Contributor

@jayaddison jayaddison left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me - thank you @wlach!

Copy link
Member

@picnixz picnixz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems fine to me as well.

@picnixz
Copy link
Member

picnixz commented Feb 24, 2024

@wlach Could you add a changelog entry btw?

@wlach wlach force-pushed the use-anchor-for-search-previews branch from 00607a9 to 9c5d55f Compare February 27, 2024 14:11
@wlach
Copy link
Contributor Author

wlach commented Feb 27, 2024

@wlach Could you add a changelog entry btw?

Done.

@picnixz picnixz merged commit 9a30ca7 into sphinx-doc:master Feb 27, 2024
23 checks passed
@picnixz
Copy link
Member

picnixz commented Feb 27, 2024

Thank you!

jayaddison added a commit to jayaddison/sphinx that referenced this pull request Mar 16, 2024
picnixz pushed a commit that referenced this pull request Mar 16, 2024
@wlach wlach deleted the use-anchor-for-search-previews branch April 7, 2024 09:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

HTML search results: identical summaries displayed for different link targets
3 participants