Search: improve parser #7233

Merged
merged 5 commits into from Aug 17, 2020

@stsewd stsewd commented Jun 25, 2020

This makes the parser more general and matches
#7232
(it also includes one bug fix).

  • Try the main tag before falling back to the first h1.
  • Always inspect all headers up to 2 levels deep (this removes the need for
    the Sphinx special case, where the h tag is inside a div).
  • _parse_content now not only removes all newline characters, but also
    collapses multiple spaces into one.
  • Remove elements with the search role in addition to the navigation
    role.
  • The headerlink class no longer needs to be inside an a tag.
  • Fix a bug where calling .text() on a text node returned an empty string.
    (I was able to catch this one now that we are checking up to 2 levels.)
  • Increase the depth to 3 for the first section (one MkDocs theme was setting the main tag on a node far above the actual content).

This doesn't change what is currently indexed, although we may index more content on pages that have a top-level text node (calling .text() on it would previously have returned empty).
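
As a rough illustration of the first bullet, here is a minimal sketch of the lookup order. It is not the code from this PR: it uses the standard library's `xml.etree.ElementTree` on well-formed markup, `find_main_node` is a hypothetical name, and falling back to the parent of the first h1 is an assumption about what "trying the first h1" means.

```python
import xml.etree.ElementTree as ET


def find_main_node(root):
    """Return the element assumed to hold the page content, or None.

    Hypothetical helper: prefers an explicit <main> element and only
    falls back to the parent of the first <h1> when no <main> exists.
    """
    # 1. Prefer an explicit <main> element anywhere below the root.
    main = root.find(".//main")
    if main is not None:
        return main

    # 2. Fall back to the parent of the first <h1> (assumed behaviour).
    # ElementTree has no parent pointers, so build a child -> parent map.
    parents = {child: parent for parent in root.iter() for child in parent}
    first_h1 = root.find(".//h1")
    if first_h1 is not None:
        return parents.get(first_h1, root)

    return None


page = ET.fromstring(
    "<html><body>"
    "<div id='header'><h1>Title</h1></div>"
    "<main id='content'><p>Body text</p></main>"
    "</body></html>"
)
node = find_main_node(page)
```

With both an h1 and a main tag present, the sketch picks the main element even though the h1 appears earlier in the document.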

@@ -6,7 +6,7 @@
 {
 "id": "mkdocs",
 "title": "MkDocs",
-"content": "Project documentation with\u00a0Markdown."
+"content": "Project documentation with Markdown."

@stsewd stsewd Jun 25, 2020

No more weird chars now that we are stripping all white spaces :D
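
One way to get the effect seen in the fixture above, sketched under assumptions: `normalize_whitespace` is a hypothetical name and this is not necessarily how `_parse_content` is implemented. It relies on `str.split()` with no arguments treating the non-breaking space (U+00A0) as whitespace, along with newlines and tabs.

```python
def normalize_whitespace(text):
    """Collapse every run of whitespace into a single ASCII space.

    str.split() with no arguments splits on any run of Unicode
    whitespace (newlines, tabs, U+00A0, ...) and drops leading and
    trailing whitespace, so joining the pieces normalizes the text.
    """
    return " ".join(text.split())


old_content = "Project documentation with\u00a0Markdown."
new_content = normalize_whitespace(old_content)
```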

@stsewd stsewd requested review from ericholscher and a team Jun 25, 2020
stsewd added 3 commits Jun 29, 2020
Now that we prioritize the main tag as the main node,
the main node from the MkDocs Material theme is wider.

stale bot commented Aug 16, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the Status: stale Issue will be considered inactive soon label Aug 16, 2020
@stsewd stsewd removed the Status: stale Issue will be considered inactive soon label Aug 16, 2020

@ericholscher ericholscher left a comment

This seems useful, sorry it sat for so long 👍

@stsewd stsewd merged commit 4638167 into master Aug 17, 2020
2 checks passed
@stsewd stsewd deleted the improve-parser branch Aug 17, 2020