Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Search: custom search page ranking #7237

Merged
merged 12 commits into from Jul 2, 2020
Merged

Search: custom search page ranking #7237

merged 12 commits into from Jul 2, 2020

Conversation

stsewd
Copy link
Member

@stsewd stsewd commented Jun 26, 2020

Need to test this more locally, but it should be ready for review.

Closes #7082

Ref #7217

@stsewd
Copy link
Member Author

stsewd commented Jun 30, 2020

I'm still figuring out the correct range to map to ES and make it work as we want. But I'm not changing anything else in the code, so it shouldn't change much if someone wants to review this.

# boosting for these fields need to be close enough
# to be re-boosted by the page rank.
_outer_fields = ['title^1.5']
_section_fields = ['sections.title^2', 'sections.content']
Copy link
Member Author

@stsewd stsewd Jun 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

while one is greater than the other results remain the same, and we don't have any other factor that alters the boosting, so this doesn't change our current score.

0.5,
0.6,
0.7,
0.8,
Copy link
Member Author

@stsewd stsewd Jun 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can move this 0.8 up if we want to give more priority to the boost (and move the 1.3 down).

@stsewd stsewd requested review from ericholscher and a team Jun 30, 2020
api/v2/*: 4

search.ranking
``````````````
Copy link
Member Author

@stsewd stsewd Jun 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can mark this option as experimental if we want to try it first, but we can make changes without re-indexing, so we are safe to do some tuning after if needed.

Copy link
Member

@ericholscher ericholscher left a comment

This is super cool and a great addition to our search. I have a few ideas for improvements, but they can be a "v2" after we ship this and test it:

  • The ability to set the rank at the page level with metadata in RST :search_rank: 7 at the top of the file. We would then parse this in our extension and pull it out for indexing. Not a huge priority, but I think would be a nice authoring option
  • Adding ranking by pageview data. We have the data now, and this approach seems quite easy to adapt to additional data points. I think that should probably be the next small improvement, and shouldn't be difficult.

The rank can be an integer number between -10 and 10 (inclusive).
Pages with a rank closer to -10 will appear further down the list of results,
and pages with a rank closer to 10 will appear higher in the list of results.
Note that 0 means *normal rank*, not *no rank*.
Copy link
Member

@ericholscher ericholscher Jun 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point 👍

@@ -1547,13 +1553,23 @@ def _create_imported_files(version, commit, build):
version_slug=version.slug,
),
)

page_rank = 0
# Last pattern to match takes precedence
Copy link
Member

@ericholscher ericholscher Jun 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would generally expect the first match to take precedent -- is there a reason we're doing last?

Copy link
Member Author

@stsewd stsewd Jul 1, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is in case you have something like this:

  • api/*.html: -2
  • api/important.html: 2

The first matches can be too greedy.

Copy link
Member

@ericholscher ericholscher Jul 1, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hrm, I understand that, but don't quite understand why that ordering would make the most sense. I guess building some kind of algorithm for "closeness" match is too complex, and we need some kind of default, so this probably makes sense?

Copy link
Member Author

@stsewd stsewd Jul 1, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean something like matching the longest pattern? Actually I just thought the last pattern makes sense in general when matching this kind of patterns, gitignore for example

within one level of precedence, the last matching pattern decides the outcome
https://git-scm.com/docs/gitignore

Copy link
Member

@ericholscher ericholscher Jul 1, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, I don't think first or last really makes more sense, so we just need to pick one. Longest could be interesting, maybe add a comment about looking into it?

readthedocs/search/faceted_search.py Show resolved Hide resolved
@stsewd stsewd merged commit 505be8c into master Jul 2, 2020
2 checks passed
@stsewd stsewd deleted the search-ranking branch Jul 2, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Search: Allow authors to set a "search score" per pages
2 participants