[Fix #2457] Implement exact match search #4292
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -20,7 +20,7 @@ def es_index(): | |
|
||
|
||
@pytest.fixture(autouse=True) | ||
-def all_projects(es_index, mock_processed_json): | ||
+def all_projects(es_index, mock_processed_json, db): | ||
This doesn't appear to be used. Is it necessary?
It's actually a fixture that needs to be injected for DB access. Whether you use it or not, if any fixture has a dependency to |
||
projects_list = [] | ||
for project_slug in ALL_PROJECTS: | ||
project = G(Project, slug=project_slug, name=project_slug) | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,48 @@ | ||
import pytest | ||
|
||
from readthedocs.search.documents import PageDocument | ||
|
||
|
||
class TestFileSearch(object): | ||
|
||
@pytest.mark.parametrize('case', ['upper', 'lower', 'title']) | ||
def test_search_exact_match(self, client, project, case): | ||
"""Check quoted query match exact phrase with case insensitively | ||
|
||
Making a query with quoted text like ``"foo bar"`` should match | ||
exactly ``foo bar`` or ``Foo Bar`` etc | ||
""" | ||
# `Github` word is present both in `kuma` and `pipeline` files | ||
# But the phrase Github can is available only in kuma docs. | ||
# So search with this phrase to check | ||
query_text = r'"GitHub can"' | ||
cased_query = getattr(query_text, case) | ||
query = cased_query() | ||
|
||
page_search = PageDocument.faceted_search(query=query) | ||
results = page_search.execute() | ||
|
||
assert len(results) == 1 | ||
assert results[0]['project'] == 'kuma' | ||
assert results[0]['path'] == 'documentation' | ||
|
||
def test_search_combined_result(self, client, project): | ||
"""Check search result are combined of both `AND` and `OR` operator | ||
|
||
If query is `Foo Bar` then the result should be as following order: | ||
|
||
- Where both `Foo Bar` is present | ||
- Where `Foo` or `Bar` is present | ||
Is the thought process that the first example -- both "foo" and "bar" are present -- will rank more highly than the second?
Yes. It will rank higher because it is more relevant. |
||
""" | ||
query = 'Official Support' | ||
page_search = PageDocument.faceted_search(query=query) | ||
results = page_search.execute() | ||
assert len(results) == 3 | ||
|
||
result_paths = [r.path for r in results] | ||
# ``open-source-philosophy`` page has both ``Official Support`` words | ||
# ``docker`` page has ``Support`` word | ||
# ``installation`` page has ``Official`` word | ||
expected_paths = ['open-source-philosophy', 'docker', 'installation'] | ||
|
||
assert result_paths == expected_paths |
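The `case` parametrization in `test_search_exact_match` above relies on `getattr(query_text, case)()` to call `str.upper`, `str.lower`, or `str.title` on the quoted query. As a small standalone illustration (not part of the PR itself), the three variants it generates can be listed like this:

```python
# Standalone illustration of the query variants produced by the
# parametrized test; it does not touch Elasticsearch at all.
query_text = '"GitHub can"'

# Look up each string method by name and call it, as the test does.
variants = {
    case: getattr(query_text, case)()
    for case in ("upper", "lower", "title")
}

for case, query in variants.items():
    print(f"{case}: {query}")
```

The surrounding double quotes survive every case transform, so each variant is still a quoted (exact phrase) query and only the letter case differs.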
Do we need to boost the `AND` in some way, or will it automatically sort higher?

A document matched by `AND` will surely have a higher score, as it satisfies both parts of the query. This describes it better: https://www.elastic.co/guide/en/elasticsearch/guide/current/bool-query.html#CO60-1
Or we can add a `boost` value to the query explicitly!

Let's leave it for now, and we can boost it later if we want.
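For reference, the idea discussed in this thread can be sketched as a plain Elasticsearch query body. This is only a rough sketch under assumptions, not the query this PR actually builds: the helper name `combined_query` and the field name `content` are hypothetical. It combines an `and` match and an `or` match as `should` clauses of a `bool` query, with an optional explicit `boost` on the `and` clause.

```python
def combined_query(text, field="content", and_boost=None):
    """Build a bool query body with an AND clause and an OR clause.

    `field` is a hypothetical field name used for illustration only.
    """
    def match_clause(operator):
        # A standard `match` query with an explicit operator.
        return {"match": {field: {"query": text, "operator": operator}}}

    and_clause = match_clause("and")
    if and_boost is not None:
        # Optional explicit boost, as suggested in the review thread.
        and_clause["match"][field]["boost"] = and_boost

    # Scores of all matching `should` clauses contribute to the
    # document score, so a page containing every term matches both
    # clauses while a page containing one term matches only the OR.
    return {"query": {"bool": {"should": [and_clause, match_clause("or")]}}}


body = combined_query("Official Support", and_boost=2.0)
print(body)
```

Without `and_boost`, a page containing both words still matches both clauses and so tends to score higher than a page matching only one, which is consistent with leaving the boost out for now.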