Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add search for DomainData objects #5290

Merged
merged 22 commits into from Apr 22, 2019
Merged

Add search for DomainData objects #5290

merged 22 commits into from Apr 22, 2019

Conversation

@ericholscher
Copy link
Member

@ericholscher ericholscher commented Feb 13, 2019

This adds search indexing for DomainData objects, along with cleaning up our existing search. The major changes are:

  • Adds Code API types to the site and project search. The UX of these results is similar to how Sphinx displays them.
  • Removes the elastic_project_search view, and reuses the same view & templates as the site search. This also means we now have faceted search for dashboard project search 馃帀
  • Adds an experiment AllSearch which searches across all our indexes to return results. I'm hoping to eventually move to this as a default search type, so we can display all the types. This also allows us to facet by document type, which is pretty neat.

v2

A v2 of this will integrate the Code API & Page search results into one stream, which is what the AllSearch is currently doing -- it's implemented and working on the site search, but the UX is not ready to show to users (you can hit it by manually changing your type to all in the URL).

The v2 for integrating this into the docsearch API and indoc search will likely require #5235 first, so that we can use multiple serializers for the data, so it's a larger refactor.

Unifying the modeling between ImportedFile and SphinxDomain is also a goal. We likely need another model as well for Sphinx Domain Type's (eg. std:doc or py:function, that keep track of metadata about them (eg. display_name). That will help make this data feel much more nicely factored.

Depends on readthedocs/readthedocs-sphinx-ext#62

Closes #4289

Screenshots

Site domain search

screenshot 2019-03-01 11 30 21

Project search

screenshot 2019-03-01 11 31 24

@stsewd
Copy link
Member

@stsewd stsewd commented Feb 28, 2019

Now we can change the base branch to master here I guess

@ericholscher ericholscher changed the base branch from intersphinx-modeling to master Feb 28, 2019
@ericholscher ericholscher requested a review from Mar 1, 2019
@ericholscher ericholscher requested a review from safwanrahman Mar 1, 2019
@@ -1351,7 +1384,7 @@ def _manage_imported_files(version, path, commit):
created_html_files.append(obj)

# Send bulk_post_create signal for bulk indexing to Elasticsearch
bulk_post_create.send(sender=HTMLFile, instance_list=created_html_files)
bulk_post_create.send(sender=PageDocument, instance_list=created_html_files)
Copy link
Member

@safwanrahman safwanrahman Mar 1, 2019

What is the reason to change the sender from model instanc to document instance? the instance_list is HTMLFile object list, but you are sending PageDocument as sender.

created_sphinx_domains.append(obj)

# Send bulk_post_create signal for bulk indexing to Elasticsearch
bulk_post_create.send(sender=SphinxDomainDocument, instance_list=created_sphinx_domains)
Copy link
Member

@safwanrahman safwanrahman Mar 1, 2019

I think you should send the signal with the SphinxDomain as sender.

s = super().search()
s = s.source(exclude=['content', 'headers'])
# Return 25 results
return s[:25]
Copy link
Member

@safwanrahman safwanrahman Mar 1, 2019

I think this limiting result should be done in the consumer end, like in view end. Doing this in here will limit the usage of the search api.

@@ -106,7 +106,8 @@ def index_missing_objects(app_label, model_name, document_class, index_generatio
"""
model = apps.get_model(app_label, model_name)
document = _get_document(model=model, document_class=document_class)
queryset = document().get_queryset().exclude(modified_date__lte=index_generation_time)
query_string = '{}__lte'.format(document.modified_model_field)
queryset = document().get_queryset().exclude(**{query_string: index_generation_time})
Copy link
Member

@safwanrahman safwanrahman Mar 1, 2019

Good thing done! I have done this thing in kitsune project.

Copy link
Member

@stsewd stsewd left a comment

I don't fully understand the search part completely, but the rest looks good

project=version.project,
version=version,
domain=domain,
name=name,
display_name=display_name,
type=_type,
type_display=types.get(domain + ':' + _type, ''),
Copy link
Member

@stsewd stsewd Mar 4, 2019

We can use f-stirngs here 馃槉

SphinxDomain.objects.filter(project=version.project,
version=version
).exclude(commit=commit).delete()
created_sphinx_domains.append(obj)
Copy link
Member

@stsewd stsewd Mar 4, 2019

Don't we need to check if created before adding obj?

r'^http://(.+)\.readthedocs\.io$',
r'^https://(.+)\.readthedocs\.io$',
'^http://(.+)\.readthedocs\.io$',
'^https://(.+)\.readthedocs\.io$'
Copy link
Member

@stsewd stsewd Mar 4, 2019

Why are we removing the raw string prefix?

@ericholscher ericholscher requested review from and stsewd Mar 19, 2019
@ericholscher ericholscher requested a review from safwanrahman Mar 19, 2019
stsewd
stsewd approved these changes Mar 20, 2019
Copy link
Member

@stsewd stsewd left a comment

I'm still not sure about the search part, but the projects/tasks looks ok for me

@stsewd stsewd requested a review from Mar 20, 2019
@codecov
Copy link

@codecov codecov bot commented Apr 4, 2019

Codecov Report

Merging #5290 into master will decrease coverage by 0.2%.
The diff coverage is 68.67%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #5290      +/-   ##
==========================================
- Coverage   78.71%   78.51%   -0.21%     
==========================================
  Files         164      164              
  Lines       10101    10188      +87     
  Branches     1281     1298      +17     
==========================================
+ Hits         7951     7999      +48     
- Misses       1820     1859      +39     
  Partials      330      330
Impacted Files Coverage 螖
readthedocs/projects/urls/public.py 100% <酶> (酶) 猬嗭笍
readthedocs/search/tasks.py 29.68% <0%> (-0.48%) 猬囷笍
readthedocs/projects/models.py 82.96% <100%> (+0.03%) 猬嗭笍
readthedocs/projects/signals.py 100% <100%> (酶) 猬嗭笍
readthedocs/sphinx_domains/models.py 76.66% <100%> (+1.66%) 猬嗭笍
readthedocs/search/faceted_search.py 100% <100%> (酶) 猬嗭笍
readthedocs/projects/admin.py 71.91% <100%> (+0.19%) 猬嗭笍
readthedocs/sphinx_domains/admin.py 100% <100%> (酶) 猬嗭笍
readthedocs/restapi/urls.py 89.65% <100%> (酶) 猬嗭笍
readthedocs/projects/tasks.py 68.46% <17.14%> (-2.86%) 猬囷笍
... and 5 more

Continue to review full report at Codecov.

Legend - Click here to learn more
螖 = absolute <relative> (impact), 酶 = not affected, ? = missing data
Powered by Codecov. Last update 511741c...e1385a5. Read the comment docs.

1 similar comment
@codecov
Copy link

@codecov codecov bot commented Apr 4, 2019

Codecov Report

Merging #5290 into master will decrease coverage by 0.2%.
The diff coverage is 68.67%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #5290      +/-   ##
==========================================
- Coverage   78.71%   78.51%   -0.21%     
==========================================
  Files         164      164              
  Lines       10101    10188      +87     
  Branches     1281     1298      +17     
==========================================
+ Hits         7951     7999      +48     
- Misses       1820     1859      +39     
  Partials      330      330
Impacted Files Coverage 螖
readthedocs/projects/urls/public.py 100% <酶> (酶) 猬嗭笍
readthedocs/search/tasks.py 29.68% <0%> (-0.48%) 猬囷笍
readthedocs/projects/models.py 82.96% <100%> (+0.03%) 猬嗭笍
readthedocs/projects/signals.py 100% <100%> (酶) 猬嗭笍
readthedocs/sphinx_domains/models.py 76.66% <100%> (+1.66%) 猬嗭笍
readthedocs/search/faceted_search.py 100% <100%> (酶) 猬嗭笍
readthedocs/projects/admin.py 71.91% <100%> (+0.19%) 猬嗭笍
readthedocs/sphinx_domains/admin.py 100% <100%> (酶) 猬嗭笍
readthedocs/restapi/urls.py 89.65% <100%> (酶) 猬嗭笍
readthedocs/projects/tasks.py 68.46% <17.14%> (-2.86%) 猬囷笍
... and 5 more

Continue to review full report at Codecov.

Legend - Click here to learn more
螖 = absolute <relative> (impact), 酶 = not affected, ? = missing data
Powered by Codecov. Last update 511741c...e1385a5. Read the comment docs.

@ericholscher
Copy link
Member Author

@ericholscher ericholscher commented Apr 22, 2019

Going to merge this so it will go out in the next deploy 馃帀

@ericholscher ericholscher merged commit bbae0dc into master Apr 22, 2019
1 check passed
@delete-merged-branch delete-merged-branch bot deleted the domain-search branch Apr 22, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

3 participants