Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fix #4265] Port Document search API for Elasticsearch 6.x #4309

Merged
merged 9 commits into from Jul 6, 2018

Conversation

@safwanrahman
Copy link
Member

@safwanrahman safwanrahman commented Jun 27, 2018

This is the first etaration of making a API for Documentation Search. Its not finalized yet and the work is not finished yet.
Need more work on highlighting as well as test.
Needed to do some refactor to keep it DRY.
@ericholscher Can you look for a slight review?

@safwanrahman safwanrahman requested a review from ericholscher Jun 27, 2018
@safwanrahman safwanrahman added this to Backlog in Search update via automation Jun 27, 2018
@safwanrahman safwanrahman self-assigned this Jun 27, 2018
@safwanrahman safwanrahman moved this from Backlog to In progress in Search update Jun 27, 2018
@safwanrahman
Copy link
Member Author

@safwanrahman safwanrahman commented Jun 27, 2018

Currently the API looks something like this!
image

Copy link
Member

@ericholscher ericholscher left a comment

I think most of my questions are around whether we're using proper querysets, or we have things that look like querysets, but are actually search results that we're filtering. Could use a bit more explanation, and code comments if that is what we're doing.

@@ -60,14 +61,19 @@ class Meta(object):
content = fields.TextField(attr='processed_json.content')
path = fields.TextField(attr='processed_json.path')

# Fields to perform search with weight
search_fields = ['title^10', 'headers^5', 'content']
Copy link
Member

@ericholscher ericholscher Jun 28, 2018

Is this used by the DocType class directly, or are we storing it here to add later? If the prior, we should probably set it in an __init__ method or somewhere else, so it doesn't get confused with the other data here.

Copy link
Member Author

@safwanrahman safwanrahman Jun 28, 2018

Its not actually used by DocType class, but we are using it in class method. I dont know if keeping it in __init__ method will make it work in class method!

Copy link
Member

@ericholscher ericholscher Jun 28, 2018

Hrm, good point. I guess it's fine for now then.


def get_queryset(self):
query = self.request.query_params.get('query')
queryset = PageDocument.search(query=query)
Copy link
Member

@ericholscher ericholscher Jun 28, 2018

Does search() return a queryset? Seems like it would return a search object or similar.

Copy link
Member Author

@safwanrahman safwanrahman Jun 28, 2018

Yeah. it returns a Search object.

@@ -67,6 +67,7 @@
url(r'^api/', include(v1_api.urls)),
url(r'^api/v2/', include('readthedocs.restapi.urls')),
url(r'^api-auth/', include('rest_framework.urls', namespace='rest_framework')),
url(r'^api/search/', PageSearchAPIView.as_view()),
Copy link
Member

@ericholscher ericholscher Jun 28, 2018

This should likely continue to live in v2 for now. Perhaps as api/v2/indocsearch or something, so it can live beside the old docsearch during rollout?

Copy link
Member Author

@safwanrahman safwanrahman Jun 28, 2018

Yeah! I added it for testing purpose actually!

project_slug = request.query_params.get('project')
project_slug_list = get_project_slug_list_or_404(project_slug=project_slug,
user=request.user)
return queryset.filter('terms', project=project_slug_list)
Copy link
Member

@ericholscher ericholscher Jun 28, 2018

Feels like we're conflating querysets and Search queries here again. It feels a bit odd.

Copy link
Member Author

@safwanrahman safwanrahman Jun 28, 2018

Yeah! I understand. Its not queryset, but a Search object but it is similar to queryset.

@safwanrahman
Copy link
Member Author

@safwanrahman safwanrahman commented Jun 28, 2018

I think most of my questions are around whether we're using proper querysets, or we have things that look like querysets, but are actually search results that we're filtering.

@ericholscher We actually have a Search object that act closely similar to queryset. So we can use the Search object as queryset to easily implement the API!

@safwanrahman
Copy link
Member Author

@safwanrahman safwanrahman commented Jun 30, 2018

@ericholscher Its ready for a final review.
r?

Copy link
Member

@ericholscher ericholscher left a comment

Looks good to me. We still need to wire this up to the front-end JS right?

@@ -48,7 +48,7 @@
url(r'index_search/',
search_views.index_search,
name='index_search'),
url(r'search/$', views.search_views.search, name='api_search'),
# url(r'search/$', views.search_views.search, name='api_search'),
Copy link
Member

@ericholscher ericholscher Jul 2, 2018

Let's not remove this yet, until we've deployed a full version to replace it.

Copy link
Member Author

@safwanrahman safwanrahman Jul 2, 2018

I dont know, but with this, the reverse function act differently.

Copy link
Member Author

@safwanrahman safwanrahman Jul 2, 2018

Without having comment, its returns something.

resolve('/api/v2/docsearch/')
>>> ResolverMatch(func=readthedocs.restapi.views.search_views.search, args=(), kwargs={}, url_name=api_search, app_names=[], namespaces=[])

But it should return

>>> ResolverMatch(func=readthedocs.search.api.PageSearchAPIView, args=(), kwargs={}, url_name=doc_search, app_names=[], namespaces=[])

I have no idea why this is happening. maybe a bug in django?

Copy link
Member

@ericholscher ericholscher Jul 2, 2018

Believe it's a bad regex. It should be r'^search/$' -- now it's just catching anything that ends in /api/v2/<anythinghere>search/

Copy link
Member Author

@safwanrahman safwanrahman Jul 2, 2018

Thanks @ericholscher . I did not suspect the regex can be wrong! It worked!

# Run bool query with should, so it returns result where either of the query matches
bool_query = Bool(should=all_queries)
search = search.query(bool_query)
search = search.query(query)
Copy link
Member

@ericholscher ericholscher Jul 2, 2018

Feels like we shouldn't be overwriting the object here. Will we return something invalid if there is no query because we have the same object name?

Copy link
Member Author

@safwanrahman safwanrahman Jul 2, 2018

I have actually a mixed feeling about this. Do you have any idea about how to do this withtout overriding the search method?

Copy link
Member

@ericholscher ericholscher Jul 2, 2018

I mean mostly just the name of the object. search = search.query(query) -- is that similar to a queryset = queryset.filter()? It just feels like a weird bit of syntax to rewrite the name of the object over again.

Copy link
Member Author

@safwanrahman safwanrahman Jul 3, 2018

Yeah. search = search.query(query) -- is that similar to a queryset = queryset.filter(). If we do not overwrite the same object, then another variable need to be assigned and check if that is not None

@@ -66,6 +66,8 @@
api_urls = [
url(r'^api/', include(v1_api.urls)),
url(r'^api/v2/', include('readthedocs.restapi.urls')),
# Keep the `doc_search` at root level, so the test does not fail for other API
url(r'^api/v2/docsearch/$', PageSearchAPIView.as_view(), name='doc_search'),
Copy link
Member

@ericholscher ericholscher Jul 2, 2018

Not sure I follow. What tests were failing?

Copy link
Member Author

@safwanrahman safwanrahman Jul 2, 2018

@safwanrahman
Copy link
Member Author

@safwanrahman safwanrahman commented Jul 3, 2018

@ericholscher Anything left for fixing?

@safwanrahman
Copy link
Member Author

@safwanrahman safwanrahman commented Jul 4, 2018

@ericholscher while I was working in the frontend, I realized that the link of document is needed. I have implemented that. Can you please make a review?

Copy link
Member

@ericholscher ericholscher left a comment

Looks good with a few small nits, and fixing the tests on py3. Glad we caught the link addition from the client side, I had a feeling we'd need more to render the HTML properly :)

context['projects_info'] = self.get_projects_info()
return context

def _get_all_projects(self):
Copy link
Member

@ericholscher ericholscher Jul 5, 2018

Isn't this the same logic as in get_project_slug_list_or_404? It should probably use the util function for this I think? Perhaps it could be get_project_list_or_404 and we can pass slug_only param to it or something?


def get_serializer_context(self):
context = super(PageSearchAPIView, self).get_serializer_context()
context['projects_info'] = self.get_projects_info()
Copy link
Member

@ericholscher ericholscher Jul 5, 2018

projects_info I think is a bad name. It seems like we're just getting the docs_url from it for now, so we should probably be more clear about that. Perhaps context['project_urls'] or something?

@safwanrahman
Copy link
Member Author

@safwanrahman safwanrahman commented Jul 5, 2018

@ericholscher Fixed python3 compatibility and changes as you reviewed. r?

Copy link
Member

@ericholscher ericholscher left a comment

Looks great. 👍

@ericholscher ericholscher merged commit d6638b9 into readthedocs:search_upgrade Jul 6, 2018
1 check passed
Search update automation moved this from In progress to Done Jul 6, 2018
@safwanrahman safwanrahman deleted the search_api branch Jul 7, 2018
safwanrahman added a commit to safwanrahman/readthedocs.org that referenced this issue Jul 16, 2018
[fix readthedocs#4265] Port Document search API for Elasticsearch 6.x
safwanrahman added a commit to safwanrahman/readthedocs.org that referenced this issue Jul 16, 2018
[fix readthedocs#4265] Port Document search API for Elasticsearch 6.x
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Search update
  
Done
Linked issues

Successfully merging this pull request may close these issues.

None yet

2 participants