-
Notifications
You must be signed in to change notification settings - Fork 710
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[bug 1164205] Refactor search suggest api #2525
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,141 @@ | ||
=== | ||
API | ||
=== | ||
|
||
SUMO has a series of API endpoints to access data. | ||
|
||
.. contents:: | ||
|
||
|
||
Search suggest API | ||
================== | ||
|
||
:Endpoint: ``/api/2/search/suggest/`` | ||
:Method: ``GET`` | ||
:Content type: ``application/json`` | ||
:Response: ``application/json`` | ||
|
||
The search suggest API allows you to get back kb documents and aaq | ||
questions that match specified arguments. | ||
|
||
Arguments can be specified in the url querystring or in the HTTP | ||
request body. | ||
|
||
|
||
Required arguments | ||
------------------ | ||
|
||
+-------------------+--------+--------------------------------------------------------+ | ||
|argument |type |notes | | ||
+===================+========+========================================================+ | ||
|q |string |This is the text you're querying for. | | ||
+-------------------+--------+--------------------------------------------------------+ | ||
|
||
|
||
Optional arguments | ||
------------------ | ||
|
||
+-------------------+--------+--------------------------------------------------------+ | ||
|argument |type |notes | | ||
+===================+========+========================================================+ | ||
|locale |string |default: ``settings.WIKI_DEFAULT_LANGUAGE`` | | ||
| | | | | ||
| | |The locale code to restrict results to. | | ||
| | | | | ||
| | |Examples: | | ||
| | | | | ||
| | |* ``en-US`` | | ||
| | |* ``fr`` | | ||
+-------------------+--------+--------------------------------------------------------+ | ||
|product |string |default: None | | ||
| | | | | ||
| | |The product to restrict results to. | | ||
| | | | | ||
| | |Example: | | ||
| | | | | ||
| | |* ``firefox`` | | ||
+-------------------+--------+--------------------------------------------------------+ | ||
|max_documents |integer |default: 10 | | ||
| | | | | ||
| | |The maximum number of documents you want back. | | ||
+-------------------+--------+--------------------------------------------------------+ | ||
|max_questions |integer |default: 10 | | ||
| | | | | ||
| | |The maximum number of questions you want back. | | ||
+-------------------+--------+--------------------------------------------------------+ | ||
|
||
|
||
Responses | ||
--------- | ||
|
||
All response bodies are in JSON. | ||
|
||
HTTP 200: Success | ||
~~~~~~~~~~~~~~~~~ | ||
|
||
With an HTTP 200, you'll get back a set of results in JSON. | ||
|
||
:: | ||
|
||
{ | ||
"documents": [ | ||
{ | ||
"title": ... # title of kb article | ||
"url": ... # url of kb article | ||
"slug": ... # slug of kb article | ||
"summary": ... # paragraph summary of kb article (plaintext) | ||
} | ||
... | ||
], | ||
"questions": [ | ||
{ | ||
"id": ... # integer id of the question | ||
"answers": ... # list of answer ids | ||
"content": ... # content of question (in html) | ||
"created": ... # datetime stamp in iso-8601 format | ||
"creator": ... # JSON object describing the creator | ||
"involved": ... # list of JSON objects describing everyone who | ||
participated in the question | ||
"is_archived": ... # boolean for whether this question is archived | ||
"is_locked": ... # boolean for whether this question is locked | ||
"is_solved": ... # boolean for whether this question is solved | ||
"is_spam": ... # boolean for whether this question is spam | ||
"is_taken": ... # FIXME: | ||
"last_answer": ... # id for the last answer | ||
"num_answers": ... # total number of answers | ||
"locale": ... # the locale for the question | ||
"metadata": ... # metadata collected for the question | ||
"tags": ... # tags for the qeustion | ||
"num_votes_past_week": ... # the number of votes in the last week | ||
"num_votes": ... # the total number of votes | ||
"product": ... # the product | ||
"solution": ... # id of answer marked as a solution if any | ||
"taken_until": ... # FIXME: | ||
"taken_by": ... # FIXME: | ||
"title": ... # title of the question | ||
"topic": ... # FIXME: | ||
"updated_by": ... # FIXME: | ||
"updated": ... # FIXME: | ||
}, | ||
... | ||
] | ||
} | ||
|
||
|
||
Examples | ||
-------- | ||
|
||
Using curl:: | ||
|
||
curl -X GET "http://localhost:8000/api/2/search/suggest/?q=videos" | ||
|
||
curl -X GET "http://localhost:8000/api/2/search/suggest/?q=videos&max_documents=3&max_questions=3" | ||
|
||
curl -X GET "http://localhost:8000/api/2/search/suggest/" \ | ||
-H "Content-Type: application/json" \ | ||
-d ' | ||
{ | ||
"q": "videos", | ||
"max_documents": 3, | ||
"max_questions": 0 | ||
}' |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -43,6 +43,7 @@ Part 3: SUMO | |
.. toctree:: | ||
:maxdepth: 2 | ||
|
||
api | ||
bots | ||
deployments | ||
sop | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,5 +1,6 @@ | ||
from django.conf import settings | ||
|
||
from rest_framework import serializers | ||
from rest_framework.decorators import api_view | ||
from rest_framework.response import Response | ||
|
||
|
@@ -11,49 +12,75 @@ | |
from kitsune.wiki.models import DocumentMappingType | ||
|
||
|
||
def positive_integer(value): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. do you need to check if it's an There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The serializer has it as an IntegerField, so that should do the integer validation. Pretty sure one of the existing tests covers that scenario. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Makes sense. I typed before scrolling |
||
if value < 0: | ||
raise serializers.ValidationError('This field must be positive.') | ||
|
||
|
||
def valid_product(value): | ||
if not value: | ||
return | ||
|
||
if not Product.objects.filter(slug=value).exists(): | ||
raise serializers.ValidationError( | ||
'Could not find product with slug "{0}".'.format(value) | ||
) | ||
|
||
|
||
def valid_locale(value): | ||
if not value: | ||
return | ||
|
||
if value not in settings.SUMO_LANGUAGES: | ||
if value in settings.NON_SUPPORTED_LOCALES: | ||
fallback = settings.NON_SUPPORTED_LOCALES[value] or settings.WIKI_DEFAULT_LANGUAGE | ||
raise serializers.ValidationError( | ||
'"{0}" is not supported, but has fallback locale "{1}".'.format( | ||
value, fallback)) | ||
else: | ||
raise serializers.ValidationError( | ||
'Could not find locale "{0}".'.format(value) | ||
) | ||
|
||
|
||
class SuggestSerializer(serializers.Serializer): | ||
q = serializers.CharField(required=True) | ||
locale = serializers.CharField( | ||
required=False, default=settings.WIKI_DEFAULT_LANGUAGE, | ||
validators=[valid_locale]) | ||
product = serializers.CharField( | ||
required=False, default='', | ||
validators=[valid_product]) | ||
max_questions = serializers.IntegerField( | ||
required=False, default=10, | ||
validators=[positive_integer]) | ||
max_documents = serializers.IntegerField( | ||
required=False, default=10, | ||
validators=[positive_integer]) | ||
|
||
|
||
@api_view(['GET', 'POST']) | ||
def suggest(request): | ||
text = request.body or request.GET.get('q') | ||
locale = request.GET.get('locale', settings.WIKI_DEFAULT_LANGUAGE) | ||
product = request.GET.get('product') | ||
max_questions = request.GET.get('max_questions', 10) | ||
max_documents = request.GET.get('max_documents', 10) | ||
|
||
errors = {} | ||
try: | ||
max_questions = int(max_questions) | ||
except ValueError: | ||
errors['max_questions'] = 'This field must be an integer.' | ||
try: | ||
max_documents = int(max_documents) | ||
except ValueError: | ||
errors['max_documents'] = 'This field must be an integer.' | ||
if text is None: | ||
errors['q'] = 'This field is required.' | ||
if product is not None and not Product.objects.filter(slug=product).exists(): | ||
errors['product'] = 'Could not find product with slug "{0}".'.format(product) | ||
if locale not in settings.SUMO_LANGUAGES: | ||
if locale in settings.NON_SUPPORTED_LOCALES: | ||
fallback = settings.NON_SUPPORTED_LOCALES[locale] or settings.WIKI_DEFAULT_LANGUAGE | ||
errors['locale'] = ( | ||
'Locale "{0}" is not supported, but has fallback locale "{1}".'.format( | ||
locale, fallback | ||
) | ||
) | ||
else: | ||
errors['locale'] = 'Could not find locale "{0}".'.format(locale) | ||
if request.DATA and request.GET: | ||
raise GenericAPIException( | ||
400, 'Put all parameters either in the querystring or the HTTP request body.') | ||
|
||
if errors: | ||
raise GenericAPIException(400, errors) | ||
serializer = SuggestSerializer(data=(request.DATA or request.GET)) | ||
if not serializer.is_valid(): | ||
raise GenericAPIException(400, serializer.errors) | ||
|
||
searcher = ( | ||
es_utils.AnalyzerS() | ||
.es(urls=settings.ES_URLS) | ||
.indexes(es_utils.read_index('default'))) | ||
|
||
data = serializer.object | ||
|
||
return Response({ | ||
'questions': _question_suggestions(searcher, text, locale, product, max_questions), | ||
'documents': _document_suggestions(searcher, text, locale, product, max_documents), | ||
'questions': _question_suggestions( | ||
searcher, data['q'], data['locale'], data['product'], data['max_questions']), | ||
'documents': _document_suggestions( | ||
searcher, data['q'], data['locale'], data['product'], data['max_documents']), | ||
}) | ||
|
||
|
||
|
@@ -66,9 +93,9 @@ def _question_suggestions(searcher, text, locale, product, max_results): | |
question_is_archived=False, | ||
question_is_locked=False, | ||
question_is_solved=True) | ||
if product is not None: | ||
if product: | ||
search_filter &= es_utils.F(product=product) | ||
if locale is not None: | ||
if locale: | ||
search_filter &= es_utils.F(question_locale=locale) | ||
|
||
questions = [] | ||
|
@@ -91,7 +118,7 @@ def _document_suggestions(searcher, text, locale, product, max_results): | |
document_locale=locale, | ||
document_is_archived=False) | ||
|
||
if product is not None: | ||
if product: | ||
search_filter &= es_utils.F(product=product) | ||
|
||
documents = [] | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,3 +1,4 @@ | ||
import json | ||
from nose.tools import eq_ | ||
|
||
from rest_framework.test import APIClient | ||
|
@@ -65,15 +66,15 @@ def test_invalid_product(self): | |
'q': 'search', | ||
}) | ||
eq_(res.status_code, 400) | ||
eq_(res.data['detail'], {'product': 'Could not find product with slug "nonexistant".'}) | ||
eq_(res.data['detail'], {'product': [u'Could not find product with slug "nonexistant".']}) | ||
|
||
def test_invalid_locale(self): | ||
res = self.client.get(reverse('search.suggest'), { | ||
'locale': 'bad-medicine', | ||
'q': 'search', | ||
}) | ||
eq_(res.status_code, 400) | ||
eq_(res.data['detail'], {'locale': 'Could not find locale "bad-medicine".'}) | ||
eq_(res.data['detail'], {'locale': [u'Could not find locale "bad-medicine".']}) | ||
|
||
def test_invalid_fallback_locale_none_case(self): | ||
# Test the locale -> locale case. | ||
|
@@ -90,8 +91,8 @@ def test_invalid_fallback_locale_none_case(self): | |
eq_(res.status_code, 400) | ||
eq_( | ||
res.data['detail'], | ||
{'locale': 'Locale "{0}" is not supported, but has fallback locale "{1}".'.format( | ||
locale, fallback)} | ||
{'locale': [u'"{0}" is not supported, but has fallback locale "{1}".'.format( | ||
locale, fallback)]} | ||
) | ||
|
||
def test_invalid_fallback_locale_non_none_case(self): | ||
|
@@ -109,8 +110,8 @@ def test_invalid_fallback_locale_non_none_case(self): | |
eq_(res.status_code, 400) | ||
eq_( | ||
res.data['detail'], | ||
{'locale': 'Locale "{0}" is not supported, but has fallback locale "{1}".'.format( | ||
locale, settings.WIKI_DEFAULT_LANGUAGE)} | ||
{'locale': [u'"{0}" is not supported, but has fallback locale "{1}".'.format( | ||
locale, settings.WIKI_DEFAULT_LANGUAGE)]} | ||
) | ||
|
||
def test_invalid_numbers(self): | ||
|
@@ -121,14 +122,14 @@ def test_invalid_numbers(self): | |
}) | ||
eq_(res.status_code, 400) | ||
eq_(res.data['detail'], { | ||
'max_questions': 'This field must be an integer.', | ||
'max_documents': 'This field must be an integer.', | ||
'max_questions': [u'Enter a whole number.'], | ||
'max_documents': [u'Enter a whole number.'], | ||
}) | ||
|
||
def test_q_required(self): | ||
res = self.client.get(reverse('search.suggest')) | ||
eq_(res.status_code, 400) | ||
eq_(res.data['detail'], {'q': 'This field is required.'}) | ||
eq_(res.data['detail'], {'q': [u'This field is required.']}) | ||
|
||
def test_it_works(self): | ||
q1 = self._make_question() | ||
|
@@ -139,6 +140,38 @@ def test_it_works(self): | |
eq_([q['id'] for q in req.data['questions']], [q1.id]) | ||
eq_([d['title'] for d in req.data['documents']], [d1.title]) | ||
|
||
def test_filters_in_postdata(self): | ||
q1 = self._make_question() | ||
d1 = self._make_document() | ||
self.refresh() | ||
|
||
data = json.dumps({'q': 'emails'}) | ||
# Note: Have to use .generic() because .get() will convert the | ||
# data into querystring params and then it's clownshoes all | ||
# the way down. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. not 👠 dance shoes? |
||
req = self.client.generic( | ||
'GET', reverse('search.suggest'), data=data, | ||
content_type='application/json') | ||
eq_(req.status_code, 200) | ||
eq_([q['id'] for q in req.data['questions']], [q1.id]) | ||
eq_([d['title'] for d in req.data['documents']], [d1.title]) | ||
|
||
def test_both_querystring_and_body_raises_error(self): | ||
self._make_question() | ||
self._make_document() | ||
self.refresh() | ||
|
||
data = json.dumps({'q': 'emails'}) | ||
# Note: Have to use .generic() because .get() will convert the | ||
# data into querystring params and then it's clownshoes all | ||
# the way down. | ||
req = self.client.generic( | ||
'GET', reverse('search.suggest') + '?max_documents=3', data=data, | ||
content_type='application/json') | ||
eq_(req.status_code, 400) | ||
eq_(req.data, | ||
{u'detail': 'Put all parameters either in the querystring or the HTTP request body.'}) | ||
|
||
def test_questions_max_results_0(self): | ||
self._make_question() | ||
self.refresh() | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's a shitton of fields. But I guess that's what buddy up needed... or whoever was using this API
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a lot of data. I have no idea why questions produces that much data, but documents does not. I think this API endpoint needs some more thought. I was thinking that documenting it might highlight some of the curiosities.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I decided that the API shouldn't try and give only the minimal amount of data for "top level" results that aren't the children of another object. Instead most uses of an object give the full data available to the API. The only time the API sends reduced amounts of data is when an object is referenced from another objects as a foreign key, such as the
creator
field here. It is a lot of fields, yes, but I haven't seen any evidence that it is a problematic amount of fields. We chose to optimize for a few larger API responses instead of making smaller responses and possibly needing to make many requests to get the needed information.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that suggests someone should write up a bug for fixing documents so the returned results are more like questions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 I just filed bug 1164565.