Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[bug 1164205] Refactor search suggest api #2525

Merged
merged 2 commits into from
May 13, 2015
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
141 changes: 141 additions & 0 deletions docs/api.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,141 @@
===
API
===

SUMO has a series of API endpoints to access data.

.. contents::


Search suggest API
==================

:Endpoint: ``/api/2/search/suggest/``
:Method: ``GET``
:Content type: ``application/json``
:Response: ``application/json``

The search suggest API allows you to get back kb documents and aaq
questions that match specified arguments.

Arguments can be specified in the url querystring or in the HTTP
request body.


Required arguments
------------------

+-------------------+--------+--------------------------------------------------------+
|argument |type |notes |
+===================+========+========================================================+
|q |string |This is the text you're querying for. |
+-------------------+--------+--------------------------------------------------------+


Optional arguments
------------------

+-------------------+--------+--------------------------------------------------------+
|argument |type |notes |
+===================+========+========================================================+
|locale |string |default: ``settings.WIKI_DEFAULT_LANGUAGE`` |
| | | |
| | |The locale code to restrict results to. |
| | | |
| | |Examples: |
| | | |
| | |* ``en-US`` |
| | |* ``fr`` |
+-------------------+--------+--------------------------------------------------------+
|product |string |default: None |
| | | |
| | |The product to restrict results to. |
| | | |
| | |Example: |
| | | |
| | |* ``firefox`` |
+-------------------+--------+--------------------------------------------------------+
|max_documents |integer |default: 10 |
| | | |
| | |The maximum number of documents you want back. |
+-------------------+--------+--------------------------------------------------------+
|max_questions |integer |default: 10 |
| | | |
| | |The maximum number of questions you want back. |
+-------------------+--------+--------------------------------------------------------+


Responses
---------

All response bodies are in JSON.

HTTP 200: Success
~~~~~~~~~~~~~~~~~

With an HTTP 200, you'll get back a set of results in JSON.

::

{
"documents": [
{
"title": ... # title of kb article
"url": ... # url of kb article
"slug": ... # slug of kb article
"summary": ... # paragraph summary of kb article (plaintext)
}
...
],
"questions": [
{
"id": ... # integer id of the question
"answers": ... # list of answer ids
"content": ... # content of question (in html)
"created": ... # datetime stamp in iso-8601 format
"creator": ... # JSON object describing the creator
"involved": ... # list of JSON objects describing everyone who
participated in the question
"is_archived": ... # boolean for whether this question is archived
"is_locked": ... # boolean for whether this question is locked
"is_solved": ... # boolean for whether this question is solved
"is_spam": ... # boolean for whether this question is spam
"is_taken": ... # FIXME:
"last_answer": ... # id for the last answer
"num_answers": ... # total number of answers
"locale": ... # the locale for the question
"metadata": ... # metadata collected for the question
"tags": ... # tags for the qeustion
"num_votes_past_week": ... # the number of votes in the last week
"num_votes": ... # the total number of votes
"product": ... # the product
"solution": ... # id of answer marked as a solution if any
"taken_until": ... # FIXME:
"taken_by": ... # FIXME:
"title": ... # title of the question
"topic": ... # FIXME:
"updated_by": ... # FIXME:
"updated": ... # FIXME:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's a shitton of fields. But I guess that's what buddy up needed... or whoever was using this API

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a lot of data. I have no idea why questions produces that much data, but documents does not. I think this API endpoint needs some more thought. I was thinking that documenting it might highlight some of the curiosities.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided that the API shouldn't try and give only the minimal amount of data for "top level" results that aren't the children of another object. Instead most uses of an object give the full data available to the API. The only time the API sends reduced amounts of data is when an object is referenced from another objects as a foreign key, such as the creator field here. It is a lot of fields, yes, but I haven't seen any evidence that it is a problematic amount of fields. We chose to optimize for a few larger API responses instead of making smaller responses and possibly needing to make many requests to get the needed information.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that suggests someone should write up a bug for fixing documents so the returned results are more like questions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 I just filed bug 1164565.

},
...
]
}


Examples
--------

Using curl::

curl -X GET "http://localhost:8000/api/2/search/suggest/?q=videos"

curl -X GET "http://localhost:8000/api/2/search/suggest/?q=videos&max_documents=3&max_questions=3"

curl -X GET "http://localhost:8000/api/2/search/suggest/" \
-H "Content-Type: application/json" \
-d '
{
"q": "videos",
"max_documents": 3,
"max_questions": 0
}'
1 change: 1 addition & 0 deletions docs/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -43,6 +43,7 @@ Part 3: SUMO
.. toctree::
:maxdepth: 2

api
bots
deployments
sop
Expand Down
99 changes: 63 additions & 36 deletions kitsune/search/api.py
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
from django.conf import settings

from rest_framework import serializers
from rest_framework.decorators import api_view
from rest_framework.response import Response

Expand All @@ -11,49 +12,75 @@
from kitsune.wiki.models import DocumentMappingType


def positive_integer(value):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you need to check if it's an int in the first place?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The serializer has it as an IntegerField, so that should do the integer validation. Pretty sure one of the existing tests covers that scenario.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. I typed before scrolling

if value < 0:
raise serializers.ValidationError('This field must be positive.')


def valid_product(value):
if not value:
return

if not Product.objects.filter(slug=value).exists():
raise serializers.ValidationError(
'Could not find product with slug "{0}".'.format(value)
)


def valid_locale(value):
if not value:
return

if value not in settings.SUMO_LANGUAGES:
if value in settings.NON_SUPPORTED_LOCALES:
fallback = settings.NON_SUPPORTED_LOCALES[value] or settings.WIKI_DEFAULT_LANGUAGE
raise serializers.ValidationError(
'"{0}" is not supported, but has fallback locale "{1}".'.format(
value, fallback))
else:
raise serializers.ValidationError(
'Could not find locale "{0}".'.format(value)
)


class SuggestSerializer(serializers.Serializer):
q = serializers.CharField(required=True)
locale = serializers.CharField(
required=False, default=settings.WIKI_DEFAULT_LANGUAGE,
validators=[valid_locale])
product = serializers.CharField(
required=False, default='',
validators=[valid_product])
max_questions = serializers.IntegerField(
required=False, default=10,
validators=[positive_integer])
max_documents = serializers.IntegerField(
required=False, default=10,
validators=[positive_integer])


@api_view(['GET', 'POST'])
def suggest(request):
text = request.body or request.GET.get('q')
locale = request.GET.get('locale', settings.WIKI_DEFAULT_LANGUAGE)
product = request.GET.get('product')
max_questions = request.GET.get('max_questions', 10)
max_documents = request.GET.get('max_documents', 10)

errors = {}
try:
max_questions = int(max_questions)
except ValueError:
errors['max_questions'] = 'This field must be an integer.'
try:
max_documents = int(max_documents)
except ValueError:
errors['max_documents'] = 'This field must be an integer.'
if text is None:
errors['q'] = 'This field is required.'
if product is not None and not Product.objects.filter(slug=product).exists():
errors['product'] = 'Could not find product with slug "{0}".'.format(product)
if locale not in settings.SUMO_LANGUAGES:
if locale in settings.NON_SUPPORTED_LOCALES:
fallback = settings.NON_SUPPORTED_LOCALES[locale] or settings.WIKI_DEFAULT_LANGUAGE
errors['locale'] = (
'Locale "{0}" is not supported, but has fallback locale "{1}".'.format(
locale, fallback
)
)
else:
errors['locale'] = 'Could not find locale "{0}".'.format(locale)
if request.DATA and request.GET:
raise GenericAPIException(
400, 'Put all parameters either in the querystring or the HTTP request body.')

if errors:
raise GenericAPIException(400, errors)
serializer = SuggestSerializer(data=(request.DATA or request.GET))
if not serializer.is_valid():
raise GenericAPIException(400, serializer.errors)

searcher = (
es_utils.AnalyzerS()
.es(urls=settings.ES_URLS)
.indexes(es_utils.read_index('default')))

data = serializer.object

return Response({
'questions': _question_suggestions(searcher, text, locale, product, max_questions),
'documents': _document_suggestions(searcher, text, locale, product, max_documents),
'questions': _question_suggestions(
searcher, data['q'], data['locale'], data['product'], data['max_questions']),
'documents': _document_suggestions(
searcher, data['q'], data['locale'], data['product'], data['max_documents']),
})


Expand All @@ -66,9 +93,9 @@ def _question_suggestions(searcher, text, locale, product, max_results):
question_is_archived=False,
question_is_locked=False,
question_is_solved=True)
if product is not None:
if product:
search_filter &= es_utils.F(product=product)
if locale is not None:
if locale:
search_filter &= es_utils.F(question_locale=locale)

questions = []
Expand All @@ -91,7 +118,7 @@ def _document_suggestions(searcher, text, locale, product, max_results):
document_locale=locale,
document_is_archived=False)

if product is not None:
if product:
search_filter &= es_utils.F(product=product)

documents = []
Expand Down
51 changes: 42 additions & 9 deletions kitsune/search/tests/test_api.py
Original file line number Diff line number Diff line change
@@ -1,3 +1,4 @@
import json
from nose.tools import eq_

from rest_framework.test import APIClient
Expand Down Expand Up @@ -65,15 +66,15 @@ def test_invalid_product(self):
'q': 'search',
})
eq_(res.status_code, 400)
eq_(res.data['detail'], {'product': 'Could not find product with slug "nonexistant".'})
eq_(res.data['detail'], {'product': [u'Could not find product with slug "nonexistant".']})

def test_invalid_locale(self):
res = self.client.get(reverse('search.suggest'), {
'locale': 'bad-medicine',
'q': 'search',
})
eq_(res.status_code, 400)
eq_(res.data['detail'], {'locale': 'Could not find locale "bad-medicine".'})
eq_(res.data['detail'], {'locale': [u'Could not find locale "bad-medicine".']})

def test_invalid_fallback_locale_none_case(self):
# Test the locale -> locale case.
Expand All @@ -90,8 +91,8 @@ def test_invalid_fallback_locale_none_case(self):
eq_(res.status_code, 400)
eq_(
res.data['detail'],
{'locale': 'Locale "{0}" is not supported, but has fallback locale "{1}".'.format(
locale, fallback)}
{'locale': [u'"{0}" is not supported, but has fallback locale "{1}".'.format(
locale, fallback)]}
)

def test_invalid_fallback_locale_non_none_case(self):
Expand All @@ -109,8 +110,8 @@ def test_invalid_fallback_locale_non_none_case(self):
eq_(res.status_code, 400)
eq_(
res.data['detail'],
{'locale': 'Locale "{0}" is not supported, but has fallback locale "{1}".'.format(
locale, settings.WIKI_DEFAULT_LANGUAGE)}
{'locale': [u'"{0}" is not supported, but has fallback locale "{1}".'.format(
locale, settings.WIKI_DEFAULT_LANGUAGE)]}
)

def test_invalid_numbers(self):
Expand All @@ -121,14 +122,14 @@ def test_invalid_numbers(self):
})
eq_(res.status_code, 400)
eq_(res.data['detail'], {
'max_questions': 'This field must be an integer.',
'max_documents': 'This field must be an integer.',
'max_questions': [u'Enter a whole number.'],
'max_documents': [u'Enter a whole number.'],
})

def test_q_required(self):
res = self.client.get(reverse('search.suggest'))
eq_(res.status_code, 400)
eq_(res.data['detail'], {'q': 'This field is required.'})
eq_(res.data['detail'], {'q': [u'This field is required.']})

def test_it_works(self):
q1 = self._make_question()
Expand All @@ -139,6 +140,38 @@ def test_it_works(self):
eq_([q['id'] for q in req.data['questions']], [q1.id])
eq_([d['title'] for d in req.data['documents']], [d1.title])

def test_filters_in_postdata(self):
q1 = self._make_question()
d1 = self._make_document()
self.refresh()

data = json.dumps({'q': 'emails'})
# Note: Have to use .generic() because .get() will convert the
# data into querystring params and then it's clownshoes all
# the way down.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not 👠 dance shoes?

req = self.client.generic(
'GET', reverse('search.suggest'), data=data,
content_type='application/json')
eq_(req.status_code, 200)
eq_([q['id'] for q in req.data['questions']], [q1.id])
eq_([d['title'] for d in req.data['documents']], [d1.title])

def test_both_querystring_and_body_raises_error(self):
self._make_question()
self._make_document()
self.refresh()

data = json.dumps({'q': 'emails'})
# Note: Have to use .generic() because .get() will convert the
# data into querystring params and then it's clownshoes all
# the way down.
req = self.client.generic(
'GET', reverse('search.suggest') + '?max_documents=3', data=data,
content_type='application/json')
eq_(req.status_code, 400)
eq_(req.data,
{u'detail': 'Put all parameters either in the querystring or the HTTP request body.'})

def test_questions_max_results_0(self):
self._make_question()
self.refresh()
Expand Down