Search endpoint - request format #45

lukasgraf · 2015-11-17T14:08:34Z

I'd like to start off the discussion about implementing a service that handles searching / querying the Plone site, and hope that we can come to a decision on the fundamental points so I can start with an implementation.

I already took a stab at implementing the way search is currently outlined in docs/source/collections/searching.rst. However, I've encountered a couple issues with this approach. So I evaluated several different possible implementations, each with their own set of issues.

Common pros/cons:

GET:

Proper HTTP method for an idempotent read-only operation
If query string is used for parameters:
- Query needs to be serialized in some way
- URL length limitations (probably a non-issue in practice)
- Typing of parameters is tricky

POST:

Wrong HTTP method / not RESTful
No way to build an URL that points to search results (relevant for batching)

Note that in all examples that use a query string the parameters would need to be URL escaped.

1) `GET` with query params in query string (no type hints)

Example:

search?SearchableText=lorem&path.query=/Plone/my-folder&path.depth=2

This approach is using GET requests with parameters in query strings, without using Zope style type hints (:record, :list, etc.). Parameters that contain a dot in their key will be merged into corresponding dictionaries.

Advantages:

Simplified style for query string params - at least somewhat readable
Easy to construct for API consumers, at least for trivial cases

Disadvantages:

Arguments to index-specific query options can't be typed. This is the big one. Imagine a query like ?path.query=/Plone&path.depth=0. The 0 value for the depth option needs to be an integer when passed to catalog.searchResults(), otherwise the ExtendedPathIndex will fail with a TypeError. Given the query string syntax without the Zope type hints, there is no way for the API consumer to declare a type for these arguments, they'll all end up as a string in request.form.

This means that, even ignoring validation for now, with this approach the server side needs to do some type conversions. This is tricky, because there's no programmatic way to determine the required types for these type of index-specific options. All you get is a listing of query_options.

So we'd need to maintain a list of information objects that describe the required types of query options, not unlike to what @vangheem has done in collective/elasticsearch/indexes.py. These would need to cover at least the index types present in a standard Plone, and support some kind of extension mechanism (adapter lookup) to deal with other index types.

I've gotten an approach like this to work, but it's not pretty, and I wouldn't be looking forward to maintaining that.

2) `GET` with query params in query string (plus Zope type hints)

This would be the exact same style of query string that Plone currently uses for its @@search view.

Example:

search?SearchableText=lorem&path.query:record=/Plone/my-folder&path.depth:record:int=2

Advantages:

Typing is covered

Disadvantages

Rather ugly, hard to read
Those Zope type hints won't mean anything to anybody outside the Plone/Zope community, they're obscure and cumbersome
They're hardly documented. The only actual documentation I could find for them is on old.zope.org. How would we document this and explain it to developers that need to write API consumers?
Error handling for the Zope type hints is pretty limited. A TypeError with the value (but no key) is usually all you get

3) `GET` with an URL encoded JSON doc as single query string param

Example:

search?q={"path": {"query": "/plone/folder", "depth": 0}}

(Obviously would need to be URL encoded)

For the query, there's a single query string parameter q that contains an URL encoded JSON document that, when deserialized to a Python dictionary, contains a query that can be handed off to catalog.searchResults().

Advantages

Typing because of JSON
Relatively easy to produce for consumers
Easy to document

Disadvantages

Ugliest of them all (basically unreadable when URL encoded)
Two nested escaping contexts: JSON and URL encoding
Somewhat uncommon I think

4) `POST` with query as JSON document in body

Example Request:

POST /Plone/search

{
  "path": {
    "query": "\/plone\/folder",
    "depth": 0
  }
}

Advantages:

Ability to retain some typing information because ofthe use of JSON (but not all: datetime)
Much more readable than query string parameters, and more in line with the style of the rest API
Easier to produce (and less error-prone) for consumers (for anything other than trivial queries). Prepare a Python dictionary with a query that fits searchResults() and send it on its way with requests.post(url, json=query).

Disadvantages:

If we consider a search query a read-only and idempotent operation, POST is the wrong HTTP method. This might
- affect cacheability of responses
- trigger form resubmission prevention in browsers
Cannot generate meaningful prev/next batching links for POST requests

5) `GET` with query as JSON document in body

I briefly considered this as an approach (with a POST alternative), but the fact is, HTTP clients can't deal with GET + body very well, and ZPublisher can't either. So I think this is already out of the question, just mentioning it here for completeness.

Advantages:

Ability to retain some typing information because ofthe use of JSON (but not all: datetime)

Disadvantages:

Limited client support
Not doable in ZPublisher without some trickery
In violation to the HTTP spec, according to this post.

@tisto @bloodbare
So, moving forward, do you guys see any other options that I haven't covered? Should we swallow the red pill and continue implementing search as outlined in docs/source/collections/searching.rst, accepting that we need to maintain a list of index descriptions? Or do you see one of the mentioned approaches as a viable alternative?

The text was updated successfully, but these errors were encountered:

tisto · 2015-11-18T07:42:03Z

@lukasgraf thanks for the writeup! I tried to wrap my head around this issue as well and haven't reached a final conclusion yet.

You can ignore searching.rst. This was a very early draft before we even looked at existing solutions. I will try to find some time to do some more research. Though, this seems to be a topic that most API architects seem to avoid talking about. ;)

Just a remark regarding POST vs. GET. It seems that some people argue that you actually create a search with a query and therefore a POST request can also be correct in REST terms. Though, you are of course correct with the other disadvantages (caching, no URL, etc.).

bloodbare · 2015-11-18T08:04:12Z

Great job @lukasgraf !

I love your ideas, in some projects we are using JWT (jwt.io) to have the search query on a parameter as they can be complex. If we wanna support complex queries , the parameters get complex and jwt allows to define a JSON format.

I would support only GET verb.

It may be an option to have to endpoints ? one for text search and one for complex search ?

lukasgraf · 2015-11-18T14:13:37Z

@tisto

Just a remark regarding POST vs. GET. It seems that some people argue that you actually create a search with a query and therefore a POST request can also be correct in REST terms. Though, you are of course correct with the other disadvantages (caching, no URL, etc.).

Yes, I've also seen POST being used. I created a separate issue #46 for discussing a potential concept to maybe address some race conditions with batching - that would require a POST anyway.

@bloodbare

If we wanna support complex queries , the parameters get complex and jwt allows to define a JSON format.

Yes, for any non-trivial query I don't really see anything besides JSON as an option any more. We really don't want to write our own query DSL.

I would support only GET verb.

We (4tw) also have a strong tendency towards that, if we can make it work.

I'd argue that then means that 3) (GET + JSON in query string) emerges as a promising option.

It may be an option to have to endpoints ? one for text search and one for complex search ?

Yes, we also considered that. For now I'd like to focus on an implementation that can gracefully handle complex queries, because that's the hard case, but I could very well see a limited simple_search endpoint with a simplified syntax and reduced functionality.

jone · 2015-11-19T20:28:47Z

I'd love to have a simple JSON-payload rather than having to build a compliacated GET-querystring.
Because GET + payload is a problem I'm therefore in favor of using payloads with POST-requests (nr. 4.)

lukasgraf · 2015-11-19T20:35:59Z

For 3) it would be about as complicated as this 😜

query_json = quote_plus(json.dumps(query))
requests.get(url, params={'q': json_query})

Or with curl:

 curl -G -H 'Accept: application/json' $URL --data-urlencode 'q={"SearchableText": "foobar"}'

Or with httpie:

http -j GET $URL q=='{"SearchableText": "foobar"}'

Maybe offering both is the way to go? As long as the serialization format is JSON for both, implementation and documentation overhead should be minimal.

jone · 2015-11-20T10:30:55Z

That's for me ok as well. I think we should just implement a first version; it can be extended later..

lukasgraf · 2016-04-05T12:48:44Z

Updated:
The use of JSON only allows to retain some typing information, but not all (datetime isn't JSON serializable).

This significantly diminishes the advantage of the use of JSON as the query serialization format, because we still need to have a mechanism in place to reconstruct typing information based on knowledge about the index that's being queried.

Under these circumstances I really don't see a point in also offering POST anymore. While on the implementation side it's trivial to also support POST, it still needs to be tested and documented. Also, offering two ways to do the same thing goes against the Zen of Python There should be one-- and preferably only one --obvious way to do it., and might confuse our users.

For this reason, I think GET with query string and optional Zope type hints is the way to go. I'll update my search PR (#48) accordingly in the next few days and propose a GET only implementation.

lukasgraf · 2016-04-19T16:40:19Z

A first version implementing GET with query string has been merged with #48

lukasgraf added the 32 needs: review label Nov 17, 2015

lukasgraf mentioned this issue Nov 18, 2015

Race conditions with batching #46

Open

lukasgraf closed this as completed Apr 19, 2016

lukasgraf removed the 32 needs: review label Apr 19, 2016

tisto mentioned this issue Jun 2, 2022

Include users in groups and vice versa. List users: Allow filtering by groups. #1325

Merged

tisto mentioned this issue Aug 25, 2023

Undeprecate comma separated expansion parameters #1696

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Search endpoint - request format #45

Search endpoint - request format #45

lukasgraf commented Nov 17, 2015

tisto commented Nov 18, 2015

bloodbare commented Nov 18, 2015

lukasgraf commented Nov 18, 2015

jone commented Nov 19, 2015

lukasgraf commented Nov 19, 2015

jone commented Nov 20, 2015

lukasgraf commented Apr 5, 2016

lukasgraf commented Apr 19, 2016

Search endpoint - request format #45

Search endpoint - request format #45

Comments

lukasgraf commented Nov 17, 2015

1) GET with query params in query string (no type hints)

2) GET with query params in query string (plus Zope type hints)

3) GET with an URL encoded JSON doc as single query string param

4) POST with query as JSON document in body

5) GET with query as JSON document in body

tisto commented Nov 18, 2015

bloodbare commented Nov 18, 2015

lukasgraf commented Nov 18, 2015

jone commented Nov 19, 2015

lukasgraf commented Nov 19, 2015

jone commented Nov 20, 2015

lukasgraf commented Apr 5, 2016

lukasgraf commented Apr 19, 2016

1) `GET` with query params in query string (no type hints)

2) `GET` with query params in query string (plus Zope type hints)

3) `GET` with an URL encoded JSON doc as single query string param

4) `POST` with query as JSON document in body

5) `GET` with query as JSON document in body