Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Search endpoint - request format #45

Closed
lukasgraf opened this issue Nov 17, 2015 · 8 comments
Closed

Search endpoint - request format #45

lukasgraf opened this issue Nov 17, 2015 · 8 comments

Comments

@lukasgraf
Copy link
Member

I'd like to start off the discussion about implementing a service that handles searching / querying the Plone site, and hope that we can come to a decision on the fundamental points so I can start with an implementation.

I already took a stab at implementing the way search is currently outlined in docs/source/collections/searching.rst. However, I've encountered a couple issues with this approach. So I evaluated several different possible implementations, each with their own set of issues.

Common pros/cons:

GET:

  • Proper HTTP method for an idempotent read-only operation
  • If query string is used for parameters:
    • Query needs to be serialized in some way
    • URL length limitations (probably a non-issue in practice)
    • Typing of parameters is tricky

POST:

  • Wrong HTTP method / not RESTful
  • No way to build an URL that points to search results (relevant for batching)

Note that in all examples that use a query string the parameters would need to be URL escaped.


1) GET with query params in query string (no type hints)

Example:

search?SearchableText=lorem&path.query=/Plone/my-folder&path.depth=2

This approach is using GET requests with parameters in query strings, without using Zope style type hints (:record, :list, etc.). Parameters that contain a dot in their key will be merged into corresponding dictionaries.

Advantages:

  • Simplified style for query string params - at least somewhat readable
  • Easy to construct for API consumers, at least for trivial cases

Disadvantages:

  • Arguments to index-specific query options can't be typed. This is the big one. Imagine a query like ?path.query=/Plone&path.depth=0. The 0 value for the depth option needs to be an integer when passed to catalog.searchResults(), otherwise the ExtendedPathIndex will fail with a TypeError. Given the query string syntax without the Zope type hints, there is no way for the API consumer to declare a type for these arguments, they'll all end up as a string in request.form.

    This means that, even ignoring validation for now, with this approach the server side needs to do some type conversions. This is tricky, because there's no programmatic way to determine the required types for these type of index-specific options. All you get is a listing of query_options.

    So we'd need to maintain a list of information objects that describe the required types of query options, not unlike to what @vangheem has done in collective/elasticsearch/indexes.py. These would need to cover at least the index types present in a standard Plone, and support some kind of extension mechanism (adapter lookup) to deal with other index types.

    I've gotten an approach like this to work, but it's not pretty, and I wouldn't be looking forward to maintaining that.


2) GET with query params in query string (plus Zope type hints)

This would be the exact same style of query string that Plone currently uses for its @@search view.

Example:

search?SearchableText=lorem&path.query:record=/Plone/my-folder&path.depth:record:int=2

Advantages:

  • Typing is covered

Disadvantages

  • Rather ugly, hard to read
  • Those Zope type hints won't mean anything to anybody outside the Plone/Zope community, they're obscure and cumbersome
  • They're hardly documented. The only actual documentation I could find for them is on old.zope.org. How would we document this and explain it to developers that need to write API consumers?
  • Error handling for the Zope type hints is pretty limited. A TypeError with the value (but no key) is usually all you get

3) GET with an URL encoded JSON doc as single query string param

Example:

search?q={"path": {"query": "/plone/folder", "depth": 0}}

(Obviously would need to be URL encoded)

For the query, there's a single query string parameter q that contains an URL encoded JSON document that, when deserialized to a Python dictionary, contains a query that can be handed off to catalog.searchResults().

Advantages

  • Typing because of JSON
  • Relatively easy to produce for consumers
  • Easy to document

Disadvantages

  • Ugliest of them all (basically unreadable when URL encoded)
  • Two nested escaping contexts: JSON and URL encoding
  • Somewhat uncommon I think

4) POST with query as JSON document in body

Example Request:

POST /Plone/search

{
  "path": {
    "query": "\/plone\/folder",
    "depth": 0
  }
}

Advantages:

  • Ability to retain some typing information because ofthe use of JSON (but not all: datetime)
  • Much more readable than query string parameters, and more in line with the style of the rest API
  • Easier to produce (and less error-prone) for consumers (for anything other than trivial queries). Prepare a Python dictionary with a query that fits searchResults() and send it on its way with requests.post(url, json=query).

Disadvantages:

  • If we consider a search query a read-only and idempotent operation, POST is the wrong HTTP method. This might
    • affect cacheability of responses
    • trigger form resubmission prevention in browsers
  • Cannot generate meaningful prev/next batching links for POST requests

5) GET with query as JSON document in body

I briefly considered this as an approach (with a POST alternative), but the fact is, HTTP clients can't deal with GET + body very well, and ZPublisher can't either. So I think this is already out of the question, just mentioning it here for completeness.

Advantages:

  • Ability to retain some typing information because ofthe use of JSON (but not all: datetime)

Disadvantages:

  • Limited client support
  • Not doable in ZPublisher without some trickery
  • In violation to the HTTP spec, according to this post.

@tisto @bloodbare
So, moving forward, do you guys see any other options that I haven't covered? Should we swallow the red pill and continue implementing search as outlined in docs/source/collections/searching.rst, accepting that we need to maintain a list of index descriptions? Or do you see one of the mentioned approaches as a viable alternative?

@tisto
Copy link
Sponsor Member

tisto commented Nov 18, 2015

@lukasgraf thanks for the writeup! I tried to wrap my head around this issue as well and haven't reached a final conclusion yet.

You can ignore searching.rst. This was a very early draft before we even looked at existing solutions. I will try to find some time to do some more research. Though, this seems to be a topic that most API architects seem to avoid talking about. ;)

Just a remark regarding POST vs. GET. It seems that some people argue that you actually create a search with a query and therefore a POST request can also be correct in REST terms. Though, you are of course correct with the other disadvantages (caching, no URL, etc.).

@bloodbare
Copy link
Member

Great job @lukasgraf !

I love your ideas, in some projects we are using JWT (jwt.io) to have the search query on a parameter as they can be complex. If we wanna support complex queries , the parameters get complex and jwt allows to define a JSON format.

I would support only GET verb.

It may be an option to have to endpoints ? one for text search and one for complex search ?

@lukasgraf
Copy link
Member Author

@tisto

Just a remark regarding POST vs. GET. It seems that some people argue that you actually create a search with a query and therefore a POST request can also be correct in REST terms. Though, you are of course correct with the other disadvantages (caching, no URL, etc.).

Yes, I've also seen POST being used. I created a separate issue #46 for discussing a potential concept to maybe address some race conditions with batching - that would require a POST anyway.

@bloodbare

If we wanna support complex queries , the parameters get complex and jwt allows to define a JSON format.

Yes, for any non-trivial query I don't really see anything besides JSON as an option any more. We really don't want to write our own query DSL.

I would support only GET verb.

We (4tw) also have a strong tendency towards that, if we can make it work.

I'd argue that then means that 3) (GET + JSON in query string) emerges as a promising option.

It may be an option to have to endpoints ? one for text search and one for complex search ?

Yes, we also considered that. For now I'd like to focus on an implementation that can gracefully handle complex queries, because that's the hard case, but I could very well see a limited simple_search endpoint with a simplified syntax and reduced functionality.

@jone
Copy link
Member

jone commented Nov 19, 2015

I'd love to have a simple JSON-payload rather than having to build a compliacated GET-querystring.
Because GET + payload is a problem I'm therefore in favor of using payloads with POST-requests (nr. 4.)

@lukasgraf
Copy link
Member Author

For 3) it would be about as complicated as this 😜

query_json = quote_plus(json.dumps(query))
requests.get(url, params={'q': json_query})

Or with curl:

 curl -G -H 'Accept: application/json' $URL --data-urlencode 'q={"SearchableText": "foobar"}'

Or with httpie:

http -j GET $URL q=='{"SearchableText": "foobar"}'

Maybe offering both is the way to go? As long as the serialization format is JSON for both, implementation and documentation overhead should be minimal.

@jone
Copy link
Member

jone commented Nov 20, 2015

That's for me ok as well. I think we should just implement a first version; it can be extended later..

@lukasgraf
Copy link
Member Author

Updated:
The use of JSON only allows to retain some typing information, but not all (datetime isn't JSON serializable).

This significantly diminishes the advantage of the use of JSON as the query serialization format, because we still need to have a mechanism in place to reconstruct typing information based on knowledge about the index that's being queried.

Under these circumstances I really don't see a point in also offering POST anymore. While on the implementation side it's trivial to also support POST, it still needs to be tested and documented. Also, offering two ways to do the same thing goes against the Zen of Python There should be one-- and preferably only one --obvious way to do it., and might confuse our users.

For this reason, I think GET with query string and optional Zope type hints is the way to go. I'll update my search PR (#48) accordingly in the next few days and propose a GET only implementation.

@lukasgraf
Copy link
Member Author

A first version implementing GET with query string has been merged with #48

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants