Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Batching #99

Merged
merged 10 commits into from
May 24, 2016
Merged

Implement Batching #99

merged 10 commits into from
May 24, 2016

Conversation

lukasgraf
Copy link
Member

@lukasgraf lukasgraf commented May 19, 2016

Implements batching for

  • Plone Site Root children
  • Dexterity Folder
  • Archetypes Folders
  • Collections
  • Search Results

Also changes implementation for GET on folderish contexts so catalog queries are used instead of objectValues().

TODO:

  • Documentation
  • Use Plone's default batch size as the default (turns out Plone doesn't have a configurable default batch sizes. It seems to be hard coded to 25 in most places, so I changed the default in plone.restapi.batching to that as well)
  • Rename items_count to total_items
  • Refactor tests

Closes #8

@lukasgraf
Copy link
Member Author

@tisto this one is ready for review

@lukasgraf lukasgraf force-pushed the batching branch 4 times, most recently from f9d2bcc to 1541bcd Compare May 20, 2016 10:00
@lukasgraf
Copy link
Member Author

@buchi fixed some flaky tests and rebased onto master - please review

@lukasgraf lukasgraf force-pushed the batching branch 2 times, most recently from 4a7630a to 7585eb7 Compare May 20, 2016 13:49
@lukasgraf
Copy link
Member Author

lukasgraf commented May 20, 2016

@buchi updated to not have HypermediaBatch inherit from Batch - it was only supposed to proxy to a batch.

@lukasgraf lukasgraf mentioned this pull request May 21, 2016
3 tasks
@buchi
Copy link
Member

buchi commented May 21, 2016

Still need some time for the review. Will do asap.

@lukasgraf PR needs rebase

@lukasgraf
Copy link
Member Author

@buchi rebased

content-type: application/json

{
"@id": "http://localhost:55001/plone/folder/@search?sort_on=path",
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lukasgraf if the request is on "/folder/@search?b_size=5&sort_on=path", why is the @id equals "/folder/@search?sort_on=path" (without the "b_size")? Shouldn't this be either the base request or the full request with all params?

Copy link
Member Author

@lukasgraf lukasgraf May 22, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tisto no, this is supposed to be the canonical ID to the base collection without any batching information (loosely based on https://www.w3.org/community/hydra/wiki/Collection_Design#Pagination / https://www.w3.org/community/hydra/wiki/Pagination#PartialCollection). The ID for the current batch is in batching/@id.

I guess the sort_on parameter doesn't really affect the collection's contents, just their ordering. But if you consider other query string parameters for @search (or possibly other endpoints), the query string params can directly affect what resource will be returned, so I'm preserving them and just stripping batching related params for the canonical resource URL.

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. My feeling is that we should remove the sort_on param from the canonical top-level id as well. Do we have any hypermedia controls for changing the sort order? Have you looked into that?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tisto I briefly looked into it, but couldn't find any spec for hypermedia controls to specify sort order that are standarized in any sense. Updated HypermediaBatch.canonical_url to also remove sorting related params.

@buchi
Copy link
Member

buchi commented May 22, 2016

Maybe we should rather call it pagination or paging. Batching is often used in the context of performing bulk operations. Any thoughts?

@buchi
Copy link
Member

buchi commented May 22, 2016

What about including total_items in batching instead of top-level?

"next": "http://.../plone/folder/search?b_size=10&b_start=30"
},
"total_items": 175,
"member": [
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

has been renamed to items

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, well spotted. I will do one more pass for member -> items.

@lukasgraf
Copy link
Member Author

The term "pagination" is definitely much more common outside the Plone world. Since we're doing a rather low-level API that exposes many of the implementation details already I decided to stick with Plone's naming.

I wouldn't be opposed to naming it "pagination", but calling it that and still using the b_size and b_start parameters would be a bit odd. (And we should keep using those IMHO, because they cause some implicit magic to happen under the hood that we would otherwise need to reimplement).

@lukasgraf
Copy link
Member Author

Re: moving total_items inside the batching info dict: I also considered that. I ended up going with a top-level attribute that was in an example in one of the earlier design documents, but it would make more sense to move it inside the batching info dict. 👍 I'm in favor of moving it, any objections @tisto?

@tisto
Copy link
Sponsor Member

tisto commented May 22, 2016

@buch @lukasgraf the rationale behind the top-level total_items is that all items inside "batching" are supposed to be hypermedia controls that the client just exposes 1:1 to the user. If we start adding more items there we will end up with tight coupling between client and server because the client needs to know about the exact structure and names of the API.

@lukasgraf
Copy link
Member Author

@tisto I see - we'll leave it like that then. The client shouldn't be relying on total_items for page calculations anyway, because it can only ever be approximate (resultset may change during pagination).

@@ -19,5 +25,5 @@ content-type: application/json
"title": "Test Folder"
}
],
"items_count": 2
"total_items": 2
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lukasgraf I'm wondering if it would make sense to rename "total_items" to "items_total". This would make "items_total" show up right after "items". Does not make any difference for the consumer app but for a developer browsing the API.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could, I don't have any strong feelings about either one ;) Regarding developer-friendly representations though, we might want to look into using OrderedDicts at some point. If given an ordered dict, json.dumps() will preserve the order of the keys. Structurally the order of keys in a dict still has no meaning of course, but that might make it easier to produce JSON that is layed out in a human-friendly way.

@buchi
Copy link
Member

buchi commented May 23, 2016

Right now the batching information is included even when the items count doesn't exceed the batch size.
IMO opinion it doesn't make sense to provide batching links to the first or last batch in this case. A js client will unnecessarily have to check if it should display those links. I would propose to return an empty batching object instead or to not include the batching at all. @tisto @lukasgraf opinions?

@lukasgraf
Copy link
Member Author

@buchi sounds good to me. Though if we were to still include the batching object, it probably shouldn't be completely empty, it would still need the @id I believe, otherwise JSON-LD processors would drop it alltogether. So maybe just leave it out entirely?

@lukasgraf
Copy link
Member Author

Can we get a consensus on renaming total_items to items_total and omitting the batching dict if len(resultset) <= batch_size? I don't feel strongly about either point, but I'd like to prevent this PR from stalling and move forward towards a first alpha.

@tisto
Copy link
Sponsor Member

tisto commented May 24, 2016

@lukasgraf +1

@lukasgraf
Copy link
Member Author

@tisto @buchi updated:

  • Total item count is now called items_total
  • Omit batching links object entirely if resultset isn't batched

I also rebased onto master and squashed my commits in a way that some back and forth changes were eliminated.

@tisto tisto merged commit 1423a7d into master May 24, 2016
@tisto tisto deleted the batching branch May 24, 2016 11:22
@tisto tisto removed the in progress label May 24, 2016
lukasgraf added a commit that referenced this pull request Jun 6, 2016
The LazyCatalogResultSerializer has been removed in #99 without
a pressing need to do so. Therefore we re-introduce it here and
make it handle the batching logic that was previously left to
the SearchHandler.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants