Paginated atom feed #147

Open
pschirmacher opened this issue Dec 3, 2014 · 21 comments

Comments

@pschirmacher
Contributor

This would be nice to have: http://tools.ietf.org/html/rfc5005#section-3. It would allow feed consumers to access old entries (e.g. after downtime or similar).

@aheusingfeld
Contributor

Hm, couldn't that already be done with the prev link we implemented with #116?

@mvitz
Contributor

mvitz commented Dec 4, 2014

No. #116 added links for individual updates. This will add links for the complete feed. Since I think we all agree to support this, I will take care of this issue.

@pschirmacher
Contributor Author

One thing that might have to be considered is how to handle new feed entries while a client is browsing through the paginated feed. Example:

  • There are 30 statuses in the database (IDs from large/latest to small/oldest): [30 .. 1]
  • A client accesses the first page of the feed and is presented with the latest 25 entries: [30 .. 6]
  • A new status is added to the database: [31 .. 1]
  • The client follows the 'previous' link. Which statuses is he going to get? [5 .. 1] or [6 .. 1]?

Another example:

  • Again 30 statuses in database: [30 .. 1]
  • A client accesses the first page of the feed: [30 .. 6]
  • 100 new statuses are added to the database: [130 .. 1]
  • Will the 'previous' link now point to [105 .. 81] or to [5 .. 1]?
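
Both examples can be reproduced with a minimal Python sketch, assuming the feed is paged by a plain offset. `PAGE_SIZE` and `page_by_offset` are made-up names for illustration, not anything from the project:

```python
PAGE_SIZE = 25  # assumed page size, taken from the examples above


def page_by_offset(ids_newest_first, page):
    """Return one page using a plain offset (page 0 is the newest page)."""
    start = page * PAGE_SIZE
    return ids_newest_first[start:start + PAGE_SIZE]


# 30 statuses, newest first: [30 .. 1]
statuses = list(range(30, 0, -1))
first_page = page_by_offset(statuses, 0)  # [30 .. 6]

# First example: one new status arrives before the client asks for page 1.
statuses.insert(0, 31)
second_page = page_by_offset(statuses, 1)  # [6 .. 1], so entry 6 is seen twice

# Second example: 100 new statuses arrive instead.
busy = list(range(130, 0, -1))
second_page_busy = page_by_offset(busy, 1)  # [105 .. 81], nothing the client expected
```

With an offset, the 'previous' link means "skip 25 entries from the current top", so its content moves whenever the top moves.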

@mvitz
Contributor

mvitz commented Dec 4, 2014

I had this discussion previously. Is there any other solution than passing the first (or last item) processed in the current chunk? With this information the server could solve some of your problematic use cases.

@pschirmacher
Contributor Author

passing the first (or last item) processed in the current chunk

I'm not sure I understand what you mean by that. Do you mean adding more information to the 'previous' link so that the server can figure out "from where to start"?
Example:

  • 30 statuses in database
  • GET /feed

feed with entries 30 to 6 and 'previous' link /feed?from=5?

@tloist

tloist commented Dec 4, 2014

@pschirmacher That's a ConcurrentModificationException, you've described there.

A quick search through the HTTP status codes lets me assume that this would be a 409 Conflict.
This would require that the server knows that the resource was changed in the meantime, and it makes browsing through paginated feeds more or less a pain, so I would hope that the resource is read far more often than edited... (which may be wrong).

@mvitz
Contributor

mvitz commented Dec 4, 2014

@pschirmacher: Yes that was the idea I had.

In general, the problem described here is hard to solve without taking a snapshot of the data for a client's complete session. And this is something nobody wants to do (at least in this case).

@aheusingfeld
Contributor

That's a ConcurrentModificationException

@tloist It's not! That would only happen if the server held the state of the list/cursor the client is currently iterating over. I seriously hope that none of us would implement it this way.

feed with entries 30 to 6 and 'previous' link /feed?from=5

@pschirmacher That makes perfect sense to me. Optionally, we should add the number/count of items to fetch. In this case the server could simply get the list of all the "updates", find the item with id=5 and return the following count of "updates".
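
A minimal Python sketch of that idea, assuming ids grow with time and that `from` is inclusive. `page_from` is a hypothetical helper, and the in-memory list stands in for the database:

```python
def page_from(ids_newest_first, from_id, count=25):
    """Return up to `count` ids that are <= from_id, newest first."""
    older = [i for i in ids_newest_first if i <= from_id]
    return older[:count]


# 130 statuses are in the database by the time the client follows /feed?from=5
statuses = list(range(130, 0, -1))
page = page_from(statuses, from_id=5)  # [5 .. 1], unaffected by the 100 new entries

# If status 5 was deleted in the meantime, the page simply starts at 4.
without_five = [i for i in statuses if i != 5]
page_after_delete = page_from(without_five, from_id=5)
```

Because the link carries an id rather than a position, entries added at the top cannot shift what the link returns.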

@mvitz
Contributor

mvitz commented Dec 4, 2014

Only problem with this approach is calculating the last link because the from parameter points to the newest item in the current chunk.

@mvitz
Contributor

mvitz commented Dec 4, 2014

Hm the RFC states:

Paged feeds are lossy; that is, it is not possible to guarantee that
clients will be able to reconstruct the contents of the logical feed
at a particular time.  Entries may be added or changed as the pages
of the feed are accessed, without the client becoming aware of them.

Therefore, clients SHOULD NOT present paged feeds as coherent or
complete, or make assumptions to that effect.

So at least the simplest possible solution would satisfy the RFC...

@tloist

tloist commented Dec 4, 2014

@aheusingfeld That's the same scenario - that's what I meant. And my suggestion involved that the client remembers the state of the resource it previously accessed.

Actually @mvitz introduced the idea of a snapshot.

But in this case I don't get the problem, because the list of feed entries is an append-only list. So if you reference them relative to the beginning, the same index always points to the same element.
That's basically what you do when you create a URL like /feed?from=5.

@mvitz The RFC assumes that the resource has arbitrary modifications (e.g. deletion of a feed) which is not (is it?) the case here.

@aheusingfeld
Contributor

Only problem with this approach is calculating the last link

Not a problem if the "rel=next" points to a url containing before=<first-post-on-current-page>, is it?

@mvitz
Contributor

mvitz commented Dec 4, 2014

@tloist You can delete your own updates as long as there is no reply to it and the RFC assumes that entries are created and deleted while some client is browsing the paginated atom feed. This can happen with statuses.

@aheusingfeld
If there are 20 entries [20..1] and we are currently viewing 6-10 the following links must be generated:
rel=first: No problem points to the latest one which has no pagination information
rel=next: No problem the 5 entries before 6 -> [5..1]
rel=prev: We could introduce something like the 5 entries after 10 -> [11..15]
rel=last: We need to know the first entry ID (1) or the last ID of the last page (6). I think this would be complicated.
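
The four cases can be written out as a small Python sketch. It assumes a page size of 5 and an in-memory list instead of the database; `link_targets` is a made-up helper name:

```python
def link_targets(ids_newest_first, page):
    """Return the ids each link relation would have to resolve to."""
    size = len(page)
    newest_on_page, oldest_on_page = page[0], page[-1]
    older = [i for i in ids_newest_first if i < oldest_on_page]
    younger = [i for i in ids_newest_first if i > newest_on_page]
    return {
        "first": ids_newest_first[:size],  # newest entries, no paging information
        "next": older[:size],              # entries just before the oldest one shown
        "prev": younger[-size:],           # entries just after the newest one shown
        "last": ids_newest_first[-size:],  # needs the oldest ids in the whole feed
    }


entries = list(range(20, 0, -1))  # [20 .. 1]
current = [10, 9, 8, 7, 6]
links = link_targets(entries, current)
```

`next` and `prev` only need the ids at the edges of the current page, while `last` is the one that needs knowledge about the far end of the feed.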

@tloist

tloist commented Dec 4, 2014

Okay, say I want to get all the hot new entries beginning from where I left off (e.g. I know the last entry I read had ID: 4711).

The discussion here tells me that I can't use the paginated feed (which is the default), right?
So how do I do this instead?

@pschirmacher
Contributor Author

@tloist

ConcurrentModificationException This exception may be thrown by methods that have detected concurrent modification of an object when such modification is not permissible.

I think it should be permissible here.

409 The request could not be completed due to a conflict with the current state of the resource.

I don't think that fits here. The client might send an ETag to indicate what status of the resource it's expecting, and if the status of the resource changed, the server might respond with 412. I don't think this is a good approach here, though, because there might be too many status updates.
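
For reference, the ETag variant would look roughly like the Python sketch below. It is simplified: a real `If-Match` header can also carry `*` or a list of tags, and deriving the tag from the ids is only an assumption. `feed_etag` and `status_for` are hypothetical names:

```python
import hashlib


def feed_etag(ids_newest_first):
    """Derive a validator from the ids currently in the feed."""
    joined = ",".join(str(i) for i in ids_newest_first)
    return '"' + hashlib.sha256(joined.encode("utf-8")).hexdigest() + '"'


def status_for(if_match, ids_newest_first):
    """Return 412 when the client's expected ETag no longer matches, else 200."""
    if if_match is not None and if_match != feed_etag(ids_newest_first):
        return 412
    return 200


statuses = list(range(30, 0, -1))
seen = feed_etag(statuses)               # what the client remembers from page 1
fresh = status_for(seen, statuses)       # feed unchanged: 200

statuses.insert(0, 31)                   # a single new status
stale = status_for(seen, statuses)       # precondition fails: 412
```

Every new status changes the tag, which is why a busy feed would answer most follow-up requests with 412.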

@pschirmacher
Contributor Author

Maybe deleting entries from the feed can be avoided? E.g. by doing a soft delete in the database and just removing the content from the feed entry?
Another option might be making the events actually immutable and publishing StatusUpdated or StatusDeleted events. But I guess we don't want to do that here.

@mvitz
Contributor

mvitz commented Dec 4, 2014

Hm. In my mind every deleted entry (even if done with soft deletion) should not appear in the feed.

@aheusingfeld
Contributor

Ok, let's get the semantics clear first of all. The hard part is that "rel=prev" and "rel=next" have the semantics of e.g. the browser back button and not of time. Which is because we "travel back in time" when paging through the entries.

But my point is that when the server generates the page, it always knows the ids of the 20 entries it returns, the next older entry and the next younger entry! Therefore the client doesn't need to guess!

For the following I assume that the default count of entries in a list is 20:

rel=first -> get youngest entry

Returns a list of entries starting with the latest, freshest db-entry and the 20 entries which happened before that latest entry
Example: /updates?count=20

rel=next -> get older entries

Returns a list of entries starting with the db-entry which comes directly after the oldest entry in the current list and the 19(!) entries which happened before that entry.
Example: /updates?before=180&count=20 (where 180 is the id of the first entry on the "next page")

rel=prev -> get younger entries

Returns a list of entries ending with the db-entry which comes directly before the youngest entry in the current list and the 19(!) entries which happened after that entry.
Example: /updates?after=200&count=20 (where 200 is the id of the last entry on the "prev page")

rel=last -> get oldest entry

Returns a list of entries starting with the db-entry which comes directly after the entry with id=1 in the current list and the 19 entries which happened before that entry.
Example: /updates?after=1&count=20

Does that make sense? With this solution we have a very simple id >= $reference or id <= $reference for the database query.
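
A minimal sketch of those queries in Python, using an in-memory SQLite database as a stand-in for the real one. The table name `updates`, the function `fetch_page` and the 60 sample rows are assumptions made for the example:

```python
import sqlite3


def fetch_page(conn, before=None, after=None, count=20):
    """Return ids newest first, using the inclusive bounds described above."""
    if after is not None:
        # id >= reference: take the `count` oldest matches, then flip the order
        rows = conn.execute(
            "SELECT id FROM updates WHERE id >= ? ORDER BY id ASC LIMIT ?",
            (after, count),
        ).fetchall()
        return [row[0] for row in reversed(rows)]
    if before is not None:
        # id <= reference: take the `count` newest matches
        rows = conn.execute(
            "SELECT id FROM updates WHERE id <= ? ORDER BY id DESC LIMIT ?",
            (before, count),
        ).fetchall()
    else:
        # no reference at all: the newest page
        rows = conn.execute(
            "SELECT id FROM updates ORDER BY id DESC LIMIT ?", (count,)
        ).fetchall()
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE updates (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO updates (id) VALUES (?)", [(i,) for i in range(1, 61)])

first = fetch_page(conn)               # rel=first: /updates?count=20
older = fetch_page(conn, before=40)    # rel=next from the first page
younger = fetch_page(conn, after=41)   # rel=prev from the second page
last = fetch_page(conn, after=1)       # rel=last: /updates?after=1&count=20
```

The `after` branch has to sort ascending before applying the limit, otherwise it would always return the newest entries instead of the ones next to the reference id.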

@mvitz
Contributor

mvitz commented Dec 4, 2014

👍
Only thing I would not agree with is that rel=last starts with id=1 but we can agree saying rel=last starts with ID of first entry ;-)

@aheusingfeld
Contributor

@mvitz It doesn't matter which id it is or whether that id exists - it just needs to be the smallest id in the db as our db query is id >= 1!

@mvitz
Contributor

mvitz commented Dec 4, 2014

@aheusingfeld OK! >= 0 in case of statuses ;-)


4 participants