For ordered sequences, have `changes()` return the new/old position #3714

Closed
neumino opened this Issue Feb 3, 2015 · 31 comments

Projects

None yet

10 participants

@neumino
Member
neumino commented Feb 3, 2015

Somehow related to #3709

Typically if I want to build a leaderboard, I'll run something like this:

r.table('users').orderBy({index: 'score'}).limit(10).changes()

And then try to maintain an array of the best users.

The changes I'll get will be like this:

{
    new_val: {id: 14, score: 68},
    old_val: {id: 23, score: 10} 
}

The problem is that while I can remove the last element {id: 23, score: 10} from my array, I have no idea where I should insert the new document {id:14, score: 68} (without going through my array).
It would be nice to return the new/old position in changes() for ordered sequences. Augmenting the data like what ReQL does for getNearest seems like a good idea to me.

@danielmewes danielmewes added this to the 2.0-polish milestone Feb 3, 2015
@danielmewes
Member

Sounds like a good idea.

@neumino
Member
neumino commented Feb 3, 2015

One more little thing. Is it possible to return a new type ORDERED_FEED? For the same reason ATOM_FEED is useful.

In case of an ordered sequence, I would like to have thinky maintain an array and just emit events when the array is updated.

@mlucy
Member
mlucy commented Feb 3, 2015

I added a new issue for the ORDERED_FEED thing, since that's pretty easy. Returning the index is a little bit harder because of squashing. Consider the case where the row A at position 5 is deleted, then the row that is now at position 5 is updated from B to B'. Right now the queued up changes look like [{old_val: A, new_val: nil, old_position: 5}, {old_val: B, new_val: B', old_position: 5, new_position: 5}]. Now if a new row A' is inserted that takes position 5, we squash it onto the first change, and we need to produce [{old_val: A, new_val: A', old_position: 5, new_position: 5}, {old_val: B, new_val: B', old_position: 6, new_position: 6}]. I.e. the new change being squashed onto the first change requires us to update the positions in the second change. We could get around that by keeping a snapshot of the pre-batch ordered set around in memory and adding the position tags when the batch is sent, but that would double memory usage in the worst case and be a bit of implementation effort.

@timmaxw
Member
timmaxw commented Feb 4, 2015

For some situations (e.g. r.table("foo").orderBy(...).changes()) it's impractical to compute the old and new positions. So I think it should be an optarg to .changes(), and it should error if it's impractical to compute positions. Also, #3719 might make this unnecessary.

@neumino
Member
neumino commented Feb 4, 2015

For ordered feeds, we could maybe squash only the last change? I'm actually fine not squashing things in a first time.

@timmaxw -- why would be impractical to compute the old and new positions? Isn't that "just" about keeping a number of children on each node of the tree?
While it requires more updates on the tree, it would also make things like orderBy(...).skip(x).limit(y) more efficient no?

@danielmewes
Member

Since we don't currently support r.table("foo").orderBy(...).changes() without a limit(...), I don't think we have to worry too much about that now. In the future I think throwing an error when it's impractical to compute positions will work fine.

For small result sets #3719 might indeed be an alternative. One thing to note though is that it doesn't save any computations (not asymptotically at least) compared to iterating through the current results set on the client and finding out the positions of the old value and new value that way.
@neumino what do you think about that?

For making this work with squash, we could only keep the primary keys of entries around in the pre-batch ordered set to avoid too much memory overhead. Of course that wouldn't make it any easier to implement.

@neumino The idea of adding subtree sizes to the btree has been suggested a few times, mostly in the context of making count() on ranges efficient. However the problem is that we would need to update those counts, which would mean that whenever inserting or deleting a document into a leaf, we would have to rewrite O(log n) blocks up to the top of the btree. That's impractical. We could keep the counts in the LBA, which is more efficient for those sorts of things, but that comes with some memory overhead.

@mlucy
Member
mlucy commented Feb 6, 2015

@danielmewes -- I haven't really thought about this enough, and I would guess we probably won't find time to do this in 2.0; could we bump it to subsequent and settle it later?

@danielmewes
Member

@mlucy Yes definitely.

@danielmewes danielmewes modified the milestone: 2.1, 2.0-polish Feb 6, 2015
@coffeemug
Member

FYI, losing order for feeds has come up literally every time I've explained .orderBy.limit.changes to a user. As far as prioritization of ReQL issues goes, I think this one is pretty high up there.

@danielmewes
Member

I think if we implement #3719 this one becomes less urgent. But we should make sure to do at least one of them for 2.1.

@mlucy
Member
mlucy commented Apr 22, 2015

I'm not opposed to this, but there's a lot scheduled for 2.1, so unless it's absolutely vital I'd prefer to just do #3719.

@mlucy
Member
mlucy commented Apr 24, 2015

@coffeemug, @danielmewes -- are we actually doing this for 2.1, or do you think we can get away with just doing #3719?

@danielmewes
Member

I think #3719 is good enough for 2.1.

We all seem to agree that this is useful, but we would have to discuss the names of the arguments and how to integrate the positions into the events.

Moving out of 2.1.

@danielmewes danielmewes modified the milestone: subsequent, 2.1 Apr 24, 2015
@coffeemug
Member

I agree with @danielmewes -- I think #3719 is sufficient for 2.1.

@Slava
Slava commented Jun 30, 2015

commenting to register my interest publicly, as well as to subscribe to further updates

@ccorcos
ccorcos commented Jul 1, 2015

+1

@dalanmiller
Contributor

I'm also going to comment here for a vote in support 👍

@danielmewes
Member

@dalanmiller also mentioned that it would be nice if we could support this for unlimited orderBy changefeeds (e.g. r.table(...).orderBy({index: ...}).changes().
It sounds like it wouldn't easily work with our current implementation of those, and I can't see an obvious way of doing it without a lot of memory overhead. But we should think about it a bit.

@coffeemug
Member

I'd like to understand the use case for that better. I can't easily think of one, and I don't think we should worry about it unless a compelling use case actually exists.

@Slava
Slava commented Sep 17, 2015

Honestly, I don't think the positions are important to support in the change feeds. RethinkDB already maintains the ordered sorted set, just w/o positions. Positions can be recalculated efficiently in the db client.

@danielmewes
Member

I think supporting it for non-limited orderBy.changes would mostly improve the consistency of the query language and the available features.
If it's non-trivial to do, I don't think we should do it. But if it is, we should do it.

@ccorcos
ccorcos commented Sep 18, 2015

with changefeeds for ordered queries and joins, I could make a really lean integration with Meteor. 👍

@coffeemug
Member

@ccorcos -- just FYI, feeds on ordered queries already work. You can already do things like this: r.table('foo').orderBy({index: 'blah'}).limit(10).changes().

Feeds on joins are coming soon (tm).

@ccorcos
ccorcos commented Sep 18, 2015

but those just give changes to the individual document feeds right? are there any notifications indicating that the order has changed?

@coffeemug
Member

@ccorcos -- no, they also include notifications that the order has changed. For example, if two documents flip their order, they'll show up in the feed.

@coffeemug
Member

Sorry, to clarify -- currently we don't give information about the actual order because the limited number of documents are easy to sort on the client; but we do give all the information you need to do it.

@danielmewes
Member

This shipped in 2.3.0 (see #5334).

@danielmewes danielmewes modified the milestone: duplicate, subsequent Apr 29, 2016
@ccorcos
ccorcos commented Apr 29, 2016

🌿

@GeoffreyPlitt

Is there a standard way, on the consumer side, to convert this feed of old_pos, new_pos into the final array a user would want to display?

I guess another way of asking this is the following question: For r.table().order_by().limit(10).changes(), I'd like to use a callback that takes the final updated array, not individual (each()) elements.

I'm just asking because I imagine lots of people are doing this, and maybe there's a function or a separate NPM module that makes it easy to coalesce into a result array.

@danielcompton
Contributor

@GeoffreyPlitt sounds a bit like this is what you're after? rethinkdb/docs#1137 (comment)

@danielmewes
Member

The pseudo-code in rethinkdb/docs#1137 (comment) works for set-semantics, where the ordering doesn't matter. For an ordered result with old_offset and new_offset fields, the algorithm will look somewhat like this:

// A data structure with (resizable) array semantics
var Array documents;

feed = r.table(...).orderBy(...).limit(...).changes({includeInitial: true, includeStates: true, includeOffsets: true}).run();

for change in feed {
  if (change has field "state" and change.state == "ready") {
    // We have reached a consistent state of the documents in the table.
    // This might be a good time for rendering the results etc.
    // We can simply stay in the `for` loop to keep it updated.
  }
  if (change has field "old_offset" and change.old_offset != null) {
    // Remove the old value at index old_offset from the array
    // (`erase` should "cut out" the value, reducing the index of every
    //  subsequent element by 1)
    documents.erase(change.old_offset);
  }
  if (change has field "new_offset" and change.new_offset != null) { 
    // Add the new value to the array at position new_offset
    documents.insert(change.new_offset, change.new_val);
  }
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment