`return_initial` #3579

Closed
mlucy opened this Issue Jan 15, 2015 · 45 comments

Comments

Projects
None yet
@mlucy
Member

mlucy commented Jan 15, 2015

We never implemented a return_initial optarg. Currently point changefeeds and limit changefeeds always do return_initial, and other feeds never do. I think that's probably the desired default behavior.

It would be too much work to support return_initial on range changefeeds before the release, but we could offer the option to disable it on changefeeds where it's enabled by default. @coffeemug, @danielmewes, do you think this is worth doing?

@mlucy mlucy self-assigned this Jan 15, 2015

@mlucy mlucy added this to the 1.16-polish milestone Jan 15, 2015

@danielmewes

This comment has been minimized.

Show comment
Hide comment
@danielmewes

danielmewes Jan 15, 2015

Member

I wouldn't worry about this for 1.16. Moving to subsequent.

Member

danielmewes commented Jan 15, 2015

I wouldn't worry about this for 1.16. Moving to subsequent.

@danielmewes danielmewes modified the milestones: subsequent, 1.16-polish Jan 15, 2015

@coffeemug

This comment has been minimized.

Show comment
Hide comment
@coffeemug

coffeemug Jan 16, 2015

Contributor

Agree with @danielmewes. I do think that once 1.16 is out, return_initial and/or persistent/restartable feeds will be the next order of business for changefeeds.

Contributor

coffeemug commented Jan 16, 2015

Agree with @danielmewes. I do think that once 1.16 is out, return_initial and/or persistent/restartable feeds will be the next order of business for changefeeds.

@Timer

This comment has been minimized.

Show comment
Hide comment
@Timer

Timer Mar 17, 2015

I would be interested in seeing this implemented.

Timer commented Mar 17, 2015

I would be interested in seeing this implemented.

@danielmewes

This comment has been minimized.

Show comment
Hide comment
@danielmewes

danielmewes Mar 18, 2015

Member

@Timer @mlucy is working on the full version of this (including return_initial for range changefeeds). Will ship in 2.1 (after 2.0).

Member

danielmewes commented Mar 18, 2015

@Timer @mlucy is working on the full version of this (including return_initial for range changefeeds). Will ship in 2.1 (after 2.0).

@Timer

This comment has been minimized.

Show comment
Hide comment
@Timer

Timer Mar 18, 2015

Sounds good, thanks for the update @danielmewes!

Timer commented Mar 18, 2015

Sounds good, thanks for the update @danielmewes!

@mlucy

This comment has been minimized.

Show comment
Hide comment
@mlucy

mlucy May 7, 2015

Member

This is in next, but may not make it into 2.1. When we're sure where we're scheduling it we should open a docs issue. (Re-assigning to @danielmewes to handle scheduling.)

Member

mlucy commented May 7, 2015

This is in next, but may not make it into 2.1. When we're sure where we're scheduling it we should open a docs issue. (Re-assigning to @danielmewes to handle scheduling.)

@mlucy mlucy assigned danielmewes and unassigned mlucy May 7, 2015

@Timer

This comment has been minimized.

Show comment
Hide comment
@Timer

Timer May 8, 2015

Thanks for the update, @mlucy!

Timer commented May 8, 2015

Thanks for the update, @mlucy!

@jasonkuhrt

This comment has been minimized.

Show comment
Hide comment
@jasonkuhrt

jasonkuhrt May 26, 2015

If I understand this issue correctly it will allow the results of a table be gotten initially and watched ala point changefeeds? I would/will use this feature.

-- Edit VeXocide on IRC confirmed the answer to my question is yes.

If I understand this issue correctly it will allow the results of a table be gotten initially and watched ala point changefeeds? I would/will use this feature.

-- Edit VeXocide on IRC confirmed the answer to my question is yes.

@larkost

This comment has been minimized.

Show comment
Hide comment
@larkost

larkost Jun 3, 2015

Collaborator

I think that all types of changefeeds should either default to returning initial changes, or not. The mixed behavior that we seem to be adopting is confusing. I see the logic in where the defaults are, and especially in having table changefeeds not default to sending the entire table, but I don't think that the convenience of trying to match the defaults to the normal use case is worth having the (potentially) seemingly random differences in behavior between types of changefeeds.

Collaborator

larkost commented Jun 3, 2015

I think that all types of changefeeds should either default to returning initial changes, or not. The mixed behavior that we seem to be adopting is confusing. I see the logic in where the defaults are, and especially in having table changefeeds not default to sending the entire table, but I don't think that the convenience of trying to match the defaults to the normal use case is worth having the (potentially) seemingly random differences in behavior between types of changefeeds.

@coffeemug

This comment has been minimized.

Show comment
Hide comment
@coffeemug

coffeemug Jun 3, 2015

Contributor

I can definitely see where you're coming from, but I disagree with the conclusion (that we should standardize on one default). It feels like:

  • Defaulting to true on queries like r.table('foo').changes() is obviously bad because it makes this query useless by default in 95% of scenarios
  • Defaulting to false on queries like r.table('foo').orderBy(index='bar').limit(5) is obviously bad because it makes this query useless by default in 95% of scenarios
  • Having different defaults in different cases is obviously bad because inconsistency is confusing

Given that we're picking between different evils, I think we should go the pragmatic route (different defaults in different cases, accept the inconsistency) and document it really well. But I can definitely see why people are unhappy with that behavior. I'm unhappy with it too, just less unhappy than the alternatives.

Contributor

coffeemug commented Jun 3, 2015

I can definitely see where you're coming from, but I disagree with the conclusion (that we should standardize on one default). It feels like:

  • Defaulting to true on queries like r.table('foo').changes() is obviously bad because it makes this query useless by default in 95% of scenarios
  • Defaulting to false on queries like r.table('foo').orderBy(index='bar').limit(5) is obviously bad because it makes this query useless by default in 95% of scenarios
  • Having different defaults in different cases is obviously bad because inconsistency is confusing

Given that we're picking between different evils, I think we should go the pragmatic route (different defaults in different cases, accept the inconsistency) and document it really well. But I can definitely see why people are unhappy with that behavior. I'm unhappy with it too, just less unhappy than the alternatives.

@deontologician

This comment has been minimized.

Show comment
Hide comment
@deontologician

deontologician Jun 3, 2015

Contributor

Defaulting to true on queries like r.table('foo').changes() is obviously bad because it makes this query useless by default in 95% of scenarios

Defaulting to false on queries like r.table('foo').orderBy(index='bar').limit(5) is obviously bad because it makes this query useless by default in 95% of scenarios

I think there's a difference though because the first is bad because you'll accidentally suck down your entire table. The second one is bad because it's a bit annoying to need to tack on "return_initial" when you almost always need to for that kind of query.

But! That's only because we support changefeeds on a limited subset of queries right now, and mentally we're translating r.table('foo').orderBy(index='bar').limit(5) into "top five list". If you think about a hypothetical (and hopefully real) future where changefeeds work on everything, suddenly there are all sorts of combinations and it's a lot less obvious that some type of query has a natural default for return_initial. (In other words, assuming it really is 95% for one query and 95% for the other, in the future it's very likely we'll need to make almost arbitrary decisions because there is no easy way to map a chain of terms to a developer's intent).

If you still aren't convinced, think about the costs of the second in terms of annoyance to the user. They're going to reason out naturally "Ok, I want to get these users, and order them... then I guess I'll take the top 3. Ok changefeed that." then a bit later "Oh whoops, I need the initial values. Do I need to do a separate query to get the initial values? [reads docs a bit] Ah, neat, there's an optarg that will give them to me".

My contention being that instead of looking up in a big list of changefeedable queries "Which one gets me a leaderboard?", they're likely to reason it out for themselves, and in that case, they'll appreciate the regularity of return_initial s behavior

Contributor

deontologician commented Jun 3, 2015

Defaulting to true on queries like r.table('foo').changes() is obviously bad because it makes this query useless by default in 95% of scenarios

Defaulting to false on queries like r.table('foo').orderBy(index='bar').limit(5) is obviously bad because it makes this query useless by default in 95% of scenarios

I think there's a difference though because the first is bad because you'll accidentally suck down your entire table. The second one is bad because it's a bit annoying to need to tack on "return_initial" when you almost always need to for that kind of query.

But! That's only because we support changefeeds on a limited subset of queries right now, and mentally we're translating r.table('foo').orderBy(index='bar').limit(5) into "top five list". If you think about a hypothetical (and hopefully real) future where changefeeds work on everything, suddenly there are all sorts of combinations and it's a lot less obvious that some type of query has a natural default for return_initial. (In other words, assuming it really is 95% for one query and 95% for the other, in the future it's very likely we'll need to make almost arbitrary decisions because there is no easy way to map a chain of terms to a developer's intent).

If you still aren't convinced, think about the costs of the second in terms of annoyance to the user. They're going to reason out naturally "Ok, I want to get these users, and order them... then I guess I'll take the top 3. Ok changefeed that." then a bit later "Oh whoops, I need the initial values. Do I need to do a separate query to get the initial values? [reads docs a bit] Ah, neat, there's an optarg that will give them to me".

My contention being that instead of looking up in a big list of changefeedable queries "Which one gets me a leaderboard?", they're likely to reason it out for themselves, and in that case, they'll appreciate the regularity of return_initial s behavior

@coffeemug

This comment has been minimized.

Show comment
Hide comment
@coffeemug

coffeemug Jun 3, 2015

Contributor

That's a pretty compelling argument to turn it off by default.

Contributor

coffeemug commented Jun 3, 2015

That's a pretty compelling argument to turn it off by default.

@danielmewes

This comment has been minimized.

Show comment
Hide comment
@danielmewes

danielmewes Jun 3, 2015

Member

That's a pretty compelling argument to turn it off by default.

I think I agree with that. ...Unless there's some very clear and generic distinction between the two cases that I'm not seeing.

Member

danielmewes commented Jun 3, 2015

That's a pretty compelling argument to turn it off by default.

I think I agree with that. ...Unless there's some very clear and generic distinction between the two cases that I'm not seeing.

@jasonkuhrt

This comment has been minimized.

Show comment
Hide comment
@jasonkuhrt

jasonkuhrt Jun 4, 2015

That's a pretty compelling argument to turn it off by default.

I agree. The consistency is actually a big win here. Furthermore this problem could still be solved via root level defaults configuration (e.g. let users customize/toggle what the defaults are etc.) if we really wanted to.

That's a pretty compelling argument to turn it off by default.

I agree. The consistency is actually a big win here. Furthermore this problem could still be solved via root level defaults configuration (e.g. let users customize/toggle what the defaults are etc.) if we really wanted to.

@mlucy

This comment has been minimized.

Show comment
Hide comment
@mlucy

mlucy Jun 6, 2015

Member

Alright, let's turn it off by default for everything in 2.2.

We may want to pick a shorter name. The current name is include_initial_vals in the branch, which is a pain in the ass to have to type all the time. But on the other hand include_initial_vals is nice because it's consistent with include_states.

Member

mlucy commented Jun 6, 2015

Alright, let's turn it off by default for everything in 2.2.

We may want to pick a shorter name. The current name is include_initial_vals in the branch, which is a pain in the ass to have to type all the time. But on the other hand include_initial_vals is nice because it's consistent with include_states.

@jasonkuhrt

This comment has been minimized.

Show comment
Hide comment
@jasonkuhrt

jasonkuhrt Jun 6, 2015

include_initial?

include_initial?

@coffeemug

This comment has been minimized.

Show comment
Hide comment
@coffeemug

coffeemug Jun 8, 2015

Contributor

👍 for include_initial

Contributor

coffeemug commented Jun 8, 2015

👍 for include_initial

@deontologician

This comment has been minimized.

Show comment
Hide comment
@deontologician

deontologician Jun 9, 2015

Contributor

I thought it already was return_initial, whoops. I vote for that

Contributor

deontologician commented Jun 9, 2015

I thought it already was return_initial, whoops. I vote for that

@danielmewes

This comment has been minimized.

Show comment
Hide comment
@danielmewes

danielmewes Jun 9, 2015

Member
                                                                                  `return_initial` or `include_initial`?
Member

danielmewes commented Jun 9, 2015

                                                                                  `return_initial` or `include_initial`?
@deontologician

This comment has been minimized.

Show comment
Hide comment
@deontologician

deontologician Jun 9, 2015

Contributor

Ok, serves me right for looking at this while sleepy. I don't intend to add a new option, include_initial looks good

Contributor

deontologician commented Jun 9, 2015

Ok, serves me right for looking at this while sleepy. I don't intend to add a new option, include_initial looks good

@mlucy

This comment has been minimized.

Show comment
Hide comment
@mlucy

mlucy Jun 9, 2015

Member

I think I prefer include_initial_vals over include_initial; it reads a lot more clearly to me, and the latter is still short enough to inconvenient.

Member

mlucy commented Jun 9, 2015

I think I prefer include_initial_vals over include_initial; it reads a lot more clearly to me, and the latter is still short enough to inconvenient.

@danielmewes

This comment has been minimized.

Show comment
Hide comment
@danielmewes

danielmewes Jun 9, 2015

Member

I'm fine with either include_initial_vals or include_initial. No real preference from my end.

@mlucy

and the latter is still short enough to inconvenient.

I'm not sure I understand. Did you mean the former was still short enough to not be inconvenient?

Member

danielmewes commented Jun 9, 2015

I'm fine with either include_initial_vals or include_initial. No real preference from my end.

@mlucy

and the latter is still short enough to inconvenient.

I'm not sure I understand. Did you mean the former was still short enough to not be inconvenient?

@VeXocide

This comment has been minimized.

Show comment
Hide comment
@VeXocide

VeXocide Jun 9, 2015

Member

Since we're bikeshedding, I prefer include_initial.

Member

VeXocide commented Jun 9, 2015

Since we're bikeshedding, I prefer include_initial.

@mlucy

This comment has been minimized.

Show comment
Hide comment
@mlucy

mlucy Jun 10, 2015

Member

@danielmewes -- I meant that I think both are too long to be convenient.

Member

mlucy commented Jun 10, 2015

@danielmewes -- I meant that I think both are too long to be convenient.

@coffeemug

This comment has been minimized.

Show comment
Hide comment
@coffeemug

coffeemug Jun 10, 2015

Contributor

I also prefer include_initial (but not enough to fight for it if the consensus is on return_initial_vals).

Contributor

coffeemug commented Jun 10, 2015

I also prefer include_initial (but not enough to fight for it if the consensus is on return_initial_vals).

@mlucy

This comment has been minimized.

Show comment
Hide comment
@mlucy

mlucy Jun 10, 2015

Member

include_initial_vals seems much better to me; it's not nearly as obvious to me what include_initial means just from reading it.

Member

mlucy commented Jun 10, 2015

include_initial_vals seems much better to me; it's not nearly as obvious to me what include_initial means just from reading it.

@coffeemug

This comment has been minimized.

Show comment
Hide comment
@coffeemug

coffeemug Jun 16, 2015

Contributor

Shouldn't this be include_initial_vals for consistency? I'm fine with either choice -- @mlucy could you just settle on one?

Contributor

coffeemug commented Jun 16, 2015

Shouldn't this be include_initial_vals for consistency? I'm fine with either choice -- @mlucy could you just settle on one?

@mlucy

This comment has been minimized.

Show comment
Hide comment
@mlucy

mlucy Jun 16, 2015

Member

Sorry to drag on the discussion for forever, but what do people think about a really short one like init? So r.table('test').changes(init: true) instead of r.table('test').changes(include_initial_vals: true).

If people don't like a short one, I think include_initial_vals is the best of the long ones.

Member

mlucy commented Jun 16, 2015

Sorry to drag on the discussion for forever, but what do people think about a really short one like init? So r.table('test').changes(init: true) instead of r.table('test').changes(include_initial_vals: true).

If people don't like a short one, I think include_initial_vals is the best of the long ones.

@deontologician

This comment has been minimized.

Show comment
Hide comment
@deontologician

deontologician Jun 16, 2015

Contributor

I prefer short names in general, but init sounds like we're initializing something. I think include_initial is a good middle ground. I don't think _vals adds a lot and does make it more visually noisy

Contributor

deontologician commented Jun 16, 2015

I prefer short names in general, but init sounds like we're initializing something. I think include_initial is a good middle ground. I don't think _vals adds a lot and does make it more visually noisy

@coffeemug

This comment has been minimized.

Show comment
Hide comment
@coffeemug

coffeemug Jun 16, 2015

Contributor

I agree with @deontologician.

Contributor

coffeemug commented Jun 16, 2015

I agree with @deontologician.

@VeXocide

This comment has been minimized.

Show comment
Hide comment
@VeXocide

VeXocide Jun 16, 2015

Member

I agree with @deontologician as well.

Member

VeXocide commented Jun 16, 2015

I agree with @deontologician as well.

@mlucy

This comment has been minimized.

Show comment
Hide comment
@mlucy

mlucy Jun 16, 2015

Member

Alright, include_initial works for me too. Let's re-settle this issue with include_initial as the name, and have it always default to false.

Member

mlucy commented Jun 16, 2015

Alright, include_initial works for me too. Let's re-settle this issue with include_initial as the name, and have it always default to false.

@coffeemug

This comment has been minimized.

Show comment
Hide comment
@coffeemug

coffeemug Jun 16, 2015

Contributor

👏

Contributor

coffeemug commented Jun 16, 2015

👏

@danielmewes

This comment has been minimized.

Show comment
Hide comment
@danielmewes

danielmewes Jun 16, 2015

Member

Sounds great

Member

danielmewes commented Jun 16, 2015

Sounds great

@jasonkuhrt

This comment has been minimized.

Show comment
Hide comment

👍

@Slava

This comment has been minimized.

Show comment
Hide comment
@Slava

Slava Jun 30, 2015

commenting to register my interest in this issue publicly, as well as subscribe for further updates

Slava commented Jun 30, 2015

commenting to register my interest in this issue publicly, as well as subscribe for further updates

@ccorcos

This comment has been minimized.

Show comment
Hide comment

ccorcos commented Jul 1, 2015

+1

@danielcompton

This comment has been minimized.

Show comment
Hide comment
@danielcompton

danielcompton Aug 11, 2015

Contributor

When this feature is added, will it dissolve the differences between 'point' changefeeds and other changefeeds?

Contributor

danielcompton commented Aug 11, 2015

When this feature is added, will it dissolve the differences between 'point' changefeeds and other changefeeds?

@mlucy

This comment has been minimized.

Show comment
Hide comment
@mlucy

mlucy Aug 11, 2015

Member

@danielcompton -- yes, the interface should be the same.

Member

mlucy commented Aug 11, 2015

@danielcompton -- yes, the interface should be the same.

@stellanhaglund

This comment has been minimized.

Show comment
Hide comment
@stellanhaglund

stellanhaglund Sep 7, 2015

This seems very promising, I guess the 2.2 release is months away?

This seems very promising, I guess the 2.2 release is months away?

@mlucy

This comment has been minimized.

Show comment
Hide comment
@mlucy

mlucy Sep 7, 2015

Member

@stellanhaglund -- we don't have a release date yet, but we just released 2.1 and historically releases have been 1-3 months apart.

Member

mlucy commented Sep 7, 2015

@stellanhaglund -- we don't have a release date yet, but we just released 2.1 and historically releases have been 1-3 months apart.

@mlucy mlucy assigned mlucy and unassigned danielmewes Oct 8, 2015

@danielmewes danielmewes referenced this issue in rethinkdb/docs Oct 12, 2015

Closed

Document include_initial functionality #920

@tatsujin1

This comment has been minimized.

Show comment
Hide comment
@tatsujin1

tatsujin1 Oct 19, 2015

Question: would the initial state stream be possible to sort?
In my system I want to return the documents in the same order they arrived, so for the consumer it will be received as if it was live (just faster!).
Currently I do a current-value query (sorted by modify time) and a changes() query in a thread each and serialize the output from those threads so they're emitted externally in "initial-then-changes"-order. This is a somewhat cumbersome solution (and notably slower). Having changes() natively return initial state first is a great win for this use case. However, there is the sorting issue...

Question: would the initial state stream be possible to sort?
In my system I want to return the documents in the same order they arrived, so for the consumer it will be received as if it was live (just faster!).
Currently I do a current-value query (sorted by modify time) and a changes() query in a thread each and serialize the output from those threads so they're emitted externally in "initial-then-changes"-order. This is a somewhat cumbersome solution (and notably slower). Having changes() natively return initial state first is a great win for this use case. However, there is the sorting issue...

@danielmewes

This comment has been minimized.

Show comment
Hide comment
@danielmewes

danielmewes Oct 19, 2015

Member

@tatsujin1 If you just do an orderBy({index: ...}).changes({includeInitial: true}), changes will not necessarily arrive after all initial results, but you can start seeing some changes before you have received all initial results.

There are two work-arounds:

  • If you also specify {includeStates: true}, you will receive a special document {state: "ready"} once all initial results (plus possibly a few changes) have been transmitted. If you delay rendering in your application until you see that document, you get essentially the desired behavior where the first view that's rendered represents the initial results at the time of the ready document, and from that point on you receive changes to it. Note that at no point you will see a change for a document that you haven't seen the initial result for yet.
  • If you attach a limit to the query, i.e. table.orderBy({index: ...}).limit(...), I believe you will get the behavior you describe where all initial results will be transmitted first, followed by any updates.

The reason for why unlimited changefeeds with include_initial can send changes before they've sent all initial results is for memory consumption. If we had to somehow keep track of all changes until we're done sending all initial results, that would mean that we would need to use an unbounded amount of memory (depending on the number of changes happening during the time where we stream the initial results).

Member

danielmewes commented Oct 19, 2015

@tatsujin1 If you just do an orderBy({index: ...}).changes({includeInitial: true}), changes will not necessarily arrive after all initial results, but you can start seeing some changes before you have received all initial results.

There are two work-arounds:

  • If you also specify {includeStates: true}, you will receive a special document {state: "ready"} once all initial results (plus possibly a few changes) have been transmitted. If you delay rendering in your application until you see that document, you get essentially the desired behavior where the first view that's rendered represents the initial results at the time of the ready document, and from that point on you receive changes to it. Note that at no point you will see a change for a document that you haven't seen the initial result for yet.
  • If you attach a limit to the query, i.e. table.orderBy({index: ...}).limit(...), I believe you will get the behavior you describe where all initial results will be transmitted first, followed by any updates.

The reason for why unlimited changefeeds with include_initial can send changes before they've sent all initial results is for memory consumption. If we had to somehow keep track of all changes until we're done sending all initial results, that would mean that we would need to use an unbounded amount of memory (depending on the number of changes happening during the time where we stream the initial results).

@tatsujin1

This comment has been minimized.

Show comment
Hide comment
@tatsujin1

tatsujin1 Oct 20, 2015

Thanks for the thorough response!

That changes arrive amongst initial state documents is not a problem; the two sets are simple to serialize in application code. Hmm... come to think of it, I'm not sure that is strictly required. :) (especially as the initial is sent first, as you mentioned)

Sorting the initial state is something I need, however; the inter-document arrival order has meaning in my case, e.g. A then B means X; B then A means Y.

But hey, I hadn't thought of using an indexed .order_by()! As it is required to be first in the query chain I thought it would be too slow to "sort the entire data set", but I tried it now and it is surprisingly fast! :)

Yeah, I know of the state documents, but since I'm doing the initial+changes queries manually it's not currently used, but they of course will be once this feature is in use. If the initial+changes sets need to be serialized that is...

Thanks for the thorough response!

That changes arrive amongst initial state documents is not a problem; the two sets are simple to serialize in application code. Hmm... come to think of it, I'm not sure that is strictly required. :) (especially as the initial is sent first, as you mentioned)

Sorting the initial state is something I need, however; the inter-document arrival order has meaning in my case, e.g. A then B means X; B then A means Y.

But hey, I hadn't thought of using an indexed .order_by()! As it is required to be first in the query chain I thought it would be too slow to "sort the entire data set", but I tried it now and it is surprisingly fast! :)

Yeah, I know of the state documents, but since I'm doing the initial+changes queries manually it's not currently used, but they of course will be once this feature is in use. If the initial+changes sets need to be serialized that is...

@mlucy

This comment has been minimized.

Show comment
Hide comment
@mlucy

mlucy Oct 20, 2015

Member

This is in next, but it still needs more testing (I'll create another issue for that shortly). Should be released in 2.2.

Member

mlucy commented Oct 20, 2015

This is in next, but it still needs more testing (I'll create another issue for that shortly). Should be released in 2.2.

@mlucy mlucy closed this Oct 20, 2015

@faceyspacey faceyspacey referenced this issue in Slava/meteor-rethinkdb Oct 28, 2015

Open

Wait for Rethink's #4629 to get done #24

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment