ReQL proposal: restarting feeds #3471

Open
coffeemug opened this Issue Dec 23, 2014 · 183 comments

@coffeemug
Contributor

coffeemug commented Dec 23, 2014

The question of restarting feeds came up in #2953, #1118, and #2613. Since we've learned a lot since these issues were opened, I decided to start with a clean slate and a specific proposal.

When the user calls changes() on a stream, the feed protocol would inject an opaque timestamp into results as follows:

# Returns a feed along with an opaque timestamp
> r.table('foo').changes()
[{ 'old_val': ..., 'new_val': ..., 'timestamp': OPAQUE}]

The user could then pass the opaque timestamp back to changes() via the return_initial optarg to get all the changes since the timestamp. We'd be piggy-backing off existing replication logic, so we'd have to tell the user the range of "obsolete" keys they should delete, and then fill them in on the new values in that range:

> r.table('foo').changes(return_initial=OPAQUE)
[
  # Obsolete range of keys to delete
  { 'obsolete_range': [LEFT_KEY1, RIGHT_KEY1] },
  # Backfill the user on values they missed in the range
  { 'new_val': ..., 'timestamp': OPAQUE}, { 'new_val': ..., 'timestamp': OPAQUE},

  # Another obsolete range
  { 'obsolete_range': [LEFT_KEY2, RIGHT_KEY2] },
  # Values to backfill in the range
  { 'new_val': ..., 'timestamp': OPAQUE}, { 'new_val': ..., 'timestamp': OPAQUE},

  # Now we can start the feed
  { 'old_val': ..., 'new_val': ..., 'timestamp': OPAQUE},
  ...
]

A special value for return_initial is True, which backfills the user from scratch:

> r.table('foo').changes(return_initial=True)
[
  # Backfill the user on all the values
  { 'new_val': ..., 'timestamp': OPAQUE}, { 'new_val': ..., 'timestamp': OPAQUE},

  # Now we can start the feed
  { 'old_val': ..., 'new_val': ..., 'timestamp': OPAQUE},
  ...
]

This would work on streams in general, like t.map().changes() and t.filter().changes().

Note, if the changefeed is set up on a datum (e.g. a single document), the only legal values for return_initial are True or False.

There is a question of what the default for return_initial should be. I can see a couple of options:

  • Set it to False. This is annoying for datum feeds (e.g. a feed on a document).
  • Set it to True. This may be annoying for stream feeds (e.g. a feed on a table).
  • Set it to False on streams, and to True on datums. This might be confusing to users.

Also note, under this proposal the squash optarg would continue operating as it does now, but would have no effect on the initial values.

/cc @danielmewes @timmaxw @mlucy. How hard would this be to implement on top of the current replication logic? Are there flaws in the API? Would it work on more general streams (like t.filter().changes())?

@coffeemug coffeemug added this to the subsequent milestone Dec 23, 2014

@coffeemug
Contributor

coffeemug commented Dec 23, 2014

Also note, the opaque timestamp value is always monotonically increasing.

@danielmewes
Member

danielmewes commented Dec 23, 2014

My guess is that this is probably not too difficult in a basic form.

There are a few open questions:

  1. What exactly must be in the timestamp? Probably it will be a set of replication timestamps, with separate timestamps for different key/hash ranges?
  2. What do we have to do to make the changefeed survive rebalancing / reconfiguration?
  3. If we lose a few writes in the underlying table due to a primary dying, this API doesn't allow us to roll those changes back without starting from scratch. We should think a bit about how to handle that case.
  • We can either ignore the issue and simply resume feeding changes from the new primary. In this case the state submitted over the feed will become out of sync with the actual data in the table.
  • Or we can send the user a message to drop all keys and start over. Or we can terminate the changefeed and require the user to start over.
  • Finally we could do something more complex and ask the user to send back any key ranges that they think might have been affected by the changes since some given timestamp. We would then rebackfill only those ranges. This requires some substantial effort on the user's side. It would also require making the timestamps transparent.

Point 3 will become especially relevant when we implement automatic failover.

A minor detail: In our current backfilling logic, the "drop potentially obsolete key range" requests are restricted to a given hash shard. We have to either expose the hashing function to the user in some way, or make sure that we re-send the given key range over all hash shards.
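The composite-token idea in question 1 can be sketched as follows. The encoding and field layout are assumptions for illustration, not RethinkDB internals; the point is that a per-range set of replication timestamps can be serialized into a single token that stays opaque to clients:

```python
# Sketch of an opaque resume token carrying one replication timestamp
# per key range. Encoding and structure are illustrative assumptions.
import base64
import json

def encode_token(range_timestamps):
    """range_timestamps: list of [left_key, right_key, replication_ts]."""
    payload = json.dumps(range_timestamps, sort_keys=True)
    # base64 keeps the token opaque; clients just echo it back unchanged.
    return base64.b64encode(payload.encode('utf-8')).decode('ascii')

def decode_token(token):
    return json.loads(base64.b64decode(token.encode('ascii')).decode('utf-8'))
```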

@danielmewes
Member

danielmewes commented Dec 23, 2014

Another thing to note is that our backfill implementation relies on taking a snapshot of the whole table while it's running. We have to think about how that interacts with changefeeds that might be open for a long time because the client is slow in reading all the (initial) data of the cursor. Specifically I'm concerned about memory consumption on tables with high write loads.
It might be necessary to enhance the buffer cache to allow paging out snapshotted blocks to disk. This will be useful for backfills as well, and will later become useful for #3464.

Can I change my "My guess is that this is probably not too difficult in a basic form." to a "This is a significant amount of work, but seems feasible."?

@coffeemug
Contributor

coffeemug commented Dec 23, 2014

  1. We shouldn't expose shards to users in the changefeed API. If at all possible, they should only have to think about it in terms of deleting obsolete ranges, getting documents from those ranges one at a time, and increasing timestamps.
  2. I'd do nothing about surviving rebalancing for now (until we get high availability). If possible, I'd prefer to think of that problem as out-of-scope for restartable feeds.
  3. I'd do nothing until we do failover (though we should probably think about that more).

WRT snapshot memory issues, I'd also ignore those for now. We already run into them in case of backfills, so I wouldn't worry about it too much until later. If we really wanted to be careful here, we could just terminate the feed if the snapshot gets too big (i.e. the user is too far behind), and they could restart the feed.

I think it'll be really important to do this quickly after 2.0 since it's a huge limitation of the current system, so I'd try to cut scope in every way possible. I suspect we can get away with doing very little with respect to many of the operational issues like these.

@danielmewes
Member

danielmewes commented Dec 23, 2014

I personally would always set return_initial to False by default, simply because having it True comes with significant costs in some of the cases. If a user needs the initial value, they will know that they do, no matter whether it's a single document or a range.

@coffeemug you're probably thinking of specific use cases when you say that return_initial would more conveniently default to True for single-document changefeeds and False otherwise. Could you elaborate a bit on those?

@coffeemug
Contributor

coffeemug commented Dec 23, 2014

You'd want it to be true any time you're loading a realtime web page. Things like t.orderBy().limit().changes() and t.get().changes(), since if you don't get the initial value right away, you can't populate the page.

@danielmewes
Member

danielmewes commented Dec 23, 2014

@coffeemug
I agree that we should avoid exposing sharding details to the user. My question in 1 was with respect to how we would actually implement the timestamp. I think we need the individual ranges there because there's no single synchronized (replication) timestamp across different shards.

I think solving the snapshot memory issue is worth looking into. It might not actually be that difficult to do.
Terminating the changefeed if the user is too far behind is annoying. On large tables transferring the initial results might well take so long that no matter what the user does, the changefeed will be aborted before it even gets a chance to catch up with more recent writes. At that point the feature becomes unusable.
If we want this feature to make changefeeds more reliable for replication purposes, that is probably not the way to go.

@danielmewes
Member

danielmewes commented Dec 23, 2014

Actually, thinking more about it, we can probably avoid implementing pageable snapshots in the cache.

Instead we could use a similar technique to what is suggested in #1944. We would stream the initial results in batches of small primary key ranges. After every range, we would send the user a new opaque timestamp that reflects the fact that we have backfilled up to a given replication timestamp for that small range.

If we implement this, we can either start streaming changes for a given key range to the user as soon as we have "backfilled" all initial results for that small range, or keep accumulating them in a disk-backed queue until the initial results for the whole table have been sent.
I think both would be fine. The former is nicer because we can keep the disk-backed queues really small (or maybe avoid them in the first place); the latter has the property that all initial results are sent over the changefeed before any update is received, which might make it minimally easier for the user to parse.
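The batched-backfill idea can be sketched roughly as follows. The `read_range` callback, the token shape, and the generator interface are illustrative assumptions, not server internals; the point is that a fresh token after each small range means a disconnected client only has to repeat the unfinished range:

```python
# Sketch of streaming initial results in small primary-key ranges,
# emitting an updated resume token after each completed range.
# read_range(left, right) is a hypothetical snapshot-read callback.

def stream_initial(ranges, read_range):
    backfilled = []
    for left, right in ranges:
        for doc in read_range(left, right):
            yield {'new_val': doc}
        backfilled.append((left, right))
        # The token records that backfill is complete up to this range.
        yield {'timestamp': {'backfilled_ranges': list(backfilled)}}
```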

@danielmewes
Member

danielmewes commented Dec 23, 2014

You'd want it to be true any time you're loading a realtime web page.

Ah, I see. I think this is not too bad then:

Set it to False on streams, and to True on datums. This might be confusing to users.

It sounds confusing because the user has to know which queries return a stream and which return a datum. In practice, at least with the types of changefeeds that we currently support, I think it's sufficiently clear though.

@deontologician
Contributor

deontologician commented Dec 23, 2014

You guys had mentioned something like r.db('system').table('changefeeds') that is populated with changefeeds that are currently active. Maybe with some kind of changefeed pseudotype for the documents that the drivers can convert into a cursor:

{id: <feed_uuid>,
 feed: <FEED_PSEUDOTYPE>}

So the drivers could maybe do something like:

for item in r.db('system').table('changefeeds').get(<feed_uuid>)('feed'):
    print item['new_val'], item['old_val']

@mlucy
Member

mlucy commented Jan 15, 2015

Here's an alternate proposal that would take a lot less work to implement:

1> r.table('test').changes() => non-restartable changefeed
1> r.table('test').changes(persist: 'my_changefeed') => restartable changefeed
1> CRASHES
2> r.table('test').changefeed('my_changefeed') => steal the restartable changefeed from (1)
2> r.table('test').changefeed('my_changefeed').delete => safely close the changefeed so it doesn't hang around taking up memory

We'd basically give people a way to create a named changefeed that exists above the connection level, so another client can steal that changefeed and keep reading changes from it in case the first client goes down. This wouldn't let people request all the changes starting at an arbitrary point in time, but it's way way easier to implement. (We could also keep the last batch sent around in memory, and have an optarg to indicate whether or not (2) receives the last sent batch a second time, which I think is usually what you'd want in the case where (1) dies.)
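The semantics of a named changefeed living above the connection level can be sketched as a toy registry. Everything here (the `feeds` dict, the feed representation, the helper names) is an illustrative assumption, not RethinkDB code:

```python
# Toy sketch of named changefeeds that survive the creating client.

feeds = {}  # name -> persistent feed state, lives above any connection

def changes(table, persist=None):
    if persist is None:
        # Ephemeral feed: dies with the connection.
        return {'table': table, 'buffer': []}
    # Named feed: created once; later callers get the same object.
    return feeds.setdefault(persist, {'table': table, 'buffer': []})

def steal(name):
    # A second client adopts the feed and keeps reading buffered changes.
    return feeds[name]

def delete(name):
    # Explicit cleanup so the feed stops consuming memory.
    feeds.pop(name, None)
```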

@deontologician
Contributor

deontologician commented Jan 15, 2015

What happens to changefeeds that people forget to delete?

@mlucy
Member

mlucy commented Jan 15, 2015

They use up memory indefinitely, but not an arbitrary amount of memory (at most 100k changes right now). We could also add a reasonable timeout after which they're evicted (although it's debatable whether reasonable would mean "an hour" or "a week").

@mlucy
Member

mlucy commented Jan 15, 2015

We could also make the timeout configurable as another optarg to changes.

@deontologician
Contributor

deontologician commented Jan 15, 2015

I like the mlucy API better, but I like coffeemug's guarantees better. With the @coffeemug proposal, if my process that was listening on the changefeed crashes, I can always recover what happened in between. If we only store the last sent batch in memory, we solve the "can't recover a feed" problem, but we don't solve the "I'm sure I didn't miss a change due to a crash" problem.

@timmaxw
Member

timmaxw commented Jan 15, 2015

If we're willing to make some compromises, we could design the API such that the user can't tell which method is being used under the hood. It would look something like this:

>>> r.table("foo").changes(resumable=True)
[{ 'old_val': ..., 'new_val': ..., 'token': OPAQUE}]
# then after the crash
>>> r.table("foo").changes(resume_token=OPAQUE)
[{ 'old_val': ..., 'new_val': ..., 'token': OPAQUE}]

The initial implementation would look like this: When the user calls .changes(resumable=True), the server generates a "changefeed ID" and allocates a ring buffer for changes. The OPAQUE sent with each change is the changefeed ID and a timestamp/index into the ring buffer. If a client connection is abruptly lost, the changefeed ID and ring buffer stay alive for a while. If the client calls .changes(resume_token=OPAQUE) then the server looks up that ring buffer and resumes right where the client left off. If the ring buffer already expired, or it was on a different server, then the server sends {obsolete_range: [null, null]}, then streams the entire table, then sets up a new buffer and streams changes that way.

In a later implementation, we could make OPAQUE be a full-blown storage engine timestamp, and .changes(resume_token=OPAQUE) would do a backfill. So the client wouldn't have to change at all when we switched from the naive implementation to the intelligent implementation under the hood.

(I'm not sure if this is actually the right way to go. It seems like the naive implementation has a lot of hidden "gotchas" the user has to know about.)
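The ring-buffer scheme above can be sketched in a few lines. The token layout (feed ID plus index) and the capacity are illustrative assumptions:

```python
# Minimal sketch of the ring-buffer resume scheme: keep the last N
# changes; a resume token is (feed_id, index); resuming past the
# buffer's tail forces a restart from scratch.
from collections import deque

class ResumableFeed:
    def __init__(self, feed_id, capacity=100000):
        self.feed_id = feed_id
        self.buf = deque(maxlen=capacity)  # (index, change) pairs
        self.next_index = 0

    def push(self, change):
        self.buf.append((self.next_index, change))
        self.next_index += 1

    def resume(self, token):
        feed_id, index = token
        if feed_id != self.feed_id:
            return None  # feed expired or lives on another server
        oldest = self.next_index - len(self.buf)
        if index < oldest:
            return None  # already evicted: client must start from scratch
        return [change for i, change in self.buf if i >= index]
```

A `None` result corresponds to the server falling back to the full-table stream described above.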

@mlucy
Member

mlucy commented Jan 15, 2015

That seems reasonable to me. If we always kept at least one old batch around until we evict the feed entirely, then it would be impossible for someone to process a row with a given token, crash, and not be able to restart at that token (i.e. the only way to lose their place would be to not log every token they process to whatever service they use to recover). That seems like an OK guarantee for a first pass, and we can make the guarantee stronger later.

@danielmewes
Member

danielmewes commented Jan 15, 2015

I like @timmaxw 's proposal.

That way we could ship a first version using a ring buffer on the primary replica that:

  • allows resuming if the client (connection) dies
  • starts over feeding the table state from scratch if more than e.g. 100K changes accumulate
  • starts over feeding the table state from scratch if the primary dies

We could think about maintaining resumable changefeeds over table reconfiguration, though on first thought that seems like work we can better spend on the better implementation.

Later we can follow up with an implementation that uses store timestamps, in order to

  • allow resuming even after missing more than 100K changes
  • allow resuming after primary changes

I'm uncertain as to whether it's worth spending time on the first implementation rather than going for the second one right away. This will depend on how much work we expect the respective implementations to be.

@mlucy
Member

mlucy commented Jan 15, 2015

It's worth noting that this impacts #3564; using tokens in this way won't work with point changefeeds the way things are now.

@timmaxw
Member

timmaxw commented Jan 15, 2015

For point changefeeds it's cheap to send users the initial value every time, so there's no need to make changefeeds resumable or persistent. The only exception I can think of is changefeeds on map-reduce. But then there's no backfill-like solution available, so we'd have to make the map-reduce trees persistent. At that point I'd want to make them explicit objects that the user can create and delete, like secondary indexes.

If the user runs .changes(resumable=true, return_initial=true) in a backfill-based implementation, then we can send tokens along with the initial "backfill", in order to make the backfill resumable. In other words, we can extend incremental backfilling all the way to the clients.

The resume_token and return_initial optargs are mutually exclusive. This is because return_initial=true is fundamentally the same as resume_token=<zero timestamp>. It would be nice if we could combine them into a single optarg, but I can't think of a good way to explain it to the user.

@mlucy

This comment has been minimized.

Show comment
Hide comment
@mlucy

mlucy Jan 15, 2015

Member

For point changefeeds it's cheap to send users the initial value every time, so there's no need to make changefeeds resumable or persistent.

This is true for some but not all use cases. We support non-squashing point changefeeds, so it's plausible someone would want to use a changefeed to e.g. make sure a user's balance never dips below $0 and charge them an overdraft fee if it does. In that case "resuming" the changefeed by just re-sending the initial value doesn't get you what you want.



timmaxw Jan 15, 2015

Member

That also excludes backfill-based resuming. I think we shouldn't support non-squashing resumption. In fact, I'm suspicious of non-squashing changefeeds in general. If someone wants to do something special if the user's balance ever drops below zero, they should put that logic in the write queries.



mlucy Jan 15, 2015

Member

If someone wants to do something special if the user's balance ever drops below zero, they should put that logic in the write queries.

Maybe in that one example that would be reasonable because you'd only want one or two places where you update someone's balance. In general, though, I think we should give people a way to write code that triggers whenever a certain change happens without forcing them to attach a copy of that code to all the write queries that could conceivably cause such a change.



danielmewes Jan 15, 2015

Member

I agree with @timmaxw that we shouldn't worry about non-squashing resumable changefeeds.
I still think non-squashing non-resumable changefeeds are useful.

As far as I can see, resumable and non-resumable changefeeds really have quite different types of use cases.
The primary use case for resumable changefeeds is data synchronization into other systems (e.g. ElasticSearch).
Non-resumable changefeeds are sufficient when you want a web app to update in realtime once a user posts a new comment, your scoreboard changes, etc. These are also the types of scenarios where non-squashing changefeeds can be useful.



kofalt Jan 15, 2015

This portion of @timmaxw's proposal makes me nervous:

then streams the entire table

  1. Wouldn't this make changefeed resumption basically unusable for large data sets?

The example @danielmewes just pointed out - a web app that broadcasts events - would like to pick up where it left off after a crash, but would be incapable of eating an entire table's worth of data, deduping it with what the client already has, sending 4 million websocket events, etc.

  2. What would the 'full table dump' events look like?

If { 'old_val': null, 'new_val': ..., 'token': OPAQUE}, then it would appear identical to an insertion event, which is arguably reasonable but not idempotent. A system expecting the current change feed behaviour would react incorrectly - say, broadcasting 4 million websocket events to clients about how many new comments there were.
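The idempotency concern can be illustrated directly. In this sketch the field names follow the proposal, but the consumer logic is an assumption: a backfill event shaped like an insertion is safe only if the consumer applies it as an upsert, rather than reacting to it as a fresh insert:

```python
def apply_idempotent(mirror, event):
    """Treat {'old_val': None, 'new_val': ...} as an upsert into local state.

    Replaying the same backfill event is then a no-op, whereas a consumer
    that reacts to it as a brand-new insertion (e.g. by broadcasting a
    websocket event per change) would misfire on every resume.
    """
    if event.get('new_val') is not None:
        mirror[event['new_val']['id']] = event['new_val']   # insert or update
    elif event.get('old_val') is not None:
        mirror.pop(event['old_val']['id'], None)            # a real deletion
    return mirror

mirror = {}
backfill = {'old_val': None, 'new_val': {'id': 1, 'text': 'hi'}, 'token': 'OPAQUE'}
apply_idempotent(mirror, backfill)
apply_idempotent(mirror, backfill)  # replay leaves the state unchanged
```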



kofalt Jan 15, 2015

In my comment above, I suppose I am implicitly assuming the second use case from @danielmewes' comment about replication to ElasticSearch vs a web app, for both my questions.

Perhaps what I'm getting at is that in this implementation, it would be nice to have a flag that says "please do NOT dump me the entire table if I'm out of buffer."



timmaxw Jan 15, 2015

Member

@kofalt: That's a really good point. In fact, we should probably never dump the entire table when the user tries to resume a changefeed. If the server expired the changefeed, it should always throw an error instead of dumping the table. The client can catch this error and start a new changefeed with return_initial=True if they want a full-table dump. Then when we switch to a backfilling-based implementation, from the client's point of view it looks like it's no longer possible for changefeeds to expire.
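A client-side sketch of this policy, with a hypothetical `FeedExpired` error, a toy in-memory change log standing in for the server, and integer tokens as stand-ins for the opaque ones:

```python
# Everything here -- FeedExpired, open_feed, the integer tokens -- is a
# hypothetical model of the proposed behavior, not a real driver API.
class FeedExpired(Exception):
    """Raised when a resume token points before the server's retained history."""

# Fake server state: a change log, with everything before OLDEST_KEPT discarded.
LOG = [{'new_val': {'id': i}, 'token': i} for i in range(10)]
OLDEST_KEPT = 5

def open_feed(resume_token=None, return_initial=False):
    """Server side of the sketch: error out instead of dumping the table."""
    if return_initial:
        return list(LOG)  # an explicit, opted-into full backfill
    if resume_token is not None and resume_token < OLDEST_KEPT:
        raise FeedExpired("resume token no longer available")
    start = 0 if resume_token is None else resume_token + 1
    return LOG[start:]

def resume_or_restart(saved_token):
    """Client policy: resume if possible; otherwise opt in to a full dump."""
    try:
        return open_feed(resume_token=saved_token), 'resumed'
    except FeedExpired:
        return open_feed(return_initial=True), 'restarted'
```

The point of the error is that the full-table dump only ever happens because the client asked for it.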



kofalt Jan 16, 2015

There were some good comments upthread about different use cases for feeds that might want to restart, and feeds that will never care about restarting.

In that resumption case, I cannot (off the top of my head) envision a case where I'd both want to restart the feed, and be able to continue without errors if the feed was actually gone, or going to dump the whole table at me.

An arbitrarily-sized ring buffer can handle blips in connectivity, but the same "if my place in the feed is lost, that's a catastrophic loss of state" logic would have to be written by an app consuming the feed as if the buffer were not in place.

It would follow that persistent, hard-guarantee feeds may inherently be a more valuable feature.



timmaxw Jan 16, 2015

Member

The original rationale was that I wanted the two different implementation to expose exactly the same API to the user. But the performance characteristics are different enough that I agree it's not a good idea.

Backfill-based feeds are strictly better than ring-buffer-based feeds from the user's point of view. The problem is that backfill-based feeds are a lot more complicated to implement on the server side. That's the reason @mlucy suggested ring-buffer-based feeds in the first place.



kofalt Jan 16, 2015

Makes sense. I'm mainly trying to reconcile changefeed implementation proposals with how I would end up consuming them as a user. The ring-buffer strategy might be a great starting point, I just don't want it to preclude usage for common cases.



coffeemug Jan 16, 2015

Contributor

Assuming it's actually easier to implement, I like @mlucy's API more (or at least the general gist of it), because it's much easier for users to consume. If we send the user a token they have to later send back, they'd have to deal with persisting the token. For example, if the client crashes and they start another one, they'd have to pull the latest known token from somewhere to send it to the database. If we implement persistent named feeds, that problem goes away entirely since they can just continue reading from the feed.

That being said, named feeds open up a couple of semantic questions/opportunities we should talk about. For example, what happens when multiple clients connect to the same named feed? (There are a few obvious choices we could make; we could also add an optarg to control this behavior, etc.)



timmaxw Jan 16, 2015

Member

If you persist the token atomically whenever you read from the feed, then the server can guarantee that the client gets each change exactly once. I think that's valuable. There's no way to get that guarantee with @mlucy's API.
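The exactly-once argument rests on committing the token in the same transaction as the change's side effect. A minimal sketch with SQLite standing in for the consumer's datastore (the schema, feed name, and integer token are illustrative assumptions):

```python
import sqlite3

# Sketch of the exactly-once recipe: the consumer's side effect and the feed
# token are committed in one transaction, so a crash can never leave the
# side effect applied without the token advanced (or vice versa).
db = sqlite3.connect(':memory:')
db.execute('CREATE TABLE mirror (id INTEGER PRIMARY KEY, doc TEXT)')
db.execute('CREATE TABLE position (feed TEXT PRIMARY KEY, token INTEGER)')

def apply_change(change):
    """Apply one feed change and record its token atomically."""
    with db:  # sqlite3 commits this block as one transaction (or rolls back)
        db.execute('REPLACE INTO mirror (id, doc) VALUES (?, ?)',
                   (change['new_val']['id'], str(change['new_val'])))
        db.execute('REPLACE INTO position (feed, token) VALUES (?, ?)',
                   ('foo_feed', change['token']))

for change in [{'new_val': {'id': 1}, 'token': 10},
               {'new_val': {'id': 1}, 'token': 11}]:
    apply_change(change)

saved = db.execute("SELECT token FROM position WHERE feed = 'foo_feed'").fetchone()[0]
```

On restart, the consumer reads `position.token` and resumes from there; any change it already applied was committed together with its token, so it is never applied twice.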



coffeemug Jan 16, 2015

Contributor

True, but persistence is hard, especially if you want to coordinate it across multiple client nodes. I think it's more valuable to do it in the database and let the user deal with very rare cases when they might have gotten a change notification, but the database failed to persist it because of the crash.



danielmewes Jan 19, 2015

Member

If we later want to switch to a replication timestamp based implementation, keeping track of changefeeds on the server has the disadvantage of requiring additional intra-cluster synchronization if we want it to survive server failures.

With the client managing a token (which would contain a set of timestamps), it's easy to resume a changefeed on any server, even after a primary failure.



danielmewes Jan 19, 2015

Member

In the case where you use resumable changefeeds to replicate RethinkDB data into another database system, you already have persistence there and it's probably not too hard to store the latest token together with the data.



ProTip Jan 30, 2015

Hi guys, here is my 2c. I apologise if it's a bit of a ramble :|

Everything you can think of has been or is being turned into a webapp. I work at a company that makes online business software that handles payrolls, taxes, appointments, and the whole lot. "Good enough for web apps" seems like a murky place. I would like to guarantee at-least-once notification with my applications. I know Google does; sometimes I get duplicate Hangouts messages :|

I believe that the primary use of a changefeed is synchronization, but not just to other databases. As a changefeed consumer, I'm synchronizing an event in the database with some business logic, SNS for mobile push, ElasticSearch, a user's screen, or even an RDBMS. I want a changefeed that's durably persisted and committed to atomically with the data. I want it to store my progress for me and to advance that position when I let it know I've processed the changes. I'm in agreement with @coffeemug in that I don't want to have to store this information back into the database myself. I want it to store the changes forever, or until I explicitly remove the feed. Essentially, I want a Kafka topic of query changes.


@danielmewes danielmewes added this to the 2.1 milestone Feb 3, 2015


deontologician Mar 5, 2016

Contributor

@analytik For replacing RabbitMQ job queues, this proposal won't be enough. You'll need something like #4133 to guarantee that two workers don't get the same job.


@mikemintz mikemintz referenced this issue in mikemintz/react-rethinkdb Mar 18, 2016

Closed

Isomorphic server and client flickering #29

@wenzowski wenzowski referenced this issue in mattkrick/meatier Apr 19, 2016

Open

Wishlist feature: offline-first #101


renato Apr 19, 2016

This would be very useful to us, too. Are you working on this currently or is it on hold for now?



danielmewes Apr 19, 2016

Member

@renato We have not started actively working on this yet.
Which degree of resumability are you going to need? Is it enough to resume a changefeed after a brief connection drop between the client and the RethinkDB server? Or do you need changefeeds to survive server restarts as well (either client or database servers)?



renato Apr 20, 2016

@danielmewes We are currently using PouchDB/CouchDB to replicate between different database instances in different locations for a retail POS software that just can't be offline - and, being in Brazil, our internet service providers are not reliable at all, so we can't have only a central database.

That said, I believe having changefeeds resumability (if we can guarantee the changes' delivery) would make it possible to have our own multi-master replication between RethinkDB instances, but we'd need it to survive server and client restarts (oh, and I know there's a lot more in error handling and conflict resolution to achieve this goal).

Probably this use case is beyond what is expected of changefeed resumability; I just imagined the possibility that multi-master replication would allow us to migrate our few non-RethinkDB databases to RethinkDB.



danielmewes Apr 21, 2016

Member

@renato I don't think what you need is outside the scope of resumable changefeeds per-se, but it will take a second (or third) step after our initial resumability support until you could really use them for multi-master replication. It's something we would like to support eventually though.



gbloisi May 6, 2016

I was wondering if there is still a plan (or the will) to support the #2953 use case. Having a b-tree timestamp or incremental token would also be nice for performing historical queries, or as a complement to bi-temporal logic.



danielmewes May 6, 2016

Member

@gbloisi Yes, definitely. Though I don't think it will be usable for historical queries.

There are basically two planned degrees of reliable changefeeds:

  1. Surviving short disconnects, as long as the client and servers remain up. This is the part for which we have settled an API so far. In this mode, either no change will get lost and the changefeed can just be resumed, or all changes will get lost and the changefeed will need to be restarted.
  2. Surviving restarts and disconnects of both the client or server. Even permanent server failures can be sustained, as long as enough replicas are left. Picking up at a given point will be based on some sort of b-tree timestamp token that the client needs to persist (if client restarts should be survived without starting over from scratch). With this approach, a changefeed can always be resumed. However it will have "squash"-like semantics, i.e. it will omit intermediate values of any documents. It will also require an additional "delete range" notification and will sometimes emit changes for documents that weren't actually changed. There will be further restrictions, e.g. on the types of queries on which such a changefeed can be used. The goal of this mode is to keep a copy of the data synced with the current table state. A primary use case is for replicating RethinkDB data into a different secondary data store, such as ElasticSearch. For this mode, the API and exact behavior are not settled yet.
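A consumer of the second mode would have to honor the "delete range" notifications. Following the message shapes from the original proposal (an `obsolete_range` marker followed by backfilled `new_val`s, then ordinary changes), a mirror-maintenance loop might look like this sketch; the consumer logic itself is assumed:

```python
def apply_stream(mirror, stream):
    """Keep a local key->document mirror in sync with a resumed feed's messages."""
    for msg in stream:
        if 'obsolete_range' in msg:
            left, right = msg['obsolete_range']
            # Drop every key in the stale range; backfill messages re-fill it.
            for key in [k for k in mirror if left <= k <= right]:
                del mirror[key]
        elif msg.get('new_val') is None:
            mirror.pop(msg['old_val']['id'], None)  # a plain deletion
        else:
            mirror[msg['new_val']['id']] = msg['new_val']
    return mirror

mirror = {1: {'id': 1, 'v': 'stale'}, 3: {'id': 3, 'v': 'stale'}, 9: {'id': 9, 'v': 'ok'}}
stream = [
    {'obsolete_range': [1, 5]},                          # forget keys 1..5
    {'new_val': {'id': 1, 'v': 'fresh'}},                # backfill within the range
    {'old_val': {'id': 9, 'v': 'ok'}, 'new_val': None},  # a live deletion
]
apply_stream(mirror, stream)
```

Note that this is inherently squash-like: the mirror ends up at the current table state, with no record of intermediate values.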
@mans2singh

+1


sheerun Jun 23, 2016

In my use case I need to reliably process each change in RethinkDB. Without this feature it's probably not possible? Changefeeds don't give any guarantees.



kerfab Jul 29, 2016

If RethinkDB integrated reliable changefeed consumption with a resume feature, and group consumption with a load-balancing feature, we would immediately replace Kafka with it.

But it does only half of the job. We thought about using it for our web front-ends, but the changefeed feature would overlap with the Kafka event notifications (notification of changes), and we do not want to add to the technology stack.

Making this changefeed feature more cluster-friendly is, IMO, a high priority.

I am interested in any RethinkDB alternative that provides delivery guarantees and Kafka-like consumption of changes. It sounds like a unicorn so far...



RXminuS Jul 29, 2016

Well, I'm not sure the aim should be to replace Kafka... DB operations are more expensive than log appends, so if you need extreme throughput then Kafka is pretty epic, and it has some other great features as well. However, where RethinkDB falls short now, IMHO, is that if you persist the changefeed to Kafka with a service (we have a Docker microservice running), there's no way to guarantee that every change is persisted (in case you restart the service or hit an error). Being able to resume a log tail would remedy that.



eav Aug 5, 2016

Guys, you should take the reliable changefeeds feature (one that survives client or server restarts) more seriously... it's a must-have in the realtime world. Right now we have to use RethinkDB as a secondary DB, or not use it at all; for example, by implementing event sourcing we can ignore many RethinkDB features.

eav commented Aug 5, 2016

Guys, you should take a reliable changefeeds feature (one that survives client or server restarts) more seriously... this is a "must have" in the real-time world. Right now we have to use RethinkDB as a secondary DB or not use it at all; for example, once we implement event sourcing ourselves, we end up ignoring many RethinkDB features.

@bchavez bchavez referenced this issue in bchavez/RethinkDb.Driver Aug 21, 2016

Closed

Connection reset by peer #93

@thomasmodeneis thomasmodeneis referenced this issue in apollographql/apollo-server Dec 6, 2016

Closed

Real-time / streaming / live query / RethinkDB support #34


Hronom commented Mar 22, 2017

@RXminuS I don't want the throughput that Kafka or RabbitMQ give; I need a guarantee that a notification sent from RethinkDB is actually received by the client. That's all. I don't want to use RethinkDB for stream processing.

RethinkDB should be positioned as a single-source-of-truth DB and should replace the message-broker layer used for notifications of changes in the DB.

Imagine: you create a microservice that uses changefeeds to listen for newly created/deleted users. This service must send an email to the admin on user creation/deletion. And voilà: the changefeed sends a message to the microservice, but the microservice never receives it because of a problem with the internal network. So what should the microservice do? Use a timer to re-retrieve all data from the DB?
If the network recovers quickly, does the microservice at least receive an exception that the changefeed is broken, so that it can recreate the changefeed with the initial-results option enabled?

What are the use cases of changefeeds, after all?


gebrits commented Apr 15, 2017

Wanted to pile on to the requests to give this issue more priority.

As it stands, changefeeds can only be used for notifications where data loss is not that big of an issue. Since the real-time capabilities, and with them changefeeds, are at the core of RethinkDB's value proposition, getting this issue fixed should be among the highest priorities, in my opinion.

My specific use case: a polyglot architecture where RethinkDB is the single source of truth, and all other databases (Redis, Elasticsearch, ...) are read caches (or Eager Read Derivations, if you will) on top of this data.

This is a great use case with tremendous widespread potential, but as it stands, it's unusable because the system as a whole cannot guarantee delivery (i.e., at-least-once semantics).

So again, please redirect efforts a bit more to this issue. Thanks.


riyadhzen commented Jun 7, 2017

Hi,

I just want to know what the current status of this is. Is anybody working on it?

Thanks in advance.


thomasmodeneis commented Jun 7, 2017

@riyadhzen I think no work is being done on this issue. The workaround is still to have every record you are interested in tracked with a timestamp, and to persist that timestamp after each batch is processed. This lets you restart the changefeed at any given point in time and reduces the problem to something manageable.

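The checkpoint workaround described above can be sketched as follows. This is a minimal in-memory illustration, not driver code: `Checkpoint` and `resume_rows` are hypothetical names, and a real consumer would replace the plain `table` list with a RethinkDB range query on a secondary index over the timestamp field, such as `r.table('foo').between(last_ts, r.maxval, index='ts')`.

```python
# Minimal in-memory sketch of the timestamp-checkpoint workaround.
# All names here are hypothetical illustrations, not RethinkDB driver API.
from dataclasses import dataclass

@dataclass
class Checkpoint:
    """Timestamp of the last batch that was fully processed and persisted."""
    last_ts: int = 0

    def commit(self, batch):
        # Persist only after the whole batch is handled, so a crash
        # mid-batch replays it on restart (at-least-once delivery).
        if batch:
            self.last_ts = max(row["ts"] for row in batch)

def resume_rows(table, checkpoint):
    """Return the rows missed since the checkpoint, oldest first."""
    return sorted((row for row in table if row["ts"] > checkpoint.last_ts),
                  key=lambda row: row["ts"])
```

On restart, the consumer loads the persisted checkpoint, replays `resume_rows(...)`, and only then switches to the live feed; note that timestamp monotonicity across shards still needs care.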


riyadhzen commented Jun 7, 2017

@thomasmodeneis, I understand from your words that I should keep a timestamp in the documents; that way, if my feed gets disconnected, all I have to do is reconnect from the last timestamp onwards.

Thank you.



aral commented Jan 4, 2018

Having just arrived here from the node-rethinkdb-job-queue thread, I just wanted to inquire about the state of this.

As I understand it, changefeeds are unreliable. Is this correct?

(Or is this only an issue in clustered deployments? Given the nature of our project – personal web sites – there should never really be a reason to cluster a deployment as they will be deployments for one. I’m pretty sure that even if Stephen Fry decided to get one he wouldn’t need a cluster.) :)

I was considering making RethinkDB the “single source of truth” for an ActivityPub server implementation I’ve just started working on but, after reading the posts here (e.g., @gebrits’s), I’m reconsidering that.

Can anyone using changefeeds in production please tell me if I’m in for a world of hurt if I decide to depend on them? Would love to hear from folks with practical experience in this area.

(Also, if changefeeds – a core, distinguishing, and loudly advertised feature – are unreliable, this should be mentioned in the documentation, right at the top. This is not something you want to find out when you're several months into development.)


srh commented Jan 4, 2018

Hi, I don't know what you mean by node-rethinkdb-job-queue thread. Do you have a link?

Changefeeds are unreliable in the sense that they could fail, explicitly, and then if you make a new changefeed, you'll miss out on writes. They'll explicitly fail if one of the servers the feed is receiving changes from goes down or gets network-partitioned, or if the server you ran the query on goes down, or if the client goes down.

You can hack your way around the problem of resuming a changefeed by putting a timestamp, either logical or wall-clock, on each row, then making an index by that timestamp (for efficiency), and then, when you want to restart a changefeed, doing so by starting a changefeed, and after that starting a range query on the secondary index by timestamp, piecing together the information that you've lost. There are of course issues to work out with timestamp monotonicity, which you can deal with but add an extra level of discomfort.

(I don't have practical experience in this area as a user -- I'm a developer of the DB -- so I don't have personal experience with the difficulty here.)

It would be possible to remove the issues with timestamp monotonicity if a server feature were developed that let your queries generate a (shard id, logical timestamp) pair that is guaranteed to increase (for each shard).
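The resume ordering described above (open the new feed first, then run the timestamp range query, so nothing falls into the gap between them) can be sketched like this. Everything here is an in-memory stand-in: in a real client, `backfill` would come from a `between(last_ts, r.maxval, index='ts')` range query and `feed_events` from `.changes()`, and the feed would be an endless cursor deduplicated incrementally rather than a finite list.

```python
# Sketch of resuming: the live feed is opened *before* the backfill query,
# so writes landing during the backfill are not lost. Backfill rows that the
# live feed has already superseded are dropped to avoid emitting stale states.
# `feed_events` is a finite list purely for illustration.
def merge_backfill_and_feed(backfill, feed_events):
    """Yield each document once: backfill rows first, unless the live feed
    already carries a newer version of the same document."""
    superseded = {e["new_val"]["id"] for e in feed_events if e.get("new_val")}
    for row in backfill:
        if row["id"] not in superseded:
            yield row
    for event in feed_events:
        if event.get("new_val"):
            yield event["new_val"]
```

This is an at-least-once scheme: a document touched both in the backfill window and by the feed is delivered in its newest version, never skipped.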


aral commented Jan 4, 2018

Thanks for the quick response, Sam. I meant the thread referenced above from the node-rethinkdb-job-queue project (grantcarthew/node-rethinkdb-job-queue#77).

So if I have a component, say, listening to a changefeed to distribute messages to followers, will I get an immediate notification if the feed fails, so I can restart it (and, if I'm using timestamps, not lose my place)? That sounds acceptable for my use case, if so. Reading some of the earlier issues, I was under the impression that failures were not immediately reported, but that there was some lag.

Thanks again.

zappjones commented Jan 4, 2018

Hi aral -

We've been using RethinkDB for over a year in production and initially leaned heavily on changefeeds. With a low amount of database traffic things seemed to work quite fine. Once traffic increased and we added more changefeeds into the mix, all queries (reads, writes, and changefeeds) slowed down to an unusable state.

Within the last month we removed the last changefeed from our server stack. We are still using RethinkDB as our primary NoSQL DB, but all real-time communication is now done through Redis pub/sub.

Changefeeds hold a lot of great promise but I would not currently recommend using them in a production environment.

In our setup we were using three rethink nodes each running on a i3.4xl ec2 instance. At about 15-20k concurrent changefeeds is when issues would crop up.



srh commented Jan 4, 2018

There should be some lag because it does take some time to notice that a server's not responding and decide that it's timed out.

It might also be that there's some honest-to-god bug where a changefeed just never gets responses instead of a notification happening.

The documentation does a poor job of describing what error conditions are possible and what sort of errors can be returned. So I've got some uncertainty there.

@zappjones is right -- if you have a large number of changefeeds listening on key ranges (and not individual documents), you'll pay a huge CPU-time price to see if each write should be sent to each of the change feeds. (There could be a performance problem when you have a ton of change feeds on individual documents, too, but it wouldn't be for the same reason.)


aral commented Jan 5, 2018

Thanks, Sam. That’s really useful information. I’m going to spike out RethinkDB because I’d love to implement its beautiful workflow if at all possible. It would be awesome to see changefeed issues better documented. Would you like me to open a separate issue for that?


aral commented Jan 8, 2018

@zappjones Thank you so much, that’s invaluable information.


ceelian commented Feb 22, 2018

@srh Can you elaborate a bit more on this statement:

is right -- if you have a large number of changefeeds listening on key ranges (and not individual documents), you'll pay a huge CPU-time price to see if each write should be sent to each of the change feeds.

As I understand it, you mean that the website example:

r.table('game').orderBy('score').limit(3).changes()

will already have "bad" performance at ~15,000 active changefeeds?

That would also be bad for us: we are currently evaluating RethinkDB, but we will have approximately 50,000 to 100,000 parallel active changefeeds with range queries (pagination). Many of the live queries will be identical, so I am not sure whether that helps with performance (it might, if RethinkDB does internal deduplication of changefeed subscriptions).
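If the server does not deduplicate identical subscriptions, one client-side mitigation is to multiplex them over a single feed. A hypothetical sketch follows (the `FeedMultiplexer` name and API are invented for illustration; in a real app, `publish` would be driven by a single `r.table('game').orderBy('score').limit(3).changes()` cursor per distinct query):

```python
# Hypothetical client-side multiplexer: many identical subscribers share
# one server-side changefeed instead of opening thousands of duplicates.
class FeedMultiplexer:
    def __init__(self):
        self._subscribers = {}  # query key -> list of callbacks

    def subscribe(self, key, callback):
        """Register a callback; returns True if the caller should open
        the single server-side feed for this query key."""
        callbacks = self._subscribers.setdefault(key, [])
        callbacks.append(callback)
        return len(callbacks) == 1

    def publish(self, key, change):
        """Fan a change object out to every subscriber of the key."""
        for callback in self._subscribers.get(key, []):
            callback(change)
```

With tens of thousands of identical paginated queries, this reduces server-side load to one feed per distinct query, at the cost of moving the fan-out work into the client tier.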


srh commented Feb 22, 2018

I wasn't thinking of the limit example, just the range example (with small ranges, so that the overhead is "bad").

The limit example would also be hit by bad performance. Basically every write will interact with every range change feed, which costs 15000x work with 15000 change feeds.

It is possible to improve the DB implementation so that both this and limit changefeeds would be reasonably fast, but I don't know anybody who has the free time for that right now.


srh commented Feb 22, 2018

I'll have to check and see how identical subscriptions are treated. Various order_by/skip/limit change feeds ought to be handled together, and maybe those, and identical ranges, are handled more efficiently. I don't know that they are, so I can look sometime when I get the chance.


wmertens commented Mar 25, 2018


srh commented Mar 26, 2018

In fact, writes already do mark areas with a version number -- this is used for incremental replication. The current change feed API does provide old_val and new_val though. I'm not sure of the guarantees but I think there is one that values don't get "skipped" by a change feed.

wmertens commented Mar 26, 2018
