Unblock relaxed write consistency level #3901

Merged

Conversation

takayuki

This pull request contains two commits: one introduces an option in cmd/influx to select the write consistency level, which has been set to 'any' by default, and the other fixes writes that can block even with a relaxed write consistency level (e.g. any, one, or quorum).

Assume you have a 3-node cluster and one of those nodes goes down. Write requests with a write consistency level more relaxed than, or equal to, quorum should complete immediately and successfully. However, such requests can block waiting for the dead node to respond.
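
For context, the number of acknowledgements a write must collect before it can return is derived from the consistency level roughly as in the sketch below (a simplified illustration with an assumed helper name, not the actual cluster package code):

    package sketch

    // requiredAcks maps a write consistency level to the number of owner
    // acknowledgements a write must collect before it can return successfully.
    // The mapping shown is the conventional one (any/one -> 1, quorum -> majority,
    // all -> every owner) and is illustrative, not the actual implementation.
    func requiredAcks(level string, owners int) int {
        switch level {
        case "any", "one":
            return 1
        case "quorum":
            return owners/2 + 1
        default: // "all"
            return owners
        }
    }

With any, one, or quorum on a 3-node cluster, a write should therefore be able to return after one or two acknowledgements even if the third owner never responds.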

@beckettsean
Contributor

@corylanou

    @@ -274,32 +278,32 @@ func (w *PointsWriter) writeToShard(shard *meta.ShardInfo, database, retentionPo
        var wrote int
        timeout := time.After(w.WriteTimeout)
        var writeError error
    -   for _, nodeID := range shard.OwnerIDs {
    +   for i := 0; i < len(shard.OwnerIDs); i++ {
Contributor

Why did this change? I don't see us using i, but I could be missing it. The original is much clearer and more idiomatic Go.

If it was to stop using the nodeID, you can simply do:

    for range shard.OwnerIDs {

Contributor

Agreed. The AsyncWriteResult and for-loop changes are unnecessary.

@corylanou
Contributor

Overall, this looks good. I would like to see a test or two added: one verifying that the new flag is accepted and parses correctly. I would also like one more set of eyes on the change to the cluster points writer; perhaps @otoolep or @jwilder.

@otoolep
Contributor

otoolep commented Aug 31, 2015

Thanks @takayuki

@jwilder definitely needs to see this. I understand the change to the write path code, but @jwilder needs to confirm that it is the right thing to do.

@jwilder
Contributor

jwilder commented Sep 1, 2015

Moving the wrote >= required check to the upper loop is the correct fix. I was able to verify that a slow writer could cause a write with consistency any, one, or quorum to block and return a timeout even after it had successfully written to the required number of nodes. If a node was down (and we could not connect), the current code would return as quickly as possible.
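
Conceptually, the placement jwilder describes looks like the sketch below (illustrative names and signature, not the actual PointsWriter.writeToShard code): the wrote >= required check sits in the loop that collects responses, so the caller returns as soon as enough owners acknowledge the write instead of waiting on every owner.

    package sketch

    import (
        "errors"
        "time"
    )

    var errTimeout = errors.New("write timeout") // assumed sentinel error, for illustration

    // waitForAcks collects responses from owners that were written to in
    // parallel and returns as soon as `required` of them succeed; a slow or
    // dead owner no longer delays the caller.
    func waitForAcks(ch <-chan error, owners, required int, timeout <-chan time.Time) error {
        var wrote int
        var writeError error
        for i := 0; i < owners; i++ {
            select {
            case <-timeout:
                return errTimeout
            case err := <-ch:
                if err != nil {
                    writeError = err // remember the failure, keep collecting
                    continue
                }
                wrote++
                if wrote >= required {
                    return nil // enough acknowledgements; return immediately
                }
            }
        }
        // Every owner responded, but too few succeeded.
        return writeError
    }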

@jwilder
Contributor

jwilder commented Sep 1, 2015

@takayuki If you could rebase, revert the looping change, and make sure you've signed the CLA, this change should be good to go. Thanks for the fix!

@takayuki
Author

takayuki commented Sep 2, 2015

Thank you @jwilder. Would you spend a little more time with me while I explain the concern I originally had?

There are two loops involved here: one starts the goroutines, and the other waits for them to complete their tasks. The problem, I thought, was that the second loop might receive the results in a different order from the one in which the goroutines were started. That's why I added AsyncWriteResult to handle out-of-order arrivals.

Here is a conceptual example, taking the suggestion from @corylanou: the first loop starts node#1, then node#2, whereas the second loop receives the result from node#2 first and node#1 later. I thought PointsWriter would have to take that into account.

    package main

    import (
        "fmt"
        "time"
    )

    func main() {
        ownerIDs := [2]int{1, 2}
        ch := make(chan int, len(ownerIDs))
        for _, nodeID := range ownerIDs {
            go func(nodeID int) {
                // node#1 sleeps for 10 secs, node#2 for 5 secs,
                // so node#2 is going to finish earlier.
                time.Sleep(time.Duration(10/nodeID) * time.Second)
                ch <- nodeID
            }(nodeID)
        }
        for range ownerIDs {
            // node#2 arrives earlier, then node#1 later.
            fmt.Println(<-ch)
        }
    }

Takayuki Usui added 2 commits on September 2, 2015 at 11:00

This patch introduces an option in influx allowing users to select the write consistency level, which so far has been implicitly set to 'any' by default.

Immediately return once the required number of writes have completed; otherwise, requests running with relaxed consistency levels (e.g. any or one) could block unexpectedly, for instance waiting for dead nodes to respond.
@otoolep
Contributor

otoolep commented Sep 2, 2015

Yeah, that's what I thought too @takayuki -- the new loop made sense to me.

@otoolep
Contributor

otoolep commented Sep 2, 2015

I've confirmed @takayuki has signed the CLA.

takayuki force-pushed the unblock-relaxed-write-consistency-level branch from 5d5a650 to da8efa5 on September 2, 2015 at 06:35
@jwilder
Contributor

jwilder commented Sep 2, 2015

@takayuki @otoolep The order in which the responses arrive does not matter from a write perspective. This code writes to all shard owners in parallel and returns as soon as it receives the required number of successful responses.

It does look like there is a bug in the logging here though: https://github.com/influxdb/influxdb/pull/3901/files#diff-99b560e650ccc1fa3f20c23df5a9b41bL287, which might be what the looping change was intended to fix. It could log the wrong NodeID if the results arrive out of order. That is not right, so I think your looping change makes sense in light of that issue.
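
A minimal sketch of the idea behind pairing each result with its node (illustrative names, not the exact AsyncWriteResult type from the PR): when responses can arrive in any order, the result itself has to say which owner produced it, since pairing results with shard.OwnerIDs by loop index can attribute a failure to the wrong node.

    package sketch

    import "log"

    // asyncWriteResult pairs a write outcome with the owner that produced it
    // (illustrative names, not the exact type from the PR).
    type asyncWriteResult struct {
        ownerID uint64
        err     error
    }

    // collect logs failures against the node reported by the result itself,
    // which stays correct even when results arrive out of order.
    func collect(ch <-chan asyncWriteResult, owners int) {
        for i := 0; i < owners; i++ {
            res := <-ch
            if res.err != nil {
                log.Printf("write to node %d failed: %v", res.ownerID, res.err)
            }
        }
    }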

jwilder added a commit that referenced this pull request on Sep 2, 2015: Unblock relaxed write consistency level
jwilder merged commit 1464ee5 into influxdata:master on Sep 2, 2015
@jwilder
Contributor

jwilder commented Sep 2, 2015

Thanks @takayuki!

@otoolep
Contributor

otoolep commented Sep 2, 2015

Yes, your second point is the one @takayuki and I were making; it was just a logging issue.
