Data may be stored on only a single server shortly after startup #10657
TiDB 3.0.0-rc.2, by design, starts up with a region with only a single replica, regardless of the configured target number of replicas. PD then gradually adds additional replicas until the target count is reached.
This is not a problem in itself, but it does lead to an awkward possibility: in the early stages of a TiDB cluster, data may be acknowledged, but stored only on a single node, when the user expected that data to be replicated to multiple nodes. A single-node failure during that period could destroy acknowledged writes, or render the cluster partly, or totally, unusable. In our tests with Jepsen (which use small regions and are extra-sensitive to this phenomenon), a single-node network partition as late as 500 seconds into the test can result in a total outage, because some regions are only replicated to 1 or 2, rather than 3, nodes.
The configuration parameter for replica count is called `max-replicas`.
I'd also like to suggest that when a cluster has a configured replica count, TiDB should disallow transactions on regions which don't have at least that many replicas in their Raft group. That'd prevent the possibility of a single-node failure destroying committed data, which is something I'm pretty sure users don't expect to be possible!
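To make the proposal concrete, here's a rough sketch of what such a guard might look like. This is purely illustrative -- `Region`, `check_writable`, and `UnderReplicatedError` are hypothetical names, not TiDB internals -- but it captures the idea: refuse to commit on a region whose Raft group is smaller than the configured target.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """Illustrative stand-in for a region: an ID plus its current Raft peers."""
    id: int
    peers: list  # node IDs currently in the Raft group


class UnderReplicatedError(Exception):
    """Raised when a write would land on an under-replicated region."""


def check_writable(region: Region, max_replicas: int) -> None:
    """Reject transactions on regions with fewer replicas than the target,
    so a single-node failure can't destroy acknowledged writes."""
    if len(region.peers) < max_replicas:
        raise UnderReplicatedError(
            f"region {region.id} has {len(region.peers)} of "
            f"{max_replicas} replicas")
```

With this guard, a write to a freshly bootstrapped single-replica region would fail fast instead of being acknowledged and silently at risk.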
I've been experimenting with different settings to see how long it takes for all regions to be fully replicated. With the settings we were using, full replication could take 80+ minutes.
In this case, it looks like most regions got to 3 or 2 replicas early in the test, but for some reason, new regions keep getting created, and old ones destroyed, while no progress is being made on the original regions. I've set up jepsen 0268340 to log region IDs and replica counts: in an hour and twenty minutes, we go from many under-replicated regions down to a single straggler.
That final region, id=4055, gets replaced by a new region with a higher ID every few seconds. I'm not sure why this is the case--we're not actually making any writes, or even connecting clients to this cluster at this point. It looks pretty well stuck. Full logs are here: 20190531T121838.000-0400.zip
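For anyone trying to reproduce this kind of analysis, a quick way to summarize replication progress is to bucket regions by replica count. This sketch assumes a list of region dicts each carrying a `peers` list (the shape I believe PD's region API returns, but verify against your PD version):

```python
from collections import Counter


def replica_histogram(regions):
    """Map replica count -> number of regions with that many peers.
    Assumes each region dict has a "peers" list; missing keys count as 0."""
    return Counter(len(r.get("peers", [])) for r in regions)
```

Watching this histogram over time makes the stuck state obvious: the bucket for 1- or 2-replica regions should drain to zero, and here it never does.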
With Jepsen 0268340, try something like this to reproduce. It may take a few runs--it doesn't seem to get stuck every time.
@aphyr How do you check the bootstrap step? It takes 80+ minutes for you, but in the logs I found it took just one minute--for example, region 46:
You can grep the PD log for the region ID to check this.
A better way to bootstrap the cluster is to wait for the first region to bring its replica count up to the configured target before proceeding.
We check for bootstrapping by performing an HTTP GET against PD's region API.
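A blocking version of that check might look like the sketch below. It assumes PD's `/pd/api/v1/regions` endpoint and a response body with a `regions` list whose entries carry `peers` -- check those names against your PD version, as this is an assumption, not the actual Jepsen setup code:

```python
import json
import time
import urllib.request


def fully_replicated(regions, target):
    """True when every region's Raft group has at least `target` peers."""
    return all(len(r.get("peers", [])) >= target for r in regions)


def wait_for_replication(pd_url, target=3, timeout=600.0, poll=5.0):
    """Poll PD's region API until all regions reach the target replica
    count, or raise after `timeout` seconds."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        with urllib.request.urlopen(f"{pd_url}/pd/api/v1/regions") as resp:
            body = json.load(resp)
        if fully_replicated(body.get("regions") or [], target):
            return
        time.sleep(poll)
    raise TimeoutError("regions never reached the target replica count")
```

Blocking on something like `wait_for_replication("http://pd1:2379")` before starting TiDB (or before opening the cluster to clients) would close the window where acknowledged writes live on a single node.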
Region 46 starting quickly is great, but I'm concerned that the highest region never seems to stabilize--do you know what might be going on there?
OK! I'll rewrite the setup code to block before starting TiDB. Do you think TiDB running could be preventing the final region from converging?
the final region is always the last range, covering keys from the last split point to the end of the keyspace