
Validate HA configs #11

Open
adieu opened this issue May 14, 2017 · 9 comments

adieu (Collaborator) commented May 14, 2017

There should be no service disruption when:

  • The master goes down
  • A node goes offline
  • A node gets OOM-killed
  • A node is drained
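
A minimal sketch of how "no service disruption" could be observed during these scenarios, assuming a test application exposed at a known HTTP endpoint; the URL, interval, and window below are illustrative values, not anything defined by Archon:

```python
# disruption_probe.py - poll a service endpoint during failure injection and
# record any availability gaps. All names and values here are illustrative.
import time
import urllib.request

SERVICE_URL = "http://test-app.example.com/healthz"  # assumed test endpoint
INTERVAL = 1.0    # seconds between probes
DURATION = 300    # total observation window in seconds

def probe(url, timeout=2):
    """Return True if the service answered with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except Exception:
        return False

def watch(url=SERVICE_URL, interval=INTERVAL, duration=DURATION):
    """Probe the endpoint for `duration` seconds; return (start, end) outage windows."""
    outages = []
    outage_start = None
    deadline = time.time() + duration
    while time.time() < deadline:
        now = time.time()
        if probe(url):
            if outage_start is not None:
                outages.append((outage_start, now))
                outage_start = None
        elif outage_start is None:
            outage_start = now
        time.sleep(interval)
    if outage_start is not None:
        outages.append((outage_start, time.time()))
    return outages

if __name__ == "__main__":
    for start, end in watch():
        print(f"service unavailable for {end - start:.1f}s")
```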
ledzep2 (Collaborator) commented May 15, 2017

Some thoughts on this one:

We need a test system that performs continuous tests against a cluster definition (e.g. our examples), especially HA tests:

  • Build a test cluster with certain configurations to test different scenarios
  • Automated build of master and nodes (maybe dynamic addition/removal of nodes too)
  • Automated deployment of some test applications
  • Disable some servers or components to simulate a failure
  • Perform a load test in the meantime and observe; check for any service disruption (primary) and gather performance stats (secondary)
  • Tear down the test cluster

Part of this sounds out of Archon's reach, so it's potentially a new tool: maybe a script that makes use of Archon, maybe your new test framework too.
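
A rough skeleton of that flow as a driver script; the `make build-test-cluster` / `make destroy-test-cluster` targets are placeholders, since nothing here specifies how Archon actually provisions or destroys a cluster:

```python
# ha_test_run.py - skeleton of the flow above. The build and teardown
# commands are placeholders for whatever Archon invocation actually
# provisions the master and nodes.
import subprocess

def sh(*cmd):
    """Echo a command and run it, failing loudly on a non-zero exit."""
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

def run_scenario(name, inject_failure):
    """One full pass: build, deploy, break something, observe, tear down."""
    print(f"=== scenario: {name} ===")
    sh("make", "build-test-cluster")                # placeholder: master + nodes
    try:
        sh("kubectl", "apply", "-f", "test-apps/")  # deploy the test applications
        inject_failure()                            # e.g. stop a master component
        # ...run the load test / disruption probe here and collect stats...
    finally:
        sh("make", "destroy-test-cluster")          # placeholder: full teardown
```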

adieu (Collaborator, Author) commented May 16, 2017

Will https://github.com/kubernetes/test-infra be helpful?

ledzep2 (Collaborator) commented May 16, 2017

Possibly.

But due to the lack of documentation, we have limited knowledge of what it is and how it should be used. For a deployed cluster, the most useful tests are the e2e tests, which can be run remotely and are perfect for this case. However, after looking through the conformance test code, I suspect they don't really cover the issues listed above.
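
For reference, pointing the conformance subset of the e2e suite at an already-deployed cluster looks roughly like the sketch below; the binary location, the kubeconfig path, and even the exact flag spellings are assumptions that can differ between Kubernetes releases:

```python
# run_conformance.py - run the conformance-tagged e2e specs against a remote
# cluster. Verify the flags against the release you built e2e.test from.
import subprocess

subprocess.run([
    "./e2e.test",                          # built from test/e2e in the kubernetes repo
    "--kubeconfig=/path/to/kubeconfig",    # points at the deployed cluster
    "--ginkgo.focus=\\[Conformance\\]",    # restrict to conformance-tagged specs
], check=True)
```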

adieu (Collaborator, Author) commented May 16, 2017

Can we build something like Chaos Monkey to simulate failures while watching overall service health? We could leave it running for a while and gather some stats.

I think all the failure cases can be covered by e2e tests. Maybe we just need a way to detect service disruption.
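
The injection half of that could be as simple as the following sketch, assuming kubectl access to the cluster; node choice, round count, and settle time are arbitrary illustrative values, and the drain flags should be checked against your kubectl version:

```python
# chaos_drain.py - repeatedly drain a random node, wait, then uncordon it,
# while something like the probe sketched earlier watches the service.
import json
import random
import subprocess
import time

def node_names():
    """All node names via `kubectl get nodes -o json` (no master filtering here)."""
    out = subprocess.check_output(["kubectl", "get", "nodes", "-o", "json"])
    return [item["metadata"]["name"] for item in json.loads(out)["items"]]

def chaos_round(settle=120):
    """Drain one random node, wait, then bring it back."""
    node = random.choice(node_names())
    subprocess.run(["kubectl", "drain", node,
                    "--ignore-daemonsets", "--delete-local-data"], check=True)
    time.sleep(settle)                     # let the probe observe the gap, if any
    subprocess.run(["kubectl", "uncordon", node], check=True)

if __name__ == "__main__":
    for _ in range(10):                    # ten rounds of failure injection
        chaos_round()
```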

ledzep2 (Collaborator) commented May 16, 2017

Yes. But that's only one of the steps. If we want to automate the whole thing, we will need to automate the setup of the cluster in a real environment (Aliyun or AWS) with Archon, plus the deployment of all these test-related tools. Jenkins is an obvious option. Maybe we can start by automating the setup of a staging 3-node cluster on Aliyun and running e2e tests on it.

adieu (Collaborator, Author) commented May 17, 2017

How about we set up a cluster with Archon running in it? Each test case could define its own cluster with YAML files. In the Jenkinsfile, it would set up a new cluster with kubectl, then launch the tests in the newly created cluster and collect the results. After the tests end, it could tear down the cluster using kubectl delete.
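
Under that model, each test case's stage would reduce to something like this sketch (in Python rather than a Jenkinsfile for brevity); the readiness probe and test command are caller-supplied stand-ins, and cluster.yaml stands for the case's own Archon cluster definition:

```python
# testcase_lifecycle.py - per-test-case flow: create the cluster from the
# case's own YAML through the management cluster, test, always delete.
import subprocess

def run_case(cluster_yaml, ready_check, test_cmd):
    """Create the cluster, wait for it, run the tests, always tear down."""
    subprocess.run(["kubectl", "apply", "-f", cluster_yaml], check=True)
    try:
        ready_check()                            # poll until the new cluster is up
        subprocess.run(test_cmd, check=True)     # e.g. the e2e invocation above
    finally:
        subprocess.run(["kubectl", "delete", "-f", cluster_yaml], check=True)
```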

ledzep2 (Collaborator) commented May 17, 2017

Why not just use --local?

adieu (Collaborator, Author) commented May 17, 2017

Because then we wouldn't have to set up credentials locally?

ledzep2 (Collaborator) commented May 17, 2017

I think it's acceptable to put a credential with limited privileges in the CI system for testing purposes. However, if the intermediate cluster you mentioned is also part of what's being tested, then it makes some sense.
