Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

*: run network fault tests with proxy #9081

Merged
merged 15 commits into from
Feb 6, 2018

Conversation

gyuho
Copy link
Contributor

@gyuho gyuho commented Jan 2, 2018

Problem

Current functional-tester depends on Linux utilities: iptables, iproute2, and tc. Which complicates and slows down tester deployments (require root access, impossible to test locally, hard to containerize, etc.). Also we never test real network partitions in CIs. We should simulate common network conditions to simplify test workflow (ideally package everything in container image).

Solution (what this PR does)
  • Implements proxy/transport.Proxy that simulates network faults, also useful for clientv3 integration network partition tests(Test clientv3 balancer under network partitions, other failures #8711) with the future embed package migration(use embed in integration package #8416).
  • Proxy with very small overhead (<500μs per request), negligible for etcd use cases (see benchmark results in tools/etcd-test-proxy/README.md).
  • functional-tester routes incoming advertise URL traffic to listen URL, to interfere peer/client traffic (no more iptables).
  • Containerizes functional-tester components (now just docker pull and run).
  • CI injects network partitions to functional-tester cluster (previously couldn't).
Other Changes

When network faults are injected, we expect cluster disrupts after election timeouts. Currently there's no waiting before recovering injected failures e.g. add latency and remove right away; nothing was being tested. failureDelay wrapper was there but never used. This PR adds delays to trigger leader elections, and ensures cluster continue to operate under such circumstances.

Future Work

@gyuho
Copy link
Contributor Author

gyuho commented Jan 12, 2018

@hexfusion @spzala Can you take a look when you have time?

The plan is merge this after 3.3 release (early February), so not urgent.

Thanks!

@spzala
Copy link
Member

spzala commented Jan 14, 2018

@gyuho thanks and sure, in mid of understanding couple other issues related work but will be reviewing afterwards considering no urgency here.

@gyuho
Copy link
Contributor Author

gyuho commented Jan 19, 2018

Also cc @jpbetz. Can you take a look?

This is first step to make functional-tester deploy easier.
Once this gets merged after 3.3 release, I will document container-based deploy workflow.

Copy link
Contributor

@jpbetz jpbetz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for getting this done! A couple suggestions.

@@ -23,11 +23,32 @@ clean:
rm -f ./clientv3/integration/127.0.0.1:* ./clientv3/integration/localhost:*
rm -f ./clientv3/ordering/127.0.0.1:* ./clientv3/ordering/localhost:*



_GO_VERSION = 1.9.2
ifdef GO_VERSION
_GO_VERSION = $(GO_VERSION)
endif
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In make, just GO_VERSION?=1.9.2 should be all you need to have a configurable env var with a default. The ifdef and _* var appear unnecessary here. Same for a few other of the below vars.

build-docker-functional-tester:
$(info GO_VERSION: $(_GO_VERSION))
$(info ETCD_VERSION: $(_ETCD_VERSION))
@cat ./Dockerfile-functional-tester | sed s/REPLACE_ME_GO_VERSION/$(_GO_VERSION)/ \
Copy link
Contributor

@jpbetz jpbetz Jan 19, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just FYI, sed -i.bak 's|TEMPLATE_VAR|$(REPLACEMENT)|g' filename also works and is slightly more compact. Not sure if we care.

Or

```bash
$ rm -rf `pwd`/agent-1 && mkdir -p `pwd`/agent-1 && docker run \
Copy link
Contributor

@jpbetz jpbetz Jan 19, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is quite a bit of script-style code. Should we maintain it in .sh files somewhere instead of in the README and reference the scripts from the README? Same for below.

var httpPort int
var verbose bool

func main() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we care about TLS? Maybe future work?

flag.StringVar(&to, "to", "localhost:2379", "Address URL to forward.")
flag.IntVar(&httpPort, "http-port", 2378, "Port to serve etcd-test-proxy API.")
flag.BoolVar(&verbose, "verbose", false, "'true' to run proxy in verbose mode.")
flag.Parse()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a short program description that is output with help?

@gyuho gyuho added the WIP label Jan 19, 2018
@gyuho gyuho force-pushed the network-fault-test-with-proxy branch from 2e370f0 to 0915c00 Compare January 24, 2018 22:15
@gyuho gyuho removed the WIP label Jan 24, 2018
@gyuho
Copy link
Contributor Author

gyuho commented Jan 24, 2018

@jpbetz All fixed. PTAL.

Thanks!

Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
@gyuho gyuho force-pushed the network-fault-test-with-proxy branch from 0915c00 to c61b660 Compare January 25, 2018 01:37
@etcd-io etcd-io deleted a comment from codecov-io Jan 25, 2018
@spzala
Copy link
Member

spzala commented Feb 5, 2018

@gyuho tons of changes :) ..looks good to me but I am gonna try to test using the readme you have provided, if that's helpful. Thanks!

@gyuho
Copy link
Contributor Author

gyuho commented Feb 5, 2018

@spzala Easiest way to run is PASSES=functional ./test.

Copy link
Contributor

@jpbetz jpbetz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Thanks!

@gyuho gyuho merged commit 3bd0114 into etcd-io:master Feb 6, 2018
@gyuho gyuho deleted the network-fault-test-with-proxy branch February 6, 2018 17:51
@spzala
Copy link
Member

spzala commented Feb 6, 2018

@gyuho just fyi, I have run the test successfully in my env. Thanks!

....
Finished 'functional' pass at Tue Feb  6 20:16:33 UTC 2018
Success

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants