-
Notifications
You must be signed in to change notification settings - Fork 9.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix Blackhole implementation for e2e tests #17938
base: main
Are you sure you want to change the base?
Fix Blackhole implementation for e2e tests #17938
Conversation
Hi @henrybear327. Thanks for your PR. I'm waiting for a etcd-io member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
9887b13
to
3e15f96
Compare
e94e96d
to
48d1936
Compare
/retitle Fix Blackhole implementation for e2e tests |
48d1936
to
6158507
Compare
Not a networking expert, would like to get some help. cc @aojea |
763b0d3
to
465b00c
Compare
Thanks to Fu Wei for the input regarding etcd-io#17938. [Problem] A peer will (a) receive traffic from its peers (b) initiate connections to its peers (via stream and pipeline). Thus, the current mechanism of only blocking peer traffic via the peer's existing proxy is insufficient, since only scenario (a) is handled, and scenario (b) is not blocked at all. [Main idea] We introduce 1 shared "HTTP proxy" for all peers. All peers will be proxying all the connections through it. The modified architecture will look something like this: ``` A -- shared HTTP proxy ----- B ^ newly introduced ``` By adding this HTTP proxy, we can block all in and out traffic that is initiated from a peer to others, without having to resort to external tools, such as iptables. It's verified that the blocking of traffic is complete, compared to previous solutions [2][3]. [Implementation] The main subtasks are - set up an environment variable `FORWARD_PROXY`, because go will not parse HTTP_PROXY and HTTPS_PROXY that is using localhost or 127.0.0.1, regardless if the port is present or not - implement the shared HTTP proxy by extending the existing proxy server code - remove existing proxy setup (the per-peer proxy) - implement enable/disable of the HTTP proxy in the e2e test [Testing] - `make gofail-enable && make build && make gofail-disable && \ go test -timeout 60s -run ^TestBlackholeByMockingPartitionLeader$ go.etcd.io/etcd/tests/v3/e2e -v -count=1` - `make gofail-enable && make build && make gofail-disable && \ go test -timeout 60s -run ^TestBlackholeByMockingPartitionFollower$ go.etcd.io/etcd/tests/v3/e2e -v -count=1` [References] [1] Tracking issue etcd-io#17737 [2] PR (V1) https://github.com/henrybear327/etcd/tree/fix/e2e_blackhole [3] PR (V2) etcd-io#17891 [4] PR (V3) etcd-io#17938 Signed-off-by: Chun-Hung Tseng <henrybear327@gmail.com>
Thanks to Fu Wei for the input regarding etcd-io#17938. [Problem] A peer will (a) receive traffic from its peers (b) initiate connections to its peers (via stream and pipeline). Thus, the current mechanism of only blocking peer traffic via the peer's existing proxy is insufficient, since only scenario (a) is handled, and scenario (b) is not blocked at all. [Main idea] We introduce 1 shared HTTP proxy for all peers. All peers will be proxying all the connections through it. The modified architecture will look something like this: ``` A -- shared HTTP proxy ----- B ^ newly introduced ``` By adding this HTTP proxy, we can block all in and out traffic that is initiated from a peer to others, without having to resort to external tools, such as iptables. It's verified that the blocking of traffic is complete, compared to previous solutions [2][3]. [Implementation] The main subtasks are - set up an environment variable `FORWARD_PROXY`, because go will not parse HTTP_PROXY and HTTPS_PROXY that is using localhost or 127.0.0.1, regardless if the port is present or not - implement the shared HTTP proxy by extending the existing proxy server code (we need to be able to identify the source sender) - remove existing proxy setup (the per-peer proxy) - implement enable/disable of the HTTP proxy in the e2e test [Testing] - `make gofail-enable && make build && make gofail-disable && \ go test -timeout 60s -run ^TestBlackholeByMockingPartitionLeader$ go.etcd.io/etcd/tests/v3/e2e -v -count=1` - `make gofail-enable && make build && make gofail-disable && \ go test -timeout 60s -run ^TestBlackholeByMockingPartitionFollower$ go.etcd.io/etcd/tests/v3/e2e -v -count=1` [References] [1] Tracking issue etcd-io#17737 [2] PR (V1) https://github.com/henrybear327/etcd/tree/fix/e2e_blackhole [3] PR (V2) etcd-io#17891 [4] PR (V3) etcd-io#17938 Signed-off-by: Chun-Hung Tseng <henrybear327@gmail.com>
Talked with @aojea, proposed L7 forward proxy is an improvement over the previous faulty approach, however it might not be necessarily what we want. We want to simulate a realistic network disruptions, and there is no way to do that using a L7 proxy. This is currently impossible with current setup where etcd processes communicate via localhost, to modify packets we need them to go through kernel network stack, and that's not possible if they share the same network stack. Unfortunately setting dedicated network stack this is pretty complicated, @aojea suggest using docker which sets it up for each container, however I'm worried about how much using docker will complicate etcd e2e testing. I would prefer to set up network ourselves, however this makes it less portable. Main benefit of using docker is portability across platforms. As we won't be able to build such solution anytime soon, I'm ok with still using L7 http proxy in the meantime. |
@henrybear327: The following tests failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
a673a34
to
7357f81
Compare
- Remove To and From fields, as this is only used in reverse proxy - Add Listen field, to indicate which URL that the forward is listening for requests - Update the tests as required TODO - Figure out why DelayTx() isn't working anymore Reference: - etcd-io#17938 (comment) Signed-off-by: Chun-Hung Tseng <henrybear327@gmail.com>
7357f81
to
408dc8a
Compare
- Remove To and From fields, as this is only used in reverse proxy - Add Listen field, to indicate which URL that the forward is listening for requests - Update the tests as required TODO - Figure out why DelayTx() isn't working anymore Reference: - etcd-io#17938 (comment) Signed-off-by: Chun-Hung Tseng <henrybear327@gmail.com>
48c5035
to
c945f43
Compare
- Remove To and From fields, as this is only used in reverse proxy - Add Listen field, to indicate which URL that the forward is listening for requests - Update the tests as required TODO - Figure out why DelayTx() isn't working anymore Reference: - etcd-io#17938 (comment) Signed-off-by: Chun-Hung Tseng <henrybear327@gmail.com>
Signed-off-by: Siyuan Zhang <sizhang@google.com>
Shorten the wait time for the open connection to expire to 5s Signed-off-by: Chun-Hung Tseng <henrybear327@gmail.com>
Based on Fu Wei's idea discussed in the [issue](etcd-io#17737), we employ the blocking on L7 but without using external tools. [Background] A peer will (a) receive traffic from its peers (b) initiate connections to its peers (via stream and pipeline). Thus, the current mechanism of only blocking peer traffic via the peer's existing proxy is insufficient, since only scenario (a) is handled, and scenario (b) is not blocked at all. [Proposed solution] We introduce an forward proxy for each peer, which will be proxying all the connections initiated from a peer to its peers. The modified architecture will look something like this: ``` A -- A's forward proxy ----- B's reverse proxy - B ^ newly introduced ^ in the original codebase (renamed) ``` By adding this forward proxy, we can block all in and out traffic that is initiated from a peer to others, without having to resort to external tools, such as iptables. [Testing] - `make gofail-enable && make build && make gofail-disable && \ go test -timeout 60s -run ^TestBlackholeByMockingPartitionLeader$ go.etcd.io/etcd/tests/v3/e2e -v -count=1` - `make gofail-enable && make build && make gofail-disable && \ go test -timeout 60s -run ^TestBlackholeByMockingPartitionFollower$ go.etcd.io/etcd/tests/v3/e2e -v -count=1` [Issues] - I run into `context deadline exceeded` sometimes ``` etcd_mix_versions_test.go:175: Error Trace: /Users/henrybear327/go/src/etcd/tests/e2e/etcd_mix_versions_test.go:175 /Users/henrybear327/go/src/etcd/tests/e2e/blackhole_test.go:75 /Users/henrybear327/go/src/etcd/tests/e2e/blackhole_test.go:31 Error: Received unexpected error: [/Users/henrybear327/go/src/etcd/bin/etcdctl --endpoints=http://localhost:20006 put key-0 value-0] match not found. Set EXPECT_DEBUG for more info Errs: [unexpected exit code [1] after running [/Users/henrybear327/go/src/etcd/bin/etcdctl --endpoints=http://localhost:20006 put key-0 value-0]], last lines: {"level":"warn","ts":"2024-05-05T23:02:36.809726+0800","logger":"etcd-client","caller":"v3@v3.6.0-alpha.0/retry_interceptor.go:65","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0x140001ee960/localhost:20006","method":"/etcdserverpb.KV/Put","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"} Error: context deadline exceeded (expected "OK", got []). Try EXPECT_DEBUG=TRUE Test: TestBlackholeByMockingPartitionLeader Messages: failed to put "key-0", error: [/Users/henrybear327/go/src/etcd/bin/etcdctl --endpoints=http://localhost:20006 put key-0 value-0] match not found. Set EXPECT_DEBUG for more info Errs: [unexpected exit code [1] after running [/Users/henrybear327/go/src/etcd/bin/etcdctl --endpoints=http://localhost:20006 put key-0 value-0]], last lines: {"level":"warn","ts":"2024-05-05T23:02:36.809726+0800","logger":"etcd-client","caller":"v3@v3.6.0-alpha.0/retry_interceptor.go:65","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0x140001ee960/localhost:20006","method":"/etcdserverpb.KV/Put","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"} Error: context deadline exceeded (expected "OK", got []). Try EXPECT_DEBUG=TRUE ``` [References] [1] issue etcd-io#17737 [2] PR (V1) https://github.com/henrybear327/etcd/tree/fix/e2e_blackhole [3] PR (V2) etcd-io#17891 Signed-off-by: Chun-Hung Tseng <henrybear327@gmail.com> It's verified that the blocking of traffic is complete, compared to previous solutions [2][3]. [Implementation] The main subtasks are - set up an environment variable `E2E_TEST_FORWARD_PROXY_IP` - implement forward proxy by extending the existing proxy server code - implement enable/disable of the forward proxy in the e2e test The result is that for every peer, we will have the arch like this ``` A -- A's forward proxy (connections initiated from A will be forwarded from this proxy) | ^ covers case (b) | --- A's (currently existing) reverse proxy (advertised to other peers where the connection should come in from) ^ covers case (a) ```
Signed-off-by: Chun-Hung Tseng <henrybear327@gmail.com>
Signed-off-by: Chun-Hung Tseng <henrybear327@gmail.com> Co-authored-by: Iván Valdés Castillo <iv@nvald.es>
There is no need of reverse proxy after the introduction of forward proxy, as the forward proxy holds the information of the destination, we can filter connection traffic from there, and this can reduce 1 hop when the proxy is turned on. Signed-off-by: Chun-Hung Tseng <henrybear327@gmail.com>
Due to forward proxy's need to parse the CONNECT header, which is a L7 layer feature, thus we are splitting the proxy into 2 types, for better maintainability. Reference: - etcd-io#17985 (comment) Signed-off-by: Chun-Hung Tseng <henrybear327@gmail.com>
…witch to L7 proxy design If the first request is not a CONNECT, we will need to receive perfectly parsable HTTP packets, not just random byte streams, as Serve() will leverage ReadRequest() under the hood. Signed-off-by: Chun-Hung Tseng <henrybear327@gmail.com>
Signed-off-by: Chun-Hung Tseng <henrybear327@gmail.com>
We are using Serve(), which will be hugging the listener on our behave, so we can't randomly terminate the listener anymore Signed-off-by: Chun-Hung Tseng <henrybear327@gmail.com>
- Remove To and From fields, as this is only used in reverse proxy - Add Listen field, to indicate which URL that the forward is listening for requests - Update the tests as required TODO - Figure out why DelayTx() isn't working anymore Reference: - etcd-io#17938 (comment) Signed-off-by: Chun-Hung Tseng <henrybear327@gmail.com>
…ttp/https/and socks5) - It's L7 so we need to send perfectly crafted http request - Bug fix: Doing var httpTransportProxyParsingFunc = determineHTTPTransportProxyParsingFunc() will initialize the function once only (env var read on program init) Signed-off-by: Chun-Hung Tseng <henrybear327@gmail.com>
… sent first (containing the intended destination HOST). If the traffic to the destination is HTTP, no CONNECT request will be sent first, but normal HTTP request, with the HOST set to the final destination, will be sent. Signed-off-by: Chun-Hung Tseng <henrybear327@gmail.com>
Signed-off-by: Chun-Hung Tseng <henrybear327@gmail.com>
c945f43
to
718a960
Compare
Not fully ported over yet Signed-off-by: Chun-Hung Tseng <henrybear327@gmail.com>
718a960
to
f6c6629
Compare
Please read https://github.com/etcd-io/etcd/blob/main/CONTRIBUTING.md#contribution-flow.
Based on Fu Wei's idea discussed in the issue, we employ the blocking on L7 but without using external tools.
Problem
A peer will
(a) receive traffic from its peers
(b) initiate connections to its peers (via stream and pipeline).
Thus, the current mechanism of only blocking peer traffic via the peer's existing proxy is insufficient, since only scenario (a) is handled, and scenario (b) is not blocked at all.
Solution - main idea
We introduce an "HTTP proxy" for each peer, which will be proxying all the connections initiated from a peer to its peers.
The modified architecture will look something like this:
By adding this HTTP proxy, we can block all in and out traffic that is initiated from a peer to others, without having to resort to external tools, such as iptables. It's verified that the blocking of traffic is complete, compared to previous solutions [2][3].
Implementation
The main subtasks are
FORWARD_PROXY
: please read Fix Blackhole implementation for e2e tests #17938 (comment)The result is that for every peer, we will have the arch like this
Test
make gofail-enable && make build && make gofail-disable && \ go test -timeout 60s -run ^TestBlackholeByMockingPartitionLeader$ go.etcd.io/etcd/tests/v3/e2e -v -count=1
make gofail-enable && make build && make gofail-disable && \ go test -timeout 60s -run ^TestBlackholeByMockingPartitionFollower$ go.etcd.io/etcd/tests/v3/e2e -v -count=1
Known issue
context deadline exceeded
sometimes[References]
[1] issue #17737
[2] PR (V1) https://github.com/henrybear327/etcd/tree/fix/e2e_blackhole
[3] PR (V2) #17891