Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] Fix Blackhole implementation for e2e tests #17950

Conversation

henrybear327
Copy link
Contributor

@henrybear327 henrybear327 commented May 6, 2024

Please read https://github.com/etcd-io/etcd/blob/main/CONTRIBUTING.md#contribution-flow.

Thanks to @fuweid for the input regarding #17938.

This is currently a PoC for using just 1 HTTP proxy to deal with blackhole requirements.

Problem

A peer will
(a) receive traffic from its peers
(b) initiate connections to its peers (via stream and pipeline).

Thus, the current mechanism of only blocking peer traffic via the peer's existing proxy is insufficient, since only scenario (a) is handled, and scenario (b) is not blocked at all.

Main idea

We introduce 1 shared HTTP proxy for all peers, where all peers will be proxying all the connections through it.

Instead of the current per-peer proxy design, and having to have separated listen and advertise ports.

The modified architecture will look something like this:

A -- shared HTTP proxy ----- B
        ^ newly introduced

By adding this HTTP proxy, we can block all in and out traffic that is initiated from a peer to others, without having to resort to external tools, such as iptables. It's verified that the blocking of traffic is complete, compared to previous solutions [2][3].

Implementation

The main subtasks are

  • set up an environment variable FORWARD_PROXY, because go will not parse HTTP_PROXY and HTTPS_PROXY that is using localhost or 127.0.0.1, regardless if the port is present or not
  • implement the shared HTTP proxy by extending the existing proxy server code (we need to be able to identify the source sender)
  • remove the existing proxy setup (the per-peer proxy)
  • implement enable/disable of the HTTP proxy in the e2e test

Testing

  • make gofail-enable && make build && make gofail-disable && \ go test -timeout 60s -run ^TestBlackholeByMockingPartitionLeader$ go.etcd.io/etcd/tests/v3/e2e -v -count=1
  • make gofail-enable && make build && make gofail-disable && \ go test -timeout 60s -run ^TestBlackholeByMockingPartitionFollower$ go.etcd.io/etcd/tests/v3/e2e -v -count=1

References

[1] Tracking issue #17737
[2] PR (V1) https://github.com/henrybear327/etcd/tree/fix/e2e_blackhole
[3] PR (V2) #17891
[4] PR (V3) #17938

siyuanfoundation and others added 5 commits May 5, 2024 23:23
Signed-off-by: Siyuan Zhang <sizhang@google.com>
Shorten the wait time for the open connection to expire to 5s

Signed-off-by: Siyuan Zhang <sizhang@google.com>
Signed-off-by: Chun-Hung Tseng <henrybear327@gmail.com>
Based on Fu Wei's idea discussed in the [issue](etcd-io#17737), we employ the blocking on L7 but without using external tools.

A peer will
(a) receive traffic from its peers
(b) initiate connections to its peers (via stream and pipeline).

Thus, the current mechanism of only blocking peer traffic via the peer's existing proxy is insufficient, since only scenario (a) is handled, and scenario (b) is not blocked at all.

We introduce an "HTTP proxy" for each peer, which will be proxying all the connections initiated from a peer to its peers.

The modified architecture will look something like this:
```
A -- A's HTTP proxy ----- B's (currently existing) proxy - B
     ^ newly introduced   ^ in the original codebase
```

By adding this HTTP proxy, we can block all in and out traffic that is initiated from a peer to others, without having to resort to external tools, such as iptables. It's verified that the blocking of traffic is complete, compared to previous solutions [2][3].

The main subtasks are
- set up an environment variable `FORWARD_PROXY`
- implement HTTP proxy by extending the existing proxy server code
- implement enable/disable of the HTTP proxy in the e2e test

The result is that for every peer, we will have the arch like this
```
A -- A's HTTP proxy (connections initiated from A will be forwarded from this proxy)
 |   ^ covers case (b)
 |
 --- A's (currently existing) proxy (advertised to other peers where the connection should come in from)
     ^ covers case (a)
```

- `make gofail-enable && make build && make gofail-disable && \
go test -timeout 60s -run ^TestBlackholeByMockingPartitionLeader$ go.etcd.io/etcd/tests/v3/e2e -v -count=1`
- `make gofail-enable && make build && make gofail-disable && \
go test -timeout 60s -run ^TestBlackholeByMockingPartitionFollower$ go.etcd.io/etcd/tests/v3/e2e -v -count=1`

- I run into `context deadline exceeded` sometimes
```
    etcd_mix_versions_test.go:175:
                Error Trace:    /Users/henrybear327/go/src/etcd/tests/e2e/etcd_mix_versions_test.go:175
                                                        /Users/henrybear327/go/src/etcd/tests/e2e/blackhole_test.go:75
                                                        /Users/henrybear327/go/src/etcd/tests/e2e/blackhole_test.go:31
                Error:          Received unexpected error:
                                [/Users/henrybear327/go/src/etcd/bin/etcdctl --endpoints=http://localhost:20006 put key-0 value-0] match not found.  Set EXPECT_DEBUG for more info Errs: [unexpected exit code [1] after running [/Users/henrybear327/go/src/etcd/bin/etcdctl --endpoints=http://localhost:20006 put key-0 value-0]], last lines:
                                {"level":"warn","ts":"2024-05-05T23:02:36.809726+0800","logger":"etcd-client","caller":"v3@v3.6.0-alpha.0/retry_interceptor.go:65","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0x140001ee960/localhost:20006","method":"/etcdserverpb.KV/Put","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
                                Error: context deadline exceeded
                                 (expected "OK", got []). Try EXPECT_DEBUG=TRUE
                Test:           TestBlackholeByMockingPartitionLeader
                Messages:       failed to put "key-0", error: [/Users/henrybear327/go/src/etcd/bin/etcdctl --endpoints=http://localhost:20006 put key-0 value-0] match not found.  Set EXPECT_DEBUG for more info Errs: [unexpected exit code [1] after running [/Users/henrybear327/go/src/etcd/bin/etcdctl --endpoints=http://localhost:20006 put key-0 value-0]], last lines:
                                {"level":"warn","ts":"2024-05-05T23:02:36.809726+0800","logger":"etcd-client","caller":"v3@v3.6.0-alpha.0/retry_interceptor.go:65","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0x140001ee960/localhost:20006","method":"/etcdserverpb.KV/Put","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
                                Error: context deadline exceeded
                                 (expected "OK", got []). Try EXPECT_DEBUG=TRUE
```

[References]
[1] issue etcd-io#17737
[2] PR (V1) https://github.com/henrybear327/etcd/tree/fix/e2e_blackhole
[3] PR (V2) etcd-io#17891

Signed-off-by: Chun-Hung Tseng <henrybear327@gmail.com>
Signed-off-by: Chun-Hung Tseng <henrybear327@gmail.com>
Thanks to Fu Wei for the input regarding etcd-io#17938.

[Problem]

A peer will
(a) receive traffic from its peers
(b) initiate connections to its peers (via stream and pipeline).

Thus, the current mechanism of only blocking peer traffic via the peer's existing proxy is insufficient, since only scenario (a) is handled, and scenario (b) is not blocked at all.

[Main idea]

We introduce 1 shared HTTP proxy for all peers. All peers will be proxying all the connections through it.

The modified architecture will look something like this:
```
A -- shared HTTP proxy ----- B
     ^ newly introduced
```

By adding this HTTP proxy, we can block all in and out traffic that is initiated from a peer to others, without having to resort to external tools, such as iptables. It's verified that the blocking of traffic is complete, compared to previous solutions [2][3].

[Implementation]

The main subtasks are
- set up an environment variable `FORWARD_PROXY`, because go will not parse HTTP_PROXY and HTTPS_PROXY that is using localhost or 127.0.0.1, regardless if the port is present or not
- implement the shared HTTP proxy by extending the existing proxy server code (we need to be able to identify the source sender)
- remove existing proxy setup (the per-peer proxy)
- implement enable/disable of the HTTP proxy in the e2e test

[Testing]

- `make gofail-enable && make build && make gofail-disable && \
go test -timeout 60s -run ^TestBlackholeByMockingPartitionLeader$ go.etcd.io/etcd/tests/v3/e2e -v -count=1`
- `make gofail-enable && make build && make gofail-disable && \
go test -timeout 60s -run ^TestBlackholeByMockingPartitionFollower$ go.etcd.io/etcd/tests/v3/e2e -v -count=1`

[References]
[1] Tracking issue etcd-io#17737
[2] PR (V1) https://github.com/henrybear327/etcd/tree/fix/e2e_blackhole
[3] PR (V2) etcd-io#17891
[4] PR (V3) etcd-io#17938

Signed-off-by: Chun-Hung Tseng <henrybear327@gmail.com>
@k8s-ci-robot
Copy link

Hi @henrybear327. Thanks for your PR.

I'm waiting for a etcd-io member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@henrybear327
Copy link
Contributor Author

henrybear327 commented May 6, 2024

/cc @siyuanfoundation @fuweid @serathius

Hi @fuweid, as we discussed offline, this is an attempt to use just 1 HTTP proxy for all peers to blackhole traffic to and from a specific peer. Code is crap but demonstrates the idea we had works!

I can't figure out a way to create a PR based on my other branches, thus looking here henrybear327@d57db40 will be cleaner!

}
connectResponse.Write(in)

// maintain connection mapping
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest to use one pull request so that reviewers can read all the context about the change.

Just think about that it's necessary to use single one http proxy here?

What about this?

Assuming that each member has their own HTTP proxy server to forward traffic to other members.
There is no any proxy server to listen on advertise URL.

Let's say that there're 3 members in cluster and they're named by A, B and C.
If we want to block any traffic between A and others, based on your interface, we can just call.

procA.Proxy.BlackholePeer(procB.PeerURL) // block traffic from A to B 
procA.Proxy.BlackholePeer(procC.PeerURL) // block traffic from A to C
procB.Proxy.BlackholePeer(procA.PeerURL) // block traffic from B to A
procC.Proxy.BlackholePeer(procA.PeerURL) // block traffic from C to A

To be honest, using Proxy-Authorization to carry port information looks weird to me.
So, IMO, setup each HTTP proxy is like sidecar and more easier to understand and control source and destination of traffic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that this draft PR is a bit extreme 😅
Let me revise a bit to your proposed architecture!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fuweid an attempt is made #17985

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

None yet

4 participants