
kube-hunter CI job is flaky #145

Closed
invidian opened this issue Jan 20, 2020 · 9 comments · Fixed by #146

Comments

@invidian
Contributor

It sometimes doesn't finish within 7 minutes for some reason, which makes the CI job fail. We should investigate that.

invidian added a commit that referenced this issue Jan 20, 2020
So we can investigate why the kube-hunter job sometimes takes a long time
to finish.

Refs #145

Signed-off-by: Mateusz Gozdek <mateusz@kinvolk.io>
invidian added a commit to invidian/lokomotive-kubernetes that referenced this issue Jan 20, 2020
So we can investigate why the kube-hunter job sometimes takes a long time
to finish.

Refs kinvolk-archives#145

Signed-off-by: Mateusz Gozdek <mateusz@kinvolk.io>
@invidian
Contributor Author

The job is killed with the following logs:

3 tasks left
3 tasks left
3 tasks left
3 tasks left
3 tasks left
('Connection aborted.', OSError(0, 'Error')) on 147.75.84.47:443
2 tasks left
2 tasks left
2 tasks left
2 tasks left
2 tasks left
2 tasks left
2 tasks left
2 tasks left
2 tasks left
2 tasks left
('Connection aborted.', OSError(0, 'Error')) on 147.75.84.193:8080
final hook is hanging
1 tasks left
final hook is hanging
1 tasks left
final hook is hanging
1 tasks left
final hook is hanging
1 tasks left
final hook is hanging
1 tasks left

I wonder if there is some timeout missing for this last task... I just need to figure out a way to reproduce it, probably by patching kube-hunter to figure out which task it is, and then look into that.
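One way to pinpoint the hanging task (an illustrative idea, not an existing kube-hunter flag) would be to have the run dump the stack of every thread once it exceeds a deadline, so whichever hunter is stuck shows up in the traceback:

# Illustrative sketch: dump all thread stacks if the run is still going
# after 6 minutes, repeating every minute, so the hanging probe is visible.
import faulthandler
import sys

faulthandler.dump_traceback_later(timeout=360, repeat=True, file=sys.stderr)

# ... then start kube-hunter's normal entry point in the same process.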

CC @surajssd

@invidian
Contributor Author

The last task eventually finished with the following result when I tried to reproduce it:

1 tasks left
final hook is hanging
1 tasks left
('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer')) on 147.75.32.35:6443
Starting new HTTPS connection (1): 147.75.32.35:6443
HTTPSConnectionPool(host='147.75.32.35', port=6443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f44a43355d0>: Failed to establish a new connection: [Errno 111] Connection refused')) on 147.75.32.35:6443
Event <class 'src.core.events.types.common.HuntFinished'> got published with <src.core.events.types.common.HuntFinished object at 0x7f44a4ae5110>

@invidian
Contributor Author

It seems that kube-hunter scans the /24 of the pod's public IP (the one used for outgoing traffic), looking for an API server. That might be detected as abuse by some IaaS providers (e.g. Hetzner). It also seems to be finding some false positives (perhaps other clusters?). In combination with the --active flag, it may then attack other clusters...

Also the kube-hunter runtime doesn't seem to be deterministic:

  • Cluster nodes: 2, Runtime: 4m54s
  • Cluster nodes: 3, Runtime: 5m59s
  • Cluster nodes: 3, Runtime: 89s
  • Cluster nodes: 3, Runtime: 2m6s
  • Cluster nodes: 2, Runtime: 7m5s
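A minimal sketch of the behaviour described above (illustrative only, not kube-hunter's actual code; the port and the /version path are assumptions): enumerating the /24 around the pod's public IP and probing each host for an API server. With a per-host timeout the worst case is bounded; with no timeout, a single unresponsive host can stall the whole run, which would explain the spread in runtimes:

# Illustrative sketch of a /24 API-server discovery scan.
import ipaddress
import requests

def scan_subnet(own_ip, port=6443, timeout=5):
    net = ipaddress.ip_network(f"{own_ip}/24", strict=False)
    found = []
    for host in net.hosts():  # 254 hosts, so worst case ~254 * timeout seconds
        try:
            resp = requests.get(f"https://{host}:{port}/version",
                                verify=False, timeout=timeout)
            found.append((str(host), resp.status_code))
        except requests.RequestException:
            continue
    return found

# e.g. scan_subnet("147.75.84.47") -- with timeout=None, one host that accepts
# the connection but never answers can hang the scan for minutes.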

@invidian
Contributor Author

It seems that some servers which kube-hunter tries to probe take a really long time to respond:

130 ✗ (1.270s) 11:29:15 invidian@dellxps15mateusz ~/repos/kinvolk/kube-hunter (master)$ curl -v -s -k https://147.75.32.35:6443
*   Trying 147.75.32.35:6443...
* TCP_NODELAY set
* Connected to 147.75.32.35 (147.75.32.35) port 6443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/ssl/certs/ca-certificates.crt
  CApath: none
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* Operation timed out after 300495 milliseconds with 0 out of 0 bytes received
* Closing connection 0
28 ✗ (5m0s) 11:34:17 invidian@dellxps15mateusz ~/repos/kinvolk/kube-hunter (master)*$

I think the HTTP probe should time out earlier than 5 minutes...
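For reference, the kind of fix this suggests (a sketch only, not the actual upstream patch) is to give the probe an explicit client-side timeout instead of relying on OS defaults:

# Illustrative sketch: (connect, read) timeouts in seconds.
import requests

try:
    resp = requests.get("https://147.75.32.35:6443/", verify=False, timeout=(5, 15))
    print(resp.status_code)
except requests.RequestException as err:
    # Fails within seconds instead of hanging for ~5 minutes like the curl above.
    print(f"probe failed: {err}")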

@surajssd
Contributor

Do you think we are missing any prerequisite checks that we should be doing before installing?

@invidian
Contributor Author

Do you think we are missing any prerequisite checks that we should be doing before installing?

Can you elaborate? What checks do you have in mind, for example? I'm not sure I understand.

@invidian
Contributor Author

Created the following issues upstream:

And one PR:

I also tested that, with a timeout added to the discovery, kube-hunter runs are much faster. I'll do one more round of testing; my suggestion would be to use a patched version of the kube-hunter image until the issue is solved upstream.

invidian added a commit that referenced this issue Jan 21, 2020
Which contains a patch that adds a timeout when discovering kube-apiserver
instances and updates kube-hunter to the latest version, but has reversed
patches, which breaks debug logging.

The image is built from the source stored in
https://github.com/kinvolk/kube-hunter/tree/kinvolk-master

Closes #145

Signed-off-by: Mateusz Gozdek <mateusz@kinvolk.io>
@surajssd
Contributor

Can you elaborate? What checks do you have in mind, for example? I'm not sure I understand.

Before we deploy kube-hunter, we do the following checks (not extensive) to make sure the cluster is responsive:

https://github.com/kinvolk/lokomotive-kubernetes/blob/1d4faacd1fa5f78aeb8444c6370ad16d88c46f46/scripts/kube-hunter.sh#L25-L32


I meant do we need to add anything more here?

@invidian
Contributor Author

I meant do we need to add anything more here?

No, I think those checks are fine. I believe the issue is in kube-hunter itself, as described above.
