
LogWatch sometimes silently fails #4741

Closed
cmdjulian opened this issue Jan 6, 2023 · 6 comments

Comments

cmdjulian commented Jan 6, 2023

Describe the bug

When I create a new LogWatch and start consuming its output right after the Kubernetes cluster has started, I sometimes don't get back the actual log lines but instead the following output as a single line:

{ "kind": "Status", "apiVersion":"v1", "metadata": {}, "status": "failur", "message": "Get \"https://10.1.32.200:10250/containerLogs/default/uuid/nginx?follow=true\": EOF", "code": 500 }

Afterwards, the LogWatch terminates immediately.
I suspect the Kubernetes API server is not yet ready to serve requests; it only happens occasionally. It would be really nice to get an exception in that case instead of the error being silently swallowed.
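Until the client surfaces this as an exception, a caller can guard against it by sniffing each line for the embedded v1 `Status` error document before treating it as log output. A minimal stdlib-only sketch (the heuristic string checks and the `LogStatusGuard` helper are illustrative, not part of the fabric8 API; the LogWatch wiring that produces the `Reader` is assumed):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.function.Consumer;

public final class LogStatusGuard {

    // Heuristic: a real log line is unlikely to be a v1 Status JSON object.
    // Matches documents like:
    // { "kind": "Status", "apiVersion":"v1", ..., "code": 500 }
    static boolean looksLikeStatusError(String line) {
        String t = line.trim();
        return t.startsWith("{")
                && t.contains("\"kind\"")
                && t.contains("\"Status\"")
                && t.contains("\"code\"");
    }

    // Forwards real log lines to the sink, but throws instead of silently
    // emitting the API server's error document as if it were a log line.
    public static void consume(Reader logOutput, Consumer<String> sink) throws IOException {
        try (BufferedReader reader = new BufferedReader(logOutput)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (looksLikeStatusError(line)) {
                    throw new IOException("log request failed: " + line);
                }
                sink.accept(line);
            }
        }
    }
}
```

With fabric8, the `Reader` would typically wrap the `InputStream` returned by `LogWatch.getOutput()`.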

Fabric8 Kubernetes Client version

6.3.1

Steps to reproduce

  1. Create a LogWatch
  2. Use Kubernetes Cluster right after it got started

Expected behavior

I think the LogWatch should either wait until the cluster can serve logs or throw an exception; either would be fine.

Runtime

other (please specify in additional context)

Kubernetes API Server version

1.24

Environment

Linux

Fabric8 Kubernetes Client Logs

No response

Additional context

I use k3s as cluster distro

shawkins (Contributor) commented Jan 6, 2023

#4637 should be applicable here - if your issue is due to the relevant pods not being ready, that will help wait until they are before making the log request.

Also, ideally the built-in retry / exponential backoff logic would apply here, but it currently does not due to where that logic is located.

cmdjulian (Author) commented
I did some digging. I don't think it's related to the pod's health state. In my case, the pod is spawned by a Job, is in a failed state, and is also not coming up again (totally normal). Retrying the operation after 20 seconds does not result in an error.
From the printed log message, the API server returned 500. Doesn't this more or less indicate a server error? If the pod is not ready / still being created, I would rather expect something in the 4xx range.
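Since a delayed retry succeeds, a caller-side workaround until retry support lands in the client is to wrap the watch creation in a small exponential-backoff loop. A generic stdlib-only sketch (the `Backoff` helper, attempt count, and initial delay are illustrative; failure is signalled by the supplier throwing):

```java
import java.util.function.Supplier;

public final class Backoff {

    // Retries the supplier with exponential backoff when it throws.
    // Returns the first successful result; rethrows the last failure
    // once the attempt budget is exhausted.
    public static <T> T retry(Supplier<T> op, int maxAttempts, long initialDelayMillis)
            throws InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delay = initialDelayMillis;
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.get();
            } catch (RuntimeException e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delay);
                    delay *= 2; // double the wait between attempts
                }
            }
        }
        throw last;
    }
}
```

The supplier would wrap the `watchLog()` call, throwing when the first line turns out to be the `Status` error document instead of log output.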

shawkins (Contributor) commented

> Doesn't this more or less indicate a server error? If the pod is not ready / still being created, I would rather expect something in the 4xx range.

500 is returned by the API server when the pod is not yet ready to return logs; a 400 would indicate more of a bad request. The fabric8 logic checks for either Ready or Succeeded to know when a log request should work. If you have a situation where the Pod is ultimately in the Failed state but still able to provide logs, then I don't think this check will work. A quick check of the kubectl client suggests that it doesn't wait on a particular state, but instead relies on the built-in retry that comes with a 500 response.

shawkins (Contributor) commented Feb 9, 2023

With #4825 the fabric8 client can retry all 500s, even those on websocket requests.
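For readers landing here later: once that retry support is available, it is driven by the client `Config`. A hedged sketch of enabling it (the builder property names and millisecond units are assumed from the 6.x `ConfigBuilder`; verify against the client version you actually use):

```java
import io.fabric8.kubernetes.client.Config;
import io.fabric8.kubernetes.client.ConfigBuilder;
import io.fabric8.kubernetes.client.KubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClientBuilder;

// Retry failed requests up to 5 times, starting with a 1s backoff interval.
Config config = new ConfigBuilder()
        .withRequestRetryBackoffLimit(5)
        .withRequestRetryBackoffInterval(1000)
        .build();

KubernetesClient client = new KubernetesClientBuilder()
        .withConfig(config)
        .build();
```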

shawkins (Contributor) commented

Another thing of note: requests made after a pod is created but before its containers are initialized will return a 400.

stale bot commented May 16, 2023

This issue has been automatically marked as stale because it has not had any activity in 90 days. It will be closed if no further activity occurs within 7 days. Thank you for your contributions!
