-
Notifications
You must be signed in to change notification settings - Fork 243
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enhanced crash logs events for: report_crash_loop, pod_oom_kill_enricher, image_pull_backoff_reporter, KubePodCrashLooping #1282
Conversation
ganeshrvel
commented
Feb 9, 2024
- report_crash_loop
- pod_oom_kill_enricher
- image_pull_backoff_reporter
- KubePodCrashLooping
- report_crash_loop - pod_oom_kill_enricher - image_pull_backoff_reporter - KubePodCrashLooping
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice work
left some comments
blocks = [TableBlock( | ||
[[k, v] for (k, v) in pending_rows], | ||
["label", "value"], | ||
table_name="", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is redundant
"" is the default
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
return Enrichment( | ||
enrichment_type=EnrichmentType.crash_info, | ||
blocks=blocks, | ||
title="Pod Unscheduled Information") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would change to:
Unscheduled Pod Information
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
blocks: List[BaseBlock] = [] | ||
investigator = PendingInvestigator(pod) | ||
all_reasons = investigator.investigate() | ||
def get_pending_pod_enrichment(pod: Pod) -> Optional[Enrichment]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this Optional
? We never return None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@@ -163,6 +183,10 @@ class ImagePullBackoffInvestigator: | |||
"err_template": 'Error response from daemon: manifest for .*? not found: manifest unknown: Failed to fetch ".*?"', | |||
"reason": ImagePullBackoffReason.ImageDoesntExist | ImagePullBackoffReason.TagNotFound, | |||
}, | |||
{ | |||
"err_template": r'Error response from daemon: Get ".*?": net\/http: request canceled while waiting for connection \(Client\.Timeout exceeded while awaiting headers\)', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make it less sensitive?
Like anything that includes "Timeout exceeded"
What's the downside of that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
investigator = PendingInvestigator(pod) | ||
all_reasons = investigator.investigate() | ||
def get_pending_pod_enrichment(pod: Pod) -> Optional[Enrichment]: | ||
pending_rows: List[List[str]] = [] | ||
message = get_unscheduled_message(pod) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this cover all the pending scenarios ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It did cover the required fields in the new enrichment sheet. But i have reverted back the MarkdownBlock
which had the reason for the pending state
event.add_enrichment(blocks) | ||
if pod_issues_enrichments: | ||
issue_message, issues_enrichments = pod_issues_enrichments | ||
blocks: List[BaseBlock] = [MarkdownBlock(f"{message_string}. {issue_message}")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be added to description, and not as Markdown
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
if len(pods_with_issue) < 1: | ||
logging.debug(f"`pods_with_issue` for found for issue: {issue}") | ||
return | ||
|
||
expected_pods = get_expected_replicas(event) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this needs to change
We have here the event
, the type of the issue, the wait message and wait reason (which are currently not in use)
We need a new function, get_explanation
Information about number of available pods, and number of unavailable should be taken from the resource status
For Deployment, Statefulset, DaemonSet, message should be:
X pods are available, Y pods are not ready due to $ISSUE TYPE.
($WAIT_REASON: $WAIT_MESSAGE)
For others (Pod, Jobs ...)
Pod is in not ready due to $ISSUE_TYPE
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
) | ||
|
||
enrichments = get_crash_report_enrichments(pod) | ||
if enrichments: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is neven None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
"is_empty": True, | ||
"remarks": f"Logs unavailable for container: {container_status.name}" | ||
} | ||
logging.error( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be changed to info
or removed.
Also, no point printing the container log, we know it's empty
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
replaced it with logging.info
) | ||
metadata = None | ||
if not container_log: | ||
metadata = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this for the UI? To present an empty log?
How does this look in Slack and the other Sinks?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How come we don't see the empty file on Slack?
On any event, is there a way to send the empty log only to the UI sink?
Maybe we can create a new type of block, EmptyFileBlock
that is handled only in the UI sink? (and sends it as an empty file)
Or other solutions that will have a similar result?
optimized the enrichments new pods counting logic
Fixed crash pod log details
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice works
Added few more comments
@@ -245,8 +245,12 @@ def deployment_events_enricher(event: DeploymentEvent, params: ExtendedEventEnri | |||
event.add_enrichment([events_table_block], {SlackAnnotations.ATTACHMENT: True}, | |||
enrichment_type=EnrichmentType.k8s_events, title="Deployment Events") | |||
else: | |||
pods = list_pods_using_selector(dep.metadata.namespace, dep.spec.selector, "status.phase=Running") | |||
event.add_enrichment([MarkdownBlock(f"*Replicas: Desired ({dep.spec.replicas}) --> Running ({len(pods)})*")]) | |||
if dep.kind in ["Deployment", "Statefulset", "DaemonSet"]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a DeploymentEvent
, kind is always a deployment
In addition, I think this logic is not correct for DaemonSet
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
else: | ||
issue_text = issue.name | ||
|
||
if resource.kind in ["Deployment", "Statefulset", "DaemonSet"]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need a small fix here:
For DaemonSet the available and desired status fields are different.
I think you want:
desired = desiredNumberScheduled
available = numberAvailable
When you run kubectl get DaemonSet -o yaml some-ds
you see in status:
status:
currentNumberScheduled: 4
desiredNumberScheduled: 4
numberAvailable: 4
numberMisscheduled: 0
numberReady: 4
observedGeneration: 1
updatedNumberScheduled: 4
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
|
||
event.extend_description(message_text) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should always add it to description, and not only if we found an enrichment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
) | ||
metadata = None | ||
if not container_log: | ||
metadata = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How come we don't see the empty file on Slack?
On any event, is there a way to send the empty log only to the UI sink?
Maybe we can create a new type of block, EmptyFileBlock
that is handled only in the UI sink? (and sends it as an empty file)
Or other solutions that will have a similar result?
title="Pod Logs" | ||
) | ||
metadata = None | ||
if not log_data: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here regarding the empty logs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done. Added new EmptyFileBlock
- report_crash_loop - pod_oom_kill_enricher - image_pull_backoff_reporter - KubePodCrashLooping
optimized the enrichments new pods counting logic
Fixed crash pod log details
… into feature/enhance-crashlog-events
Optimized Events handling
src/robusta/core/reporting/blocks.py
Outdated
@@ -130,6 +130,35 @@ def truncate_content(self, max_file_size_bytes: int) -> bytes: | |||
return "\n".join(truncated_lines).encode("utf-8") | |||
|
|||
|
|||
class EmptyFileBlock(FileBlock): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should not extend a FileBlock
The point of not making it a FileBlock, was to not send it to all existing sinks.
But if this extends a FileBlock, it will be handled as a FileBlock by all sinks, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice work