Retrieve termination log from all pods #336
internal/controller/jobs.go
// JobResults describes the status and result of an ETOS job which consists of one or more pods
type JobResults struct {
	PodResults []PodResults
}

// PodResults describes the status and result of a single pod of an ETOS job which consists of one or more containers
type PodResults struct {
	Name             string
	ContainerResults []Result
}
Why do we need two structs instead of just having a single Results struct?
We need to be able to determine conclusions and verdicts on different levels: container/pod/job/job group.
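One way to read this requirement is sketched below: each level derives its conclusion from the level beneath it. This is only an illustration, not the code under review. The Conclusion type, its two values, and the Conclusion() methods are assumptions made for the example; only the struct names and fields come from the diff above.

```go
package main

// Conclusion is a hypothetical outcome type. The two values below are
// assumptions for illustration, not necessarily those used by the controller.
type Conclusion string

const (
	ConclusionSuccessful Conclusion = "Successful"
	ConclusionFailed     Conclusion = "Failed"
)

// Result is a stand-in for the per-container result discussed in this thread.
type Result struct {
	Conclusion  Conclusion
	Description string
}

// PodResults and JobResults follow the structs in the diff under review.
type PodResults struct {
	Name             string
	ContainerResults []Result
}

type JobResults struct {
	PodResults []PodResults
}

// Conclusion reports Failed if any container in the pod failed.
// A pod without container results is treated as Successful in this sketch.
func (p PodResults) Conclusion() Conclusion {
	for _, c := range p.ContainerResults {
		if c.Conclusion == ConclusionFailed {
			return ConclusionFailed
		}
	}
	return ConclusionSuccessful
}

// Conclusion reports Failed if any pod of the job failed.
func (j JobResults) Conclusion() Conclusion {
	for _, p := range j.PodResults {
		if p.Conclusion() == ConclusionFailed {
			return ConclusionFailed
		}
	}
	return ConclusionSuccessful
}
```

Whether "any failure fails the level above" is the right rule is exactly the design question raised in the replies that follow.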
Why?
Don't we just want to get the termination log from the relevant Job? The callers would then ask "Hey, I have this job, what's the result of the container with name=?"
What happens internally in the terminationLogs function should not matter to the caller.
Internally, terminationLogs would need to determine which pod is the "current" one that we should read the result from, and then get the result of the proper container in that pod.
So the API for callers would be the same:
result, err := terminationLog(ctx, r, environmentProvider, environmentrequest.Name)
if err != nil {
	result.Description = err.Error()
}
if result.Description == "" {
	result.Description = "Failed to provision an environment - Unknown error"
}
description = fmt.Sprintf("%s; %s: %s", description, environmentProvider.Name, result.Description)
where terminationLog is something similar to this:
func terminationLog(...) (*Result, error) {
	var pods corev1.PodList
	if err := c.List(ctx, &pods, ...); err != nil {
		...
		return
	}
	if len(pods.Items) == 0 {
		return &Result{...}
	}
	pod := getLatestPodCreated(pods)
	for _, status := range pod.Status.ContainerStatuses {
		...
	}
}
or if we want to get results from all pods:
type Results struct {
	Result // The result of the last created pod (the one that should be relevant)
	Results []Result
}
func terminationLog(...) (*Results, error) {
	var pods corev1.PodList
	if err := c.List(ctx, &pods, ...); err != nil {
		...
		return
	}
	if len(pods.Items) == 0 {
		return &Results{...}
	}
	for _, pod := range pods.Items {
		...
		if isLatest() {
			results.Description = result.Description
			results.Verdict = result.Verdict
			results.Conclusion = result.Conclusion
		}
		results.Results = append(results.Results, result)
		...
	}
}
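The embedded-struct variant above can be illustrated with a small, self-contained sketch. It is not the proposed implementation: the string-typed fields, the collect helper, and the assumption that the input is ordered oldest to newest are all made up for the example. It only shows how embedding Result lets callers keep reading results.Description while every per-pod result stays available.

```go
package main

// Result is a stand-in for the result type in this thread; the string-typed
// fields are an assumption.
type Result struct {
	Verdict     string
	Conclusion  string
	Description string
}

// Results embeds the Result of the relevant pod, so its fields are promoted
// (results.Description works), and keeps the results of all pods as well.
type Results struct {
	Result
	Results []Result
}

// collect is a hypothetical helper. It assumes perPod is ordered oldest to
// newest, so the last entry belongs to the most recently created pod.
func collect(perPod []Result) Results {
	var results Results
	for i, result := range perPod {
		if i == len(perPod)-1 {
			// Promote the latest pod's result to the top level.
			results.Result = result
		}
		results.Results = append(results.Results, result)
	}
	return results
}
```

With this shape, a caller that only cares about the relevant pod does not need to change, and a caller that wants every pod can range over results.Results.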
- Why getLatestPodCreated()? The original issue requires multiple pods: We should be able to get termination logs from multiple pods #268
- In the last example, why is isLatest() determining the Verdict and Conclusion for all pods? Isn't it the combined result of all pods?
Using a single Result struct.
> Why getLatestPodCreated()? The original issue requires multiple pods: We should be able to get termination logs from multiple pods #268
That means that there may be multiple pods for a job, and we should get the results from the pod that is currently "active". We still only get the results from one pod, but we need to iterate over all pods to find the one whose results we should be interested in.
> In the last example, why is isLatest() determining the Verdict and Conclusion for all pods? Isn't it the combined result of all pods?
As I wrote above: our jobs will have at most a single relevant pod, but they may launch multiple pods if we implement retries or they get rolled over to a new node.
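A minimal sketch of "iterate over all pods to find the relevant one" might look like the following, assuming that "relevant" means "most recently created". The Pod type here is a simplified stand-in so the example is self-contained; in the controller the timestamp would presumably come from the pod's CreationTimestamp metadata. The two-value return is also an assumption of this sketch.

```go
package main

import "time"

// Pod is a minimal stand-in for corev1.Pod with only the fields needed here.
type Pod struct {
	Name    string
	Created time.Time
}

// getLatestPodCreated returns the most recently created pod. The boolean is
// false when the list is empty, so the caller can report "no pods" itself.
func getLatestPodCreated(pods []Pod) (Pod, bool) {
	if len(pods) == 0 {
		return Pod{}, false
	}
	latest := pods[0]
	for _, pod := range pods[1:] {
		if pod.Created.After(latest.Created) {
			latest = pod
		}
	}
	return latest, true
}
```

If retries or node rollovers create a second pod for the same job, this picks the newer one and ignores the older attempts.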
Simplified solution starting from commit 1f8fa80.
94be94c to 46446cd
Change-Id: Ief2cd8748054cda8b07593b8db8f4592e26344cb
Change-Id: Ic78f6007359660c83520910f86cf5f6bd075abb1
Applicable Issues
Description of the Change
This enables termination log retrieval from an arbitrary pod of a job.
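As a rough illustration of retrieving a termination log from an arbitrary pod, the sketch below collects termination messages per pod and per container, and looks one pod up by name prefix. The Pod and ContainerStatus types are simplified stand-ins, and both function signatures are assumptions for the example rather than the signatures in this change. In Kubernetes, the message would come from the terminated state of a container status.

```go
package main

import "strings"

// ContainerStatus and Pod are minimal stand-ins for the Kubernetes types.
type ContainerStatus struct {
	Name               string
	TerminationMessage string
}

type Pod struct {
	Name       string
	Containers []ContainerStatus
}

// terminationLogs returns termination messages keyed by pod name and then by
// container name, leaving it to the caller to process the output.
func terminationLogs(pods []Pod) map[string]map[string]string {
	logs := make(map[string]map[string]string, len(pods))
	for _, pod := range pods {
		containers := make(map[string]string, len(pod.Containers))
		for _, c := range pod.Containers {
			containers[c.Name] = c.TerminationMessage
		}
		logs[pod.Name] = containers
	}
	return logs
}

// terminationLog returns the container messages of the first pod whose name
// starts with prefix. The boolean is false when no pod matches.
func terminationLog(pods []Pod, prefix string) (map[string]string, bool) {
	for _, pod := range pods {
		if strings.HasPrefix(pod.Name, prefix) {
			return terminationLogs([]Pod{pod})[pod.Name], true
		}
	}
	return nil, false
}
```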
Alternate Designs
The initial proposal includes two alternatives:
- terminationLog() function which takes pod name prefix as argument
- terminationLogs() function that returns a map of termination logs for all pods/containers of a job, allowing the caller to process the output
Possible Drawbacks
Sign-off
Developer's Certificate of Origin 1.1
By making a contribution to this project, I certify that:
(a) The contribution was created in whole or in part by me and I
have the right to submit it under the open source license
indicated in the file; or
(b) The contribution is based upon previous work that, to the best
of my knowledge, is covered under an appropriate open source
license and I have the right under that license to submit that
work with modifications, whether created in whole or in part
by me, under the same open source license (unless I am
permitted to submit under a different license), as indicated
in the file; or
(c) The contribution was provided directly to me by some other
person who certified (a), (b) or (c) and I have not modified
it.
(d) I understand and agree that this project and the contribution
are public and that a record of the contribution (including all
personal information I submit with it, including my sign-off) is
maintained indefinitely and may be redistributed consistent with
this project or the open source license(s) involved.
Signed-off-by: Andrei Matveyeu, andrei.matveyeu@axis.com