Retrieve termination log from all pods #336

andmat900 · 2025-01-07T11:24:26Z

Applicable Issues

We should be able to get termination logs from multiple pods #268

Description of the Change

This enables termination log retrieval from an arbitrary pod of a job.
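
For readers unfamiliar with the mechanism: Kubernetes exposes whatever a container writes to its termination log as the message of its terminated state. The sketch below shows one way such a message could be decoded. It is a minimal illustration, not the code in this PR: the JSON keys and the helper name parseTerminationMessage are assumptions, and only the field names Conclusion, Verdict and Description come from the discussion in this PR.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// Result mirrors the fields mentioned in this PR. The JSON keys are
// assumptions made for this sketch.
type Result struct {
	Conclusion  string `json:"conclusion"`
	Verdict     string `json:"verdict,omitempty"`
	Description string `json:"description,omitempty"`
}

// parseTerminationMessage is a hypothetical helper that decodes the message a
// container wrote to its termination log. A blank message is reported as an
// error instead of being decoded.
func parseTerminationMessage(message string) (Result, error) {
	if strings.TrimSpace(message) == "" {
		return Result{}, errors.New("termination message is empty")
	}
	var result Result
	if err := json.Unmarshal([]byte(message), &result); err != nil {
		return Result{}, fmt.Errorf("decoding termination message: %w", err)
	}
	return result, nil
}
```

A caller would pass in the terminated-state message of the container it is interested in and fall back to a default description when an error is returned.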

Alternate Designs

The initial proposal includes two alternatives:

a modified version of the existing terminationLog() function that takes a pod name prefix as an argument, and
a more generic terminationLogs() function that returns a map of termination logs for all pods/containers of a job, allowing the caller to process the output.
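
To make the second alternative more concrete, here is a rough sketch of the map shape it describes. The types pod and containerStatus are stand-ins so the example compiles on its own (real code would use the Kubernetes core/v1 types), and collectTerminationLogs is a hypothetical name, not the function in this PR.

```go
package main

// containerStatus and pod are simplified stand-ins for the Kubernetes types.
type containerStatus struct {
	Name               string
	TerminationMessage string // empty while the container has no message
}

type pod struct {
	Name       string
	Containers []containerStatus
}

// collectTerminationLogs returns pod name -> container name -> termination
// message and leaves the interpretation to the caller. Containers without a
// message are skipped, so pods with no messages do not appear in the map.
func collectTerminationLogs(pods []pod) map[string]map[string]string {
	logs := make(map[string]map[string]string, len(pods))
	for _, p := range pods {
		for _, c := range p.Containers {
			if c.TerminationMessage == "" {
				continue
			}
			if logs[p.Name] == nil {
				logs[p.Name] = make(map[string]string)
			}
			logs[p.Name][c.Name] = c.TerminationMessage
		}
	}
	return logs
}
```

The trade-off is that every caller has to know which pod and container it cares about, which is the point debated in the review below.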

Possible Drawbacks

Sign-off

Developer's Certificate of Origin 1.1

By making a contribution to this project, I certify that:

(a) The contribution was created in whole or in part by me and I
have the right to submit it under the open source license
indicated in the file; or

(b) The contribution is based upon previous work that, to the best
of my knowledge, is covered under an appropriate open source
license and I have the right under that license to submit that
work with modifications, whether created in whole or in part
by me, under the same open source license (unless I am
permitted to submit under a different license), as indicated
in the file; or

(c) The contribution was provided directly to me by some other
person who certified (a), (b) or (c) and I have not modified
it.

(d) I understand and agree that this project and the contribution
are public and that a record of the contribution (including all
personal information I submit with it, including my sign-off) is
maintained indefinitely and may be redistributed consistent with
this project or the open source license(s) involved.

Signed-off-by: Andrei Matveyeu, andrei.matveyeu@axis.com

t-persson · 2025-01-21T13:40:26Z

internal/controller/jobs.go

+// JobResults describes the status and result of an ETOS job which consists of one or more pods
+type JobResults struct {
+	PodResults []PodResults
+}
+
+// PodResults describes the status and result of a single pod of an ETOS job which consists of one or more containers
+type PodResults struct {
+	Name             string
+	ContainerResults []Result
+}


Why do we need two structs instead of just having a single Results struct?

andmat900 · 2025-01-21

We need to be able to determine conclusions and verdicts on different levels: container/pod/job/job group.

t-persson · 2025-01-24

Why?
Don't we just want to get the termination log from the relevant Job? The callers would then need to ask: "Hey, I have this job, what's the result of the container with name=?"
What happens internally in the terminationLogs function should not matter to the caller.
Internally, terminationLogs would need to determine which pod is the "current" one and then get the result of the proper container in that pod.

So the API for callers would be the same:

result, err := terminationLog(ctx, r, environmentProvider, environmentrequest.Name)
if err != nil {
	result.Description = err.Error()
}
if result.Description == "" {
	result.Description = "Failed to provision an environment - Unknown error"
}
description = fmt.Sprintf("%s; %s: %s", description, environmentProvider.Name, result.Description)

where terminationLog is something similar to this:

func terminationLog(...) (*Result, error) {
	var pods corev1.PodList
	if err := c.List(ctx, &pods, ...); err != nil {
		...
		return
	}
	if len(pods.Items) == 0 {
		return &Result{...}
	}
	pod := getLatestPodCreated(pods)
	for _, status := range pod.Status.ContainerStatuses {
		...
	}
}

or if we want to get results from all pods

type Results struct {
	Result // The result of the last created pod (the one that should be relevant)
	Results []Result
}

func terminationLog(...) (*Results, error) {
	var pods corev1.PodList
	if err := c.List(ctx, &pods, ...); err != nil {
		...
		return
	}
	if len(pods.Items) == 0 {
		return &Results{...}
	}
	for _, pod := range pods.Items {
		...
		if isLatest() {
			results.Description = result.Description
			results.Verdict = result.Verdict
			results.Conclusion = result.Conclusion
		}
		results.Results = append(results.Results, result)
		...
	}
}
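
As an illustration of why the embedded Result keeps the caller API unchanged, here is a small self-contained sketch. Go promotes the fields of an embedded struct, so results.Description still works on a Results value. The helper collect and its latest index parameter are hypothetical and stand in for the isLatest() check; the Result fields are the three used in the pseudocode.

```go
package main

// Result carries the three fields referenced in the discussion.
type Result struct {
	Conclusion  string
	Verdict     string
	Description string
}

// Results embeds Result, so the fields of the relevant pod are promoted and
// callers can keep reading results.Description directly.
type Results struct {
	Result
	Results []Result
}

// collect stores every per-pod result and promotes the one at index latest.
// If latest is out of range, the promoted fields stay empty.
func collect(perPod []Result, latest int) Results {
	var results Results
	for i, result := range perPod {
		if i == latest {
			results.Result = result
		}
		results.Results = append(results.Results, result)
	}
	return results
}
```

With this shape, existing callers that only read Description, Verdict or Conclusion would not need to change, while new callers could inspect the Results slice.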

andmat900 · 2025-01-24

Why getLatestPodCreated()? The original issue requires multiple pods: We should be able to get termination logs from multiple pods #268

In the last example, why is isLatest() determining the Verdict and Conclusion for all pods? Isn't it the combined result of all pods?

andmat900 · 2025-01-27

Using a single Result struct.

t-persson · 2025-01-29

Why getLatestPodCreated()? The original issue requires multiple pods: We should be able to get termination logs from multiple pods #268

That means that there may be multiple pods for a job, and we should get the results from the pod that is currently "active". We still only get the results from one pod, but we need to iterate over all pods to find the one whose results we should be interested in.

In the last example, why is isLatest() determining the Verdict and Conclusion for all pods? Isn't it the combined result of all pods?

As I wrote above: Our jobs will have at most a single relevant pod, but they may launch multiple pods if we implement retries or if they get rolled over to a new node.
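
The selection described here (iterate over all pods, keep only the most recently created one) could look roughly like the sketch below. It is a hedged illustration, not the code merged in this PR: pod is a stand-in type whose Created field plays the role of the pod's creation timestamp, and latestPod and examplePods are hypothetical names.

```go
package main

import "time"

// pod is a simplified stand-in; Created represents the creation timestamp
// from the pod metadata.
type pod struct {
	Name    string
	Created time.Time
}

// latestPod returns the most recently created pod. The second return value
// is false when the list is empty.
func latestPod(pods []pod) (pod, bool) {
	if len(pods) == 0 {
		return pod{}, false
	}
	latest := pods[0]
	for _, p := range pods[1:] {
		if p.Created.After(latest.Created) {
			latest = p
		}
	}
	return latest, true
}

// examplePods builds a job whose pod was replaced once; the values are made up.
func examplePods() []pod {
	base := time.Date(2025, 1, 31, 9, 0, 0, 0, time.UTC)
	return []pod{
		{Name: "job-first", Created: base},
		{Name: "job-retry", Created: base.Add(10 * time.Minute)},
	}
}
```

Calling latestPod(examplePods()) selects job-retry, which matches the idea that a retried or rescheduled job has one relevant pod even though several exist.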

andmat900 · 2025-01-31

Simplified solution starting from commit 1f8fa80.

andmat900 requested a review from a team as a code owner January 7, 2025 11:24

andmat900 requested review from t-persson and fredjn and removed request for a team January 7, 2025 11:24

t-persson reviewed Jan 15, 2025

internal/controller/jobs.go (outdated)

t-persson reviewed Jan 17, 2025

internal/controller/jobs.go (outdated)

internal/controller/environment_controller.go (outdated)

internal/controller/jobs.go (outdated)

fredjn approved these changes Jan 17, 2025

andmat900 requested review from t-persson and fredjn January 20, 2025 15:29

t-persson requested changes Jan 21, 2025

andmat900 requested a review from t-persson January 21, 2025 15:24

fredjn reviewed Jan 27, 2025

andmat900 added 17 commits January 27, 2025 16:13

Retrieve termination log from specified pod

d09d2f1

draft: handling of terminationLogs() output

b8afda5

draft: handle multiple pods/containers in a job

2127388

terminationLogs() return JobResult struct

c0f53a8

Remove terminationLog(), use terminationLogs() only

53b602f

getContainerResults(): full name matching

a42c88e

refactored Result, PodResult, JobResult

19dca19

comment fixes

d597bec

comment fix

a8bc211

Conclusion and Verdict inheritance fix

c94c497

more detailed logging in terminationLogs()

0ab61b5

fix for empty termination message

fc3b82a

rename JobResults -> JobGroupResult

58995b6

result.Name fix in jobs.go

52c13a3

simplified Result struct

b025c2b

comments updated

a291285

descriptions fixed

46446cd

andmat900 force-pushed the 20250107_termination_log_multiple branch from 94be94c to 46446cd Compare January 27, 2025 15:21

error message fix

825dc1a

andmat900 requested a review from fredjn January 27, 2025 15:31

JSON annotations for Result, string join fix

1fd718e

fredjn approved these changes Jan 28, 2025

andmat900 added 3 commits January 31, 2025 09:14

Merge branch 'eiffel-community:main' into 20250107_termination_log_multiple

15c9228

simplified solution with getLatestPodByCreationTimestamp()

f555891

Change-Id: Ief2cd8748054cda8b07593b8db8f4592e26344cb

comment update

1f8fa80

Change-Id: Ic78f6007359660c83520910f86cf5f6bd075abb1

t-persson approved these changes Feb 3, 2025

andmat900 merged commit 9096806 into eiffel-community:main Feb 3, 2025

andmat900 added a commit to andmat900/etos that referenced this pull request Feb 10, 2025

Retrieve termination log from all pods (eiffel-community#336)

1c77c8b


t-persson Jan 21, 2025

andmat900 Jan 21, 2025

t-persson Jan 24, 2025

andmat900 Jan 24, 2025

andmat900 Jan 27, 2025

t-persson Jan 29, 2025

andmat900 Jan 31, 2025
