Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move StartedAt time to before starting the container #47003

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

LarsSven
Copy link

@LarsSven LarsSven commented Dec 30, 2023

Fixes #45445

- What I did
Moby reports the StartedAt and FinishedAt times of containers. The StartedAt time is inaccurate as it is set too late in the container startup process, being set way past the moment the container is started. This not only makes the StartedAt be reported as relatively late, and the difference between StartedAt and FinishedAt be too optimistic, but with very short-lived containers the StartedAt is reported later than the FinishedAt, meaning the container took a negative amount of time to complete.

This MR moves the StartedAt time recording to right before the container is told to start. This makes the code slightly more complex, as the StartedAt time now has to be passed to the moment it can be recorded, but improves the accuracy of the StartedAt time.

Hello-world prior to the change:
"StartedAt": "2023-12-28T22:15:43.373989192Z",
"FinishedAt": "2023-12-28T22:15:43.373677346Z"

Hello-world after the change:
"StartedAt": "2023-12-30T14:19:39.241397794Z",
"FinishedAt": "2023-12-30T14:19:39.423867882Z"

- How I did it
I record the start time using time.Now() right before ctr.Start. I then pass this variable to the SetRunning function, which is called later and actually records the StartedAt time. Previously, this time was generated within the SetRunning function. For all unit tests and the health monitor, I simply generate the time right on the function call (So I pass time.Now() as a parameter). For unit tests, the StartedAt time doesn't matter, and for the monitor, I could not find a better place to record the time prior to SetRunning.

- How to verify it

- Description for the changelog

Container's `StartedAt` property is now recorded before container startup, guaranteeing that the `FinishedAt` is always before `StartedAt`. 

- A picture of a cute animal (not mandatory but encouraged)
20170629_165245

@thaJeztah
Copy link
Member

/cc @cpuguy83 PTAL if this is what you had in mind; #45445 (comment)

It looks like the issue is that it sets the started at time (and generate said time) after it is started meanwhile the exit event may occur before that start time is even generated.

I must admit though that I'm not "stoked" about having to pass the extra arguments; do we know exactly what code-path triggers this? Is this tapering over a bug? (curious if this is because we're setting this state in 2 separate ways; once in containerStart but also in monitor.go (daemon.ProcessEvent))

@LarsSven
Copy link
Author

LarsSven commented Jan 3, 2024

I don't think we're tapering over a bug. If the monitor was setting the state and overriding the containerStart, then moving the containerStart time earlier wouldn't have fixed the bug, as the monitor would have still overridden it and we would have gotten the exact same results as before.

From what I can see it's simply just the issue of there being a lot of code between the actual container start command, and the moment we generate the time prior to this MR. This code takes about 0.2 seconds to execute on my PC, which puts the starting time of the container about 0.2 seconds after the time the container is actually started.

My team is trying to time the execution time of arbitrary untrusted code for our product, so we would really like to use docker's internal timing as it's more accurate than the external measuring we're currently doing, but that's not possible when the results are inconsistent and return negative time differences.

@thaJeztah
Copy link
Member

From what I can see it's simply just the issue of there being a lot of code between the actual container start command, and the moment we generate the time

Right, but it looks like this PR changes the semantics of the "StartedAt" field; before this PR, the StartedAt would reflect the time at which we change the container's state (SetRunning) ("container process is started") was updated. After this PR, the field reflects the time at which the start was initiated.

Both can make sense on their own, but they do carry a different meaning. Unfortunately these are areas that originated from early in the project, where intended behavior was not always documented (if at all), but other code in the same area seems to confirm that the intent was "started" (and running), as Prometheus counters, and container events are also updated there;

moby/daemon/start.go

Lines 214 to 227 in 1f6c42c

container.HasBeenManuallyRestarted = false
container.SetRunning(ctr, tsk, true)
container.HasBeenStartedBefore = true
daemon.setStateCounter(container)
daemon.initHealthMonitor(container)
if err := container.CheckpointTo(daemon.containersReplica); err != nil {
log.G(ctx).WithError(err).WithField("container", container.ID).
Errorf("failed to store container")
}
daemon.LogContainerEvent(container, events.ActionStart)
containerActions.WithValues("start").UpdateSince(start)

However, and it looks like this is where there's a discrepancy; if the process exits before it's considered started, then FinishedAt is set before (?), which is inconsistent as for that case (apparently) the container is considered started (and exited). Which is also why I was looking at the monitor code which seems to have its own opinion on "StartedAt" (based on events), and that lead me to the question "why are both trying to manage the same state?"

SetRunning looks to be doing a lot - perhaps too much?, and the initial argument on SetRunning() at least looks dodgy (I'd have to dig why it was added); perhaps there's something to look into there as well 🤔

@LarsSven
Copy link
Author

LarsSven commented Jan 6, 2024

Is there anything I can do to help out with this?

container/state.go Outdated Show resolved Hide resolved
@FWest98
Copy link

FWest98 commented Jan 19, 2024

Right, but it looks like this PR changes the semantics of the "StartedAt" field; before this PR, the StartedAt would reflect the time at which we change the container's state (SetRunning) ("container process is started") was updated. After this PR, the field reflects the time at which the start was initiated.
..., but other code in the same area seems to confirm that the intent was "started" (and running)...

Looking at the code (but not being an expert), it now seems like the StartedAt time reflects "some time after we initiated the start and surely it will be running now", but it certainly is not the exact time at which the program actually started running. Since the current code isn't exactly correct with intended semantics anyway, I would err on the earlier side to at least ensure that StartedAt is always before FinishedAt - any program will always have a nonzero runtime, so a negative one just does not really make sense.

@FWest98
Copy link

FWest98 commented Mar 2, 2024

What's the status on this? What are the next steps to take here?

@LarsSven
Copy link
Author

LarsSven commented Mar 2, 2024

I would also be curious to the status of this PR

@vvoland
Copy link
Contributor

vvoland commented Mar 4, 2024

Personally it's LGTM after addressing #47003 (review)

But @thaJeztah @cpuguy83 might have other thoughts here

Copy link
Member

@cpuguy83 cpuguy83 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not pretty but I'm not sure we can make it so in the current design.

LGTM

@LarsSven
Copy link
Author

LarsSven commented Mar 8, 2024

@vvoland would you be able to review this PR again? I implemented your feedback.

Copy link
Contributor

@vvoland vvoland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Sorry I forgot that one can't take an address without assigning the expression to a variable.

container/state.go Show resolved Hide resolved
@LarsSven
Copy link
Author

@vvoland I implemented your changes.

Copy link
Contributor

@vvoland vvoland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! LGTM; there are just some small typos I didn't notice in my suggestion 🙈

container/state.go Outdated Show resolved Hide resolved
container/state.go Outdated Show resolved Hide resolved
@vvoland
Copy link
Contributor

vvoland commented Mar 12, 2024

Thanks ❤️
Code looks good now, just one last thing - can you please squash the commits into one?

Signed-off-by: Lars Andringa <l.s.andringa@rug.nl>
Signed-off-by: LarsSven <l.s.andringa@rug.nl>

Replaced boolean parameter by IsZero check

Signed-off-by: LarsSven <l.s.andringa@rug.nl>

Separated SetRunning into two functions

Signed-off-by: LarsSven <l.s.andringa@rug.nl>

Apply suggestions from code review

Documentation fixes

Co-authored-by: Paweł Gronowski <me@woland.xyz>
Signed-off-by: LarsSven <l.s.andringa@rug.nl>
@LarsSven
Copy link
Author

Done :)

Copy link
Contributor

@vvoland vvoland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@vvoland vvoland added this to the 27.0.0 milestone Mar 12, 2024
Copy link
Member

@laurazard laurazard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@vvoland vvoland modified the milestones: 26.1.0, 27.0.0 Apr 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

The StartDate of an exited container is later than the FinishedAt date.
6 participants