Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[usage] Change instance runtime calculation: creationTime → startedTime, stoppedTime → stoppingTime #12195

Merged
merged 2 commits into from
Aug 22, 2022

Conversation

jankeromnes
Copy link
Contributor

@jankeromnes jankeromnes commented Aug 18, 2022

Description

Change instance runtime calculation:

  • Start: use startedTime instead of creationTime
    • don't count startup time at all, e.g. scale up, image build/pull, ...
    • don't double-count when waiting for a prebuild to finish
  • Stop: use stoppingTime instead of stoppedTime
    • don't count stopping time at all, e.g. backup time

Related Issue(s)

Fixes #12177

How to test

  1. Unit tests

Release Notes

NONE

Documentation

Werft options:

  • /werft with-preview
  • /werft with-payment

@svenefftinge
Copy link
Member

I don't know the background of this decision, but I think counting in the starting and stopping time makes sense, especially until we bill usage for storage because the starting and stopping phase highly depend on the size of the workspace backup.

@jankeromnes
Copy link
Contributor Author

jankeromnes commented Aug 18, 2022

Thanks for your feedback @svenefftinge!

I don't know the background of this decision, but I think counting in the starting and stopping time makes sense, especially until we bill usage for storage because the starting and stopping phase highly depend on the size of the workspace backup.

This is exactly the reason we initially went with runtime = stoppedTime - creationTime.

However, there is one problem with creationTime -- when you start a workspace, and there is a prebuild currently running, and you decide to wait for it, if we count the runtime from creationTime, we'll end up counting the "wait for prebuild" time twice (we bill you once for the full prebuild, and once for the full workspace runtime including the waiting time).

Regarding stoppingTime vs stoppedTime, I agree with you that it's roughly proportional to the size of the workspace. However, this will change once we roll out PVC mounts, and @geropl also mentioned that we sometimes don't reliably set stoppedTime (e.g. when instances get stuck in stopping) which wouldn't be fair to bill users for.

However, I do share your worries about the shorter runtime tracking -- by doing stoppingTime - startedTime, we don't account for:

  • scale-up time
  • image build time
  • image pull time
  • git clone or backup extraction time
  • backup time / workspace size

That's maybe okay at first, but might come back to bite us in the future e.g. when we want to start billing any of these, thus making Gitpod more expensive for users (it's always easier to start expensive and make it cheaper, than to start cheap and regularly make it more expensive 😬). But since creationTime has the double-billing issue, I don't currently have a good solution to this. I feel like this is still very open to discussion.

@geropl geropl self-assigned this Aug 18, 2022
@geropl
Copy link
Member

geropl commented Aug 18, 2022

@jankeromnes Thanks for identifying the status quo, incl. the (potential) changes.

@svenefftinge First and foremost, I wanted to have the discussion and document it somewhere. Maybe I just wasn't part of the last one..? 🤷

For context: here is what we do now (Chargebee): started -> stopping

Regarding "when does the session a) start and b) end":

  • start:
    • I feel we're already billing users for prebuilds (which is the where value added)
    • I don't see benefit in billing for scale-up time, because it adds no value for users
    • Things we might missing out on: a) image-build times, b) pull times. Especially if we plan to bill for a) separately later one, and find a way to exclude b), it might make sense to keep as is for now. But be aware that "time spent (waiting) on prebuilds" >>> "time spent (waiting) on image-builds"
  • end:
    • I see the backup argument, but that should really be gone with PVC
    • Also, in a lot of technical failure scenarios we set stoppedTime very late, which was one of the reasons to introduce stoppingTime (which we configure "on user intent") in the first place. If we move to stoppedTime again, I fear we'll be subject to more complaints here.

I have no hard feelings here, but feel that end == stoppingTime is a tad more important to me bc of the reliability argument.

@svenefftinge
Copy link
Member

svenefftinge commented Aug 19, 2022

Thanks for sharing your thoughts. Makes sense 👍

@@ -280,6 +287,7 @@ func TestReportGenerator_GenerateUsageReport(t *testing.T) {
ID: uuid.New(),
UsageAttributionID: db.NewTeamAttributionID(teamID.String()),
StartedTime: db.NewVarcharTime(time.Date(2022, 06, 1, 1, 0, 0, 0, time.UTC)),
StoppingTime: db.NewVarcharTime(time.Date(2022, 06, 1, 1, 0, 0, 0, time.UTC)),
StoppedTime: db.NewVarcharTime(time.Date(2022, 06, 1, 1, 0, 0, 0, time.UTC)),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the comment says: No creation time, invalid record. that should be No startedTime, invalid record and StartedTime should be removed, I guess.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, is this actually the case? This part of the query hasn't changed:

Where("wsi.creationTime < ?", TimeToISO8601(to)).
Where("wsi.startedTime != ?", "").

And when I remove the StartedTime, this changes the test behavior from {3 raw sessions; 1 invalid session; 2 usage records} to {2 raw sessions; 0 invalid session; 2 usage records}.

For now I've rolled back this suggestion, but please let me know if the test / the query logic should change here.

if instance.CreationTime.Time().Before(maximumStart) {
instance.CreationTime = db.NewVarcharTime(maximumStart)
if instance.StartedTime.Time().Before(maximumStart) {
instance.StartedTime = db.NewVarcharTime(maximumStart)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated to this PR, but manipulating instances here was surprising, especially since they are handed out through the usage report. I don't have enough context, yet, though. Just wanted to share.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, and think it would make more sense to manipulate usage records instead of instances.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are not manipulated, as we create copies here (not working on references/pointers).

But had to read twice as well.. 😆 🙈

@geropl
Copy link
Member

geropl commented Aug 19, 2022

Had a discussion with @jankeromnes again.

We noticed that if we would (have and) use deployedTime instead of startedTime, we would have an even better cut for "session start". It would:

  • exclude image-builds ✅ (not perfect, but ok for now, as the volume is really low)
  • exclude prebuilds ✔️
  • exclude scale-up time ✔️
  • include pull times ✔️
  • include content init (with PVC) ✔️

As we do not have deployedTime anymore, we decided to move forward with this PR as is for now.

I will document the discussion that happened here, and will ask Team Workspace how much effort it is to re-vive deployedTime.

Copy link
Member

@geropl geropl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code LGTM and tests work.

@roboquat roboquat merged commit a2cf9d3 into main Aug 22, 2022
@roboquat roboquat deleted the jx/usage-runtime branch August 22, 2022 06:23
@roboquat roboquat added deployed: webapp Meta team change is running in production deployed Change is completely running in production labels Aug 29, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
deployed: webapp Meta team change is running in production deployed Change is completely running in production release-note-none size/L team: webapp Issue belongs to the WebApp team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Validate we're calculcating Usage based on correct dates
4 participants