
[ws-daemon] do not fail workspace if git status failed during dispose #11331

Merged: 2 commits into main from pavel/git-status-fix on Jul 14, 2022

Conversation

sagor999
Contributor

Description

(This PR is stacked on top of #11330, please review commit: 8d2d1c6)

Some workspaces fail with cannot get git status, which is not a critical error.
In some cases this was observed to happen due to a potentially corrupt or misconfigured .git folder (possibly caused by the user's actions).

This PR makes it so that we don't fail the workspace when this happens and instead continue disposing of it.
For example, we already do that if the workspace folder does not contain a .git folder:

if !git.IsWorkingCopy(loc) {
log.WithField("loc", loc).WithField("checkout location", s.CheckoutLocation).WithFields(s.OWI()).Debug("did not find a Git working copy - not updating Git status")
return nil, nil
}

The main reasoning is that even if we cannot get the git status, all that means is that we won't show the user's uncommitted changes in the dashboard, which should not be a reason for failing the whole workspace.
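
To illustrate the pattern (a minimal, self-contained sketch only; updateGitStatus and its logging below are simplified stand-ins, not the actual ws-daemon code), a failing git status is logged and treated as "no status" rather than returned as a workspace-failing error:

package main

import (
	"log"
	"os"
	"os/exec"
	"path/filepath"
)

// updateGitStatus returns the porcelain status of the working copy at loc.
// If the status cannot be obtained (e.g. because of a corrupt .git folder),
// it logs a warning and returns an empty status instead of an error, so the
// dispose can continue; only the "uncommitted changes" view in the dashboard is lost.
func updateGitStatus(loc string) (string, error) {
	if _, err := os.Stat(filepath.Join(loc, ".git")); err != nil {
		// Not a Git working copy: nothing to report (mirrors the existing early return).
		return "", nil
	}
	out, err := exec.Command("git", "-C", loc, "status", "--porcelain").CombinedOutput()
	if err != nil {
		// Previously this error would fail the whole workspace; now we degrade gracefully.
		log.Printf("warning: cannot get git status for %s: %v (%s)", loc, err, out)
		return "", nil
	}
	return string(out), nil
}

func main() {
	status, _ := updateGitStatus(".")
	log.Printf("uncommitted changes:\n%s", status)
}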

Related Issue(s)

Fixes #

How to test

Open a workspace and corrupt its .git folder (one way to do that is sketched below).
Observe that the workspace still stops and is disposed of correctly, albeit without showing any uncommitted changes in the dashboard.
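
As an illustration only (not part of the PR), one way to corrupt the working copy so that git status fails is to overwrite .git/HEAD with an invalid ref; the small Go program below does that for the current directory:

package main

import (
	"log"
	"os"
)

func main() {
	// Overwrite .git/HEAD with an invalid ref so that `git status` fails.
	// Run this from the workspace checkout root; afterwards the dispose path
	// hits the "cannot get git status" case this PR now tolerates.
	if err := os.WriteFile(".git/HEAD", []byte("this is not a valid ref\n"), 0o644); err != nil {
		log.Fatal(err)
	}
}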

Release Notes

none

Documentation

Werft options:

  • /werft with-preview

@sagor999 sagor999 requested review from csweichel and a team July 12, 2022 23:52
@github-actions github-actions bot added the team: workspace label (Issue belongs to the Workspace team) and removed the size/S label on Jul 12, 2022
@sagor999
Contributor Author

/hold
would like to get input from @csweichel on this one, so holding to make sure he has a chance to review

@aledbf
Member

aledbf commented Jul 13, 2022

@sagor999 I think we need to update the configuration and configure /mnt/workingarea as safe, using git config --global --add safe.directory * instead of individual directories (see https://github.blog/2022-04-18-highlights-from-git-2-36/).
(file /home/gitpod/.gitconfig in ws-daemon)
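
For reference, if that suggestion were adopted, the /home/gitpod/.gitconfig shipped with ws-daemon would contain something along these lines (a sketch of the setting only, not the actual file):

[safe]
    directory = *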

@sagor999
Contributor Author

@aledbf I was referring to this particular case:
https://cloudlogging.app.goo.gl/8GkTztnkJ5MdpwAz8
(check ws-daemon error log there)

I think what you are referring to is a frequent, similar error that seems to be coming from prebuilds (the unsafe directory error).
I will try to take a look at that as well, but it is out of scope for this particular PR (even though it is related).

@jenting
Contributor

@jenting left a comment

I like this approach 👍
I encountered a scenario where the workspace failed with git status failed, but the volume snapshot was ready to use.

In that scenario, we enable the PVC feature flag and manually put the node into a NotReady state (by disabling the k3s-agent on the host).
The workspace pod then enters a Terminating state; finally, the VolumeSnapshot object is created by ws-manager, which takes a snapshot of the PVC.
After relaunching the workspace, the pod is scheduled on another Ready node, and the user's content is restored from the VolumeSnapshot.

Without this PR's change, the dashboard shows the error cannot get git status.

The pro is that we keep the user's content in the PVC without data loss; the con is that we can't see the git diff in the dashboard.

@csweichel
Contributor

@csweichel left a comment

LGTM

(minor nit, hence the hold)

/hold

Review comment on components/ws-daemon/pkg/content/service.go (outdated, resolved)
@csweichel
Contributor

/hold

@sagor999
Contributor Author

/unhold

@aledbf
Member

aledbf commented Jul 13, 2022

/hold

(due to the build error)

@sagor999
Contributor Author

sagor999 commented Jul 13, 2022

/werft run with-clean-slate-deployment=true

👍 started the job as gitpod-build-pavel-git-status-fix.3
(with .werft/ from main)

@sagor999
Contributor Author

sagor999 commented Jul 13, 2022

/werft run with-clean-slate-deployment=true

👍 started the job as gitpod-build-pavel-git-status-fix.4
(with .werft/ from main)

@sagor999
Contributor Author

Something funky is going on with the unit tests in this build: each time a different test fails, but I opened the workspace and did a Leeway build and everything built fine...
Going to try to rebase.

@mads-hartmann
Contributor

mads-hartmann commented Jul 14, 2022

/werft run

👍 started the job as gitpod-build-pavel-git-status-fix.6
(with .werft/ from main)

@mads-hartmann
Contributor

mads-hartmann commented Jul 14, 2022

The test succeeded this time. The set of components that Leeway decided it needed to build is very different between gitpod-build-pavel-git-status-fix.5 and gitpod-build-pavel-git-status-fix.6 - I'm not quite sure why.

But given the failing tests were sporadic and unrelated to this PR, I'll remove the hold so it can be merged and you can at least wake up to a merged PR ☺️ I'll follow up to understand why Leeway would decide to build different packages when executed from the same branch about 10 hours apart.

/unhold

Update: Here is the internal Slack thread. I think we're using a shared Leeway cache for all non-main branches, which might hide test flakiness, as you only need one successful build across all non-main branches. So you experienced the flakiness because you happened to be the first to build these components off main (if my theory is correct).

@roboquat roboquat merged commit af62571 into main Jul 14, 2022
@roboquat roboquat deleted the pavel/git-status-fix branch July 14, 2022 07:24
@roboquat roboquat added the deployed: workspace (Workspace team change is running in production) and deployed (Change is completely running in production) labels on Jul 20, 2022