
Overlays are not unmounted #1688

Open
lissyx opened this issue Mar 27, 2023 · 8 comments


@lissyx
Contributor

lissyx commented Mar 27, 2023

After upgrading to v0.4.0, sccache-dist became sluggish: it took ~5 min to get through 10% of mozilla-central's configure, while a full build without cache takes ~4 min.

After a quick investigation, the problem appears to be on the server side. Pressing CTRL+C to stop the sccache-dist server process results in:

  • a long delay before the processes are killed
  • the process cannot be restarted afterwards; it fails with:
    sccache-dist: error: Overlay builder failed to start
    sccache-dist: caused by: Failed to clean up builder directory
    sccache-dist: caused by: failed to remove directory `XXX/sccache/build`
    sccache-dist: caused by: Device or resource busy (os error 16)

Running mount shows 1052 mounted overlays similar to:

overlay on XXX/build/builds/746755680e580184e658dbc0c5af280e31b72bf026f2d64adb215757f9099101-769/target type overlay (rw,relatime,lowerdir=XXX/sccache/build/toolchains/746755680e580184e658dbc0c5af280e31b72bf026f2d64adb215757f9099101,upperdir=XXX/sccache/build/builds/746755680e580184e658dbc0c5af280e31b72bf026f2d64adb215757f9099101-769/upper,workdir=XXX/sccache/build/builds/746755680e580184e658dbc0c5af280e31b72bf026f2d64adb215757f9099101-769/work)
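To track how quickly overlays pile up between builds, a small helper can count the overlay mounts under the builds directory. This is a minimal sketch, assuming the Linux /proc/self/mounts layout (`<source> <mount point> <fs type> <options> <dump> <pass>`); `count_overlay_mounts` is a hypothetical name, not an sccache function.

```rust
use std::path::Path;

/// Counts overlay mounts whose mount point is at or below `builds_dir`.
///
/// `mount_table` is assumed to use the /proc/self/mounts format, one mount per line:
/// `<source> <mount point> <fs type> <options> <dump> <pass>`.
/// Caveat: the kernel escapes spaces in paths as `\040`; this sketch does not unescape them.
pub fn count_overlay_mounts(mount_table: &str, builds_dir: &Path) -> usize {
    mount_table
        .lines()
        .filter_map(|line| {
            let mut fields = line.split_whitespace();
            let _source = fields.next()?;
            let mount_point = fields.next()?;
            let fs_type = fields.next()?;
            Some((mount_point, fs_type))
        })
        // Path::starts_with compares whole components, so `builds2/` does not match `builds/`.
        .filter(|(mount_point, fs_type)| {
            *fs_type == "overlay" && Path::new(mount_point).starts_with(builds_dir)
        })
        .count()
}
```

On the affected server, passing it the contents of `std::fs::read_to_string("/proc/self/mounts")` (read from the same mount namespace in which `mount` listed the overlays) before and after a build should show whether the count keeps growing, as in the 1052 and 3631 figures reported in this thread.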
@sylvestre
Collaborator

@Xuanwo @drahnr does this ring a bell? :)

@lissyx
Contributor Author

lissyx commented Mar 27, 2023

Running a second build in a row, I end up with 3631 mounted overlays at the end of it, and that second build takes a very long time to finish.

@lissyx
Contributor Author

lissyx commented Mar 27, 2023

It looks like I had swapped the order of sudo and the environment variables. With them set properly, I can now see the kind of output I like (errors):

[2023-03-27T08:54:34Z ERROR sccache_dist::build] Failed to remove build directory XXX/build/builds/746755680e580184e658dbc0c5af280e31b72bf026f2d64adb215757f9099101-38: failed to remove directory `XXX/sccache/build/builds/746755680e580184e658dbc0c5af280e31b72bf026f2d64adb215757f9099101-38`
[2023-03-27T08:54:34Z ERROR sccache_dist::build] Failed to remove build directory XXX/sccache/build/builds/f06aab84da875aa1c85313a2a200f47cf9104f1ba35eb8df51245f98effa8a51-11: failed to remove directory `XXX/sccache/build/builds/f06aab84da875aa1c85313a2a200f47cf9104f1ba35eb8df51245f98effa8a51-11`

@drahnr
Collaborator

drahnr commented Mar 27, 2023

No, not aware of it. I suspect #1628 might have caused the issue; it's the only change that touched code in the vicinity.

@lissyx
Contributor Author

lissyx commented Mar 27, 2023

```rust
// Failing during cleanup is pretty unexpected, but we can still return the successful compile
// TODO: if too many of these fail, we should mark this builder as faulty
fn finish_overlay(&self, _tc: &Toolchain, overlay: OverlaySpec) {
    // TODO: collect toolchain directories
    let OverlaySpec {
        build_dir,
        toolchain_dir: _,
    } = overlay;
    if let Err(e) = fs::remove_dir_all(&build_dir) {
        error!(
            "Failed to remove build directory {}: {}",
            build_dir.display(),
            e
        );
    }
}
```
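One possible direction for `finish_overlay`, sketched under the assumption that leftover mounts are the cause (not a confirmed fix): detach every mount at or below `build_dir`, deepest first, before calling `fs::remove_dir_all`. The helpers `mounts_under` and `cleanup_build_dir` are hypothetical, and the actual unmount call (for example a wrapper around umount(2)) is injected as a closure so the logic can be exercised without root.

```rust
use std::cmp::Reverse;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Returns every mount point at or below `build_dir`, deepest first, so nested
/// mounts are detached before their parents. Assumes the /proc/self/mounts
/// format, where the mount point is the second whitespace-separated field.
pub fn mounts_under(mount_table: &str, build_dir: &Path) -> Vec<PathBuf> {
    let mut mounts: Vec<PathBuf> = mount_table
        .lines()
        .filter_map(|line| line.split_whitespace().nth(1))
        .map(PathBuf::from)
        .filter(|mount_point| mount_point.starts_with(build_dir))
        .collect();
    // More path components means deeper in the tree; those go first.
    mounts.sort_by_key(|p| Reverse(p.components().count()));
    mounts
}

/// Hypothetical cleanup: detach each mount under `build_dir` with the
/// caller-supplied `unmount` (e.g. a wrapper around umount(2)), then delete
/// the directory. Stops at the first unmount error.
pub fn cleanup_build_dir<F>(build_dir: &Path, mount_table: &str, mut unmount: F) -> io::Result<()>
where
    F: FnMut(&Path) -> io::Result<()>,
{
    for mount_point in mounts_under(mount_table, build_dir) {
        unmount(mount_point.as_path())?;
    }
    // Once the overlay is detached, the directory should no longer be a busy mount point.
    fs::remove_dir_all(build_dir)
}
```

Injecting `unmount` keeps the ordering logic separate from the privileged syscall; whatever the real fix in sccache looks like, it would need some equivalent of this step before `remove_dir_all` can succeed on a mounted `target` directory.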

@lissyx
Contributor Author

lissyx commented Mar 27, 2023

> No, not aware of it. I suspect #1628 might have caused the issue; it's the only change that touched code in the vicinity.

When I get more time I can try to bisect to confirm, but so far, going back to v0.4.0.pre.10 (before 20a08fc) seems to be enough to avoid the problem.

@lissyx
Contributor Author

lissyx commented Mar 27, 2023


My guess is that something in 20a08fc broke how overlays are unmounted, and the error above is the symptom: just as my manual rm -fr fails with EBUSY on the mount point, fs::remove_dir_all() is likely failing for the same reason.
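The EBUSY theory can be checked directly in the cleanup path: `io::Error::raw_os_error()` exposes the errno, and 16 is EBUSY on Linux, matching the `(os error 16)` in the restart error. A minimal sketch, with the hypothetical `try_remove_build_dir` and `Cleanup` names, that distinguishes a busy directory from other failures:

```rust
use std::fs;
use std::io;
use std::path::Path;

/// errno value for EBUSY on Linux; it matches the "(os error 16)" in the log.
const EBUSY: i32 = 16;

/// Result of trying to delete one build directory.
#[derive(Debug, PartialEq)]
pub enum Cleanup {
    Removed,
    /// Something is still mounted on (or otherwise holding) the directory.
    Busy,
    Failed(io::ErrorKind),
}

/// True if the error carries the raw OS code for EBUSY.
pub fn is_busy(e: &io::Error) -> bool {
    e.raw_os_error() == Some(EBUSY)
}

/// Hypothetical helper: the same removal `finish_overlay` does, but it classifies the failure.
pub fn try_remove_build_dir(build_dir: &Path) -> Cleanup {
    match fs::remove_dir_all(build_dir) {
        Ok(()) => Cleanup::Removed,
        Err(e) if is_busy(&e) => Cleanup::Busy,
        Err(e) => Cleanup::Failed(e.kind()),
    }
}
```

Logging `Cleanup::Busy` separately would make a leaked mount show up as its own error instead of a generic "Failed to remove build directory" line.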

@lissyx
Contributor Author

lissyx commented Mar 27, 2023

I am still running as root via bubblewrap, in case that matters. I'll try to poke around in the code when I get some time, but I remain available to test patches and/or provide more debugging information.
