Join GitHub today
GitHub is home to over 31 million developers working together to host and review code, manage projects, and build software together.
Sign upAvoid rebuilding a project when cwd changes #4788
Conversation
rust-highfive
assigned
matklad
Dec 7, 2017
This comment has been minimized.
This comment has been minimized.
rust-highfive
commented
Dec 7, 2017
|
r? @matklad (rust_highfive has picked a reviewer for you, use r? to override) |
matklad
reviewed
Dec 7, 2017
|
The implementation looks good to me, and I think "purge CWD, use paths relative to a fixed position" is the right approach, but I am not sure that "workspace root heuristic" is a good enough implementation. It's certainly a heuristic because some workspace members may be outside of the workspace, and some non-members may be inside workspace. Not sure if we need 100% correct implementation here though. |
| if fs_try!(f.read_until(0, &mut cwd)) == 0 { | ||
| return Ok(None) | ||
| } | ||
| let cwd = util::bytes2path(&cwd[..cwd.len()-1])?; |
This comment has been minimized.
This comment has been minimized.
matklad
Dec 7, 2017
Member
Am I correct that new versions of Cargo will fail to parse old depinfo, which is actually OK, because it'll cause only a rebuild, and not a build failure?
This comment has been minimized.
This comment has been minimized.
alexcrichton
Dec 7, 2017
Author
Member
Hm I'm not actually sure what will happen but the hashes are all changing so I don't think newer cargo will use the same files from older cargo.
| let ws_root = cx.ws.root(); | ||
| let src = unit.target.src_path(); | ||
| assert!(src.is_absolute()); | ||
| match src.strip_prefix(ws_root) { |
This comment has been minimized.
This comment has been minimized.
matklad
Dec 7, 2017
Member
I am not too happy about this solution, because there's no guarantee that workspace members reside below workspace root (see also #4787 (comment) :) ).
I think the ideal behavior here is to use all explicit worksapce members which lay outside of the root package as anchors for paths as well. Not sure if we need to support this right from the start though.
| fn hash<H: Hasher>(&self, _: &mut H) { | ||
| // ... | ||
| } | ||
| } |
This comment has been minimized.
This comment has been minimized.
matklad
Dec 7, 2017
Member
A slightly more elaborte design here would be to introduce a notion of explicitly relative path here and track path relative to workspace root and to other workspace members outside the root. Sort of like
struct PathRoots {
// a number of anchors, such as current workspace dir, a particular workspace member, users's home directory,
// CARGO_HOME, etc. This structure is "global" for cargo and is stored in Workspace/config.
roots: Vec::<PathBuf>,
}
struct RelativePath {
root_idx: usize // index in the global PathRoots
rel_path: PathBuf
}
impl RelativePath {
fn to_path(&self, roots: &PathRoots) -> PathBuf {
roots[self.root_idx].join(&self.rel_path)
}
}Not sure that we need this, but this can cope with workspaces which are not entirely under a workspace root. (Or perhaps it was a mistake to allow non-subdirectory members?).
This comment has been minimized.
This comment has been minimized.
|
@matklad thanks for taking a look! It's true yeah that workspace members outside the workspace root do pose a problem. That being said I wasn't really sure how we'd handle them at all :( I originally had a much more complicated patch which entirely refactored the So thinking about it... Right now the workspace root comes up in a few locations:
I think it's a pretty reasonable assumption to assume all rust files for a crate are within the same folder, so I think we could pretty easily adapt the dep-info files to only list files relative to the package root rather than the workspace root (basically do some serious postprocessing). For invocations of What do you think? Maybe we should fix up dep-info parsing to work without using |
This comment has been minimized.
This comment has been minimized.
|
Hm, the part about So, if you write
Hm, depinfo is produced by the compiler, so in theory solving the problem with How hard is it to implement the |
This comment has been minimized.
This comment has been minimized.
This assumption is also pretty important for IDEs, because we need to know which stuff to index, so +1 to making it an official and documented requirement :) Though, I can imagine someone doing weird stuff like |
alexcrichton
force-pushed the
alexcrichton:rename-works
branch
from
fc7c39a
to
dd0b41b
Dec 8, 2017
This comment has been minimized.
This comment has been minimized.
|
@matklad ok I've amended the first commit and pushed up another one. This should reduce the reliance on The one usage of
With this PR as-is if you rename the I believe the fix for this would be to change how Cargo invokes rustc, instead invoking I also believe that such a fix would purely involve local modification to this logic. I think though that this may be rare enough that we may want to punt on this for now? Does that makes sense? Curious what you think! |
matklad
reviewed
Dec 11, 2017
| assert!(src.is_absolute()); | ||
| match src.strip_prefix(ws_root) { | ||
| Ok(path) => (path.to_path_buf(), Some(ws_root.to_path_buf())), | ||
| Err(_) => (src.to_path_buf(), None), |
This comment has been minimized.
This comment has been minimized.
matklad
Dec 11, 2017
Member
Returning None here means that rustc will use pretty-much arbitrary cwd, is that right? Maybe it's better to lock this down to ws_root even in this case, just to get a bit more determinism? Than we can return just PathBuf here and avoid that comment about hashing only .0.
This comment has been minimized.
This comment has been minimized.
alexcrichton
Dec 12, 2017
Author
Member
Certainly plausible! I was originally thinking that this wouldn't matter but I think with plugins nowadays it could matter for sure. I'll change.
That being said though I don't think we can avoid hashing the second field of the return value because ws_root changes over time (if the whole dir is renamed) but we don't want that to cause recompiles.
This comment has been minimized.
This comment has been minimized.
matklad
Dec 12, 2017
Member
That being said though I don't think we can avoid hashing the second field of the return value because ws_root changes over time (if the whole dir is renamed) but we don't want that to cause recompiles.
The idea is that if we lock down cwd, then we don't need to return a pair here, we can return just the path.
matklad
reviewed
Dec 11, 2017
| /// when it was invoked. | ||
| /// | ||
| /// The serialized Cargo format will contain a list of files, all of which are | ||
| /// relative if they're under `root`. or absolute if they're elsewehre. |
This comment has been minimized.
This comment has been minimized.
matklad
Dec 11, 2017
Member
Hm, weren't depinfo files, produced by Cargo, used by some other tools? #3557
So, this change will break such clients :(
This comment has been minimized.
This comment has been minimized.
alexcrichton
Dec 12, 2017
Author
Member
Oh sure!
I think though (at least the intention) that these files are just those in .fingerprint, so the ones like target/debug/foo.d should still be present unmangled from rustc itself.
This comment has been minimized.
This comment has been minimized.
|
Yeah, so I think anchoring to the package root in most places is the correct thing to do, but I still have a couple of question (left inline as well):
|
alexcrichton
force-pushed the
alexcrichton:rename-works
branch
from
dd0b41b
to
4493400
Dec 12, 2017
This comment has been minimized.
This comment has been minimized.
|
For the last point about |
This comment has been minimized.
This comment has been minimized.
Targets should be package relative, and not workspace relative. In theory, we do have necessary information:
But I totally understand that it might be hard to actually implement. So yeah, I think it might worth it to look into package relative paths for targets and locking down cwd for rustc, but, otherwise, r+ at will, this looks great! :) |
alexcrichton
force-pushed the
alexcrichton:rename-works
branch
from
4493400
to
f688e9c
Dec 12, 2017
This comment has been minimized.
This comment has been minimized.
|
Oh sure yeah targets are package relative but we still don't want to hash the It was after I ended up realizing that when I jettisoned the ~500 lines of changes to get the refactoring done as it ended up not being used in the end :( In any case, thought I already did it, but just pushed up the changes for cwd from the package in non-workspace situations |
This comment has been minimized.
This comment has been minimized.
|
@bors r+ |
This comment has been minimized.
This comment has been minimized.
|
|
This comment has been minimized.
This comment has been minimized.
bors
added a commit
that referenced
this pull request
Dec 12, 2017
This comment has been minimized.
This comment has been minimized.
|
|
alexcrichton commentedDec 7, 2017
•
edited
This commit is targeted at solving a use case which typically comes up during CI
builds -- the
targetdirectory is cached between builds but the cwd of thebuild changes over time. For example the following scenario can happen:
/projects/a.targetdirectory is cached./projects/b.targetdirectory is restored to/projects/b.The last piece of behavior is indeed unfortunate! Cargo's internal hashing
currently isn't that resilient to changing cwd and this PR aims to help improve
the situation!
The first point of too-much-hashing came up with
Target::src_path. EachTargetwas hashed and stored for all compilations, and thesrc_pathfieldwas an absolute path on the filesystem to the file that needed to be compiled.
This path then changed over time when cwd changed, but otherwise everything else
remained the same!
This commit updates the handling of the
src_pathfield to simply ignore itwhen hashing. Instead the path we actually pass to rustc is later calculated and
then passed to the fingerprint calculation.
The next problem this fixes is that the dep info files were augmented after
creation to have the cwd of the compiler at the time to find the files at a
later date. This, unfortunately, would cause issues if the cwd itself changed.
Instead the cwd is now left out of dep-info files (they're no longer augmented)
and instead the cwd is recalculated when parsing the dep info later.
The final problem that this commit fixes is actually an existing issue in Cargo
today. Right now you can actually execute
cargo buildfrom anywhere in aproject and Cargo will execute the build. Unfortunately though the argument to
rustc was actually different depending on what directory you were in (the
compiler was invoked with a path relative to cwd). This path ends up being used
for metadata like debuginfo which means that different directories would cause
different artifacts to be created, but Cargo wouldn't rerun the compiler!
To fix this issue the matter of cwd is now entirely excluded from compilation
command lines. Instead rustc is unconditionally invoked with a relative path
if the path is underneath the workspace root, and otherwise it's invoked as an
absolute path (in which case the cwd doesn't matter).
Once all these fixes were added up it means that now we can have projects where
if you move the entire directory Cargo won't rebuild the original source!
Note that this may be a bit of a breaking change, however. This means that the
paths in error messages for cargo will no longer be unconditionally relative to
the current working directory, but rather relative to the root of the workspace
itself. Unfortunately this is moreso of a feature right now rather than a bug,
so it may be one that we just have to stomach.
Closes #3273