Support platform-defined standard directories #5183

Closed
wants to merge 1 commit into from

soc commented Mar 15, 2018

This change stops cargo from violating the operating system rules
regarding the placement of config, cache, ... directories on Linux,
macOS and Windows.

Existing directories and overrides are retained.

The precedence is as follows:

  1. use the CARGO_HOME environment variable if it exists (legacy)
  2. use CARGO_CACHE_DIR, CARGO_CONFIG_DIR etc. env vars if they exist
  3. use the ~/.cargo directory if it exists (legacy)
  4. follow operating system standards
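To make the precedence above concrete, here is a minimal sketch for one directory (the cache dir). It is not the PR's code: `resolve_cache_dir` is a hypothetical name, environment lookups and the `~/.cargo` existence check are passed in so the order can be exercised in isolation, and the step-4 default `~/.cache/cargo` assumes Linux, as in the `cargo dirs` output later in this thread.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical resolver for the cache directory, following the four steps.
/// `env` stands in for environment lookups and `legacy_exists` for the
/// `~/.cargo` existence check, so nothing touches the real system.
pub fn resolve_cache_dir(
    env: impl Fn(&str) -> Option<PathBuf>,
    home: &Path,
    legacy_exists: bool,
) -> PathBuf {
    // 1. CARGO_HOME (legacy) wins outright.
    if let Some(cargo_home) = env("CARGO_HOME") {
        return cargo_home;
    }
    // 2. The new per-directory variable.
    if let Some(dir) = env("CARGO_CACHE_DIR") {
        return dir;
    }
    // 3. An existing ~/.cargo keeps the legacy layout.
    if legacy_exists {
        return home.join(".cargo");
    }
    // 4. Platform default; assumed to be the XDG cache dir on Linux.
    home.join(".cache").join("cargo")
}
```

The same four-step shape would repeat for the config, data and bin directories.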

A new cargo command, dirs, is added, which can provide path
information to other command line tools.

Fixes:
#1734
#1976
rust-lang/rust#12725

Addresses:
rust-lang/rfcs#1615
#148
#3981

@rust-highfive

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @alexcrichton (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

@soc
Author

soc commented Mar 15, 2018

Hi everyone, I would love to get some feedback on this (and I couldn't reach anyone at #cargo).
There are probably still things that are wrong and need to be adapted, but I'd like to get some feedback early.

@soc
Author

soc commented Mar 15, 2018

New cargo dirs command

Examples:

  • with existing .cargo directory
$ target/debug/cargo dirs
CARGO_CACHE_DIR:  "/home/soc/.cargo"
CARGO_CONFIG_DIR: "/home/soc/.cargo"
CARGO_DATA_DIR:   "/home/soc/.cargo"
CARGO_BIN_DIR:    "/home/soc/.cargo/bin"
  • without existing .cargo directory
$ target/debug/cargo dirs
CARGO_CACHE_DIR:  "/home/soc/.cache/cargo"
CARGO_CONFIG_DIR: "/home/soc/.config/cargo"
CARGO_DATA_DIR:   "/home/soc/.local/share/cargo"
CARGO_BIN_DIR:    "/home/soc/.local/bin/"
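The output format of the examples above can be reproduced roughly as follows. This is a sketch, assuming the paths are printed with `{:?}` (which would explain the quotes) and the labels are padded to a fixed width; `format_dirs` is a hypothetical name, not the PR's implementation.

```rust
use std::path::Path;

/// Hypothetical formatter for `cargo dirs`: one line per variable, with the
/// path printed via `{:?}`, which quotes it as in the examples.
pub fn format_dirs(cache: &Path, config: &Path, data: &Path, bin: &Path) -> String {
    let rows = [
        ("CARGO_CACHE_DIR:", cache),
        ("CARGO_CONFIG_DIR:", config),
        ("CARGO_DATA_DIR:", data),
        ("CARGO_BIN_DIR:", bin),
    ];
    let mut out = String::new();
    for &(name, path) in rows.iter() {
        // Pad every label to 18 characters so the paths line up in one column.
        out.push_str(&format!("{:<18}{:?}\n", name, path));
    }
    out
}
```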

// This is written in the most straight-forward way possible, because it is
// hard as-is to understand all the different options, without trying to
// save lines of code.
pub fn cargo_dirs() -> CargoDirs {
Author

Should Config::cargo_dirs be renamed/moved to CargoDirs::new?

Member

Yeah, new would look slightly better I think, though the current option is OK as well!

@bors
Contributor

bors commented Mar 15, 2018

☔ The latest upstream changes (presumably #5176) made this pull request unmergeable. Please resolve the merge conflicts.

@aturon
Member

aturon commented Mar 16, 2018

cc @nrc

soc changed the title from "[WIP] Add support for platform-defined standard directories" to "Add support for platform-defined standard directories" on Mar 17, 2018
Member

matklad left a comment

Overall, this looks great to me @soc!

One thing I am really worried about is how are we going to test this =/

  • we want to test it across at least mac/windows/linux (and on windows there's probably also an msvc/mingw/cygwin/linux-subsystem axis?)
  • we want to test different fallback scenarios
  • we want to test this in conjunction with rustup

All this together implies to me that plain #[test] tests ain't gonna work here at all :(

I am thinking about a really heavy weight solution, like preparing docker images for different initial state of the machines, and then writing tests as bash scripts, which install rustup, create cargo project, etc. But, one does not simply create a docker image for windows I guess 🤷‍♂️ ?

It's also interesting that this PR actually does two things:

  • it refactors Cargo to support CargoDirs instead of monolithic CARGO_HOME.
  • it changes default locations for stuff, using directories and environment variables.

I wonder if it makes sense to split this into two pull requests, and implement the refactoring first while fully preserving current behavior. That way, we can separately check that the refactoring does not introduce regressions by itself, and then concentrate fully on the fallback bits.

@rust-lang/cargo

@@ -122,7 +143,7 @@ pub fn install(
if installed_anything {
// Print a warning that if this directory isn't in PATH that they won't be
// able to run these commands.
-let dst = metadata(opts.config, &root)?.parent().join("bin");
+let dst = metadata(opts.config, &Filesystem::new(root.config_dir))?.parent().join("bin");
Member

This should be root.bin_dir I guess?

let home_dir = ::home::home_dir().ok_or_else(|| {
format_err!("Cargo couldn't find your home directory. \
This probably means that $HOME was not set.")
}).unwrap();
Member

I think unwraps may panic in some obscure, yet real-world scenario, so we really need proper error handling. Changing the return type to CargoResult<CargoDirs> and replacing the .unwraps with ? should do the trick, I think?

As an example of a weird scenario, there was a bug in Cargo where the ::std::env::current_exe call failed, because Cargo was executed in a chroot without procfs :)
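A minimal sketch of the suggested change, assuming a `String` error in place of Cargo's real `CargoResult` error type and reading `$HOME` directly as a simplified stand-in for `::home::home_dir()` (which also handles Windows). The hypothetical `from_home` helper exists only so the error path can be tested without modifying the process environment.

```rust
use std::path::PathBuf;

// Stand-in for Cargo's `CargoResult`; a `String` error keeps the sketch self-contained.
type CargoResult<T> = Result<T, String>;

#[derive(Debug)]
pub struct CargoDirs {
    pub home_dir: PathBuf,
}

impl CargoDirs {
    /// Fallible constructor: callers get an error instead of a panic.
    pub fn new() -> CargoResult<CargoDirs> {
        Self::from_home(std::env::var_os("HOME").map(PathBuf::from))
    }

    /// Split out so the error path can be tested without editing the environment.
    pub fn from_home(home: Option<PathBuf>) -> CargoResult<CargoDirs> {
        // `?` propagates the error where the original code called `.unwrap()`.
        let home_dir = home.ok_or_else(|| {
            "Cargo couldn't find your home directory. \
             This probably means that $HOME was not set."
                .to_string()
        })?;
        Ok(CargoDirs { home_dir })
    }
}
```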

let mut cache_dir = cargo_dirs.cache_dir().to_path_buf();
let mut config_dir = cargo_dirs.config_dir().to_path_buf();
let mut data_dir = cargo_dirs.data_dir().to_path_buf();
// fixme: executable_dir only available on Linux, use data_dir on macOS and Windows?
Member

This should be figured out in the RFC thread perhaps?


// 3. .cargo exists
let legacy_cargo_dir = home_dir.join(".cargo");
if cargo_home_env.is_none() && legacy_cargo_dir.exists() {
Member

This should perhaps be strictly a fall-back? That is, if none of the variables are set, we set all dirs from .cargo, in contrast with the current per-dir approach?
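A sketch of the all-or-nothing variant, modelling only two directories to stay short. With the per-dir approach, setting only CARGO_CACHE_DIR while `~/.cargo` exists still sends config to `~/.cargo`; in this version the config dir falls through to the platform default instead. The XDG-style defaults and the `resolve`/`Dirs` names are assumptions for illustration.

```rust
use std::path::{Path, PathBuf};

#[derive(Debug, PartialEq)]
pub struct Dirs {
    pub cache: PathBuf,
    pub config: PathBuf,
}

// Only two directories are modelled to keep the sketch short.
const DIR_VARS: [&str; 2] = ["CARGO_CACHE_DIR", "CARGO_CONFIG_DIR"];

/// Hypothetical all-or-nothing fallback: ~/.cargo supplies every directory,
/// but only when none of the per-directory variables is set.
pub fn resolve(env: impl Fn(&str) -> Option<PathBuf>, home: &Path, legacy_exists: bool) -> Dirs {
    let any_set = DIR_VARS.iter().any(|&var| env(var).is_some());
    if !any_set && legacy_exists {
        let legacy = home.join(".cargo");
        return Dirs { cache: legacy.clone(), config: legacy };
    }
    // Otherwise each directory comes from its variable or the platform
    // default (XDG-style paths assumed here), never from ~/.cargo.
    Dirs {
        cache: env("CARGO_CACHE_DIR").unwrap_or_else(|| home.join(".cache").join("cargo")),
        config: env("CARGO_CONFIG_DIR").unwrap_or_else(|| home.join(".config").join("cargo")),
    }
}
```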

-for current in paths::ancestors(pwd) {
-let possible = current.join(".cargo").join("config");
+for current in paths::ancestors(&dirs.current_dir) {
+let possible = current.join(".cargo").join("config"); // fixme: what to do about this?
Member

Nothing to be done here? It's an explicit feature of Cargo that it looks for .cargo/config in all parent directories. This exists for per-project .cargo dirs. Or am I missing something here?
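For reference, the ancestor walk described here can be expressed with the standard library's `Path::ancestors`. This sketch only lists the candidate files, nearest directory first; `config_candidates` is a hypothetical name, and it does not reproduce how Cargo's `paths::ancestors` helper or config merging behaves in detail.

```rust
use std::path::{Path, PathBuf};

/// Candidate per-project config files, nearest directory first.
pub fn config_candidates(cwd: &Path) -> Vec<PathBuf> {
    cwd.ancestors()
        .map(|dir| dir.join(".cargo").join("config"))
        .collect()
}
```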

Author

Yes, I think you are right. I just wanted to be extra careful by marking all changes where I wasn't absolutely sure.

@soc
Author

soc commented Mar 17, 2018

@matklad Yes, testing is also a concern for me.

I think as a first step it would really help to have a list of different configurations and scenarios, so it is at least possible to test everything in an organized fashion, even if it is done manually.

I'm not sure whether splitting things into two commits makes sense, I feel that having an additional transitory state would have some benefits, but would also add overhead that would outweigh the benefits.

Especially because a similar PR needs to be done for rustup. With the changes in one commit we would only have to test 4 different setups, {cargo: pre-change, post-change} * {rustup: pre-change, post-change}.

With an additional state in the middle, this would balloon to 9 different setups, each of which needs to be tested against all configurations and scenarios.

@matklad
Member

matklad commented Apr 9, 2018

@soc have you managed to prepare a similar PR for rustup as well? I think updating rustup would be the next step here, because we really do want to land the changes to rustup and cargo simultaneously :)

@soc
Author

soc commented Apr 10, 2018

@matklad Not yet, but here is my plan:

  • I could really use some review of the current code to get a better understanding of whether this is the way people want to go (especially with regard to Path vs. Filesystem).
  • I would be thankful if someone has some time to look into the remaining test failures with me. I can make tests pass, but sometimes I'm not sure my changes to the tests are testing what was originally intended.
  • Implement changes in rustup.
  • Write some documentation that explains the changes.
  • Fix the existing documentation.
    • Fix the official documentation.
    • Fix third-party tutorials, blogs, StackOverflow, etc.
    • Ping book authors.

@@ -235,9 +256,9 @@ fn install_one(
// We have to check this again afterwards, but may as well avoid building
// anything if we're gonna throw it away anyway.
{
-let metadata = metadata(config, root)?;
+let metadata = metadata(config, &Filesystem::new(dirs.config_dir.clone()))?;
Member

It feels like metadata could perhaps be a method of CargoInstallDirs?

pub cache_dir: Filesystem,
pub config_dir: Filesystem,
pub data_dir: PathBuf,
pub bin_dir: PathBuf,
Member

It would be great to add short docstrings here, to explain which directory stores which data.
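One way the requested docstrings could look. This is a sketch: the name `CargoInstallDirs` is taken from the previous comment, all fields are simplified to `PathBuf` (the patch uses `Filesystem` for two of them), and the descriptions are assumptions based on this thread (credentials in `config_dir`, installed binaries in `bin_dir`), not settled decisions.

```rust
use std::path::PathBuf;

#[derive(Clone, Debug)]
pub struct CargoInstallDirs {
    /// Re-creatable data such as downloaded registry and git checkouts (assumed).
    pub cache_dir: PathBuf,
    /// User configuration; this patch also places `credentials` here.
    pub config_dir: PathBuf,
    /// Persistent data that is neither configuration nor cache (assumed).
    pub data_dir: PathBuf,
    /// Binaries installed by `cargo install`; this directory should be on PATH.
    pub bin_dir: PathBuf,
}
```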

@@ -32,22 +33,119 @@ use util::Filesystem;

use self::ConfigValue as CV;

#[derive(Clone, Debug)]
pub struct CargoDirs {
pub home_dir: Filesystem,
Member

We don't actually use this field I think?

let home_dir = ::home::home_dir().ok_or_else(|| {
format_err!("Cargo couldn't find your home directory. \
This probably means that $HOME was not set.")
})?;
Member

So looks like we don't use home_dir for anything except fallback, so let's move this as far down as possible, so that we don't actually fail if HOME is not set, but explicit directories are! We might want to test this behavior as well: working without home directory is great for reproducible and isolated builds.
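A sketch of the suggested ordering, assuming the home lookup is passed as a closure so it only runs when the explicit variable is absent. `cache_dir` is a hypothetical name and only the cache directory is shown; the same shape would apply to the other directories.

```rust
use std::path::PathBuf;

/// Hypothetical: the home directory is consulted only when the explicit
/// variable is missing, so a build with CARGO_CACHE_DIR set works without $HOME.
pub fn cache_dir(
    env: impl Fn(&str) -> Option<PathBuf>,
    home: impl FnOnce() -> Option<PathBuf>,
) -> Result<PathBuf, String> {
    if let Some(dir) = env("CARGO_CACHE_DIR") {
        // Early return: `home` is never called on this path.
        return Ok(dir);
    }
    let home = home().ok_or_else(|| "Cargo couldn't find your home directory.".to_string())?;
    Ok(home.join(".cache").join("cargo"))
}
```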

#[cfg(target_os = "macos")]
let _bin_dir = cargo_dirs.data_dir().parent().map(|p| p.join("bin"));
#[cfg(target_os = "windows")]
let _bin_dir = cargo_dirs.data_dir().parent().map(|p| p.join("bin"));
Member

Let's extract this into a function

#[cfg(os = )]
fn bin_dir(dirs: &ProjectDirs) -> Option<PathBuf>
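A sketch of the suggested extraction. `ProjectDirs` comes from a crate not shown here, so the hypothetical `PlatformDirs` struct stands in for it. The macOS/Windows branch mirrors the two `#[cfg]` lines quoted above, while the `~/.local/bin` fallback for other platforms is an assumption based on the `cargo dirs` example earlier in the thread.

```rust
use std::path::PathBuf;

/// Hypothetical stand-in for the `ProjectDirs` value used in the patch.
pub struct PlatformDirs {
    pub home_dir: PathBuf,
    pub data_dir: PathBuf,
}

/// macOS and Windows: a `bin` directory next to the data directory.
#[cfg(any(target_os = "macos", target_os = "windows"))]
pub fn bin_dir(dirs: &PlatformDirs) -> Option<PathBuf> {
    dirs.data_dir.parent().map(|p| p.join("bin"))
}

/// Everything else: assumed XDG-style `~/.local/bin`.
#[cfg(not(any(target_os = "macos", target_os = "windows")))]
pub fn bin_dir(dirs: &PlatformDirs) -> Option<PathBuf> {
    Some(dirs.home_dir.join(".local").join("bin"))
}
```

Keeping one `bin_dir` per `cfg` means call sites stay free of platform conditionals.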

} else if let Some(val) = self.get_path("build.target-dir")? {
-let val = self.cwd.join(val.val);
+let val = self.dirs.current_dir.join(val.val);
Member

Let's use self.cwd() here for future-proofing.
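A minimal sketch of the accessor pattern being suggested, with hypothetical, trimmed-down `Config` and `CargoDirs` types: call sites go through `cwd()`, so a later change to where the working directory is stored only touches one method. `target_dir` is an illustrative helper, not Cargo's API.

```rust
use std::path::{Path, PathBuf};

// Hypothetical, trimmed-down versions of the types in the patch.
pub struct CargoDirs {
    pub current_dir: PathBuf,
}

pub struct Config {
    pub dirs: CargoDirs,
}

impl Config {
    /// Single accessor for the working directory.
    pub fn cwd(&self) -> &Path {
        &self.dirs.current_dir
    }

    /// Resolves a configured `build.target-dir`; `join` keeps absolute values as-is.
    pub fn target_dir(&self, configured: &str) -> PathBuf {
        self.cwd().join(configured)
    }
}
```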

-let home_path = self.home_path.clone().into_path_unlocked();
-let credentials = home_path.join("credentials");
+let config_path = self.dirs.config_dir.clone().into_path_unlocked();
+let credentials = config_path.join("credentials");
Member

Hm, and what is the "correct" place for credentials? config might not be the right place, because people sometimes publish it... I suggest raising this question on the RFC thread, if it hasn't been raised already.

@@ -638,7 +736,7 @@ impl Config {
None => false,
};
let path = if maybe_relative {
self.cwd.join(tool_path)
Member

.cwd()

@@ -163,7 +163,7 @@ fn new_credentials_is_used_instead_old() {
execs().with_status(0),
);

-let config = Config::new(Shell::new(), cargo_home(), cargo_home());
+let config = Config::new(Shell::new(), CargoDirs::new().unwrap());
Member

Here I think we want to point CargoDirs to CARGO_HOME inside test root.

One option is to provide another constructor for CargoDirs which accepts cargo_home: PathBuf. Another option is to modify the test such that it reads the config file directly. I am slightly in favor of the second approach.
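A sketch of the first option, using a hypothetical `from_cargo_home` constructor and plain `PathBuf` fields. It collapses every directory into a single legacy-style CARGO_HOME, matching the layout `cargo dirs` reports when `~/.cargo` exists. The second option, which matklad prefers, would avoid adding such a constructor altogether.

```rust
use std::path::PathBuf;

#[derive(Debug, PartialEq)]
pub struct CargoDirs {
    pub cache_dir: PathBuf,
    pub config_dir: PathBuf,
    pub data_dir: PathBuf,
    pub bin_dir: PathBuf,
}

impl CargoDirs {
    /// Hypothetical test-only constructor: every directory points into one
    /// legacy-style CARGO_HOME, e.g. the one inside the test root.
    pub fn from_cargo_home(cargo_home: PathBuf) -> CargoDirs {
        CargoDirs {
            bin_dir: cargo_home.join("bin"),
            cache_dir: cargo_home.clone(),
            config_dir: cargo_home.clone(),
            // Moved last, after the borrows and clones above.
            data_dir: cargo_home,
        }
    }
}
```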

@@ -910,7 +910,7 @@ fn build_script_needed_for_host_and_target() {
);
}

-#[test]
+//#[test]
fn build_deps_for_the_right_arch() {
Member

This actually passes for me locally

@alexcrichton
Member

r? @matklad

@soc
Author

soc commented Apr 14, 2018

@matklad The tests are green now. Does this look fine to you?

@brson
Contributor

brson commented Apr 16, 2018

Moderation note: this is a blog post-length comment with a lot of technical detail interspersed with some unconstructive critiques. I'm keeping the content here for technical reference, but collapsing it by default. --@aturon

So, I understand you are doing this. Fine.

I encourage you to read all of this carefully. I know it's long.

I'm not here to try to persuade you against this - you have made up your mind. All my previous and carefully considered arguments are on the RFC thread, as well as in private emails that some of you have access to.

This thread contains speculation about how to modify rustup for this world, as well as code review, including what I see as bugs in this patch.

I suggest you think very hard about how this interacts with rustup and the ecosystem before turning this on. So far all I see on the RFC is handwaving about making rustup conform too.

It's not that easy.

Sure, you can land this as is - and effectively no cargo installation anywhere will use it, because rustup explicitly controls CARGO_HOME. It will immediately benefit (sorta) distros - in that they will start putting data in places the OS likes, which most Rust users don't expect because most users use rustup.

rustup must support all toolchains going back forever, and it must do seamless upgrades. I personally have at one point put a lot of thought into this RFC, particularly regards to rustup compatibility, and concluded that it was not at all worth the effort or complexity. Of course I've forgotten the details, but I'm sure one of you has a private email I sent outlining my thoughts.

Here's a vague spitball outline of what has to happen in rustup to make this work. In the first pass I'm just going to assume the HOME/.rustup directory continues to exist, but obviously you are going to change that too. That will come with similar problems that I'll speculate about in a bit.

The easiest thing to do wrt upgrading would be for upgraded existing rustup installs to just have rustup continue using the existing legacy paths after the upgrade (rustup controls all the relevant environment variables, including the ones in this patch) - they'll need some heuristic to decide to do this - leaving only new installs to deal with the new regime. You would need to take special account of non-self-upgrade (manual) installs that are still overwriting legacy installs. Not sure offhand what the differences are, so maybe there's just no way out of coding both fresh installs and upgrades.

a thing to consider re rustup custom installation and env vars

Today rustup uses RUSTUP_HOME and/or CARGO_HOME to determine the installation location, both on initial install and upgrade. Maybe that same strategy is ok in the new regime, even though there are even more env vars. Even today though, when a custom CARGO_HOME/bin ends up on the path, CARGO_HOME I think must be set properly for custom setups to work. It might make sense to write some little config file right inside the bin directory (so rustup can find it) explaining what the original installation thought the directory scheme was, and let that be overridden w/in rustup by all the env vars.

But during upgrades, since rustup invokes rustup to do the installation, the installer does have to read and obey all the environment variables. In new-world that will include CARGO_DATA_DIR etc.

I'm not going to describe it yet, but assume that RUSTUP_HOME gets the same treatment as CARGO_HOME, that is, it evolves to RUSTUP_CACHE_DIR, RUSTUP_DATA_DIR, etc. I know you have plans to merge rustup and cargo, but I doubt you are going to do that before this conversion. An alternative to making new RUSTUP_FOO env vars would be to have new-world rustup just use CARGO_DATA_DIR, etc., a step towards convergence. Either way, the logic below is similar.

Furthermore, under the current RFC (more or less), CARGO_HOME (and presumably RUSTUP_HOME) must continue to be obeyed. So if installation sees CARGO_HOME and/or RUSTUP_HOME without the new-world env vars, it will use them just as it does today, creating a legacy-layout installation.

So let's talk about the easy case, a fresh install of rustup. I'll discuss actually detecting the difference between a fresh install, a legacy upgrade, and a regular upgrade later.

a rustup fresh install w/ platform-specific schemes

  • this entire data type will need to be extracted to a crate so rustup can use it and determine the installation dirs.

  • probably the first thing to do is create all the CargoDirs (and rustup dirs) and confirm they are all writable, and if not rollback and print an error

  • rustup will install its bin to CargoDirs::bin_dir, as well as all the proxy bins, which hardlink to rustup.exe

  • rustup on Windows adds CargoDirs::bin_dir to the system PATH, like today, verifying
    that it's not already there.

  • on Unix it probably needs to scan PATH and decide whether a normalized CargoDirs::bin_dir already exists in it. It probably also needs to scan the appropriate shell script for the platform for our custom PATH-setting snippet. If neither exists, then it needs to do what it currently does and frob some shell files to put it there (as long as whatever flag disables path-setting is off).

  • create $HOME/.rustup, obeying $RUSTUP_HOME as necessary - at this point you can probably jettison the legacy logic that symlinks .multirust to .rustup.

  • install the selected toolchain to .rustup\toolchains

  • write env (the shell-environment config script) to CARGO_BIN_DIR

  • when running subtools, rustup today sets CARGO_HOME (to prevent other cargo invocations from going bonkers); likewise, when rustup runs a subcommand it sets all of CARGO_CACHE_DIR, etc. to the correct values, as determined by rustup (and probably also CARGO_HOME - more on this later).

  • for the sake of legacy tooling, you should probably symlink .cargo/bin, .cargo/env, .cargo/git and .cargo/registry for at least a year, a slow deprecation. .cargo/bin will also need to be kept in
    sync with CARGO_BIN_DIR and CARGO_HOME/bin. Symlinks abound.

  • this is probably the time to stop symlinking ~/.multirust - it's been long enough, and as stated below in my review, this patch is already dropping some legacy hacks for finding CARGO_HOME.

  • (Optionally) As mentioned previously, drop a config file right next to the rustup bin, like rustup.dirs, that describes the directory layout rustup used at install time - it becomes the default directory set, overridden by the env vars, making custom-located installations much easier to manage with this new proliferation of directories.
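The PATH check in the Unix bullet above might look roughly like this. It is a sketch with a hypothetical `path_contains` name: it relies on `std::env::split_paths` and on `Path` equality comparing components, so trailing slashes are ignored, but it does not resolve symlinks (that would need `fs::canonicalize`) and does not scan shell profiles.

```rust
use std::env;
use std::ffi::OsStr;
use std::path::Path;

/// Hypothetical check for whether `bin_dir` is already listed in a PATH-style
/// value. `Path` equality compares components, so `/a/bin/` matches `/a/bin`.
pub fn path_contains(path_var: &OsStr, bin_dir: &Path) -> bool {
    env::split_paths(path_var).any(|entry| entry.as_path() == bin_dir)
}
```

In practice the first argument would come from `std::env::var_os("PATH").unwrap_or_default()`.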

As far as I can tell all the LD_LIBRARY_PATH, DYLD_LIBRARY_PATH and PATH munging in rustup stays the same.

Just as an aside, today it requires setting both RUSTUP_HOME and CARGO_HOME during install to control where a full Rust installation lives (and after this work it will require setting 4 envvars). It's probably worth exploring a new variable (or flag), RUST_HOME that just encapsulates everything - put the entire install in a single directory that can be blasted away at will. Combined with my previous suggestion of dropping a rustup.dir (or whatever) config file next to rustup.exe, and the automatic PATH augmentation to find rustup.exe, this would make simple, self-contained Rust installs - that's something I would value much more than this ongoing effort to spread rust components out all over the system, in a per-system way.

dealing with legacy toolchains

That all seems kinda easy so far, right? Yeah, well rustup has to also support every single stable, beta and nightly going back to 1.0, and none of those support anything but CARGO_HOME.

So you've still got to have rustup stash away a directory somewhere to represent .cargo for legacy applications. From this patch I might speculate the place for that is data_dir/oldcargo (though because of Windows path restrictions maybe just cargo - these new paths have longer names than the old paths, so you're going to hit the Windows path length limit sooner).

Now, as to how rustup should differentiate when toolchains should be using oldcargo vs the platform directories, it's actually easy if we can set both CARGO_HOME and CARGO_BIN_DIR, etc. at the same time without confusing the new toolchain. This patch basically does that, but I think it has some bugs and could be written better (described below in my review).

So speaking of data_dir/oldcargo, the installer is probably going to want to hardlink every single tool from both CARGO_BIN_DIR to oldcargo/bin (AKA the new CARGO_HOME/bin) and vice versa. That means during install, and probably by post-processing every cargo install / cargo uninstall command to keep the two directories in sync. There are almost certainly tools out there that expect CARGO_HOME/bin/ to contain some arbitrary thing. (Also maybe could just symlink the bin directory).
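A sketch of one direction of that hard-link mirroring, assuming both directories already exist on the same filesystem (hard links cannot cross filesystems). `mirror_missing` is a hypothetical helper; running it in both directions keeps the two bin directories in sync. It skips subdirectories and symlinks.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hard-links every regular file in `from` that is missing in `to` and
/// returns how many links were created.
pub fn mirror_missing(from: &Path, to: &Path) -> io::Result<usize> {
    let mut linked = 0;
    for entry in fs::read_dir(from)? {
        let entry = entry?;
        let target = to.join(entry.file_name());
        // `DirEntry::file_type` does not follow symlinks, so those are skipped too.
        if entry.file_type()?.is_file() && !target.exists() {
            fs::hard_link(entry.path(), &target)?;
            linked += 1;
        }
    }
    Ok(linked)
}
```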

That's kinda all I can think of right now - not too difficult - as long as you consider my suggestion below about the precedence of CARGO_HOME and the new env vars. If you don't like that suggestion, then you might need to query the toolchain itself for new-world support before deciding how to set up the environment.

It's worth noting that e.g. CARGO_CACHE_DIR etc. can change from invocation to invocation, and that can invalidate the symlinks set up during installation time for e.g. CARGO_DATA_DIR/oldcargo/registry, giving new and legacy tools an inconsistent view of the world. For maximum correctness, rustup kinda needs to diff the contents of any of these environment variables it receives against the targets of the symlinks and rewrite the legacy symlinks on the fly; or just reconstruct oldcargo's symlinks every single time top-level rustup is run. Bleh, and expensive. Could possibly do it in a background thread while waiting to execute the real tool. It's an icky corner case. (But of course mutating the installation is a race against other rustup invocations - not that rustup is at all concurrent-safe today anyway).

It's also worth noting that, under this proposal, CARGO_HOME is set (by rustup) to CARGO_DATA_DIR/oldcargo, and ~/.cargo is just a temporary convenience. So if somebody gives us a new CARGO_DATA_DIR, there's nothing really connecting that to ~/.cargo/whatever/ - though, probably you could compare CARGO_DATA_DIR/oldcargo's inode to CARGO_HOME's and if they are different, rewrite CARGO_HOME's contents. Bleh again.

Of course the frequency that a "newcargo" toolchain interacts with an "oldcargo" toolchain may be essentially never, and this doesn't matter.

Another thing to consider is that some outside force might set both CARGO_HOME / RUSTUP_HOME and the new CARGO_* variables. In this case, per my proposal, CARGO_HOME loses; the new scheme wins. That might mean that rustup needs to modify the old-style env vars to point to the correct internal directories, à la data_dir/oldcargo. Likewise it seems that data_dir/rustup needs to be treated in a very similar manner - filled with symlinks to the "real" location of rustup data. Yeah, pretty much: even though rustup itself always sets CARGO_HOME and RUSTUP_HOME sensibly, at every invocation that also includes the new env vars it needs to validate / rewrite those variables to point to the correct internal directory for legacy content.
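A sketch of the environment rustup might set for a toolchain under this proposal: the new per-directory variables for new-world toolchains, plus CARGO_HOME pointed at the compatibility directory `data_dir/oldcargo` (itself only a speculative name above) for legacy ones. `toolchain_env` and `configure` are hypothetical helpers built on `std::process::Command::env`.

```rust
use std::path::{Path, PathBuf};
use std::process::Command;

/// Hypothetical: variables rustup would set before running a toolchain.
/// New-world toolchains read the per-directory variables; legacy toolchains
/// only understand CARGO_HOME, so it points at a compatibility directory.
pub fn toolchain_env(
    cache: &Path,
    config: &Path,
    data: &Path,
    bin: &Path,
) -> Vec<(&'static str, PathBuf)> {
    vec![
        ("CARGO_CACHE_DIR", cache.to_path_buf()),
        ("CARGO_CONFIG_DIR", config.to_path_buf()),
        ("CARGO_DATA_DIR", data.to_path_buf()),
        ("CARGO_BIN_DIR", bin.to_path_buf()),
        // Legacy view, derived from whatever CARGO_DATA_DIR is in effect.
        ("CARGO_HOME", data.join("oldcargo")),
    ]
}

/// Applies the variables to a command without spawning it.
pub fn configure(cmd: &mut Command, vars: &[(&'static str, PathBuf)]) {
    for (key, value) in vars {
        cmd.env(key, value);
    }
}
```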

Now the hard case, a seamless upgrade:

a rustup upgrade

You'll need a way to detect a legacy vs. new layout - and you can't count on any Rust env vars being set, or necessarily meaning anything sensible - e.g. in the case someone manually downloads an upgrade, or trying to set only CARGO_CACHE_DIR, etc. while doing a legacy upgrade.

You'll need to keep the legacy home-dir discovery logic in-tree for detecting the upgrade.

I don't have a full suggestion offhand. Probably the safest thing is to drop a marker file somewhere during new-world installation that means 'new layout'. Could also put it into rustup's settings.toml, assuming you are doing the similar rustup upgrade at the same time and symlinking ~/.rustup.

Ok, yeah, deciding if it's a legacy upgrade or a fresh-install/modern-upgrade might look like

  • If RUSTUP_HOME is defined, but no new-RUSTUP/CARGO vars it's a legacy-style install
  • Use the legacy RUSTUP_HOME/CARGO_HOME discovery logic to see if installation exists there
  • Also use the new directory discovery logic to discover if things exist there
  • If stuff exists at both places, then check whether CARGO_HOME contains the upgrade marker (we're going to leave temporary ~/.cargo symlinks around for compat), and they aren't just symlinks to each other.
  • If nothing exists at the new locations but does at the old locations, it's a legacy upgrade.
  • If stuff exists at both places, but there's no upgrade marker then tableflip - bad case.
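The decision list above can be sketched as a small classification function. The names `InstallKind`, `Probe`, and `classify` are hypothetical. The probing itself (env vars, directory existence, the marker file) is left out, the extra check that the two locations aren't just symlinks to each other is omitted, and treating "new layout only" as a modern upgrade is an assumption the list does not spell out.

```rust
/// Hypothetical result of inspecting an existing installation.
#[derive(Debug, PartialEq)]
pub enum InstallKind {
    Fresh,
    /// RUSTUP_HOME set without any new-style variables: keep the old layout.
    LegacyStyle,
    LegacyUpgrade,
    ModernUpgrade,
    /// Data in both places and no marker: bail out.
    Conflict,
}

/// Facts gathered beforehand (env vars, directory existence, marker file).
pub struct Probe {
    pub rustup_home_set: bool,
    pub new_vars_set: bool,
    pub legacy_exists: bool,
    pub new_exists: bool,
    pub upgrade_marker: bool,
}

pub fn classify(p: &Probe) -> InstallKind {
    if p.rustup_home_set && !p.new_vars_set {
        return InstallKind::LegacyStyle;
    }
    match (p.legacy_exists, p.new_exists) {
        (false, false) => InstallKind::Fresh,
        (true, false) => InstallKind::LegacyUpgrade,
        // Assumed: only the new layout exists.
        (false, true) => InstallKind::ModernUpgrade,
        (true, true) if p.upgrade_marker => InstallKind::ModernUpgrade,
        (true, true) => InstallKind::Conflict,
    }
}
```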

Edit: OK, the more I think about it, the more a typical self-upgrade is going to look like RUSTUP_HOME/CARGO_HOME, with no way of knowing whether rustup should or should not do the legacy upgrade to the new format (unless it sees additional new env vars that must have been user-provided): somebody could be explicitly telling rustup it wants to use old-style self-contained layout. So basically, the standard self-upgrade will leave the existing layout in place. Maybe every few days it should print a message saying "Hey - do you want to upgrade to the new Rust directory structure?", with a new command that does the upgrade.

If that's the case, then I think even a manual over-install that says nothing about RUSTUP_HOME/CARGO_HOME should probably just not touch the layout and let the user deal with it later.

Something like that...

Actual install procedure:

  • install, upgrade, and uninstall all need full rollback capabilities, not just for files but also for things like the Windows registry, where PATH is stored. rustup has some for toolchains, which has definitely worked in the past, but may need to be extended and made more robust.

  • rustup is replaced - at present there is no pre/post-install script, so it just sits there in .cargo/bin until somebody executes it. As a prerequisite somebody could add pre/post-install scripts to cargo bin, but you still can't rely on them because rustup always updates to the latest version, so the running version may not have pre/post script capability against the new revision. Otherwise, all rustup changes happen lazily on first invocation of any rustup proxy (so far they've been small, but this one is huge).

  • If you felt like you needed pre/post scripts you would need to do more work yet to teach the new rustup to bail gracefully if it wasn't asked to run a pre-install script, maybe even downloading the last rustup known to support pre/post scripts, installing that, then installing itself.

  • Regardless at some point rustup gets the opportunity to completely change the entire world, a fraught process full of potential errors and surprises.

  • Right now the upgrade process is non-interactive, but you might want to print a screen saying that major shit is about to go down, please confirm.

  • do a sanity check that none of the old-and-new-directories, e.g. CARGO_HOME/bin and CARGO_BIN_DIR etc. are equivalent inodes (they may be symlinks) - that's a rare bonkers situation and should probably just bail.

  • create all the destination folders ahead of time and confirm they are writable.

  • rustup contains a module with filesystem rollback logic on failure that it uses for installing toolchains. I don't know how great it is, but I've seen it work, and you probably want to do all the following with it. Though you might mostly end up shuffling directories around, not files - in either case you need to be able to rollback.

  • Make the new location to replace CARGO_HOME/bin and move every file into it, the same for
    CARGO_HOME/git, etc.

  • Do the bin syncing/mirroring routine described above, where the contents of CARGO_BIN_DIR and oldcargo/bin are brought into alignment (or set up a symlink between the two).

  • Note that you are going to hit the rustup self-replacement problem on Windows, which the self-installer does acrobatics to work around. You are going to have to do something special to move CARGO_HOME/bin/rustup.exe to CARGO_BIN_DIR. Probably the best thing to do is go to the trouble of teaching rustup to do pre-install / post-install scripts as mentioned earlier, making sure that the change-the-world rustup can only be invoked by an install-script-aware previous rustup. That should avoid the entire Windows self-replacement problem altogether - the old rustup puts new-rustup in a temporary location, runs the pre-install script, then just terminates, making itself replaceable. (Edit: oh oh the downside to this is once the original process terminates, nobody is waiting for the self-updater to terminate - so failure is basically not reported correctly - need to think harder about it). Edit: per one of my comments above, there's really no way to know during a typical self-upgrade that rustup should do the legacy->new conversion - all it'll see is RUSTUP_HOME/CARGO_HOME, and it may be
    better to just leave the current layout in place, with nag screens occasionally suggesting to run an upgrade command.

  • Create all the symlinks for legacy CARGO_HOME/bin, etc. as during install, for the sake of tooling that is left behind, to be removed someday.

  • Delete any legacy ~/.multirust symlink.

  • You may or may not need to drop symlinks into ~/.rustup/whatever for temporary backcompat. Not sure if anybody is poking around in that dir. I'd lean toward no.

  • Frob paths. You are going to need to use the legacy logic for determining CARGO_HOME. On windows, remove the legacy path, add the new path. On Linux, search the appropriate shell files for the PATH stub (with legacy directory) and remove it. Like during install, check "PATH" to see if the new directory is already on the path (like it might be on Fedora), otherwise splat something into the appropriate shell script.
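The "move directories around, with rollback" part of the steps above could look roughly like the following. This is a minimal sketch, not rustup's actual rollback module: `migrate_dirs` and `move_one` are hypothetical names, and it assumes every move is a plain `std::fs::rename` (which fails across filesystems, so a real implementation would need a copy fallback). It also does the inode sanity check from the first step by comparing canonicalized paths, which resolves symlinks.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Move one directory, creating the destination's parent first.
fn move_one(old: &Path, new: &Path) -> io::Result<()> {
    if let Some(parent) = new.parent() {
        fs::create_dir_all(parent)?;
    }
    fs::rename(old, new)
}

/// Hypothetical helper: move each (legacy, new) pair, undoing every
/// completed move if a later one fails.
pub fn migrate_dirs(moves: &[(PathBuf, PathBuf)]) -> io::Result<()> {
    // Sanity check: bail if an old and a new path are really the same
    // directory (e.g. one is a symlink to the other).
    for (old, new) in moves {
        if let (Ok(a), Ok(b)) = (fs::canonicalize(old), fs::canonicalize(new)) {
            if a == b {
                return Err(io::Error::new(
                    io::ErrorKind::Other,
                    format!("{} and {} are the same directory", old.display(), new.display()),
                ));
            }
        }
    }

    let mut done: Vec<(&Path, &Path)> = Vec::new();
    for (old, new) in moves {
        if let Err(e) = move_one(old, new) {
            // Best-effort rollback in reverse order. Parent directories
            // created for the destinations are left behind.
            for (o, n) in done.iter().rev() {
                let _ = fs::rename(n, o);
            }
            return Err(e);
        }
        done.push((old.as_path(), new.as_path()));
    }
    Ok(())
}
```

Rolling back in reverse order matters when one destination lives inside another one's source; a journal written to disk would also survive a crash mid-migration, which this in-memory list does not.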

oh, what about making the .rustup directory obey this new scheme?

rustup seems to contain these directories:

  • downloads
  • settings.toml
  • tmp
  • toolchains
  • update-hashes

In the same crate you extract all that CargoDirs logic to, you should probably do similar for rustup.

  • downloads -> data_dir
  • settings.toml -> config_dir
  • tmp -> cache_dir
  • toolchains -> data_dir
  • update-hashes -> tmp_dir
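The mapping above is small enough to encode as a table. A sketch, assuming a hypothetical `Dirs` struct holding the resolved base directories and a `Tmp` kind that mirrors the `tmp_dir` named in the list (the directories crate itself does not offer a temp dir); placing each entry directly under its base directory is also an assumption.

```rust
use std::path::{Path, PathBuf};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DirKind {
    Data,
    Config,
    Cache,
    Tmp,
}

/// The proposed destination kind for each entry of the legacy ~/.rustup.
pub fn rustup_entry_kind(entry: &str) -> Option<DirKind> {
    match entry {
        "downloads" | "toolchains" => Some(DirKind::Data),
        "settings.toml" => Some(DirKind::Config),
        "tmp" => Some(DirKind::Cache),
        "update-hashes" => Some(DirKind::Tmp),
        _ => None, // unknown entries are left for a human to decide
    }
}

/// Hypothetical holder for the resolved base directories.
pub struct Dirs {
    pub data: PathBuf,
    pub config: PathBuf,
    pub cache: PathBuf,
    pub tmp: PathBuf,
}

impl Dirs {
    pub fn base(&self, kind: DirKind) -> &Path {
        match kind {
            DirKind::Data => self.data.as_path(),
            DirKind::Config => self.config.as_path(),
            DirKind::Cache => self.cache.as_path(),
            DirKind::Tmp => self.tmp.as_path(),
        }
    }
}

/// Where a legacy ~/.rustup entry would end up under the new scheme.
pub fn new_location(dirs: &Dirs, entry: &str) -> Option<PathBuf> {
    rustup_entry_kind(entry).map(|kind| dirs.base(kind).join(entry))
}
```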

And, again, as a step toward merging cargo and rustup, you could just reuse CargoDirs instead of creating a new RustupDirs.

The actual upgrade process should be similar to that for .cargo described above - just move directories around with rollback. Both the .cargo and .rustup upgrade should be done in one transaction probably.

Consider leaving ~/.rustup symlinks in place temporarily, though probably much less necessary than ~/.cargo.

uninstall

Uninstall should be trivial - just check all the possible places where stuff could be and delete them. That includes the legacy ~/.cargo symlinks. Not sure if the current system has rollback or not, but would be nice.

Uninstall does have the windows self-deleting problem again, but there's existing code to deal with it.

dealing with the ecosystem

You ought to grep every project that uses CARGO_HOME and figure out what to do with them. The important ones need to be upgraded before this stuff is deployed. You also need to coordinate with editors like IntelliJ. Presumably, a major rustup upgrade that blocks on user input would be very bad for some tools like IntelliJ. There are probably a lot of people automating rustup upgrades, so an interactive upgrade might be out of the question; maybe an upgrade that says "decided not to upgrade for some reason, do this instead to confirm"... not sure. Alternately, just make the upgrade perfect and bug-free, and don't worry about it.

Though, if you use the scheme I've described here where rustup sets both the legacy and new env vars, and you symlink .cargo's directories for a deprecation period and the new data_dir/oldcargo (forever), you can probably get away with a lot, and everything should just work like magic.

some patch reviews

Finally, I don't see much going on in this patch w/r/t the test suite. This patch is adding a huge number of variations to what is possible with cargo data storage. Is the test suite continuing to use CARGO_HOME? Is it just using the live per-platform directories?

There should basically be tons of new tests here, testing in all the various new combinations. It should not be testing via the live platform-specific directories (that's how you break people's computers), so I would expect tests to be mocking out those directories with CARGO_CACHE_DIR etc., and I don't see that. Pretty much every test in the test suite would best be parameterized over [CARGO_HOME || THE_NEW_STUFF], but also tested in cases where only some of the new env vars are set. From the code it looks like any of the new CARGO_FOO env vars that aren't set fall back to the CARGO_HOME equivalent (though the docs don't seem to be explicit about this, and I mentioned before I think this is incorrect), so all those weird combinations seem best to be tested (and I think this is subtly the wrong scheme, as described below).

Furthermore, find_bin_dirs appears to only be defined for Linux, macOS and Windows, so other platforms that try to use the new system are literally just going to exit with "couldn't find the directory in which executables are placed". This seems less than desirable. Known cargo-aware platforms this seems not to work for include FreeBSD and NetBSD.
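One way to avoid the hard exit is to treat "no platform rule" as "use the legacy layout". A sketch under assumptions: `platform_bin_dir` and `find_bin_dir` are hypothetical names, only a Linux rule (`~/.local/bin`, the XDG-style convention) is filled in, and the real rules for macOS and Windows would come from the patch's own logic rather than from this example.

```rust
use std::path::{Path, PathBuf};

/// Platform-specific rule, if one is known (hypothetical; only Linux shown).
fn platform_bin_dir(home: &Path) -> Option<PathBuf> {
    if cfg!(target_os = "linux") {
        Some(home.join(".local").join("bin"))
    } else {
        // e.g. FreeBSD, NetBSD: no rule yet, so don't guess.
        None
    }
}

/// Never fail just because the OS isn't one of the known ones:
/// fall back to the legacy ~/.cargo/bin instead of exiting.
pub fn find_bin_dir(home: &Path) -> PathBuf {
    platform_bin_dir(home).unwrap_or_else(|| home.join(".cargo").join("bin"))
}
```

Falling back to the legacy directory keeps such platforms exactly as they behave today, rather than inventing a convention for them.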

a trivial nit

This unconditionally calls ProjectDirs::from with the name "Cargo". Capitalized project names are common on Windows, but not on Unix. I don't see any on my WSL system, though it's not overly populated. Whatever.

more patch reviews and a backcompat solution for rustup

The code for setting up the various CargoDirs fields looks suspicious to me - if CARGO_HOME is not set, but only some of the new CARGO_* vars are, then it looks like the others will be equal to PathBuf::default(). It would probably be more reliable to preinitialize all the dirs with the system dirs as fallback, then override them with the env vars.
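The "preinitialize, then override" pattern might look like this. It is a sketch, not the patch's code: `CargoDirs` here is a simplified stand-in with four `PathBuf` fields, and `lookup` is injected in place of `std::env::var_os` so the logic can be tested without touching the real environment. Because every field starts from a fallback, none can end up as `PathBuf::default()`.

```rust
use std::ffi::OsString;
use std::path::PathBuf;

#[derive(Debug, Clone, PartialEq)]
pub struct CargoDirs {
    pub cache: PathBuf,
    pub config: PathBuf,
    pub data: PathBuf,
    pub bin: PathBuf,
}

/// Start from the system dirs, then override each one individually.
/// `lookup` stands in for `std::env::var_os`.
pub fn with_overrides(fallback: CargoDirs, lookup: impl Fn(&str) -> Option<OsString>) -> CargoDirs {
    let pick = |var: &str, default: PathBuf| lookup(var).map(PathBuf::from).unwrap_or(default);
    CargoDirs {
        cache: pick("CARGO_CACHE_DIR", fallback.cache),
        config: pick("CARGO_CONFIG_DIR", fallback.config),
        data: pick("CARGO_DATA_DIR", fallback.data),
        bin: pick("CARGO_BIN_DIR", fallback.bin),
    }
}
```

In production this would be called as `with_overrides(system_dirs, |k| std::env::var_os(k))`.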

It looks to me like the directory discovery code does not do what the OP says:

  • use the CARGO_HOME environment variable if it exists (legacy)
  • use CARGO_CACHE_DIR, CARGO_CONFIG_DIR etc. env vars if they exist
  • use the ~/.cargo directory if it exists (legacy)
  • follow operating system standards

Instead it's more like

  • use CARGO_CACHE_DIR, etc if they exist
  • where they don't use a dir based on CARGO_HOME
  • if no env vars exist use ~/.cargo if it exists
  • otherwise use what the directories crate suggests

That is, all directories are initialized via CARGO_HOME if it exists, then for any "new" environment variable that exists, the corresponding directory is overwritten, using the legacy directory for missing env vars. I think this is probably wrong.

I'd suggest the following algorithm to fix the aforementioned bug and retain compatibility with simultaneously setting CARGO_HOME plus all the new env vars, making rustup seamlessly work with all past toolchains:

  • initialize all the directories to whatever ProjectDirs says - this is the base case if anything else goes wrong
  • if any of the new env vars are set, set them individually. Return.
  • if CARGO_HOME is set, set all the dirs as appropriate. Return.
  • if HOME/.cargo exists, use that. Return

And that way rustup can set CARGO_HOME plus CARGO_CACHE_DIR, etc. always, and legacy toolchains will just work, and new toolchains will just work, with both legacy and new rustup.
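The four steps above can be sketched as a pure function, which also makes every combination easy to unit-test. The inputs are assumptions for illustration: `platform` is what ProjectDirs would report, `new_vars` holds CARGO_CACHE_DIR, CARGO_CONFIG_DIR, CARGO_DATA_DIR and CARGO_BIN_DIR in that order, and `dot_cargo` is `Some(path)` only if `$HOME/.cargo` exists. The legacy layout (everything under one root, binaries in `bin`) is a simplification of today's CARGO_HOME.

```rust
use std::path::{Path, PathBuf};

#[derive(Debug, Clone, PartialEq)]
pub struct CargoDirs {
    pub cache: PathBuf,
    pub config: PathBuf,
    pub data: PathBuf,
    pub bin: PathBuf,
}

/// Everything under a single legacy root, binaries in `root/bin`.
fn legacy(root: &Path) -> CargoDirs {
    CargoDirs {
        cache: root.to_path_buf(),
        config: root.to_path_buf(),
        data: root.to_path_buf(),
        bin: root.join("bin"),
    }
}

pub fn resolve(
    platform: CargoDirs,
    new_vars: [Option<PathBuf>; 4],
    cargo_home: Option<PathBuf>,
    dot_cargo: Option<PathBuf>,
) -> CargoDirs {
    // 1. Base case: the platform directories.
    let mut dirs = platform;
    // 2. If any new variable is set, apply them individually and stop;
    //    CARGO_HOME is ignored, so rustup can safely set both.
    if new_vars.iter().any(Option::is_some) {
        let [cache, config, data, bin] = new_vars;
        if let Some(p) = cache { dirs.cache = p; }
        if let Some(p) = config { dirs.config = p; }
        if let Some(p) = data { dirs.data = p; }
        if let Some(p) = bin { dirs.bin = p; }
        return dirs;
    }
    // 3. Legacy CARGO_HOME.
    if let Some(root) = cargo_home {
        return legacy(&root);
    }
    // 4. An existing ~/.cargo.
    if let Some(root) = dot_cargo {
        return legacy(&root);
    }
    dirs
}
```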

The only downside I can see to this is if some oblivious tool somewhere in the stack tries to override CARGO_HOME for the new toolchain, or uses CARGO_HOME to figure something out about cargo. I can imagine that is the rationale for the OP's desire to prefer CARGO_HOME over the new env vars. But check this out:

OK, ok, here's a rustup solution that would make everything compatible with everything under rustup:

I've already said rustup should symlink the contents of ~/.cargo through a deprecation period. I also said that rustup should maintain a data_dir/oldcargo directory for legacy toolchains. Well, that data_dir/oldcargo directory should just contain symlinks to the new-world directories under the system directory - that way every version of cargo is looking at exactly the same files (assuming they can follow symlinks... need to test that), and it doesn't matter whether any particular component (under rustup) is using CARGO_HOME or new-env-vars.
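Populating `data_dir/oldcargo` with symlinks could be as simple as the following. A sketch under assumptions: `link_oldcargo` is a hypothetical name, the list of entries to link is supplied by the caller, and it is Unix-only; a Windows version would need `std::os::windows::fs::symlink_dir` (which may require extra privileges) or junctions, and is not shown. Whether every cargo version follows these links is exactly what the text says still needs testing.

```rust
use std::io;
use std::path::Path;

/// Hypothetical: make `oldcargo` contain symlinks into the new-world
/// directories, so legacy toolchains using CARGO_HOME see the same files.
#[cfg(unix)]
pub fn link_oldcargo(oldcargo: &Path, links: &[(&str, &Path)]) -> io::Result<()> {
    std::fs::create_dir_all(oldcargo)?;
    for (name, target) in links {
        let link = oldcargo.join(name);
        // Leave anything already present (link, file or dir) alone.
        if link.symlink_metadata().is_ok() {
            continue;
        }
        std::os::unix::fs::symlink(target, &link)?;
    }
    Ok(())
}
```

Skipping existing entries makes the function idempotent, so it can run on every rustup invocation.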

more patch review

The library this depends on, directories, uses SHGetKnownFolderPath, a Vista+ function. I don't know if FX or anybody compiles Cargo for windows or not, but it's something to be aware of.

This patch drops backwards-compatibility code for finding CARGO_HOME on windows, and under multirust.

That is probably fine since generally rustup tells cargo where to find its stuff, and it was quite a long time ago that that change happened.

rustup has similar, but more complex backcompat code for finding RUSTUP_HOME, but it may be ok to drop that as well.

installer changes

So there are 4 env vars now:

  • CARGO_CACHE_DIR
  • CARGO_CONFIG_DIR
  • CARGO_DATA_DIR
  • CARGO_BIN_DIR

These are nominally needed for communication between toolchain components. (I'm just going to assume that all the rustup components are ultimately folded into these four directories).

CARGO_HOME and RUSTUP_HOME are also passed through the environment, but should be derived from the other four, for legacy toolchains.
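Passing these down to a toolchain component is just a matter of setting them on the child process. A sketch, assuming a hypothetical `set_toolchain_env` helper and deriving CARGO_HOME as `data/oldcargo`, the symlink directory proposed earlier for legacy toolchains; RUSTUP_HOME could be derived the same way but is omitted here.

```rust
use std::path::Path;
use std::process::Command;

/// Hypothetical: set both the new variables and a derived legacy
/// CARGO_HOME, so old and new toolchains both find their files.
pub fn set_toolchain_env(cmd: &mut Command, cache: &Path, config: &Path, data: &Path, bin: &Path) {
    cmd.env("CARGO_CACHE_DIR", cache)
        .env("CARGO_CONFIG_DIR", config)
        .env("CARGO_DATA_DIR", data)
        .env("CARGO_BIN_DIR", bin)
        // Legacy toolchains only understand CARGO_HOME; point it at the
        // symlink directory under the data dir (assumed location).
        .env("CARGO_HOME", data.join("oldcargo"));
}
```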

These should all be exposed through the rustup interactive and non-interactive installer, defaulting to the platform locations, plus probably a "self-contained" option that puts everything in one directory (which I would value much more than the platform-independent scheme under development), and as suggested earlier, recorded in a config file next to rustup.exe so it can "just work" in cases where these values are not the platform-determined values.

The installer would probably present three options when it came to installation dirs:

  • Default (platform dirs)
  • Self-contained (pick a single dir)
  • Customize (specify all four values)
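The three installer choices map naturally onto an enum that resolves to the four directories. A sketch, assuming hypothetical type names and hypothetical subdirectory names (`cache`, `config`, `data`, `bin`) for the self-contained layout; the resolved values are what would be recorded in the config file next to rustup.exe.

```rust
use std::path::PathBuf;

#[derive(Debug, Clone, PartialEq)]
pub struct InstallDirs {
    pub cache: PathBuf,
    pub config: PathBuf,
    pub data: PathBuf,
    pub bin: PathBuf,
}

pub enum InstallLayout {
    /// Use the platform directories.
    Default,
    /// Put everything under one user-chosen directory.
    SelfContained(PathBuf),
    /// The user specifies all four values.
    Custom(InstallDirs),
}

impl InstallLayout {
    pub fn resolve(self, platform: InstallDirs) -> InstallDirs {
        match self {
            InstallLayout::Default => platform,
            InstallLayout::SelfContained(root) => InstallDirs {
                cache: root.join("cache"),
                config: root.join("config"),
                data: root.join("data"),
                bin: root.join("bin"),
            },
            InstallLayout::Custom(dirs) => dirs,
        }
    }
}
```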

I am, upon further consideration, suspicious of the need for a human to ever impose different values on these from run to run. Sure, it's easy to imagine use cases for using a different cache dir for a particular build, but are they real use cases? Certainly today it's not possible, and it's never bothered me.

Again, I just read the RFC and it continues to be ridiculously underspecified. The motivation of the RFC doesn't say anything about needing to customize these environment variables per-run.

I guess it's possible for tools to care about them, like some presumably care about the value of CARGO_HOME, but still, it's only rustup (or some other outside force) that exposes these env vars, not cargo itself, so you don't get access to these unless you are running rustup. It's very possible to design an equivalent system with none of these env vars at all, though I guess it's easiest to pass these configuration items down a long chain of tools by stuffing them into the environment.

But these things seem like "installation-time" options that you don't ever touch again. They may be necessary internally for rustup to communicate to toolchain components, but do humans actually want to mess with these?

For example, I've got CARGO_BIN_DIR on my PATH and I write CARGO_BIN_DIR=foo cargo - WTF does that mean? it's just nonsense. Subsequent sub-invocations are going to be looking in the absolutely wrong place to find any usefully-compatible tools.

I can possibly see a use case where some complex tooling wants to invoke multiple Rust installations - cargobomb for example I think does this, and modifies CARGO_HOME mid-execution.

Whatever.

in closing

I implore you to put this entire change behind a feature-flag environment variable, do the rustup change behind the same environment variable, change whatever downstream tools need to understand this, then test things real, real good before deploying this.

Again, I know you know my opinion that this is a huge amount of complexity and effort for very little practical advantage.

FWIW I'm available at hourly rates to make all these fine details work correctly, my lovelies.

glhf

cc @alexcrichton @aturon @nrc @matklad

Edit: @matklad indicated below that we always want CARGO_HOME to indicate "put all the Rust bins in one place", which nullifies much of the simplification with rustup here, allowing CARGO_HOME to coexist with the new variables. It is also contrary to this patch, which gives XDG variables precedence over CARGO_HOME (and which I still believe has bugs otherwise, as noted).

@retep998
Member

The library this depends on directories, uses SHGetKnownFolderPath, a Vista+ function. I don't know if FX or anybody compiles Cargo for windows or not, but it's something to be aware of.

Considering Rust only supports Windows 7 or newer as the host operating system, this is a non-issue.

@soc
Author

soc commented Apr 16, 2018

@brson Thanks for the notes, they provide a lot of hints at use cases that need to be carefully considered and sorted out.

Some comments on the straight-forward parts (comments on other parts as I come around to them):

So far all I see on the RFC is handwaving about making rustup conform too.

I don't have write-access to the RFC, so I can't keep it in sync with the changes implemented in this PR.
There are substantial differences between the RFC and the implementation, and it is best to read the code, as it doesn't make much sense to create yet-another document that people have to cross-reference.

it's actually easy if we can set both CARGO_HOME and CARGO_BIN_DIR, etc. at the same time without confusing the new toolchain [...] And that way rustup can set CARGO_HOME plus CARGO_CACHE_DIR, etc. always, and legacy toolchains will just work, and new toolchains will just work.

That's a good idea.

So you've still got to have rustup stash away a directory somewhere to represent .cargo for legacy applications. From this patch I might speculate the place for that is data_dir/oldcargo.

I'd probably just let them use .cargo forever and adjust the logic of what happens if both .cargo and .config/cargo etc. similar to the suggestion you made about supporting both CARGO_HOME and CARGO_..._HOME.

It would probably be more reliable to preinitialize all the dirs with the system dirs as fallback, then override them with the env vars.
It looks to me that the directory discovery code does not do what the op says:

This was probably lost in translation ... it was suggested to avoid computing the system paths until it is determined that they are needed, that's also why the code is more complicated than necessary.

Furthermore, find_bin_dirs appears to only be defined for linux, mac and win; so other platforms that try to use the new system are literally just going to exit with "couldn't find the directory in which executables are placed". This seems less than desirable. Known cargo-aware platforms this seems to not work for include FreeBSD and NetBSD.

Yes, that's an issue. Just got some suggestions for the directories library today to make sure it works on more operating systems than just the "big three". After this is addressed, I'll update that method accordingly.

This unconditionally calls ProjectDirs::from with the name "Cargo". Capitalized project names are common on Windows, but not on Unix.

The name is automatically adjusted based on platform rules, and will be cargo on Linux.
ProjectDirs::from("org", "Rust-Lang", "Cargo") or something would have been even more "correct", but I didn't want to push my luck too far.
Rust devs should just let me know what they want the Cargo folder to look like on each platform, and I'll adjust it accordingly.

I suggest you put this entire change behind a feature-flag environment variable, do the rustup change behind the same environment variable, change whatever downstream tools need to understand this, then test things real, real good before deploying this.

Yes, that's the plan.

this is a huge amount of complexity and effort for very little practical advantage

Yep, paying down technical debt is never easy. It would have been way less painful if the changes were made before 1.0, as people suggested, but that ship has sailed. :-/

Again, thanks a lot for all the information and helpful suggestions!

@brson
Contributor

brson commented Apr 16, 2018

Considering Rust only supports Windows 7 or newer as the host operating system, this is a non-issue.

@retep998 This is not correct. Rust supports XP in custom configurations. The forge indeed doesn't have a check-mark for Cargo on XP.

@brson
Contributor

brson commented Apr 16, 2018

Thanks for the responses @soc.

I'd probably just let them use .cargo forever and adjust the logic of what happens if both .cargo and .config/cargo etc. similar to the suggestion you made about supporting both CARGO_HOME and CARGO_..._HOME.

I thought the whole point of this exercise was to excise ~/.cargo. I don't see that as being any easier, since none of the new env vars give any clue where to find it, and it can't be derived if the rustup bin lives somewhere else, and the PATH points to the rustup bin inside one of the new directories.

This was probably lost in translation ... it was suggested to avoid computing the system paths until it is determined that they are needed, that's also why the code is more complicated than necessary.

As I mentioned, afaict there's a bug in the code where some of the paths can be uninitialized as written.

@alexcrichton
Member

I'm going to close this because it's been stale and quiet for quite some time now unfortunately. The Cargo team is pretty tied up until after the 2018 edition, but I think we can perhaps look to help out integrating this and fixing remaining issues after the edition release.

@flying-sheep

Hi! Rust 2018 is here, it’s a fresh new year! Time to get this rolling 😄

@flying-sheep

Uh, how did I manage to unassign @matklad just by commenting‽

&& cargo_cache_env.is_none()
&& cargo_config_env.is_none()
&& cargo_data_env.is_none()
&& cargo_bin_env.is_none() {
Member


This doesn't seem to handle the case where some but not all of the environment variables are set.

@joshtriplett
Member

joshtriplett commented Jan 9, 2019

@soc Let's see if we can get this revived.

I posted one comment for a corner case this doesn't seem to handle.

Could you please update this to fix the conflicts, and ensure that it still passes tests?

And could you please add tests for the various configuration cases (existing legacy configuration, existing XDG configuration, both, no existing configuration), to make sure they all work as expected?
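Tests for those four cases need to tell them apart without touching the live platform directories, which points toward building each case inside a temporary directory. A sketch of the classification such tests could assert on; `Existing` and `detect` are hypothetical names, and `platform_config` stands for whatever platform config directory cargo would use (e.g. `~/.config/cargo` on Linux, assumed here).

```rust
use std::path::Path;

#[derive(Debug, PartialEq, Eq)]
pub enum Existing {
    Neither,
    Legacy,
    Platform,
    Both,
}

/// Classify which configuration layouts already exist on disk.
pub fn detect(legacy: &Path, platform_config: &Path) -> Existing {
    match (legacy.is_dir(), platform_config.is_dir()) {
        (false, false) => Existing::Neither,
        (true, false) => Existing::Legacy,
        (false, true) => Existing::Platform,
        (true, true) => Existing::Both,
    }
}
```

Each test would create a fresh temporary "home", set up one of the four layouts, point cargo at it, and check both the classification and the resulting directories.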

@FranklinYu

I don’t like the solution for macOS. See dirs-dev/directories-rs#47.

@soc
Author

soc commented Mar 31, 2019

@joshtriplett Sorry for the late reply.

I think my changes so far are flawed in the sense that all the existing logic should be retained as-is, otherwise it becomes really really hard to make sure the current behavior is the same without flipping the hypothetical switch to the new structure.

Sadly, at the moment I have little time for this (and the website shenanigans didn't help either).

I can offer some advice, though: As a first step, identify each and every place in cargo, rustup, etc. that uses the current structure and add an explicit branch with a check for the proposed -Z flag there.

Then test that all tools reach the right branch when the flag is set/not set.
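The "explicit branch at every call site" advice could be shaped like this: decide the layout once from the flags, then pass it to every function that touches the on-disk structure so each branch is directly testable. The flag name `platform-dirs`, the `Layout` enum and `registry_dir` are all hypothetical, chosen only to illustrate the pattern.

```rust
use std::path::{Path, PathBuf};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Layout {
    Legacy,
    Platform,
}

/// Decide once, from the unstable flags (hypothetical flag name).
pub fn layout_from_flags(unstable_flags: &[&str]) -> Layout {
    if unstable_flags.iter().any(|f| *f == "platform-dirs") {
        Layout::Platform
    } else {
        Layout::Legacy
    }
}

/// Example call site: the legacy branch keeps today's behaviour unchanged.
pub fn registry_dir(layout: Layout, cargo_home: &Path, platform_cache: &Path) -> PathBuf {
    match layout {
        Layout::Legacy => cargo_home.join("registry"),
        Layout::Platform => platform_cache.join("registry"),
    }
}
```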

Only at that point does it make sense to even start working on the new structure.

bin_dir = legacy_cargo_dir.join("bin");
// 4. ... otherwise follow platform conventions
} else {
let cargo_dirs = ProjectDirs::from("", "", "Cargo");
Member


FWIW, in Miri we will likely use ProjectDirs::from("org", "rust-lang", "miri") to get more descriptive names on platforms that commonly do that. Might be a good idea to do the same here?

@alexcrichton
Copy link
Member

I'm going to close this for now because it's been languishing for some time, but if someone is willing to take this up again and resubmit it the Cargo team would be interested in finding a reviewer for it!

spacekookie added a commit to spacekookie/cargo that referenced this pull request Apr 1, 2020
This commit is a continuation and adaptation of rust-lang#5183, which aimed to
make cargo no longer reliant on the `$HOME/.cargo` directory in users'
home directories, and to instead use the `directories` crate to get
platform-defined standard directories for data, caches, and configs.

The priority of paths cargo will check is as follows:

1. Use `$CARGO_HOME`, if it is set
2. Use `$CARGO_CACHE_DIR`, `$CARGO_CONFIG_DIR`, etc, if they are set
3. If no environment variables are set, and `$HOME/.cargo` is present,
   use that
4. Finally, use the platform-default directory paths
@soredake

Any progress on this?

@luis-guimaraes-exoawk

Any updates?

@LucasFA

LucasFA commented Feb 13, 2024

For anyone interested, there is currently a pre-RFC in the forums:
https://internals.rust-lang.org/t/pre-rfc-split-cargo-home/19747
