Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cargo doesn't uniformly read configuration from env vars #5416

Open
alexcrichton opened this issue Apr 25, 2018 · 15 comments
Open

Cargo doesn't uniformly read configuration from env vars #5416

alexcrichton opened this issue Apr 25, 2018 · 15 comments
Labels
A-configuration Area: cargo config files and env vars A-environment-variables Area: environment variables C-bug Category: bug S-needs-design Status: Needs someone to work further on the design for the feature or fix. NOT YET accepted. Z-advanced-env Nightly: advanced-env

Comments

@alexcrichton
Copy link
Member

Most configuration read through cargo uses helpers like Config::get_string which automatically read env vars, but configuration reading portions of Cargo that go through Config::get_table do not read environment variables. The return value here is a HashMap which doesn't proxy reads to an environment variable.

This affects, for example, source replacement. Source replacement is not configurable through environment variables (a bug) because of its use of get_table.

We should tackle this via one of two routes:

  • First, remove get_table entirely. Just don't expose the ability to read a table and force all configuration reads to go through get_string and friends. This may or may not be uniformly possible as IIRC there's portions of Cargo that rely on iter which is a bit of a bummer.
  • Second, change get_table to return a wrapper rather than a HashMap. Reads of the HashMap would "do the right thing" with environment variables. Again though it's not clear how such a wrapper would implement iter as well...

In any case, some questions to still work through here!

@alexcrichton alexcrichton added the C-bug Category: bug label Apr 25, 2018
@rtsuk
Copy link

rtsuk commented Apr 25, 2018

I'm going to take a whack at the "remove get_table" route.

@rtsuk
Copy link

rtsuk commented Apr 25, 2018

After further discussion I'm not going to try this approach right now, instead musing about an alternate escaping of period and hyphen so that iteration over environmental variables could be unambiguous.

@rtsuk
Copy link

rtsuk commented Apr 30, 2018

After pondering it for a few days I don't think there is such an alternate escaping. I think that it is not possible to support specifying an arbitrary TOML table via environmental variables. The space of key names in TOML is too rich to be encoded in the very limited character set allotted to environmental variables.

If one wanted to support overriding source replacement from environmental variables, one approach would be to use an array of tables where the property names can be defined to be unambiguous when converted to environmental variables. An individual table could have an index number in its name.

So instead of:

[source.crates-io]
replace-with = "vendored-sources"

[source.vendored-sources]
directory = "../third_party/rust-crates/vendor"

one would write:

[[sources]]
name = "crates-io"
replacement = "vendored-sources"

[[sources]]
name = "vendored-sources"
directory = "../third_party/rust-crates/vendor"

The environmental variable version of this would be:

CARGO_SOURCES_001_NAME = "crates-io"
CARGO_SOURCES_001_REPLACEMENT = "vendor-sources"
CARGO_SOURCES_002_NAME = "vendored-sources"
CARGO_SOURCES_002_DIRECTORY = "../third_party/rust-crates/vendor"

One would then implement get_table_array() and have it create an array of tables from environmental variables as described above, falling back to the config TOML in the same manner as other environmental variables.

@alexcrichton
Copy link
Member Author

@rtsuk sounds plausible to me! In general I'd be totally on board with basically reworking how [source] configuration is specified so it's more amenable to reading it through environment variables.

Another possibility would be something like CARGO_SOURCES_ALL=comma,separated,list,of,names and that could inform Cargo which ones to read from env vars perhaps?

@ehuss
Copy link
Contributor

ehuss commented May 10, 2018

Would it be possible for Config to iterate over the environment so that get_table can stay the same? For example, get_table("source") would fetch all CARGO_SOURCE_* environment variables, split the key by _, and inject all those values into the result.

@ehuss
Copy link
Contributor

ehuss commented May 10, 2018

Nevermind - obviously it wouldn't know the correct type for each variable.

I'm liking the idea of making a serde Deserializer that can handle environment variables. I made one a few days ago, but it was more basic and ignored environment variables.

@alexcrichton
Copy link
Member Author

Yeah I do think that we can transparently read env vars (given enough support) for specific keys, the problem for source replacement though is that we can't iterate over tables which are defined entirely in env vars because we can reconstruct a query for a specific key into an env var but we can't take an env var and say what key/table it corresponded to

@matklad
Copy link
Member

matklad commented May 11, 2018

I wonder if we need to provide a giant escape hatch like CARGO_CFG_TOML or CARGO_CFG_JSON, which just accepts a toml/json string with the contest of .cargo/config?

The benefit of the current setup is that a human can easily override config on the command line. However looks like to specify tables the syntax would need to be ugly and not human friendly anyway, so perhaps just providing a giant TOML escape hatch would work better for those who need it?

@alexcrichton
Copy link
Member Author

Not a bad idea! TOML generally is dependent on newlines which doesn't jive well with env vars, but I'd be fine accepting either

@ehuss
Copy link
Contributor

ehuss commented May 15, 2018

I have a prototype that auto-converts with serde here: ehuss@63f76b0

Essentially you can do something like let p: TomlProfile = config.load_type("profile.dev") and it will auto-magically convert the config (including environment variables). It supports structs, arbitrary maps, optional values, etc.

If you'd like, I can continue working on this and submit a PR (just the serde part, no actual changes to Cargo, yet). I realize it is a lot of code, so maybe it's not worth it (I'm no serde expert, so some parts can probably be simpler). However, I think it would be helpful for the Profile work I'm doing, and could be used for the source map, and anything else in the future. Overall I'm not sure how useful environment variables are.

Things left to do:

  • Clean up and finish unimplemented types. Possibly use macros to reduce the bloat.
  • Better error handling. Include the filename of any bad configs.
  • Write real tests.

Possible enhancements:

  • Support TOML in environment values. I think this would be possible to support for arbitrary keys (CARGO_PROFILE_DEV="..."). It could deduce that it needs a rich data type and parse as-needed.
  • Support lists. Not sure what the preferred way to handle env vars would be. Perhaps just TOML-style lists?
  • Underscores aren't supported for environment variable map keys (but they are supported for structs like OPT_LEVEL). So you can't specify a package with a name like ansi_term. Not sure if there is any way around that.

@alexcrichton
Copy link
Member Author

Nice!

I think I'd personaly be in favor of going down such a route, it'd probably remove almost all of our methods elsewhere I think as well if we tried hard enough :)

For TOML-in-env I think we'll want to hold off on that for now, but I could imagine perhaps not the bare key but something like CARGO_PROFILE_DEV_TOML='...' and I actually really like the idea of using toml to specify arrays!

@ehuss
Copy link
Contributor

ehuss commented May 19, 2018

PR for just the config part posted at #5552.

@ehuss
Copy link
Contributor

ehuss commented Jun 4, 2018

@rtsuk Are you still working on this?

At least for source replacement, another option is to use the -Z advanced-env experimental option. It would allow you to specify source replacement like this:

CARGO_SOURCE_crates-io_REPLACE_WITH="another-source"
CARGO_SOURCE_another-source_DIRECTORY="vendor"

It only supports letters/numbers/dashes for the source name (no underscores). Also, the use of mixed upper/lowercase is a little unconventional and weird looking, but works in all environments I'm aware of.

Another option is to extend it to support inline tables like this:

CARGO_SOURCE="{crates-io={replace-with='another-source'}, another-source={directory='vendor'}}"

Just some alternate ideas.

@rtsuk
Copy link

rtsuk commented Jun 4, 2018

@ehuss I am not. My personal need for it has passed and it looked very hairy. Should the need arise again, though, I'll check out the advanced-env option.

@ehuss ehuss added the A-configuration Area: cargo config files and env vars label Feb 21, 2019
@ehuss ehuss added Z-advanced-env Nightly: advanced-env A-environment-variables Area: environment variables labels Sep 21, 2019
bors added a commit that referenced this issue Dec 19, 2019
Config enhancements.

This is a collection of changes to config handling. I intended to split this into separate PRs, but they all built on one another so I decided to do it as one. However, I can still split this up if desired.

High level overview:

- Refactorings, mainly to remove `pub` from `Config::get_table` and use serde API instead.
- Add `--config` CLI option.
- Add config `include` to include other files.

This makes some progress on #5416.
Closes #6699.

This makes a minor user-visible change in regards to `StringList` types. If an array is specified in a config as a list, and also as an env var, they will now be merged. Previously the environment variable overrode the file value. But if it is a string, then it won't join (env var takes precedence). I can probably change this, but I'm not sure if the old behavior is desired, or if it should merge all the time?

**Future plans**
This lays the groundwork for some more changes:
- Work on #7253 (`debug-assertions` and `debug` fails in environment vars). I have some ideas to try.
- Consider removing use of `get_list` for `paths`, and use a `Vec<ConfigRelativePath>`. This will require some non-trivial changes to how `ConfigSeqAccess` works. This is one of the last parts that does not use the serde API.
- Possibly change `[source]` to load config values in a lazy fashion. This will unlock the ability to use environment variables with source definitions (like CARGO_SOURCE_CRATES_IO_REPLACE_WITH).
- Possibly change `[profile]` to load config profiles in a lazy fashion. This will make it easier to use environment variables with profiles, particularly with arbitrarily named profiles.
- Possibly remove the case-sensitive environment variables in `-Zadvanced-env`. I think they are just too awkward, and prone to problems. Instead, drive people towards using `--config` instead of env vars.
- Add support for TOML tables in env vars (like `CARGO_PROFILES={my-profile={opt-level=1}})`). I started implementing it, but then looking at the use cases, it didn't seem as useful as I initially thought. However, it's still an option to try.

**Refactoring overview**

- `[source]` table now uses the serde API.
- `[target]` table now uses the serde API. This is complicated since the 'cfg()' entries are different from the triple entries. The 'cfg()' tables are loaded separately, and are accessed from `Config::target_cfgs`. Otherwise, it just uses `config.get` of the specific target.TRIPLE.
    - Moved the target config stuff into `config/target.rs`.
- Various changes to make this work:
    - Added `PathAndArgs` type which replaces `config.get_path_and_args`.
    - Changed `ConfigKey` to track the key parts as a list (instead of a string). This fixes an issue where quoted keys weren't handled properly (like `[foo.'a.b'.bar]`). This also seems to make a little more sense (it was joining parts into a string only to immediately call `split` on it). Changed various APIs to take a `ConfigKey` object instead of a string to avoid that splitting behavior.
    - `ValueDeserializer` now pre-computes the `Definition` so that it can provide a better error message when a value fails to deserialize.

Overall, there shouldn't be significant user-visible changes. Some error messages have changed and warnings have been added for some ignored keys. `-Zadvanced-env` now works for source and target tables, though I'm not really happy with that feature.
@gilescope
Copy link
Contributor

Flattening all of toml to environment variables seems a big ask. We'd get a lot of utility if env vars in the config were expanded:

[source.vendored-sources]
directory = "$A_DIR/vendor"

(and %A_DIR% on windows)

Is that an option that could be considered?

@epage epage added the S-needs-design Status: Needs someone to work further on the design for the feature or fix. NOT YET accepted. label Oct 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-configuration Area: cargo config files and env vars A-environment-variables Area: environment variables C-bug Category: bug S-needs-design Status: Needs someone to work further on the design for the feature or fix. NOT YET accepted. Z-advanced-env Nightly: advanced-env
Projects
None yet
Development

No branches or pull requests

6 participants