Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unify std and no_std APIs for IndexMap and IndexSet #207

Closed
wants to merge 1 commit into from

Conversation

kevinaboos
Copy link

Both IndexMap and IndexSet restrict access to their new() and with_capacity() methods behind the has_std feature gate.
Turns out, this isn't actually required because we can use DefaultHashBuilder from the hashbrown crate for the default value of the S: BuildHasher type parameter.

This unifies the API for downstream users of this crate that also support both std and no_std environments, which significantly improves ergonomics and simplifies code.
I have also modified the documentation to reflect these changes.

Note: I believe we could also remove the #[cfg(has_std)] restriction on the two macros indexmap!() and indexset!(), but I didn't include those changes here yet. I can add them for consistency's sake, if you'd like.

by using `DefaultHashBuilder` for the `S: BuildHasher` parameter.
@kevinaboos
Copy link
Author

For more info about the motivation behind this changeset, here's an example.

Assume we start with this original code, which is std-only:

let mut my_map = indexmap::IndexMap::new();

Before this PR

Downstream crates require tedious changes to support both std and no_std environments that depend on indexmap, which are unergonomic and make the code less readable.

At the crate root:

#[cfg(feature = "std")]
type IndexMap<K, V> = indexmap::IndexMap<K, V>;
#[cfg(not(feature = "std"))]
type IndexMap<K, V> = indexmap::IndexMap<K, V, hashbrown::hash_map::DefaultHashBuilder>;

And then, we must make these changes everywhere indexmap is used:

#[cfg(feature = "std")]
let mut my_map = indexmap::IndexMap::new();
#[cfg(not(feature = "std"))]
let mut my_map = crate::IndexMap::with_hasher(hashbrown::hash_map::DefaultHashBuilder::new());

Such example crates include object, several crates in the wasmtime project, and probably many more that I'm unaware of.

After this PR

No changes are required to the original code at all. 🎉

@cuviper
Copy link
Member

cuviper commented Nov 19, 2021

At a high level, I don't want to lock an external crate dependency in our public API, especially pre-1.0. That is, hashbrown's default is just an alias for ahash::RandomState (0.7), and we shouldn't lock that into our own 1.0 API. So if anything, we should make it something in our crate like:

#[cfg(has_std)]
pub use std::collections::hash_map::RandomState as DefaultHashBuilder;

#[cfg(not(has_std))]
pub struct DefaultHashBuilder(ahash::RandomState);
// It could privately use any no_std choice internally with forwarding impls,
// or just write our own simple hasher to avoid any dependency.

pub struct IndexMap<K, V, S = DefaultHashBuilder> { ... }

I'm not sure how safe it is for us to toggle the S default in the API at all though. Generally, it would be a breaking change to have different S based on features, because one crate may depend on us with different features than another crate does, each expecting different default S, but that all gets unified by cargo.

But it's weird because we detect has_std and we have a feature for it. So I think the only way S breaks is if has_std would not have been detected in one case, but the std feature is forced in another place. That's maybe ok, because it implies that has_std detection was broken, so they shouldn't be expecting S to be different than std's RandomState.

I'll think about it, and give @bluss time to weigh in too...

@kevinaboos
Copy link
Author

I see, thanks for the detailed response! I'm definitely keen to consider your concerns and make any requested changes; for ex, a newtype/wrapper struct is totally fine with me.

The good thing about this PR's current changeset is that it's not a breaking change, as it only offers a default type parameter that can still be customized (as I'm sure you know).

But it's weird because we detect has_std and we have a feature for it. So I think the only way S breaks is if has_std would not have been detected in one case, but the std feature is forced in another place. That's maybe ok, because it implies that has_std detection was broken, so they shouldn't be expecting S to be different than std's RandomState.

I'm not sure what would happen if both no_std and std are concurrently enabled, as I can't think of a way to make that happen, but it seems to be beyond the scope of this PR. I'm not 100% sure, but I think that cargo might fail during the crate resolution phase (?).

@cuviper
Copy link
Member

cuviper commented Nov 19, 2021

The good thing about this PR's current changeset is that it's not a breaking change, as it only offers a default type parameter that can still be customized (as I'm sure you know).

Sure, users can pick their own, but that doesn't give us free reign to change our default. Someone might write:

let map: IndexMap<String, String, RandomState> = IndexMap::new();

... then that's a strong commitment to the default S = RandomState.

Or a weirder case, probably less likely, is to depend on it by coherence:

impl<K, V> MyTrait for IndexMap<K, V> {} // = DefaultHashBuilder ?
impl<K, V> MyTrait for IndexMap<K, V, RandomState> {}
impl<K, V> MyTrait for IndexMap<K, V, ahash::RandomState> {}

... that's only allowed if they are all distinct S.

That's where I'm trying to explore if there's a way that our next release, with or without the std feature, changes the compilation for any of these sorts of things. Broken has_std is all I've come up with -- which is "accidentally" no_std.

@kevinaboos
Copy link
Author

kevinaboos commented Nov 19, 2021

Thanks for the reply again! I'd like to understand things in detail so I can help with any required changes, but I also don't want to waste your time. So feel free to address my below responses if you want, but no pressure.

Sure, users can pick their own, but that doesn't give us free reign to change our default. Someone might write:

let map: IndexMap<String, String, RandomState> = IndexMap::new();

... then that's a strong commitment to the default S = RandomState.

Apologies for my ignorance here, but I don't see why this could be problematic. Such a statement would still work unchanged before & after this PR, right? Please correct me if I'm wrong.


Regarding the coherence issue, that's an interesting scenario you bring up, and while I agree it's unlikely, I do of course respect the need to consider potential issues there. I think your suggestion of using a newtype wrapper should avoid the issue of conflicting trait implementations, since it would be a new struct type that previously wasn't available to implement traits for.

Another option, admittedly not my preference, is to add a feature to the crate that would only set a default type for S if the feature was enabled. For example, one called ahash or default_hash_builder, etc... TBD.
That wouldn't be quite as ergonomic as the current solution, but it would be way better because it'd only require changes to a downstream crate's Cargo.toml file.
This feature could then be removed in the next version that introduced other (if any) breaking changes.

@bluss
Copy link
Member

bluss commented Nov 19, 2021

@cuviper can't it still be a problem if a crate is written using indexmap and nostd, and then it has its environment shifted by someone depending on that crate and std? Then it suddenly does not compile.

The situation here is that this situation in the Pr is generally disallowed/unworkable but there might be some tricky argument for why it works. Great, more tricks, that didn't work so well for us with the std detection 😅

When I have time I want to release indexmap 2 to be able to solve this properly.

@kevinaboos
Copy link
Author

@bluss thanks for the response!

can't it still be a problem if a crate is written using indexmap and nostd, and then it has its environment shifted by someone depending on that crate and std? Then it suddenly does not compile.

I don't believe it's possible to construe a scenario where this change causes a compilation failure, but I'm not 100% perfectly sure. If you have an example, perhaps we can arrive at a solution that addresses that.

The situation here is that this situation in the Pr is generally disallowed/unworkable but there might be some tricky argument for why it works.

Are you saying the use case I presented when introducing this PR isn't allowable, or that the situation @cuviper brought up earlier isn't feasible?

When I have time I want to release indexmap 2 to be able to solve this properly.

Sure. However, it'd be really nice to have this change there now; it'd relieve folks of the burden from changing hundreds of LoC that use indexmap and instead just allow them to bump the version in their Cargo.toml files.

Copy link
Member

@bluss bluss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To try to explain, this PR breaks the rule that cuviper mentioned

Generally, it would be a breaking change to have different S based on features, because one crate may depend on us with different features than another crate does, each expecting different default S, but that all gets unified by cargo.

and that means that the solution here is generally disallowed/unworkable unless a good explanation is found for why it would be allowed (Edit: To clarify, here I'm assuming the requested change for a wrapper type is done, that's the easy part, doesn't solve the crate feature question). This by the usual cargo rule that crates features have to be purely additive (and std is not, with this change). rust-lang/cargo#4328

@@ -5,6 +5,8 @@ pub use crate::rayon::set as rayon;

#[cfg(has_std)]
use std::collections::hash_map::RandomState;
#[cfg(not(has_std))]
use hashbrown::hash_map::DefaultHashBuilder;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As @cuviper said, we can't expose hashbrown types in our public API. This change would make hashbrown a public dependency and it would not be possible for us to bump that version without changing our own major version. As cuviper said, we could just solve that with a wrapper.

Hashbrown is not becoming our public dependency, because it causes problems for our own versioning story.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, thanks for the clarification.

@kevinaboos
Copy link
Author

Thanks for the clarification! I'm more than happy to make the change to enclose hashbrown's DefaultHashBuilder in a newtype wrapper.

@bluss
Copy link
Member

bluss commented Nov 19, 2021

Yes, but the other question is the harder problem to crack.

@cuviper
Copy link
Member

cuviper commented Nov 19, 2021

@cuviper can't it still be a problem if a crate is written using indexmap and nostd, and then it has its environment shifted by someone depending on that crate and std? Then it suddenly does not compile.

Hmm, yeah that's a plausible scenario where even a working has_std detection would break things. You could have it working on a true no_std target with one S = default, then break when moving to a std target with a different S.

@kevinaboos
Copy link
Author

Yes, but the other question is the harder problem to crack.

Agreed. Thanks for all the excellent feedback and clarifications regarding this std detection issue.

In the interim, while waiting for a major version bump to v2.0.0, would my earlier idea of gating the S = default statement behind a feature work? It seems like it would avoid such problems, since downstream users of this crate would have to explicitly enable the new feature.

@bluss
Copy link
Member

bluss commented Nov 19, 2021

That unfortunately does not change the situation in formal terms since either the std feature or your new feature is non-additive after that.

@kevinaboos
Copy link
Author

That unfortunately does not change the situation in formal terms since either the std feature or your new feature is non-additive after that.

Hmm, ok. I suppose this'll have to wait for v2.0 then (?). Let me know if there's any way I can help.

@bluss
Copy link
Member

bluss commented Nov 27, 2021

Hey, thanks for exploring this anyway. You could have found something that we had missed (I have not been searching around in this space, and yes I had ignored anything about using features this way since I consider non-additive features something that we can't use).

Non-additive features won't be possible in 2.0 either, of course, but maybe there is something that can be done when removing the build script solution and if we can admit some minor breaking change to new or similar (?)

@kevinaboos
Copy link
Author

No problem, and thanks for your kind words.

@cuviper
Copy link
Member

cuviper commented May 6, 2022

FYI, the master branch is now on 2.0.0-pre, so this is the time to make any changes. I already moved to explicit "std" only, but otherwise the default S is still in the same situation.

@@ -38,8 +38,7 @@ rayon = { version = "1.2", optional = true }

[dependencies.hashbrown]
version = "0.11"
default-features = false
features = ["raw"]
features = ["ahash", "raw"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahash pulls in a non-trivial amount of dependencies. On linux it pulls in libc, version_check, cfg-if, once_cell and getrandom.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good find, I hadn't noticed that since I primarily build for a custom no_std target. If you're aware of other hashing crates with fewer dependencies, I could experiment with using them as the default. I'll look for others myself too, in the meantime.

If considering multiple hashing crates, we could gate the inclusion of each one behind a feature; this would allow std and no_std users alike to select different default hashers.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can only do that if we make the features mutually exclusive, as a hard build error. You could have one part of a large dependency tree that wants indexmap one way, and another part wanting the other way, but Cargo unifies dependencies into one build so we can't satisfy both.

But that's only about default S and the few methods that assume that. Most of the API is perfectly fine with bring-your-own-hasher S, so each part of that dependency tree can fill their own needs.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I had used that approach of pseudo-mutually-exclusive features for another project via the compile_error!() macro, but it's admittedly tedious and feels hacky.

I do hear what you're saying about multiple downstream crates using indexmap differently, and unfortunately there's no way around that.

@kevinaboos
Copy link
Author

FYI, the master branch is now on 2.0.0-pre, so this is the time to make any changes. I already moved to explicit "std" only, but otherwise the default S is still in the same situation.

Thanks for the heads up! I'll take a look at the changes and see if we can make this PR work for everyone.

@cuviper
Copy link
Member

cuviper commented May 9, 2022

I think we'll still run into the fundamental issue, that a feature ("std") should not change the meaning of a type like IndexMap<K, V>, with its hidden/default S. That said, I'll back off and see what you come up with. :)

@dhedey
Copy link

dhedey commented Jan 14, 2023

In HashBrown, depending on whether the ahash feature is present, they use either:

/// Default hasher for HashMap.
#[cfg(feature = "ahash")]
pub type DefaultHashBuilder = ahash::RandomState;
/// Dummy default hasher for `HashMap`.
#[cfg(not(feature = "ahash"))]
pub enum DefaultHashBuilder {}

Would we consider something similar here?

#[cfg(feature = "std")]
type DefaultHashBuilder = std::collections::hash_map::RandomState;
#[cfg(not(feature = "std"))]
#[cfg(feature = "ahash")]
type DefaultHashBuilder = ahash::RandomState;
#[cfg(not(feature = "std"))]
#[cfg(not(feature = "ahash"))]
pub enum DefaultHashBuilder {}

This also means that to enable ahash you'd have to disable default-features and explicitly add the ahash feature, which might be more palatable?

@dhedey
Copy link

dhedey commented Jan 14, 2023

Relatedly, would we consider a change from:

#[cfg(feature = "std")]
impl<K, V> IndexMap<K, V> {
    /// Create a new map. (Does not allocate.)
    #[inline]
    pub fn new() -> Self {
        ...
    }
    ...
}

To the following, like the FromIterator impl? Or is there something I've missed?

impl<K, V, S: BuildHasher + Default> IndexMap<K, V, S> {
    /// Create a new map. (Does not allocate.)
    #[inline]
    pub fn new() -> Self {
        ...
    }
    ...
}

This would mean that the new and with_capacity methods get included in no-std mode when you define an alias such as:

type IndexMap<K, V> = indexmap::IndexMap<K, V, hashbrown::hash_map::DefaultHashBuilder>;

@cuviper
Copy link
Member

cuviper commented Jan 14, 2023

Note that hashbrown still only provides new and with_capacity when "ahash" is enabled... those can't be implemented with that uninhabited dummy type (unless they would just panic!). I'm not really sure what the purpose of that dummy hasher is -- maybe just to avoid repeating the struct definition without the default type as we do.

Choosing different hashers based on features is still problematic, since features are unified by cargo. Two crates in the build tree may ask for indexmap with different choices, using APIs like with_hasher where they are passing a real hasher instance. Those can't have different types!

Removing the default S means everyone has to specify the hasher type somewhere. Defaults Affect Inference goes into this problem, "S = RandomState is load bearing."

IndexMap::default() is new() without a default S. There's no short alternative to with_capacity, but you can write with_capacity_and_hasher(c, Default::default()) -- maybe we can add with_capacity_and_default_hasher(c) if there's a name that's less of a mouthful.

@dhedey
Copy link

dhedey commented Jan 14, 2023

Understood, thanks for the explanation and the links, really appreciate your reply and the article link.

To summarise my understanding:

  • In HashBrown, they only have one supported hasher which the crate can compile with - I'm requesting two. Different crates in the build tree might assume that they use the standard hasher, and the non-std hasher; and when they get combined, one of them loses, and there's a compile error. Not good.
  • new() implementations can't be on a generic impl with a default type because otherwise IndexMap::new() wouldn't work any more because the compiler won't automatically infer the default Type. Instead, users would be forced to write something like IndexMap::<_, _, X>::new() which is obviously not okay. The linked article has a really good explanation on this. The compiler can however work its magic with traits (such as Default and FromIterator) on the other hand.

@cuviper
Copy link
Member

cuviper commented Jul 17, 2023

Note that indexmap 2 has been released, with its change to no longer detect std implicitly. However, this does not change the situation around new and with_capacity, and I left the default hasher as that from std.

@cuviper cuviper closed this Jul 17, 2023
@kevinaboos
Copy link
Author

Thanks for the update, appreciate it!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants