Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MCP for adding #[must_use] annotations throughout the standard library #35

Closed
jkugelman opened this issue Oct 13, 2021 · 5 comments
Closed
Labels
mcp A Major Change Proposal (lightweight RFC) T-libs-api

Comments

@jkugelman
Copy link

jkugelman commented Oct 13, 2021

Proposal

The #[must_use] annotation is used on types and functions throughout the standard library. A cursory search shows approximately 300 existing uses, which is fantastic. However, spurred by a Stack Overflow question where the poster misused str.to_lowercase(), I looked into the standard library's overall coverage and found that there are potentially another 800+ missing annotations (see below for details).

I propose going through the entire standard library and adding as many of these annotations as we can. While the existing annotations prevent many common mistakes there are still many additional functions and types that could benefit from them.

In talking to library team members and polling the Rust community at large, there does not seem to be any opposition to adding more annotations. The shortfall is mostly due to:

  1. The lack of a clear policy around #[must_use]. Should it be used liberally, or only in error-prone situations?
  2. Risk of churn to the Rust ecosystem. Adding lots of warnings could be disruptive.

Thankfully, neither adding nor removing warnings breaks Rust's stability guarantees. Despite the "must" in its name, #[must_use] is a warning not an error. Cargo passes #[allow(warnings)] to dependencies so crates aren't affected by their dependencies sprouting new warnings.

Socialization

To gauge whether this effort would be welcome I started a Zulip discussion on the t-libs stream. The reception was positive. @yaahc recommended I ask the community at large on social media or on the Rust Internals forum.

For the Rust Internals discussion I gathered all of clippy's 800 must_use_candidate lints. I anticipated there might be resistance so I compiled a set of categories that covered most of the lints and listed some examples of each. You can see the list here, copied from the IRLO thread:

List of examples grouped by category

Transformer "mimics"

Like the monsters that pose as treasure chests in Dark Souls and eat you when you get near, these are newbie traps. They sound like they might mutate their input argument but actually return a new value.

str::to_uppercase(self) -> String;
str::to_lowercase(self) -> String;
str::trim_left(&self) -> &str;
str::trim_right(&self) -> &str;
<[u8]>::to_ascii_uppercase(&self) -> Vec<u8>;
<[u8]>::to_ascii_lowercase(&self) -> Vec<u8>;
i32::swap_bytes(self) -> i32;
i32::rotate_left(self, n) -> i32;
i32::to_be(self) -> i32;
i32::to_le(self) -> i32;

Non-trap transformers

These methods also transform their input, but unlike the "mimics" above they don't trick anybody into thinking they might do it in place.

Option::copied(self) -> Option<T>;
Option::cloned(self) -> Option<T>;
Rc::into_raw(self) -> *const T;
Box::<str>::into_string(self) -> String;
String::into_boxed_str(self) -> Box<str>;

Pure functions

Pure functions have clear inputs and outputs. There's no point in calling them if you're going to ignore the result.

f32::is_nan(self) -> bool;
i32::is_negative(self) -> bool;
f32::sin_cos(self) -> (f32, f32);
slice::is_ascii(self) -> bool;
char::is_control(self) -> bool;
memchr(u8, &[u8]) -> Option<usize>;
memrchr(u8, &[u8]) -> Option<usize>;

Getters

Like the pure functions, why call a getter if you don't check the result? I can't think of a good reason.

into_keys is interesting because it does have the side effect of consuming the map. One could use it to drop the map. But that's a crazy use case, no? They ought to be calling drop.

str::len(&self) -> usize;
str::is_empty(&self) -> bool;
BTreeMap::keys(&self) -> Keys<K, V>;
BTreeMap::into_keys(self) -> Keys<K, V>;
Utf8Error::valid_up_to(&self) -> usize;
Formatter::width(&self) -> Option<usize>;

Used for panics?

It's plausible someone might write a[i] to cause a panic if the index is out of bounds. Is that a legitimate use case or should they get a must use warning?

Index::index(&self, Idx) -> &Self::Output;

Allocating constructors

It's wasteful to allocate a new object just to throw it away.

Vec::with_capacity(usize) -> Vec<T>;
Arc::downgrade(&Arc<T>) -> Weak<T>;
String::from_utf16_lossy(&[u16]) -> String;
Backtrace::capture() -> Backtrace;
mpsc::channel() -> (Sender<T>, Receiver<T>);

Non-allocating constructors

These constructors are cheap. If you don't save the return values it doesn't really cost you much. It's still wasteful, but only minorly so.

Vec::new() -> Vec<T>;
iter:empty() -> Empty<T>;
mem::zeroed() -> T;
str::from_utf_unchecked(&[u8]) -> &str;
MaybeUninit::uninit() -> MaybeUninit<T>;
Instant::now() -> Instant;

It turns out it wasn't as controversial as I'd expected. @joshtriplett said to do "all of them" and was showered with ❤️s. I took that as a green light.

Work to date

I've been working my way through the list a little bit each day. I've added ~500 annotations across ~20 PRs to date. You can see the full list of changes in the tracking issue:

Criteria

The policy I've been using is to add #[must_use] to any function that:

  1. Has no mutable args,
  2. Has a non-unit return type, and
  3. Has no side effects.

Side effects include things like mutating statics or spawning threads.

In other words, pure functions.

Concerns

The biggest concern is not generating a ton of churn. We need to be prepared to revert if there's some pattern in the ecosystem that uses one of these functions that leads to a ton of warnings. We're allowed to add warnings, but it can still be disruptive.

We don't expect a substantial number of warnings and plan to revert anything that generates substantial new warnings through the ecosystem (e.g. as detected in crater runs for the release).

The other issue is that we're very sensitive to giving people "warning fatigue", which is especially likely if there are false positives.

Mentors or Reviewers

@joshtriplett has been reviewing my work. And I may or may not have stolen some of his words for this writeup.

Past discussion

There's an existing discussion dating back to 2018 that predates my threads. Unfortunately, I did not know about it when I started working on this.

Future discussion

  • We lack clear policy. The existing policy describes when #[must_use] can be used, which is not the same as when it should be.

  • Many people have proposed solving this in a different manner, whether that be via new language rules, or the compiler auto-detecting when #[must_use] applies, or anything else that doesn't involve hundreds of manual annotations.

Links

@m-ou-se m-ou-se added I-libs-api-nominated Indicates that an issue has been nominated for discussion during a team meeting. T-libs-api labels Oct 13, 2021
@scottmcm
Copy link
Member

In other words, pure functions.

I wonder if it would make sense to add a new #[pure] attribute for those cases, since it could have a stronger meaning than just #[must_use] and it might be nice to have a different lint name for the things like .len() where a stray unused call is less of a worry than, say, using .map(...) when you meant .for_each(...).

But that could always be done later, for code we control at least. I guess whether it'd be worth doing earlier would depend on how much we'd expect the ecosystem to follow this and start also putting #[must_use] on such things.

@cuviper
Copy link
Member

cuviper commented Oct 13, 2021

I think #[pure] has a stronger implication for compiler optimization, like we might coalesce or remove redundant calls, whereas #[must_use] feels true to its reality of just being a lint.

@yaahc
Copy link
Member

yaahc commented Oct 13, 2021

That sounds fun, so long as #[pure] doesn't imply a stability commitment that isn't implied by #[must_use]. I think a big part of how low the bar is for adding new #[must_use] attributes is the fact that we can always freely remove them later.

Though honestly, even if it is more of a commitment it would probably still be useful, we'd just use it more sparingly.

@m-ou-se
Copy link
Member

m-ou-se commented Oct 20, 2021

cc @rust-lang/libs-api

@m-ou-se m-ou-se added mcp A Major Change Proposal (lightweight RFC) and removed I-libs-api-nominated Indicates that an issue has been nominated for discussion during a team meeting. labels Oct 27, 2021
@m-ou-se
Copy link
Member

m-ou-se commented Oct 27, 2021

This was discussed in the meeting last week, and no concerns were brought up. We're happy to see this effort continue.

Thanks for the detailed write up and working on these improvements!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
mcp A Major Change Proposal (lightweight RFC) T-libs-api
Projects
None yet
Development

No branches or pull requests

5 participants