Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New lint: [duplicate_map_keys] #12575

Draft
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

SeseMueller
Copy link

@SeseMueller SeseMueller commented Mar 27, 2024

Fixes #11978


changelog: new [duplicate_map_keys] lint

@rustbot
Copy link
Collaborator

rustbot commented Mar 27, 2024

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @llogiq (or someone else) some time within the next two weeks.

Please see the contribution instructions for more information. Namely, in order to ensure the minimum review times lag, PR authors and assigned reviewers should ensure that the review label (S-waiting-on-review and S-waiting-on-author) stays updated, invoking these commands when appropriate:

  • @rustbot author: the review is finished, PR author should check the comments and take action accordingly
  • @rustbot review: the author is ready for a review, this PR will be queued again in the reviewer's queue

@rustbot rustbot added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties label Mar 27, 2024
@SeseMueller
Copy link
Author

As outlined in #11978, the main goal now is to also detect if multiple inserts into a HashMap are done on the same key, which is considerably more difficult. I'll start work on it in the near future.

Comment on lines 11 to 15
/// When two items are inserted into a `HashMap` with the same key,
/// the second item will overwrite the first item.
///
/// ### Why is this bad?
/// This can lead to data loss.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// When two items are inserted into a `HashMap` with the same key,
/// the second item will overwrite the first item.
///
/// ### Why is this bad?
/// This can lead to data loss.
/// Warn when a `Hashmap` is set up with the
/// same key appearing twice.
///
/// ### Why is this bad?
/// When two items are inserted into a `HashMap` with the same key,
/// the second item will overwrite the first item.
/// This can lead to data loss.

///
/// ### Why is this bad?
/// This can lead to data loss.
///
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
///
///
/// # Known Problems
/// False negatives: The lint only looks into
/// `HashMap::from([..])` calls.

// Then check the keys
{
// Put all keys in a vector
let mut literals = Vec::new();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could use SpanlessEq to correctly lint on arbitrary key expressions.

Also, why not return a (possibly empty) set of duplicate key spans instead of a plain bool? That way we could better pinpoint the error.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for point out SpanlessEq. The function now returns Option<Vec<(Expr, Expr)>>, so all pairs of found duplicates. I should probably someday change it to return all Expressions with their duplicates instead of split over multiple pairs.

use std::collections::HashMap;

fn main() {
let example = HashMap::from([(5, 1), (5, 2)]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to see more tests. For example one with a () key, one with a key expr containing a return statement, and one with a custom key type containing a bool where eq always returns true and hash is an empty function, then HashMap::from([(Bad(true), 1), (Bad(false)), 2)]). That would constitute an (acceptable) false negative.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've now written and tested all tests you recommended, but I didn't fully understand what you meant with "key expr containg a return statement"

    let _ = HashMap::from([(return Ok(()), 1), (return Err(()), 2)]); // expect no lint 

is what I did, but everything should already be caught by clippy::diverging_sub_expression, unless I missed or misunderstood something.

/// let example = HashMap::from([(5, 1), (5, 2)]);
/// ```
#[clippy::version = "1.79.0"]
pub HASH_COLLISION,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Hash collision" to me means different keys with the same hash, but this lint specifically looks for same keys.
(A hash collision also isn't bad in and of itself and won't lead to overwriting entries, a hashmap can deal with that)

Should this be renamed from hash_collision to something else? What about duplicate_map_keys?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I should have caught that. duplicate_map_keys is a good name, I've changed it in the latest commit.

@SeseMueller SeseMueller changed the title New lint: [hash_collision] New lint: [duplicate_map_keys] Mar 28, 2024
expr: &'a rustc_hir::Expr<'_>,
) -> Option<Vec<(rustc_hir::Expr<'a>, rustc_hir::Expr<'a>)>> {
// If the expression is a call to `HashMap::from`, check if the keys are the same
if let ExprKind::Call(func, args) = &expr.kind
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if let ExprKind::Call(func, args) = &expr.kind
if let ExprKind::Call(func, [arg]) = &expr.kind

Matching a 1-element slice would let you avoid the args.len() == 1 check and index later.

Comment on lines +65 to +75
let mut keys = Vec::new();

for arg in *args {
//
if let ExprKind::Tup(args) = &arg.kind
&& !args.is_empty()
// && let ExprKind::Lit(lit) = args[0].kind
{
keys.push(args[0]);
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let mut keys = Vec::new();
for arg in *args {
//
if let ExprKind::Tup(args) = &arg.kind
&& !args.is_empty()
// && let ExprKind::Lit(lit) = args[0].kind
{
keys.push(args[0]);
}
}
let keys = args.iter().filter_map(|arg| {
if let ExprKind::Tup([key, ..]) = arg.kind {
Some(key)
} else {
None
}
}).collect::<Vec<_>>();

Comment on lines +80 to +81
for i in 0..keys.len() {
for j in i + 1..keys.len() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a quadratic algorithm that will slow down considerably with larger sets (try a thousand elements). A better way would be to use a SpanlessHash per item and add them to a HashSet, which is linear.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-review Status: Awaiting review from the assignee but also interested parties
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Warn for overwriting HashMap entry
4 participants