Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add lints for raw string literals #112373

Closed
wants to merge 5 commits into from

Conversation

GrishaVar
Copy link
Contributor

This PR introduces two lints related to raw {ε, byte, c} string literals. Though these literals currently allow up to 255 hashes to surround them, my main thought was that lines like

let x = r#^{200}" normal string "#^{200};

should never occur (without a warning) in code.

The first lint, unused_raw_string_hash, triggers when a raw string literal is surrounded by more hashes than it needs. In a way, it's similar to the unused_parens lint, but for hashes.

The second lint expands on the idea of unnessesary raw string syntax. When a string literal contains no escapes and no quotes, prefixing it with an r has no impact on its value. The unused_raw_string lint triggers when a string literal is raw without needing to be.

With the current implementation, a file containing

let x = [
    r#" " "#,  // just right
    r#" \ "#,  // too many hashes
    r#" ! "#,  // too many hashes, not raw
     r" ! ",   // not raw
    
];

emits

warning: raw string literal uses more hashes than it needs.
  --> /home/user/dev/test/test.rs:13:9
   |
13 |         r#" \ "#,
   |         ^^^^^^^^ this raw string requires 0 hashes, but 1 is used
   |
   = note: `#[warn(unused_raw_string_hash)]` on by default

warning: raw string literal uses more hashes than it needs.
  --> /home/user/dev/test/test.rs:14:9
   |
14 |         r#" ! "#,
   |         ^^^^^^^^ this raw string requires 0 hashes, but 1 is used

warning: string literal does not need to be raw.
  --> /home/user/dev/test/test.rs:14:9
   |
14 |         r#" ! "#,
   |         ^^^^^^^^ removing the `r` and hashes would result in the same value
   |
   = note: `#[warn(unused_raw_string)]` on by default

warning: string literal does not need to be raw.
  --> /home/user/dev/test/test.rs:15:10
   |
15 |          r" ! ",
   |          ^^^^^^ removing the `r` would result in the same value

Some related thoughts:

  • This PR is currently not mergable, many improvements can (and should) still be made. This is just a proof-of-concept which emits the warnings.
  • These lints are closely related. If hashes are present, triggering unused_raw_string will also trigger unused_raw_string_hash. The latter would be redundant information, so perhaps should be supressed.
  • I have not implemented suggestions, but these should be MachineApplicable without issue.
  • It may be that the unused_raw_string_hash lint is too strict. It could be amended to trigger if
    • there are n too many hashes, or perhaps
    • there are too many and more than n hashes
      though this treatment is not given to unused_parens.

Please let me know if this is worth pursuing futher, if any code/ideas should be adapted, or if anything else comes to mind :)

@rustbot label: +T-lang

Proof-of-concept, lots more could be done, eg. suggestions and short-circuiting search
Many improvements still possible, eg. suggestions and narrower span
@rustbot
Copy link
Collaborator

rustbot commented Jun 7, 2023

r? @WaffleLapkin

(rustbot has picked a reviewer for you, use r? to override)

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-lang Relevant to the language team, which will review and decide on the PR/issue. labels Jun 7, 2023
@rust-log-analyzer
Copy link
Collaborator

The job mingw-check failed! Check out the build log: (web) (plain)

Click to see the possible cause of the failure (guessed by this bot)
   Compiling miropt-test-tools v0.1.0 (/checkout/src/tools/miropt-test-tools)
error: raw string literal uses more hashes than it needs.
  --> src/tools/miropt-test-tools/src/lib.rs:59:48
   |
59 |                 let ext_re = regex::Regex::new(r#"(\.(mir|dot|html))$"#).unwrap();
   |                                                ^^^^^^^^^^^^^^^^^^^^^^^^ this raw string requires 0 hashes, but 1 is used
   |
   = note: `-D unused-raw-string-hash` implied by `-D warnings`
error: could not compile `miropt-test-tools` (lib) due to previous error
warning: build failed, waiting for other jobs to finish...
Build completed unsuccessfully in 0:05:48

@@ -541,6 +541,22 @@ lint_unused_op = unused {$op} that must be used
.label = the {$op} produces a value
.suggestion = use `let _ = ...` to ignore the resulting value

lint_unused_raw_string = string literal does not need to be raw.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the Rust Compiler Development Guide, diagnostics should not have trailing punctuation.

Suggested change
lint_unused_raw_string = string literal does not need to be raw.
lint_unused_raw_string = string literal does not need to be raw

} would result in the same value

lint_unused_raw_string_hash =
raw string literal uses more hashes than it needs.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likewise

Suggested change
raw string literal uses more hashes than it needs.
raw string literal uses more hashes than it needs

@scottmcm
Copy link
Member

scottmcm commented Jun 7, 2023

Hmm, it's not obvious to me that extra hashes is really a problem that needs to be warned about.

They remind me of the parens in let x = (a * b) + c;, which are also not necessary, but that we nevertheless don't warn about.

If I always write my regexes with raw strings even if they don't happen to contain \s, or if I put a big block of text in ####s for safety just in case, that doesn't seem unreasonable to me. Kinda like how I used b'\"' in a recent PR because it lined up better with the other escape sequences I was matching (b'\n' and b'\'' and such), even though that \ isn't actually needed either.

To me, this is the kind of "hey, did you know that you could ________" thing that's wonderful as a subtle hint from R-A in an IDE, but I'm not convinced that it's something I'd ever want to see as a warning on the command line while I'm working on something.

Is there some particular harm you're hoping to avoid with these lints that I'm missing?

@GrishaVar
Copy link
Contributor Author

it's not obvious to me that extra hashes is really a problem that needs to be warned about.

I was under the impression that that lints do not necessarily have to indicate "problems" in the code (eg while_true, redundant_semicolons). Is there some criteria somewhere for lints that I missed?

They remind me of the parens in let x = (a * b) + c;, which are also not necessary, but that we nevertheless don't warn about

Yeah, I see what you mean. In this case, the braces are useful because they make the implicit associativity strength explicit through the braces, so it can improve readability. I'd argue extra hashes are more of a x = (a * b + c)
kind of thing (which does trigger a warning), but in that case any number of extra parentheses clearly has no impact on the result whereas with the hashes it may not be immediately obvious.

Is there some particular harm you're hoping to avoid with these lints that I'm missing?

My main idea was to prevent code with hundreds of hashes from going through. A while ago I reduced the hash count maximum to 255 from 2^16, I'm reminded by a justification someone (oh, I looked it up, it was you! small world) gave for this: 255 is an order of magnitude more than any plausible situation. In general, if someone finds themselves needing/wanting hundreds of hashes, they are doing something rather questionable and may very easily find themselves crossing the 255 limit, which would be an issue. I would imagine most people aren't aware 256 hashes will be rejected by the lexer.

I mentioned in my previous message, the hash warning criterion may be too strict. It could, for instance, be applied to messages which have 50 hashes too many. Or perhaps just any literal with more than 50 hashes, too many or not. In the current state, the warning would presumably encourage people to use fewer hashes if possible, keeping them away from the 255 bound.

@WaffleLapkin WaffleLapkin added S-waiting-on-team Status: Awaiting decision from the relevant subteam (see the T-<team> label). and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jun 7, 2023
@WaffleLapkin
Copy link
Member

Switched to S-waiting-on-team Status: Awaiting decision from the relevant subteam (see the T-<team> label). , as new lints generally require a T-lang approval. I'll review the implementation if/when T-lang approves this lint.

@est31
Copy link
Member

est31 commented Jun 7, 2023

They remind me of the parens in let x = (a * b) + c;, which are also not necessary, but that we nevertheless don't warn about

The expression precedence table is quite complicated and one can't expect people to keep it in their head. For example take something like v / w % 3, is this (v / w) % 3 or is this v / (w % 3) :). In one of the two cases the ()s are technically redundant, in the other they aren't. The best way to understand the expression is to have the ()s present.

If you lint about extra ()s then that's different IMO than to lint about extra #s which follows simpler rules, because removing ()s might make an expression harder to understand.

@petrochenkov
Copy link
Contributor

petrochenkov commented Jun 7, 2023

Maybe this can go into clippy first?
This lint doesn't seem to use any internal compiler data unavailable to clippy, and the issue it addresses isn't especially pressing.

@WaffleLapkin
Copy link
Member

fyi: there is also a PR to add such a lint to clippy: rust-lang/rust-clippy#10884

@GrishaVar
Copy link
Contributor Author

Looks like I was a couple days late, shame. My unused_raw_string idea is not implemented there, perhaps it should be @Centri3.

@GrishaVar GrishaVar closed this Jun 8, 2023
@scottmcm
Copy link
Member

scottmcm commented Jun 8, 2023

I was under the impression that that lints do not necessarily have to indicate "problems" in the code (eg while_true, redundant_semicolons). Is there some criteria somewhere for lints that I missed?

while true has impacts on initialization checking and such, which is why it's worth warning about.

The goal is to teach people that writing

pub fn foo() -> i32 {
    while true {
        // ...
        return 4;
    }
    unreachable!()
}

is a poor way to go about it, since it would be better as

pub fn foo() -> i32 {
    loop {
        // ...
        return 4;
    }
}

It's definitely fair that it's not a clearly-delineated line, though. (( expr )) being worth linting on but (a * b) + c not being worth linting on isn't something for which I think we have clear heuristics.

Basically, my thought here was that if someone wants to do

const BOOK_SNIPPIT: &str = r#################"

Then, in your new *variables* directory, open *src/main.rs* and replace its
code with the following code, which won’t compile just yet:

<span class="filename">Filename: src/main.rs</span>

```rust,ignore,does_not_compile
{{#rustdoc_include ../listings/ch03-common-programming-concepts/no-listing-01-variables-are-immutable/src/main.rs}}
```
Save and run the program using `cargo run`. You should receive an error message
regarding an immutability error, as shown in this output:
```console
{{#include ../listings/ch03-common-programming-concepts/no-listing-01-variables-are-immutable/output.txt}}
```
This example shows how the compiler helps you find errors in your programs.
Compiler errors can be frustrating, but really they only mean your program
isn’t safely doing what you want it to do yet; they do *not* mean that you’re
not a good programmer! Experienced Rustaceans still get compiler errors

"#################;

Then I think I'm fine with that, even though it's definitely more #s than are really needed.

I guess maybe I'm thinking of it kinda like people using

////////////////////////////////////////////////////

as separators.

flip1995 pushed a commit to flip1995/rust that referenced this pull request Jun 30, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-team Status: Awaiting decision from the relevant subteam (see the T-<team> label). T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-lang Relevant to the language team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants