Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add check for possible CStr literals in pre-2021 #118691

Merged
merged 1 commit into from Dec 20, 2023

Conversation

chfogelman
Copy link
Contributor

Fixes #118654

Adds information to errors caused by possible CStr literals in pre-2021.

The lexer separates c"str" into two tokens if the edition is less than 2021, which later causes an error when parsing. This error now has a more helpful message that directs them to information about editions. However, the user might also have written c "str" in a later edition, so to not confuse people who are using a recent edition, I also added a note about whitespace.

We could probably figure out exactly which scenario has been encountered by examining spans and editions, but I figured it would be better not to overcomplicate the creation of the error too much.

This is my first code PR and I tried to follow existing conventions as much as possible, but I probably missed something, so let me know!

@rustbot
Copy link
Collaborator

rustbot commented Dec 7, 2023

r? @b-naber

(rustbot has picked a reviewer for you, use r? to override)

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Dec 7, 2023
@@ -640,6 +640,17 @@ impl<'a> Parser<'a> {
}
}

// extra info for `c"str"` before 2021 edition or `c "str"` in all editions
if self.prev_token.is_ident_named(Symbol::intern("c"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if self.prev_token.is_ident_named(Symbol::intern("c"))
if self.prev_token.is_ident_named(sym::c)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You probably also want to check for cr (as in cr"", raw c-string literals). If I'm not mistaken “sym::c && StrRaw” only accounts for c r"" (notice the space).

// extra info for `c"str"` before 2021 edition or `c "str"` in all editions
if self.prev_token.is_ident_named(Symbol::intern("c"))
&& let TokenKind::Literal(token::Lit { kind: token::Str | token::StrRaw(..), .. }) =
&self.token.kind
Copy link
Member

@fmease fmease Dec 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could check if the c immediately precedes the string literal via self.prev_token.span.hi() == self.token.span.lo() to rule out the false positives like c "" (notice the space) since I don't know how helpful the c-str notes are for c "". For comparison, we don't provide any special diagnostics for b "" and r "".

Copy link
Member

@fmease fmease Dec 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While you noted

We could probably figure out exactly which scenario has been encountered by examining spans and editions, but I figured it would be better not to overcomplicate the creation of the error too much.

it should be as simple as self.prev_token.span.eq_ctxt(self.token.span) && self.token.span.edition < Edition::Edition2021.

The eq_ctxt check prevents the notes about cstr literals from misfiring on:

// edition: <2021
macro_rules! m { ($x:ident) => { $x"" } }
fn main() { m!(c) }

Your current implementation fires here (I think, unless I'm misremembering and we have an uninterpolated NtIdent 🤔).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fixing this up, but using this macro as an example, self.prev_token.span.eq_ctxt(self.token.span) == true. The previous token's span appears to be that of the $x token in the macro expansion body, rather than the c argument to the macro, so they have the same context. Is this correct, or am I missing something?

&self.token.kind
{
err.note("you may be trying to declare a CStr literal");
err.note("`CStr` literals require edition 2021");
Copy link
Member

@fmease fmease Dec 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
err.note("`CStr` literals require edition 2021");
err.note("`c-string literals require at least edition 2021");

&& let TokenKind::Literal(token::Lit { kind: token::Str | token::StrRaw(..), .. }) =
&self.token.kind
{
err.note("you may be trying to declare a CStr literal");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could also think about moving this check above the creation of err or alternatively .cancel() the err and provide a specialized error message instead of expected one of [symbol soup] like proposed in #118654: c-string literals are not supported in Rust 2018 and older.

@chfogelman
Copy link
Contributor Author

I've gone ahead and removed the ambiguity with whitespace, leaving that to just be a standard "expected one of" errors for consistency with b"str" and others. I added some additional checks to ensure that what we have is really intended to be a cstr literal, and also to pick up raw cstr literals and eliminate false positives in macro expansion. And I changed the error message itself to remove the operator soup and just say that the string token itself was unexpected.

This could still misfire on the characters cr# if they are not the start of a c-string, but I'm not sure that's worth trying to get perfect. At this point, the lexer has already turned this into the tokens [cr, #,...], so this would essentially require us to re-lex the rest of the statement to be 100% sure we're looking at something that would be a cstr literal in later editions.

@fmease
Copy link
Member

fmease commented Dec 14, 2023

r? fmease

@rustbot rustbot assigned fmease and unassigned b-naber Dec 14, 2023
Copy link
Member

@fmease fmease left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, that's great! Sorry for taking so long. Your code looks good, I just have some nitpicks about the comments and the diagnostics.

Lastly, could you squash your commits in one? After that, it should be ready to go!

@@ -640,6 +640,34 @@ impl<'a> Parser<'a> {
}
}

// Extra info for `c"str"` before 2021 edition or `c "str"` in all editions. The heuristic
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Extra info for `c"str"` before 2021 edition or `c "str"` in all editions. The heuristic
// Try to detect an attempt by the user to write a c-string literal before the 2021 edition. The heuristic

or something like that (it gives a better summary imo). Note that your comment still mentions c "str" (with a space) which is no longer accurate.

// edition where c-string literals are not allowed. There is the slight possibility of a
// false positive for a `cr#` that wasn't intended to start a c-string literal, but the
// lexer was greedy and didn't preserve whether the `r#` on its own would have started a
// valid raw string literal.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sentence is fine but you could also think about expanding upon it with something along the lines of “and we don't want to perform unbounded lookahead here to check if we have a sequence of hashes followed by a string literal”.

&& self.prev_token.span.hi() == self.token.span.lo()
&& !self.token.span.at_least_rust_2021()
{
err.cancel();
Copy link
Member

@fmease fmease Dec 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, since this subdiagnostic can have false positives, it might be wiser to keep the “token soup” error, not sure. When I suggested canceling the error, I assumed the diagnostic would be 100% accurate and we could replace the main message of the diagnostic similar to error[E0670]: `async fn` is not permitted in Rust 2015.

Your call though. If you'd like to keep the shorter message, I'd go with unexpected token (dropping the found) and with unexpected token over found here since the current phrasing isn't used in any other of rustc's error messages or only very rarely regarding the latter.

let mut err =
self.struct_span_err(self.token.span, format!("found unexpected token {descr}"));
err.span_label(self.token.span, "found here".to_owned());
err.note("you may be trying to declare a c-string literal");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
err.note("you may be trying to declare a c-string literal");
err.note("you might have meant to write a c-string literal");

Not a native speaker but the word declare doesn't feel right to me in this context.

self.struct_span_err(self.token.span, format!("found unexpected token {descr}"));
err.span_label(self.token.span, "found here".to_owned());
err.note("you may be trying to declare a c-string literal");
err.note("c-string literals require edition 2021 or later");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
err.note("c-string literals require edition 2021 or later");
err.note("c-string literals require the 2021 edition or later");
Suggested change
err.note("c-string literals require edition 2021 or later");
err.note("c-string literals require Rust 2021 or later");

(minor) Preexisting diagnostics always seem to be using one of the suggested expressions.

@fmease fmease added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Dec 19, 2023
@chfogelman
Copy link
Contributor Author

chfogelman commented Dec 19, 2023

Okay, reverted to token soup because I agree it's more appropriate for this case; we really just want a hint here, not a whole new error (also saves me the trouble of being too creative with error messages). Also changed declare->write, and edition->Rust, but left the other phrasing as-is. And sqashed. Thanks for the reviews!

@fmease
Copy link
Member

fmease commented Dec 19, 2023

Thanks! :)

@bors r+ rollup

@bors
Copy link
Contributor

bors commented Dec 19, 2023

📌 Commit 2c96025 has been approved by fmease

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Dec 19, 2023
compiler-errors added a commit to compiler-errors/rust that referenced this pull request Dec 19, 2023
…mease

Add check for possible CStr literals in pre-2021

Fixes [rust-lang#118654](rust-lang#118654)

Adds information to errors caused by possible CStr literals in pre-2021.

The lexer separates `c"str"` into two tokens if the edition is less than 2021, which later causes an error when parsing. This error now has a more helpful message that directs them to information about editions. However, the user might also have written `c "str"` in a later edition, so to not confuse people who _are_ using a recent edition, I also added a note about whitespace.

We could probably figure out exactly which scenario has been encountered by examining spans and editions, but I figured it would be better not to overcomplicate the creation of the error too much.

This is my first code PR and I tried to follow existing conventions as much as possible, but I probably missed something, so let me know!
bors added a commit to rust-lang-ci/rust that referenced this pull request Dec 19, 2023
…mpiler-errors

Rollup of 7 pull requests

Successful merges:

 - rust-lang#118691 (Add check for possible CStr literals in pre-2021)
 - rust-lang#118973 (rustc_codegen_ssa: Don't drop `IncorrectCguReuseType` , make `rustc_expected_cgu_reuse` attr work)
 - rust-lang#119071 (-Znext-solver: adapt overflow rules to avoid breakage)
 - rust-lang#119089 (effects: fix a comment)
 - rust-lang#119096 (Yeet unnecessary param envs)
 - rust-lang#119118 (Fix arm64e-apple-ios target)
 - rust-lang#119134 (resolve: Feed visibilities for unresolved trait impl items)

r? `@ghost`
`@rustbot` modify labels: rollup
bors added a commit to rust-lang-ci/rust that referenced this pull request Dec 20, 2023
…mpiler-errors

Rollup of 7 pull requests

Successful merges:

 - rust-lang#118691 (Add check for possible CStr literals in pre-2021)
 - rust-lang#118973 (rustc_codegen_ssa: Don't drop `IncorrectCguReuseType` , make `rustc_expected_cgu_reuse` attr work)
 - rust-lang#119071 (-Znext-solver: adapt overflow rules to avoid breakage)
 - rust-lang#119089 (effects: fix a comment)
 - rust-lang#119096 (Yeet unnecessary param envs)
 - rust-lang#119118 (Fix arm64e-apple-ios target)
 - rust-lang#119134 (resolve: Feed visibilities for unresolved trait impl items)

r? `@ghost`
`@rustbot` modify labels: rollup
bors added a commit to rust-lang-ci/rust that referenced this pull request Dec 20, 2023
…iaskrgr

Rollup of 7 pull requests

Successful merges:

 - rust-lang#118691 (Add check for possible CStr literals in pre-2021)
 - rust-lang#118973 (rustc_codegen_ssa: Don't drop `IncorrectCguReuseType` , make `rustc_expected_cgu_reuse` attr work)
 - rust-lang#119071 (-Znext-solver: adapt overflow rules to avoid breakage)
 - rust-lang#119089 (effects: fix a comment)
 - rust-lang#119094 (Add function ABI and type layout to StableMIR)
 - rust-lang#119102 (Add arm-none-eabi and armv7r-none-eabi platform-support documentation.)
 - rust-lang#119107 (subtype_predicate: remove unnecessary probe)

Failed merges:

 - rust-lang#119135 (Fix crash due to `CrateItem::kind()` not handling constructors)
 - rust-lang#119141 (Add method to get instance instantiation arguments)

r? `@ghost`
`@rustbot` modify labels: rollup
@bors bors merged commit f3f9b30 into rust-lang:master Dec 20, 2023
11 checks passed
@rustbot rustbot added this to the 1.76.0 milestone Dec 20, 2023
rust-timer added a commit to rust-lang-ci/rust that referenced this pull request Dec 20, 2023
Rollup merge of rust-lang#118691 - chfogelman:improve-cstr-error, r=fmease

Add check for possible CStr literals in pre-2021

Fixes [rust-lang#118654](rust-lang#118654)

Adds information to errors caused by possible CStr literals in pre-2021.

The lexer separates `c"str"` into two tokens if the edition is less than 2021, which later causes an error when parsing. This error now has a more helpful message that directs them to information about editions. However, the user might also have written `c "str"` in a later edition, so to not confuse people who _are_ using a recent edition, I also added a note about whitespace.

We could probably figure out exactly which scenario has been encountered by examining spans and editions, but I figured it would be better not to overcomplicate the creation of the error too much.

This is my first code PR and I tried to follow existing conventions as much as possible, but I probably missed something, so let me know!
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

confusing error when using c-string literals on older editions
5 participants