Syntax errors for string and int #7952

bobhy · 2023-02-03T06:00:18Z

Description

Added a few syntax errors for int and string literals, and changed the parser to stop and report that error rather than continue trying to parse those tokens as some other shape. However, I don't see how to push this approach much further, and most of the classic confusing errors can't be changed.

Flagged as WIP for the moment, but it passes all checks and works better than the current release:

I have yet to figure out how to make these errors refer back to the book, as I see some other errors do.
How can we give a syntax error when a malformed int is the first token in a line? Currently it's parsed as an external command, and the user gets a confusing error message.
I would like to be stricter with decimal int literals (those lacking, e.g., a `0x` prefix). I need to tinker more with the order of parse shape calls: currently float is tried after int, so int parsing has to let `1.4` pass through.


〉"\z"
Error: 
   ╭─[entry #3:1:1]
 1 │ "\z"
   ·  ─┬─
   ·   ╰── Syntax error in string, unrecognized character after escape '\'.
   ╰────

Canonical presentation of a syntax error.

〉"  \u{01ffbogus}"
Error: 
  × Invalid syntax
   ╭─[entry #2:1:1]
 1 │ "  \u{01ffbogus}"
   ·    ───────┬──────
   ·           ╰── Syntax error in string, expecting 1 to 6 hex digits in unicode escape '\u{X...}', max value 10FFFF.
   ╰────

A malformed unicode escape in a string is flagged as an error.
String parsing can be opinionated because it's the last shape tried.
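The two string-escape checks shown above (an unrecognized character after `\`, and `\u{...}` needing 1 to 6 hex digits with a maximum of 10FFFF) can be sketched roughly as follows. This is a minimal, self-contained illustration assuming hypothetical helpers `parse_escape` and `parse_unicode_escape`; it is not the nu-parser implementation, and it recognizes only a small subset of escape letters.

```rust
/// Hypothetical helper (not nu-parser's code): decodes the escape that follows a
/// backslash. Returns the decoded char and the number of bytes consumed after the
/// backslash, or a syntax-error message.
fn parse_escape(rest: &str) -> Result<(char, usize), String> {
    match rest.chars().next() {
        // Illustrative subset only; nushell accepts more escape letters than these.
        Some('n') => Ok(('\n', 1)),
        Some('t') => Ok(('\t', 1)),
        Some('r') => Ok(('\r', 1)),
        Some('\\') => Ok(('\\', 1)),
        Some('"') => Ok(('"', 1)),
        // `\u` is followed by `{X...}`; add 1 for the 'u' itself.
        Some('u') => parse_unicode_escape(&rest[1..]).map(|(c, n)| (c, n + 1)),
        _ => Err("unrecognized character after escape '\\'".to_string()),
    }
}

/// Hypothetical helper: parses `{X...}` after `\u`, requiring 1 to 6 hex digits
/// whose value is a valid Unicode scalar (at most 10FFFF, no surrogates).
fn parse_unicode_escape(rest: &str) -> Result<(char, usize), String> {
    const MSG: &str =
        "expecting 1 to 6 hex digits in unicode escape '\\u{X...}', max value 10FFFF";
    let body = rest.strip_prefix('{').ok_or(MSG)?;
    let close = body.find('}').ok_or(MSG)?;
    let digits = &body[..close];
    if digits.is_empty() || digits.len() > 6 || !digits.chars().all(|c| c.is_ascii_hexdigit()) {
        return Err(MSG.to_string());
    }
    // At most six hex digits always fit in a u32.
    let value = u32::from_str_radix(digits, 16).map_err(|_| MSG.to_string())?;
    // from_u32 rejects values above 0x10FFFF and the surrogate range.
    let c = char::from_u32(value).ok_or(MSG)?;
    // Bytes consumed: '{' + digits + '}'.
    Ok((c, close + 2))
}
```

With this sketch, `"\z"` fails in `parse_escape`, and `\u{01ffbogus}` fails the hex-digit check, mirroring the two error messages above.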

〉0x22bogus
Error: nu::shell::external_command (link)
  × External command failed
   ╭─[entry #4:1:1]
 1 │ 0x22bogus
   · ────┬────
   ·     ╰── executable was not found
   ╰────
  help: No such file or directory (os error 2)

A correct number as the first token is evaluated, but an incorrect one is treated as an external command. This is confusing to users.

〉0 + 0x22bogus
Error: 
  × Invalid syntax
   ╭─[entry #5:1:1]
 1 │ 0 + 0x22bogus
   ·     ────┬────
   ·         ╰── Syntax error in int, invalid digits in radix 16 int.
   ╰────

We can give a syntax error if the token is unambiguously an int literal, e.g. it has a `0b` or `0x` prefix and so could not be a float.
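A rough sketch of this rule, assuming a hypothetical `parse_int_token` function and `IntParse` outcome type (neither is nu-parser's API). It reports a terminal syntax error for malformed `0x` and `0o` literals; following the later commit note that `0b` defers to filesize, a malformed `0b...` token is passed on rather than rejected. Plain decimal tokens are never terminal.

```rust
/// Hypothetical outcome type for trying the int shape on one token.
#[derive(Debug, PartialEq)]
enum IntParse {
    Value(i64),
    /// The token is unambiguously an int literal but malformed: stop and report.
    SyntaxError(String),
    /// The token may be another shape (float, filesize, string): let the next parser try.
    NotMine,
}

/// Hypothetical int parser illustrating which failures are terminal.
fn parse_int_token(token: &str) -> IntParse {
    let (sign, body): (i64, &str) = match token.strip_prefix('-') {
        Some(rest) => (-1, rest),
        None => (1, token),
    };
    for (prefix, radix) in [("0x", 16u32), ("0o", 8), ("0b", 2)] {
        if let Some(digits) = body.strip_prefix(prefix) {
            // Reject empty digit runs and stray signs that from_str_radix would accept.
            let well_formed = !digits.is_empty() && digits.chars().all(|c| c.is_digit(radix));
            return match (well_formed, i64::from_str_radix(digits, radix)) {
                (true, Ok(v)) => IntParse::Value(sign * v),
                // `0b` may also begin a filesize (zero bytes), so defer to that shape.
                _ if prefix == "0b" => IntParse::NotMine,
                (true, Err(_)) => IntParse::SyntaxError(format!("radix {radix} int out of range")),
                (false, _) => IntParse::SyntaxError(format!("invalid digits in radix {radix} int")),
            };
        }
    }
    // Plain decimal: never terminal, because `1.4` must still reach the float parser.
    match body.parse::<i64>() {
        Ok(v) => IntParse::Value(sign * v),
        Err(_) => IntParse::NotMine,
    }
}
```

Under these assumptions, `0x22bogus` yields the radix-16 syntax error, while `098bogus` is declined and can fall through to another shape, matching the two sessions shown here.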

〉0 + 098bogus
Error: nu::parser::unsupported_operation (link)

  × Types mismatched for operation.
   ╭─[entry #6:1:1]
 1 │ 0 + 098bogus
   · ┬ ┬ ────┬───
   · │ │     ╰── string
   · │ ╰── doesn't support these values.
   · ╰── int
   ╰────
  help: Change int or string to be the right types and try again.

But a decimal literal (no prefix) can't be parsed too strictly: the parser is going to try float later, so int parsing must let `1.4` pass through.
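To make the ordering constraint concrete, here is a minimal sketch of a shape loop, assuming hypothetical `try_int`, `try_float`, and `parse_literal` functions. Each shape parser either succeeds, declines (so the next shape is tried), or reports a terminal error; only a token that cannot belong to a later shape is allowed to fail terminally, and bare string is the final fallback.

```rust
/// Hypothetical outcome of trying one shape parser on a token.
#[derive(Debug, PartialEq)]
enum Attempt {
    Parsed(Value),
    /// Unambiguously this shape but malformed: stop trying other shapes.
    Terminal(String),
    /// Not this shape: let the next parser try.
    Declined,
}

#[derive(Debug, PartialEq)]
enum Value {
    Int(i64),
    Float(f64),
    String(String),
}

fn try_int(tok: &str) -> Attempt {
    if let Some(hex) = tok.strip_prefix("0x") {
        // A `0x` prefix can't start a float, so a malformed hex literal is terminal.
        let well_formed = !hex.is_empty() && hex.chars().all(|c| c.is_ascii_hexdigit());
        return match i64::from_str_radix(hex, 16) {
            Ok(v) if well_formed => Attempt::Parsed(Value::Int(v)),
            _ => Attempt::Terminal("malformed radix 16 int".to_string()),
        };
    }
    match tok.parse::<i64>() {
        Ok(v) => Attempt::Parsed(Value::Int(v)),
        // Decimal ints stay lenient so that `1.4` can reach `try_float`.
        Err(_) => Attempt::Declined,
    }
}

fn try_float(tok: &str) -> Attempt {
    // Restrict the alphabet so bare words like `nan` or `inf` are not taken as floats.
    let numeric = tok
        .chars()
        .all(|c| c.is_ascii_digit() || matches!(c, '.' | '-' | '+' | 'e' | 'E'));
    match tok.parse::<f64>() {
        Ok(v) if numeric => Attempt::Parsed(Value::Float(v)),
        _ => Attempt::Declined,
    }
}

/// Hypothetical driver: tries shapes in order and falls back to a bare string.
fn parse_literal(tok: &str) -> Result<Value, String> {
    let shapes: [fn(&str) -> Attempt; 2] = [try_int, try_float];
    for shape in shapes {
        match shape(tok) {
            Attempt::Parsed(v) => return Ok(v),
            Attempt::Terminal(msg) => return Err(msg),
            Attempt::Declined => {}
        }
    }
    // String is tried last, which is why it can afford to be opinionated.
    Ok(Value::String(tok.to_string()))
}
```

If `try_int` returned `Terminal` for every unparsable decimal token, `1.4` would never reach `try_float`; that is the constraint described above.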

User-Facing Changes

First and foremost, more specific error messages for typos in string and int literals. This probably improves the interactive user experience.

However, a script that deliberately triggers and then checks for a specific error might now see a different error message.


Tests + Formatting

Added positive and negative unit tests (run with `cargo test -p nu-parser`). Didn't add integration tests.

Make sure you've run and fixed any issues with these commands:

`cargo fmt --all -- --check` to check standard code formatting (`cargo fmt --all` applies these changes)
`cargo clippy --workspace -- -D warnings -D clippy::unwrap_used -A clippy::needless_collect` to check that you're using the standard code style
`cargo test --workspace` to check that all tests pass

After Submitting

If your PR had any user-facing changes, update the documentation after the PR is merged, if necessary. This will help us keep the docs up to date.

fdncred · 2023-02-03T12:32:51Z

Thanks for this PR. I'm very excited about making error messages better. I think this is a killer feature of nushell that we should continue to invest in making better.

One thing that helps drive home your changes is to have before and after examples in the PR. I just did "\z" to see what it was before and wow, that new message is so much more clear.

bobhy · 2023-02-03T15:10:56Z

De nada! I'd like to be able to do more (than the current nothing) for decimal int literals and, as mentioned above, for an int literal in the first position of an expression. I also have an idea I'd like to experiment with to enable syntax errors in many of the other leaf-node terms. But I won't have much time this coming week to polish them. If you think there's value in the PR as is, I'll submit it for review and let you merge it, then come back for phase 2 later on. I can update the docs and release notes for this PR now.

sholderbach · 2023-02-06T10:28:38Z

Thanks for tackling the parser beast and getting our error messages closer to our promises!

Great to see more helpful messages for the constraints on the literal/escape syntax.

One general concern I have, which might not be touched by this PR, is that we shouldn't necessarily try to turn type errors into syntax errors. I can see that creating supportive "did-you-mean" error messages while parsing the code is often more straightforward when you can include a parse of the unhappy path for enriched suggestions. But if we were to encode too many semantics of the type system at parse time, we could end up with many places that define type semantics, with a higher chance of breakage. (The comment around the literal parse order seems to speak of past traumatic experiences there.)

bobhy · 2023-02-06T15:13:03Z

Hmm, I agree with the concern! As I go through the current exercise, I'm struggling to find unambiguously correct opportunities to add a helpful syntax error. Most of the time, the right thing is to let the token fall through the current entity recognizer (e.g. float) and be handled by another one instead (e.g. bare string). The only way to let float, for example, give good syntax errors is a very finely tuned pre-parse check. It's not enough to verify that the token has a decimal point or an 'e' for scientific notation; you must also verify that it has at most one decimal point (so it's not a range or a semantic version like `1.0.1`) and that it has no slashes (so it's not a relative path like `./foo/bar` or, perversely, `./999/e20`). Such pre-checks would be examples of the highly fragile parser code that I think you're also concerned about.

I hear that you don't currently have a definitive grammar for Nushell and that coming up with one would be hard. Maybe that means the language itself needs redefining? I've heard some discussion of the tension between shell-like and traditional-programming-like languages; I'd push for the former direction. The language could be at once syntactically simpler (e.g. maybe all lexemes must be whitespace-delimited, which range expressions today are not) and friendlier to minimal look-ahead. (I don't know the correct term for this, but I mean that you should be able to restrict the possible following tokens easily from what you've seen so far.) For example, if you know you're in a command invocation (because the first token is a keyword, an internal call to a defined function, or an external call to an executable you've already identified by enumerating the path), then when you later see a token with a leading hyphen, you know it's a flag and not a number with a unary minus. The number parser could then use a simpler pre-check, confident that it would not be invoked in an inappropriate context.
I think I can complete the current PR by adding reliable syntax errors for int and maybe even float, but I definitely feel I'm climbing the steep side of the mountain, and I don't think the work naturally extends much further. Regards, Bob
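A rough sketch of the kind of float pre-check Bob describes, assuming a hypothetical `looks_like_float` function (nothing like it is claimed to exist in nu-parser). It encodes only the conditions listed above (one decimal point or an exponent, no slashes) plus an assumed leading-character check, and it is meant to show why such checks are fragile rather than to propose one.

```rust
/// Hypothetical pre-parse check, not nu-parser code: should a float parse failure
/// on `tok` be reported as a syntax error rather than falling through to other shapes?
fn looks_like_float(tok: &str) -> bool {
    let dots = tok.matches('.').count();
    let has_exponent = tok.contains(|c: char| c == 'e' || c == 'E');
    // Exactly one '.' rules out ranges (`1..5`) and versions (`1.0.1`);
    // with no '.', require an exponent, as in `1e20`.
    let shape_ok = dots == 1 || (dots == 0 && has_exponent);
    // Any '/' suggests a relative path such as `./foo/bar` or `./999/e20`.
    let no_slash = !tok.contains('/');
    // Assumed extra rule: require a numeric-looking start so words like `echo` are excluded.
    let numeric_start = tok.starts_with(|c: char| c.is_ascii_digit() || c == '-' || c == '.');
    shape_ok && no_slash && numeric_start
}
```

Note that a filename such as `3.txt` still passes this check, so a float parser relying on it would wrongly report a syntax error; every new lexical form needs another special case, which is the fragility discussed above.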


fdncred · 2023-02-09T18:42:13Z

@bobhy Do you think this PR is ready to land?

bobhy · 2023-02-09T23:00:46Z

I will wrap this up in the next few hours and confirm then. Sorry for the radio silence, family visiting!

fdncred · 2023-02-09T23:55:20Z

No worries at all. Just wanted to check in on ya. Take your time.

bobhy · 2023-02-10T00:02:30Z

Assuming CI checks pass, this PR is ready for review. It does add syntax errors for hex and octal literals, but does nothing for decimal or float literals. I could not shoehorn those into the parser code without getting way too clever.
I'll open a new issue to discuss potential parser changes, and kick it off with the kinds of errors I'd like to be able to catch.

sholderbach

Thanks for getting our literals into a much better shape!

Preamble: You are probably now more of an expert on our parser than me. So take my comments with a grain of salt.

The only things jumping out at me were your FIXME comments; I'm wondering if we are in a good enough place with them or if they are out of date.
Otherwise, I am maybe a bit pedantic about the naming of our error variants, as they might already be used a bit randomly in some places (more a problem with ShellError).

Wondering if we should revert the unrelated Cargo.lock version bump, as that might cause churn if we want to revert something.


sholderbach · 2023-02-10T12:46:42Z

crates/nu-parser/src/parser.rs

             None => (
                 garbage(span),
-                Some(ParseError::Mismatch(
-                    "filesize".into(),
-                    "non-filesize unit".into(),
+                Some(ParseError::Expected(
+                    "filesize with valid units".into(),


Do we have an error variant where we could display the valid units as a help text? Or will this ParseError::Expected always be hidden through a later parse?

bobhy: `ParseError::Expected` will be hidden in the `SyntaxShape::Any` shape; I can't say about other contexts. I had to change the variant to make parse_filesize() play nice in the `Any` shape. If it issues any kind of terminating (and therefore user-visible) error, it will mask what might be a valid bare string, such as an executable name. For example:

〉7day
1wk
--------------------------------------------------------------
〉7da
Error: nu::shell::external_command (link)
  × External command failed
   ╭─[entry #181:1:1]
 1 │ 7da
   · ─┬─
   ·  ╰── executable was not found
   ╰────
  help: No such file or directory (os error 2)

But if the token were `7zip`, you would not want parse_duration() to issue a terminating error. So we have to figure out how to stop calling many of the entity parsers on what might be the name of an executable.
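The non-terminating behavior described here can be sketched with a hypothetical `parse_duration_ns` helper that returns `None` for an unrecognized unit, so the caller can fall through to a bare string or external command. The unit table is an assumed subset for illustration, not nushell's full list, and the function is not `parse_duration()` itself.

```rust
/// Hypothetical duration parser, for illustration only. Returns `None` (a
/// non-terminating failure) for an unknown unit, so the caller can fall through
/// to a bare string or an external command such as `7zip`.
fn parse_duration_ns(tok: &str) -> Option<i64> {
    // Assumed subset of units; not nushell's complete list.
    const UNITS: [(&str, i64); 6] = [
        ("ns", 1),
        ("us", 1_000),
        ("ms", 1_000_000),
        ("sec", 1_000_000_000),
        ("min", 60 * 1_000_000_000),
        ("day", 24 * 60 * 60 * 1_000_000_000),
    ];
    // Split the token into a leading run of digits and a unit suffix.
    let split = tok.find(|c: char| !c.is_ascii_digit())?;
    let (num, unit) = tok.split_at(split);
    let n: i64 = num.parse().ok()?;
    let (_, factor) = UNITS.iter().find(|(name, _)| *name == unit)?;
    // Overflow is also reported as "not a duration" rather than a hard error.
    n.checked_mul(*factor)
}
```

Here `7day` parses while `7da` and `7zip` both return `None`; it is then up to the caller to decide whether anything else can claim the token.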


bobhy · 2023-02-10T15:06:44Z

Thanks for your review! Sorry about the version changes; n00b here didn't realize they would be recorded in the config. Shall I revert them? I do think "syntax error" is one broad class the parser should report. To the user, it (I think) means: you've probably got the right idea, just correct your spelling. Another broad class is "type error", which means: Nushell doesn't work that way, correct your thinking. I'd like to add a link back to the docs in the error, but didn't see existing examples where these links point to something really specific to the error being reported. I'll work on this later today.

Review comments from Stefan Holderbach (quoted in the email of Feb 10, 2023) not shown above:

In crates/nu-parser/src/parser.rs:

    @@ -1432,6 +1444,7 @@ pub fn parse_range(
         // and <range_operator> is ".." or "..<"
         // and one of the <from> or <to> bounds must be present (just '..' is not allowed since it
         // looks like parent directory)
    +    //bugbug range cannot be [..] because that looks like parent directory

Do we properly address this through parse ordering, or through defining SyntaxShape::Filepath for cd etc. and doing type/signature-directed parsing?

In crates/nu-parser/src/errors.rs:

    +    #[error("Invalid syntax")] // <detail in <entity>.
    +    #[diagnostic()]
    +    InvalidSyntax(String, String, #[label("{1} in {0}")] Span),

Can we narrow the name or document where this variant should be used, as a lot of parser errors are invalid syntax? Or could this variant also remove LabelledError as a kitchen-sink catch-all? The order {1} in {0} feels a bit confusing, as something you always have to look up to get right. Suggested change:

    -    #[error("Invalid syntax")] // <detail in <entity>.
    -    #[diagnostic()]
    -    InvalidSyntax(String, String, #[label("{1} in {0}")] Span),
    +    #[error("Invalid literal syntax")] // <detail in <entity>.
    +    #[diagnostic()]
    +    InvalidLiteralSyntax(String, String, #[label("{1} in {0} literal")] Span),

In crates/nu-parser/src/parser.rs:

    @@ -4557,11 +4576,34 @@ pub fn parse_value(
                 SyntaxShape::Block,
                 SyntaxShape::String,
             ];
    +        */
    +        let shapes = [
    +            SyntaxShape::Binary,
    +            SyntaxShape::Filesize,
    +            SyntaxShape::Duration,
    +            SyntaxShape::Range,
    +            SyntaxShape::DateTime, //FIXME requires 3 failed conversion attempts before failing

What is the status on that?

In crates/nu-parser/tests/test_parser.rs:

    +    if let Some(err_pat) = expected_err {
    +        if let Some(parse_err) = err {
    +            let act_err = format!("{:?}", parse_err);
    +            assert!(
    +                act_err.contains(err_pat),
    +                "{test_tag}: expected err to contain {err_pat}, but actual error was {act_err}"
    +            );
    +        } else {
    +            assert!(
    +                err.is_some(),
    +                "{test_tag}: expected err containing {err_pat}, but no error returned"
    +            );
    +        }

Thanks for making the effort to provide helpful messages on test failures!

In Cargo.toml:

    @@ -143,6 +143,8 @@ debug = false
     [[bin]]
     name = "nu"
     path = "src/main.rs"
    +bench = false

Probably unrelated to the core of the PR. What would be the effect of doing this?

This disables automatic detection of `#[bench]` and other benchmarks within the crates. Our benchmarks should all live in `benches`. This fixes a problem with criterion flags and should also reduce the build requirements for `cargo bench` a bit. Taken from nushell#7952. See: https://bheisler.github.io/criterion.rs/book/faq.html#cargo-bench-gives-unrecognized-option-errors-for-valid-command-line-options


codecov · 2023-02-12T21:39:13Z

Codecov Report

Merging #7952 (7cc3913) into main (208ffdc) will decrease coverage by 0.01%.
The diff coverage is 73.13%.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #7952      +/-   ##
==========================================
- Coverage   54.26%   54.25%   -0.01%     
==========================================
  Files         608      608              
  Lines       98967    98985      +18     
==========================================
+ Hits        53701    53708       +7     
- Misses      45266    45277      +11

Impacted Files	Coverage Δ
crates/nu-parser/src/errors.rs	`1.31% <0.00%> (-0.02%)`	⬇️
crates/nu-parser/src/parser.rs	`74.42% <74.24%> (-0.14%)`	⬇️
crates/nu-color-config/src/style_computer.rs	`80.76% <0.00%> (-0.55%)`	⬇️
crates/nu-protocol/src/value/mod.rs	`47.97% <0.00%> (+0.07%)`	⬆️



sholderbach · 2023-02-12T23:32:53Z

Ah, you force-pushed over my commit to avoid a merge conflict.

sholderbach

Let's give this a go!

* Release notes for `0.76`: Please add your important new features and breaking changes to the release notes by committing to/opening a PR against the `release-notes-0.76` branch. Thank you!
* Add breaking change for plugin signature (nushell#775)
* add breaking change
* Update blog/2023-02-21-nushell_0_76.md
---------
Co-authored-by: Stefan Holderbach <sholderbach@users.noreply.github.com>
* add some info on debugging commands
* release notes for nushell#7952 (nushell#777)
* release notes for nushell#7952
* Fix html tags that broke CI
* more debug notes
* Add `profile` note and screenshot (nushell#778)
* add ast to debug commands section
* add breaking change (nushell#790)
* Remove example stuff. Don't let the lorem ipsum loose
* added more breaking changes notes
* trim down error message documentation in blog post
* Add description of some commands
* Do some polishing. sequence multiplication
* Screenshot help of a plugin
* Add section on nu plugin
* Add section on background work and full log
* Executive summary
* Details to "mul"
---------
Co-authored-by: WindSoilder <WindSoilder@outlook.com>
Co-authored-by: Darren Schroeder <343840+fdncred@users.noreply.github.com>
Co-authored-by: Bob Hyman <bob.hyman@gmail.com>
Co-authored-by: Jakub Žádník <kubouch@gmail.com>
Co-authored-by: Reilly Wood <reilly.wood@icloud.com>

bobhy marked this pull request as draft February 3, 2023 06:01

bobhy marked this pull request as ready for review February 3, 2023 15:10

bobhy force-pushed the invalid_syntax_error branch from 1c0e721 to 6add73d February 9, 2023 23:56

sholderbach reviewed Feb 10, 2023


bobhy mentioned this pull request Feb 11, 2023

release notes for #7952 nushell/nushell.github.io#777

Merged

sholderbach mentioned this pull request Feb 12, 2023

Disable auto-benchmark harness for crates #8057

Merged

sholderbach added a commit to bobhy/nushell that referenced this pull request Feb 12, 2023

Revert changes to Cargo.toml

227a353

Avoid conflict with nushell#7952

bobhy added 9 commits February 12, 2023 17:24

update dependencies, run benchmarks

93db6eb

define SyntaxError to describe detailed errors within entities

7d0b1cd

Interim -- fails tests from -p nu-protocol --test into_config

893d168

parse_int should reject b'0b' as Expected int, not syntax error.

c513db9

Add syntax error for 0x..., 0b..., 0o... ints, but not decimal ints, unfortunately.

2f47a82

'0x' and '0o' are terminal syntax errors;

488314c

'0b' defers to filesize. Still can't give terminal syntax error for decimal, must fall through to float (e.g 1.4).

Shorten InvalidSyntax error text

866da75

Several more parse tests for int and number

4a68ccf

Per review feedback: Revert random dependency and cargo changes; restricted error variant name

854a62a

bobhy force-pushed the invalid_syntax_error branch from 227a353 to 854a62a February 12, 2023 22:29

Merge branch 'main' into invalid_syntax_error

7cc3913

sholderbach approved these changes Feb 13, 2023


sholderbach added this pull request to the merge queue Feb 13, 2023

Merged via the queue into nushell:main with commit 007916c Feb 13, 2023

bobhy deleted the invalid_syntax_error branch April 7, 2023 12:42
