New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Syntax errors for string and int #7952
Conversation
Thanks for this PR. I'm very excited about making error messages better. I think this is a killer feature of nushell that we should continue to invest in making better. One thing that helps drive home your changes is to have before and after examples in the PR. I just did |
De nada! I'd like to be able to do more (than the current nothing) on the decimal literal int, and, as mentioned above, on the literal int as first position in the expression. And I have an idea it'd like to experiment with to enable syntax errors in many of the other leaf node terms. But I won't have much time this coming week to polish them. If you think there's value in the PR as is, I'll submit it for review and let you merge it, then come back for phase 2 later on. I can update the docs and release notes for this PR now. |
Thanks for tackling the parser beast and getting our error messages closer to our promises! Great to see more helpful messages for the constraints on the literal/escape syntax. One general concern I have that might not be touched by this PR is that we shouldn't necessarily try to turn Type errors into Syntax errors. I can see that creating supportive "did-you-mean" error messages when parsing the code anyways is often more straightforward when you can include a parsing of the unhappy path for enriched suggestions. But if we were to encode too many semantics of the type system at parse time, we could potentially have many places that define type semantics with a higher chance of breakage. (The comment around the literal parse order, seems to speak of past traumatic experiences there) |
Hmm, I agree with the concern! As I'm going through the current exercise,
I am struggling to find unambiguously correct opportunities to add a
helpful syntax error. Most times, the right thing is to let the token fall
through the current entity recognizer (e.g float) and be handled by some
other instead (bare string). The only way to allow float, for example, to
give good syntax errors is to have a very fine-tuned pre-parse check. It's
not good enough to verify that the token has a decimal point or an 'e' for
scientific notation, you must verify it has at most 1 decimal point (so
it's not a range or a semantic version `1.0.1'), and that it doesn't have
any slashes (so it's not a relative path, like './foo/bar, or, peversely,
'./999/e20'). These pre-checks would be examples of highly fragile parser
code that I think you're also concerned about.
I hear that you don't currently have a definitive grammar for Nushell and
that coming up with one would be hard. Maybe that means the language
itself needs is a redefinition? I've heard some discussion of the tension
between shell-like and traditional-programming-like languages, I'd push for
the former direction: be at once syntactically simpler (e.g maybe all
lexemes must be white-space-delimited (where range expression today is not)
and more minimal-look-ahead-friendly? (I don't know the correct term for
this, but I mean that you should be able to restrict the following tokens
easily from what you've seen so far). e.g, know you're in a command
invocation expression (e.g because first token is a keyword, internal call
to defined function or external call to an executable you've already
identified from enumerating the path), so when you later see a token with
leading hyphen, you know it's a flag and not a number with a unary minus.
Thus the number parser could have a simpler precheck, confident it would
not be invoked in inappropriate context.
I think I can complete the current PR by adding reliable syntax errors to
int and maybe even float, but I definitely feel I'm climbing the steep side
of the mountain and don't think the work naturally extends much further.
regards,
Bob
…On Mon, Feb 6, 2023 at 5:28 AM Stefan Holderbach ***@***.***> wrote:
Thanks for tackling the parser beast and getting our error messages closer
to our promises!
Great to see more helpful messages for the constraints on the
literal/escape syntax.
One general concern I have that might not be touched by this PR is that we
shouldn't necessarily try to turn Type errors into Syntax errors. I can see
that creating supportive "did-you-mean" error messages when parsing the
code anyways is often more straightforward when you can include a parsing
of the unhappy path for enriched suggestions. But if we were to encode too
many semantics of the type system at parse time, we could potentially have
many places that define type semantics with a higher chance of breakage.
(The comment around the literal parse order, seems to speak of past
traumatic experiences there)
—
Reply to this email directly, view it on GitHub
<#7952 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAPBIZFKJE6OT3TPSTCW32DWWDG6BANCNFSM6AAAAAAUP3ZAPY>
.
You are receiving this because you authored the thread.Message ID:
***@***.***>
|
@bobhy Do you think this PR is ready to land? |
I will wrap this up in the next few hours and confirm then. Sorry for the radio silence, family visiting! |
No worries at all. Just wanted to check in on ya. Take your time. |
1c0e721
to
6add73d
Compare
Assuming CI checks pass, this PR is ready for review. It does add syntax errors for hex and octal literals, but does nothing for decimal or float literals. I could not shoehorn those into the parser code without getting way too clever. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for getting our literals into a much better shape!
Preamble: You are probably now more of an expert on our parser than me. So take my comments with a grain of salt.
The only thing jumping out to me were your fixme comments, wondering i we are in a good enough place with them or if they are out of date.
Else I am maybe a bit pedantic about the naming of our error variants as they might be used a bit randomly in some places already (more a problem with ShellError
)
Wondering if we should the unrelated Cargo.lock
version bump as that might cause churn if we want to revert something.
"filesize".into(), | ||
"non-filesize unit".into(), | ||
Some(ParseError::Expected( | ||
"filesize with valid units".into(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have an error variant where we could display the valid units as a help text? Or will this ParseError::Expected
always be hidden through a later parse?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
::Expected will be hidden in ::Any shape, can't say about other contexts. I had to change the variant to make parse_filesize() play nice in the ::Any shape. If it issues any kind of terminating (and therefore user-visible) error, it'll mask what might be a valid bare string, like an executable name. For example:
〉7day
1wk
--------------------------------------------------------------
〉7da
Error: nu::shell::external_command (link)
× External command failed
╭─[entry #181:1:1]
1 │ 7da
· ─┬─
· ╰── executable was not found
╰────
help: No such file or directory (os error 2)
but if the token were 7zip
, you would not want parse_duration()
issuing a terminating error. So we have to figure out how to stop calling many of the entity parsers on what might be the name of an executable.
Thanks for your review! Sorry about the version changes, n00b here didn't
realize that would be recorded in the config. Shall I revert that?
I do think "syntax error" is one broad class the parser should report. To
the user, (I think) it means, you've probably got the right idea, just
correct your spelling. Another broad class includes "type error", those
mean Nushell doesn't work that way, correct your thinking.
I'd like to add the link back to docs in the error, but didn't see existing
examples where these point to something really specific to the error being
reported.
I'll work on this later today.
…On Fri, Feb 10, 2023, 7:56 AM Stefan Holderbach ***@***.***> wrote:
***@***.**** commented on this pull request.
Thanks for getting our literals into a much better shape!
Preamble: You are probably now more of an expert on our parser than me. So
take my comments with a grain of salt.
The only thing jumping out to me were your fixme comments, wondering i we
are in a good enough place with them or if they are out of date.
Else I am maybe a bit pedantic about the naming of our error variants as
they might be used a bit randomly in some places already (more a problem
with ShellError)
Wondering if we should the unrelated Cargo.lock version bump as that
might cause churn if we want to revert something.
------------------------------
In crates/nu-parser/src/parser.rs
<#7952 (comment)>:
> @@ -1432,6 +1444,7 @@ pub fn parse_range(
// and <range_operator> is ".." or "..<"
// and one of the <from> or <to> bounds must be present (just '..' is not allowed since it
// looks like parent directory)
+ //bugbug range cannot be [..] because that looks like parent directory
Do we properly address this through parse ordering or through defining
SyntaxShape::Filepath for cd etc. and doing type/signature directed
parsing?
------------------------------
In crates/nu-parser/src/errors.rs
<#7952 (comment)>:
> + #[error("Invalid syntax")] // <detail in <entity>.
+ #[diagnostic()]
+ InvalidSyntax(String, String, #[label("{1} in {0}")] Span),
Can we narrow the name or document where this variant should be used as a
lot of parser errors are invalid syntax. Or could this variant also remove
LabelledError as a kitchen sink catch-all?
The order {1} in {0} feels maybe a bit confusing as something you always
have to look up to get right.
⬇️ Suggested change
- #[error("Invalid syntax")] // <detail in <entity>.
- #[diagnostic()]
- InvalidSyntax(String, String, #[label("{1} in {0}")] Span),
+ #[error("Invalid literal syntax")] // <detail in <entity>.
+ #[diagnostic()]
+ InvalidLiteralSyntax(String, String, #[label("{1} in {0} literal")] Span),
------------------------------
In crates/nu-parser/src/parser.rs
<#7952 (comment)>:
> match parse_filesize_bytes(bytes, span) {
Some(expression) => (expression, None),
None => (
garbage(span),
- Some(ParseError::Mismatch(
- "filesize".into(),
- "non-filesize unit".into(),
+ Some(ParseError::Expected(
+ "filesize with valid units".into(),
Do we have an error variant where we could display the valid units as a
help text? Or will this ParseError::Expected always be hidden through a
later parse?
------------------------------
In crates/nu-parser/src/parser.rs
<#7952 (comment)>:
> @@ -4557,11 +4576,34 @@ pub fn parse_value(
SyntaxShape::Block,
SyntaxShape::String,
];
+ */
+ let shapes = [
+ SyntaxShape::Binary,
+ SyntaxShape::Filesize,
+ SyntaxShape::Duration,
+ SyntaxShape::Range,
+ SyntaxShape::DateTime, //FIXME requires 3 failed conversion attempts before failing
What is the status on that?
------------------------------
In crates/nu-parser/tests/test_parser.rs
<#7952 (comment)>:
> + if let Some(err_pat) = expected_err {
+ if let Some(parse_err) = err {
+ let act_err = format!("{:?}", parse_err);
+ assert!(
+ act_err.contains(err_pat),
+ "{test_tag}: expected err to contain {err_pat}, but actual error was {act_err}"
+ );
+ } else {
+ assert!(
+ err.is_some(),
+ "{test_tag}: expected err containing {err_pat}, but no error returned"
+ );
+ }
Thanks for making the effort to provide helpful messages on test failures!
------------------------------
In Cargo.toml
<#7952 (comment)>:
> @@ -143,6 +143,8 @@ debug = false
[[bin]]
name = "nu"
path = "src/main.rs"
+bench = false
Probably unrelated to the core of the PR. What would be the effect of
doing this?
—
Reply to this email directly, view it on GitHub
<#7952 (review)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAPBIZDQXQWDBCAJG2BRSX3WWY3ILANCNFSM6AAAAAAUP3ZAPY>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
This disables automatic detection of `#[bench]` and other benchmarks within the crates. Our benchmarks should all live in `benches` This fixes a problem with criterion flags and should also reduce the build requirements for `cargo bench` a bit Taken from nushell#7952 See: https://bheisler.github.io/criterion.rs/book/faq.html#cargo-bench-gives-unrecognized-option-errors-for-valid-command-line-options
Avoid conflict with nushell#7952
Codecov Report
Additional details and impacted files@@ Coverage Diff @@
## main #7952 +/- ##
==========================================
- Coverage 54.26% 54.25% -0.01%
==========================================
Files 608 608
Lines 98967 98985 +18
==========================================
+ Hits 53701 53708 +7
- Misses 45266 45277 +11
|
# Description This disables automatic detection of `#[bench]` and other benchmarks within the crates. Our benchmarks should all live in `benches` This fixes a problem with criterion flags and should also reduce the build requirements for `cargo bench` a bit Taken from #7952 See: https://bheisler.github.io/criterion.rs/book/faq.html#cargo-bench-gives-unrecognized-option-errors-for-valid-command-line-options # User-Facing Changes None
'0b' defers to filesize. Still can't give terminal syntax error for decimal, must fall through to float (e.g 1.4).
…ricted error variant name
227a353
to
854a62a
Compare
AH you force pushed over my commit to avoid a merge conflict. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's give this a go!
* Release notes for `0.76` Please add your important new features and breaking changes to the release notes by committing to/opening a PR against the `release-notes-0.76` branch. Thank you! * Add breaking change for plugin signature (nushell#775) * add breaking change * Update blog/2023-02-21-nushell_0_76.md --------- Co-authored-by: Stefan Holderbach <sholderbach@users.noreply.github.com> * add some info on debugging commands * release notes for nushell#7952 (nushell#777) * release notes for nushell#7952 * Fix html tags that broke CI * more debug notes * Add `profile` note and screenshot (nushell#778) * add ast to debug commands section * add breaking change (nushell#790) * Remove example stuff Don't let the lorem ipsum loose * added more breaking changes notes * trim down error message documentation in blog post * Add description of some commands * Do some polishing. sequence multiplication * Screenshot help of a plugin * Add section on nu plugin * Add section on background work and full log * Executive summary * Details to "mul" --------- Co-authored-by: WindSoilder <WindSoilder@outlook.com> Co-authored-by: Darren Schroeder <343840+fdncred@users.noreply.github.com> Co-authored-by: Bob Hyman <bob.hyman@gmail.com> Co-authored-by: Jakub Žádník <kubouch@gmail.com> Co-authored-by: Reilly Wood <reilly.wood@icloud.com>
Description
Added a few syntax errors in ints and strings, changed parser to stop and show that error rather than continue trying to parse those tokens as some other shape. However, I don't see how to push this direction much further, and most of the classic confusing errors can't be changed.
Flagged as WIP for the moment, but passes all checks and works better than current release:
(Description of your pull request goes here. Provide examples and/or screenshots if your changes affect the user experience.)
Canonic presentation of a syntax error.
Malformed unicode escape in string, flagged as error.
String parse can be opinionated, it's the last shape tried.
A correct number in first token would be evaluated, but an incorrect one is treated as external command? Confusing to users.
Can give syntax error if token is unambiguously int literal. e.g has 0b or 0x prefix, could not be a float.
But decimal literal (no prefix) can't be too strict. Parser is going to try float later. So '1.4' must be passed.
User-Facing Changes
First and foremost, more specific error messages for typos in string and int literals. Probably improves interactive user experience.
But a script that was causing and then checking for specific error might notice a different error message.
(List of all changes that impact the user experience here. This helps us keep track of breaking changes.)
Tests + Formatting
Added (positive and negative unit tests in
cargo test -p nu-parser
. Didn't add integration tests.Make sure you've run and fixed any issues with these commands:
cargo fmt --all -- --check
to check standard code formatting (cargo fmt --all
applies these changes)cargo clippy --workspace -- -D warnings -D clippy::unwrap_used -A clippy::needless_collect
to check that you're using the standard code stylecargo test --workspace
to check that all tests passAfter Submitting
If your PR had any user-facing changes, update the documentation after the PR is merged, if necessary. This will help us keep the docs up to date.