refactor(ra_syntax.validation): removed code duplication from validate_literal() #2834

Veetaha · 2020-01-14T02:28:53Z

Hi! This is my first ever contribution to this project.
I've taken on some of the dirty work from issue #223.

This is a simple atomic PR that removes code duplication, as requested by the FIXME comment in the function that further development will focus on.

I just didn't want to mix refactoring with the implementation of new features...

I am not sure whether you prefer such atomic PRs here or you'd rather have a single PR that contains all atomic commits inside of it?

So if you want me to add all that validation in one PR, I'll mark this one as WIP and update it when the work is finished. Otherwise, I'll create a separate PR for each validation feature (strings, numbers, and comments respectively).

Comments about refactoring

Yeah, reducing the duplication is quite hard here. Extracting stateless functions is another option, but they would take so many arguments, repeated across the char and string implementations, that just writing out their types and names would become cumbersome.
I tried capturing everything implicitly in a closure but failed, since Rust doesn't have templated (generic) closures as C++ does. A generic closure is needed because unescape_byte*() and unescape_char|str() have different return types...
Maybe I am missing something here? I may be wrong because I am not experienced enough in Rust...
Well, I am awaiting any kind of feedback!

kiljacken

Thanks for the PR!

While already a nice cleanup, I left a few comments that could improve it, especially with regards to future use :)

kiljacken · 2020-01-14T08:08:29Z

crates/ra_syntax/src/validation.rs

+    fn unquote(text: &str, prefix_len: usize, end_delimiter: char) -> Option<&str> {
+        text.rfind(end_delimiter).and_then(|end| text.get(prefix_len..end))
+    }


It might be nice to pull this one out of the function, so it's ready for re-use when we get to #540 at some point :)

@matklad, @kiljacken I have one concern about this function. Why do we search linearly backwards for the quote character in the token text, but use a constant offset from its beginning?
I guess we should either use a constant offset from both the beginning and the end of the string, or search for a quote character from both ends, but not mix the two approaches...

I'd have to check the lexer source to be sure, but using a constant offset should be good for everything that's not a raw string (due to the hash matching).

@kiljacken I do think that constant offset from both ends should be viable too.
Though, I'd like to write tests for this.
Yes, raw strings should have a separate unquote_raw_string() logic.

I've checked the assumption that the offset from the end of the string is constant.
In fact it is not (maybe it is a bug, I am not sure).
One test fails when I add a debug assertion to check that precondition:
https://github.com/rust-analyzer/rust-analyzer/blob/master/crates/ra_assists/src/assists/raw_string.rs#L245

That partial string actually becomes a STRING token that has only a starting quote and no ending one.

@matklad maybe you could clarify whether this behaviour is intended?

@Veetaha it's true that string literals are not guaranteed to be syntactically valid. The code should not panic for any string, but it doesn't have to report errors if the string itself is not a valid string literal.

This is because we intentionally make the parser and the lexer robust and capable of parsing completely invalid code, so something like r##" might be parsed as a raw string literal.
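To make the two approaches from this thread concrete, here is a minimal standalone sketch. `unquote` is copied from the diff above; `unquote_fixed` is a hypothetical name for the constant-offset variant and is not taken from the PR. Both rely on `str::get`, which returns `None` instead of panicking on an invalid range.

```rust
/// Copied from the diff: search backwards for the closing delimiter,
/// then slice from a fixed prefix offset up to it.
fn unquote(text: &str, prefix_len: usize, end_delimiter: char) -> Option<&str> {
    text.rfind(end_delimiter).and_then(|end| text.get(prefix_len..end))
}

/// Hypothetical constant-offset variant: strip a fixed number of bytes
/// from both ends. It never panics, but it cannot tell whether the
/// closing quote is actually there.
fn unquote_fixed(text: &str, prefix_len: usize, suffix_len: usize) -> Option<&str> {
    // `checked_sub` covers tokens shorter than the suffix.
    let end = text.len().checked_sub(suffix_len)?;
    // `get` returns None when the range is inverted or splits a UTF-8 character.
    text.get(prefix_len..end)
}
```

For the unterminated token `"abc` (one opening quote, no closing one), `unquote(text, 1, '"')` finds the opening quote at index 0 and returns `None`, while `unquote_fixed(text, 1, 1)` returns `Some("ab")`, silently dropping the last character. Neither panics, which is the property asked for above.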

kiljacken · 2020-01-14T08:15:49Z

crates/ra_syntax/src/validation.rs

+    let mut push_err = |prefix_len, (off, err): (usize, unescape::EscapeError)| {
+        let off = token.text_range().start() + TextUnit::from_usize(off + prefix_len);
+        acc.push(SyntaxError::new(err.into(), off));
+    };


Not really a big fan of this, but see my comments below for a possible solution.

kiljacken · 2020-01-14T08:25:17Z

crates/ra_syntax/src/validation.rs

+            if let Some(Err(e)) = unquote(text, 2, '\'').map(unescape::unescape_byte) {
+                push_err(2, e);
            }


This is quite good already, but would it make sense to pull it out into a separate function:

fn validate_char(token: &SyntaxToken, prefix_len: usize, end_delimiter: char, acc: &mut Vec<SyntaxError>) {
    let text = token.text().as_str();
    // Report the escape error, if any, at its absolute offset in the file.
    if let Some(Err((off, err))) = unquote(text, prefix_len, end_delimiter).map(unescape::unescape_byte) {
        let off = token.text_range().start() + TextUnit::from_usize(off + prefix_len);
        acc.push(SyntaxError::new(err.into(), off));
    }
}

and then the match case just becomes:

BYTE => validate_char(&token, 2, '\'', acc),

This would avoid the push_err lambda, and it would play nicely with #540.

Yeah, with one addition: unescape::unescape_byte() should also be passed to validate_char() as a callback, which makes the function generic and gives it 5 parameters. Are you okay with that, though?

Right, hadn't noticed they used different functions, but that makes sense :) This might require a type parameter for the function, are you comfortable with that or do you need a hint?

Yes, I am comfortable with that, I just wanted your approval)
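The generic version discussed here could look roughly like the sketch below. It is self-contained, so everything except `unquote` is a stand-in: `EscapeError`, `unescape_char` and `unescape_byte` are toy simplifications rather than the real `unescape` module, the token is reduced to its text plus a start offset (hence six parameters instead of five), and the name `validate_char_like` is hypothetical.

```rust
#[derive(Debug, PartialEq)]
enum EscapeError {
    Empty,
    TooLong,
    NonAscii,
}

// Toy stand-in: accepts exactly one character and ignores escape sequences.
fn unescape_char(s: &str) -> Result<char, (usize, EscapeError)> {
    let mut chars = s.chars();
    let c = chars.next().ok_or((0, EscapeError::Empty))?;
    if chars.next().is_some() {
        return Err((c.len_utf8(), EscapeError::TooLong));
    }
    Ok(c)
}

// Toy stand-in with a different success type (u8 instead of char).
fn unescape_byte(s: &str) -> Result<u8, (usize, EscapeError)> {
    let c = unescape_char(s)?;
    if c.is_ascii() {
        Ok(c as u8)
    } else {
        Err((0, EscapeError::NonAscii))
    }
}

fn unquote(text: &str, prefix_len: usize, end_delimiter: char) -> Option<&str> {
    text.rfind(end_delimiter).and_then(|end| text.get(prefix_len..end))
}

/// The type parameter `T` absorbs the differing success types, so one
/// function serves both char and byte literals.
fn validate_char_like<T>(
    text: &str,
    token_start: usize,
    prefix_len: usize,
    end_delimiter: char,
    unescape_fn: impl FnOnce(&str) -> Result<T, (usize, EscapeError)>,
    acc: &mut Vec<(EscapeError, usize)>,
) {
    if let Some(Err((off, err))) = unquote(text, prefix_len, end_delimiter).map(unescape_fn) {
        // Translate the offset inside the literal into an absolute offset.
        acc.push((err, token_start + off + prefix_len));
    }
}
```

A match arm would then pass the unescape function by name, for example `validate_char_like(text, start, 2, '\'', unescape_byte, &mut acc)` for a byte literal and `validate_char_like(text, start, 1, '\'', unescape_char, &mut acc)` for a char literal.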

kiljacken · 2020-01-14T08:26:06Z

crates/ra_syntax/src/validation.rs

+            if let Some(without_quotes) = unquote(text, 2, '"') {
+                unescape::unescape_byte_str(without_quotes, &mut |range, char| {
+                    if let Err(err) = char {
+                        push_err(2, (range.start, err));
+                    }
+                })


We could do the same here as described for char, e.g. pull into a separate function. I think that would be nice :)

I've stumbled upon a language limitation here, since we would need a second-level higher-order function, something like:

fn validate_byte_or_char_str_literal(..., impl FnOnce(&str, impl FnMut(...))) {}

Notice the nested impl FnMut(...), which Rust does not accept in that position.

Doing it with dynamic dispatch here would be quite a hack.
unescape::unescape_str(&str, impl FnMut(Range<usize>, Result<u8, EscapeError>))
This function expects the type of its second argument (the callback) to be Sized.
This means we would end up with the following hack:

fn validate_str_or_byte_str_literal<T>(
    token: &SyntaxToken,
    prefix_len: usize,
    suffix_len: usize,
    acc: &mut Vec<SyntaxError>,
    unescape_fn: impl FnOnce(&str, &mut dyn FnMut(Range<usize>, Result<T, unescape::EscapeError>)),
) {
    let text = token.text().as_str();
    if let Some(without_quotes) = unquote(text, prefix_len, suffix_len) {
        unescape_fn(without_quotes, &mut |range, char| {
            if let Err(err) = char {
                let off = token.text_range().start() + TextUnit::from_usize(range.start + prefix_len);
                acc.push(SyntaxError::new(err.into(), off));
            }
        });
    }
}

validate_str_or_byte_str_literal(
    &token,
    2,
    1,
    acc,
    // we need to explicitly wrap this into two lambdas because cb is `&mut dyn FnMut(...)`
    |s, cb| unescape::unescape_byte_str(s, &mut |range, byte| cb(range, byte)),
);

I've been unable to make the above function reasonable, e.g. fewer lines of code than what it replaces, so let's just keep it at de-duplicating char and byte for now.

Same opinion here 😏
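For readers who want to try the `&mut dyn FnMut` workaround outside the crate, here is a rough standalone sketch. The two `unescape_*` functions are toy stand-ins that only mimic the shape of the real ones (a callback taken as `&mut F`, with different success types); the error variants, the `validate_str_like` name, and the text-plus-offset parameters are assumptions made for illustration.

```rust
use std::ops::Range;

#[derive(Debug, PartialEq)]
enum EscapeError {
    BareCarriageReturn,
    NonAsciiCharInByteString,
}

// Toy stand-in: reports every character, flagging a bare '\r'.
fn unescape_str<F>(src: &str, callback: &mut F)
where
    F: FnMut(Range<usize>, Result<char, EscapeError>),
{
    for (i, c) in src.char_indices() {
        let res = if c == '\r' { Err(EscapeError::BareCarriageReturn) } else { Ok(c) };
        callback(i..i + c.len_utf8(), res);
    }
}

// Toy stand-in: same shape, but yields bytes and flags non-ASCII characters.
fn unescape_byte_str<F>(src: &str, callback: &mut F)
where
    F: FnMut(Range<usize>, Result<u8, EscapeError>),
{
    for (i, c) in src.char_indices() {
        let res = if c.is_ascii() {
            Ok(c as u8)
        } else {
            Err(EscapeError::NonAsciiCharInByteString)
        };
        callback(i..i + c.len_utf8(), res);
    }
}

/// The inner callback is a trait object because `impl Trait` cannot be
/// nested inside the parameters of `impl FnOnce(...)`.
fn validate_str_like<T>(
    text: &str,
    token_start: usize,
    prefix_len: usize,
    suffix_len: usize,
    acc: &mut Vec<(EscapeError, usize)>,
    unescape_fn: impl FnOnce(&str, &mut dyn FnMut(Range<usize>, Result<T, EscapeError>)),
) {
    let end = match text.len().checked_sub(suffix_len) {
        Some(end) => end,
        None => return,
    };
    if let Some(without_quotes) = text.get(prefix_len..end) {
        unescape_fn(without_quotes, &mut |range, unescaped| {
            if let Err(err) = unescaped {
                acc.push((err, token_start + range.start + prefix_len));
            }
        });
    }
}
```

A call site mirrors the double-lambda wrapping shown above, for instance `validate_str_like(text, start, 2, 1, &mut acc, |s, cb| unescape_byte_str(s, &mut |range, byte| cb(range, byte)))`. Even in this reduced form the helper plus its call is longer than the inline loop it would replace, which is consistent with the decision to leave the string cases alone.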

matklad · 2020-01-14T17:49:00Z

bors r=@kiljacken

Thanks @Veetaha and @kiljacken !

2834: refactor(ra_syntax.validation): removed code duplication from validate_literal() r=kiljacken a=Veetaha

Co-authored-by: Veetaha <gerzoh1@gmail.com>

bors · 2020-01-14T17:59:19Z

Build succeeded

Rust (macos-latest)
Rust (ubuntu-latest)
TypeScript

kiljacken · 2020-01-14T18:33:30Z

@matklad Looks like bors was happy to see you!

@Veetaha if you'd create a new PR with the remaining changes, that would be nice :)

Veetaha · 2020-01-14T21:08:14Z

@kiljacken, I was taking some time to study this project's codebase and architecture. I didn't expect this to be merged so soon, but I will create a new PR with the amendments from the review shortly, as you suggested.

Veetaha · 2020-01-14T21:09:52Z

I guess I should have marked this one as [WIP], shouldn't I?

kiljacken · 2020-01-14T21:30:14Z

No need to apologize, I just think bors mis-parsed Aleksey's command, so it got merged instead of allowing me to merge it (which wasn't really necessary as he already gave me r+ rights haha)

refactor(ra_syntax.validation): removed code duplication from validate_literal() function

60251da

kiljacken suggested changes Jan 14, 2020

bors bot merged commit 60251da into rust-lang:master Jan 14, 2020

lf- mentioned this pull request May 12, 2021

Add basic support for array lengths in types #8799

Merged
