Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(ra_syntax.validation): removed code duplication from validate_literal() #2834

Merged
merged 1 commit into from Jan 14, 2020

Conversation

Veetaha
Copy link
Contributor

@Veetaha Veetaha commented Jan 14, 2020

Hi! This is my first ever contribution to this project.
I've taken some dirty job from issue #223

This is a simple atomic PR to remove code duplication according to FIXME comment in the function that is the main focus of the further development.

I just didn't want to mix refactoring with the implementation of new features...

I am not sure whether you prefer such atomic PRs here or you'd rather have a single PR that contains all atomic commits inside of it?

So if you want me to add all that validation in one PR I'll mark this one as WIP and update it when the work is finished, otherwise, I'll go with the option of creating separate PRs per each feature of validation of strings, numbers, and comments respectively.

Comments about refactoring

Yeah, reducing the duplication is quite hard here, extracting into stateless functions could be another option but the number of their arguments would be very big and repeated across char and string implementations so that just writing their types and names would become cumbersome.
I tried the option of having everything captured implicitly in the closure but failed since rust doesn't have templated (or generic) closures as C++ does, this is needed because unescape_byte*() and unescape_char|str() have different return types...
Maybe I am missing something here? I may be wrong because I am not enough experienced in Rust...
Well, I am awaiting any kind of feedback!

Copy link
Contributor

@kiljacken kiljacken left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR!

While already a nice cleanup, I left a few comments that could improve it, especially with regards to future use :)

Comment on lines +115 to +117
fn unquote(text: &str, prefix_len: usize, end_delimiter: char) -> Option<&str> {
text.rfind(end_delimiter).and_then(|end| text.get(prefix_len..end))
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be nice to pull this one out of the function, so it's ready for re-use when we get to #540 at some point :)

Copy link
Contributor Author

@Veetaha Veetaha Jan 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@matklad, @kiljacken I have one concern about this function. Why is it so that we do a linear search for the quote character backwards in the token text string but make a constant offset from its beginning instead?
I guess we should either do a constant offset from the begging and the end of the string or search for a quote character from both ends of the string but not the mixed approach...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd have to check the lexer source to be sure, bit using a constant offset, for everything that's not a raw string (due to the hash matching) should be good

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kiljacken I do think that constant offset from both ends should be viable too.
Though, I'd like to write tests for this.
Yes, raw strings should have a separate unquote_raw_string() logic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've checked the supposition that the offset from the end of the string is constant.
If fact it is not (maybe it is a bug, I am not sure).
But there is one test that fails when I put a debug assertion to check the precondition about that, which is this one:
https://github.com/rust-analyzer/rust-analyzer/blob/master/crates/ra_assists/src/assists/raw_string.rs#L245

That partial string actually becomes a STRING token where there is only one starting qoute and no ending one

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@matklad maybe you could clarify on whether this behaviour is intended?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Veetaha it's true that string literals are not guaranteed to be syntactically valid. The code should not panic for any string, but it doesn't have to report errors if the string itself is not a valid string literal.

This is because we intentionally make the parser and the lexer robust and capable of parsing completely invalid code, so something like r##" might be parsed as a raw string literal.

Comment on lines +122 to +125
let mut push_err = |prefix_len, (off, err): (usize, unescape::EscapeError)| {
let off = token.text_range().start() + TextUnit::from_usize(off + prefix_len);
acc.push(SyntaxError::new(err.into(), off));
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really a big fan of this, but see my comments below for a possible solution.

Comment on lines +129 to 131
if let Some(Err(e)) = unquote(text, 2, '\'').map(unescape::unescape_byte) {
push_err(2, e);
}
Copy link
Contributor

@kiljacken kiljacken Jan 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is quite good already, but would it make sense to pull it out into a seperate function:

fn validate_char(token: &SyntaxToken, prefix_len: usize, end_delimiter: char, acc: &mut Vec<SyntaxError>) -> {
    let text = token.text().as_str();
    if let Some(Err(e)) = unquote(text, 2, '\'').map(unescape::unescape_byte) {
        let off = token.text_range().start() + TextUnit::from_usize(off + prefix_len);
        acc.push(SyntaxError::new(err.into(), off));
    }
}

and then the match case just becomes:

BYTE => validate_char(&token, 2, '\'', acc),

This would avoid the the push_err lambda, and it would play nicely with #540.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, only one addition, unescape::unescape_byte() should also be forwarded as a callback function to validate_char() this way the function will become generic and will have 5 parameters. If you are okay with that, tho?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, hadn't noticed they used different functions, but that makes sense :) This might require a type parameter for the function, are you comfortable with that or do you need a hint?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I am comfortable with that, just wanted our approval)

Comment on lines +139 to +144
if let Some(without_quotes) = unquote(text, 2, '"') {
unescape::unescape_byte_str(without_quotes, &mut |range, char| {
if let Err(err) = char {
push_err(2, (range.start, err));
}
})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could do the same here as described for char, e.g. pull into a separate function. I think that would be nice :)

Copy link
Contributor Author

@Veetaha Veetaha Jan 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've stumbled upon a language limitation here.
Since here we would have a second level higher order function
something like

fn validate_byte_or_char_str_literal(..., impl FnOnce(&str, impl FnMut(...))) {}

Notice the nested impl FnMut(...)

Copy link
Contributor Author

@Veetaha Veetaha Jan 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be quite a hack to do it with a dynamic dispatch here.
unescape::unescape_str(&str, impl FnMut(Range<usize>, Result<u8, EscapeError>))
This function expects the second argument (callback) to be Sized.
This means we would end up with the following hack:

fn validate_str_or_byte_str_literal<T>(
    token: &SyntaxToken,
    prefix_len: usize,
    suffix_len: usize,
    acc: &mut Vec<SyntaxError>,
    unescape_fn: impl FnOnce(&str, &mut dyn FnMut(Range<usize>, Result<T, unescape::EscapeError>)),
) {
    let text = token.text().as_str();
    if let Some(without_quotes) = unquote(text, prefix_len, suffix_len) {
        unescape_fn(without_quotes, &mut |range, char| {
            if let Err(err) = char {
                let off = token.text_range().start() + TextUnit::from_usize(range.start + prefix_len);
                acc.push(SyntaxError::new(err.into(), off));
            }
        });
    }
}


validate_str_or_byte_str_literal(
    &token,
    2,
    1,
    acc,
    // we need to explicitly wrap this into two lambdas because cb is `&mut dyn FnMut(...)`
    |s, cb| unescape::unescape_byte_str(s, &mut |range, byte| cb(range, byte)),
);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been unable make the above function reasonable, e.g. less lines of code than what it replaces, so lets just keep it at de-duplicating char and byte for now.

Copy link
Contributor Author

@Veetaha Veetaha Jan 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same opinion here 😏

@matklad
Copy link
Member

matklad commented Jan 14, 2020

bors r=@kiljacken

Thanks @Veetaha and @kiljacken !

bors bot added a commit that referenced this pull request Jan 14, 2020
2834: refactor(ra_syntax.validation): removed code duplication from validate_literal() r=kiljacken a=Veetaha

Hi! This is my first ever contribution to this project.
I've taken some dirty job from issue #223

This is a simple atomic PR to remove code duplication according to FIXME comment in the function that is the main focus of the further development.

I just didn't want to mix refactoring with the implementation of new features...

I am not sure whether you prefer such atomic PRs here or you'd rather have a single PR that contains all atomic commits inside of it?

So if you want me to add all that validation in one PR I'll mark this one as WIP and update it when the work is finished, otherwise, I'll go with the option of creating separate PRs per each feature of validation of strings, numbers, and comments respectively.

### Comments about refactoring
Yeah, reducing the duplication is quite hard here, extracting into stateless functions could be another option but the number of their arguments would be very big and repeated across char and string implementations so that just writing their types and names would become cumbersome.
I tried the option of having everything captured implicitly in the closure but failed since rust doesn't have templated (or generic) closures as C++ does, this is needed because `unescape_byte*()` and `unescape_char|str()` have different return types...
Maybe I am missing something here? I may be wrong because I am not enough experienced in Rust...
Well, I am awaiting any kind of feedback!

Co-authored-by: Veetaha <gerzoh1@gmail.com>
@bors
Copy link
Contributor

bors bot commented Jan 14, 2020

Build succeeded

  • Rust (macos-latest)
  • Rust (ubuntu-latest)
  • TypeScript

@bors bors bot merged commit 60251da into rust-lang:master Jan 14, 2020
@kiljacken
Copy link
Contributor

@matklad Looks like bors was happy to see you!

@Veetaha if you'd create a new PR with the remaining changes, that would be nice :)

@Veetaha
Copy link
Contributor Author

Veetaha commented Jan 14, 2020

@kiljacken, I was taking some time to study this project codebase and architecture. I didn't expect that this will be merged so soon, though I will create a new PR with amendments according to the review soon as you suggested

@Veetaha
Copy link
Contributor Author

Veetaha commented Jan 14, 2020

I guess I needed to mark this one as [WIP], didn't I?

@kiljacken
Copy link
Contributor

No need to apologize, I just think bors mis-parsed Aleksey's command, so it got merged instead of allowing me to merge it (which wasn't really necessary as he already gave me r+ rights haha)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants