Problems with blank-masking #367

Open
birkenfeld opened this issue Aug 26, 2015 · 1 comment

@birkenfeld
Collaborator

The code chunking iterator works in terms of byte offsets. But the `mask_comments` function (and `mask_sub_scopes` too, probably) constructs a new string based on differences of byte offsets.

That means that if a comment or string contains non-ASCII characters, the "masked-out" string will have more characters than the original source (though the same number of bytes), since each multi-byte character gets replaced by multiple single-byte characters (spaces).
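To make the effect concrete, here is a minimal sketch of byte-for-byte masking. `mask_bytes` is a hypothetical helper written for illustration, not racer's actual `mask_comments`; it assumes the range starts and ends on UTF-8 character boundaries, which holds when the range is delimited by ASCII comment or string markers.

```rust
/// Replace the byte range `start..end` of `src` with one ASCII space per byte.
/// Hypothetical helper for illustration only; not racer's real implementation.
/// Panics if `start` or `end` is not on a UTF-8 character boundary.
fn mask_bytes(src: &str, start: usize, end: usize) -> String {
    let mut out = String::with_capacity(src.len());
    out.push_str(&src[..start]);
    // One space per *byte*, so the masked string keeps the same byte length
    // and every byte offset after the range still points at the same code.
    out.extend(std::iter::repeat(' ').take(end - start));
    out.push_str(&src[end..]);
    out
}
```

For a source such as `"a // é\nb"`, masking the comment (bytes 2 to 7) yields a string with the same `len()` but one more `char`, because the two-byte `é` became two spaces. Byte offsets into the rest of the source are unchanged; character counts are not.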

I can't say if this causes problems such as wrong indices in the matches...

A related problem is that if a character literal contains an escape sequence, the "masked-out" version of the code is no longer valid Rust, since e.g. '\'' gets masked to '  ' (with two spaces). Again, I can't say if that causes an actual problem.

@phildawes
Collaborator

Good catch.
I think internally it makes sense if we hold the byte offset in the file, so replacing a multibyte char with the same number of spaces is the right thing to do from the perspective of masking.

I'm not sure what is best to do about character literals with an escape sequence. I assume the parser balks at multi-space char literals (I haven't tested). Maybe we should replace them with a one-space char literal and an extra space after the closing quote?
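A rough sketch of that idea, assuming the masker is handed the full text of the literal including both quotes. `mask_char_literal` is a hypothetical name, and this is untested against racer's parser; it only shows that the byte length can be preserved while keeping a well-formed literal.

```rust
/// Mask a char literal (quotes included) as `' '` followed by padding spaces,
/// so the result has the same byte length as the input.
/// Hypothetical helper for illustration only; not part of racer.
fn mask_char_literal(lit: &str) -> String {
    // `' '` is three bytes, the shortest possible char literal.
    assert!(lit.len() >= 3 && lit.starts_with('\'') && lit.ends_with('\''));
    let mut out = String::from("' '");
    // Pad *after* the closing quote so later byte offsets do not shift.
    out.extend(std::iter::repeat(' ').take(lit.len() - 3));
    out
}
```

With this, the four-byte literal `'\''` becomes `' '` plus one trailing space, and a plain `'a'` becomes `' '` with no padding.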
