Skip to content
This repository has been archived by the owner on Sep 27, 2020. It is now read-only.

Non-ascii characters corrupt the range data #90

Open
cgewecke opened this issue Oct 12, 2019 · 4 comments
Open

Non-ascii characters corrupt the range data #90

cgewecke opened this issue Oct 12, 2019 · 4 comments

Comments

@cgewecke
Copy link

(Originally reported at solidity-coverage 418)

("corrupt" might be overdramatizing this a little.)

It looks like ranges are calculated by character count rather than string length, and non-ascii characters are 'wider' than length 1. This can introduce unexpected drift if you're using the parser to identify string injection points when modifying source files.

Ascii: length 36

contract A {
    /// S
    uint x;
}

Non Ascii: length 37

contract A {
    /// 𝕊
    uint x;
}

These two contracts produce the same range data. Not sure this can (or should?) be fixed here. A simple work-around for my case is to sanitize files before parsing.

The issue raising this at SC involved scientific notation in a natspec comment.

@federicobond
Copy link
Owner

What you are saying is that I am giving out a unicode character range instead of a byte range? Or vice-versa?

@cgewecke
Copy link
Author

cgewecke commented Oct 12, 2019

@federicobond

giving out a unicode character range

Yes, I think that's right. You're counting unicode while JS string methods count bytes.

In the examples above, the range for uint is:

"range": [
  27,
  30
]

And running...

  • ascii.slice(0,31) ends on t;
  • nonAscii.slice(0,31) ends on n;

@federicobond
Copy link
Owner

Interesting catch. I am only copying the character ranges returned by the ANTLR parser. I just found out that there was a deprecation due to the treatment of UTF-8 in the ANTLR JS target. Can you try making this same change in src/index.js and see if it fixes it?

@cgewecke
Copy link
Author

cgewecke commented Oct 12, 2019

@federicobond Yes! That looks good.

This is the diff from a test asserting the ascii/non-ascii ASTs are deep equal.

"typeName": {
  "name": "uint"
  "range": [
-    27
-    30
+    28
+    31
  ]

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants