Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix octal escapes starting with 0 #1

Merged
merged 4 commits into from
Dec 25, 2016
Merged

Conversation

addaleax
Copy link
Contributor

Octal escape sequences starting with 0 should be expanded before the usual escape sequences to avoid ambiguity with the \0 escape.

This patch aligns the behaviour with how JS parses the sequences.

@coveralls
Copy link

coveralls commented Dec 14, 2016

Coverage Status

Coverage remained the same at 100.0% when pulling ad7eba1 on addaleax:fix-octals into 81a30b5 on iamakulov:master.

@addaleax
Copy link
Contributor Author

Actually, I’m just noticing that things are a little more tricky; Consider strings like \\x41. Right now (with or without this PR), unescape would first convert \\ to \ and then \x41 to A, which is obviously a little undesired.

@addaleax addaleax force-pushed the fix-octals branch 2 times, most recently from 8b43d6f to e7195ef Compare December 17, 2016 03:34
@addaleax
Copy link
Contributor Author

I’ve updated with a more comprehensive solution that is, unfortunately, also kind of a full rewrite of the src/ part here.

@coveralls
Copy link

coveralls commented Dec 17, 2016

Coverage Status

Coverage decreased (-17.4%) to 82.609% when pulling 8b43d6f on addaleax:fix-octals into 81a30b5 on iamakulov:master.

@coveralls
Copy link

Coverage Status

Coverage remained the same at 100.0% when pulling e7195ef on addaleax:fix-octals into 81a30b5 on iamakulov:master.

3 similar comments
@coveralls
Copy link

Coverage Status

Coverage remained the same at 100.0% when pulling e7195ef on addaleax:fix-octals into 81a30b5 on iamakulov:master.

@coveralls
Copy link

Coverage Status

Coverage remained the same at 100.0% when pulling e7195ef on addaleax:fix-octals into 81a30b5 on iamakulov:master.

@coveralls
Copy link

Coverage Status

Coverage remained the same at 100.0% when pulling e7195ef on addaleax:fix-octals into 81a30b5 on iamakulov:master.

@iamakulov
Copy link
Owner

Wow, thanks for the PR!

Also, I guess I should enable email notifications :D

Copy link
Owner

@iamakulov iamakulov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed the solution. Thanks a lot, seriously, this fixes the problems I didn't even have a clue about!

Left a couple of notes regarding the code. Once we fix them, I think it's ready to merge and be published.

@@ -0,0 +1 @@
{"/home/sqrt/src/unescape-js/dist/index.js":{"path":"/home/sqrt/src/unescape-js/dist/index.js","s":{"1":1,"2":1,"3":1,"4":1,"5":1,"6":5,"7":1,"8":2,"9":1,"10":18,"11":18,"12":2,"13":16,"14":3,"15":2,"16":1,"17":13,"18":2,"19":11,"20":1},"b":{"1":[2,16],"2":[3,13],"3":[2,1],"4":[2,11]},"f":{"1":5,"2":2,"3":18,"4":18},"fnMap":{"1":{"name":"fromHex","line":24,"loc":{"start":{"line":24,"column":14},"end":{"line":24,"column":36}}},"2":{"name":"fromOct","line":27,"loc":{"start":{"line":27,"column":14},"end":{"line":27,"column":36}}},"3":{"name":"(anonymous_3)","line":31,"loc":{"start":{"line":31,"column":18},"end":{"line":31,"column":36}}},"4":{"name":"(anonymous_4)","line":32,"loc":{"start":{"line":32,"column":41},"end":{"line":32,"column":95}}}},"statementMap":{"1":{"start":{"line":3,"column":0},"end":{"line":5,"column":3}},"2":{"start":{"line":7,"column":0},"end":{"line":7,"column":32}},"3":{"start":{"line":9,"column":0},"end":{"line":9,"column":105}},"4":{"start":{"line":11,"column":0},"end":{"line":22,"column":2}},"5":{"start":{"line":24,"column":0},"end":{"line":26,"column":2}},"6":{"start":{"line":25,"column":4},"end":{"line":25,"column":51}},"7":{"start":{"line":27,"column":0},"end":{"line":29,"column":2}},"8":{"start":{"line":28,"column":4},"end":{"line":28,"column":50}},"9":{"start":{"line":31,"column":0},"end":{"line":43,"column":2}},"10":{"start":{"line":32,"column":4},"end":{"line":42,"column":7}},"11":{"start":{"line":33,"column":8},"end":{"line":41,"column":9}},"12":{"start":{"line":34,"column":12},"end":{"line":34,"column":37}},"13":{"start":{"line":35,"column":15},"end":{"line":41,"column":9}},"14":{"start":{"line":36,"column":12},"end":{"line":36,"column":86}},"15":{"start":{"line":36,"column":34},"end":{"line":36,"column":57}},"16":{"start":{"line":36,"column":62},"end":{"line":36,"column":86}},"17":{"start":{"line":37,"column":15},"end":{"line":41,"column":9}},"18":{"start":{"line":38,"column":12},"end":{"line":38,"column":34}},"19":{"start":{"line":40,"column":12},"end":{"line":40,"column":47}},"20":{"start":{"line":45,"column":0},"end":{"line":45,"column":36}}},"branchMap":{"1":{"line":33,"type":"if","locations":[{"start":{"line":33,"column":8},"end":{"line":33,"column":8}},{"start":{"line":33,"column":8},"end":{"line":33,"column":8}}]},"2":{"line":35,"type":"if","locations":[{"start":{"line":35,"column":15},"end":{"line":35,"column":15}},{"start":{"line":35,"column":15},"end":{"line":35,"column":15}}]},"3":{"line":36,"type":"if","locations":[{"start":{"line":36,"column":12},"end":{"line":36,"column":12}},{"start":{"line":36,"column":12},"end":{"line":36,"column":12}}]},"4":{"line":37,"type":"if","locations":[{"start":{"line":37,"column":15},"end":{"line":37,"column":15}},{"start":{"line":37,"column":15},"end":{"line":37,"column":15}}]}}}}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added .nyc_output to .gitignore. Could you please merge or rebase upon the master branch so that this file gets removed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, done!

string = unescapeVariableHexSequence(string);
return string;
return string.replace(JSEscapeRegex, (_, match, longHex, varHex, shortHex, octal) => {
if (match[0] === 'x') {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I get from short debugging, every time this function is invoked, among longHex, varHex, shortHex and octal, only a single parameter will have a value will be defined. Is this true? Can we write just something like

if (longHex !== undefined) {
  // ...
} else if (varHex !== undefined) {
  // ...
}

then? This will make the code easier to understand.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems to work, yeah. :)

[/\\"/g, '\"'],
[/\\\\/g, '\\']
];
const JSEscapeRegex = /\\(u(\{([0-9A-Fa-f]+)\}|[0-9A-Fa-f]{4})|x([0-9A-Fa-f]{2})|(\d{3})|['"tbrnfv0/\\])/g;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: let's rename this to jsEscapeRegex? Only classes start from the upper-case letter usually.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, whatever you prefer. :)

Octal escape sequences starting with 0 should be expanded before
the usual escape sequences to avoid ambiguity with the `\0` escape.

Also, accidental double unescaping, e.g. `\\x41 → \x41 → A`
needs to be avoided. Fixing this requires using a single `.replace()`
call per input string.
@coveralls
Copy link

coveralls commented Dec 25, 2016

Coverage Status

Coverage remained the same at 100.0% when pulling 5ed0571 on addaleax:fix-octals into 56dc643 on iamakulov:master.

@coveralls
Copy link

coveralls commented Dec 25, 2016

Coverage Status

Coverage remained the same at 100.0% when pulling 8862230 on addaleax:fix-octals into 56dc643 on iamakulov:master.

Simplify: duplicate the `u` part between the first and the second alternative. This reduces the regex depth.
`\/` is not a valid JS escape sequence.
@coveralls
Copy link

Coverage Status

Coverage remained the same at 100.0% when pulling 1e7a605 on addaleax:fix-octals into 56dc643 on iamakulov:master.

1 similar comment
@coveralls
Copy link

coveralls commented Dec 25, 2016

Coverage Status

Coverage remained the same at 100.0% when pulling 1e7a605 on addaleax:fix-octals into 56dc643 on iamakulov:master.

@coveralls
Copy link

Coverage Status

Coverage remained the same at 100.0% when pulling 5fbb4d0 on addaleax:fix-octals into 56dc643 on iamakulov:master.

1 similar comment
@coveralls
Copy link

coveralls commented Dec 25, 2016

Coverage Status

Coverage remained the same at 100.0% when pulling 5fbb4d0 on addaleax:fix-octals into 56dc643 on iamakulov:master.

* Swap `varHex` and `longHex` because it’s how they actually correspond to the regex
* Reorder `if`s to have the same order as the parameters have
* For uniformity, add the `specialCharacter` param as there’s now a corresponding alternative in the regex, and alias `match` with `__` as it’s not necessary anymore
@iamakulov
Copy link
Owner

Cool, thanks for the updates! I also altered the code slightly (added docs and refactored a bit). Let’s wait for the tests to finish, and I’m merging it.

@coveralls
Copy link

coveralls commented Dec 25, 2016

Coverage Status

Coverage remained the same at 100.0% when pulling ef9b5b3 on addaleax:fix-octals into 56dc643 on iamakulov:master.

@iamakulov iamakulov merged commit a881604 into iamakulov:master Dec 25, 2016
@addaleax addaleax deleted the fix-octals branch December 25, 2016 18:54
@addaleax
Copy link
Contributor Author

Cool, thanks!

(Btw, the reason I included / there is that I was looking at regexes themselves, where \/ is valid – but of course it’s your call to decide what this module is for ;))

@iamakulov
Copy link
Owner

Btw, the reason I included / there is that I was looking at regexes themselves, where \/ is valid

Okay, I should’ve asked you, yeah :D Thanks for the explanation.

At the moment, I dedicate this module to parsing only string escape sequences. There’re two points that keep me from adding / back:

  • There’re much more regex escape sequences that just \/: it’s \., \d, etc. If we add regex support, we must support all of them.
  • I can’t quite imagine a real task that involves unescaping Regex escape sequences. If you have a real need to parse them, please share your task! It’ll really help me consider this feature.

Anyway, unescape-js@1.0.7 is here. Thank you for your changes!

@addaleax
Copy link
Contributor Author

There’re much more regex escape sequences that just \/: it’s \., \d, etc. If we add regex support, we must support all of them.

Well, \/ is kind of special because it’s not special – it’s parsed the same way it is parsed when it’s inside a string, e.g. '\/' === '/'. ;)

I can’t quite imagine a real task that involves unescaping Regex escape sequences. If you have a real need to parse them, please share your task! It’ll really help me consider this feature.

Yeah no, just playing around with stuff. This isn’t a problem for me or anything. :)

@iamakulov
Copy link
Owner

Well, / is kind of special because it’s not special – it’s parsed the same way it is parsed when it’s inside a string, e.g. '/' === '/'. ;)

Good point! I think we can handle this as a generic case: i.e. when an escape sequence is invalid, just return the character it escapes (i.e. \//, \--, etc.) Want to submit a PR? :–)

@addaleax
Copy link
Contributor Author

Maybe, but not today… everything’s still a bit christmas-y right now ;)

@iamakulov
Copy link
Owner

Sure. Happy holidays then :–)

@addaleax
Copy link
Contributor Author

Thanks – happy holidays to you too!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants