Unexpected matches in the resulting RegExp #29
Comments
OK, found the gotcha ...

    // None of these strings is a plain digit or '#':
    // each one is that character followed by U+20E3 (combining enclosing keycap).
    var list = ["9⃣", "8⃣", "7⃣", "6⃣", "5⃣", "4⃣", "3⃣", "2⃣", "1⃣", "0⃣", "#⃣"];
    var regenerate = require('regenerate');
    var regenerated = regenerate.apply(null, list);
    // Report whether any entry is a bare digit or a bare '#'.
    console.log(list.filter(function (chr) {
      return (0 <= chr && chr <= 9) || chr == '#';
    }).length ?
      'There should be #0-9 in the RegExp' :
      'There should be NO #0-9 in the RegExp'
    );
    console.log(regenerated.toRegExp());
Regenerate only deals with individual code points or symbols (by design). The problem is that each of the following strings consists of more than one code point: "🇨🇳", "🇺🇸", "🇷🇺", "🇰🇷", "🇯🇵", "🇮🇹", "🇬🇧", "🇫🇷", "🇪🇸", "🇩🇪", "9⃣", "8⃣", "7⃣", "6⃣", "5⃣", "4⃣", "3⃣", "2⃣", "1⃣", "0⃣", "#⃣". I’d suggest you manually create the regexp for those 21 emoji, and have the rest of the regex generated by Regenerate.
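A minimal sketch of that split, for illustration only. The helpers `partitionBySymbolCount` and `escapeForRegExp` are hypothetical names, not part of Regenerate; the sample list is a small subset chosen for the example. Strings that are exactly one code point are set aside for Regenerate, and the multi-code-point sequences are joined into a hand-built alternation.

```javascript
// Hypothetical helper (not part of Regenerate): separates strings that are a
// single code point from sequences made of several code points.
function partitionBySymbolCount(list) {
  var single = [];
  var multi = [];
  list.forEach(function (str) {
    // Array.from iterates by code point, so a surrogate pair counts as one.
    (Array.from(str).length === 1 ? single : multi).push(str);
  });
  return { single: single, multi: multi };
}

// Hypothetical helper: escapes characters that are special in a RegExp source.
function escapeForRegExp(str) {
  return str.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

var parts = partitionBySymbolCount([
  '\u00A9',                    // copyright sign: one code point
  '9\u20E3',                   // keycap 9: '9' + U+20E3
  '#\u20E3',                   // keycap #: '#' + U+20E3
  '\uD83C\uDDFA\uD83C\uDDF8'   // US flag: two regional indicator symbols
]);

// Hand-built alternation for the multi-code-point sequences only.
var manual = new RegExp(parts.multi.map(escapeForRegExp).join('|'));

console.log(parts.single.length, parts.multi.length);
console.log(manual.test('9\u20E3'), manual.test('9'));
```

Under these assumptions, `parts.single` is what would be handed to `regenerate.apply(null, ...)` as in the test code above, and the two pattern sources could then be combined with `|`, with the longer sequences listed first.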
So I guess I should convert those chars upfront and then reparse, right? Thanks. I knew it was me, although a warning along the lines of "you are doing it wrong with these strings" would have been nicer than a silent success with a potentially broken RegExp, speaking from a parsing security point of view. Will close this anyway.
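The kind of warning asked for here could be approximated on the caller's side. The following is a rough sketch under the assumption that every input is expected to be exactly one code point; `assertSingleSymbols` is a hypothetical name and not something Regenerate provides.

```javascript
// Hypothetical guard (not part of Regenerate): throws if any string in the
// list is not exactly one code point, instead of letting it through unnoticed.
function assertSingleSymbols(list) {
  list.forEach(function (str) {
    // Array.from iterates by code point, so astral symbols count as one.
    var count = Array.from(str).length;
    if (count !== 1) {
      throw new TypeError(
        'Expected a single code point, got ' + count + ' in ' + JSON.stringify(str)
      );
    }
  });
  return list;
}

// Single code points pass through unchanged.
var ok = assertSingleSymbols(['\u00A9', '\uD83D\uDE00']);

// A keycap sequence ('9' + U+20E3) is rejected.
var rejected = false;
try {
  assertSingleSymbols(['9\u20E3']);
} catch (e) {
  rejected = e instanceof TypeError;
}
console.log(ok.length, rejected);
```

Wrapping the input as `regenerate.apply(null, assertSingleSymbols(list))` would then fail loudly for the keycap list instead of producing a pattern that contains `#0-9`.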
I was trying to use this for twemoji, but I've found a very weird behavior, and I'm not sure whether I'm doing it wrong or there's a bug in here (I haven't checked your source code yet).

So, the resulting RegExp matches # and every char between 0 and 9. I've no idea what's going on or why that is, so I've prepared this test code:

This should not result in the following RegExp:

    /[#0-9\xA9\xAE\u2122\u23E9-\u23EC\u23F0\u23F3\u26CE\u2705\u270A\u270B\u2728\u274C\u274E\u2753-\u2755\u2795-\u2797\u27B0\u27BF\u3030\uE50A]|\uD83C[\uDCCF\uDD70\uDD71\uDD7E\uDD8E\uDD91-\uDD9A\uDDE6-\uDDFF\uDE01\uDE02\uDE32-\uDE3A\uDE50\uDE51\uDF00-\uDF20\uDF30-\uDF35\uDF37-\uDF7C\uDF80-\uDF93\uDFA0-\uDFC4\uDFC6-\uDFCA\uDFE0-\uDFF0]|\uD83D[\uDC00-\uDC3E\uDC40\uDC42-\uDCF7\uDCF9-\uDCFC\uDD00-\uDD3D\uDD50-\uDD67\uDDFB-\uDE40\uDE45-\uDE4F\uDE80-\uDEC5]/

because #0-9 is absolutely undesired as a match.

Thanks for any sort of outcome.