-
-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: SQL comments support and new tokenizer #141
Conversation
Thanks, looks like interesting changes! Haven't had time to read it through properly yet, but I've updated the commit lint rules on master so when you get the time, please feel free to rebase from master and that check should pass. Quick look at the comment regex tells me it would be trivial to add the # Comment
-- Comment
/* Comment */ |
d0a760e
to
21f6dec
Compare
Added
|
8824a8a
to
b15224d
Compare
|
b15224d
to
8d526d6
Compare
@parallels999 in what way does it not work? Work fine for me, both dark and light theme. |
+1 8824a8 |
Great!
Italic is harder to read in smaller fonts, so I personally would like to avoid it. |
8d526d6
to
bd43335
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've read through the code now, and wow, good job! You really managed to squeeze together that getSegments
quite a bit! I also really like the use of named groups instead of my overly complicated solution 👍
Split the commits
Could you please split the comment support and the performance improvements into two separate commits when we're done? I'll rebase this PR onto master in the end and since they're separate features I'd like them clearly separated in the commit history and the changelog.
Escaper performance
I tried it out with some more realistic real world data. The switch solution seems to be the most performant one. The difference in performance between escSwitch
and escSwitch2
can probably be considered within margin of error, so I say let's go for the escSwitch
implementation with inline replacer function.
My benchmark: https://jsben.ch/eGHEk
Comment color
\x1b[2m\x1b[90m
, albeit being a bit chaotic, seems to work in multiple scenarios for me too. I say we go with this. As for italic I'd rather keep it simple and bare bones, and if someone wants italic they can use the options for that. Thanks for the input everybody!
This works too, no tokenizer = new RegExp('(.*?)(' + [
'\\b(?<keyword>' + keywords.join('|') + ')\\b',
/\b(?<number>\d+(?:\.\d+)?)\b/,
// Note: Repeating string escapes like 'sql''server' will also work as they are just repeating strings
/(?<string>'(?:[^'\\]|\\.)*'|"(?:[^"\\]|\\.)*"|`(?:[^`\\]|\\.)*`)/,
/(?<comment>--[^\n\r]*|#[^\n\r]*|\/\*(?:[^*]|\*(?!\/))*\*\/)/,
// Future improvement: Comments should be allowed between the function name and the opening parenthesis
/\b(?<function>\w+)(?=\s*\()/,
/(?<bracket>[()])/,
/(?<special>!=|[=%*/\-+,;:<>])/,
].map(v => v.toString().replace(/^\/|\/\w*$|[\t ]+/g, '')).join('|') + '|$)', 'isy'), |
@parallels999 Yes, that would probably work. And even though it's more compact and could maybe be considered "cleaner", I think it adds to the obfuscation and makes the code harder to follow. I'd like to prioritize readability over compactness. |
So you want it split into commits |
Agreed! And JYI @parallels999 you want to remove |
@Qtax it's just an example, I do not have access to modify your PR |
Hey again! Just letting you know that I'll have to let this sit another few days as I've got quite a lot to do at the moment. Sorry for the holdup! |
bd43335
to
3aaf338
Compare
@scriptcoded np, I've split the commits. Also wanted to add the improved escape function, but jsben.ch has been down all day for me. :-/ |
Glad to see this change because it looks like you're fixing a bug with the old tokenizer: AFAICT, the old tokenizer would (incorrectly) highlight keywords/symbols inside of strings, for example
It was an architectural problem because the tokenizer ran the regexps one by one, whereas you need to tokenize with a single regexp. I hadn't seen that named capturing groups feature before, took me a while to figure it out but it works nicely. |
@Qtax So I've read through everything one more time now and ran it past some colleagues as well. So as soon as #141 (comment) is fixed I think we're ready to merge! I would push a commit myself but since you've been force pushing I'd rather not mess things up for you. |
@Qtax and just to explain I don't necessarily need the feature split into two commits specifically, but at least commits that make sense. So that when this gets merged each commit is contained and releasable. In this specific case I think the easiest is therefore to just keep it as two commits. |
Faster and simpler tokenization. Refs: scriptcoded#133
Add SQL comments support, including MySQL # comments. Refs: scriptcoded#133
3aaf338
to
a03ec17
Compare
Fixed. Force pushing since I don't want to make/keep unnecessary/unwanted commits, and prefer to update the commits instead. Especially if it's gonna be fast forwarded when merged. (Unless there is some other way to do that that I'm not aware of?) |
There we go, finally merged! Sorry for the holdup and huge thanks for the PR and all input! 🎉 Closes #133 |
Add SQL comments support.
Faster and simpler tokenization.
Refs: #133