-
Notifications
You must be signed in to change notification settings - Fork 586
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[YAML] Add entirely new YAML syntax #90
Conversation
As a side note, I discovered a couple of inconveniences while working on this, which I reported here: https://github.com/SublimeTextIssues/Core/labels/C%3A%20Syntax%20Highlighting |
👍 This is awesome man. YAML is a tricky one! Thanks @FichteFoll. Next stop Markdown 😉 |
By the way, I would really like to use this as a base for a This would also be interesting for other Sublime Text resource file types based on JSON. Don't know if this would be a feasable addition. |
When you are reviewing this, I'd like to hear about your opinion on coloring the punctuation characters, i.e. Edit: Just saw that |
Written from scratch. Package now also includes a preview file (for "good measure"), so you can test your color scheme against it or something. Also has a lot of tests, of course.
68669bf
to
a9c002e
Compare
Done. Regarding punctuation characters, I came to the following stance: For the same reason that JSON punctuation is not highlighted, I believe that YAML punctuation should not be highlighted as well. If users prefer to have them highlighted, they can do so easily by editing their color scheme, which they could also to for other languages following the same punctuation scope namin, which will hopefully be standardized at some point. The other way around (edit to override colorization of punctuation because of using scope names like |
In 3098 we added the "Performance" variant of the Syntax Tests build suite. Since this is a complete rewrite, could you take a couple of minutes and runs some tests on some decently large YAML files with this and the existing YAML syntax? This can help ensure that, in addition to the excellent coverage of different syntax you already have, there aren't any performance issues with the regular expressions. |
I ran the performance test a couple times and recorded min and max average. The new
The biggest YAML I could find (you hardly find anything >50 lines with google) was ...
That's sadly a 25x slowdown (only 5x for the first). To be expected, since YAML is really not easy to parse for computers, but the current one is just wrong in many situations and we can't have that, can we? |
@FichteFoll Hmm, 750ms for a 1200 line file is definitely less than ideal. Considering that the second example is only about twice as long as the first, the 20x slowdown seems like there is probably something we can optimize in there. I'll take a look and see if there is anything I can identify. |
Quoting the long strings in With all of the variables and includes, and being unfamiliar with all of the YAML terminology, I have not yet identified what is causing the issue. I see there are a number of patterns with multiple options that are part of negative lookaheads. My hunch is it may be related to this. The other option is that I may be able to instrument the regex engine for the performance test to help identify the regex pattern performance. |
It is very likely that unquoted scalars are the performance bottle neck in this, just because of how "expensive" they are in a computer parsing sense, since multiple checks have to be performed for each character. The way I check this currently is, as you mentioned, by using negative look-aheads that tell us to terminate a plain scalar, but I do think there is room for improvement. I don't have it in my head exactly right now since it's been a while that i worked on this, though, and I'm rather busy at the moment with deadlines. I will be able to take a look at this again after next week (2016-02-15). Maybe earlier, but unlikely. |
Adding the results of performance tests without Highlighting time of |
+1 |
(?x) | ||
(?= | ||
{{ns_plain_first_plain_out}} | ||
((?!{{_flow_scalar_end_plain_out}}).)* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here you have a lookahead, containing a negative lookahead, containing a positive lookahead.
Proposed change: ([^ :#]|\:[^ ]| [^#])*
With the two proposed changes I just made (removing nested lookahead, negative lookahead, lookhead patterns), the syntax is an order of magnitude faster. It processes |
I tweaked a few regexes further. With the unreleased dev build of Sublime Text I am seeing full-file highlighting of In short, I removed a number of negative lookaheads being applied to every character with patterns that used character classes to do positive matching. It is possible I introduced a change to behavior, although I tried not to. It would be good to have you review @FichteFoll. |
@wbond: while you are waiting for feedback, I thought this might interest you. I was curious and so I diffed the scope names applied by the two versions of the syntax to the |
Courtesy of @wbond.
sregex doesn't support them but does not need them either, since they are essentially a hack for backtracking regex engines. Courtesy of @wbond.
The nested look-aheads were a huge bottleneck for plain scalar key-value pair parsing. By utilizing linear matches in the single look-ahead, parsing speed for `PHP Source.sublime-syntax` is improved by 80%. Partly courtesy of @wbond.
Remove the misleading comment about that and add test cases.
@wbond
I also tweaked the scoping slightly here and there. There was one thing I did not incorporate, which was the addition of some match patterns before matches like PS: I got my hands on some pretty large YAML test files but they are so large that performance testing with my current ST build would not be feasable. |
This is merged in, thanks for all of your work @FichteFoll! |
Awesome! Works like a charm with Rails i18n files. Thanks! |
Thanks for the merge. Glad we finally have sane YAML highlighting. :) Here are some performance tests on big files with the slower 3107 build: ~11600 lines (400kB): 840ms Can't wait to compare against the next build. I noticed that syntax highlighting, or especially the syntax tests, only use a single core. Maybe some concurrency optimization wrt sregex could speed this up a littlebit, but it'd be quite some work I suppose. What I'm curious about is RAM usage of plugin_host.exe however, which grew significantly over the course of performance testing but eventually dropped before the tests were finished. Maybe it's unrelated? |
Remember that with 3107, this syntax is still use the oniguruma engine. So all of the performance, memory characteristics, etc will likely be affected by not utilizing patterns that require that engine. I'm thinking the memory usage of plugin_host is unrelated. Perhaps another package you have installed? It happens after the performance test on the 2nd file, and starts before the performance test on the 3rd file. I don't know off of the top of my head how allocations are happening related to lexing files in a buffer. My guess is that some detail of that is why the memory usage increases until the end of the performance test. |
Did more tests with 3110 (and removal of the possessive quantifiers):
The 4MB file still seems editable. It does lag behind slightly but it's not as bad as it was previously. If tokenizing could somehow be multithreaded, then performance would probably improve greatly (I have 4 cores with HT). |
I spent the last couple of weeks (every now and then) writing this from scratch, as I promised. Based on http://www.yaml.org/spec/1.2/spec.html.
Package now also includes a preview file (for "good measure"), so you can test your color scheme against it or something.
Also has a lot of tests, of course.
I have decided to not assign any special scopes to the punctuation characters, e.g.
:[]{},?|>%
because everything else is already colored.Notable differences:
Screenshot: with new syntax
(unfortunately
constant.numeric
andstring
have a very similar color in my color scheme, so you hardly see the difference here)Screenshot with old syntax:
Fixes jskinner/DefaultPackages#41
Fixes jskinner/DefaultPackages#167