Syntax highlighting does not work when \b is used with the Russian alphabet #3291
Well, the effect of the … Can you post an example of the code you're trying to highlight?
Hello. Most of the code in 1C Enterprise is written in Russian, for example: https://github.com/silverbulleters/vanessa-agiler/blob/master/src/cf/CommonModules/%D0%9E%D0%B1%D1%89%D0%B5%D0%B3%D0%BE%D0%9D%D0%B0%D0%B7%D0%BD%D0%B0%D1%87%D0%B5%D0%BD%D0%B8%D1%8F%D0%9A%D0%BB%D0%B8%D0%B5%D0%BD%D1%82%D0%A1%D0%B5%D1%80%D0%B2%D0%B5%D1%80/Ext/Module.bsl The grammar was added in #2773. When we started that PR in December, we saw that Lightshow doesn't work properly with \b and Cyrillic, so we changed the regexps from \b to positive lookbehind/lookahead. \b with Cyrillic works in Atom, VS Code and Sublime Text. Is it a bug in Lightshow, or can GitHub itself not handle it?
Ah! That proves it, then. Yes, I'd say it's because two different engines are being used. Oniguruma is Unicode-aware by default, which means it matches "word boundaries" using a broader, less English-centric definition (instead of …). I do know, however, that lookaround assertions are definitely the best way to go, in all cases.
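To make the difference concrete, here is a minimal sketch that uses Python's `re` module as a stand-in for both engines (an assumption for illustration only: neither GitHub nor the editors run Python). A `str` pattern in Python is Unicode-aware by default, roughly like Oniguruma in the editors, while the `re.ASCII` flag limits `\w` and `\b` to ASCII, roughly like PCRE running without Unicode support.

```python
import re

# Python's `re` stands in for both engines here (illustration only):
# str patterns are Unicode-aware by default, roughly like Oniguruma in
# Atom/VS Code/Sublime Text, while re.ASCII restricts \w and \b to
# [A-Za-z0-9_], roughly like PCRE without Unicode support.
KEYWORD = r"\b(Если|If)\b"
line = "Если Истина Тогда"

unicode_match = re.search(KEYWORD, line)
ascii_match = re.search(KEYWORD, line, re.ASCII)

print("Unicode-aware:", unicode_match.group(1) if unicode_match else None)
print("ASCII-only:", ascii_match.group(1) if ascii_match else None)
```

The Unicode-aware search finds `Если`, while the ASCII-only search finds nothing: the line contains no ASCII word characters, so there is no position where `\b` can match.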
I have to point out, too, that Lightshow can sometimes give very misleading results. See #3130. The app is closed source, sadly, so I have no way of knowing what's really going on in there...
The question is which engine GitHub uses. :)
Oh! Sorry, I wasn't clear. GitHub uses PCRE, same as Lightshow. =)
Okay... Is it easy to switch from one engine to another? Can we just ask the GitHub team to make this change?
@Alhadis I got a 404 on your link.
Regular expression with …
Wouldn't it be easier to narrow it down to what characters are valid, instead of which ones aren't...? For example:
I'm not familiar with the language's syntax, so I can't be more specific than that, but I think you get the idea. =)
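One way to read that suggestion, sketched in Python under assumptions: spell out the set of identifier characters explicitly instead of relying on `\w`, and bound keywords with negative lookarounds on that set. The character set and the `keyword_pattern` helper are hypothetical; the real set should follow the language's identifier rules.

```python
import re

# Hypothetical identifier character set, spelled out explicitly so the
# pattern does not depend on how a given engine defines \w.
IDENT_CHARS = "A-Za-z0-9_А-Яа-яЁё"


def keyword_pattern(*keywords: str) -> str:
    """Hypothetical helper: match keywords not adjacent to identifier chars."""
    alternation = "|".join(re.escape(k) for k in keywords)
    return rf"(?<![{IDENT_CHARS}])(?:{alternation})(?![{IDENT_CHARS}])"


pattern = keyword_pattern("Если", "If")
text = "Если Условие Тогда If_x If"

# Same result whether \w is Unicode-aware (flags=0) or ASCII-only.
for flags in (0, re.ASCII):
    print([m.group(0) for m in re.finditer(pattern, text, flags)])
```

Because the class never mentions `\w`, both runs give the same matches: `Если` and the final `If`, but not the `If` inside `If_x`.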
@Alhadis
Thing is, different engines have different notions of what constitutes a "word character". When writing portable expressions, I avoid assuming that consuming engines will be Unicode-aware, mostly because of scenarios like this. In any case, there's no reason for the site's highlighting engine not to be Unicode-aware, given the multicultural nature of GitHub.
Hahaha. I can answer this: we don't currently use the … I don't have an ETA, but I promise we plan to look into improving highlighting for non-English, non-ASCII documents.
Ah right, that makes sense then. I wasn't aware the flag imposed performance penalties (at least as far as the PCRE library is concerned). I'd say your proposed solution is a good compromise. =) We could also enable it only for languages whose grammars are Unicode-sensitive (like the 1C Enterprise grammar). Personally, I feel that's a more conservative approach; we could add a new option to …
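A rough sketch of the opt-in idea, again using Python's `re` as a stand-in. The per-language `UNICODE_GRAMMARS` table and the `compile_rule` helper are hypothetical and do not correspond to any existing Linguist option: rules are compiled ASCII-only by default, and only languages that opt in get Unicode-aware `\w` and `\b`.

```python
import re

# Hypothetical per-language setting; the table and its contents are
# assumptions made for this sketch, not an existing Linguist option.
UNICODE_GRAMMARS = {"1C Enterprise": True, "C#": False}


def compile_rule(language: str, pattern: str) -> re.Pattern:
    """Compile a grammar rule; only opted-in languages get Unicode word rules."""
    flags = 0 if UNICODE_GRAMMARS.get(language, False) else re.ASCII
    return re.compile(pattern, flags)


keyword = r"\b(?:Если|If)\b"
sample = "Если А Тогда"

print(bool(compile_rule("1C Enterprise", keyword).search(sample)))
print(bool(compile_rule("C#", keyword).search(sample)))
```

The same `\b` rule matches the Cyrillic keyword for the opted-in language and not for the default one, so whatever cost Unicode mode carries stays limited to the grammars that ask for it.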
That would be great!
Don't screw it up. The Russians are watching.
@arfon what are our next steps? In my world the plan is:
Anything else?
Right now we wait.
Possibly. Please don't start opening pull requests implementing these changes yet, as we won't be able to do anything with them 😄
How about these in the meantime? :D Just so we're guarded against the ol' syntax-highlighting-related questions...
@vmg Can you provide a link to the issue on the GitHub side?
@Alhadis Hello.
What do you mean?
@arfon wrote #3291 (comment)
(Apologies in advance; I don't intend to hijack this conversation.) In C# too, we are struggling to get C# 7.2 syntax support because upstream has moved to an Oniguruma grammar. @damieng has chalked out some ideas at atom/language-csharp#112 (comment) for how we can try to make progress. Since Oniguruma is mentioned in this thread, how feasible is it for Linguist to support multiple engines? Is it too large an effort, or totally out of the scope of this project?
“Totally out of the scope of this project” 😀 This isn’t a limitation of Linguist. It’s a limitation (possibly even a design decision; I don’t know, as it predates my time at GitHub) of the parser GitHub.com uses to parse the grammars.
Just to clarify, is there any chance that someone at GitHub will replace the good old parser with a newer one that supports Oniguruma and other complicated regex syntaxes? C# highlighting on GitHub is absolutely terrible, and it would be nice to figure out how it can be fixed in the near future. Other services like Bitbucket or GitLab provide much better highlighting, sadly. Thanks in advance!
@worldbeater, PCRE supports every feature that Oniguruma does; it just uses different syntax. The perceived difference in regex support you see on GitHub is a consequence of grammar authors using Oniguruma-specific syntax instead of PCRE's. This happens because Oniguruma is used by every editor that supports TextMate grammars; GitHub is the lone exception due to its use of PCRE. If the engines were reversed, you'd be asking us to support PCRE's "complicated syntax" too, because the Oniguruma engine isn't good enough.
Well, now I understand, thanks! Could you please provide more info on which PCRE version GitHub is using now? It seems the work on porting the Oniguruma-based syntax to PCRE is stuck due to some incompatibilities, though I'm not sure: atom/language-csharp#112 (comment). This is all because Microsoft uses Oniguruma expressions in the grammar for the Visual Studio Code and Atom editors right now, and it can't simply be copied and pasted to work well on GitHub. Thanks in advance, @Alhadis!
@worldbeater I doubt the PCRE version has anything to do with this. It's running in ASCII-only mode; that's all I know (I'm not staff). But do check the manpage for … These are the likely discrepancies affecting the C# grammar (indeed, most TextMate grammars that use Oniguruma extensions):
Thank you for that accurate list, @Alhadis. The most common occurrence of incompatible syntax is the …
This issue has been automatically marked as stale because it has not had activity in a long time. If this issue is still relevant and should remain open, please reply with a short explanation (e.g. "I have checked the code and this issue is still relevant because ___."). Thank you for your contributions. |
I think this issue should stay open. The problem still exists.
Any news from GitHub.com?
One more ping, guys. (sorry) |
I need help with this issue.
Nothing has changed on the GitHub side of things... #3291 (comment) still applies:
This isn't likely to change in the immediate future. If you really want this issue resolved sooner rather than later, the best option is to change the grammar as discussed at various points earlier in this issue.
This issue has been automatically closed because it has not had activity in a long time. Please feel free to reopen it or create a new issue. |
Good afternoon. We are creating syntax highlighting for 1C:Enterprise, which supports Russian keywords.
When we use
\b(Если|If)\b
on GitHub (tested with https://github-lightshow.herokuapp.com), the keywords are not highlighted. Written like this instead:
(?<=[^\w-а-яё\.]|^)(Если|If)(?=[^\w-а-яё\.]|$)
it works. The files are encoded in UTF-8.
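For comparison, here is the workaround from the issue ported to Python's `re`, with `re.ASCII` standing in for an engine whose `\w` and `\b` are ASCII-only (an assumption, not GitHub's actual setup). Python only accepts fixed-width lookbehind, so `(?<=[^X]|^)` is rewritten as the equivalent negative lookbehind `(?<![X])`, and the lookahead likewise. Adding uppercase Cyrillic to the class is also an assumption; the original relies on `а-яё`.

```python
import re

# The issue's workaround ported to Python's `re`. Python requires
# fixed-width lookbehind, so "(?<=[^X]|^)" becomes the equivalent
# "(?<![X])"; the lookahead is rewritten the same way.
# Assumption: uppercase Cyrillic is added; the hyphen goes last so it is literal.
WORD = r"[\wА-Яа-яЁё.-]"
WORKAROUND = rf"(?<!{WORD})(Если|If)(?!{WORD})"
PLAIN_B = r"\b(Если|If)\b"

sample = "Если Найдено Тогда\nIf Found Then"

for name, pattern in (("plain \\b", PLAIN_B), ("lookaround", WORKAROUND)):
    # re.ASCII stands in for an engine whose \w and \b are ASCII-only.
    hits = [m.group(1) for m in re.finditer(pattern, sample, re.ASCII)]
    print(f"{name}: {hits}")
```

Under ASCII-only rules the plain `\b` version finds only `If`, while the lookaround version finds both `Если` and `If`.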