MySQL ` (backtick) highlighting #1551
@kurtmckee has recently been doing a lot of fantastic work on the MySQL lexer; maybe he can help :)
Yes, I can assist here.
The root cause is that backtick-quoted and unquoted names share the same token type: Name. The only difference between them is that unquoted names have restrictions (e.g., no spaces, cannot be a reserved keyword), while quoted names have none. They are lexically identical, however.
Matthäus, I don't recall an existing Name.Quoted token type or anything similar. Is there precedent for creating a one-off token type? If a formatter doesn't recognize the new token type, does it trace up the token hierarchy to find something it *can* format?
For example, if I create a Name.Awesome.Quoted token, would a formatter search for that, then up the token hierarchy for Name.Awesome and then up to Name?
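For reference, Pygments token types form a tree: accessing an attribute that does not exist yet creates that sub-token on the fly and records its parent. A minimal sketch using the public pygments.token module (Name.Awesome.Quoted is only the illustrative name from the question above, not a real Pygments token):

```python
from pygments.token import Name, is_token_subtype

# Attribute access creates the (hypothetical) sub-token on demand;
# nothing has to be registered first.
awesome_quoted = Name.Awesome.Quoted

# Every token type records its parent, so the full ancestry is recoverable.
print(repr(awesome_quoted))    # Token.Name.Awesome.Quoted
print(awesome_quoted.parent)   # Token.Name.Awesome
print(awesome_quoted.split())  # [Token, Token.Name, Token.Name.Awesome, Token.Name.Awesome.Quoted]

# A sub-token counts as "in" each of its ancestors, but not the other way round.
print(is_token_subtype(awesome_quoted, Name))  # True
print(Name in awesome_quoted)                  # False
```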
If this is automatic, perhaps a custom sub-token could be created without breaking other formatters and styles, and users could add formatter and style support for the custom token type. However, this delves into parts of Pygments I haven't touched yet, so I'll need an expert opinion on the matter.
Since it's very MySQL-specific, we could use one of the predefined tokens (e.g., Name.Attribute) so as not to break backwards compatibility.
The biggest problem then is choosing between the different predefined tokens (…). The formatter does seem to inherit from parents (hence the …). Edit: String.Backtick exists, but that seems to be meant for strings, not names. Thoughts?
@Anteru, I have a serious question for you below. If you can help answer it, it will allow me both to improve the MySQL lexer and to resolve @jord1e's need for a unique token type for quoted schema object names.

@jord1e, the MySQL lexer does its best to correctly tokenize all of the input. Quoted and unquoted schema object names have no semantic difference that I'm aware of in MySQL, so in both cases they are tokenized as "Name". Changing the token type for quoted schema object names would introduce a new semantic distinction between quoted and unquoted names, and if that happens, the distinction still needs to be meaningful.

I actually wanted to use a custom type for quoted schema object names. I seriously considered creating the … I had thought that if I introduced custom token types, I would have to add new CSS class names in token.py, and I would have to modify all of the existing color schemes to at least map the new custom token types to the existing …

My big question for @Anteru is: what actually has to happen if I introduce a custom token type? I anticipated an apocalyptic scenario where I would have to touch 20 to 30 files, but now that I'm checking some of the formatters, it appears that they do exactly what I was hoping: if a formatter doesn't recognize the token type, it follows the token hierarchy until it finds a token type that it recognizes (as in html.py or latex.py). This is encouraging, but I would really like some guidance here. My original goal of uniquely tokenizing quoted schema object names, as well as the fate of this ticket, primarily hinges on creating custom types.
@kurtmckee, don't forget that practicality beats purity. The token type names assigned by Pygments don't necessarily have to match the semantic meaning assigned by the language; they're not used by a parser. Matching them is usually a good idea, since similar things then get a similar color across different lexers, but for the use case here it's much easier to use some other token type that already has useful attributes assigned in the various styles. I don't think it makes sense to introduce new styling definitions in all style classes for a token type that appears only in the MySQL lexer.
I haven't checked yet whether the class falls back to the next item in the hierarchy, but it would make sense. @birkenfeld, do you know? Does Name.Foo try Name.Foo first and then fall back to Name?
That's the idea, yes.
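The fallback described here can be sketched as walking up .parent until reaching a type listed in pygments.token.STANDARD_TYPES, the table that maps standard token types to their short CSS class names. The helper nearest_standard_type is hypothetical and only mirrors the idea; it is not the code inside any particular formatter:

```python
from pygments.token import STANDARD_TYPES, Name, String

def nearest_standard_type(ttype):
    """Hypothetical helper: climb the token tree to the closest standard type.

    STANDARD_TYPES maps each standard token type to its short CSS class
    name; the root Token is in it, so the loop always terminates.
    """
    while ttype not in STANDARD_TYPES:
        ttype = ttype.parent
    return ttype

for ttype in (Name.Foo, Name.Foo.Bar, String.Backtick):
    base = nearest_standard_type(ttype)
    print(repr(ttype), '->', repr(base), repr(STANDARD_TYPES[base]))
```

Here Name.Foo and Name.Foo.Bar both resolve to Name (class "n"), while a token that is already standard, such as String.Backtick, resolves to itself.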
In that case, @kurtmckee, I think your question is answered. If we spot a formatter down the line that doesn't behave like this, it'll be considered a bug and fixed. If you can get away with some other pre-existing token, though, that's probably the easiest solution.
Great! I'll explore a custom token for this situation, test before-and-after output for the HTML formatter and the LaTeX formatter, and make sure that the unit tests are in place to validate the behavior.
If all goes well then I'll reply back to this with instructions for taking advantage of the new functionality once it's merged.
I may be able to work on this over the weekend.
…On September 24, 2020 8:24:43 PM UTC, "Matthäus G. Chajdas" ***@***.***> wrote:
In which case @kurtmckee I think your question is answered. If we spot a formatter down the line which doesn't behave like this it'll be considered a bug and fixed. If you can get away with some other pre-existing token that's probably the easiest solution though.
@jord1e, I've created a pull request to fix this. I have also fixed two bugs in the lexer that I overlooked previously, involving escape characters in quoted schema object names and unquoted schema object names that start with leading digits.

I successfully tested adding the new token names in the "friendly" scheme. You will need to add "Name.Quoted" with whatever color definition you want. If you want to highlight escape sequences in quoted schema object names, add "Name.Quoted.Escape" with a custom color definition.

Please note that you won't be able to get 100% highlighting parity with MySQL Workbench. For example, Workbench incorrectly highlights schema object names with leading digits; I've fixed this problem in Pygments in the same PR.
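A rough sketch of that setup, assuming a Pygments release that includes this PR: the style below copies the "friendly" definitions and adds entries for the two new tokens. The class name QuotedNameStyle and the colors are illustrative only.

```python
from pygments.formatters import HtmlFormatter
from pygments.styles.friendly import FriendlyStyle
from pygments.token import Name

class QuotedNameStyle(FriendlyStyle):
    """Hypothetical style: "friendly" plus colors for the quoted-name tokens."""

    # Copy the parent's mapping so FriendlyStyle itself is not modified.
    styles = dict(FriendlyStyle.styles)
    styles.update({
        Name.Quoted: '#993a3e',              # quoted schema object names
        Name.Quoted.Escape: 'bold #993a3e',  # escape sequences inside them
    })

# CSS for the HTML formatter; non-standard tokens get class names derived
# from the token name, so rules for these two entries are emitted as well.
css = HtmlFormatter(style=QuotedNameStyle).get_style_defs('.highlight')
print(css)
```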
…iquely (#1555)

* MySQL: Tokenize quoted schema object names, and escape characters, uniquely

  Changes in this patch:
  * Name.Quoted and Name.Quoted.Escape are introduced as non-standard tokens.
  * HTML and LaTeX formatters were confirmed to provide default formatting if they encounter these two non-standard tokens. They also add style classes based on the token name, like "n-Quoted" (HTML) or "nQuoted" (LaTeX), so that users can add custom styles for these.
  * Removed "\`" and "\\" as schema object name escapes. These are relics of the previous regular expression for backtick-quoted names and are not treated as escape sequences. The behavior was confirmed in the MySQL documentation as well as by running queries in MySQL Workbench.
  * Prevent "123abc" from being treated as an integer followed by a schema object name. MySQL allows leading numbers in schema object names as long as 0-9 are not the only characters in the schema object name.
  * Add ~10 more unit tests to validate behavior.

  Closes #1551

* Remove an end-of-line regex match that triggered a lint warning

  Also, add tests that confirm correct behavior. No tests failed before or after removing the '$' match in the regex, but now regexlint isn't complaining. Removing the '$' match probably depends on the fact that Pygments adds a newline at the end of the input text, so there is always something after a bare integer literal.
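The two identifier rules from this patch can be approximated with plain regular expressions. This is a simplified sketch, not the lexer's actual patterns: QUOTED_NAME, UNQUOTED_NAME, unquote and is_unquoted_name are hypothetical names, unquoted names are assumed to be ASCII-only, and exponent-like names such as 1e5, which MySQL reads as numbers, are not handled.

```python
import re

# Hypothetical approximation of the rules described in the patch.
# Backtick-quoted name: a doubled backtick (``) is the only escape sequence;
# a backslash is an ordinary character.
QUOTED_NAME = re.compile(r'`((?:[^`]|``)*)`')

# Unquoted name (simplified): may start with digits, but must contain at
# least one non-digit character, so "123abc" is a name and "123" is not.
UNQUOTED_NAME = re.compile(r'[0-9]*[A-Za-z$_][0-9A-Za-z$_]*')

def unquote(token: str) -> str:
    """Strip the surrounding backticks and collapse `` into a single `."""
    match = QUOTED_NAME.fullmatch(token)
    if match is None:
        raise ValueError(f'not a backtick-quoted name: {token!r}')
    return match.group(1).replace('``', '`')

def is_unquoted_name(text: str) -> bool:
    return UNQUOTED_NAME.fullmatch(text) is not None

print(unquote('`my``col`'))        # my`col
print(unquote('`a\\b`'))           # a\b
print(is_unquoted_name('123abc'))  # True
print(is_unquoted_name('123'))     # False
```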
Hello,

I am trying to highlight everything between backticks exactly as MySQL Workbench does:
Code
This is my style and example SQL:
I compile everything using:
My attempts at solving the issue
The problem is that

Name: '#993a3e'

makes everything red. I have tried solving it with noinherit, but alas, to no avail. Backtick identification seems to be happening here, here, or here. A solution would be appreciated.
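What happens here follows from style inheritance: a color set on Name applies to every Name.* sub-token unless that sub-token overrides it. A minimal sketch with two hypothetical styles (AllNamesRed and OnlyQuotedRed are illustrative names; Name.Quoted is the token added by the later fix):

```python
from pygments.style import Style
from pygments.token import Name

class AllNamesRed(Style):
    # Every Name.* sub-token inherits the color set on the parent Name token.
    styles = {Name: '#993a3e'}

class OnlyQuotedRed(Style):
    # Only Name.Quoted gets the color; other Name.* tokens keep the default.
    styles = {Name.Quoted: '#993a3e'}

def color_of(style, ttype):
    return style.style_for_token(ttype)['color']

print(color_of(AllNamesRed, Name.Function))    # 993a3e
print(color_of(AllNamesRed, Name.Builtin))     # 993a3e
print(color_of(OnlyQuotedRed, Name.Function))  # no color set
print(color_of(OnlyQuotedRed, Name.Quoted))    # 993a3e
```

Before the fix, the MySQL lexer emitted plain Name for both quoted and unquoted names, so no combination of style entries, noinherit included, could color them differently; a separate token type was needed.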