
CHARACTER SET isn't recognized on MySQL lexer #975

Closed
Anteru opened this issue Aug 31, 2019 · 1 comment · Fixed by #1527
Labels
S-minor (severity: minor), T-bug (type: a bug), X-imported (imported from Bitbucket)

Comments

@Anteru
Collaborator

Anteru commented Aug 31, 2019

(Original issue 1271 created by dereckson on 2016-07-29T00:20:08.320023+00:00)

As an example, here are the instructions for setting the Etherpad encoding:

#!mysql

ALTER DATABASE `etherpad` CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;
ALTER TABLE `etherpad`.`store` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;

Expected behavior: CHARACTER and SET receive the same highlighting, both in CHARACTER SET and in CONVERT TO CHARACTER SET.

Actual behavior: CHARACTER receives the class k (Keyword), while SET receives the class kt (Keyword.Type).

@Anteru Anteru added T-bug type: a bug X-imported imported from Bitbucket S-minor severity: minor labels Aug 31, 2019
@kurtmckee
Contributor

I am working on an overhaul of the MySQL lexer. I will try to improve this behavior, but note that in MySQL's source code (the lex.h file), "CHARACTER" and "SET" never appear together: they are defined as separate keywords. My PR will therefore most likely render both with the same token type, with nothing in the Pygments lexer explicitly connecting them.
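
A minimal Python sketch of this point, assuming a lexer that looks each word up in flat word lists. The lists and the `classify_words` helper are hypothetical and far shorter than Pygments' real ones; this is not the Pygments implementation. Because each word is classified on its own, CHARACTER and SET share a token type only when both sit in the same list. In Pygments' HTML output, Keyword maps to the CSS class k and Keyword.Type to kt.

```python
import re

# Hypothetical, heavily trimmed word lists for illustration only.
KEYWORDS = {"ALTER", "DATABASE", "TABLE", "CONVERT", "TO", "CHARACTER", "SET", "COLLATE"}
DATATYPES = {"INT", "VARCHAR", "TEXT"}

WORD = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def classify_words(sql):
    """Assign a token type to each bare word, looking each word up on its own.

    Nothing links CHARACTER to the SET that follows it; the two words only
    share a token type because both appear in the same keyword list.
    """
    tokens = []
    for match in WORD.finditer(sql):
        word = match.group()
        upper = word.upper()
        if upper in KEYWORDS:
            kind = "Keyword"        # Pygments short CSS class: k
        elif upper in DATATYPES:
            kind = "Keyword.Type"   # Pygments short CSS class: kt
        else:
            kind = "Name"
        tokens.append((kind, word))
    return tokens
```

With SET listed only as a keyword, `classify_words("ALTER DATABASE etherpad CHARACTER SET utf8mb4")` reports Keyword for both CHARACTER and SET.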

I will consider this issue resolved by my PR if CHARACTER and SET share the same token type.

kurtmckee added a commit to kurtmckee/pr-pygments that referenced this issue Aug 29, 2020
Fixes pygments#975, pygments#1063, pygments#1453

Changes include:

Documentation
-------------

* Note in the lexer docstring that Oracle MySQL is the target syntax.
  MariaDB syntax is not a target (though there is significant overlap).

Unit tests
----------

* Add 140 unit tests for MySQL.

Literals
--------

* Hexadecimal/binary/date/time/timestamp literals are supported.
* Integer mantissas are supported for scientific notation.
* In-string escapes are now tokenized properly.
* Support the "unknown" constant.
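
The literal forms above can be illustrated with a few regular expressions. This is a sketch with hypothetical names (`LITERAL_PATTERNS`, `classify_literal`), not the patterns the lexer actually uses, and it skips some rules: for example, it does not check that an X'...' literal has an even number of hex digits. MySQL's 0x and 0b prefixes are case-sensitive, so 0X01AF is rejected.

```python
import re

# Hypothetical patterns for a few MySQL literal forms (a sketch, not the
# regular expressions used by the Pygments lexer).
LITERAL_PATTERNS = [
    ("hex", re.compile(r"[xX]'[0-9a-fA-F]*'|0x[0-9a-fA-F]+")),
    ("binary", re.compile(r"[bB]'[01]*'|0b[01]+")),
    ("temporal", re.compile(r"(?:DATE|TIME|TIMESTAMP)\s*'[^']*'", re.IGNORECASE)),
    # An integer mantissa (no decimal point) is allowed before the exponent.
    ("float", re.compile(r"(?:\d+\.\d*|\.\d+|\d+)[eE][+-]?\d+")),
    ("integer", re.compile(r"\d+")),
    ("unknown", re.compile(r"UNKNOWN", re.IGNORECASE)),
]


def classify_literal(text):
    """Return the kind of the first pattern that matches the whole text, else None."""
    for kind, pattern in LITERAL_PATTERNS:
        if pattern.fullmatch(text):
            return kind
    return None
```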

Comments
--------

* Optimizer hints are now supported, and the hint keywords inside them
  are recognized and tokenized as preprocessor instructions.
* Remove support for nested multi-line comments, which MySQL no
  longer supports.
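
A sketch of how hint comments can be told apart from ordinary comments, assuming hint keywords are matched against a known list. `HINT_NAMES`, `scan_comments`, and the trimmed list are hypothetical; the token type names are real Pygments types. The non-greedy pattern also shows the non-nesting behavior: a comment ends at the first `*/`.

```python
import re

# Hypothetical, trimmed set of MySQL optimizer hint names.
HINT_NAMES = {"BKA", "NO_BKA", "MAX_EXECUTION_TIME", "NO_ICP", "SET_VAR"}

# Non-greedy match: a comment ends at the first "*/", so comments do not nest.
COMMENT = re.compile(r"/\*(\+?)(.*?)\*/", re.DOTALL)
WORD = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def scan_comments(sql):
    """Return (token_type, text) pairs for the /* ... */ comments in sql.

    A comment that starts with "/*+" is treated as an optimizer hint, and the
    known hint names inside it are reported as Comment.Preproc; other
    comments are reported whole as Comment.Multiline.
    """
    result = []
    for match in COMMENT.finditer(sql):
        is_hint, body = match.groups()
        if is_hint:
            result.extend(("Comment.Preproc", word)
                          for word in WORD.findall(body)
                          if word.upper() in HINT_NAMES)
        else:
            result.append(("Comment.Multiline", match.group()))
    return result
```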

Variables
---------

* Support the '@' prefix for variable names.
* Lift restrictions on characters in unquoted variable names.
  (MySQL does not impose a restriction on lead characters.)
* Support single/double/backtick-quoted variable names, including escapes.
* Support the '@@' prefix for system variable names.
* Support '?' as a variable so that prepared-statement placeholders can be demonstrated.
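
A sketch of the variable forms listed above as regular expressions. `classify_variable` and the pattern names are hypothetical, and the unquoted character set assumes MySQL's documented letters, digits, ".", "_", and "$" rather than the lexer's actual rule.

```python
import re

# Hypothetical quoted-name patterns.
SINGLE = r"'(?:[^'\\]|\\.|'')*'"   # backslash escapes or doubled quotes
DOUBLE = r'"(?:[^"\\]|\\.|"")*"'
BACKTICK = r"`(?:[^`]|``)*`"        # a doubled backtick is an escaped backtick

VARIABLE_PATTERNS = [
    ("system", re.compile(r"@@[0-9A-Za-z_$.]+")),
    ("user", re.compile(r"@(?:%s|%s|%s)" % (SINGLE, DOUBLE, BACKTICK))),
    # Unquoted names may start with any permitted character, including a digit.
    ("user", re.compile(r"@[0-9A-Za-z_$.]+")),
    ("placeholder", re.compile(r"\?")),
]


def classify_variable(text):
    """Return "system", "user", "placeholder", or None for a single token."""
    for kind, pattern in VARIABLE_PATTERNS:
        if pattern.fullmatch(text):
            return kind
    return None
```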

Keywords
--------

* Keyword, data type, and function lists now live in a separate, auto-updating file.
* Support 25 additional data types (including spatial and JSON types).
* Support 460 additional MySQL keywords.
* Support 372 MySQL functions.
  Explicit function support resolves a bug that caused non-function
  items to be treated as functions simply because they were followed
  by an opening parenthesis.
* Support exceptions for the 'SET' keyword, which is both a data type and
  a keyword depending on context.
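
The function and SET rules can be sketched as follows, assuming a word counts as a function only when it is in a known function list and is followed by an opening parenthesis, and that SET followed by "(" is the data type (as in a column defined as SET('a','b')). All names are hypothetical and the lists are trimmed; this is not the lexer's code.

```python
import re

# Hypothetical, trimmed word lists.
KEYWORDS = {"CREATE", "TABLE", "SELECT", "CHARACTER", "SET"}
DATATYPES = {"INT", "VARCHAR"}
FUNCTIONS = {"CONCAT", "COUNT", "NOW"}

# Either a single-quoted string (skipped) or a word plus an optional "(".
TOKEN = re.compile(r"'[^']*'|([A-Za-z_][A-Za-z0-9_]*)(\s*\()?")


def classify_word(word, followed_by_paren):
    upper = word.upper()
    if upper == "SET":
        # SET('a', 'b') is the SET data type; any other SET is the keyword.
        return "Keyword.Type" if followed_by_paren else "Keyword"
    if followed_by_paren and upper in FUNCTIONS:
        # Only known functions become Name.Function; a "(" alone is not enough.
        return "Name.Function"
    if upper in DATATYPES:
        return "Keyword.Type"
    if upper in KEYWORDS:
        return "Keyword"
    return "Name"


def classify(sql):
    return [(classify_word(m.group(1), m.group(2) is not None), m.group(1))
            for m in TOKEN.finditer(sql)
            if m.group(1) is not None]
```

For example, in `SELECT NOW(), t (1)` only NOW becomes Name.Function; the table name t stays a plain Name even though a parenthesis follows it.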

Schema object names
-------------------

* Support Unicode in MySQL schema object names.
* Support parsing of backtick-quoted schema object name escapes.
  (Escapes do not produce a distinct token type at this time.)
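
A sketch of the identifier rules, using the character ranges MySQL documents for unquoted identifiers and the doubled-backtick escape for quoted ones (`identifier_name` is a hypothetical helper). As in the commit, the escape only affects the resulting name here, not any token type. A quoted identifier containing a dot, such as `` `a.b` ``, names a single object called a.b; a qualified name is written as `` `db`.`tbl` ``.

```python
import re

# MySQL's documented characters for unquoted identifiers: ASCII [0-9a-zA-Z$_]
# plus the extended range U+0080 .. U+FFFF.
UNQUOTED = re.compile(r"[0-9A-Za-z$_\u0080-\uffff]+")
# Backtick-quoted identifier; a doubled backtick stands for one literal backtick.
QUOTED = re.compile(r"`((?:[^`]|``)*)`")


def identifier_name(text):
    """Return the object name that text spells, or None if it is not an identifier."""
    match = QUOTED.fullmatch(text)
    if match:
        return match.group(1).replace("``", "`")
    # Unquoted identifiers may start with a digit but may not be all digits.
    if UNQUOTED.fullmatch(text) and not text.isdigit():
        return text
    return None
```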

Operators
---------

* Remove non-operator characters from the list of operators.
* Remove non-punctuation characters from the list of punctuation.
kurtmckee added a commit to kurtmckee/pr-pygments that referenced this issue Aug 31, 2020
Fixes pygments#975
Fixes pygments#1063
Fixes pygments#1453

Anteru pushed a commit that referenced this issue Sep 6, 2020
* Overhaul the MySQL lexer

Fixes #975, #1063, #1453

* Cleanup items based on feedback

* Remove an unnecessary optional newline lookahead for single-line comments
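
A sketch of why such a lookahead is redundant. The pattern below is an illustration, not the lexer's actual rule, and `find_comments` is hypothetical: without re.DOTALL, `.` does not match a newline, so `.*` already stops at the end of the line. The `(?=\s)` lookahead is a different check, approximating MySQL's requirement that `--` be followed by whitespace or a control character, which is why `SELECT 5 --3` contains no comment.

```python
import re

# "#" or "-- " starts a comment that runs to the end of the line. The (?=\s)
# lookahead approximates MySQL's rule that "--" must be followed by whitespace
# or a control character. Without re.DOTALL, "." never matches "\n", so ".*"
# already stops at the end of the line; no newline lookahead is needed.
SINGLE_LINE_COMMENT = re.compile(r"(?:#|--(?=\s)).*")


def find_comments(sql):
    return SINGLE_LINE_COMMENT.findall(sql)
```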
Kenny2github pushed a commit to Kenny2github/pygments that referenced this issue Sep 22, 2020
* Overhaul the MySQL lexer

Fixes pygments#975, pygments#1063, pygments#1453

2 participants