Overhaul the MySQL lexer #1527

Merged on Sep 6, 2020 (3 commits)

kurtmckee (Contributor) commented:

Fixes #975, #1063, #1453

Changes include:

Documentation

  • Note in the lexer docstring that Oracle MySQL is the target syntax.
    MariaDB syntax is not a target (though there is significant overlap).

Unit tests

  • Add 140 unit tests for MySQL.

Literals

  • Hexadecimal/binary/date/time/timestamp literals are supported.
  • Integer mantissas are supported for scientific notation.
  • In-string escapes are now tokenized properly.
  • Support the "unknown" constant.
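
To make the numeric literal forms above concrete, here is a minimal standalone sketch using Python's `re` module. The patterns and the `classify_literal` helper are illustrative assumptions, not the rules in `pygments/lexers/sql.py`, and date/time/timestamp literals are left out.

```python
import re

# Illustrative patterns only; they are NOT the rules in pygments/lexers/sql.py.
LITERAL_PATTERNS = [
    # 0x1F or X'1F' (the quoted form is assumed to need whole bytes).
    ("hexadecimal", re.compile(r"0x[0-9a-fA-F]+|[xX]'(?:[0-9a-fA-F]{2})*'")),
    # 0b101 or b'101'
    ("binary", re.compile(r"0b[01]+|[bB]'[01]*'")),
    # The mantissa may be an integer (1e10) or contain a decimal point (1.5e-3).
    ("float", re.compile(r"(?:[0-9]+\.[0-9]*|\.[0-9]+|[0-9]+)[eE][+-]?[0-9]+")),
    ("integer", re.compile(r"[0-9]+")),
]


def classify_literal(text):
    """Return the name of the first pattern matching all of `text`, else None."""
    for name, pattern in LITERAL_PATTERNS:
        if pattern.fullmatch(text):
            return name
    return None
```

With these patterns, `1e10` is classified as a float even though its mantissa has no decimal point, which is the case the "integer mantissas" item refers to.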

Comments

  • Optimizer hints are now supported, and keywords are
    recognized and tokenized as preprocessor instructions.
  • Remove nested multi-line comment support, which is no
    longer supported in MySQL.
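
A rough sketch of the distinction, assuming optimizer hints are written as `/*+ ... */` and that comments do not nest. The regex and the `find_comments` helper are hypothetical and much simpler than the lexer's actual states; version comments (`/*! ... */`) are not handled.

```python
import re

# Lazy match up to the FIRST "*/", so comments are deliberately not nested.
COMMENT_RE = re.compile(r"/\*(\+?)(.*?)\*/", re.DOTALL)


def find_comments(sql):
    """Return (kind, body) pairs, where kind is 'hint' or 'comment'."""
    results = []
    for match in COMMENT_RE.finditer(sql):
        kind = "hint" if match.group(1) else "comment"
        results.append((kind, match.group(2).strip()))
    return results
```

For input such as `/* a /* b */ c */`, this sketch ends the comment at the first `*/`, leaving ` c */` outside the comment.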

Variables

  • Support the '@' prefix for variable names.
  • Lift restrictions on characters in unquoted variable names.
    (MySQL does not impose a restriction on lead characters.)
  • Support single/double/backtick-quoted variable names, including escapes.
  • Support the '@@' prefix for system variable names.
  • Support '?' as a variable so people can demonstrate prepared statements.
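
The variable forms listed above can be sketched with a single regex. This is an illustration under assumptions (the character set for unquoted names and the escape handling are guesses at a reasonable approximation), not the lexer's real rules, and it does not skip `?` inside string literals.

```python
import re

# Building blocks (all hypothetical names).
UNQUOTED = r"[A-Za-z0-9_$.]+"          # no restriction on the lead character
BACKTICKED = r"`(?:[^`]|``)*`"         # `` is an escaped backtick
SINGLE = r"'(?:[^'\\]|\\.|'')*'"       # backslash escapes or doubled quotes
DOUBLE = r'"(?:[^"\\]|\\.|"")*"'

VARIABLE_RE = re.compile(
    rf"(?P<system>@@{UNQUOTED})"
    rf"|(?P<user>@(?:{UNQUOTED}|{BACKTICKED}|{SINGLE}|{DOUBLE}))"
    r"|(?P<placeholder>\?)"
)


def find_variables(sql):
    """Return (kind, text) for each variable-like token found in `sql`."""
    return [(m.lastgroup, m.group()) for m in VARIABLE_RE.finditer(sql)]
```

The `system` alternative is listed first so that `@@name` is not read as a user variable whose name starts with `@`.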

Keywords

  • Keyword, data type, and function lists now live in a separate, auto-updating file.
  • Support 25 additional data types (including spatial and JSON types).
  • Support 460 additional MySQL keywords.
  • Support 372 MySQL functions.
    Explicit function support resolves a bug that caused non-function
    items to be treated as functions simply because they were followed
    by an opening parenthesis.
  • Support exceptions for the 'SET' keyword, which is both a datatype and
    a keyword depending on context.
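
To illustrate why explicit lists help, here is a small sketch. The three sets are tiny hypothetical subsets (the real lists are in `pygments/lexers/_mysql_builtins.py`), and the `SET` rule is an assumed heuristic rather than the lexer's actual logic.

```python
import re

# Tiny hypothetical subsets; the real lists are generated into a separate file.
MYSQL_FUNCTIONS = {"concat", "count", "now"}
MYSQL_KEYWORDS = {"select", "from", "where", "in"}
MYSQL_DATATYPES = {"int", "varchar", "json"}

WORD_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def classify_word(sql, match):
    """Classify one word using explicit lists, not just a trailing '('."""
    word = match.group().lower()
    followed_by_paren = sql[match.end():].lstrip().startswith("(")
    if word == "set":
        # Assumed heuristic: SET(...) is a data type, otherwise a keyword.
        return "datatype" if followed_by_paren else "keyword"
    if word in MYSQL_FUNCTIONS and followed_by_paren:
        return "function"
    if word in MYSQL_DATATYPES:
        return "datatype"
    if word in MYSQL_KEYWORDS:
        return "keyword"
    return "name"


def classify(sql):
    return [(m.group(), classify_word(sql, m)) for m in WORD_RE.finditer(sql)]
```

Here `IN (1)` stays a keyword even though it is followed by an opening parenthesis, because `in` is not in the function list.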

Schema object names

  • Support Unicode in MySQL schema object names.
  • Support parsing of backtick-quoted schema object name escapes.
    (Escapes do not produce a distinct token type at this time.)
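
A minimal sketch of both points, assuming that a doubled backtick is the escape inside a backtick-quoted name and using Python's Unicode-aware `\w` as an approximation of the permitted characters. The names below are hypothetical and not taken from the lexer.

```python
import re

# For str patterns in Python 3, \w already matches Unicode word characters.
UNQUOTED_NAME_RE = re.compile(r"[\w$]+")
# Inside backticks, `` stands for one literal backtick.
QUOTED_NAME_RE = re.compile(r"`(?:[^`]|``)+`")


def unquote_name(token):
    """Strip backtick quoting and collapse `` escapes; pass other tokens through."""
    if QUOTED_NAME_RE.fullmatch(token):
        return token[1:-1].replace("``", "`")
    return token
```

The whole quoted name is matched as one unit here, mirroring the note that escapes do not get a distinct token type.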

Operators

  • Remove non-operator characters from the list of operators.
  • Remove non-punctuation characters from the list of punctuation.

@Anteru Anteru requested a review from birkenfeld August 30, 2020 07:10
@Anteru Anteru self-assigned this Aug 30, 2020
@Anteru Anteru added this to the 2.7 milestone Aug 30, 2020
Anteru (Collaborator) commented Aug 30, 2020:

This is a massive overhaul -- thanks a lot for the PR and the very detailed PR message. Please give us some time to review this -- looks mostly good to me, but I have only skimmed the regexes so far.

kurtmckee (Contributor, Author) commented Aug 30, 2020 via email

Anteru (Collaborator) commented Aug 30, 2020:

No need to attach screenshots -- I will run it locally. I'm not as concerned about incorrect or missing syntax as I am about a regex with catastrophic backtracking.

Re issues: You don't have to reference the issues manually, I will close them when I merge the PR and update the changelog.

Thanks again!
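
For readers unfamiliar with the concern, here is a hedged illustration of the kind of pattern that can backtrack badly. Both regexes are hypothetical examples written for this note; neither is taken from the PR.

```python
import re

# Risky (hypothetical): "[^'\\]+" is itself repeated by the outer "*", so a run
# of ordinary characters can be split between iterations in many ways. If the
# closing quote is missing, a backtracking engine may try all of them.
RISKY_STRING_RE = re.compile(r"'(?:[^'\\]+|\\.)*'")

# Safer: each iteration consumes exactly one character or one escape pair,
# and the two alternatives can never start with the same character.
SAFE_STRING_RE = re.compile(r"'(?:[^'\\]|\\.)*'")


def first_string(pattern, sql):
    """Return the first single-quoted string matched by `pattern`, or None."""
    match = pattern.search(sql)
    return match.group() if match else None
```

Both patterns accept the same well-formed strings; the difference only shows up on long unterminated input, where the safer form fails after a single linear pass.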

kurtmckee (Contributor, Author) commented Aug 30, 2020 via email

birkenfeld (Member) commented:

Thanks for the huge overhaul! Not much for me to pick on, just a few clarifications...

Review thread on pygments/lexers/_mysql_builtins.py (outdated):
return results


def update_content(field_name, content):
Review comment (Member):

(note to self: we really should have a util function that does the main bulk of this self-updating...)

Reply (Contributor, Author):

Heh, I'll let you resolve this one at your leisure. 😃

kurtmckee (Contributor, Author) commented Aug 31, 2020 via email

@Anteru Anteru merged commit b3f1691 into pygments:master Sep 6, 2020
@Anteru Anteru added the changelog-update label (items which need to get mentioned in the changelog) Sep 6, 2020
kurtmckee (Contributor, Author) commented:

@Anteru please close #1063 and #1453, too. These are also fixed by this patch. Thanks!

@kurtmckee kurtmckee deleted the update-mysql branch September 6, 2020 19:35
@Anteru Anteru removed the changelog-update label (items which need to get mentioned in the changelog) Sep 8, 2020
Kenny2github pushed a commit to Kenny2github/pygments that referenced this pull request Sep 22, 2020
* Overhaul the MySQL lexer

* Cleanup items based on feedback

* Remove an unnecessary optional newline lookahead for single-line comments
Development

Successfully merging this pull request may close these issues.

CHARACTER SET isn't recognized on MySQL lexer
3 participants