Skip to content
This repository has been archived by the owner on Jun 11, 2021. It is now read-only.

ADD parsing option for column names containing spaces #35

Merged
merged 4 commits into from
Nov 13, 2018

Conversation

mknorps
Copy link
Contributor

@mknorps mknorps commented Nov 9, 2018

Hi,
I encountered a problem when parsing SQL queries with column names that contained special characters like space. In MySql (I use mariaDB) such columns are put into backticks, so a valid query yields:

SELECT `user ID`
FROM a

I added a test and parsing option to the sql_parse module.

Please merge it if you find it useful.

requirements.txt Outdated
@@ -1,2 +1,3 @@
mo-future
pyparsing
six
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not see where six is being used. Did I miss something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is used in formatting.py, line 16

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh! good catch! Please remove the six from requirements.txt and add it to tests/requirements.txt

@klahnakoski
Copy link
Contributor

Please add a test to https://github.com/mozilla/moz-sql-parser/blob/dev/tests/test_simple.py that confirms the backtick string is parsed properly. I will look into you other test to see why it appears to misbehave.

Thank you!


def test_191(self):
expected_sql = "SELECT `user ID` FROM a"
expected_json = {'select': {'value': '`user ID`'}, 'from': 'a'}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not see the backticks in the expected json. Maybe there is a bigger problem in verify_formatting that is letting this test pass

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I keep this test?
The other option would be, as you mentioned, modify escape function to use non-ansi quotes (`) for specific column names.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test must change. Change expected_json to not include the backticks:

expected_json = {'select': {'value': 'user ID'}, 'from': 'a'}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, forget my last message, and remove this test. The formatting.py generates double quoted column names

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I marked test as skipped, so limits of formatting.py are known.

@klahnakoski
Copy link
Contributor

OK, I think I see part of the problem:

def escape(identifier, ansi_quotes, should_quote):

The escape function does not recognize the backticks, and is wrapping the identifier in double quotes. the next question is why are the backticks still there.

@klahnakoski
Copy link
Contributor

I made this test (you can add it on your end, and push them to your branch so it becomes part of this PR)

def test_backtick(self):
    result = parse("SELECT `user ID` FROM a")
    expected = {'select': {'value': 'user ID'}, 'from': 'a'}
    self.assertEqual(result, expected)

which I see a parsing problem at

https://github.com/mozilla/moz-sql-parser/blob/b30c7c9832dc605d8951646c3c53bf0bb778e7ab/moz_sql_parser/sql_parser.py#L225..L226

Please verify how a backtick in such a string is escaped; does it use backslash (\` ) or double backtick (``)?

expected = {'select': {'value': 'user ID'}, 'from': 'a'}
self.assertEqual(result, expected)

def test_backticki_escape(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

excellent! I hoped you would add his type of test.

@@ -222,6 +222,8 @@ def unquote(instring, tokensStart, retTokens):
elif val.startswith('"') and val.endswith('"'):
val = '"'+val[1:-1].replace('""', '\\"')+'"'
# val = val.replace(".", "\\.")
elif val.startswith('`') and val.endswith('`'):
val = "'" + val[1:-1].replace("''", "\\'").replace("``","`") + "'"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please remove the .replace("''", "\\'") from this line. As far as I can tell, sequential single quotes ('') are not converted to a single quote inside a backtick string.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, done in current commit

ADD handling for 'LEFT JOIN' in parser
ADD test for 'LEFT JOIN' handling
SKIP test in format_and_parse for backticks and 'LEFT JOIN'
@mknorps
Copy link
Contributor Author

mknorps commented Nov 13, 2018

Using the package longer I found one more missing element - LEFT JOIN.

The addition is so small, I pushed it in the same PR.
Tests are updated.

Than you very much for creating this package. It is very useful and compact.

@klahnakoski klahnakoski merged commit b39fbee into mozilla:dev Nov 13, 2018
@@ -1080,3 +1080,19 @@ def test_190(self):
expected_sql = "SELECT typeof(sum(a3)) FROM a GROUP BY a1"
expected_json = {'from': 'a', 'select': {'value': {'typeof': {'sum': 'a3'}}}, 'groupby': {'value': 'a1'}}
self.verify_formatting(expected_sql, expected_json)

@skip("The escape function does not recognize the backticks,"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea to skip the tests! If someone works on a MySQL formatter, they have something to start with.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants