Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: sql method doesn't work when the query uses LIMIT clause #1903

Merged
merged 4 commits into from
Aug 8, 2019

Conversation

mpeaton
Copy link
Contributor

@mpeaton mpeaton commented Jul 30, 2019

Currently, SQL expressions which contain a LIMIT statement trigger an exception from the Omnisci Validator, as 'LIMIT 1' is used in the get_schema_using_query routine.
Exception: Sort node not supported as input to another sort

This PR introduced a BUGFIX which utilizes the built in mapd sql_validate routine, and associated thrift type mappings to build the schema via direct reference, without the one line query.

UPDATE:

  • added regex to remove ;
  • added tests using raw sql queries
  • set option 'default_limit' to None to avoid default LIMIT 10000

@xmnlab xmnlab changed the title BUGFIX: Use sql_validate to retrieve schema instead of single line query BUG: Use sql_validate to retrieve schema instead of single line query Aug 1, 2019
Copy link
Member

@cpcloud cpcloud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a test?

@@ -448,6 +471,12 @@ def _get_schema_using_query(self, query):

return sch.Schema(names, ibis_types)

def _get_schema_using_validator(self, query):
result = self.con._client.sql_validate(self.con._session, query)
return sch.Schema.from_dict({r: MapDDataType._mapd_to_ibis_dtypes[
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use sch.Schema.from_tuples instead of from_dict so that order of columns is preserved.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Working testing. I'm still a little uncertain about how ibis testing is dispatched, and what specific testing criteria would be appropriate for this change.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mpeaton maybe you can improve this https://github.com/ibis-project/ibis/blob/master/ibis/tests/all/test_client.py#L69

also if you want to add tests that could break in another backend .. you can also add tests for sql at https://github.com/ibis-project/ibis/blob/master/ibis/mapd/tests/test_client.py

if you want any help related to tests ping me :)

Copy link
Contributor Author

@mpeaton mpeaton Aug 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After some deliberation, it seems that _get_schema already has test coverage within ibis/tests/all/test_client.py::test_query_schema. I was thinking that finer granularity testing would entail testing raw SQL strings against sql_validate, and would be somewhat into the weeds.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this particular PR at least...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure raw SQLs would be much better!

@cpcloud
Copy link
Member

cpcloud commented Aug 1, 2019

@mpeaton Does this also supersede #1902? If so, please close that one.

@cpcloud cpcloud added this to the Next Feature Release milestone Aug 1, 2019
@cpcloud cpcloud added bug Incorrect behavior inside of ibis omnisci labels Aug 1, 2019
ibis/mapd/client.py Outdated Show resolved Hide resolved
Copy link
Contributor

@xmnlab xmnlab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good @mpeaton !

I think we could change this PR from BUG to SUPP as self.con._client.sql_validate is a better implementation than limit 1 approach, and we can treat limit clause and ; and sql comment issues in another PR.

just check the comment related to _TDatumType_enum

current we are using the datumtype from pymapd:

if you need a dictionary .. maybe you can use it from here: https://github.com/omnisci/pymapd/blob/master/omnisci/common/ttypes.py#L34

@xmnlab
Copy link
Contributor

xmnlab commented Aug 6, 2019

@mpeaton it seems you merged master into your branch, is it right?
could you rebase your code? maybe also squash your code?
because now it show 83 files changed .. and I know you change just 3 or 4 files ... for some reason git are confused to compare your changes now.

@xmnlab
Copy link
Contributor

xmnlab commented Aug 6, 2019

@mpeaton

I am doing the rebase here .. and I will work to finish this PR.
thanks for having work on this, sql_validate does the trick here! thanks!

@xmnlab xmnlab changed the title BUG: Use sql_validate to retrieve schema instead of single line query SUPP: Use sql_validate to retrieve schema instead of single line query with LIMIT clause Aug 6, 2019
@xmnlab xmnlab changed the title SUPP: Use sql_validate to retrieve schema instead of single line query with LIMIT clause BUG: sql method doesn't work when the query uses LIMIT clause Aug 6, 2019
@xmnlab
Copy link
Contributor

xmnlab commented Aug 6, 2019

  • added regex to remove ;
  • added tests using raw sql queries
  • set option 'default_limit' to None to avoid default `LIMIT 10000

@xmnlab xmnlab self-requested a review August 6, 2019 22:13
@xmnlab xmnlab requested a review from cpcloud August 7, 2019 13:40
@xmnlab
Copy link
Contributor

xmnlab commented Aug 7, 2019

@cpcloud it seems it is done for a new review! :) thanks!

sys.version_info < (3, 6),
reason='`sql` method just available for Python 3.6 or greater.',
)
def test_sql(con, sql):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do these tests actually fail without the regex munging?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes ... also for another backend if I recall correctly .. that is why I needed to move these tests from all to mapd

Copy link
Contributor

@xmnlab xmnlab Aug 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://dev.azure.com/ibis-project/ibis/_build/results?buildId=1286&view=logs

here you can see that these tests failed for impala too (LinuxTest py37)

@xmnlab
Copy link
Contributor

xmnlab commented Aug 8, 2019

@cpcloud any thoughts about this PR?

Copy link
Member

@cpcloud cpcloud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@cpcloud cpcloud merged commit 5e8f1e4 into ibis-project:master Aug 8, 2019
@cpcloud cpcloud modified the milestones: Next Feature Release, Next Major Release Aug 26, 2019
@xmnlab xmnlab deleted the validator branch December 21, 2020 18:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Incorrect behavior inside of ibis
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants