Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CLN: Removing Query class #2789

Merged
merged 34 commits into from Jun 3, 2021

Conversation

datapythonista
Copy link
Contributor

The Query class and subclasses have a tricky implementation, and are causing problems in #2777.

The class SQLClient instantiates the Query class, passing itself as a parameter, and then calls it's execute method, which will call the client _execute method. This seems nonsense, and just implementing the functionality of Query (which is quite simple) in SQLClient seems to be much more reasonable.

@datapythonista datapythonista added the refactor Issues or PRs related to refactoring the codebase label May 24, 2021
@datapythonista datapythonista changed the title WIP: Removing Query class CLN: Removing Query class Jun 2, 2021
@datapythonista
Copy link
Contributor Author

@jreback this is finally passing the tests, if you don't mind having a look.

@jreback
Copy link
Contributor

jreback commented Jun 3, 2021

yeah looks good. assume this has no effect on perf anywhere?

prob worth adding a release note (even though in theory users shouldn't be affected)

@datapythonista
Copy link
Contributor Author

Added the release note. I don't see how this could affect performance (the code is exactly the same, just not using Query but methods of Client instead).

@icexelloss
Copy link
Contributor

LGTM but I am little concerned about the implications of this change on other "Query" related class such as QueryBuilder, QueryPipeline etc.

Do those classes continue to work with the removal of the Query class? Or perhaps those are not used as well?

@icexelloss
Copy link
Contributor

I am not familiar with the sql backends so maybe they are not related as all (only share the same word query in the class name)

@datapythonista
Copy link
Contributor Author

LGTM but I am little concerned about the implications of this change on other "Query" related class such as QueryBuilder, QueryPipeline etc.

Do those classes continue to work with the removal of the Query class? Or perhaps those are not used as well?

QueryBuilder is the core class of SQL backends. It's about building the SQL, while Query was independent to it, and called directly from the Client, once a SQL string was available (whether because it was written manually or generated from an Ibis expression). QueryPipeline was an empty class, never used, that was a placeholder for a future idea, but never implemented, so it's removed here.

Not sure what's the concern regarding those classes, but to me this PR is a quite important milestone. This code was generating problems when trying to clean up things, and also was very confusing (for no reason), as explained in the description. Also, based on your comment, I think it helps also to not confuse things with the other unrelated concepts, since Query here was more executing processing a cursor and generating a pandas dataframe, and in other places query refers to building a SQL.

Just to note, this is not a final stage of things. This is important since Query was very tricky, and very poorly decoupled from the Client, since it was supposed to do things, that then it was expecting the client to have certain non-obvious methods that it was calling. Making it a nightmare of hidden shortcuts. After this is merged, all the much bigger mess of the QueryBuilder is fixed, and lots of duplicate code in the backends is removed, it'll be quite easy to discuss specific changes on how to structure things and make the changes. For now IMO this is just needed in order to be able to have a decent code that can be maintained.

@datapythonista
Copy link
Contributor Author

@icexelloss let me know if my previous comment makes sense, or if you've got any other concern. Would be nice to merge this soon, as it's blocking me from the working in the other PRs I've got open. Thanks!

@jreback jreback added this to the Next release milestone Jun 3, 2021
@icexelloss
Copy link
Contributor

Sounds good to me! +1

@jreback jreback merged commit 358c369 into ibis-project:master Jun 3, 2021
@jreback
Copy link
Contributor

jreback commented Jun 3, 2021

thanks @datapythonista very nice

@emilyreff7
Copy link
Contributor

@datapythonista removing the Query class breaks this usage in ibis-bigquery:

https://github.com/ibis-project/ibis-bigquery/blob/main/ibis_bigquery/client.py#L174

I haven't taken a detailed look at this refactor PR yet but do you see a way to refactor this usage as well?

@datapythonista
Copy link
Contributor Author

@datapythonista removing the Query class breaks this usage in ibis-bigquery:

https://github.com/ibis-project/ibis-bigquery/blob/main/ibis_bigquery/client.py#L174

I haven't taken a detailed look at this refactor PR yet but do you see a way to refactor this usage as well?

Unfortunately there is no way to make these improvements without breaking the backends. It's not only bigquery, all them are affected. For the ones in this repo the PR updates them to the new way of implementing things. For the rest (bigquery, omnisci, mssql...) it'll need to be done before the next release. The changes required are the same as in this PR, if you check clickhouse for example, you'll see what needs to be done. It's quite small.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
refactor Issues or PRs related to refactoring the codebase
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants