Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add QueryInterceptor to presto-jdbc #15565

Merged
merged 1 commit into from
Jan 15, 2021

Conversation

robbypete
Copy link
Contributor

@robbypete robbypete commented Dec 28, 2020

Introduce chainable query interceptors to Presto JDBC. QueryInterceptors can modify the behavior of a query or perform side effects. QueryInterceptors' preProcess() and postProcess() methods are run before and after the query is executed in the server, respectively. If more than one QueryInterceptor classname is provided separated by a semicolon, each will be called one-by-one from left to right in the classnames provided in the connection string. Both methods can optionally return a PrestoResultSet to override the ResultSet returned by the query.

Addresses issue #15142

Test plan - Run new tests in TestQueryInterceptor, TestPrestoDriverUri, and TestJdbcConnection

== RELEASE NOTES ==

JDBC changes
* Add the interface `QueryInterceptor` to allow for custom logic to be executed before or after query execution. (:pr:`15565`)

@linux-foundation-easycla
Copy link

linux-foundation-easycla bot commented Dec 28, 2020

CLA Signed

The committers are authorized under a signed CLA.

  • ✅ Rob Peterson (50f4347d6d1ed3c9fbbf45cbaf25ac6b28aadcf7)

Copy link
Contributor

@tdcmeehan tdcmeehan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some nits

@robbypete robbypete changed the title [WIP] Add QueryInterceptor to presto-jdbc Add QueryInterceptor to presto-jdbc Dec 30, 2020
Copy link
Contributor

@mayankgarg1990 mayankgarg1990 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I spent a bit more time thinking about the basic design of this PR and I have some concerns over how the implementation is done.

First of all - I am guessing that you have a basic motive in mind that you want to solve - can you share that so that I can make sure that my further comments considers that.

In the existing design

  1. In the pre-step we are delegating the execution of the query to the interceptor. This generally involves the interceptor calling connection.getStatement().execute() and this in turn invokes the whole chain of interceptors again internally and then you again call them externally. In certain situations, you can get into an infinite loop as a result of this - for example - in your testMultiPreProcess if we change the second processor to change the execution to convert 456 to 123 which is the opposite of the first one - I would generally just expect this to nullify the effect of first one, rather than getting into an infinite loop.

  2. Something similar in the post step - someone can potentially just discard the results and completely re-run the query.

Depending on the use case you are trying to solve, can we start simple - maybe just modify the sql or some of the parameters of the query like SESSION_PROPERTIES ?

@robbypete
Copy link
Contributor Author

The use case that I specifically have for a QueryInterceptor is pretty narrow (which I think you picked up on): I want to be able to modify the sql in the client before it's sent the server. In #15142 Tim suggested to look into MySQL's implementation of the StatementInterceptor as inspiration and I thought their implementation of an interceptor is a good way for me to provide a more generic solution rather than just a sql filter for my own use case.

Depending on the use case you are trying to solve, can we start simple

There are other use cases which others are hoping to use a QueryInterceptor for which involve being able to perform side effects before/after query execution related to privacy and security for Tableau (and other corp apps using Presto). Starting simple might get complicated if we have to modify an interface down the road to expand on the interceptor concept as more use cases are developed.

Admittedly, these use other use cases I'm trying to cover don't involve reusing the connection like I am in my tests so maybe a good path forward is to rethink how preProcess should act, like maybe it could act as a filter on the statement instead of allowing a ResultSet to be returned.

this in turn invokes the whole chain of interceptors again internally and then you again call them externally. In certain situations, you can get into an infinite loop as a result of this

My thinking behind this design is that this is intended - I am reusing the same connection in the QueryInterceptor and all queries executed using that connection are treated the same. MySQL deals with this by using a executeTopLevelOnly flag in the QueryInteceptor interface where a QueryInterceptor can be marked to only run in the top-level query and not in queries issued by QueryInterceptors. We could either introduce that flag or, maybe better, make all QueryInterceptors top-level by default and instead make the flag signal whether a QueryInterceptor should run on all queries.

Something similar in the post step - someone can potentially just discard the results and completely re-run the query.

This is intended. I think allowing for the modification of the response is in-line with what someone would expect from an Interceptor/Filter.

All that being said I'm open to ideas to making this fit in better and not introduce anything unexpected.

@robbypete
Copy link
Contributor Author

Added flag to QueryInterceptor signaling if the QueryInterceptor should intercept nested queries i.e. queries executed by a QueryInterceptor.

@robbypete
Copy link
Contributor Author

robbypete commented Jan 8, 2021

I did some refactoring and wrote more tests.

I uncovered an issue after writing the test testForceNestedIntercept() which is currently failing. If an interceptor has forceNestedIntercept() return true and the interceptor returns a PrestoResultSet in each postProcess call (edge case, for sure) then checkArgument in addInnerResultSet() throws (in PrestoResultSet).

A few ideas I have in mind are to either restrict queries when forceNestedIntercept() is true, restrict queries when the statementDepth counter is above 2, or to change how we handle inner ResultSets.

@tdcmeehan
Copy link
Contributor

My preference would be to remove forceNestedIntercept and have this always be disabled. If we need to introduce it later, it's a backwards compatible change, and given that it's hard to think of non-esoteric use cases for it, it seems like skirting the issue for now is the quickest and most straightforward path.

@robbypete
Copy link
Contributor Author

Removed the forceNestedIntercept flag and changed logic so that only top-level queries are intercepted.

@mayankgarg1990
Copy link
Contributor

My apologies for the delayed response - due to internal beginning of year processes I am a bit swamped - I will get to this PR early next week

Copy link
Contributor

@tdcmeehan tdcmeehan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly LGTM. Mainly I think the statement depth state should go to the statement (see below).

@robbypete
Copy link
Contributor Author

Moved the statementDepth into the Statement and removed the Connection parameter from the init.

Copy link
Contributor

@tdcmeehan tdcmeehan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it looks like we now don't take in a Connection in the init method, this trivially avoids the problem of overlapping interceptors. This forces one to use the Statement to execute a new query.

I think this is fine, however in order to better future proof this feature, I think we need to plug a hole in our current PrestoResultSet. According to the JavaDoc for Statement:

By default, only one ResultSet object per Statement object can be open at the same time. Therefore, if the reading of one ResultSet object is interleaved with the reading of another, each must have been generated by different Statement objects. All execution methods in the Statement interface implicitly close a statement's current ResultSet object if an open one exists.

There's similar language in the ResultSet JavaDoc: https://docs.oracle.com/javase/6/docs/api/java/sql/ResultSet.html#close()

Right now, an execution method will override the currentResultSet in the Statement, but not close the old one. I think we need to also make sure the old result set is closed first. This will alter the behavior for postProcess. For example, one may no longer be able to "zip" two ResultSets (the first ResultSet will automatically be closed once we obtain the second ResultSet), but it's better to enforce this now before backwards incompatible behavior is added in the client.

    private void clearCurrentResults()
            throws SQLException
    {
        PrestoResultSet rs = currentResult.getAndSet(null);
        if (rs != null) {
            rs.close();
        }
        currentUpdateCount.set(-1);
        currentUpdateType.set(null);
        currentWarningsManager.set(Optional.empty());
    }

@robbypete
Copy link
Contributor Author

Applied suggested changes to BaseQueryInterceptor.java and clearCurrentResults()

Copy link
Contributor

@mayankgarg1990 mayankgarg1990 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall the PR looks reasonable to me know. Thanks for addressing my concerns :) I do have a couple of comments around the code styling.

I have one thought that I want to discuss which is when is close() called on PrestoResultSet. The reason this close() is important is that it is linked to the execution of the underlying query and it stops the query if close is invoked. As of now the close is invoked in 2 locations -

  1. There is a chaining of result sets happening via the innerResultSet parameter in PrestoResultSet and the close is invoked in a chaining fashion when it is invoked on the last result set.
  2. Close is called in clearCurrentResults which is called every time executeInternal is invoked. This should happen whenever one of the interceptor decides to invoke execute on the PrestoStatement object

Given that we have (2), do you feel that we need (1). One situation that we need (1) for sure is when we don't reuse Statement that is passed in rather just create a new JDBC connection and return a ResultSet created from that statement - but not sure if we are trying to optimize for that situation.

@robbypete
Copy link
Contributor Author

I think you're right about innerResultSet and it can be removed since we're constraining the interceptors to the same Statement.

@mayankgarg1990
Copy link
Contributor

Also - we should fix the release note to be more descriptive

  1. Remove the General changes section
  2. Rewrite it maybe as
JDBC changes
* Add the interface `QueryInterceptor` to allow for custom logic to be executed before or after query execution. (:pr:`15565`)

@robbypete
Copy link
Contributor Author

Applied suggested changes.

  • Updated internalExecute() so that ResultSets from top level queries are set in currentResult before calling invokeQueryInterceptorsPost which allowed me to remove innerResultSet logic
  • Removed the BaseQueryInterceptor and put default methods in the interface
  • Moved the destroy() calls so that it would still be called after failed ROLLBACK
  • Updated java docs
  • Changed the return type of preProcess and postProcess to be Optional<PrestoResultSet>
  • and more smaller changes

Copy link
Contributor

@mayankgarg1990 mayankgarg1990 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me - last few comments more around code suggestions - so good to go from my side

@robbypete
Copy link
Contributor Author

Applied changes. Noticed that, because we got rid of the BaseQueryInterceptor, I had to update testUriWithQueryInterceptors in TestPrestoDriverUri and add a class that it could use in the test.

Copy link
Contributor

@tdcmeehan tdcmeehan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@tdcmeehan tdcmeehan merged commit 1d78838 into prestodb:master Jan 15, 2021
@caithagoras caithagoras mentioned this pull request Feb 2, 2021
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants