Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance: Two database queries are executed for each invocation of query() #14

Open
jprafael opened this issue Apr 20, 2015 · 7 comments

Comments

@jprafael
Copy link

When invoking the AbstractDatabaseEngine#query(Expression expr) method two database queries are executed.

The first call is invoked with the following call stack:

AbstractDatabaseEngine#checkConnection()
AbstractDatabaseEngine#getConnection()
AbstractDatabaseEngine#iterator(String query, int fetchSize)

This query is only used to check the aliveness of the connection. Because of this call each query() incurs in twice the round-trip time. It is also somewhat useless since it doesn't guarantee that the intended operation will succeed because the connection might be dropped meanwhile.

@diogoeag
Copy link
Member

The idea behind this check is that the probability of a failure during the execution of a query (not only iterators) should be very similar to the probability of the failure of the check connection. With this we could retry the connection before starting executing the desired operation.

It is true that the rtt is doubled, the question is that if that impacts for real the performance of your application

@rpvilao
Copy link
Collaborator

rpvilao commented Apr 21, 2015

For me it seems that makes sense to add an option to control whether you
want or not for that check to be performed. That might some scenarios where
reconnection is not needed and it's not important to recover.

On 21 April 2015 at 12:11, diogoeag notifications@github.com wrote:

The idea behind this check is that the probability of a failure during the
execution of a query (not only iterators) should be very similar to the
probability of the failure of the check connection. With this we could
retry the connection before starting executing the desired operation.

It is true that the rtt is doubled, the question is that if that impacts
for real the performance of your application


Reply to this email directly or view it on GitHub
#14 (comment).

@defer
Copy link
Collaborator

defer commented Apr 21, 2015

I think it could be optimistic, try the query. If it fails, just do the check, retries and try again.

Then again if the rtt is a problem, there's probably something wrong with the client design anyway because it should be either batching for writes or reading asynchronously while other work completes.

@jprafael
Copy link
Author

@defer, @diogoeag PDB has no (truly) asynchronous methods. We are mimicking this using a separate thread-pool. But note that query() is not a cheap call. Our profiling shows that on a total of 16 seconds spent inside the checkConnection() function, only 4s are spent blocking for IO. The remaining 12 seconds are spent on the driver's internal functions and will keep a CPU thread busy. This prevents us from simply raising the number of processing threads until it is no longer a problem as it would degrade the performance on other components.

More importantly, the response latency is doubled which is a problem when PDB is used in a realtime environment. In this scenario batching is also sub-optimal because you either use very small batches and the RTT becomes a problem, or you use larger batches and the batch fill time itself becomes a problem.

I agree we should be optimistic and use the available connection. I would actually go a bit further and say that reconnects / retries should not be the responsibility of query() but of its caller (which might have specialized recovery/failure behavior).

@diogoeag
Copy link
Member

@jprafael these 16s are during what period of profilling? If its 20seconds its one thing, if its one day, I wouldn't care about it. Note that some of our HA capabilities (the ones related to database connections) are strictly tied to the feature of retry / recover of PDB. That will not change for now.

Can you elaborate on the real weight of this on the overall process?

@defer
Copy link
Collaborator

defer commented Apr 21, 2015

I'm not entirely sure you should be depending on pdb for latency sensitive workloads. However, if completion of the task is not a requirement for the caller to finish then yeah, either pdb or the client should behave asynchronously.

That being said, 12s of internal work seems odd and should probably be looked at. Do you have more granular data on what is taking that long?

@jprafael
Copy link
Author

@diogoeag the JVM was running for 8 mins but the application was running only on the last 4 mins (wall time) of profiling on the entire application. Total thread time amounts to 15000s but is mostly irrelevant since most threads are blocking/waiting. Total executing time of the application however is 914s. So this amounts to roughly 1.64% of the application time.

@defer It is mostly oracle.jdbc.driver.* related. I can upload an HTML export of the measurements of this particular function. Where should I send it to?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants