Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve download speed of records by the CLI and jdbc client #965

Open
wants to merge 2 commits into
base: master
from

Conversation

5 participants
@ShashwatArghode
Copy link

commented Jun 12, 2019

The sequential read by the client makes the download really slow. Split it into two threads, with one thread downloading the data and buffering while the other thread renders it to stdout (CLI) or returns it to the user (jdbc)

@cla-bot

This comment has been minimized.

Copy link

commented Jun 12, 2019

Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: Shashwat Arghode.
This is most likely caused by a git client misconfiguration; please make sure to:

  1. check if your git client is configured with an email to sign commits git config --list | grep email
  2. If not, set it up using git config --global user.email email@example.com
  3. Make sure that the git commit email is configured in your GitHub account settings, see https://github.com/settings/emails

@ShashwatArghode ShashwatArghode force-pushed the ShashwatArghode:downloadSpeedPatch branch from 646b5de to 2afb7b1 Jun 12, 2019

@cla-bot cla-bot bot added the cla-signed label Jun 12, 2019

Improve download speed of records by the CLI and jdbc client
The sequential read by the client makes the download really slow. Split it into two threads, with one thread downloading the data and buffering while the other thread renders it to stdout (CLI) or returns it to the user (jdbc)

@ShashwatArghode ShashwatArghode force-pushed the ShashwatArghode:downloadSpeedPatch branch from 53ad4a1 to 5b0d28f Jun 12, 2019

@@ -83,6 +83,64 @@ public void finish()
writer.flush();

This comment has been minimized.

Copy link
@kokosing

kokosing Jun 12, 2019

Member

can you please tell what improvements you get?

This comment has been minimized.

Copy link
@ShashwatArghode

ShashwatArghode Jun 14, 2019

Author

Following is the average time taken by the query based on 3-4 runs on the data of ~160MB and 3000000 rows
CLI performance:
Before changes: 86 sec
After changes: 62 sec

JDBC performance:
Before changes: 106 sec
After changes: 90 sec

The average improvement in CLI is 28% and JDBC is 15%.

@electrum
Copy link
Member

left a comment

Thanks for contributing this feature. It will be useful to many people.

I added a few high level comments. Can you split the CLI and JDBC changes into separate commits, or even separate pull requests if that's easier? They are logically separate, so that will make them easier to review.

Show resolved Hide resolved presto-jdbc/src/main/java/io/prestosql/jdbc/PrestoResultSet.java Outdated
Show resolved Hide resolved presto-jdbc/src/main/java/io/prestosql/jdbc/PrestoResultSet.java
Show resolved Hide resolved presto-jdbc/src/main/java/io/prestosql/jdbc/PrestoStatement.java Outdated

public final class OutputHandler
implements Closeable
{
private static final Duration MAX_BUFFER_TIME = new Duration(3, SECONDS);

This comment has been minimized.

Copy link
@electrum

electrum Jun 12, 2019

Member

AlignedTablePrinter prints a fixed width table, where it uses the maximum width of a cell in a batch to determine the width of the table. The goal of this code is to provide large batches of rows to the printer so that the width doesn't change every few rows. It also has an upper time limit for each batch in case the query is returning data slowly (which can happen for very selective queries).

This comment has been minimized.

Copy link
@ShashwatArghode

ShashwatArghode Jun 14, 2019

Author

Keeping the numElements parameter in Guava's Queues.drain() API 100. So, we will either wait for 100 records or 3 seconds before draining the queue if the query is not finished.

Show resolved Hide resolved presto-cli/src/main/java/io/prestosql/cli/OutputHandler.java Outdated
@ShashwatArghode

This comment has been minimized.

Copy link
Author

commented Jun 14, 2019

Thanks for contributing this feature. It will be useful to many people.

I added a few high level comments. Can you split the CLI and JDBC changes into separate commits, or even separate pull requests if that's easier? They are logically separate, so that will make them easier to review.

Keeping the CLI and JDBC changes in the same PR as the changes are identical and after incorporating review comments they are impacting fewer files.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.