
Optimize overlap checking for external file ingestion #3564

Closed
wants to merge 1 commit into from

huachaohuang
Contributor

If there are many overlapping files in L0, creating a merging iterator over
all L0 files to check for overlap can be very slow, because we need to read and
seek every file in L0. In that case, however, the ingested file is likely to
overlap with at least one of the L0 files, so if we check the files one by one,
we can stop as soon as we find an overlap.

Ref: #3540
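The two strategies in the description can be sketched as follows. This is a simplified Python model, not RocksDB's C++ code: each L0 file is a hypothetical `L0File` holding sorted string keys, and a hypothetical `SeekCounter` counts per-file seeks as a rough proxy for read cost. `overlaps_merged` mimics positioning a merging iterator (every file is sought before the smallest key is known), while `overlaps_one_by_one` returns at the first file that has a key inside the ingested range.

```python
from bisect import bisect_left
from dataclasses import dataclass


@dataclass
class L0File:
    """Hypothetical stand-in for an L0 SST file: its sorted user keys."""
    keys: list[str]


@dataclass
class SeekCounter:
    """Counts per-file seeks, the expensive step when files must be read."""
    seeks: int = 0


def seek(f: L0File, target: str, counter: SeekCounter):
    """Return the first key >= target in f, or None if there is none."""
    counter.seeks += 1
    i = bisect_left(f.keys, target)
    return f.keys[i] if i < len(f.keys) else None


def overlaps_merged(l0, smallest, largest, counter):
    """Before: seek every L0 file, as positioning a merging iterator does,
    then check whether the smallest key found is <= largest."""
    first_keys = []
    for f in l0:
        k = seek(f, smallest, counter)
        if k is not None:
            first_keys.append(k)
    return bool(first_keys) and min(first_keys) <= largest


def overlaps_one_by_one(l0, smallest, largest, counter):
    """After: check files one at a time and stop at the first overlap."""
    for f in l0:
        k = seek(f, smallest, counter)
        if k is not None and k <= largest:
            return True  # early exit: the remaining files are never read
    return False


# Usage: the first L0 file already overlaps the ingested range ["c", "d"].
l0 = [L0File(["b", "d", "f"]), L0File(["a", "c", "e"]), L0File(["g", "h"])]
merged, one_by_one = SeekCounter(), SeekCounter()
print(overlaps_merged(l0, "c", "d", merged), merged.seeks)  # True 3
print(overlaps_one_by_one(l0, "c", "d", one_by_one), one_by_one.seeks)  # True 1
```

Both checks give the same answer, but the one-by-one version stops after a single seek here. When nothing overlaps, both strategies still seek every file, so in this model the gain comes entirely from the overlapping case that the description targets.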

@huachaohuang
Contributor Author

I have run some tests to compare the two approaches. In each test, 8 batches of files are ingested into a DB. Files within a batch are sorted and do not overlap with each other, while files in different batches overlap with each other.
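One key layout that satisfies these constraints can be generated as below. It is a hypothetical sketch, not the author's actual test harness: `make_batches`, `ranges_overlap`, and their parameters are illustrative. File `i` of batch `b` holds integer keys `base + b + num_batches * j`, so files within a batch occupy disjoint, increasing ranges, while each file overlaps its counterpart in every other batch. Keys stay unique across batches, and `keys_per_file` must be at least 2 for the cross-batch overlap to hold.

```python
def make_batches(num_batches=8, files_per_batch=4, keys_per_file=5):
    """Hypothetical layout: batches interleave keys over the same ranges."""
    batches = []
    for b in range(num_batches):
        batch = []
        for i in range(files_per_batch):
            # Each file index i owns a disjoint block of keys_per_file * num_batches keys.
            base = i * keys_per_file * num_batches
            # Batch b takes every num_batches-th key in the block, starting at offset b.
            batch.append([base + b + num_batches * j for j in range(keys_per_file)])
        batches.append(batch)
    return batches


def ranges_overlap(a, b):
    """True if the [min, max] key ranges of two sorted files intersect."""
    return a[0] <= b[-1] and b[0] <= a[-1]


batches = make_batches()
print(batches[0][0], batches[1][0])  # [0, 8, 16, 24, 32] [1, 9, 17, 25, 33]
```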

Here are the average ingestion times and the flame graphs.

  1. Without optimization:
    [Figure: average ingestion time, without optimization]
    [Figure: flame graph, without optimization]

  2. With the optimization in this PR:
    [Figure: average ingestion time, with optimization]
    [Figure: flame graph, with optimization]

Without the optimization, the average ingestion time increases significantly after the 6th batch, which is when the test starts ingesting many files into L0.

@facebook-github-bot facebook-github-bot left a comment

@siying has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@anand1976 anand1976 left a comment

Travis TEST_GROUP 2 is failing because it ran out of disk space. Could you rebase and run that test again?

@facebook-github-bot
@huachaohuang has updated the pull request.

@huachaohuang
Contributor Author

@anand1976 Done, PTAL

@anand1976
Contributor

lgtm

@facebook-github-bot facebook-github-bot left a comment

@anand1976 is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@huachaohuang huachaohuang deleted the optimize-sst branch March 17, 2018 04:48