Optimize overlap checking for external file ingestion #3564
Conversation
I have done some tests to show the comparison. In each test, 8 batches of files are ingested into a DB. Files within a batch are sorted and do not overlap with each other, while files in different batches overlap. Here are the average ingestion times and flame graphs. As you can see, without the optimization, the average ingestion time increases significantly after the 6th batch, when the test starts to ingest a lot of files into L0.
@siying has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Travis TEST_GROUP 2 is failing due to out-of-space. Rebase and run that test again?
If there are a lot of overlapping files in L0, creating a merging iterator over all files in L0 to check for overlap can be very slow, because we need to read and seek in every L0 file. However, in that case, the ingested file is likely to overlap with some file in L0, so if we check those files one by one, we can stop as soon as we encounter an overlap.
@huachaohuang has updated the pull request.
@anand1976 Done, PTAL
lgtm |
@anand1976 is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Ref: #3540