-
Notifications
You must be signed in to change notification settings - Fork 208
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(storage): skip flush in rowset builder if it's empty, allow empty finish
of column builders.
#492
Conversation
… `finish` of column builders. Signed-off-by: arkbriar <arkbriar@gmail.com>
@@ -99,6 +99,11 @@ impl RowsetBuilder { | |||
} | |||
|
|||
pub async fn finish_and_flush(self) -> StorageResult<()> { | |||
// Skip flush when empty. | |||
if self.row_cnt == 0 { | |||
return Ok(()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer returning an error here. The caller might expect RowSetBuilder to create a directory for the rowset. But it turns out not to be existing.
By returning an error, we can see where we flushed an empty rowset, and therefore fix the upper logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's my fault... I'll just return an error here.
One possible loc is in the compactor. I'll try to reproduce.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with that we should return an error here to notify upper logic. Just a reminder: the rowset directory was created when memory table was created, which was refactored previously. So the only difference here is the empty rowset doesn't have data, rather than directory not exist.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From my perspective, the RowsetBuilder
should not contain any I/O related logic, and the logic like creating directory and flushing data should be moved to a separate module, maybe RowsetWriter
? So that we won't need to create a directory even the rowset is still in memory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I make it panic here. No suitable error is found. I'll go deep into the original test and find out if there's something I've missed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From my perspective, the
RowsetBuilder
should not contain any I/O related logic, and the logic like creating directory and flushing data should be moved to a separate module, maybeRowsetWriter
? So that we won't need to create a directory even the rowset is still in memory.
That's something I'd like to refactor :)
In fact, all I/O is done in flush_and_finish
for RowSetBuilder. I plan to let the caller to decide whether to write these files or not, instead of doing all I/O inside the RowSetBuilder
.
We already have a tracking issue for that #110
Signed-off-by: arkbriar <arkbriar@gmail.com>
Signed-off-by: arkbriar arkbriar@gmail.com
As mentioned in #489 , the panic is caused by invoking
finish_and_flush
on an empty rowset builder, then finishes an empty column builder. Thanks @skyzh for pointing out my problem.This PR fixes these problems by allowing finishing empty column builder and skip the
finish_and_flush
of rowset builder entirely if empty.