Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(storage): skip flush in rowset builder if it's empty, allow empty finish of column builders. #492

Merged
merged 2 commits into from
Feb 17, 2022

Conversation

arkbriar
Copy link
Contributor

Signed-off-by: arkbriar arkbriar@gmail.com

As mentioned in #489 , the panic is caused by invoking finish_and_flush on an empty rowset builder, then finishes an empty column builder. Thanks @skyzh for pointing out my problem.

This PR fixes these problems by allowing finishing empty column builder and skip the finish_and_flush of rowset builder entirely if empty.

… `finish` of column builders.

Signed-off-by: arkbriar <arkbriar@gmail.com>
@@ -99,6 +99,11 @@ impl RowsetBuilder {
}

pub async fn finish_and_flush(self) -> StorageResult<()> {
// Skip flush when empty.
if self.row_cnt == 0 {
return Ok(());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer returning an error here. The caller might expect RowSetBuilder to create a directory for the rowset. But it turns out not to be existing.

By returning an error, we can see where we flushed an empty rowset, and therefore fix the upper logic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's my fault... I'll just return an error here.

One possible loc is in the compactor. I'll try to reproduce.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with that we should return an error here to notify upper logic. Just a reminder: the rowset directory was created when memory table was created, which was refactored previously. So the only difference here is the empty rowset doesn't have data, rather than directory not exist.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my perspective, the RowsetBuilder should not contain any I/O related logic, and the logic like creating directory and flushing data should be moved to a separate module, maybe RowsetWriter? So that we won't need to create a directory even the rowset is still in memory.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I make it panic here. No suitable error is found. I'll go deep into the original test and find out if there's something I've missed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my perspective, the RowsetBuilder should not contain any I/O related logic, and the logic like creating directory and flushing data should be moved to a separate module, maybe RowsetWriter? So that we won't need to create a directory even the rowset is still in memory.

That's something I'd like to refactor :)

In fact, all I/O is done in flush_and_finish for RowSetBuilder. I plan to let the caller to decide whether to write these files or not, instead of doing all I/O inside the RowSetBuilder.

We already have a tracking issue for that #110

Signed-off-by: arkbriar <arkbriar@gmail.com>
@skyzh skyzh merged commit 7b79219 into risinglightdb:main Feb 17, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants