Skip to content

Recursively merge record batches#833

Merged
changhiskhan merged 5 commits intomainfrom
changhiskhan/merge-recordbatch
May 8, 2023
Merged

Recursively merge record batches#833
changhiskhan merged 5 commits intomainfrom
changhiskhan/merge-recordbatch

Conversation

@changhiskhan
Copy link
Copy Markdown
Contributor

If we're merging two record batches with struct fields then the existing merge method is insufficient. Instead we need to check for struct fields and do a recursive merge

@changhiskhan changhiskhan requested review from eddyxu and gsilvestrin May 8, 2023 03:42
Comment thread rust/src/arrow.rs Outdated
}
}

fn merge(left_fields: Vec<FieldRef>,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does it need the ownership of these fields?

Comment thread rust/src/arrow.rs Outdated
fn merge(left_fields: Vec<FieldRef>,
left_columns: Vec<ArrayRef>,
right_fields: Vec<FieldRef>,
right_columns: Vec<ArrayRef>) -> (Vec<FieldRef>, Vec<ArrayRef>) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is Vec<FieldRef> + Vec<ArrayRef> to be a RecordBatch?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, RecordBatch constructor takes list of tuple (field, array ref)

Comment thread rust/src/arrow.rs Outdated
let left_struct_array = as_struct_array(column);
let right_struct_array = as_struct_array(right_column.as_ref());
let (merged_fields, merged_columns) = merge(
left_sub_fields.iter().cloned().collect::<Vec<_>>(),
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why need to copy the fields / array? Can it use a slice of array?

Comment thread rust/src/arrow.rs
)));
}
// otherwise, just use the field on the left hand side
_ => {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should you check left / right fields are not compabile?

Comment thread rust/src/arrow.rs Outdated

fn merge(left_fields: Vec<FieldRef>,
left_columns: Vec<ArrayRef>,
right_fields: Vec<FieldRef>,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also why separate columns and fields to two different params?

@changhiskhan changhiskhan merged commit abf5534 into main May 8, 2023
@changhiskhan changhiskhan deleted the changhiskhan/merge-recordbatch branch May 8, 2023 11:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants