Skip to content
This repository has been archived by the owner on Feb 18, 2024. It is now read-only.

Implement MutableStructArray #703

Open
hohav opened this issue Dec 23, 2021 · 3 comments
Open

Implement MutableStructArray #703

hohav opened this issue Dec 23, 2021 · 3 comments
Labels
feature A new feature

Comments

@hohav
Copy link
Contributor

hohav commented Dec 23, 2021

I'd like to work on implementing MutableStructArray. Any thoughts on how its API should look? The arrow-rs equivalent StructBuilder requires calling append on each child array, followed by append on the parent. Should we just follow the same pattern with push?

(Feel free to move this to Discussions, if that's a better place.)

@jorgecarleitao
Copy link
Owner

Hi @hohav , thanks for the suggestion. Could you describe what would be the use case?

I never seen the need for it other than the "struct of arrays" idiom. For that I would recommend this crate: https://github.com/DataEngineeringLabs/arrow2-derive - it allows deriving an arrow-compatible array of structs and use the struct directly. It has the advantage of making everything typed.

@hohav
Copy link
Contributor Author

hohav commented Dec 25, 2021

That's exactly my use case, but I can't currently use arrow2-derive because my structs are nested. And even if you added support for nested structs, I also need to omit certain fields conditionally, which would be hard to justify in a general-purpose macro.

I'd implemented this stuff in my own derive macro using arrow-rs before I came across arrow2. I was looking at porting things over more-or-less directly, but I'm open to another approach if that one's not going to be idiomatic for arrow2.

@jorgecarleitao
Copy link
Owner

Thanks a lot for the insight.

It is not that it would not be idiomatic, it is just because we need to downcast on every push. Essentially, the hot loops (i.e. when we push will be filled with conditionals branches, one per field). The equivalent typed approach would be equivalent to:

let mut f1: MutablePrimitiveArray<i32> = ...
let mut f2: MutableUtf8Array<i32> = ...

fn push(v1: i32, v2: Option<&str>) {
     f1.push(Some(v1));
     f2.push(v2);
}

Now that I think about this, we do have a use-case already also, I did not realize until now that it was the same: when reading from avro where the schema is only known at runtime, we use Vec<Box<dyn MutableArray>> over all fields to read from a row to a Vec<Arc<dyn Array>> (which is basically a MutableStructArray.

So, in summary: if the schema is known at compile time, we can derive it and avoid a MutableStructArray; otherwise, we need one. Therefore, I agree with you that this is a valid use-case.

@jorgecarleitao jorgecarleitao added the feature A new feature label Dec 30, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
feature A new feature
Projects
None yet
Development

No branches or pull requests

2 participants