Hi! Can I somehow specify `row_groups` or other loading options, choose a different data format, or give certain specs during Parquet file creation in order to speed up reading when I include the uneven array? I am really only interested in one particular row (let's say row 533 of 1000 rows) and a subset of columns for that row, and one of those columns has an uneven array in it, as I said above. Any help appreciated.
Replies: 1 comment, 6 replies
I need to follow up on this when I have time to look things up, but I can provide some pointers in the meantime.

There's another function, `ak.metadata_from_parquet`, which reads the (small) metadata of a Parquet file but not the (large) data. In this metadata, there are fields for `num_entries`, `num_row_groups`, and also row-group by row-group information about exactly which entries (rows) are in each row group.

If you have a specific entry/row to read, or a specific range, `entry_start:entry_stop`, this can be expanded to `row_group_start:row_group_stop` by rounding down the start index and rounding up the stop index. (There is no way to read one entry; row groups are the smallest granularity that can be read from a Parquet file, so you read the row groups containing your entries and then slice out what you want.)

That trimming, to produce a given entry range by reading as few row groups as possible, could be automated, but it hasn't been (yet). It would be a good feature for us to add. But for now, you can get that information for yourself from `ak.metadata_from_parquet`.

For columns, it looks like you've already found the `columns` argument.
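The "round the start index down, round the stop index up" step described above can be sketched in a few lines of pure Python. This is a sketch, not the library's API: the helper name is hypothetical, and the exact metadata field that holds per-row-group row counts may differ between Awkward Array versions, so check `ak.metadata_from_parquet`'s output for your version.

```python
import bisect
from itertools import accumulate

def entry_range_to_row_groups(row_group_counts, entry_start, entry_stop):
    """Map an entry (row) range to the smallest range of row groups
    containing it.

    row_group_counts: number of rows in each row group, taken from the
    Parquet metadata (hypothetical source; verify the field name in
    ak.metadata_from_parquet's result for your version).
    Returns (rg_start, rg_stop, offset), where offset is the global
    entry index of the first row of row group rg_start, for trimming.
    """
    # Cumulative stop-offsets of each row group, e.g. [100, 200, 300, ...]
    stops = list(accumulate(row_group_counts))
    # Round down: first row group whose stop is strictly past entry_start.
    rg_start = bisect.bisect_right(stops, entry_start)
    # Round up: row group containing the last entry, +1 for a slice stop.
    rg_stop = bisect.bisect_right(stops, entry_stop - 1) + 1
    # Global offset of the first selected row group.
    offset = stops[rg_start - 1] if rg_start > 0 else 0
    return rg_start, rg_stop, offset

# Hypothetical usage (not executed here; file name and column names
# are placeholders):
#   meta = ak.metadata_from_parquet("file.parquet")
#   rg_start, rg_stop, offset = entry_range_to_row_groups(
#       per_row_group_counts_from(meta), 533, 534)
#   arr = ak.from_parquet("file.parquet",
#                         row_groups=range(rg_start, rg_stop),
#                         columns=["x", "y"])
#   row_533 = arr[533 - offset]
```

For example, with ten row groups of 100 rows each, entry 533 falls in row group 5 (entries 500–599), so you would read only that row group and then take local index 33.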