-
Notifications
You must be signed in to change notification settings - Fork 73
Conversation
pkg/phlaredb/block_querier.go
Outdated
err := func(rg parquet.RowGroup) error { | ||
reader := parquet.NewGenericRowGroupReader[M](rg) | ||
defer runutil.CloseWithLogOnErr(util.Logger, reader, "closing parquet generic row group reader") | ||
err = func(rg parquet.RowGroup) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect we should also run this in parallel ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure. The problem is that it would cost us lots of memory, because page reader acqiures a buffer (of ReadBufferSize
size 2MB) per column chunk, and if RGs were read concurrently, the internal buffer pool would be useless.
Interestingly, I see no apparent reason why column chunks should not be read concurrently: https://github.com/segmentio/parquet-go/blob/5d42db8f0d4728c31759068f08da15df44c6cc7f/row_group.go#L320, since buffers are already allocated, it should not be this wasteful
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah column are read in parallel in that case, but I thought it would be good to process multiple rowgroup at once, not sure if it would help as you say let's see.
e475243
to
076cd28
Compare
type StoredString struct { | ||
ID uint64 `parquet:",delta"` | ||
String string `parquet:",dict"` | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't manage to find how the ID field is used
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
The change makes in-memory parquet reader to not use the stored schema for reconstruction via
reflect
, and use hand-written reconstruction instead.