Allow buffered input streams #23
Labels: acceptance: go ahead; area: performance; help wanted; type: feature
Is your feature request related to a problem? Please describe.
The current implementation reads the entire input into a string. This is not production-viable: the very large files we are targeting with all the performance improvements might not fit in memory. A first step would be to enable buffered reading, loading a single page's worth of input at a time. There are challenges here: a single logical query step can span arbitrarily many blocks, e.g. JSON labels can be arbitrarily long.
Describe the solution you'd like
First of all, the current implementations rely heavily on raw `AlignedSlice` data. This should be abstracted behind a buffered input that can yield slices on demand.
Second, the query engines need to be made aware of this. They currently rely on having all the data available so they can index into the slice and compare labels. The engines also need to tell the classifiers when it is safe to stop keeping old input blocks in memory: the entire label preceding the colon currently being examined must stay buffered, but once that label has been examined it can be discarded.
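To make the proposal more concrete, here is a rough sketch of what such a buffered input could look like. Everything in it is an assumption made for illustration: the names `BufferedInput`, `slice`, `discard_before` and `retained_blocks`, and the tiny `BLOCK_SIZE`, are hypothetical and not part of the current codebase. It also copies bytes out instead of yielding aligned slices, so it ignores the alignment guarantees that `AlignedSlice` provides.

```rust
use std::collections::VecDeque;
use std::io::{self, Read};

/// Hypothetical block size, kept tiny so the example is easy to follow.
/// A real implementation would likely use a page-sized block.
const BLOCK_SIZE: usize = 8;

/// Hypothetical buffered input: loads fixed-size blocks on demand and keeps
/// only the blocks that the engine has not released yet.
pub struct BufferedInput<R: Read> {
    source: R,
    blocks: VecDeque<Vec<u8>>, // retained blocks, oldest first
    first_block: usize,        // index (in the whole input) of blocks[0]
    eof: bool,
}

impl<R: Read> BufferedInput<R> {
    pub fn new(source: R) -> Self {
        Self { source, blocks: VecDeque::new(), first_block: 0, eof: false }
    }

    /// Number of bytes currently held in memory.
    fn buffered_len(&self) -> usize {
        self.blocks.iter().map(Vec::len).sum()
    }

    /// Reads one more block from the source; returns false at end of input.
    fn load_block(&mut self) -> io::Result<bool> {
        if self.eof {
            return Ok(false);
        }
        let mut block = vec![0u8; BLOCK_SIZE];
        let mut filled = 0;
        while filled < BLOCK_SIZE {
            let n = self.source.read(&mut block[filled..])?;
            if n == 0 {
                self.eof = true;
                break;
            }
            filled += n;
        }
        if filled == 0 {
            return Ok(false);
        }
        block.truncate(filled);
        self.blocks.push_back(block);
        Ok(true)
    }

    /// Copies out bytes `start..end` (absolute offsets), loading blocks as needed.
    /// Returns `None` if the range was already discarded or lies past the end.
    pub fn slice(&mut self, start: usize, end: usize) -> io::Result<Option<Vec<u8>>> {
        let base = self.first_block * BLOCK_SIZE;
        if start < base || start > end {
            return Ok(None);
        }
        while base + self.buffered_len() < end {
            if !self.load_block()? {
                return Ok(None);
            }
        }
        let bytes: Vec<u8> = self
            .blocks
            .iter()
            .flatten()
            .copied()
            .skip(start - base)
            .take(end - start)
            .collect();
        Ok(Some(bytes))
    }

    /// Called by the engine once everything before `offset` has been examined:
    /// drops every loaded block that lies entirely before `offset`.
    pub fn discard_before(&mut self, offset: usize) {
        while !self.blocks.is_empty() && (self.first_block + 1) * BLOCK_SIZE <= offset {
            self.blocks.pop_front();
            self.first_block += 1;
        }
    }

    /// Number of blocks currently retained in memory.
    pub fn retained_blocks(&self) -> usize {
        self.blocks.len()
    }
}
```

With the input `{"a_very_long_label":1}` and 8-byte blocks, the label spans three blocks, so all three must stay buffered until the engine has compared it. After the colon at offset 20 has been handled, `discard_before(20)` can release the first two blocks. How the engine and the classifiers would actually negotiate that offset is left open here.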