Conditional read on a fst file #30

Open
MarcusKlik opened this issue Feb 28, 2017 · 1 comment

Comments

MarcusKlik commented Feb 28, 2017

By specifying a condition on one or more columns of the stored table, data can be read using far less memory than a full read followed by a row selection. This is related to issue #15 and issue #16: data can be read through a stream object, and the selection can be applied to chunks of data rather than to the complete data set. Restrictions:

  • The condition cannot contain aggregate expressions that depend on the whole data set, e.g. median(ColA) or sum(ColA).
  • The size of the result is not known in advance, so the smaller per-chunk result sets have to be bound together (as with data.table's rbindlist). This will have an effect on performance.
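
The chunked approach and its two restrictions can be sketched roughly as follows. This is only an illustration under stated assumptions, not fst's implementation: an in-memory vector stands in for the stored column, and `Row`, `read_chunk` and `conditional_read` are hypothetical names. The condition is evaluated per row, so it cannot depend on an aggregate over the whole column, and the result grows by appending each chunk's matches because its final size is unknown up front.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical row type: original row position plus the value of one column.
struct Row {
    std::size_t index;
    double col_a;
};

// Hypothetical ranged read: returns rows [from, to) of the "stored" column.
// A real implementation would read this range from the fst file instead.
std::vector<Row> read_chunk(const std::vector<double>& stored,
                            std::size_t from, std::size_t to) {
    std::vector<Row> chunk;
    chunk.reserve(to - from);
    for (std::size_t i = from; i < to; ++i) {
        chunk.push_back({i, stored[i]});
    }
    return chunk;
}

// Read the column chunk by chunk and keep only rows that satisfy `condition`.
// Peak memory is one chunk plus the rows selected so far.
std::vector<Row> conditional_read(const std::vector<double>& stored,
                                  std::size_t chunk_size,
                                  const std::function<bool(double)>& condition) {
    std::vector<Row> result;  // final size is not known in advance
    if (chunk_size == 0) {
        return result;
    }
    for (std::size_t from = 0; from < stored.size(); from += chunk_size) {
        const std::size_t to = std::min(from + chunk_size, stored.size());
        const std::vector<Row> chunk = read_chunk(stored, from, to);
        // Bind this chunk's matches onto the result (the rbindlist-like step).
        for (const Row& row : chunk) {
            if (condition(row.col_a)) {
                result.push_back(row);
            }
        }
    }
    return result;
}
```

Calling `conditional_read(stored, 3, [](double v) { return v > 4; })` would scan the column three rows at a time and return only the rows whose value exceeds 4, in their original order.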

MarcusKlik commented

On the other hand, because we read in separate chunks anyway, a conditional read feature is well suited for a multi-threaded implementation, provided we can implement the conditional statements in C++.
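
A minimal sketch of how the per-chunk filtering might be spread over threads, assuming the condition is available as a C++ callable. The names `filter_chunk` and `parallel_conditional_read` are hypothetical, an in-memory vector again stands in for the stored column, and one task is launched per chunk purely for brevity (a real implementation would more likely use a fixed-size thread pool). Results are collected in chunk order, so the row order of the stored table is preserved.

```cpp
#include <algorithm>
#include <cstddef>
#include <future>
#include <vector>

// Evaluate the condition on rows [from, to) and return the matching row indices.
template <typename Pred>
std::vector<std::size_t> filter_chunk(const std::vector<double>& column,
                                      std::size_t from, std::size_t to,
                                      Pred pred) {
    std::vector<std::size_t> hits;
    for (std::size_t i = from; i < to; ++i) {
        if (pred(column[i])) {
            hits.push_back(i);
        }
    }
    return hits;
}

// Filter every chunk in its own asynchronous task, then bind the results.
template <typename Pred>
std::vector<std::size_t> parallel_conditional_read(const std::vector<double>& column,
                                                   std::size_t chunk_size,
                                                   Pred pred) {
    std::vector<std::size_t> result;
    if (chunk_size == 0) {
        return result;
    }
    std::vector<std::future<std::vector<std::size_t>>> tasks;
    for (std::size_t from = 0; from < column.size(); from += chunk_size) {
        const std::size_t to = std::min(from + chunk_size, column.size());
        // The column is only read, never modified, so sharing it is safe.
        tasks.push_back(std::async(std::launch::async, [&column, from, to, pred] {
            return filter_chunk(column, from, to, pred);
        }));
    }
    // Collect in launch order so the selected rows stay in table order.
    for (auto& task : tasks) {
        const std::vector<std::size_t> hits = task.get();
        result.insert(result.end(), hits.begin(), hits.end());
    }
    return result;
}
```

Because each chunk is filtered independently, no locking is needed until the final bind step, which runs on the calling thread. On some toolchains this needs to be compiled with `-pthread`.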

@MarcusKlik MarcusKlik added this to the Interface milestone Apr 16, 2017
@MarcusKlik MarcusKlik removed this from the Interface milestone Sep 7, 2020