A range can be specified with read.fst on sorted data frames #16

Open
MarcusKlik opened this issue Feb 1, 2017 · 0 comments

@MarcusKlik
Collaborator

When a sorted data set is stored as an fst binary file, sorting metadata is stored alongside the data. Using this metadata, a binary search can be performed on the key columns before any data is actually read. For example, because 4 billion rows is fewer than 2^32, a binary search needs at most 32 random seeks in the file to locate each of the begin and end values of a selected range. The performance penalty should be very small, since random seeks on modern SSDs are very fast.
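The seek count above can be checked with a small sketch. It is written in Python rather than R and is not fst's implementation. `CountingColumn`, `lower_bound`, `upper_bound` and `row_range` are hypothetical names, and an in-memory `range` stands in for a sorted key column on disk, with every element access counted as one random seek. The sketch finds the half-open row interval whose keys fall within `[begin, end]` using two independent binary searches.

```python
class CountingColumn:
    """Hypothetical stand-in for a sorted key column stored on disk.
    Every element read is counted as one random seek."""

    def __init__(self, values):
        self._values = values
        self.seeks = 0

    def __len__(self):
        return len(self._values)

    def read(self, i):
        self.seeks += 1
        return self._values[i]


def lower_bound(col, key):
    """Index of the first row whose key is >= key."""
    lo, hi = 0, len(col)
    while lo < hi:
        mid = (lo + hi) // 2
        if col.read(mid) < key:
            lo = mid + 1
        else:
            hi = mid
    return lo


def upper_bound(col, key):
    """Index of the first row whose key is > key."""
    lo, hi = 0, len(col)
    while lo < hi:
        mid = (lo + hi) // 2
        if col.read(mid) <= key:
            lo = mid + 1
        else:
            hi = mid
    return lo


def row_range(col, begin, end):
    """Half-open row range [first, last) of rows with begin <= key <= end."""
    return lower_bound(col, begin), upper_bound(col, end)


if __name__ == "__main__":
    # 4 billion sorted keys 0, 2, 4, ...; range() keeps this lazy, no data in memory.
    col = CountingColumn(range(0, 8_000_000_000, 2))
    first, last = row_range(col, 1_000_000, 2_000_000)
    print(first, last, col.seeks)  # at most 32 seeks per boundary, 64 in total
```

Only the rows between `first` and `last` would then need to be read from the file.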

fstpackage added this to the In read algorithms milestone on Feb 1, 2017
MarcusKlik removed this from the Interface milestone on Sep 7, 2020