Will fst(s) be additive #24

Open
1beb opened this issue Feb 13, 2017 · 1 comment

Comments

1beb commented Feb 13, 2017

Is it possible to append to an fst without having to load it (completely)?

Collaborator

MarcusKlik commented Feb 14, 2017

Thanks @1beb for the feature request. I'm planning to add an fst.rbind method to the next version of the fst package. This method will only need to read some metadata from the existing file, so appending will be very fast, as you requested.

Note, however, that fst uses a columnar binary file format. This means that appended data will basically be stored as a separate chunk inside the 'fst' file. When large chunks of data are appended, the impact on performance will be marginal. However, when many small chunks are added sequentially, overall performance will suffer.

A partial solution to this problem might be to define an fst.stream class (issue #15), which can be used to append data to an existing file through an internal buffer. When the number of chunks is known, you could also use an fst.lapply method (issue #18, also still to be developed) to create a large on-disk data set from many smaller inputs. This could also be done in parallel with an fst.parlapply method.
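To make the chunking trade-off above concrete, here is a toy Python sketch. It is not the fst file format or API: the length-prefixed JSON layout and the names `append_chunk`, `read_chunks`, `read_table` and `BufferedAppender` are all invented for illustration. It only shows that appending can avoid reading existing data, and that buffering rows before writing reduces the number of chunks a reader has to stitch together.

```python
import json
import os
import struct
import tempfile


def append_chunk(path, columns):
    """Append one chunk (dict of column name -> list) without reading existing data."""
    payload = json.dumps(columns).encode("utf-8")
    with open(path, "ab") as f:
        f.write(struct.pack("<Q", len(payload)))  # 8-byte length prefix
        f.write(payload)


def read_chunks(path):
    """Yield each stored chunk in the order it was appended."""
    with open(path, "rb") as f:
        while True:
            header = f.read(8)
            if not header:
                break
            (size,) = struct.unpack("<Q", header)
            yield json.loads(f.read(size).decode("utf-8"))


def read_table(path):
    """Concatenate all chunks column by column; work grows with the chunk count."""
    table = {}
    for chunk in read_chunks(path):
        for name, values in chunk.items():
            table.setdefault(name, []).extend(values)
    return table


class BufferedAppender:
    """Hypothetical stand-in for the buffering idea: collect rows, write larger chunks."""

    def __init__(self, path, names, buffer_rows=1000):
        self.path = path
        self.names = list(names)
        self.buffer_rows = buffer_rows
        self.buffer = {name: [] for name in self.names}

    def append_row(self, row):
        for name in self.names:
            self.buffer[name].append(row[name])
        if len(self.buffer[self.names[0]]) >= self.buffer_rows:
            self.flush()

    def flush(self):
        # Write whatever is buffered as a single chunk, then start a fresh buffer.
        if self.buffer[self.names[0]]:
            append_chunk(self.path, self.buffer)
            self.buffer = {name: [] for name in self.names}


def demo():
    """Write the same 10 rows unbuffered and buffered; return chunk counts and equality."""
    with tempfile.TemporaryDirectory() as tmp:
        naive = os.path.join(tmp, "naive.bin")
        buffered = os.path.join(tmp, "buffered.bin")
        rows = [{"x": i, "y": i * 2} for i in range(10)]
        for row in rows:
            append_chunk(naive, {"x": [row["x"]], "y": [row["y"]]})
        writer = BufferedAppender(buffered, ["x", "y"], buffer_rows=4)
        for row in rows:
            writer.append_row(row)
        writer.flush()
        return (
            len(list(read_chunks(naive))),
            len(list(read_chunks(buffered))),
            read_table(naive) == read_table(buffered),
        )
```

Calling `demo()` writes ten rows both ways: one chunk per row in the unbuffered file, and chunks of up to four rows in the buffered one, with identical table contents when read back. A real columnar format would store typed binary columns and per-chunk metadata rather than JSON, so this is only a model of the chunk-count effect.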

MarcusKlik self-assigned this Feb 14, 2017
MarcusKlik added this to the Interface milestone Apr 16, 2017
MarcusKlik removed this from the Interface milestone Sep 7, 2020