Fill a data.table range with specific rows from read.fst #29

MarcusKlik · 2017-02-23T20:48:03Z

With this feature you can populate, say, rows 1001:2000 of a 1e6-row data.table with 1000 rows read via read.fst. All of this is done in memory. This feature is very useful for combining data from multiple (fst) sources into a single result table without the overhead of copies. For example, when performing the merge sort algorithm on a set of data files, you need to:
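As a rough illustration of the range fill described above, here is a sketch that approximates it with calls that already exist. It assumes the `read_fst`/`write_fst` names from recent fst releases (the issue refers to `read.fst`), and the file, column names, and sizes are made up for the example. It is not the requested feature itself: the chunk is still materialised as a temporary data frame before data.table's `set()` copies it into the preallocated table by reference.

```r
library(data.table)
library(fst)

# Hypothetical example file: 5000 rows, two columns.
path <- tempfile(fileext = ".fst")
write_fst(data.frame(id = 1:5000, value = as.numeric(5000:1)), path)

# Preallocate the result table once, with the same column types as the file.
result <- data.table(id = rep(NA_integer_, 1e6), value = rep(NA_real_, 1e6))

# Read 1000 rows from the file (this temporary chunk is the copy that the
# requested feature would avoid).
chunk <- read_fst(path, from = 1, to = 1000)

# Write the chunk into rows 1001:2000 of the result table by reference,
# one column at a time, without reallocating `result`.
target_rows <- 1001:2000
for (col in names(chunk)) {
  set(result, i = target_rows, j = col, value = chunk[[col]])
}
```

Rows outside 1001:2000 keep their `NA` placeholders, so several sources can be written into disjoint ranges of the same table.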

1. read first x rows from all files
2. sort the resulting table
3. write some rows to disk
4. read next x rows from the file with the smallest first chunk
5. sort the resulting table
6. go to step 3

This can be performed efficiently in R by using data.table's fast sorting and populating the result table in memory. With such an algorithm operating on a collection of fst files, we basically have a method of sorting arbitrarily large fst files without running out of memory (and it can be done with multiple threads!).
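The merge steps listed above could look roughly like the following single-threaded sketch. It rests on several assumptions that are not in the issue: each input file is already sorted on a numeric key column, `merge_sorted_fst` and its parameters are hypothetical names, the `read_fst`/`write_fst` names from recent fst releases are used, and the buffer is grown with `rbindlist()` rather than filled in place as the proposed feature would allow.

```r
library(data.table)
library(fst)

# Hypothetical helper: merge fst files that are each sorted on `key` into a
# sequence of sorted output files, keeping only a few chunks in memory.
merge_sorted_fst <- function(paths, key, chunk_rows, out_dir) {
  chunk_rows <- as.integer(chunk_rows)
  # Row counts, obtained here by reading only the key column (a simplification).
  n_rows <- vapply(paths, function(p) length(read_fst(p, columns = key)[[1]]),
                   integer(1), USE.NAMES = FALSE)
  next_row <- rep(1L, length(paths))        # next unread row per file
  last_key <- rep(NA_real_, length(paths))  # largest key read so far per file
  buffer <- NULL
  out_paths <- character(0)

  read_chunk <- function(i) {
    to <- min(next_row[i] + chunk_rows - 1L, n_rows[i])
    chunk <- read_fst(paths[i], from = next_row[i], to = to, as.data.table = TRUE)
    next_row[i] <<- to + 1L
    last_key[i] <<- chunk[[key]][nrow(chunk)]
    buffer <<- rbindlist(list(buffer, chunk))
  }
  emit <- function(rows) {
    if (nrow(rows) == 0L) return(invisible(NULL))
    out <- file.path(out_dir, sprintf("sorted_%05d.fst", length(out_paths) + 1L))
    write_fst(rows, out)
    out_paths <<- c(out_paths, out)
  }

  # Step 1: read the first chunk of every non-empty file.
  for (i in seq_along(paths)) if (n_rows[i] > 0L) read_chunk(i)
  if (is.null(buffer)) return(out_paths)

  repeat {
    setorderv(buffer, key)                   # steps 2 and 5: sort the buffer
    active <- which(next_row <= n_rows)      # files that still have unread rows
    if (length(active) == 0L) {
      emit(buffer)                           # nothing left to read: flush
      break
    }
    # Unread rows of an active file are >= its last key, so buffered rows up
    # to the smallest such key are final and can be written (step 3).
    limit <- min(last_key[active])
    is_final <- buffer[[key]] <= limit
    emit(buffer[is_final])
    buffer <- buffer[!is_final]
    # Step 4: refill from the active file with the smallest last key.
    read_chunk(active[which.min(last_key[active])])
  }
  out_paths
}

# Usage with three small, made-up input files.
set.seed(1)
work_dir <- tempfile(); dir.create(work_dir)
inputs <- vapply(1:3, function(i) {
  p <- file.path(work_dir, sprintf("in_%d.fst", i))
  write_fst(data.frame(k = sort(sample(1e4, 250))), p)
  p
}, character(1))
out_dir <- file.path(work_dir, "out"); dir.create(out_dir)
outputs <- merge_sorted_fst(inputs, key = "k", chunk_rows = 100, out_dir = out_dir)
merged <- rbindlist(lapply(outputs, read_fst))
```

Reading the output files back in order yields one globally sorted table, while the buffer never holds much more than one chunk per input file.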


MarcusKlik added the enhancement label Mar 1, 2017

MarcusKlik mentioned this issue Apr 16, 2017

Currently planned milestones for fst #48

Closed

MarcusKlik added this to the Advanced operations milestone Apr 16, 2017

MarcusKlik mentioned this issue Dec 22, 2017

Planned milestones for future releases #117

Open

MarcusKlik removed this from the Advanced operations milestone Sep 7, 2020
