With this feature you can populate, say, rows 1001:2000 of a 1e6-row `data.table` with a 1000-row chunk read with `fst.read`. All of this is done in memory. The feature is very useful for combining data from multiple (`fst`) sources into a single result table without the overhead of copies. For example, when performing the merge sort algorithm on a set of data files, you need to:
1. read the first x rows from all files
2. sort the resulting table
3. write some rows to disk
4. read the next x rows from the file with the smallest first chunk
5. sort the resulting table
6. goto 3
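The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the sources are in-memory `data.table`s that are each already sorted (stand-ins for pre-sorted `fst` files), the sort column is numeric, at least one source is non-empty, and "writing to disk" is replaced by collecting pieces in a list. The function name `merge_sorted_chunks` and the column name `id` are hypothetical.

```r
library(data.table)

# Sketch of a chunked k-way merge. `sources` is a list of data.tables, each
# already sorted by `key_col` (assumed numeric); they stand in for pre-sorted
# fst files. `x` is the number of rows read per chunk.
merge_sorted_chunks <- function(sources, x, key_col = "id") {
  x <- as.integer(x)
  n <- vapply(sources, nrow, integer(1))
  pos <- integer(length(sources))             # rows consumed so far, per source
  last_key <- rep(NA_real_, length(sources))  # last key loaded, per source

  # Read up to x more rows from source s and record its last loaded key.
  read_next <- function(s) {
    idx <- seq.int(pos[s] + 1L, min(pos[s] + x, n[s]))
    pos[s] <<- idx[length(idx)]
    chunk <- sources[[s]][idx]
    last_key[s] <<- chunk[[key_col]][nrow(chunk)]
    chunk
  }

  # Step 1: read the first x rows from every non-empty source.
  buffer <- rbindlist(lapply(which(n > 0L), read_next))
  written <- list()                           # stand-in for "write to disk"

  repeat {
    setorderv(buffer, key_col)                # steps 2 and 5: sort the buffer
    active <- which(pos < n)                  # sources with unread rows
    if (length(active) == 0L) {
      written[[length(written) + 1L]] <- buffer
      break
    }
    # Step 3: rows at or below the smallest "last loaded key" among the active
    # sources cannot be preceded by any unread row, so they are final.
    limit <- min(last_key[active])
    safe <- buffer[[key_col]] <= limit
    written[[length(written) + 1L]] <- buffer[which(safe)]
    buffer <- buffer[which(!safe)]
    # Step 4: refill from the source whose loaded rows end earliest.
    s <- active[which.min(last_key[active])]
    buffer <- rbindlist(list(buffer, read_next(s)))
  }
  rbindlist(written)
}

# Usage with three small in-memory "files".
srcs <- list(
  data.table(id = c(1, 4, 7, 10)),
  data.table(id = c(2, 5, 8)),
  data.table(id = c(3, 6, 9, 11, 12))
)
merged <- merge_sorted_chunks(srcs, x = 2)
```

Only the buffer of pending rows is held at any time, so memory use depends on the chunk size and the number of sources rather than on the total size of the data.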
This can be performed efficiently in R by using `data.table`'s fast sorting and populating the result table in memory. With such an algorithm operating on a collection of `fst` files, we basically have a method of sorting arbitrarily large `fst` files without running out of memory (and it can be done with multiple threads!).
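For comparison, populating a row range of a preallocated table by reference can already be approximated in user code with `data.table::set`. The sketch below is an assumption-laden illustration, not the requested feature: the column names are hypothetical, and the 1000-row chunk is built in memory as a stand-in for a chunk read from an `fst` file. Note that the chunk itself is still materialized once before being copied into place, which is the extra copy the proposed feature would avoid.

```r
library(data.table)

# Preallocated result table with 1e6 rows (column names are hypothetical).
result <- data.table(id = integer(1e6), value = numeric(1e6))
addr_before <- address(result)

# Stand-in for a 1000-row chunk; in practice it would come from an fst file.
chunk <- data.table(id = 1:1000, value = as.numeric(1:1000) / 2)

# Overwrite rows 1001:2000 by reference, one column at a time.
target_rows <- 1001:2000
for (col in names(chunk)) {
  set(result, i = target_rows, j = col, value = chunk[[col]])
}

# The table was modified in place: its memory address is unchanged.
same_object <- identical(address(result), addr_before)
```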