Currently `rethinkdb export` runs `r.table(...).count()` and then `r.table(...)`. As it reads documents from the stream returned by `r.table(...)`, it counts the number of documents; then it uses this value as the numerator of the progress fraction, with the result of `r.table(...).count()` as the denominator.

This means that the server must traverse the entire data set twice: once to count the number of documents, and again to stream them to the client. For a data set that doesn't fit into RAM, this could make `rethinkdb export` take twice as long as it needs to. Until the first traversal is finished, `rethinkdb export` will report 0% progress.

We should consider using a distribution query to estimate the denominator for the progress fraction instead. We can do this by running `r.table(...).info()['doc_count_estimates'].sum()`. This won't give exact results, but it will run in constant time instead of reading the entire table.
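One consequence of an estimated denominator is that the count of streamed documents can exceed the estimate near the end of the export, so the progress fraction should be clamped. A minimal sketch of that logic (the `progress_fraction` helper is hypothetical, not part of `rethinkdb export`; the commented ReQL line assumes a driver connection `conn` and a table name, purely for illustration):

```python
def progress_fraction(docs_read, estimated_total):
    """Progress fraction against an inexact denominator.

    Because doc_count_estimates is only an estimate, docs_read may
    overshoot it as the export finishes; clamp at 1.0 so the tool
    never reports more than 100% progress.
    """
    if estimated_total <= 0:
        # No usable estimate (e.g. empty or brand-new table).
        return 0.0
    return min(docs_read / float(estimated_total), 1.0)

# With the Python driver, the estimate itself would be fetched
# roughly like this (hypothetical connection `conn`):
#   estimated_total = (
#       r.table('users').info()['doc_count_estimates'].sum().run(conn)
#   )
```

Unlike `r.table(...).count()`, the estimate is available immediately, so the export can report a nonzero fraction from the first batch of documents onward.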