PyMongoArrow is the recommended way to materialize MongoDB query result sets as contiguous-in-memory, typed arrays suited for in-memory analytical processing applications.
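To make the "contiguous-in-memory, typed arrays" point concrete, here is a pure-stdlib sketch contrasting the row-oriented layout a MongoDB cursor yields (a list of dicts of boxed Python objects) with a columnar layout of one typed buffer per field. The field names and values are hypothetical; `array` is only a stand-in for the Arrow arrays PyMongoArrow actually produces.

```python
from array import array

# Hypothetical query result: what a cursor yields, one Python dict per
# document (row-oriented; every value is a separately allocated object).
rows = [
    {"dataset": "squad", "count": 3},
    {"dataset": "glue", "count": 7},
    {"dataset": "imdb", "count": 2},
]

# Columnar materialization: one contiguous, typed buffer per field.
counts = array("q", (row["count"] for row in rows))  # int64 values, contiguous
datasets = [row["dataset"] for row in rows]          # strings stay boxed here

print(counts.itemsize * len(counts))  # 8 bytes x 3 values = 24 bytes total
```

Analytical operations (sums, filters, aggregations) over the `counts` buffer touch one contiguous allocation instead of chasing a pointer per document, which is the memory-footprint argument made below.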
Some comments:
we have to implement unit tests for these methods before switching, to make sure we don't break anything
As noted in the code, the current conversion from a list of MongoDB entries to a DataFrame is not necessarily optimal:
dataset-viewer/libs/libcommon/src/libcommon/simple_cache.py
Lines 813 to 833 in 27edd1f
see also
dataset-viewer/libs/libcommon/src/libcommon/queue.py
Lines 994 to 1019 in 27edd1f
We might benefit from using https://github.com/mongodb-labs/mongo-arrow/tree/main/bindings/python (PyMongoArrow, the approach recommended by the MongoDB team) for that.
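As a rough sketch of what the switch could look like: PyMongoArrow's `find_pandas_all` runs the query and builds typed columns directly, instead of materializing an intermediate list of Python dicts. The collection, field names, and filter below are hypothetical, and the snippet assumes `pymongoarrow` is installed and a MongoDB server is reachable (which is why the import is deferred into the function).

```python
from datetime import datetime

def fetch_as_dataframe(collection):
    """Materialize a query result directly into a pandas DataFrame.

    Sketch only: field names and the filter are made up for illustration.
    """
    # Deferred import: pymongoarrow is an optional dependency here.
    from pymongoarrow.api import Schema, find_pandas_all

    schema = Schema({"kind": str, "dataset": str, "updated_at": datetime})
    # Builds typed, contiguous columns server-cursor-side, with no
    # intermediate list-of-dicts in Python.
    return find_pandas_all(collection, {"kind": "config-split-names"}, schema=schema)
```

Whether this actually reduces peak memory for our document shapes (nested fields, optional keys) would need to be measured, which is what the memory-test suggestion below is about.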
Some comments:
we could add a memory test (e.g. using `@pytest.mark.limit_memory()`, see https://bloomberg.github.io/memray/tutorials/additional_features.html#pytest-plugin) to ensure we reduce the memory footprint
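A minimal sketch of such a memory test with the memray pytest plugin; the test name, fixture, and 100 MB threshold are hypothetical, and it assumes `pytest-memray` is installed and pytest is run with `--memray`.

```python
import pytest

# Hypothetical threshold: the test fails if peak allocations during its
# execution exceed the limit (requires the pytest-memray plugin).
@pytest.mark.limit_memory("100 MB")
def test_get_cache_entries_df_memory():
    # The body would call the conversion under test, e.g. building the
    # DataFrame from a realistic number of cache entries.
    ...
```

Pinning a threshold like this in CI would let us verify that switching to PyMongoArrow actually lowers the footprint, rather than only assuming it does.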