Documentation should describe advantages over DataFrame constructor (of Pandas) #107
Comments
Hi @sanjaydasgupta, thanks for the suggestion. I've opened https://jira.mongodb.org/browse/ARROW-129 to track the issue. Summarizing here as well: we should list the pros and cons of using this library versus using the PyMongo API directly, highlighting the benchmarks as well as the limitations. We should give examples showing how the same tasks could be accomplished with each.
Hi @blink1073, thanks for your response. Here is some sample code that illustrates this direct approach to obtain a pandas DataFrame from the contents of a MongoDB collection:
The code above handles all Python types (including lists and dicts), and does a fair job of deducing the column data types. I hope this is helpful.
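For readers following along, here is a minimal sketch of the direct approach described in this comment. It is an illustration written for this thread, not the snippet originally posted. The helper names `materialize` and `find_as_dataframe` are hypothetical, and dropping `_id` is an assumption made only to keep the resulting columns simple.

```python
def materialize(cursor, drop_id=True):
    """Turn an iterable of MongoDB documents (dicts) into a list of plain dicts."""
    rows = []
    for doc in cursor:
        row = dict(doc)  # shallow copy so the source documents are not modified
        if drop_id:
            row.pop("_id", None)  # assumption: the ObjectId column is not needed
        rows.append(row)
    return rows


def find_as_dataframe(collection, query=None):
    """Run find() and hand the resulting documents to the DataFrame constructor."""
    import pandas as pd  # third-party; imported lazily so materialize() needs only the stdlib

    return pd.DataFrame(materialize(collection.find(query or {})))
```

With a running server, usage might look like `df = find_as_dataframe(MongoClient()["test"]["data"])`, where `test` and `data` are placeholder database and collection names. Nested lists and dicts end up as `object` columns, since pandas stores them as Python objects.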
It is, thank you!
@sanjaydasgupta I agree that the documentation should properly describe the advantages of this library over using the PyMongo API directly. I have experimented with both libraries, and what I was looking for was a faster response. With a huge dataset, converting a pymongo cursor to a list and then to a DataFrame is time-consuming compared with using the find_pandas_all API to get the result directly as a pandas DataFrame. @blink1073, can you help me understand whether I am right that pymongoarrow would take less time? Also, is there any plan to support nested data structures directly, without an aggregation pipeline in between? One more observation: ObjectId values are being converted to binary data. Is that expected, or is it just me? If it is expected, I think I can open a new issue for it. Looking forward to your response. Thank you.
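One way to check the speed question on your own data is to time both paths side by side. The sketch below assumes `pymongo`, `pandas`, and `pymongoarrow` are installed. `timed`, `via_pymongo`, and `via_pymongoarrow` are hypothetical helper names, and the result will depend on the dataset and schema, so treat it as a measurement harness rather than a benchmark claim.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def via_pymongo(collection, query):
    """Cursor -> list of dicts -> DataFrame, using only pymongo and pandas."""
    import pandas as pd  # third-party, imported lazily

    return pd.DataFrame(list(collection.find(query)))


def via_pymongoarrow(collection, query, schema):
    """Let pymongoarrow build the DataFrame from the query result."""
    from pymongoarrow.api import find_pandas_all  # third-party, imported lazily

    return find_pandas_all(collection, query, schema=schema)
```

With a live collection `coll` and a schema such as `Schema({"qty": int, "price": float})` (field names are placeholders; `Schema` comes from `pymongoarrow.api`), the comparison would be `_, t_direct = timed(via_pymongo, coll, {})` against `_, t_arrow = timed(via_pymongoarrow, coll, {}, schema)`.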
Hi @Khushali22, thank you for the further insight. I am currently working on nested data in #104. The object id representation is tracked in https://jira.mongodb.org/browse/ARROW-55.
Added in the 1.0 release.
Converting the output of the pymongo "find()" method to a Pandas DataFrame can be done directly by the DataFrame constructor.
The output of the "find()" method is a cursor that yields Python dictionary objects (easily materialized as a Python list), and this kind of data collection can be directly handled by the DataFrame constructor.
Moreover, the Pandas DataFrame constructor can already handle data of all Python types (particularly lists and dictionaries).
In view of the above, there should be some discussion of the need for this library, and any advantages it may eventually have over the Pandas DataFrame constructor should be documented.