Export gds result to polars instead of pandas #653

Open
Mintactus opened this issue May 7, 2024 · 4 comments

Comments

@Mintactus

https://pola.rs/

Polars is setting a new standard for data processing, and it would be awesome to have it as an output option for gds functions.
It could be a parameter you choose when you build the gds client, e.g. exportType = [pandas, polars, Apache Arrow IPC, etc.]

That would be better than offering only pandas, which is starting to feel dated.
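
To make the request concrete, here is a rough sketch of how such an option might dispatch to different output types. Everything in it is hypothetical: `convert_result`, `export_type`, the `CONVERTERS` registry and the `"records"` format are illustrative names, not part of the GDS client.

```python
from typing import Any, Callable

Rows = list[dict[str, Any]]


def _to_records(rows: Rows) -> Rows:
    # Plain Python output: hand the rows back unchanged.
    return rows


def _to_pandas(rows: Rows) -> Any:
    import pandas as pd  # imported lazily so pandas stays optional

    return pd.DataFrame(rows)


def _to_polars(rows: Rows) -> Any:
    import polars as pl  # imported lazily so polars stays optional

    return pl.DataFrame(rows)


# Hypothetical registry of output formats; not an existing GDS client API.
CONVERTERS: dict[str, Callable[[Rows], Any]] = {
    "records": _to_records,
    "pandas": _to_pandas,
    "polars": _to_polars,
}


def convert_result(rows: Rows, export_type: str = "pandas") -> Any:
    """Convert query rows to the requested output type."""
    try:
        converter = CONVERTERS[export_type]
    except KeyError:
        raise ValueError(
            f"unknown export_type {export_type!r}; "
            f"expected one of {sorted(CONVERTERS)}"
        ) from None
    return converter(rows)
```

With lazy imports like these, only the library behind the chosen output type would need to be installed.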

@gminneci

Hi there, thank you for bringing this to our attention. It's great to see performance improving and community interest in new libraries, and we constantly monitor requests like this one. Pandas is still used and loved by the majority of our customers, while Polars is emerging. We will evaluate whether it's worth integrating natively, but in the meantime we suggest using polars.from_pandas as an efficient workaround.

@MichaelSchmidt1729

+1 for exporting to polars

@Mats-SX
Contributor

Mats-SX commented Jun 3, 2024

Moving this to the GDS Python Client repository. The GDS library itself is agnostic to Pandas/Polars. Exports are possible using Bolt or Arrow. The internals of GDS are not based on Arrow; they are our own custom implementation, with some third-party data structures (not Arrow itself).

@Mats-SX Mats-SX transferred this issue from neo4j/graph-data-science Jun 3, 2024
@Mats-SX
Contributor

Mats-SX commented Jun 3, 2024

The GDS Python Client wraps the Neo4j Python Driver (https://github.com/neo4j/neo4j-python-driver), which determines how the GDS Python Client exports Cypher query results: through the Neo4j Python Driver's to_df() method (docs).

To get this Cypher driver to export to Polars as well, I suggest raising an issue on that repository. I will also mention it via Neo4j-internal channels.

The GDS Python Client can also export using Apache Arrow via the GDS Arrow Server. This does not use the Neo4j Python Driver, but makes an independent connection to the GDS Arrow Server using an Arrow client based on the pyarrow library. The pyarrow library returns results from the Arrow stream as Table (docs) objects, which have a to_pandas() (docs) method.

As @gminneci mentions, Polars supports reading from a Pandas DataFrame, so it is possible to hook up the workflow.
It is not directly possible for the GDS Python Client to use a different method from the underlying pyarrow library.
It is not perfectly in line with the purpose of the GDS Python library to support conversion between two third-party data structures (pyarrow.Table and polars.DataFrame). If either pyarrow or Polars supported this, it would be more convenient. As it stands, conversion goes via polars.from_pandas(), which is still a more appropriate location than the GDS Python Client.

We are naturally very happy to see the interest in GDS and its software parts (library, client, database) so we are not rejecting this feature request. However, in the presence of workarounds and no very low-hanging possibilities for uniform integration (other than bundling Polars and calling from_pandas() within this library, which doesn't seem so attractive), we're keeping this tracked with no immediate plan to address it.

Thank you for raising this issue! All the best
Mats
