Skip to content

Conversation

@Emrehzl94
Copy link
Contributor

This PR fixes serialization issues with the FlightClient in the GDSArrowClient class. Due to its non-serializability.

  • Added a _instantiate_flight_client() method to create the FlightClient with proper configuration.
  • Implemented a _client() method for lazy construction of the FlightClient, ensuring it’s only instantiated when needed.
  • Modified the __getstate__() method to exclude the FlightClient from the serialized state.

@netlify
Copy link

netlify bot commented Dec 18, 2024

Deploy Preview for neo4j-graph-data-science-client canceled.

Name Link
🔨 Latest commit f46e36f
🔍 Latest deploy log https://app.netlify.com/sites/neo4j-graph-data-science-client/deploys/6763f5c9b26fc000087739b5

Copy link
Contributor

@Mats-SX Mats-SX left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

Comment on lines +110 to +112
user_agent = f"neo4j-graphdatascience-v{__version__} pyarrow-v{arrow_version}"
if self._user_agent:
user_agent = self._user_agent
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so it could be

Suggested change
user_agent = f"neo4j-graphdatascience-v{__version__} pyarrow-v{arrow_version}"
if self._user_agent:
user_agent = self._user_agent
user_agent = (
self._user_agent
if self._user_agent
else f"neo4j-graphdatascience-v{__version__} pyarrow-v{arrow_version}"
)

but it's not necessary. also maybe this will trip up the typer -- not sure.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah looks much better this way

Comment on lines +565 to +566
Lazy client construction to help pickle this class because a PyArrow
FlightClient is not serializable.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, but I am curious; in which situations do you need to serialise this class?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure at this point but we might have needed it this way on BigQuery Connector

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be good to know why, to stop future developers from removing this feature.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes sure, I will let you know why when I make sure about the need on the connector side

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the BigQuery connector, we use Spark jobs to process data, and Spark requires a serialized version of this class to distribute the job across different workers. Since FlightClient is not inherently serializable, we needed this lazy initialization.

@Mats-SX Mats-SX merged commit 6682607 into neo4j:main Dec 19, 2024
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants