fix: remove flight client from serialization #804
Mats-SX
left a comment
Looks good.
| user_agent = f"neo4j-graphdatascience-v{__version__} pyarrow-v{arrow_version}" | ||
| if self._user_agent: | ||
| user_agent = self._user_agent |
so it could be
| user_agent = f"neo4j-graphdatascience-v{__version__} pyarrow-v{arrow_version}" | |
| if self._user_agent: | |
| user_agent = self._user_agent | |
| user_agent = ( | |
| self._user_agent | |
| if self._user_agent | |
| else f"neo4j-graphdatascience-v{__version__} pyarrow-v{arrow_version}" | |
| ) |
but it's not necessary. Also, maybe this will trip up the type checker; not sure.
yeah looks much better this way
| Lazy client construction to help pickle this class because a PyArrow | ||
| FlightClient is not serializable. |
Makes sense, but I am curious; in which situations do you need to serialise this class?
I'm not sure at this point, but we might have needed it this way for the BigQuery connector.
I think it would be good to know why, to stop future developers from removing this feature.
Yes, sure. I will let you know why once I have confirmed the need on the connector side.
In the BigQuery connector, we use Spark jobs to process data, and Spark requires a serialized version of this class to distribute the job across different workers. Since FlightClient is not inherently serializable, we needed this lazy initialization.
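To make the problem concrete, here is a minimal sketch of why an eagerly built, non-serializable member blocks pickling. It is not taken from the PR: `EagerClient` and `can_pickle` are hypothetical names, the host and port are arbitrary, and a `threading.Lock` stands in for the `FlightClient` only because it is another object that cannot be pickled.

```python
import pickle
import threading


class EagerClient:
    """Hypothetical stand-in that builds its connection handle eagerly."""

    def __init__(self, host: str, port: int) -> None:
        self.host = host
        self.port = port
        # Assumption: a lock plays the role of the FlightClient here,
        # because it is likewise not picklable.
        self._handle = threading.Lock()


def can_pickle(obj: object) -> bool:
    """Return True if pickle can serialize obj, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, pickle.PicklingError):
        return False


eager_ok = can_pickle(EagerClient("localhost", 8491))
plain_ok = can_pickle({"host": "localhost", "port": 8491})
```

In this sketch `eager_ok` ends up false while `plain_ok` is true: the plain configuration values serialize fine, and only the live handle gets in the way.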
This PR fixes serialization issues in the GDSArrowClient class, which could not be serialized because it holds a FlightClient, and FlightClient is not serializable. The PR contains:
- `_instantiate_flight_client()` method to create the `FlightClient` with proper configuration.
- `_client()` method for lazy construction of the `FlightClient`, ensuring it is only instantiated when needed.
- `__getstate__()` method to exclude the `FlightClient` from the serialized state.
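The three methods named in the description can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual GDSArrowClient code: `LazyArrowClient`, its constructor parameters, and the `_flight_client` attribute name are hypothetical, and a `threading.Lock` again stands in for the non-picklable `FlightClient` so the example runs without pyarrow.

```python
import pickle
import threading
from typing import Any, Dict, Optional


class LazyArrowClient:
    """Hypothetical sketch of lazy client construction; not the real class."""

    def __init__(self, host: str, port: int, user_agent: Optional[str] = None) -> None:
        self._host = host
        self._port = port
        self._user_agent = user_agent
        # Nothing is connected yet; the client is built on first use.
        self._flight_client: Optional[Any] = None

    def _instantiate_flight_client(self) -> Any:
        # Assumption: the real method would construct a configured
        # FlightClient. A lock is used here as a non-picklable stand-in.
        return threading.Lock()

    def _client(self) -> Any:
        # Lazy construction: create the client once, then reuse it.
        if self._flight_client is None:
            self._flight_client = self._instantiate_flight_client()
        return self._flight_client

    def __getstate__(self) -> Dict[str, Any]:
        # Copy the instance state and drop the live client so that
        # pickle only sees plain configuration values.
        state = self.__dict__.copy()
        state["_flight_client"] = None
        return state


original = LazyArrowClient("localhost", 8491)
original._client()  # force construction before pickling
restored = pickle.loads(pickle.dumps(original))
```

After the round trip, `restored` carries the same configuration but no client; its first call to `_client()` builds a fresh one. That matches what the Spark use case in the thread needs, where each worker receives the configuration and opens its own connection.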