Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support pickling of some objects #97

Merged
merged 6 commits into from
Aug 26, 2023

Conversation

GodTamIt
Copy link
Contributor

@GodTamIt GodTamIt commented Jul 21, 2023

Fixes #96, supporting serialization via pythonize.

The (personal) use-case here is to be able to send types across process boundaries (i.e. large batches of Documents across to a process pool for some kind of processing).

@GodTamIt GodTamIt force-pushed the godtamit-pickle-upstream branch 6 times, most recently from bf5d3d2 to c73cadb Compare July 21, 2023 22:59
@cjrh
Copy link
Collaborator

cjrh commented Jul 21, 2023

Hi @GodTamIt

Thank you for this! A few general comments:

The maintainers here have very little bandwidth, so larger PRs can take a lot of time to get reviewed. Don't get discouraged if it takes a while, this is normal. Generally speaking, smaller PRs travel exponentially faster than large PRs, in many repos not only this one.

The PR includes extra unnecessary changes like renaming Value to TvValue. I prefer large renames and similar refactors like this to be in separate PRs. (and if this particular rename was in its own PR, I would reject the PR as unnecessary).

I feel like the motivation for supporting pickle feels not yet strong enough to support a change of this size, and the taking on of new dependencies (in my opinion). A use-case would help to understand the motivation better. Is it to be able to move Fruit between processes/machines? Or a distributed system or similar? If we had our documentation already in place I'd ask you to add docs about the pickle feature which would help to explain more, but the docs aren't quite ready yet (#94).

This PR adds new dependencies. This will add to the time for review because I will go look at each of the dependencies to look for reasons to reject them. I have a high bar for taking on new dependencies. They are easy to add and very hard to remove. For instance, flexbuffers is a heavy C++ dependency we take on, just to support pickle. I would like us to try harder to make bincode work, or pythonize as suggested in #96? Or perhaps serde_pickle, etc.

I see you added the time crate, but seeing as we already have chrono, did you look at chrono::serde?

@GodTamIt
Copy link
Contributor Author

Hi @GodTamIt

Thank you for this! A few general comments:

The maintainers here have very little bandwidth, so larger PRs can take a lot of time to get reviewed. Don't get discouraged if it takes a while, this is normal. Generally speaking, smaller PRs travel exponentially faster than large PRs, in many repos not only this one.

The PR includes extra unnecessary changes like renaming Value to TvValue. I prefer large renames and similar refactors like this to be in separate PRs. (and if this particular rename was in its own PR, I would reject the PR as unnecessary).

Ack. I appreciate the feedback here, and I've removed these changes. In the future, I will try to limit the change to only the necessary changes.

I feel like the motivation for supporting pickle feels not yet strong enough to support a change of this size, and the taking on of new dependencies (in my opinion). A use-case would help to understand the motivation better. Is it to be able to move Fruit between processes/machines? Or a distributed system or similar? If we had our documentation already in place I'd ask you to add docs about the pickle feature which would help to explain more, but the docs aren't quite ready yet (#94).

The primary motivation here is to be able to send these types (Document, SearchResult, etc.) across a process boundary. For example, concurrent.futures.ProcessPoolExecutor to operate on huge batches of Documents.

This PR adds new dependencies. This will add to the time for review because I will go look at each of the dependencies to look for reasons to reject them. I have a high bar for taking on new dependencies. They are easy to add and very hard to remove. For instance, flexbuffers is a heavy C++ dependency we take on, just to support pickle. I would like us to try harder to make bincode work, or pythonize as suggested in #96? Or perhaps serde_pickle, etc.

Ah, I didn't see the pythonizesuggestion until after posting this PR. I've elected for that, as it seems to be far lighter as a dependency and something that works.

I see you added the time crate, but seeing as we already have chrono, did you look at chrono::serde?

tantivy opts for the time crate internally instead. However, since tantivy likes to store it as a nanosecond timestamp, I've changed the code to just use an i64.

Cargo.toml Outdated Show resolved Hide resolved
@GodTamIt
Copy link
Contributor Author

This change was stacked on top of #99 (for equality support) and will need #98 to land before it can go in.

Cargo.toml Outdated Show resolved Hide resolved
src/document.rs Outdated Show resolved Hide resolved
src/document.rs Outdated Show resolved Hide resolved
src/document.rs Outdated Show resolved Hide resolved
src/document.rs Outdated Show resolved Hide resolved
@GodTamIt
Copy link
Contributor Author

Thanks for the reviews! This can land whenever

@cjrh cjrh merged commit 05dde2d into quickwit-oss:master Aug 26, 2023
10 checks passed
@GodTamIt GodTamIt deleted the godtamit-pickle-upstream branch August 28, 2023 16:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support pickling of certain types
3 participants