Add pickling support for Python tokenizers #73

Merged
rth merged 8 commits into master from pickle
Jun 12, 2020

rth (Owner) commented Jun 12, 2020

Partially addresses #25

This adds __getstate__ / __setstate__ methods to make pickling work, following the discussion in PyO3/pyo3#100 and adapting the example at https://gist.github.com/ethanhs/fd4123487974c91c7e5960acc9aa2a77.

There is probably a way to generate those methods with a macro to avoid code repetition, but I haven't figured it out yet.
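
For readers unfamiliar with the protocol, here is a minimal pure-Python sketch of what __getstate__ / __setstate__ do for pickle. It is not the Rust code from this PR: RegexpTokenizerSketch, its _build helper, and the use of JSON as the byte format are all assumptions made for illustration.

```python
import json
import pickle
import re


class RegexpTokenizerSketch:
    """Pure-Python stand-in for a wrapper class; not the actual vtext code."""

    def __init__(self, pattern=r"\b\w\w+\b"):
        self._build({"pattern": pattern})

    def _build(self, params):
        # Hypothetical helper: keep the parameters and rebuild derived state.
        self._params = params
        self._regex = re.compile(params["pattern"])

    def tokenize(self, text):
        return self._regex.findall(text)

    def __getstate__(self):
        # Serialize only the parameters, as bytes.
        return json.dumps(self._params).encode("utf-8")

    def __setstate__(self, state):
        # pickle creates the instance without calling __init__, then passes
        # the bytes back here so the object can be rebuilt from its parameters.
        self._build(json.loads(state.decode("utf-8")))


tokenizer = RegexpTokenizerSketch(pattern=r"\w+")
restored = pickle.loads(pickle.dumps(tokenizer))
print(restored.tokenize("pickling works"))
```

Storing only the parameters keeps the pickled payload small and lets the derived state (here the compiled regex) be rebuilt on load.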

Pickling support for the stem and vectorize modules will be added in a follow-up PR.

This also removes the parameter attributes from the Python wrappers, e.g. RegexpTokenizer.pattern, since they were not synced with the Rust parameter struct (here RegexpTokenizer.inner.params.pattern), so changing them had no effect. To make this work, we should first make sure the set_params / get_params methods behave as expected, and then implement attribute access on top of them via __getattr__ / __setattr__.
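
One possible shape for that follow-up, sketched in plain Python under the assumption that get_params / set_params are the single source of truth. SyncedTokenizerSketch and _InnerTokenizer are hypothetical names, and the real wrappers are Rust classes exposed through PyO3, so this only illustrates the delegation idea.

```python
class _InnerTokenizer:
    """Hypothetical stand-in for the Rust-side object that owns the parameters."""

    def __init__(self, pattern):
        self.params = {"pattern": pattern}


class SyncedTokenizerSketch:
    def __init__(self, pattern=r"\b\w\w+\b"):
        # Bypass our own __setattr__ while creating the inner object.
        object.__setattr__(self, "inner", _InnerTokenizer(pattern))

    def get_params(self):
        return dict(self.inner.params)

    def set_params(self, **params):
        unknown = set(params) - set(self.inner.params)
        if unknown:
            raise ValueError(f"unknown parameters: {sorted(unknown)}")
        self.inner.params.update(params)
        return self

    def __getattr__(self, name):
        # Only called when normal lookup fails. Guard "inner" and private
        # names to avoid infinite recursion on a half-built instance.
        if name == "inner" or name.startswith("_"):
            raise AttributeError(name)
        params = self.get_params()
        if name in params:
            return params[name]
        raise AttributeError(name)

    def __setattr__(self, name, value):
        if name in self.inner.params:
            # Route parameter writes through set_params so they stay in sync.
            self.set_params(**{name: value})
        else:
            object.__setattr__(self, name, value)


tokenizer = SyncedTokenizerSketch(pattern=r"\w+")
tokenizer.pattern = r"\d+"
print(tokenizer.get_params())
```

With this layout, reading or writing tokenizer.pattern always goes through the inner parameters, so the attribute can never drift from what the tokenizer actually uses.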

Comment thread python/src/tokenize.rs
use vtext::tokenize::*;

-#[pyclass]
+#[pyclass(module = "vtext.tokenize")]
Owner Author

This is needed because of PyO3/pyo3#474. That issue should have been fixed, but I still see the same error without the explicit module.
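
As background on why the module name likely matters here: pickle records a class by its module and qualified name and looks it up again by that path, so a class that reports the wrong module cannot be pickled. The sketch below reproduces this in plain Python. The Tokenizer class is hypothetical, and forcing __module__ to "builtins" is only an assumption about what an extension class without an explicit module ends up reporting.

```python
import pickle


class Tokenizer:
    """Hypothetical class used only to demonstrate the lookup."""


def can_pickle(obj):
    # pickle resolves type(obj) via its __module__ and __qualname__.
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError):
        return False


ok_with_real_module = can_pickle(Tokenizer())

# Simulate a class that claims to live in a module where it cannot be found.
Tokenizer.__module__ = "builtins"
ok_with_wrong_module = can_pickle(Tokenizer())

print(ok_with_real_module, ok_with_wrong_module)
```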

@rth rth merged commit 172838c into master Jun 12, 2020
@rth rth deleted the pickle branch June 12, 2020 19:29