FIX remove lambdas from text preprocessing #14430
Thanks for looking into it!
Could you please run benchmarks/bench_text_vectorizers.py before and after this PR and report the results?
- return lambda doc: self._char_wb_ngrams(
-     preprocess(self.decode(doc)))
+ return partial(_analyze, ngrams=self._char_wb_ngrams,
+                preprocessor=preprocess, decoder=self.decode)
Side note: this is really a use case for toolz.functoolz.compose; it's a shame that we can't use it.
Sure thing! Before
After
Ideally we'd have a non-regression test that checks that all build_* methods result in objects that can be pickled and restored.
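A minimal sketch of what such a round-trip check could look like, using only the standard library. `ToyVectorizer`, `_strip_and_lower`, `_split_words`, and `check_builders_roundtrip` are hypothetical names made up for illustration; they are not scikit-learn's classes or the test that was eventually added.

```python
import pickle
from functools import partial


def _strip_and_lower(doc, lower=True):
    """Module-level helper so pickle can locate it by name."""
    doc = doc.strip()
    return doc.lower() if lower else doc


def _split_words(doc):
    """Module-level whitespace tokenizer (illustrative only)."""
    return doc.split()


class ToyVectorizer:
    """Hypothetical stand-in for a vectorizer; not scikit-learn's class."""

    def __init__(self, lowercase=True):
        self.lowercase = lowercase

    def build_preprocessor(self):
        # partial of a module-level function instead of a lambda
        return partial(_strip_and_lower, lower=self.lowercase)

    def build_tokenizer(self):
        return _split_words


def check_builders_roundtrip(obj, doc):
    """Pickle and restore the result of every build_* method on obj.

    Returns the names of the methods that were checked.
    """
    checked = []
    for name in sorted(dir(obj)):
        if not name.startswith("build_"):
            continue
        built = getattr(obj, name)()
        restored = pickle.loads(pickle.dumps(built))
        # The restored callable should behave like the original one.
        assert restored(doc) == built(doc), name
        checked.append(name)
    return checked
```

In a real test the same loop would presumably run over the actual vectorizer's build_* methods rather than a toy class.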
Thanks @deniederhut!
I'm not overly enthusiastic about the addition of the _preprocess and _analyze functions, but I don't see another way of fixing pickling.
Unless we ask people to use cloudpickle?
Please add an entry to the change log at
Hm... Circle is showing
Does this need to be rebased?
Please resolve conflicts (hopefully that would also fix CI by merging master in).
Lambda functions are non-serializable under the stdlib pickle module. This commit replaces the lambdas found in three text preprocessing functions with hidden functions for chaining a sequence of preprocessing steps that can be partialed where appropriate.
Closes scikit-learn#12833
The function has been modified, so testing for identity is no longer appropriate.
b6dc6fe to 07d7cf4
Yup! That did the trick for the CI
Thank you @deniederhut!
Lambda functions are non-serializable under the stdlib pickle
module. This commit replaces the lambdas found in three text
preprocessing functions with hidden functions for chaining
a sequence of preprocessing steps that can be partialed where
appropriate.
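To illustrate the problem described above, here is a small sketch, assuming the code runs as a normal script or importable module. The helper names `_lowercase`, `_apply`, and `can_pickle` are hypothetical and not part of this PR.

```python
import pickle
from functools import partial


def _lowercase(doc):
    """Module-level function: pickle stores a reference to it by name."""
    return doc.lower()


def _apply(doc, step):
    """Hypothetical chaining helper, standing in for the PR's hidden functions."""
    return step(doc)


def can_pickle(obj):
    """Return True if pickle.dumps(obj) succeeds, False otherwise."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


# A lambda has no importable name, so the stdlib pickle cannot serialize it.
as_lambda = lambda doc: _lowercase(doc)

# A partial of a module-level function carries only picklable pieces.
as_partial = partial(_apply, step=_lowercase)
```

Here `can_pickle(as_lambda)` is expected to be false and `can_pickle(as_partial)` true, which is the difference this change relies on.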
Reference Issues/PRs
Closes #12833
What does this implement/fix? Explain your changes.
Instead of composing functions with lambdas, create chains
of preprocessing steps inside single functions, and select
the steps each chain runs with functools.partial.