Closed
Description
Describe the workflow you want to enable
Hi,
Referring to this note in the documentation for sklearn.feature_extraction.text.CountVectorizer and sklearn.feature_extraction.text.TfidfVectorizer:
The stop_words_ attribute can get large and increase the model size when pickling. This attribute is provided only for introspection and can be safely removed using delattr or set to None before pickling.
This attribute can indeed become very large when the dataset is big, sometimes leading to out-of-memory errors. Besides, not every user needs its contents.
So it would be nice to make this attribute optional.
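For reference, the workaround described in the quoted note can be sketched roughly as follows. This is only an illustration, assuming a scikit-learn version in which the fitted vectorizer exposes stop_words_; the toy corpus and the min_df=2 setting are made up for the example.

```python
import pickle

from sklearn.feature_extraction.text import CountVectorizer

# Toy corpus (illustrative only, not from the issue).
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "a bird flew over the mat",
]

# min_df=2 discards terms seen in fewer than two documents; in versions
# that expose stop_words_, the discarded terms are stored there.
vectorizer = CountVectorizer(min_df=2)
vectorizer.fit(docs)
before = vectorizer.transform(docs).toarray()

# Peek at the attribute without assuming it exists in every version.
print(getattr(vectorizer, "stop_words_", None))

# Workaround from the documentation note: drop the attribute before pickling.
# Setting it to None (rather than delattr) also works if it is already absent.
vectorizer.stop_words_ = None
payload = pickle.dumps(vectorizer)

# The restored vectorizer still transforms text, since only vocabulary_
# is needed for that.
restored = pickle.loads(payload)
after = restored.transform(docs).toarray()
same_output = bool((before == after).all())
print(same_output)
```

This still builds the full set during fit, so it only reduces the pickled size; it does not address the memory used while fitting, which is what the request below is about.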
Describe your proposed solution
- Add an option, set when initializing CountVectorizer and TfidfVectorizer, to indicate that this attribute should not be built.