Path | pimlico.modules.features.vocab_builder |
Executable | yes |
Todo: Document this module

Inputs

Name | Type(s) |
---|---|
term_features | TarredCorpus<TermFeatureListDocumentType> |

Outputs

Name | Type(s) |
---|---|
term_vocab | pimlico.datatypes.dictionary.Dictionary |
feature_vocab | pimlico.datatypes.dictionary.Dictionary |

Options

Name | Description | Type |
---|---|---|
feature_limit | Limit the feature vocabulary to this number of most common entries (applied after the other filters) | int |
feature_max_prop | Include only features that occur in at most this proportion of documents | float |
term_max_prop | Include only terms that occur in at most this proportion of documents | float |
term_threshold | Minimum number of occurrences required for a term to be included | int |
feature_threshold | Minimum number of occurrences required for a feature to be included | int |
term_limit | Limit the term vocabulary to this number of most common entries (applied after the other filters) | int |
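
The three kinds of option (`*_threshold`, `*_max_prop`, `*_limit`) describe filters that combine in sequence. The sketch below illustrates one plausible reading of that combination in plain Python. It is not Pimlico's implementation: the function `build_vocab` and its parameters are hypothetical names. It also assumes that the threshold counts total occurrences, that the proportion is measured over documents, and that ties in frequency are broken alphabetically.

```python
from collections import Counter


def build_vocab(docs, threshold=None, max_prop=None, limit=None):
    """Hypothetical helper: select vocabulary entries from tokenised documents.

    docs is a list of documents, each a list of strings (terms or feature names).
    """
    total = Counter()     # total occurrences of each entry
    doc_freq = Counter()  # number of documents that contain each entry
    for doc in docs:
        total.update(doc)
        doc_freq.update(set(doc))

    kept = list(total)
    # Assumption: the threshold applies to total occurrence counts
    if threshold is not None:
        kept = [w for w in kept if total[w] >= threshold]
    # Drop entries found in more than the given proportion of documents
    if max_prop is not None:
        max_docs = max_prop * len(docs)
        kept = [w for w in kept if doc_freq[w] <= max_docs]
    # The limit comes last: keep only the most common surviving entries
    kept.sort(key=lambda w: (-total[w], w))
    if limit is not None:
        kept = kept[:limit]
    return kept


docs = [["cat", "sat", "the"], ["the", "dog", "the"], ["the", "cat"]]
vocab = build_vocab(docs, threshold=2, max_prop=0.7, limit=5)
```

Here `the` passes the threshold but is dropped because it appears in every document, which exceeds the 0.7 proportion, so only `cat` survives. Applying the limit last matches the "after other filters" wording in the option descriptions.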