
Term-feature corpus vocab builder

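========== ======================================
Path       pimlico.modules.features.vocab_builder
Executable yes
========== ======================================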

Todo: document this module.

Inputs

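============= =========================================
Name          Type(s)
============= =========================================
term_features TarredCorpus<TermFeatureListDocumentType>
============= =========================================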
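
The filtering options below are stated in terms of how often a term or feature occurs and in how many documents it appears. A minimal sketch of gathering those statistics, assuming (hypothetically) that each term-feature document can be read as a list of (term, feature_counts) pairs, where feature_counts maps a feature name to a count. The layout, the function name collect_counts, and the choice to count each data point as one term occurrence while weighting features by their stored counts are all illustrative assumptions, not Pimlico's API or its actual counting rules.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Hypothetical layout (assumption): a document is a list of (term, {feature: count}) pairs.
Document = List[Tuple[str, Dict[str, int]]]


def collect_counts(docs: Iterable[Document]):
    """Return occurrence counts and document frequencies for terms and features."""
    term_occ, term_df = Counter(), Counter()
    feat_occ, feat_df = Counter(), Counter()
    for doc in docs:
        terms_in_doc, feats_in_doc = set(), set()
        for term, features in doc:
            # Assumption: each data point counts as one occurrence of its term
            term_occ[term] += 1
            terms_in_doc.add(term)
            for feature, count in features.items():
                # Assumption: features are weighted by their stored count
                feat_occ[feature] += count
                feats_in_doc.add(feature)
        # Document frequency: each distinct item counts once per document
        term_df.update(terms_in_doc)
        feat_df.update(feats_in_doc)
    return term_occ, term_df, feat_occ, feat_df


docs = [
    [("dog", {"subj_of:bark": 2, "amod:big": 1}), ("cat", {"subj_of:sleep": 1})],
    [("dog", {"subj_of:bark": 1})],
]
term_occ, term_df, feat_occ, feat_df = collect_counts(docs)
print(term_occ["dog"], term_df["dog"], feat_occ["subj_of:bark"], feat_df["subj_of:bark"])
# 2 2 3 2
```

Terms and features are counted in a single pass but kept in separate counters, matching the module's two separate vocabulary outputs.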

Outputs

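============= =======================================
Name          Type(s)
============= =======================================
term_vocab    pimlico.datatypes.dictionary.Dictionary
feature_vocab pimlico.datatypes.dictionary.Dictionary
============= =======================================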
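
Each output is a dictionary: a mapping from vocabulary entries to integer IDs, built separately for terms and for features. The sketch below illustrates that idea with plain Python dicts. The helper build_id_map is hypothetical, and ordering IDs by descending frequency (ties broken alphabetically) is an assumption made for the example; Pimlico's Dictionary datatype may assign IDs differently.

```python
from collections import Counter
from typing import Dict, List, Tuple


def build_id_map(counts: Counter) -> Tuple[Dict[str, int], List[str]]:
    """Assign integer IDs to entries, most frequent first (ties broken alphabetically)."""
    ordered = sorted(counts, key=lambda tok: (-counts[tok], tok))
    token2id = {tok: i for i, tok in enumerate(ordered)}
    return token2id, ordered  # ordered[i] is the entry with ID i


# Hypothetical counts for the two vocabularies
term_counts = Counter({"dog": 5, "cat": 3, "emu": 3})
feature_counts = Counter({"subj_of:bark": 4, "amod:big": 1})

term_vocab, id2term = build_id_map(term_counts)
feature_vocab, id2feature = build_id_map(feature_counts)
print(term_vocab)     # {'dog': 0, 'cat': 1, 'emu': 2}
print(feature_vocab)  # {'subj_of:bark': 0, 'amod:big': 1}
```

Keeping the reverse list alongside the mapping makes it cheap to turn IDs back into strings when inspecting the vocabularies.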

Options

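================= ===== ========================================================================
Name              Type  Description
================= ===== ========================================================================
feature_limit     int   Keep only this many of the most common features (after other filters)
feature_max_prop  float Include only features that occur in at most this proportion of documents
term_max_prop     float Include only terms that occur in at most this proportion of documents
term_threshold    int   Minimum number of occurrences required for a term to be included
feature_threshold int   Minimum number of occurrences required for a feature to be included
term_limit        int   Keep only this many of the most common terms (after other filters)
================= ===== ========================================================================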
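
The term_* and feature_* options describe the same three filters, applied once to the term vocabulary and once to the feature vocabulary: a minimum occurrence count, a maximum document proportion, and a size limit applied after the other two. A minimal sketch of that filtering order, under these assumptions: the threshold is compared with >=, "at most this proportion" is compared with <= against document frequency divided by the total number of documents, and the limit keeps the most common survivors. The function filter_vocab and its no-filtering defaults are hypothetical placeholders, not the module's implementation or its actual defaults.

```python
from collections import Counter
from typing import Optional, Set


def filter_vocab(occurrences: Counter, doc_freq: Counter, num_docs: int,
                 threshold: int = 0, max_prop: float = 1.0,
                 limit: Optional[int] = None) -> Set[str]:
    """Apply threshold and max_prop filters, then limit (hypothetical defaults: no filtering)."""
    kept = [
        item for item, n in occurrences.items()
        # threshold: minimum total number of occurrences (assumed inclusive)
        if n >= threshold
        # max_prop: drop items found in too large a share of documents
        and doc_freq[item] / num_docs <= max_prop
    ]
    if limit is not None:
        # limit: keep only the `limit` most common items that survived the filters above
        kept = sorted(kept, key=lambda item: (-occurrences[item], item))[:limit]
    return set(kept)


occ = Counter({"the": 50, "dog": 8, "cat": 6, "emu": 1})
df = Counter({"the": 10, "dog": 4, "cat": 3, "emu": 1})
print(sorted(filter_vocab(occ, df, num_docs=10, threshold=2, max_prop=0.5)))
# ['cat', 'dog']
print(filter_vocab(occ, df, num_docs=10, threshold=2, max_prop=0.5, limit=1))
# {'dog'}
```

In this example "the" is dropped for appearing in every document and "emu" for occurring only once; the limit is then applied to what remains, which is why it is described as taking effect after the other filters.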