Path | pimlico.modules.corpora.corpus_stats |
Executable | yes |
Some basic statistics about tokenized corpora.
Counts the number of tokens, sentences and distinct tokens in a corpus.
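The three counts can be illustrated with a minimal standalone sketch. It does not use Pimlico's API or reproduce the module's actual implementation. The function name `corpus_stats`, the nested-list corpus representation (documents as lists of sentences, sentences as lists of token strings) and the dictionary keys are all assumptions made for illustration.

```python
from collections import Counter


def corpus_stats(documents):
    """Count tokens, sentences and distinct tokens.

    `documents` is assumed to be an iterable of documents, where each
    document is a list of sentences and each sentence is a list of
    token strings. This layout is hypothetical, not Pimlico's own.
    """
    token_counts = Counter()
    num_sentences = 0
    for doc in documents:
        for sentence in doc:
            num_sentences += 1
            # Tally every token occurrence in this sentence
            token_counts.update(sentence)
    return {
        "tokens": sum(token_counts.values()),
        "sentences": num_sentences,
        "distinct_tokens": len(token_counts),
    }


# Toy corpus: two documents, three sentences in total
toy_corpus = [
    [["the", "cat", "sat"], ["the", "dog", "barked"]],
    [["a", "cat", "slept"]],
]
stats = corpus_stats(toy_corpus)
print(stats)
```

On the toy corpus this counts 9 tokens, 3 sentences and 7 distinct tokens, since "the" and "cat" each occur twice.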
Inputs

Name | Type(s) |
---|---|
corpus | TarredCorpus<TokenizedDocumentType> |
Outputs

Name | Type(s) |
---|---|
stats | pimlico.datatypes.files.NamedFile |