Corpus document list filter

==========  ====================================
Path        pimlico.modules.corpora.list_filter
Executable  yes
==========  ====================================

Similar to :mod:pimlico.modules.corpora.split, but instead of splitting the dataset at random, this module splits it according to a given list of documents: the documents named in the list go into one set and all remaining documents go into the other.

Inputs

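======  ====================================================
Name    Type(s)
======  ====================================================
corpus  TarredCorpus <pimlico.datatypes.tar.TarredCorpus>
list    StringList <pimlico.datatypes.base.StringList>
======  ====================================================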

Outputs

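====  ============================================================
Name  Type(s)
====  ============================================================
set1  same as input corpus <pimlico.datatypes.base.TypeFromInput>
set2  same as input corpus <pimlico.datatypes.base.TypeFromInput>
====  ============================================================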
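
The list-based split can be illustrated with a minimal, standalone Python sketch. It does not use Pimlico's API: the function name split_by_list and the representation of a corpus as (name, text) pairs are hypothetical, chosen only for illustration. It also assumes that the listed documents go to set1 and the remainder to set2, which the description above does not state.

```python
from typing import Iterable, List, Tuple

Document = Tuple[str, str]  # (document name, document text); hypothetical representation


def split_by_list(
    corpus: Iterable[Document], doc_names: Iterable[str]
) -> Tuple[List[Document], List[Document]]:
    """Partition a corpus by membership of each document's name in doc_names.

    Assumption: listed documents go to the first set, all others to the second.
    """
    listed = set(doc_names)  # set membership keeps each lookup O(1)
    set1: List[Document] = []
    set2: List[Document] = []
    for name, text in corpus:
        # Corpus order is preserved within each output set
        (set1 if name in listed else set2).append((name, text))
    return set1, set2


corpus = [
    ("doc_a", "first text"),
    ("doc_b", "second text"),
    ("doc_c", "third text"),
]
set1, set2 = split_by_list(corpus, ["doc_c", "doc_a"])
print([name for name, _ in set1])  # ['doc_a', 'doc_c']
print([name for name, _ in set2])  # ['doc_b']
```

Unlike a random split, the result is fully determined by the list, so the same partition can be reproduced across runs or shared between pipelines.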