Similar to :mod:`pimlico.modules.corpora.split`, but instead of splitting the dataset at random, this module splits it according to a given list of documents: the documents in the list go into one set and all remaining documents go into the other.
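Inputs

====== =================================================
Name   Type(s)
====== =================================================
corpus TarredCorpus <pimlico.datatypes.tar.TarredCorpus>
list   StringList <pimlico.datatypes.base.StringList>
====== =================================================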
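Outputs

==== ===========================================================
Name Type(s)
==== ===========================================================
set1 same as input corpus <pimlico.datatypes.base.TypeFromInput>
set2 same as input corpus <pimlico.datatypes.base.TypeFromInput>
==== ===========================================================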
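
The list-based split described above can be illustrated with a minimal Python sketch. This is not Pimlico's implementation: the function ``split_by_list`` is hypothetical, the corpus is modelled as a plain dictionary from document name to text, and the choice to put listed documents in ``set1`` and the rest in ``set2`` is an assumption, since the description does not say which output receives which.

```python
from typing import Dict, Iterable, Tuple


def split_by_list(
    corpus: Dict[str, str], doc_names: Iterable[str]
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Hypothetical helper, not part of Pimlico.

    Splits a corpus (modelled here as {document name: text}) into two sets.
    Assumption: documents named in the list go to set1, all others to set2.
    """
    listed = set(doc_names)  # set membership makes each lookup O(1)
    set1: Dict[str, str] = {}
    set2: Dict[str, str] = {}
    for name, text in corpus.items():
        if name in listed:
            set1[name] = text
        else:
            set2[name] = text
    return set1, set2


# Toy data, for illustration only
corpus = {"doc_a": "first text", "doc_b": "second text", "doc_c": "third text"}
set1, set2 = split_by_list(corpus, ["doc_a", "doc_c"])
```

Unlike a random split, the result is fully determined by the list, so the same list always yields the same two sets. Names in the list that do not occur in the corpus are simply ignored in this sketch.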