concat

This is one of the test pipelines included in Pimlico's repository. See test-pipelines for more details.

Config file

The complete config file for this test pipeline:

[pipeline]
name=concat
release=latest

# Take input from some prepared Pimlico datasets
[europarl1]
type=pimlico.datatypes.corpora.GroupedCorpus
data_point_type=RawTextDocumentType
dir=%(test_data_dir)s/datasets/text_corpora/europarl

[europarl2]
type=pimlico.datatypes.corpora.GroupedCorpus
data_point_type=RawTextDocumentType
dir=%(test_data_dir)s/datasets/text_corpora/europarl2


[concat]
type=pimlico.modules.corpora.concat
input_corpora=europarl1,europarl2

[output]
type=pimlico.modules.corpora.format
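
To make the structure of this config easier to see, here is a minimal sketch that reads the same text with Python's standard-library `configparser`. This is not how Pimlico itself loads pipelines; it only illustrates the section layout, the comma-separated `input_corpora` option, and the `%(test_data_dir)s` substitution. The value `/path/to/test/data` is a made-up placeholder, and supplying it through `defaults` is an assumption of this sketch.

```python
import configparser

# The pipeline config from above, embedded as a string for illustration.
CONFIG_TEXT = """
[pipeline]
name=concat
release=latest

# Take input from some prepared Pimlico datasets
[europarl1]
type=pimlico.datatypes.corpora.GroupedCorpus
data_point_type=RawTextDocumentType
dir=%(test_data_dir)s/datasets/text_corpora/europarl

[europarl2]
type=pimlico.datatypes.corpora.GroupedCorpus
data_point_type=RawTextDocumentType
dir=%(test_data_dir)s/datasets/text_corpora/europarl2

[concat]
type=pimlico.modules.corpora.concat
input_corpora=europarl1,europarl2

[output]
type=pimlico.modules.corpora.format
"""

# Hypothetical placeholder value: the real test data location is not given here.
parser = configparser.ConfigParser(defaults={"test_data_dir": "/path/to/test/data"})
parser.read_string(CONFIG_TEXT)

# Every section other than [pipeline] declares one module in the pipeline.
module_sections = [name for name in parser.sections() if name != "pipeline"]

# The concat module names its inputs as a comma-separated list of section names.
inputs = parser.get("concat", "input_corpora").split(",")

# %(test_data_dir)s is expanded by configparser's default interpolation.
europarl1_dir = parser.get("europarl1", "dir")

print(module_sections)
print(inputs)
print(europarl1_dir)
```

Each name listed in `input_corpora` matches a section defined earlier in the file, which is how the `concat` module is wired to the two Europarl input corpora.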

Modules

The following Pimlico module types are used in this pipeline:

  • pimlico.modules.corpora.concat
  • pimlico.modules.corpora.format