Commit

feat(pipeline): create datasets-tokenize.yaml for tokenization in pipeline
entelecheia committed Jul 23, 2023
1 parent 070740b commit 743b8d7
Showing 1 changed file with 25 additions and 0 deletions.
config/pipeline/datasets-tokenize.yaml (new file, 25 additions, 0 deletions)
@@ -0,0 +1,25 @@
defaults:
  - datasets
  - /pipe@pipe_sample: dataset_sample

use_task_as_initial_object: true
steps:
  - uses: pipe_load
    with:
      dataset_path: datasets/processed/kakao
      verbose: true
  - uses: pipe_sample
    with:
      num_samples: 11
      randomize: false
      verbose: true
  - uses: pipe_tokenize
    with:
      tokenizer_config_name: simple
      text_col: bodyText
      token_col: tokenizedText
      verbose: true
  - uses: pipe_save
    with:
      dataset_path: datasets/processed/kakao_tokenized
      verbose: true
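The config above chains four steps (load, sample, tokenize, save), with each `uses:` entry receiving its `with:` block as keyword arguments and passing its output on to the next step. The sketch below models that flow in plain Python. It is a hypothetical illustration, not the framework's actual implementation: the step functions only mirror the `uses:` names, the in-memory `STORE` stands in for datasets on disk, the `simple` tokenizer is assumed to be whitespace splitting, `randomize: false` is assumed to keep the first `num_samples` rows, and `use_task_as_initial_object` is not modeled.

```python
import random
from typing import Any, Callable, Dict, List, Optional

Record = Dict[str, Any]

# Hypothetical in-memory stand-in for datasets stored under a dataset_path.
STORE: Dict[str, List[Record]] = {}


def pipe_load(data: Optional[List[Record]], dataset_path: str, verbose: bool = False) -> List[Record]:
    # Ignore the incoming object and read (a copy of) the dataset at dataset_path.
    loaded = [dict(row) for row in STORE[dataset_path]]
    if verbose:
        print(f"pipe_load: {len(loaded)} rows from {dataset_path}")
    return loaded


def pipe_sample(data: List[Record], num_samples: int, randomize: bool = False,
                verbose: bool = False) -> List[Record]:
    # Assumption: randomize=False keeps the first num_samples rows in order.
    if randomize:
        sampled = random.sample(data, min(num_samples, len(data)))
    else:
        sampled = data[:num_samples]
    if verbose:
        print(f"pipe_sample: kept {len(sampled)} of {len(data)} rows")
    return sampled


def pipe_tokenize(data: List[Record], tokenizer_config_name: str, text_col: str,
                  token_col: str, verbose: bool = False) -> List[Record]:
    # Hypothetical: the "simple" tokenizer config is treated as whitespace splitting.
    tokenizers: Dict[str, Callable[[str], List[str]]] = {"simple": str.split}
    tokenize = tokenizers[tokenizer_config_name]
    # Build new rows so the input rows are not mutated.
    out = [{**row, token_col: tokenize(row[text_col])} for row in data]
    if verbose:
        print(f"pipe_tokenize: {text_col} -> {token_col} for {len(out)} rows")
    return out


def pipe_save(data: List[Record], dataset_path: str, verbose: bool = False) -> List[Record]:
    # Write a copy of the rows to dataset_path and pass the data through unchanged.
    STORE[dataset_path] = [dict(row) for row in data]
    if verbose:
        print(f"pipe_save: {len(data)} rows to {dataset_path}")
    return data


REGISTRY: Dict[str, Callable[..., Any]] = {
    "pipe_load": pipe_load,
    "pipe_sample": pipe_sample,
    "pipe_tokenize": pipe_tokenize,
    "pipe_save": pipe_save,
}

# The `steps:` list from datasets-tokenize.yaml, written as Python data.
STEPS: List[Dict[str, Any]] = [
    {"uses": "pipe_load", "with": {"dataset_path": "datasets/processed/kakao", "verbose": True}},
    {"uses": "pipe_sample", "with": {"num_samples": 11, "randomize": False, "verbose": True}},
    {"uses": "pipe_tokenize", "with": {"tokenizer_config_name": "simple", "text_col": "bodyText",
                                       "token_col": "tokenizedText", "verbose": True}},
    {"uses": "pipe_save", "with": {"dataset_path": "datasets/processed/kakao_tokenized", "verbose": True}},
]


def run_pipeline(steps: List[Dict[str, Any]], initial: Any = None) -> Any:
    # Run each step in order, feeding the previous step's output into the next.
    data = initial
    for step in steps:
        data = REGISTRY[step["uses"]](data, **step.get("with", {}))
    return data
```

Under these assumptions, seeding `STORE["datasets/processed/kakao"]` with rows that have a `bodyText` field and calling `run_pipeline(STEPS)` leaves 11 rows, each with an added `tokenizedText` list, under `datasets/processed/kakao_tokenized`.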
