feat(corprep): add tokenizer_config_name and token_col to dataset tokenize configuration
entelecheia committed Jul 23, 2023
Parent: ea92375 · Commit: 4691e71
Showing 1 changed file with 2 additions and 0 deletions: src/corprep/conf/pipe/dataset_tokenize.yaml
@@ -3,9 +3,11 @@ defaults:

 run: corprep.datasets.tokenize.tokenize_dataset
 run_with:
+  tokenizer_config_name: simple
   num_proc: 1
   batched: true
   text_col: bodyText
+  token_col: tokenizedText
 verbose: ${..verbose}
 use_pipe_obj: true
 return_pipe_obj: false
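The diff does not show the body of corprep.datasets.tokenize.tokenize_dataset, so the following is only a minimal sketch of what the run_with keys likely control. It assumes the function reads raw text from text_col and writes tokens to token_col, and that batched and num_proc are forwarded to a Hugging Face style Dataset.map, which passes a batch as a dict of column name to list of values. The names tokenize_batch and simple_tokenize are hypothetical, and treating the "simple" tokenizer config as plain whitespace splitting is purely an assumption.

```python
from typing import Callable, Dict, List


# Hypothetical stand-in for the "simple" tokenizer config (assumption:
# plain whitespace splitting; the real config is not shown in the diff).
def simple_tokenize(text: str) -> List[str]:
    return text.split()


def tokenize_batch(
    batch: Dict[str, List[str]],
    text_col: str = "bodyText",
    token_col: str = "tokenizedText",
    tokenize: Callable[[str], List[str]] = simple_tokenize,
) -> Dict[str, List[List[str]]]:
    """Tokenize every text in text_col and return the tokens under token_col.

    The batch layout (column name -> list of values) mirrors what
    datasets.Dataset.map passes to the mapped function when batched=True.
    """
    return {token_col: [tokenize(text) for text in batch[text_col]]}


if __name__ == "__main__":
    batch = {"bodyText": ["Hello corprep world", "tokenize me"]}
    print(tokenize_batch(batch))
    # {'tokenizedText': [['Hello', 'corprep', 'world'], ['tokenize', 'me']]}
```

With the datasets library, a function like this would typically be applied as dataset.map(tokenize_batch, batched=True, num_proc=1, fn_kwargs={"text_col": "bodyText", "token_col": "tokenizedText"}). Making token_col configurable, as this commit does, lets the output column name change without touching the code.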
