Describe the bug
In the demo code: python tools/tokenizer.py --raw_data_name your_raw_data_file_name(without suffix) --input_file_type 'text' or 'json' or 'jsonl' --bin your_output_bin_path
the value 'text' should be 'txt'.
The same applies to the example command: python tools/tokenizer.py --raw_data_name raw_data --input_file_type 'text' --bin cn/output.bin
This affects both the Chinese and English documentation, and any other language versions if they exist.
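To illustrate why the documented value matters, here is a minimal sketch. It assumes the script accepts only 'txt', 'json', or 'jsonl' for --input_file_type, as this report suggests. The parser below is a hypothetical stand-in, not the actual code in tools/tokenizer.py, and the names build_parser and is_accepted are made up for this example.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in for the argument parsing in tools/tokenizer.py;
    # the real script may define or validate its flags differently.
    parser = argparse.ArgumentParser(prog="tokenizer.py")
    parser.add_argument("--raw_data_name", required=True)
    # Assumption: only these three values are accepted, per the report.
    parser.add_argument(
        "--input_file_type", choices=["txt", "json", "jsonl"], required=True
    )
    parser.add_argument("--bin", required=True)
    return parser


def is_accepted(file_type: str) -> bool:
    """Return True if the hypothetical parser accepts this --input_file_type."""
    argv = [
        "--raw_data_name", "raw_data",
        "--input_file_type", file_type,
        "--bin", "cn/output.bin",
    ]
    try:
        build_parser().parse_args(argv)
        return True
    except SystemExit:
        # argparse reports an invalid choice and exits instead of returning.
        return False
```

Under that assumption, is_accepted("txt") succeeds while is_accepted("text") fails, which is the mismatch a reader hits when copying the command from the docs.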
Environment
any browser
Other information
No response