There is no configs/train/finetune-7b.yaml #30
I believe the proper filename is
Thank you @sbmsr. Would you mind helping me with another issue? I tried using both the Training Data Without P3 and the Full Dataset with P3, but the map step fails near the end:

Map (num_proc=64): 99%|██████████████████████████████████▊| 761922/765889 [05:46<00:00, 17174.63 examples/s]
multiprocess.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/anaconda3/envs/gpt4all/lib/python3.10/site-packages/multiprocess/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/anaconda3/envs/gpt4all/lib/python3.10/site-packages/datasets/utils/py_utils.py", line 1349, in _write_generator_to_queue
    for i, result in enumerate(func(**kwargs)):
  File "/anaconda3/envs/gpt4all/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 3329, in _map_single
    batch = apply_function_on_filtered_inputs(
  File "/anaconda3/envs/gpt4all/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 3210, in apply_function_on_filtered_inputs
    processed_inputs = function(*fn_args, *additional_args, **fn_kwargs)
  File "/llm/gpt4all/data.py", line 84, in <lambda>
    lambda ele: tokenize_inputs(config, tokenizer, ele),
  File "/llm/gpt4all/data.py", line 19, in tokenize_inputs
    input_len = len(input_tokens)
  File "/anaconda3/envs/gpt4all/lib/python3.10/site-packages/torch/_tensor.py", line 908, in __len__
    raise TypeError("len() of a 0-d tensor")
TypeError: len() of a 0-d tensor
"""

Solved in #53
There is no configs/train/finetune-7b.yaml in the repo, yet the README requires it for training.