
[Question] How to solve `datasets.builder.DatasetGenerationError: An error occurred while generating the dataset` #35

Open
Shawnzheng011019 opened this issue Oct 23, 2023 · 7 comments
Labels
question Further information is requested

Comments

@Shawnzheng011019

What is your question?

Traceback (most recent call last):
File "C:\Users\shawn\anaconda3\envs\pytorch\lib\site-packages\datasets\builder.py", line 1618, in _prepare_split_single
writer = writer_class(
File "C:\Users\shawn\anaconda3\envs\pytorch\lib\site-packages\datasets\arrow_writer.py", line 334, in __init__
self.stream = self._fs.open(fs_token_paths[2][0], "wb")
File "C:\Users\shawn\anaconda3\envs\pytorch\lib\site-packages\fsspec\spec.py", line 1309, in open
f = self._open(
File "C:\Users\shawn\anaconda3\envs\pytorch\lib\site-packages\fsspec\implementations\local.py", line 180, in _open
return LocalFileOpener(path, mode, fs=self, **kwargs)
File "C:\Users\shawn\anaconda3\envs\pytorch\lib\site-packages\fsspec\implementations\local.py", line 298, in __init__
self._open()
File "C:\Users\shawn\anaconda3\envs\pytorch\lib\site-packages\fsspec\implementations\local.py", line 303, in _open
self.f = open(self.path, mode=self.mode)
FileNotFoundError: [Errno 2] No such file or directory: 'C:/Users/shawn/.cache/huggingface/datasets/named_entity_recognition_dataset_builder/default-c270794ce0d23d06/0.0.0/db737b9bb893f20fb03d04403a30bf7c033256c212b7e9f0ebc6e9c958535c51.incomplete/named_entity_recognition_dataset_builder-train-00000-00000-of-NNNNN.arrow'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "C:\Users\shawn\anaconda3\envs\pytorch\lib\runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\shawn\anaconda3\envs\pytorch\lib\runpy.py", line 87, in _run_code
exec(code, run_globals)
File "C:\Users\shawn\anaconda3\envs\pytorch\Scripts\adaseq.exe\__main__.py", line 7, in <module>
File "C:\Users\shawn\anaconda3\envs\pytorch\lib\site-packages\adaseq\main.py", line 13, in run
main(prog='adaseq')
File "C:\Users\shawn\anaconda3\envs\pytorch\lib\site-packages\adaseq\commands\__init__.py", line 29, in main
args.func(args)
File "C:\Users\shawn\anaconda3\envs\pytorch\lib\site-packages\adaseq\commands\train.py", line 84, in train_model_from_args
train_model(
File "C:\Users\shawn\anaconda3\envs\pytorch\lib\site-packages\adaseq\commands\train.py", line 156, in train_model
trainer = build_trainer_from_partial_objects(
File "C:\Users\shawn\anaconda3\envs\pytorch\lib\site-packages\adaseq\commands\train.py", line 185, in build_trainer_from_partial_objects
dm = DatasetManager.from_config(task=config.task, **config.dataset)
File "C:\Users\shawn\anaconda3\envs\pytorch\lib\site-packages\adaseq\data\dataset_manager.py", line 182, in from_config
hfdataset = hf_load_dataset(path, name=name, **kwargs)
File "C:\Users\shawn\anaconda3\envs\pytorch\lib\site-packages\datasets\load.py", line 1797, in load_dataset
builder_instance.download_and_prepare(
File "C:\Users\shawn\anaconda3\envs\pytorch\lib\site-packages\datasets\builder.py", line 909, in download_and_prepare
self._download_and_prepare(
File "C:\Users\shawn\anaconda3\envs\pytorch\lib\site-packages\datasets\builder.py", line 1670, in _download_and_prepare
super()._download_and_prepare(
File "C:\Users\shawn\anaconda3\envs\pytorch\lib\site-packages\datasets\builder.py", line 1004, in _download_and_prepare
self._prepare_split(split_generator, **prepare_split_kwargs)
File "C:\Users\shawn\anaconda3\envs\pytorch\lib\site-packages\datasets\builder.py", line 1508, in _prepare_split
for job_id, done, content in self._prepare_split_single(
File "C:\Users\shawn\anaconda3\envs\pytorch\lib\site-packages\datasets\builder.py", line 1665, in _prepare_split_single
raise DatasetGenerationError("An error occurred while generating the dataset") from e
datasets.builder.DatasetGenerationError: An error occurred while generating the dataset
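[Editor's note] The `FileNotFoundError` above points at a cache path roughly 200 characters long; on Windows, opening paths near the 260-character `MAX_PATH` limit can fail unless long-path support is enabled. A common mitigation (a hedged sketch, not confirmed as the fix in this thread; the directory name is an example) is to redirect the Hugging Face datasets cache to a short path before the `datasets` library is imported:

```python
import os

# Redirect the Hugging Face datasets cache to a short directory so that
# generated .arrow file paths stay well under the Windows MAX_PATH limit.
# This must run before `datasets` is first imported, since the library
# reads the variable at import time.
os.environ["HF_DATASETS_CACHE"] = r"C:\hf_cache"

# from datasets import load_dataset  # import only after the env var is set
```

Alternatively, recent Windows 10/11 builds allow enabling long-path support system-wide via the `LongPathsEnabled` registry key.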

What have you tried?

Set an HTTP proxy and successfully connected to YouTube.

Code (if necessary)

No response

What's your environment?

  • AdaSeq Version (e.g., 1.0 or master):
  • ModelScope Version (e.g., 1.0 or master):
  • PyTorch Version (e.g., 1.12.1):
  • OS (e.g., Ubuntu 20.04):
  • Python version:
  • CUDA/cuDNN version:
  • GPU models and configuration:
  • Any other relevant information:

Code of Conduct

  • I agree to follow this project's Code of Conduct
@Shawnzheng011019 Shawnzheng011019 added the question Further information is requested label Oct 23, 2023
@Shawnzheng011019
Author

The environment was set up automatically from requirements.txt.

@ykallan

ykallan commented Dec 16, 2023

I'm running into the same problem. It looks like there may be a bug in AdaSeq's dataset-loading logic; the dataset format

```text
data_type: json_spans
```

may be part of the problem.

@PPPP-kaqiu

It's because the dataset can't be found, or it isn't in the standard format the parser expects. You can rewrite the data loading by following the toy msra loading code.

@houyuchao

@PPPP-kaqiu Did you rewrite it? Could you share it?

@lichen146

@Shawnzheng011019 Have you solved this?

@PPPP-kaqiu

Write the data-loading script strictly following the HF dataset format; then in the yaml config, just point the data loading at the data folder.
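[Editor's note] For readers following this suggestion, here is a minimal sketch of the sentence-parsing logic such a loading script's `_generate_examples` would wrap, assuming CoNLL-style NER data ("token label" per line, blank line between sentences, as in the toy msra example). Names and format are illustrative, not AdaSeq's actual code:

```python
# Illustrative parser for CoNLL-style NER lines; a custom Hugging Face
# `datasets` builder's _generate_examples would iterate a file and wrap
# each yielded pair into an example dict.
def read_conll(lines):
    """Yield (tokens, labels) pairs, one pair per sentence."""
    tokens, labels = [], []
    for line in lines:
        line = line.strip()
        if not line:  # a blank line closes the current sentence
            if tokens:
                yield tokens, labels
                tokens, labels = [], []
            continue
        token, label = line.split()[:2]  # "token<space>label"
        tokens.append(token)
        labels.append(label)
    if tokens:  # final sentence may lack a trailing blank line
        yield tokens, labels
```

Usage: `list(read_conll(open("train.txt", encoding="utf-8")))` returns the parsed sentences; whatever format your data is actually in, the key point from the comment above is that the script must match the HF `datasets` builder conventions exactly.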

@lichen146

@PPPP-kaqiu Could you add me on WeChat? I'd really appreciate the help. WX: Xugeyuan923
