The following error occurred when I ran lm-evaluation-harness (jp-stable/JGLUE) and the MARC-ja dataset did not download correctly. The root cause turned out to be poor network conditions, which made the download fail.
Selected Tasks: ['jsquad-1.1-0.3', 'jcommonsenseqa-1.1-0.3', 'jnli-1.1-0.3', 'marc_ja-1.1-0.3']
Using device 'cuda'
You are using config.init_device='cpu', but you can also use config.init_device="meta" with Composer + FSDP for fast initialization.
/home/shunk031/lm-evaluation-harness/lm_eval/tasks/ja/jsquad.py:75: FutureWarning: load_metric is deprecated and will be removed in the next major version of datasets. Use 'evaluate.load' instead, from the new library 🤗 Evaluate: https://huggingface.co/docs/evaluate
self.jasquad_metric = datasets.load_metric(jasquad.__file__)
Downloading data files: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 8501.97it/s]
Extracting data files: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 2054.69it/s]
Traceback (most recent call last):
File "/home/shunk031/lm-evaluation-harness/main.py", line 122, in <module>
main()
File "/home/shunk031/lm-evaluation-harness/main.py", line 91, in main
results = evaluator.simple_evaluate(
File "/home/shunk031/lm-evaluation-harness/lm_eval/utils.py", line 185, in _wrapper
return fn(*args, **kwargs)
File "/home/shunk031/lm-evaluation-harness/lm_eval/evaluator.py", line 82, in simple_evaluate
task_dict = lm_eval.tasks.get_task_dict(tasks)
File "/home/shunk031/lm-evaluation-harness/lm_eval/tasks/__init__.py", line 373, in get_task_dict
task_name_dict = {
File "/home/shunk031/lm-evaluation-harness/lm_eval/tasks/__init__.py", line 374, in <dictcomp>
task_name: get_task(task_name)()
File "/home/shunk031/lm-evaluation-harness/lm_eval/base.py", line 430, in __init__
self.download(data_dir, cache_dir, download_mode)
File "/home/shunk031/lm-evaluation-harness/lm_eval/base.py", line 459, in download
self.dataset = datasets.load_dataset(
File "/home/shunk031/lm-evaluation-harness/.venv/lib/python3.10/site-packages/datasets/load.py", line 2133, in load_dataset
builder_instance.download_and_prepare(
File "/home/shunk031/lm-evaluation-harness/.venv/lib/python3.10/site-packages/datasets/builder.py", line 954, in download_and_prepare
self._download_and_prepare(
File "/home/shunk031/lm-evaluation-harness/.venv/lib/python3.10/site-packages/datasets/builder.py", line 1717, in _download_and_prepare
super()._download_and_prepare(
File "/home/shunk031/lm-evaluation-harness/.venv/lib/python3.10/site-packages/datasets/builder.py", line 1027, in _download_and_prepare
split_generators = self._split_generators(dl_manager, **split_generators_kwargs)
File "/root/.cache/huggingface/modules/datasets_modules/datasets/shunk031--JGLUE/eed55a4f1c560114b29786d11eed4fc793f35c3b2aa9efdf5352c0bd85016b36/JGLUE.py", line 535, in _split_generators
return self.__split_generators_marc_ja(dl_manager)
File "/root/.cache/huggingface/modules/datasets_modules/datasets/shunk031--JGLUE/eed55a4f1c560114b29786d11eed4fc793f35c3b2aa9efdf5352c0bd85016b36/JGLUE.py", line 503, in __split_generators_marc_ja
split_dfs = preprocess_for_marc_ja(
File "/root/.cache/huggingface/modules/datasets_modules/datasets/shunk031--JGLUE/eed55a4f1c560114b29786d11eed4fc793f35c3b2aa9efdf5352c0bd85016b36/JGLUE.py", line 405, in preprocess_for_marc_ja
df = df[["review_body", "star_rating", "review_id"]]
File "/home/shunk031/lm-evaluation-harness/.venv/lib/python3.10/site-packages/pandas/core/frame.py", line 3767, in __getitem__
indexer = self.columns._get_indexer_strict(key, "columns")[1]
File "/home/shunk031/lm-evaluation-harness/.venv/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 5877, in _get_indexer_strict
self._raise_if_missing(keyarr, indexer, axis_name)
File "/home/shunk031/lm-evaluation-harness/.venv/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 5938, in _raise_if_missing
raise KeyError(f"None of [{key}] are in the [{axis_name}]")
KeyError: "None of [Index(['review_body', 'star_rating', 'review_id'], dtype='object')] are in the [columns]"
This error alone is not enough to tell whether the data was loaded correctly, so the error should give more detail by displaying the contents of the data frame.
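As a rough illustration of the idea, a check along the following lines could run before the column selection in `preprocess_for_marc_ja`. This is only a sketch, not the actual change: the helper name `check_marc_ja_columns`, the `REQUIRED_COLUMNS` constant, and the wording of the message are assumptions made for this example. Only the three column names come from the traceback.

```python
import pandas as pd

# Columns selected in preprocess_for_marc_ja, taken from the traceback.
REQUIRED_COLUMNS = ["review_body", "star_rating", "review_id"]


def check_marc_ja_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: fail early with a message that shows the data."""
    missing = [col for col in REQUIRED_COLUMNS if col not in df.columns]
    if missing:
        # Include the columns and first rows that were actually read, so a
        # failed or truncated download is easy to recognize.
        raise ValueError(
            f"MARC-ja data is missing the columns {missing}. "
            "The download may have failed or been truncated.\n"
            f"Columns found: {df.columns.tolist()}\n"
            f"First rows:\n{df.head()}"
        )
    return df[REQUIRED_COLUMNS]
```

With a check like this, a file that contains something other than the review data (for example an error page) would show up directly in the message instead of a bare `KeyError`.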
Related to #9.