improve error message #8

Open · wants to merge 2 commits into main

Conversation

shunk031 (Owner) commented Jul 28, 2023

Closes #7. Related to #9.

The contents of the downloaded file are now included in the error message, so it is easy to confirm when the data is corrupt:

Traceback (most recent call last):
  File "/home/shunk031/.cache/huggingface/modules/datasets_modules/datasets/shunk031--JGLUE/eed55a4f1c560114b29786d11eed4fc793f35c3b2aa9efdf5352c0bd85016b36/JGLUE.py", line 406, in preprocess_for_marc_ja
    df = df[["review_body", "star_rating", "review_id"]]
  File "/home/shunk031/lm-evaluation-harness/.venv/lib/python3.10/site-packages/pandas/core/frame.py", line 3767, in __getitem__
    indexer = self.columns._get_indexer_strict(key, "columns")[1]
  File "/home/shunk031/lm-evaluation-harness/.venv/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 5877, in _get_indexer_strict
    self._raise_if_missing(keyarr, indexer, axis_name)
  File "/home/shunk031/lm-evaluation-harness/.venv/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 5938, in _raise_if_missing
    raise KeyError(f"None of [{key}] are in the [{axis_name}]")
KeyError: "None of [Index(['review_body', 'star_rating', 'review_id'], dtype='object')] are in the [columns]"

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/shunk031/lm-evaluation-harness/main.py", line 122, in <module>
    main()
  File "/home/shunk031/lm-evaluation-harness/main.py", line 91, in main
    results = evaluator.simple_evaluate(
  File "/home/shunk031/lm-evaluation-harness/lm_eval/utils.py", line 185, in _wrapper
    return fn(*args, **kwargs)
  File "/home/shunk031/lm-evaluation-harness/lm_eval/evaluator.py", line 82, in simple_evaluate
    task_dict = lm_eval.tasks.get_task_dict(tasks)
  File "/home/shunk031/lm-evaluation-harness/lm_eval/tasks/__init__.py", line 373, in get_task_dict
    task_name_dict = {
  File "/home/shunk031/lm-evaluation-harness/lm_eval/tasks/__init__.py", line 374, in <dictcomp>
    task_name: get_task(task_name)()
  File "/home/shunk031/lm-evaluation-harness/lm_eval/base.py", line 430, in __init__
    self.download(data_dir, cache_dir, download_mode)
  File "/home/shunk031/lm-evaluation-harness/lm_eval/base.py", line 459, in download
    self.dataset = datasets.load_dataset(
  File "/home/shunk031/lm-evaluation-harness/.venv/lib/python3.10/site-packages/datasets/load.py", line 2133, in load_dataset
    builder_instance.download_and_prepare(
  File "/home/shunk031/lm-evaluation-harness/.venv/lib/python3.10/site-packages/datasets/builder.py", line 954, in download_and_prepare
    self._download_and_prepare(
  File "/home/shunk031/lm-evaluation-harness/.venv/lib/python3.10/site-packages/datasets/builder.py", line 1717, in _download_and_prepare
    super()._download_and_prepare(
  File "/home/shunk031/lm-evaluation-harness/.venv/lib/python3.10/site-packages/datasets/builder.py", line 1027, in _download_and_prepare
    split_generators = self._split_generators(dl_manager, **split_generators_kwargs)
  File "/home/shunk031/.cache/huggingface/modules/datasets_modules/datasets/shunk031--JGLUE/eed55a4f1c560114b29786d11eed4fc793f35c3b2aa9efdf5352c0bd85016b36/JGLUE.py", line 542, in _split_generators
    return self.__split_generators_marc_ja(dl_manager)
  File "/home/shunk031/.cache/huggingface/modules/datasets_modules/datasets/shunk031--JGLUE/eed55a4f1c560114b29786d11eed4fc793f35c3b2aa9efdf5352c0bd85016b36/JGLUE.py", line 510, in __split_generators_marc_ja
    split_dfs = preprocess_for_marc_ja(
  File "/home/shunk031/.cache/huggingface/modules/datasets_modules/datasets/shunk031--JGLUE/eed55a4f1c560114b29786d11eed4fc793f35c3b2aa9efdf5352c0bd85016b36/JGLUE.py", line 408, in preprocess_for_marc_ja
    raise ValueError(
ValueError: Invalid data loaded from /home/shunk031/.cache/huggingface/datasets/downloads/9607c6909a47d484324aa65d0f7523465575084911d938d039939eafe706542f:
              <?xml version="1.0" encoding="UTF-8"?>
0  <Error><Code>AccessDenied</Code><Message>Acces...
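
For context, a minimal sketch of the kind of check behind this error message. The function and variable names used here (select_marc_ja_columns, filepath) are illustrative assumptions, not necessarily the exact code in JGLUE.py:

import pandas as pd

def select_marc_ja_columns(df: pd.DataFrame, filepath: str) -> pd.DataFrame:
    # Keep only the columns MARC-ja preprocessing needs. If the download is
    # corrupt (e.g. an S3 AccessDenied XML page instead of the dataset), the
    # column lookup raises KeyError; re-raise it as a ValueError that shows
    # the file path and the first few rows of what was actually loaded.
    try:
        return df[["review_body", "star_rating", "review_id"]]
    except KeyError as err:
        raise ValueError(
            f"Invalid data loaded from {filepath}:\n{df.head()}"
        ) from err

With this pattern the original KeyError is kept as the direct cause (raise ... from err), which is why the traceback above shows both exceptions.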

shunk031 (Owner, Author) commented:

CI fails because the MARC-ja dataset cannot be downloaded at this time (ref. #9).

Successfully merging this pull request may close these issues:

Hard to understand error when MARC-ja dataset is not downloaded correctly