
Worksheets.codalab.org down - prohibits HELM from completing #1930

Closed
JasperHG90 opened this issue Oct 22, 2023 · 13 comments
Labels
bug Something isn't working competition Support for the NeurIPS Large Language Model Efficiency Challenge p1 Priority 1 (Required for release)

Comments

@JasperHG90

JasperHG90 commented Oct 22, 2023

Hi!

I'm trying to run HELM with MMLU scenarios. It appears that https://worksheets.codalab.org/ is down, which causes HELM to fail when using these scenarios. I'm not sure whether this data is yours or belongs to the scenarios' authors, so I thought I'd post it here in case it is HELM-related.

Best,

J.

@msaroufim msaroufim added bug Something isn't working competition Support for the NeurIPS Large Language Model Efficiency Challenge labels Oct 22, 2023
@msaroufim
Collaborator

msaroufim commented Oct 22, 2023

Hi @yifanmai, this is critical to fix for the LLM competition, since otherwise we'd need to remove all MMLU dataset perturbations and CNN/DM from our configuration before the Wednesday deadline, which doesn't feel great:

# {description: "summarization_cnndm:model=neurips/local,max_eval_instances=9",priority: 1}
# "data_augmentation=canonical"

As a backup I can remove MMLU, but then we'd be defining the datasets a day before the competition deadline, which is not great.

@msaroufim msaroufim added the p1 Priority 1 (Required for release) label Oct 22, 2023
@percyliang
Contributor

Azure unfortunately disabled the CodaLab server due to a technical glitch; trying to get support to bring it back. In the meantime, perhaps we can send you the relevant files?

@msaroufim
Collaborator

msaroufim commented Oct 22, 2023

Would it be possible to temporarily change the download link to your own mirror? It'd be much more convenient for the leaderboard and the competitors to just reinstall HELM from source rather than have to manually download a dataset and place it in the right location. Although tbh either would be preferable to delaying the competition, since submissions are due in 3 days on Oct 25.

@JasperHG90
Author

@percyliang if you could share the files with me that would be appreciated, thanks!

@yifanmai
Collaborator

Deploying a hotfix shortly #1931

@yifanmai
Collaborator

I have mirrored the files and updated main to use the new URLs. Please try pulling main and re-running.

As an aside, because of #1932, helm-run may have downloaded and cached empty versions of these files, i.e. you may need to run the following to get rid of the empty files:

# fixes error in dialect_perturbation
rm -rf benchmark_output/perturbations/dialect
# fixes error in summarization_scenario
rm -rf benchmark_output/scenarios/summarization
# fixes error in summarization_metrics
rm -rf benchmark_output/v1/eval_cache

You'll need to run the respective command if you see one of these error messages:

  File "/home/yifanmai/oss/helm/src/helm/benchmark/augmentations/dialect_perturbation.py", line 129, in load_mapping_dict
    return json.load(f)
  File "/usr/lib/python3.8/json/__init__.py", line 293, in load
    return loads(fp.read(),
  File "/usr/lib/python3.8/json/__init__.py", line 357, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.8/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.8/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
  File "/home/yifanmai/oss/helm/src/helm/benchmark/scenarios/summarization_scenario.py", line 125, in get_instances
    dataset, article_key, summary_key = self._load_dataset(self.dataset_name, output_path)
  File "/home/yifanmai/oss/helm/src/helm/benchmark/scenarios/summarization_scenario.py", line 111, in _load_dataset
    dataset = self._download_dataset(url, "xsum-sampled", output_path)
  File "/home/yifanmai/oss/helm/src/helm/benchmark/scenarios/summarization_scenario.py", line 99, in _download_dataset
    dataset = pickle.load(fin)
EOFError: Ran out of input
  File "/home/yifanmai/oss/helm/src/helm/benchmark/metrics/summarization_metrics.py", line 198, in evaluate_generation
    self._load_qafacteval(eval_cache_path)
  File "/home/yifanmai/oss/helm/src/helm/benchmark/metrics/summarization_metrics.py", line 85, in _load_qafacteval
    qafacteval_scores = pickle.load(fin)
EOFError: Ran out of input
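All three tracebacks above fail while reading a zero-byte cached file: json.load and pickle.load both choke on empty input. As a narrower alternative to deleting the whole directories, here is a minimal sketch (not part of HELM itself) that walks those cache directories and removes only zero-byte files, so helm-run re-downloads just what is broken; the paths are the ones from the rm commands above and should be adjusted to your benchmark_output location:

```python
import os

# Cache locations from the rm commands above; adjust to your setup.
CACHE_DIRS = [
    "benchmark_output/perturbations/dialect",
    "benchmark_output/scenarios/summarization",
    "benchmark_output/v1/eval_cache",
]

def remove_empty_files(dirs):
    """Delete zero-byte files under the given directories and return
    the paths removed, so helm-run re-downloads only those files."""
    removed = []
    for root_dir in dirs:
        for root, _, files in os.walk(root_dir):
            for name in files:
                path = os.path.join(root, name)
                if os.path.getsize(path) == 0:
                    os.remove(path)
                    removed.append(path)
    return removed

if __name__ == "__main__":
    for path in remove_empty_files(CACHE_DIRS):
        print("removed empty cache file:", path)
```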

@JasperHG90
Author

Thanks! I can try it out tonight.

@anmolagarwal999

anmolagarwal999 commented Oct 23, 2023

@yifanmai The fix still does not work for me, even after removing the cached data. I get the error below. EDIT: Manually downloading the file from the cloud mirror and placing it in the correct location works.

Error when running summarization_cnndm:temperature=0.3,device=cpu,model=neurips_local:
Traceback (most recent call last):
  File "/home/anmol/nips_challenge/efficiency_challenge_repo/external_repos/helm_tracking_remote/helm/src/helm/benchmark/runner.py", line 173, in run_all
    self.run_one(run_spec)
  File "/home/anmol/nips_challenge/efficiency_challenge_repo/external_repos/helm_tracking_remote/helm/src/helm/benchmark/runner.py", line 221, in run_one
    instances = scenario.get_instances(scenario_output_path)
  File "/home/anmol/nips_challenge/efficiency_challenge_repo/external_repos/helm_tracking_remote/helm/src/helm/benchmark/scenarios/summarization_scenario.py", line 137, in get_instances
    dataset, article_key, summary_key = self._load_dataset(self.dataset_name, output_path)
  File "/home/anmol/nips_challenge/efficiency_challenge_repo/external_repos/helm_tracking_remote/helm/src/helm/benchmark/scenarios/summarization_scenario.py", line 128, in _load_dataset
    dataset = self._download_dataset(url, "cnndm", output_path)
  File "/home/anmol/nips_challenge/efficiency_challenge_repo/external_repos/helm_tracking_remote/helm/src/helm/benchmark/scenarios/summarization_scenario.py", line 102, in _download_dataset
    dataset = pickle.load(fin)
_pickle.UnpicklingError: invalid load key, '<'.

100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  2.02it/s]
} [0.541s]
Traceback (most recent call last):
  File "/home/anmol/anaconda3/envs/wizard_coder/bin/helm-run", line 8, in <module>
    sys.exit(main())
  File "/home/anmol/nips_challenge/efficiency_challenge_repo/external_repos/helm_tracking_remote/helm/src/helm/common/hierarchical_logger.py", line 104, in wrapper
    return fn(*args, **kwargs)
  File "/home/anmol/nips_challenge/efficiency_challenge_repo/external_repos/helm_tracking_remote/helm/src/helm/benchmark/run.py", line 309, in main
    run_benchmarking(
  File "/home/anmol/nips_challenge/efficiency_challenge_repo/external_repos/helm_tracking_remote/helm/src/helm/benchmark/run.py", line 111, in run_benchmarking
    runner.run_all(run_specs)
  File "/home/anmol/nips_challenge/efficiency_challenge_repo/external_repos/helm_tracking_remote/helm/src/helm/benchmark/runner.py", line 182, in run_all
    raise RunnerError(f"Failed runs: [{failed_runs_str}]")
helm.benchmark.runner.RunnerError: Failed runs: ["summarization_cnndm:temperature=0.3,device=cpu,model=neurips_local"]
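The invalid load key, '<' error above is a different failure mode from the empty-file case: pickles written with modern protocols start with a binary header byte (b'\x80'), so a file beginning with < almost always means an HTML error page was saved in place of the dataset. A hedged sketch (not HELM code; the classification names are my own) for diagnosing a cached file before deciding to delete it:

```python
import pickle

def diagnose_cached_pickle(path):
    """Classify a cached dataset file: 'empty' (EOFError case above),
    'html' (an error page cached in place of the pickle, the
    UnpicklingError case), or 'pickle' if it actually loads."""
    with open(path, "rb") as f:
        head = f.read(1)
    if not head:
        return "empty"   # zero-byte file: pickle.load raises EOFError
    if head == b"<":
        return "html"    # leading '<': invalid load key, '<'
    with open(path, "rb") as f:
        pickle.load(f)   # raises if the file is truncated or corrupt
    return "pickle"
```

If this reports anything other than 'pickle', deleting the file and re-running helm-run should trigger a fresh download.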

@agoncharenko1992

@yifanmai I have the same error as above

@anmolagarwal999

> @yifanmai I have the same error as above

@agoncharenko1992 Manually downloading the file and placing it in the correct folder (benchmark_output/scenarios/summarization/data) works.

@yifanmai
Collaborator

Thanks for the bug report; will investigate shortly.

@yifanmai
Collaborator

This should be fixed by #1935. You may have to delete the cached file to force a re-download: rm -rf benchmark_output/scenarios/summarization

@pranavjain

CodaLab is back online now. Please do let us know if you are still facing issues.
