When running one of the examples, I encountered the following error, which seems to suggest I cannot load from the bucket referenced in c4.py:
Traceback (most recent call last):
File ".local/lib/python3.8/site-packages/paxml/main.py", line 407, in <module>
app.run(main, flags_parser=absl_flags.flags_parser)
File "/usr/local/lib/python3.8/dist-packages/absl/app.py", line 308, in run
_run_main(main, args)
File "/usr/local/lib/python3.8/dist-packages/absl/app.py", line 254, in _run_main
sys.exit(main(argv))
File ".local/lib/python3.8/site-packages/paxml/main.py", line 382, in main
run(experiment_config=experiment_config,
File ".local/lib/python3.8/site-packages/paxml/main.py", line 336, in run
search_space = tuning_lib.get_search_space(experiment_config)
File "/home/robertli/.local/lib/python3.8/site-packages/paxml/tuning_lib.py", line 81, in get_search_space
search_space = pg.hyper.trace(inspect_search_space, require_hyper_name=True)
File "/home/robertli/.local/lib/python3.8/site-packages/pyglove/core/hyper/dynamic_evaluation.py", line 586, in trace
fun()
File "/home/robertli/.local/lib/python3.8/site-packages/paxml/tuning_lib.py", line 77, in inspect_search_space
_ = instantiate(d)
File "/home/robertli/.local/lib/python3.8/site-packages/praxis/base_hyperparams.py", line 1103, in instantiate
return config.Instantiate(**kwargs)
File "/home/robertli/.local/lib/python3.8/site-packages/praxis/base_hyperparams.py", line 601, in Instantiate
return self.cls(self, **kwargs)
File "/home/robertli/.local/lib/python3.8/site-packages/paxml/seqio_input.py", line 443, in __init__
self._dataset = self._get_dataset()
File "/home/robertli/.local/lib/python3.8/site-packages/paxml/seqio_input.py", line 551, in _get_dataset
ds = self._get_backing_ds(
File "/home/robertli/.local/lib/python3.8/site-packages/paxml/seqio_input.py", line 686, in _get_backing_ds
ds = self.mixture_or_task.get_dataset(
File "/home/robertli/.local/lib/python3.8/site-packages/seqio/dataset_providers.py", line 1205, in get_dataset
len(self.source.list_shards(split=split)) >= shard_info.num_shards)
File "/home/robertli/.local/lib/python3.8/site-packages/seqio/dataset_providers.py", line 455, in list_shards
return [_get_filename(info) for info in self.tfds_dataset.files(split)]
File "/home/robertli/.local/lib/python3.8/site-packages/seqio/utils.py", line 152, in files
split_info = self.builder.info.splits[split]
File "/home/robertli/.local/lib/python3.8/site-packages/seqio/utils.py", line 129, in builder
LazyTfdsLoader._MEMOIZED_BUILDERS[builder_key] = tfds.builder(
File "/usr/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/home/robertli/.local/lib/python3.8/site-packages/tensorflow_datasets/core/logging/__init__.py", line 169, in __call__
return function(*args, **kwargs)
File "/home/robertli/.local/lib/python3.8/site-packages/tensorflow_datasets/core/load.py", line 202, in builder
return read_only_builder.builder_from_files(str(name), **builder_kwargs)
File "/usr/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/home/robertli/.local/lib/python3.8/site-packages/tensorflow_datasets/core/read_only_builder.py", line 259, in builder_from_files
builder_dir = _find_builder_dir(name, **builder_kwargs)
File "/home/robertli/.local/lib/python3.8/site-packages/tensorflow_datasets/core/read_only_builder.py", line 327, in _find_builder_dir
builder_dir = _find_builder_dir_single_dir(
File "/home/robertli/.local/lib/python3.8/site-packages/tensorflow_datasets/core/read_only_builder.py", line 417, in _find_builder_dir_single_dir
found_version_str = _get_version_str(
File "/home/robertli/.local/lib/python3.8/site-packages/tensorflow_datasets/core/read_only_builder.py", line 484, in _get_version_str
all_versions = version_lib.list_all_versions(os.fspath(builder_dir))
File "/home/robertli/.local/lib/python3.8/site-packages/tensorflow_datasets/core/utils/version.py", line 193, in list_all_versions
if not root_dir.exists():
File "/home/robertli/.local/lib/python3.8/site-packages/etils/epath/gpath.py", line 130, in exists
return self._backend.exists(self._path_str)
File "/home/robertli/.local/lib/python3.8/site-packages/etils/epath/backend.py", line 204, in exists
return self.gfile.exists(path)
File "/home/robertli/.local/lib/python3.8/site-packages/tensorflow/python/lib/io/file_io.py", line 288, in file_exists_v2
_pywrap_file_io.FileExists(compat.path_to_bytes(path))
tensorflow.python.framework.errors_impl.PermissionDeniedError: Error executing an HTTP request: HTTP response code 403 with body '{
"error": {
"code": 403,
"message": "991053624826-compute@developer.gserviceaccount.com does not have storage.objects.get access to the Google Cloud Storage object. Permission 'storage.objects.get' denied on resource (or it may not exist).",
"errors": [
{
"message": "991053624826-compute@developer.gserviceaccount.com does not have storage.objects.get access to the Google Cloud Storage object. Permission 'storage.objects.get' denied on resource (or it may not exist)."'
when reading metadata of gs://mlperf-llm-public2/c4/en
I wonder whether I have misconfigured something, since the bucket appears to be a public one.
I tried using the TFDS default bucket (gs://tfds-data/datasets) instead of gs://mlperf-llm-public2, and the problem doesn't arise there, but it forces me to choose among the available versions of c4 (which don't include 3.0.4). Even then, I can't proceed because I hit a different error.
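For context, my understanding (an assumption based on the traceback, not on the TFDS source) is that TFDS's read-only builder probes for pre-built dataset metadata under a versioned directory inside the data dir, and the 403 is raised during that `exists()` check. A minimal sketch of the path it appears to be probing (the helper name is mine):

```python
import posixpath

def builder_dir(data_dir: str, dataset: str, config: str, version: str) -> str:
    # Pre-built TFDS datasets keep their metadata (dataset_info.json, etc.)
    # under <data_dir>/<dataset>/<config>/<version>/; GCS paths always use
    # forward slashes, hence posixpath rather than os.path.
    return posixpath.join(data_dir, dataset, config, version)

print(builder_dir("gs://mlperf-llm-public2", "c4", "en", "3.0.4"))
# gs://mlperf-llm-public2/c4/en/3.0.4
```

So the failing request is simply TFDS asking GCS whether that directory exists, which is why the error surfaces as a storage.objects.get denial rather than a dataset-loading error.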
Thanks in advance for your attention and help!
Just a quick note that I don't think the perms on gs://mlperf-llm-public2 are configured properly for public access: I can access buckets like gs://t5-data/vocabs/ with no problem, but not this one. I get an error similar to the one above when trying to grab the spm file per the README (gs://mlperf-llm-public2/vocab/c4_en_301_5Mexp2_spm.model).
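One way to sanity-check the bucket's ACLs without gsutil is to hit the GCS JSON API anonymously: a publicly readable object returns its metadata to an unauthenticated GET, while a private one returns 401/403. A small sketch that builds the metadata URL to curl (the helper name is mine):

```python
from urllib.parse import quote

def gcs_metadata_url(bucket: str, object_name: str) -> str:
    # GCS JSON API endpoint for object metadata; the object name must be
    # URL-encoded, including the slashes in its path.
    return (f"https://storage.googleapis.com/storage/v1/b/{bucket}"
            f"/o/{quote(object_name, safe='')}")

# curl this URL: HTTP 200 means publicly readable; 401/403 means it is not.
print(gcs_metadata_url("mlperf-llm-public2", "vocab/c4_en_301_5Mexp2_spm.model"))
```

Running that check against the spm file above is what led me to believe the bucket isn't actually public.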