
resume failed checking for s3endpoint when localstorage/file served #538

Description

@austingnanaraj

reference #530

When the checkpoint folder is given as a plain path only (/mnt/e/retinanet_checkpoints):
ConnectionError: ObjStoreLibStorage preflight failed: cannot reach bucket '/mnt/e/retinanet_checkpoints' via s3dlio at endpoint 'http://:9020'. Underlying error: RuntimeError: list_objects_v2 failed: failed to construct S3 request — check AWS_REGION, AWS_ENDPOINT_URL, and credential environment variables (AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY)

  • Check AWS_ENDPOINT_URL (current: 'http://:9020').
  • Check credentials are valid for that endpoint.
  • Check that bucket '/mnt/ecs/retinanet_checkpoints' exists at the endpoint.
    File "/root/storage/.venv/lib/python3.12/site-packages/dlio_benchmark/checkpointing/pytorch_obj_store_checkpointing.py", line 90, in get_instance

When the checkpoint folder is given as file://mnt/e/retinanet_checkpoints:
Error Details
What Happened:
  • The checkpoint folder was specified with the file:// protocol (NFS storage).
  • DLIO applied storage.storage_type=s3 globally to all storage.
  • s3dlio tried to parse file:///mnt/e/... as an S3 bucket.
  • The URI parser failed because it cannot extract a bucket name from a file:// URI.
  • All 32 MPI ranks crashed during initialization.

ConnectionError: ObjStoreLibStorage preflight failed: cannot reach bucket 'file:///mnt/ecs/retinanet_checkpoints' via s3dlio at endpoint 'http://[REDACTED_IP]:9020'. Underlying error: RuntimeError: Bucket name cannot be empty in URI: s3://file:///mnt/e/retinanet_checkpoints/

  • Check AWS_ENDPOINT_URL (current: 'http://[REDACTED_IP]:9020').
  • Check credentials are valid for that endpoint.
  • Check that bucket 'file:///mnt/e/retinanet_checkpoints' exists at the endpoint.
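
Both failures above come from a local checkpoint location being passed to the S3 preflight check. A minimal sketch of how the location could be classified by URI scheme before any S3 call is made, using only the Python standard library. The names `checkpoint_backend`, `needs_s3_preflight`, and `OBJECT_STORE_SCHEMES` are hypothetical and are not taken from dlio_benchmark or s3dlio; how this should interact with storage.storage_type=s3 is left open.

```python
from urllib.parse import urlparse

# Hypothetical names for illustration only; not part of dlio_benchmark or s3dlio.
OBJECT_STORE_SCHEMES = {"s3"}


def checkpoint_backend(folder: str) -> str:
    """Classify a checkpoint location as "local" or "object" by its URI scheme."""
    scheme = urlparse(folder).scheme.lower()
    if scheme in ("", "file"):
        # Plain POSIX paths have no scheme; file:// URIs also point at local/NFS storage.
        return "local"
    if scheme in OBJECT_STORE_SCHEMES:
        return "object"
    raise ValueError(f"unsupported checkpoint location: {folder!r}")


def needs_s3_preflight(folder: str) -> bool:
    """Only object-store locations should trigger the S3 endpoint/bucket check."""
    return checkpoint_backend(folder) == "object"
```

Under this sketch, both values from this issue (the plain path and the file:// URI) classify as local, so the bucket preflight would be skipped for them, while an s3:// location would still be checked.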
