# Quick Start

## Create Configuration File

First, create a configuration file to define your storage providers. The default configuration file is located at `~/.msc_config.yaml`, but you can specify a different path using the `MSC_CONFIG` environment variable.

```yaml
profiles:
  s3-iad-webdataset:
    storage_provider:
      type: s3
      options:
        region_name: us-east-1
        base_path: webdataset_samples
    credentials_provider:
      type: S3Credentials
      options:
        access_key: *****
        secret_key: *****
  s3-pdx-zarr:
    storage_provider:
      type: s3
      options:
        region_name: us-west-2
        base_path: zarr_examples
    credentials_provider:
      type: S3Credentials
      options:
        access_key: *****
        secret_key: *****
```
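The lookup order described above can be sketched in a few lines. This is a simplified illustration, not the library's actual resolution logic: `resolve_config_path` is a hypothetical helper, the `/tmp/project_msc_config.yaml` path is only an example, and the sketch covers just the two locations named in this section.

```python
import os


def resolve_config_path(environ=None):
    """Return MSC_CONFIG if it is set, else the default path (hypothetical helper)."""
    environ = os.environ if environ is None else environ
    override = environ.get("MSC_CONFIG")
    if override:
        return override
    return os.path.expanduser("~/.msc_config.yaml")


# Point the current process at a project-specific file (example path only).
os.environ["MSC_CONFIG"] = "/tmp/project_msc_config.yaml"
print(resolve_config_path())
```

If you set the variable from Python rather than from the shell, doing so before the first `msc` call is the safest ordering.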

## List Files

Once your configuration is in place, you can access files with the `msc.open` and `msc.glob` functions. Each `msc://` URL starts with the name of a profile defined in the configuration file.

```python
import multistorageclient as msc

files = msc.glob("msc://s3-iad-webdataset/*.tar")
files[:10]
```

```text
['msc://s3-iad-webdataset/dataset_000.tar',
 'msc://s3-iad-webdataset/dataset_001.tar',
 'msc://s3-iad-webdataset/dataset_002.tar',
 'msc://s3-iad-webdataset/dataset_003.tar',
 'msc://s3-iad-webdataset/dataset_004.tar',
 'msc://s3-iad-webdataset/dataset_005.tar',
 'msc://s3-iad-webdataset/dataset_006.tar',
 'msc://s3-iad-webdataset/dataset_007.tar',
 'msc://s3-iad-webdataset/dataset_008.tar',
 'msc://s3-iad-webdataset/dataset_009.tar']
```
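The URLs in this listing appear to follow an `msc://<profile>/<path>` pattern, where the profile is one of the names from the configuration file and the path is presumably resolved relative to that profile's `base_path`. A minimal sketch of taking such a URL apart with the standard library follows; `split_msc_url` is a hypothetical helper for illustration and is not part of `multistorageclient`.

```python
from urllib.parse import urlsplit


def split_msc_url(url):
    """Split 'msc://<profile>/<path>' into (profile, path). Hypothetical helper."""
    parts = urlsplit(url)
    if parts.scheme != "msc":
        raise ValueError(f"not an msc:// URL: {url!r}")
    # netloc carries the profile name; the rest is the object path.
    return parts.netloc, parts.path.lstrip("/")


profile, path = split_msc_url("msc://s3-iad-webdataset/dataset_000.tar")
print(profile, path)
```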

## Open File - Read

In [3]:
with msc.open("msc://s3-iad-webdataset/dataset_000.tar", "rb") as fp:
    content = fp.read()

print(f"File Size = {len(content)}, Content = {content[:80]}...")

File Size = 62986240, Content = b'././@PaxHeader\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'...
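The read above pulls the whole object (about 63 MB here) into memory at once. For larger objects, reading in fixed-size blocks keeps memory use bounded. The sketch below assumes the object returned by `msc.open` supports `read(size)` like a standard binary file object; `iter_chunks` is a hypothetical helper, and an in-memory `io.BytesIO` stands in for the remote file so the example runs without credentials.

```python
import io


def iter_chunks(fp, chunk_size=8 * 1024 * 1024):
    """Yield successive blocks from a binary file-like object until EOF."""
    while True:
        chunk = fp.read(chunk_size)
        if not chunk:
            break
        yield chunk


# Stand-in for `msc.open(url, "rb")` (assumption: same read() behaviour).
with io.BytesIO(b"x" * 20) as fp:
    sizes = [len(chunk) for chunk in iter_chunks(fp, chunk_size=8)]

print(sizes)
```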


## Open File - Write

```python
# 32 MiB payload
body = b"A" * 32 * 1024 * 1024

# Write the payload to the profile's storage
with msc.open("msc://s3-iad-webdataset/testfile.bin", "wb") as fp:
    fp.write(body)

# Read it back to confirm the upload
with msc.open("msc://s3-iad-webdataset/testfile.bin", "rb") as fp:
    content = fp.read()

print(f"File Size = {len(content)}, Content = {content[:80]}...")
```

```text
File Size = 33554432, Content = b'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA'...
```


```python
msc.glob("msc://s3-iad-webdataset/*.bin")
```

```text
['msc://s3-iad-webdataset/testfile.bin']
```
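The write-then-read check above compares only the size and the first bytes. A stricter round-trip check can compare digests of the full payload. The sketch below is an illustration under assumptions: `round_trip_ok` is a hypothetical helper that accepts any opener callable taking `(target, mode)`, and the built-in `open` with a temporary file stands in for `msc.open` so the example runs locally.

```python
import hashlib
import os
import tempfile


def round_trip_ok(opener, target, payload):
    """Write payload via opener, read it back, and compare SHA-256 digests."""
    with opener(target, "wb") as fp:
        fp.write(payload)
    with opener(target, "rb") as fp:
        restored = fp.read()
    return hashlib.sha256(restored).digest() == hashlib.sha256(payload).digest()


# Local stand-in: the built-in open() and a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    ok = round_trip_ok(open, os.path.join(tmp, "testfile.bin"), b"A" * 1024)

print(ok)
```

Since the examples in this section call `msc.open(url, mode)` with the same argument order, passing `msc.open` and an `msc://` URL in place of `open` and the local path should behave the same way, though that is an assumption rather than something verified here.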

# Framework Integration

## Use Zarr

```python
import zarr
import numpy as np

# Create a zarr array and store the data in S3 bucket
zarr_array = zarr.create(shape=(4, 4), dtype="float32", store="msc://s3-pdx-zarr/array.zarr", overwrite=True)

print(f"zarr_array = {zarr_array}")

# Open the zarr array on S3
zarr_array_opened = zarr.open("msc://s3-pdx-zarr/array.zarr")

print(f"zarr_array_opened = {zarr_array_opened}")

# Create a zarr group with two arrays
zarr_group = zarr.open_group("msc://s3-pdx-zarr/group.zarr", mode="w")
zarr_group.create_dataset("array1", shape=(4, 4), dtype="float32", data=np.eye(4), overwrite=True)
zarr_group.create_dataset("array2", shape=(8, 8), dtype="float64", overwrite=True)

print(f"zarr_group = {zarr_group}")
print(f"zarr_group.array1: {zarr_group['array1'][:]}")

# Open the zarr group on S3
zarr_group_opened = zarr.open("msc://s3-pdx-zarr/group.zarr")

print(f"zarr_group_opened structure: {zarr_group_opened}")
print(f"zarr_group_opened.array1: {zarr_group_opened['array1'][:]}")
```

```text
zarr_array = <zarr.core.Array (4, 4) float32>
zarr_array_opened = <zarr.core.Array (4, 4) float32>
zarr_group = <zarr.hierarchy.Group '/'>
zarr_group.array1: [[1. 0. 0. 0.]
 [0. 1. 0. 0.]
 [0. 0. 1. 0.]
 [0. 0. 0. 1.]]
zarr_group_opened structure: <zarr.hierarchy.Group '/'>
zarr_group_opened.array1: [[1. 0. 0. 0.]
 [0. 1. 0. 0.]
 [0. 0. 1. 0.]
 [0. 0. 0. 1.]]
```
