# Cloud Storage Workflows

This notebook covers working with cloud storage:
- Uploading to S3, GCS, and Azure
- S3-compatible storage (MinIO, source.coop)
- Authentication options

**Note**: Cloud operations require appropriate credentials and network access.

## Uploading to S3

Use the `.upload()` method to write directly to cloud storage.

In [None]:
# Upload to S3 (requires AWS credentials)
# Uncomment to run:

# import geoparquet_io as gpio

# gpio.read('data/sample.parquet') \
#     .add_bbox() \
#     .sort_hilbert() \
#     .upload('s3://my-bucket/data/buildings.parquet')

## Using AWS Profiles

For non-default credentials, use the `profile` parameter.

In [None]:
# Upload with specific AWS profile
# Uncomment to run:

# gpio.read('data/sample.parquet') \
#     .add_bbox() \
#     .sort_hilbert() \
#     .upload(
#         's3://my-bucket/data/buildings.parquet',
#         profile='my-aws-profile'
#     )
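
Before passing a `profile` name, it can help to check which profiles exist locally. The sketch below is a minimal helper, assuming the standard AWS shared credentials file at `~/.aws/credentials`. The function name `list_aws_profiles` is hypothetical and is not part of `gpio`.

```python
import configparser
from pathlib import Path


def list_aws_profiles(credentials_path=None):
    """Return the profile names found in an AWS shared credentials file.

    Hypothetical helper for illustration; not part of gpio.
    """
    if credentials_path is None:
        # Assumed default location of the shared credentials file
        credentials_path = Path.home() / ".aws" / "credentials"
    parser = configparser.ConfigParser()
    # read() ignores missing files, so the result is an empty list in that case
    parser.read(credentials_path)
    return parser.sections()


if __name__ == "__main__":
    print(list_aws_profiles())
```

Any name in the returned list should be usable as the `profile` value, provided that profile has access to the target bucket.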

## S3-Compatible Storage

For MinIO, source.coop, or other S3-compatible storage, use the `s3_endpoint` parameter.

In [None]:
# Upload to MinIO
# Uncomment to run:

# gpio.read('data/sample.parquet') \
#     .add_bbox() \
#     .sort_hilbert() \
#     .upload(
#         's3://my-bucket/data/buildings.parquet',
#         s3_endpoint='minio.example.com:9000',
#         s3_use_ssl=False  # For local MinIO without SSL
#     )

In [None]:
# Upload to source.coop
# Uncomment to run:

# gpio.read('data/sample.parquet') \
#     .add_bbox() \
#     .sort_hilbert() \
#     .upload(
#         's3://my-repo/data/buildings.parquet',
#         s3_endpoint='data.source.coop',
#         s3_use_ssl=True
#     )
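
The `s3_endpoint` and `s3_use_ssl` values in the cells above together describe a base URL for the storage service. The sketch below shows how S3 clients commonly combine a host and an SSL flag. It illustrates the idea only, is not necessarily how `gpio` builds the URL internally, and the function name `endpoint_url` is hypothetical.

```python
def endpoint_url(s3_endpoint, s3_use_ssl=True):
    """Combine an endpoint host and an SSL flag into a base URL.

    Hypothetical helper for illustration; not part of gpio.
    """
    scheme = "https" if s3_use_ssl else "http"
    return f"{scheme}://{s3_endpoint}"


if __name__ == "__main__":
    # Local MinIO without SSL, then source.coop over HTTPS
    print(endpoint_url("minio.example.com:9000", s3_use_ssl=False))
    print(endpoint_url("data.source.coop", s3_use_ssl=True))
```

Opening the resulting URL with a tool such as `curl` is a quick way to confirm that the endpoint is reachable before attempting an upload.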

## Upload with Compression Options

Pass `compression`, `compression_level`, and `row_group_size_mb` to `.upload()` to control how the output file is written.

In [None]:
# Full control over output format
# Uncomment to run:

# gpio.read('data/sample.parquet') \
#     .add_bbox() \
#     .sort_hilbert() \
#     .upload(
#         's3://my-bucket/data/buildings.parquet',
#         compression='ZSTD',
#         compression_level=15,
#         row_group_size_mb=128,
#         profile='my-aws-profile'
#     )

## Processing Pipeline with Cloud Upload

In [None]:
# Complete workflow: read, filter, transform, upload
# Uncomment to run:

# gpio.read('data/sample.parquet') \
#     .extract(limit=1000) \
#     .add_bbox() \
#     .add_h3(resolution=9) \
#     .sort_hilbert() \
#     .upload('s3://my-bucket/processed/buildings.parquet')

## Reading Remote Files

To read from remote storage, use the CLI, which has full remote file support.

```bash
# Read from S3 and process
gpio add bbox s3://bucket/input.parquet | gpio sort hilbert - output.parquet

# Read from HTTPS
gpio extract --limit 1000 https://example.com/data.parquet output.parquet
```

In [None]:
# For Python workflows, download first, then process
# Uncomment to run:

# import subprocess
# subprocess.run(
#     ['gpio', 'extract', '--limit', '1000',
#      's3://bucket/large.parquet', '/tmp/local.parquet'],
#     check=True,  # raise CalledProcessError if the CLI exits non-zero
# )

# Then use the Python API
# gpio.read('/tmp/local.parquet').add_bbox().sort_hilbert().write('/tmp/processed.parquet')

## GCS and Azure

Google Cloud Storage (GCS) and Azure Blob Storage are also supported through the `gs://` and `az://` URL schemes.

In [None]:
# Upload to GCS
# Uncomment to run:

# gpio.read('data/sample.parquet') \
#     .add_bbox() \
#     .sort_hilbert() \
#     .upload('gs://my-bucket/data/buildings.parquet')

In [None]:
# Upload to Azure Blob Storage
# Uncomment to run:

# gpio.read('data/sample.parquet') \
#     .add_bbox() \
#     .sort_hilbert() \
#     .upload('az://container/data/buildings.parquet')

## Summary

| Storage | URL Format | Auth |
|---------|------------|------|
| AWS S3 | `s3://bucket/path` | AWS credentials, `profile` param |
| S3-compatible | `s3://bucket/path` + `s3_endpoint` | AWS credentials |
| GCS | `gs://bucket/path` | GCS credentials |
| Azure | `az://container/path` | Azure credentials |
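
The URL formats in the table can be told apart by scheme alone. The sketch below is a small helper, assuming only the three schemes listed above. It splits a destination URL into backend, bucket or container, and object path. The name `describe_target` is hypothetical and is not part of `gpio`.

```python
from urllib.parse import urlparse

# Schemes taken from the summary table above
BACKENDS = {
    "s3": "AWS S3 or S3-compatible",
    "gs": "GCS",
    "az": "Azure",
}


def describe_target(url):
    """Return (backend, bucket_or_container, object_path) for a storage URL.

    Hypothetical helper for illustration; not part of gpio.
    """
    parsed = urlparse(url)
    if parsed.scheme not in BACKENDS:
        raise ValueError(f"Unsupported storage URL scheme: {parsed.scheme!r}")
    return BACKENDS[parsed.scheme], parsed.netloc, parsed.path.lstrip("/")


if __name__ == "__main__":
    for target in (
        "s3://my-bucket/data/buildings.parquet",
        "gs://my-bucket/data/buildings.parquet",
        "az://container/data/buildings.parquet",
    ):
        print(describe_target(target))
```

A check like this catches a mistyped scheme before any data is read or uploaded.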

## More Resources

- [Remote Files Guide](https://geoparquet.io/guide/remote-files/) - CLI remote file support
- [Upload Guide](https://geoparquet.io/cli/upload/) - CLI upload options