
feat(uploads): transparent presigned direct-to-storage upload#130

Merged
zfarrell merged 7 commits into main from worktree-jazzy-meandering-pond
Jun 28, 2026


@zfarrell zfarrell commented Jun 28, 2026

Contributor

Upload files directly to storage

Adds first-class file uploads to the Python SDK. upload_file handles the entire flow for you — it streams your data straight to object storage and returns the finalized result, with no extra round-trips through the API:

from hotdata import ApiClient, Configuration, UploadsApi

uploads = UploadsApi(ApiClient(Configuration(...)))
result = uploads.upload_file("data.parquet")

Highlights

  • Accepts a file path, raw bytes, or any seekable binary file object
  • Automatically picks a single request for small files and concurrent multipart uploads for large ones
  • Optional progress callback, with tunable part size, concurrency, and per-part retries
  • Typed errors under UploadError for easy failure handling
  • upload_stream covers the fallback for non-seekable streams
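
As a rough illustration of the size-based choice described in the highlights, the sketch below picks between a single request and a multipart plan and computes the byte range of each part. The names `plan_upload`, `SINGLE_REQUEST_MAX`, and `DEFAULT_PART_SIZE`, and the 8 MiB values, are assumptions made for this example; they are not taken from the SDK, whose actual threshold and defaults may differ.

```python
from typing import List, Tuple

# Hypothetical values chosen only for illustration; the SDK's real
# threshold and default part size may differ.
SINGLE_REQUEST_MAX = 8 * 1024 * 1024
DEFAULT_PART_SIZE = 8 * 1024 * 1024


def plan_upload(total: int, part_size: int = DEFAULT_PART_SIZE) -> List[Tuple[int, int]]:
    """Return (offset, length) ranges; a single range means one request."""
    if total < 0:
        raise ValueError("total must be non-negative")
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    if total <= SINGLE_REQUEST_MAX:
        # Small payloads go up in one request.
        return [(0, total)]
    # Larger payloads are split into fixed-size parts; the last part
    # carries whatever remains.
    return [
        (offset, min(part_size, total - offset))
        for offset in range(0, total, part_size)
    ]
```

Each returned range could then be handed to a worker, which is where the tunable concurrency and per-part retries mentioned above would apply.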

See the README and CHANGELOG for full details.

Comment thread hotdata/uploads.py Outdated
Comment on lines +801 to +805
if progress is not None:
    with lock:
        done[0] = min(done[0] + length, total)
        current = done[0]
    progress(current, total)


nit: (not blocking) The progress snapshot is taken under lock, but progress(current, total) is invoked outside it. Two workers can compute current (e.g. 10 then 20) under the lock, release, and then race to call the callback — delivering (20) before (10). That contradicts the documented "monotonically non-decreasing" guarantee for bytes_done_total (see the UploadProgress docstring, lines 117–122). Moving the call inside the with lock: block makes delivery order match the computed order:

Suggested change
- if progress is not None:
-     with lock:
-         done[0] = min(done[0] + length, total)
-         current = done[0]
-     progress(current, total)
+ if progress is not None:
+     with lock:
+         done[0] = min(done[0] + length, total)
+         progress(done[0], total)

The callback is already expected to be cheap (and the docstring tells callers to do their own locking), so serializing the calls is a fair trade for the ordering guarantee.
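
To make the ordering argument concrete, here is a minimal, self-contained sketch of the pattern with the callback invoked inside the lock. It does not use the SDK: `run_parts` and `upload_part` are hypothetical names, and the actual network transfer is replaced by a comment.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional


def run_parts(
    lengths: List[int],
    total: int,
    progress: Optional[Callable[[int, int], None]] = None,
    max_workers: int = 4,
) -> None:
    """Simulate concurrent part uploads that report cumulative progress."""
    lock = threading.Lock()
    done = [0]  # boxed so the worker closure can mutate it

    def upload_part(length: int) -> None:
        # A real worker would send the part's bytes here before reporting.
        if progress is not None:
            with lock:
                done[0] = min(done[0] + length, total)
                # Calling inside the lock serializes delivery, so the
                # callback sees values in the order they were computed.
                progress(done[0], total)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Consume the iterator so worker exceptions propagate.
        list(pool.map(upload_part, lengths))


seen: List[int] = []
run_parts([10] * 50, total=500, progress=lambda done, total: seen.append(done))
```

With the call inside the lock, `seen` is non-decreasing by construction. The cost is that a slow callback stalls the other workers' progress reports, which is why the callback needs to stay cheap.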

claude Bot previously approved these changes Jun 28, 2026

@claude claude Bot left a comment


Thorough, well-documented, and well-tested. Header isolation, exactly-once finalize, short-read/over-read guards, and concurrency bounds are all covered by tests. One non-blocking nit on progress-callback ordering inline.
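
For readers unfamiliar with the term, a short-read guard can be sketched roughly as follows. This is an illustration only: `read_exact`, `ensure_exhausted`, `ShortReadError`, and `OverReadError` are hypothetical names rather than the SDK's, and treating "over-read" as "the source holds more bytes than declared" is an assumption about what the PR's guard checks.

```python
import io
from typing import BinaryIO


class ShortReadError(Exception):
    """Hypothetical error: the source ended before the expected length."""


class OverReadError(Exception):
    """Hypothetical error: the source holds more bytes than declared."""


def read_exact(src: BinaryIO, length: int) -> bytes:
    """Read exactly `length` bytes from `src`, or raise ShortReadError."""
    chunks = []
    remaining = length
    while remaining > 0:
        chunk = src.read(remaining)
        if not chunk:
            got = length - remaining
            raise ShortReadError(f"expected {length} bytes, got {got}")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def ensure_exhausted(src: BinaryIO) -> None:
    """Raise OverReadError if any data remains after the declared size."""
    if src.read(1):
        raise OverReadError("source contains more data than declared")


buf = io.BytesIO(b"abcdef")
first_four = read_exact(buf, 4)  # leaves two bytes unread in buf
```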

@claude claude Bot left a comment


Reviewed the direct-to-storage upload flow in hotdata/uploads.py and supporting changes. The prior progress-ordering nit is resolved (callback now fires inside the lock). Concurrency bounds, header isolation, short-read guards, and the retries-disabled finalize all look correct. LGTM.

@zfarrell zfarrell merged commit f630bd7 into main Jun 28, 2026
5 checks passed
@zfarrell zfarrell deleted the worktree-jazzy-meandering-pond branch June 28, 2026 03:20