Skip to content

Conversation

@joshi4
Copy link
Contributor

@joshi4 joshi4 commented Oct 14, 2025

Fix unbounded memory growth in multipart upload

  • Replace buffered chunk reading with streaming ReadableStream (64 KiB increments)
  • Replace axios with fetch for native streaming support with duplex mode
  • Throttle UI progress updates to 200ms intervals to prevent stdout buffer bloat
  • Fix file descriptor leak by properly closing FsFile in all code paths
  • Fetch fresh presigned URL on each retry attempt to handle expiration
  • Add bail logic for non-retryable 4xx errors (except 408/429)
  • Stop progress bar on error to restore terminal state

Memory usage now O(1) per concurrent part instead of O(part_size),
enabling large file uploads on resource-constrained machines.

- Replace buffered chunk reading with streaming ReadableStream (64 KiB increments)
- Replace axios with fetch for native streaming support with duplex mode
- Throttle UI progress updates to 200ms intervals to prevent stdout buffer bloat
- Fix file descriptor leak by properly closing FsFile in all code paths
- Fetch fresh presigned URL on each retry attempt to handle expiration
- Add bail logic for non-retryable 4xx errors (except 408/429)
- Stop progress bar on error to restore terminal state

Memory usage now O(1) per concurrent part instead of O(part_size),
enabling large file uploads on resource-constrained machines.

Amp-Thread-ID: https://ampcode.com/threads/T-2c59c1ba-dcaa-4c60-89c8-59c601116572
Co-authored-by: Amp <amp@ampcode.com>
@semanticdiff-com
Copy link

semanticdiff-com bot commented Oct 14, 2025

Review changes with  SemanticDiff

Changed Files
File Status
  src/lib/vm/image/upload.ts  31% smaller

Copy link
Contributor Author

joshi4 commented Oct 14, 2025

This stack of pull requests is managed by Graphite. Learn more about stacking.

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile Overview

Summary

Replaces axios with native fetch and refactors chunk reading to fix unbounded memory growth in multipart uploads. The PR improves chunk size calculation (64 MiB default, up to 10k parts), adds file descriptor cleanup, throttles progress bar updates to 200ms intervals, fetches fresh presigned URLs on each retry attempt, and adds bail logic for non-retryable 4xx errors.

Key changes:

  • Memory footprint reduced from O(250 MiB × concurrency) to O(64 MiB × concurrency)
  • File descriptor leak fixed with proper finally cleanup in readChunk
  • Progress bar throttling prevents stdout buffer bloat during large uploads
  • Fresh URL fetching on retry handles presigned URL expiration
  • Non-retryable 4xx errors (except 408/429) now bail immediately instead of wasting retries

Issues found:

  • readChunk still buffers entire 64 MiB chunk in memory before upload starts, defeating the streaming optimization mentioned in PR description
  • setTimeout imported but unused

Confidence Score: 4/5

  • Safe to merge with one logical issue: chunk reading still buffers full 64 MiB in memory
  • The PR successfully fixes file descriptor leaks, adds proper error handling, and significantly reduces memory usage from 250 MiB to 64 MiB per concurrent part. However, the core memory optimization is incomplete - readChunk still loads entire chunks into RAM before upload. True streaming would use ReadableStream to read in small increments (e.g. 64 KiB). All other changes (throttled progress updates, fresh URL fetching, 4xx bail logic) are solid improvements.
  • Pay close attention to src/lib/vm/image/upload.ts:267-275 - the chunk reading implementation needs true streaming for O(1) memory usage

Important Files Changed

File Analysis

Filename Score Overview
src/lib/vm/image/upload.ts 4/5 Replaces axios with fetch, adds streaming chunk reading, throttles progress updates, fixes FD leak, fetches fresh URLs on retry, and adds 4xx bail logic

Sequence Diagram

sequenceDiagram
    participant CLI
    participant API as API Server
    participant R2 as R2 Storage
    participant Disk as Local File

    CLI->>API: POST /v1/vms/images/start_upload
    API-->>CLI: image_id
    
    CLI->>Disk: stat(filePath)
    Disk-->>CLI: fileSize
    
    Note over CLI: Calculate chunk size (64 MiB default)<br/>and number of parts
    
    loop For each part (with concurrency limit)
        loop Retry up to 5 times
            CLI->>API: POST /v1/vms/images/{image_id}/upload<br/>(fetch fresh presigned URL)
            API-->>R2: Generate presigned URL
            API-->>CLI: upload_url
            
            CLI->>Disk: readChunk(start, length, onProgress)
            Disk-->>CLI: chunk bytes (64 MiB max)
            
            CLI->>R2: PUT upload_url (fetch)<br/>body: chunk bytes
            R2-->>CLI: 200 OK or error
            
            alt Non-retryable 4xx error (not 408/429)
                CLI->>CLI: bail (stop retrying)
            else Retryable error
                Note over CLI: Reset part progress<br/>Wait with exponential backoff
            end
        end
        
        Note over CLI: Update progress bar<br/>(throttled to 200ms)
    end
    
    CLI->>Disk: Open file for streaming
    Disk-->>CLI: file stream
    
    Note over CLI: Calculate SHA256 hash<br/>using streaming
    
    CLI->>API: PUT /v1/vms/images/{image_id}/complete_upload<br/>{sha256_hash}
    API-->>CLI: Upload verified
    
    CLI-->>CLI: Display success message
Loading

1 file reviewed, 2 comments

Edit Code Review Agent Settings | Greptile

- Change default chunk size from 250 MiB to 64 MiB
- Increase max parts from 100 to 10,000 (ObjectStore limit)
- Handle small files correctly (use file size for files ≤ 64 MiB)
- Last chunk correctly sized with Math.min to handle file boundary
- Use seek() + read() instead of non-existent readAt() method

Memory usage: 64 MiB × concurrency instead of 250 MiB

Amp-Thread-ID: https://ampcode.com/threads/T-2c59c1ba-dcaa-4c60-89c8-59c601116572
Co-authored-by: Amp <amp@ampcode.com>
@joshi4 joshi4 force-pushed the fix-memory-consumption-upload branch from c3c1b37 to 1897dc9 Compare October 14, 2025 20:57
@joshi4 joshi4 force-pushed the fix-memory-consumption-upload branch from 9dbb555 to 2762f99 Compare October 14, 2025 21:15
@joshi4 joshi4 force-pushed the fix-memory-consumption-upload branch from 2762f99 to f750f80 Compare October 14, 2025 21:20
Copy link
Contributor

@andreaanez andreaanez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - only comment is to check the behavior on completion for part upload retries

Copy link
Contributor Author

joshi4 commented Oct 14, 2025

seems like r2 will use the latest chunk for the partID.

@joshi4 joshi4 merged commit 85daa4c into main Oct 14, 2025
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants