feat: support streaming uploads #287
Conversation
Pull request overview
This PR successfully implements streaming uploads across the codebase to avoid buffering large files in memory. The changes convert all upload flows from using readFile() (which loads entire files into memory) to using createReadStream() with Readable.toWeb() to create Web ReadableStreams. This is a significant improvement for handling large file uploads, as demonstrated by the test examples showing successful uploads of 375 MiB and 865 MiB files.
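The core pattern is small. Below is a minimal sketch of the conversion the PR describes, assuming a local `carPath` placeholder; the exact call sites differ per file:

```ts
import { createReadStream } from 'node:fs'
import { stat } from 'node:fs/promises'
import { Readable } from 'node:stream'

// carPath is a placeholder for whichever CAR file the flow is uploading.
const carPath = '/tmp/example.car'

// Before: await readFile(carPath) buffered the whole CAR in memory.
// After: stream the file, and report the size separately via stat().
const { size } = await stat(carPath)
const carStream = Readable.toWeb(createReadStream(carPath)) as ReadableStream<Uint8Array>
```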
Key Changes
- Introduced an `UploadData` type that accepts both `Uint8Array` and `ReadableStream<Uint8Array>` for backward compatibility (sketched below)
- Replaced all `readFile()` calls for CAR files with streaming using `createReadStream()` and `Readable.toWeb()`
- Updated tests to verify that streams (not byte arrays) are passed to upload functions
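For reference, a sketch of the union type described in the first bullet; the actual definition lives in `src/core/upload/synapse.ts` and may differ in detail:

```ts
// Accepts either fully-buffered bytes (backward compatible) or a Web stream.
export type UploadData = Uint8Array | ReadableStream<Uint8Array>
```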
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `src/core/upload/synapse.ts` | Added the `UploadData` type definition and updated parameter documentation to support both `Uint8Array` and streaming uploads |
| `src/core/upload/index.ts` | Exported the `UploadData` type and updated the `executeUpload` function signature to accept streaming data |
| `src/common/upload-flow.ts` | Updated the `performUpload` function to accept the `UploadData` type for streaming support |
| `src/add/add.ts` | Replaced `readFile()` with streaming via `createReadStream()` and `Readable.toWeb()`, using `stat()` to get the file size |
| `src/import/import.ts` | Replaced `readFile()` with streaming for CAR file uploads and updated the spinner message to indicate streaming |
| `src/filecoin-pin-store.ts` | Updated the CAR file upload to use streaming, with an explanatory comment about avoiding memory buffering |
| `upload-action/src/filecoin.js` | Converted to streaming uploads in the GitHub Action context |
| `src/test/unit/add.test.ts` | Added a `stat()` mock for file-size retrieval; kept an obsolete `readFile` mock that can be cleaned up |
| `src/test/unit/import.test.ts` | Updated test assertions to verify that a `ReadableStream` is passed instead of a `Uint8Array` (see the sketch below) |
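The `import.test.ts` assertion change could look roughly like the following vitest sketch; the mock name `mockExecuteUpload` and the argument position are assumptions, not the PR's exact code:

```ts
import { expect, type Mock } from 'vitest'

// mockExecuteUpload is a placeholder for the test's mocked upload function.
declare const mockExecuteUpload: Mock

// Assert that a stream (not a buffered byte array) was passed as the data argument.
const uploadData = mockExecuteUpload.mock.calls[0][0]
expect(uploadData).toBeInstanceOf(ReadableStream)
expect(uploadData).not.toBeInstanceOf(Uint8Array)
```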
```diff
  // We need to partially mock fs/promises to keep real file operations for test setup
- // but mock readFile for the CAR reading part
+ // but mock readFile/stat for the CAR handling part
  vi.mock('node:fs/promises', async () => {
    const actual = await vi.importActual<typeof import('node:fs/promises')>('node:fs/promises')
    return {
      ...actual,
      stat: vi.fn((path: string) => {
        if (path === '/tmp/test.car') {
          return Promise.resolve({
            size: 1024,
          } as any)
        }
        return actual.stat(path as any)
      }),
```
Copilot AI commented on Dec 9, 2025:
The comment says "mock readFile/stat for the CAR handling part", but readFile is no longer used for CAR handling after switching to streaming. Only stat is now used to get the file size. The readFile mock (lines 106-113) is obsolete and can be removed.
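A sketch of what the cleaned-up mock might look like once the obsolete `readFile` stub is dropped, with only `stat` specially handled:

```ts
vi.mock('node:fs/promises', async () => {
  const actual = await vi.importActual<typeof import('node:fs/promises')>('node:fs/promises')
  return {
    ...actual,
    // Only stat needs mocking now: the streaming path uses it for the file size.
    stat: vi.fn((path: string) =>
      path === '/tmp/test.car' ? Promise.resolve({ size: 1024 } as any) : actual.stat(path as any)
    ),
  }
})
```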
```diff
  *Expected duration:*

- This is a function of the size of the input file and the hardware. Typical DAGification of files and directories is relatively quick as it's simply a matter of chunking and hashing using common algorithms. The most time-consuming part is the generation of the [Piece CID](glossary.md#piece-cid) of the whole CAR on the client side prior to upload, where a 1 GiB input can take upwards of a minute. As the CAR is being created, it can be streamed to an SP, which is most likely the bottleneck.
+ Depends on input size, local disk/CPU, and uplink speed. DAGification itself is mostly chunk-and-hash and is fast on modern hardware. As we build the CAR, we stream it directly to the SP (no full buffering); [Synapse](glossary.md#synapse) overlaps DAG creation, [Piece CID](glossary.md#piece-cid) calculation, and upload, so on a fast machine the bottleneck is usually your upload bandwidth.
```
This is not actually true though; there's still an interim createCarFromPath going on here. There's no buffering in memory, but "as we build the CAR, we stream it" isn't how it works.
But I've been intending to fix this. It means changing how we generate the CARs, but we don't actually need the CAR to be properly ordered or to have a correct header. Let me follow up with a PR for this because it's been on my mental list to solve since the beginning, without actually describing it in an issue (I think).
Actually no, it's a bit more complicated because of metadata needing the root CID, but that will become easily fixable soon. I've documented it all in #288, so this PR is fine if you just tweak this sentence; we can use #288 as a follow-up when we can get to it (when FilOzone/synapse-sdk#494 is done).
```diff
+ ## Streaming Uploads
+
+ [Filecoin Pin](#filecoin-pin) streams CAR data to [Service Providers](#service-provider) via [Synapse](#synapse), so files are not buffered fully in memory during upload. The maximum supported piece size is bounded by the Synapse SDK and SP configuration (tracked in https://github.com/FilOzone/synapse-sdk/issues/110).
```
Needs tweaking as well: in the browser we buffer up to 1GiB (1016MiB).
In fact, I don't think we even have early limits on it, so potentially someone could request an add of >1GiB and it'd just buffer that up and not error until we eventually try to upload() it, which could be nasty in a browser environment. We could add some checks in there if we want to be safe now (keeping in mind that the final CAR size is going to be larger than the input size, but even a check of inputSize < MAX_UPLOAD_SIZE would help). But we could also just put it off until #288, which will give us the check in real time as we're making the CAR from the input and hit that limit.
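An early guard along the suggested lines could look like this sketch; `MAX_UPLOAD_SIZE` and `assertWithinUploadLimit` are hypothetical names mirroring the 1016MiB figure above, not existing exports:

```ts
// Assumed bound mirroring the 1016MiB browser buffering limit mentioned above.
const MAX_UPLOAD_SIZE = 1016 * 1024 * 1024

// Reject obviously-too-large inputs before buffering begins. Note this is only
// a lower-bound check: the final CAR is somewhat larger than the raw input.
function assertWithinUploadLimit(inputSize: number): void {
  if (inputSize >= MAX_UPLOAD_SIZE) {
    throw new Error(`input of ${inputSize} bytes exceeds maximum upload size of ${MAX_UPLOAD_SIZE} bytes`)
  }
}
```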
Ensures all data upload flows stream content to avoid buffering large data in memory
fixes #190