
**@SgtPooki** (Collaborator) commented Dec 9, 2025

Ensures all data upload flows stream content to avoid buffering large data in memory

fixes #190

> node dist/cli.js add ~/Downloads/Russell-dempsey-pinception-full-demo-not-chopped.mov --private-key $PRIVATE_KEY_SUB --provider-id 2
┌  Filecoin Pin Add
│
◇  ✓ File validated (375.0 MiB)
│
◇  ✓ Connected to calibration
│
◇  ✓ Minimum payment setup verified (~0.066 USDFC required)
│
◇  ✓ File packed with root CID: bafybeiazl5i5zl3l3sz77vg6wn23b6ua6ducrhnnf4c243p2inbgilaloy
│
◇  ✓ IPFS content ready (375.0 MiB)
│
◇  ✓ Payment capacity verified for 375.0 MiB
│
◇  ✓ Storage context ready
│
│  Storage Context
│
│    Data Set ID: 48
│    Provider: ezpdpz-calib2
│
◇  ━━━ Add Complete ━━━
│
│  Network: calibration
│
│  Add Details
│    File: /Users/sgtpooki/Downloads/Russell-dempsey-pinception-full-demo-not-chopped.mov
│    Size: 375.0 MiB
│    Root CID: bafybeiazl5i5zl3l3sz77vg6wn23b6ua6ducrhnnf4c243p2inbgilaloy
│
│  Filecoin Storage
│    Piece CID: bafkzcibf5cjmcqqy3hgewxp5rlwzskltxnrzj56whz2ovkskzhicshwf2lzl4s2jz4ja
│    Piece ID: 11
│    Data Set ID: 48
│
│  Storage Provider
│    Provider ID: 2
│    Name: ezpdpz-calib2
│    Direct Download URL: https://calib2.ezpdpz.net/piece/bafkzcibf5cjmcqqy3hgewxp5rlwzskltxnrzj56whz2ovkskzhicshwf2lzl4s2jz4ja
│
└  Add completed successfully
> node dist/cli.js add ~/Downloads/FreeCAD_0.20.2-2022-12-27-conda-macOS-x86_64-py310.dmg --private-key $PRIVATE_KEY_SUB --provider-id 4
┌  Filecoin Pin Add
│
◇  ✓ File validated (865.6 MiB)
│
◇  ✓ Connected to calibration
│
◇  ✓ Minimum payment setup verified (~0.066 USDFC required)
│
◇  ✓ File packed with root CID: bafybeieqy35c5ndilmdrqfdxiqtbky6ku3xt2kbkwukopneik2kuwhgjp4
│
◇  ✓ IPFS content ready (865.7 MiB)
│
◇  ✓ Payment capacity verified for 865.7 MiB
│
◇  ✓ Storage context ready
│
│  Storage Context
│
│    Data Set ID: undefined
│    Provider: infrafolio-calib
│
◇  ━━━ Add Complete ━━━
│
│  Network: calibration
│
│  Add Details
│    File: /Users/sgtpooki/Downloads/FreeCAD_0.20.2-2022-12-27-conda-macOS-x86_64-py310.dmg
│    Size: 865.7 MiB
│    Root CID: bafybeieqy35c5ndilmdrqfdxiqtbky6ku3xt2kbkwukopneik2kuwhgjp4
│
│  Filecoin Storage
│    Piece CID: bafkzcibf62fjesyzkznfr2n2vl7iqcqdfuorrt7e6rtoaqds6gku73xusv5vvnfvcutq
│    Piece ID: 0
│    Data Set ID: 6218
│
│  Storage Provider
│    Provider ID: 4
│    Name: infrafolio-calib
│    Direct Download URL: https://caliberation-pdp.infrafolio.com/piece/bafkzcibf62fjesyzkznfr2n2vl7iqcqdfuorrt7e6rtoaqds6gku73xusv5vvnfvcutq
│
└  Add completed successfully

@FilOzzy FilOzzy added team/filecoin-pin "Filecoin Pin" project is a stakeholder for this work. team/fs-wg FOC working group is a stakeholder for this work, and thus wants to track it on their project board. labels Dec 9, 2025
@FilOzzy FilOzzy added this to FS Dec 9, 2025
@github-project-automation github-project-automation bot moved this to 📌 Triage in FS Dec 9, 2025
@SgtPooki SgtPooki requested a review from Copilot December 9, 2025 22:23
@SgtPooki SgtPooki self-assigned this Dec 9, 2025
@SgtPooki SgtPooki requested a review from rvagg December 9, 2025 22:23
**Copilot AI** (Contributor) left a comment:
Pull request overview

This PR successfully implements streaming uploads across the codebase to avoid buffering large files in memory. The changes convert all upload flows from using readFile() (which loads entire files into memory) to using createReadStream() with Readable.toWeb() to create Web ReadableStreams. This is a significant improvement for handling large file uploads, as demonstrated by the test examples showing successful uploads of 375 MiB and 865 MiB files.

Key Changes

  • Introduced UploadData type that accepts both Uint8Array and ReadableStream<Uint8Array> for backward compatibility
  • Replaced all readFile() calls for CAR files with streaming using createReadStream() and Readable.toWeb()
  • Updated tests to verify that streams (not byte arrays) are passed to upload functions
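The streaming pattern described above can be sketched as follows. This is a hypothetical reconstruction for illustration, not the actual filecoin-pin source: the `UploadData` union mirrors the type the PR introduces, and `openCarForUpload` is an invented helper name showing how `stat()` plus `Readable.toWeb(createReadStream(...))` yields a Web `ReadableStream` without buffering the file:

```typescript
import { createReadStream } from 'node:fs'
import { stat } from 'node:fs/promises'
import { Readable } from 'node:stream'

// Mirrors the UploadData union described above (hypothetical reconstruction).
type UploadData = Uint8Array | ReadableStream<Uint8Array>

// Open a CAR file for upload as a Web ReadableStream instead of readFile()'ing
// it into memory; stat() supplies the size without touching the contents.
async function openCarForUpload (path: string): Promise<{ data: UploadData, size: number }> {
  const { size } = await stat(path)
  const data = Readable.toWeb(createReadStream(path)) as ReadableStream<Uint8Array>
  return { data, size }
}
```

The stream is only consumed when the upload actually reads it, so even very large CARs never sit fully in memory on the Node side.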

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.

Summary per file:

| File | Description |
| --- | --- |
| `src/core/upload/synapse.ts` | Added the `UploadData` type definition and updated parameter documentation to support both `Uint8Array` and streaming uploads |
| `src/core/upload/index.ts` | Exported the `UploadData` type and updated the `executeUpload` function signature to accept streaming data |
| `src/common/upload-flow.ts` | Updated `performUpload` to accept `UploadData` for streaming support |
| `src/add/add.ts` | Replaced `readFile()` with streaming via `createReadStream()` and `Readable.toWeb()`, using `stat()` to get the file size |
| `src/import/import.ts` | Replaced `readFile()` with streaming for CAR file uploads and updated the spinner message to indicate streaming |
| `src/filecoin-pin-store.ts` | Updated the CAR file upload to use streaming, with an explanatory comment about avoiding memory buffering |
| `upload-action/src/filecoin.js` | Converted to streaming uploads in the GitHub Action context |
| `src/test/unit/add.test.ts` | Added a `stat()` mock for file-size retrieval; an obsolete `readFile` mock remains and can be cleaned up |
| `src/test/unit/import.test.ts` | Updated test assertions to verify a `ReadableStream` is passed instead of a `Uint8Array` |


Comment on lines 92 to +105
```ts
// We need to partially mock fs/promises to keep real file operations for test setup
// but mock readFile/stat for the CAR handling part
vi.mock('node:fs/promises', async () => {
  const actual = await vi.importActual<typeof import('node:fs/promises')>('node:fs/promises')
  return {
    ...actual,
    stat: vi.fn((path: string) => {
      if (path === '/tmp/test.car') {
        return Promise.resolve({
          size: 1024,
        } as any)
      }
      return actual.stat(path as any)
    }),
```
**Copilot AI** commented Dec 9, 2025:
The comment says "mock readFile/stat for the CAR handling part", but readFile is no longer used for CAR handling after switching to streaming. Only stat is now used to get the file size. The readFile mock (lines 106-113) is obsolete and can be removed.

*Expected duration:*

Before:

> This is a function of the size of the input file and the hardware. Typical DAGification of files and directories is relatively quick, as it's simply a matter of chunking and hashing using common algorithms. The most time-consuming part is the generation of the [Piece CID](glossary.md#piece-cid) of the whole CAR on the client side prior to upload, where a 1 GiB input can take upwards of a minute. As the CAR is being created, it can be streamed to an SP, which is most likely the bottleneck.

After:

> Depends on input size, local disk/CPU, and uplink speed. DAGification itself is mostly chunk-and-hash and is fast on modern hardware. As we build the CAR, we stream it directly to the SP (no full buffering); [Synapse](glossary.md#synapse) overlaps DAG creation, [Piece CID](glossary.md#piece-cid) calculation, and upload, so on a fast machine the bottleneck is usually your upload bandwidth.
A repository member replied:
This is not actually true though, there's still an interim createCarFromPath going on here; no buffering in memory but "as we build the CAR, we stream it" isn't how it works.

But I've been intending to fix this. It means changing how we generate the CARs, but we don't actually need the CAR to be properly ordered or have a correct header. Let me follow up with a PR for this, because it's been on my mental list to solve since the beginning, without actually describing it in an issue (I think).

The member followed up:
Actually no, it's a bit more complicated because of metadata needing the root CID, but that will become easily fixable soon. I've documented it all in #288; so this PR is fine if you just tweak this sentence and we can use #288 as a follow-up when we can get to it (when FilOzone/synapse-sdk#494 is done).


## Streaming Uploads

[Filecoin Pin](#filecoin-pin) streams CAR data to [Service Providers](#service-provider) via [Synapse](#synapse), so files are not buffered fully in memory during upload. The maximum supported piece size is bounded by the Synapse SDK and SP configuration (tracked in https://github.com/FilOzone/synapse-sdk/issues/110).
A repository member commented:
This needs tweaking as well: in the browser we buffer up to 1 GiB (1016 MiB).

In fact, I don't think we even have early limits on it, so potentially someone could request an add of >1 GiB and it'd just buffer that up and not error until we eventually try to upload() it, which could be nasty in a browser environment. We could add some checks if we want to be safe now (keeping in mind that the final CAR size is going to be larger than the input size, but even a check of `inputSize < MAX_UPLOAD_SIZE` would help). But we could also just put it off until #288, which will give us the check in real time as we build the CAR from the input and hit that limit.
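An early size guard along the lines suggested in this comment might look like the sketch below. Both `MAX_UPLOAD_SIZE` and `assertUploadableSize` are hypothetical names for illustration, not actual filecoin-pin or Synapse SDK identifiers; the real limit lives in the SDK and SP configuration:

```typescript
// Hypothetical constant: the ~1 GiB (1016 MiB) browser buffering ceiling
// mentioned above, expressed in bytes.
const MAX_UPLOAD_SIZE = 1016 * 1024 * 1024

// Reject oversized inputs before any CAR bytes are buffered. The final CAR is
// larger than the raw input (block framing, header), so passing this check is
// necessary but not sufficient for the upload to succeed.
function assertUploadableSize (inputSize: number): void {
  if (inputSize >= MAX_UPLOAD_SIZE) {
    throw new Error(`input is ${inputSize} bytes; uploads are limited to ${MAX_UPLOAD_SIZE} bytes`)
  }
}
```

As the comment notes, #288 would make this guard unnecessary by surfacing the limit in real time while the CAR is being built.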


Development

Successfully merging this pull request may close these issues.

Support large files in filecoin-pin
