
**@SgtPooki** (Collaborator) commented Dec 9, 2025

Ensures all data upload flows stream content to avoid buffering large data in memory

fixes #190

> node dist/cli.js add ~/Downloads/Russell-dempsey-pinception-full-demo-not-chopped.mov --private-key $PRIVATE_KEY_SUB --provider-id 2
┌  Filecoin Pin Add
│
◇  ✓ File validated (375.0 MiB)
│
◇  ✓ Connected to calibration
│
◇  ✓ Minimum payment setup verified (~0.066 USDFC required)
│
◇  ✓ File packed with root CID: bafybeiazl5i5zl3l3sz77vg6wn23b6ua6ducrhnnf4c243p2inbgilaloy
│
◇  ✓ IPFS content ready (375.0 MiB)
│
◇  ✓ Payment capacity verified for 375.0 MiB
│
◇  ✓ Storage context ready
│
│  Storage Context
│
│    Data Set ID: 48
│    Provider: ezpdpz-calib2
│
◇  ━━━ Add Complete ━━━
│
│  Network: calibration
│
│  Add Details
│    File: /Users/sgtpooki/Downloads/Russell-dempsey-pinception-full-demo-not-chopped.mov
│    Size: 375.0 MiB
│    Root CID: bafybeiazl5i5zl3l3sz77vg6wn23b6ua6ducrhnnf4c243p2inbgilaloy
│
│  Filecoin Storage
│    Piece CID: bafkzcibf5cjmcqqy3hgewxp5rlwzskltxnrzj56whz2ovkskzhicshwf2lzl4s2jz4ja
│    Piece ID: 11
│    Data Set ID: 48
│
│  Storage Provider
│    Provider ID: 2
│    Name: ezpdpz-calib2
│    Direct Download URL: https://calib2.ezpdpz.net/piece/bafkzcibf5cjmcqqy3hgewxp5rlwzskltxnrzj56whz2ovkskzhicshwf2lzl4s2jz4ja
│
└  Add completed successfully
> node dist/cli.js add ~/Downloads/FreeCAD_0.20.2-2022-12-27-conda-macOS-x86_64-py310.dmg --private-key $PRIVATE_KEY_SUB --provider-id 4
┌  Filecoin Pin Add
│
◇  ✓ File validated (865.6 MiB)
│
◇  ✓ Connected to calibration
│
◇  ✓ Minimum payment setup verified (~0.066 USDFC required)
│
◇  ✓ File packed with root CID: bafybeieqy35c5ndilmdrqfdxiqtbky6ku3xt2kbkwukopneik2kuwhgjp4
│
◇  ✓ IPFS content ready (865.7 MiB)
│
◇  ✓ Payment capacity verified for 865.7 MiB
│
◇  ✓ Storage context ready
│
│  Storage Context
│
│    Data Set ID: undefined
│    Provider: infrafolio-calib
│
◇  ━━━ Add Complete ━━━
│
│  Network: calibration
│
│  Add Details
│    File: /Users/sgtpooki/Downloads/FreeCAD_0.20.2-2022-12-27-conda-macOS-x86_64-py310.dmg
│    Size: 865.7 MiB
│    Root CID: bafybeieqy35c5ndilmdrqfdxiqtbky6ku3xt2kbkwukopneik2kuwhgjp4
│
│  Filecoin Storage
│    Piece CID: bafkzcibf62fjesyzkznfr2n2vl7iqcqdfuorrt7e6rtoaqds6gku73xusv5vvnfvcutq
│    Piece ID: 0
│    Data Set ID: 6218
│
│  Storage Provider
│    Provider ID: 4
│    Name: infrafolio-calib
│    Direct Download URL: https://caliberation-pdp.infrafolio.com/piece/bafkzcibf62fjesyzkznfr2n2vl7iqcqdfuorrt7e6rtoaqds6gku73xusv5vvnfvcutq
│
└  Add completed successfully

@FilOzzy FilOzzy added team/filecoin-pin "Filecoin Pin" project is a stakeholder for this work. team/fs-wg FOC working group is a stakeholder for this work, and thus wants to track it on their project board. labels Dec 9, 2025
@FilOzzy FilOzzy added this to FS Dec 9, 2025
@github-project-automation github-project-automation bot moved this to 📌 Triage in FS Dec 9, 2025
@SgtPooki SgtPooki requested a review from Copilot December 9, 2025 22:23
@SgtPooki SgtPooki self-assigned this Dec 9, 2025
@SgtPooki SgtPooki requested a review from rvagg December 9, 2025 22:23
**Copilot AI** (Contributor) left a comment:
Pull request overview

This PR successfully implements streaming uploads across the codebase to avoid buffering large files in memory. The changes convert all upload flows from using readFile() (which loads entire files into memory) to using createReadStream() with Readable.toWeb() to create Web ReadableStreams. This is a significant improvement for handling large file uploads, as demonstrated by the test examples showing successful uploads of 375 MiB and 865 MiB files.

Key Changes

  • Introduced UploadData type that accepts both Uint8Array and ReadableStream<Uint8Array> for backward compatibility
  • Replaced all readFile() calls for CAR files with streaming using createReadStream() and Readable.toWeb()
  • Updated tests to verify that streams (not byte arrays) are passed to upload functions
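The streaming pattern described above can be sketched as follows. This is a hypothetical reconstruction for illustration, not the actual filecoin-pin source: the `UploadData` union mirrors the type the PR introduces, and `openCarForUpload` is an invented helper name showing how `stat()` plus `Readable.toWeb(createReadStream(...))` yields a Web `ReadableStream` without buffering the file:

```typescript
import { createReadStream } from 'node:fs'
import { stat } from 'node:fs/promises'
import { Readable } from 'node:stream'

// Mirrors the UploadData union described above (hypothetical reconstruction).
type UploadData = Uint8Array | ReadableStream<Uint8Array>

// Open a CAR file for upload as a Web ReadableStream instead of readFile()'ing
// it into memory; stat() supplies the size without touching the contents.
async function openCarForUpload (path: string): Promise<{ data: UploadData, size: number }> {
  const { size } = await stat(path)
  const data = Readable.toWeb(createReadStream(path)) as ReadableStream<Uint8Array>
  return { data, size }
}
```

The stream is only consumed when the upload actually reads it, so even very large CARs never sit fully in memory on the Node side.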

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.

Summary per file:

| File | Description |
| --- | --- |
| `src/core/upload/synapse.ts` | Added the `UploadData` type definition and updated parameter documentation to support both `Uint8Array` and streaming uploads |
| `src/core/upload/index.ts` | Exported the `UploadData` type and updated the `executeUpload` function signature to accept streaming data |
| `src/common/upload-flow.ts` | Updated `performUpload` to accept `UploadData` for streaming support |
| `src/add/add.ts` | Replaced `readFile()` with streaming via `createReadStream()` and `Readable.toWeb()`, using `stat()` to get the file size |
| `src/import/import.ts` | Replaced `readFile()` with streaming for CAR file uploads and updated the spinner message to indicate streaming |
| `src/filecoin-pin-store.ts` | Updated the CAR file upload to use streaming, with an explanatory comment about avoiding memory buffering |
| `upload-action/src/filecoin.js` | Converted to streaming uploads in the GitHub Action context |
| `src/test/unit/add.test.ts` | Added a `stat()` mock for file-size retrieval; an obsolete `readFile` mock remains and can be cleaned up |
| `src/test/unit/import.test.ts` | Updated test assertions to verify a `ReadableStream` is passed instead of a `Uint8Array` |


Comment on lines 92 to +105
```ts
// We need to partially mock fs/promises to keep real file operations for test setup
// but mock readFile/stat for the CAR handling part
vi.mock('node:fs/promises', async () => {
  const actual = await vi.importActual<typeof import('node:fs/promises')>('node:fs/promises')
  return {
    ...actual,
    stat: vi.fn((path: string) => {
      if (path === '/tmp/test.car') {
        return Promise.resolve({
          size: 1024,
        } as any)
      }
      return actual.stat(path as any)
    }),
```
**Copilot AI** commented Dec 9, 2025:
The comment says "mock readFile/stat for the CAR handling part", but readFile is no longer used for CAR handling after switching to streaming. Only stat is now used to get the file size. The readFile mock (lines 106-113) is obsolete and can be removed.

*Expected duration:*

Before:

> This is a function of the size of the input file and the hardware. Typical DAGification of files and directories is relatively quick, as it's simply a matter of chunking and hashing using common algorithms. The most time-consuming part is the generation of the [Piece CID](glossary.md#piece-cid) of the whole CAR on the client side prior to upload, where a 1 GiB input can take upwards of a minute. As the CAR is being created, it can be streamed to an SP, which is most likely the bottleneck.

After:

> Depends on input size, local disk/CPU, and uplink speed. DAGification itself is mostly chunk-and-hash and is fast on modern hardware. As we build the CAR, we stream it directly to the SP (no full buffering); [Synapse](glossary.md#synapse) overlaps DAG creation, [Piece CID](glossary.md#piece-cid) calculation, and upload, so on a fast machine the bottleneck is usually your upload bandwidth.
A repository member replied:
This is not actually true though, there's still an interim createCarFromPath going on here; no buffering in memory but "as we build the CAR, we stream it" isn't how it works.

But I've been intending to fix this. It means changing how we generate the CARs, but we don't actually need the CAR to be properly ordered or have a correct header. Let me follow up with a PR for this, because it's been on my mental list to solve since the beginning, without actually describing it in an issue (I think).

The member followed up:
Actually no, it's a bit more complicated because of metadata needing the root CID, but that will become easily fixable soon. I've documented it all in #288; so this PR is fine if you just tweak this sentence and we can use #288 as a follow-up when we can get to it (when FilOzone/synapse-sdk#494 is done).


## Streaming Uploads

[Filecoin Pin](#filecoin-pin) streams CAR data to [Service Providers](#service-provider) via [Synapse](#synapse), so files are not buffered fully in memory during upload. The maximum supported piece size is bounded by the Synapse SDK and SP configuration (tracked in https://github.com/FilOzone/synapse-sdk/issues/110).
A repository member commented:
This needs tweaking as well: in the browser we buffer up to 1 GiB (1016 MiB).

In fact, I don't think we even have early limits on it, so potentially someone could request an add of >1 GiB and it'd just buffer that up and not error until we eventually try to upload() it, which could be nasty in a browser environment. We could add some checks if we want to be safe now (keeping in mind that the final CAR size is going to be larger than the input size, but even a check of `inputSize < MAX_UPLOAD_SIZE` would help). But we could also just put it off until #288, which will give us the check in real time as we build the CAR from the input and hit that limit.
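An early size guard along the lines suggested in this comment might look like the sketch below. Both `MAX_UPLOAD_SIZE` and `assertUploadableSize` are hypothetical names for illustration, not actual filecoin-pin or Synapse SDK identifiers; the real limit lives in the SDK and SP configuration:

```typescript
// Hypothetical constant: the ~1 GiB (1016 MiB) browser buffering ceiling
// mentioned above, expressed in bytes.
const MAX_UPLOAD_SIZE = 1016 * 1024 * 1024

// Reject oversized inputs before any CAR bytes are buffered. The final CAR is
// larger than the raw input (block framing, header), so passing this check is
// necessary but not sufficient for the upload to succeed.
function assertUploadableSize (inputSize: number): void {
  if (inputSize >= MAX_UPLOAD_SIZE) {
    throw new Error(`input is ${inputSize} bytes; uploads are limited to ${MAX_UPLOAD_SIZE} bytes`)
  }
}
```

As the comment notes, #288 would make this guard unnecessary by surfacing the limit in real time while the CAR is being built.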


Development

Successfully merging this pull request may close these issues.

Support large files in filecoin-pin
