
Proposal: Sync Writer API #69

Closed
Gozala opened this issue Mar 24, 2022 · 2 comments

Comments

@Gozala (Contributor) commented Mar 24, 2022

For dotStorage upload@2.0 we use CARs more or less as network packets. Specifically, we want to allocate n bytes for a packet and keep writing blocks until the desired size is reached, then send that CAR out and start writing new blocks into a new CAR.

Unfortunately, the current writer API is not ideal for such a use case and imposes unnecessary asynchrony. Here is the code I found myself writing.

import * as CAR from "@ipld/car";
import { CID } from "multiformats/cid";

// Encode blocks into a single in-memory CAR. Nothing here needs to be
// asynchronous from the caller's point of view, yet the API forces it.
const encode = async ({ blocks, roots = [CID.parse("bafkqaaa")] }) => {
  const { writer, out } = CAR.CarWriter.create(roots);
  for (const block of blocks) {
    writer.put(block);
  }
  writer.close();

  // Drain the async output stream just to collect the chunks...
  const chunks = [];
  let byteLength = 0;
  for await (const chunk of out) {
    chunks.push(chunk);
    byteLength += chunk.byteLength;
  }

  // ...then concatenate them into one contiguous buffer.
  const buffer = new Uint8Array(byteLength);
  let byteOffset = 0;
  for (const chunk of chunks) {
    buffer.set(chunk, byteOffset);
    byteOffset += chunk.byteLength;
  }
  return buffer;
};
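For reference, a small usage sketch of the encode helper above (the raw-block helper and the sample bytes are assumptions for illustration, not from the original):

import * as raw from "multiformats/codecs/raw";
import { sha256 } from "multiformats/hashes/sha2";
import { CID } from "multiformats/cid";

// Hypothetical helper: wrap arbitrary bytes as a raw-codec block.
const toBlock = async (bytes) => {
  const hash = await sha256.digest(bytes);
  return { cid: CID.createV1(raw.code, hash), bytes };
};

const block = await toBlock(new TextEncoder().encode("hello world"));
const car = await encode({ blocks: [block] });
console.log(car.byteLength); // the whole CAR as a single Uint8Array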

It would be nice to have another writer API to support such a use case, one that does not impose async reads of the output. Maybe something like:

declare function createWriter(buffer: ArrayBuffer, options?: WriterOptions): SyncBlockWriter

interface WriterOptions {
  roots?: CID[] // defaults to []
  byteOffset?: number // defaults to 0
  byteLength?: number // defaults to buffer.byteLength
  rootsCapacity?: number // defaults to the total byteLength used by the passed roots
}

interface SyncBlockWriter {
  /**
   * Throws an error if the total root count is greater than the root capacity specified at creation.
   */
  addRoots(roots: CID[]): void
  /**
   * Throws an error if the buffer does not have enough capacity.
   */
  write(block: Block): void
  /**
   * Returns a `Uint8Array` containing the CAR bytes, as a view into the `ArrayBuffer` that was written.
   */
  close(): Uint8Array
}
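For context, here is a rough sketch of how the packet-style use case above might look against this proposed API (the packet size, the placeholder root, and the `send` callback are illustrative assumptions, not part of the proposal):

// Sketch only: `createWriter` is the proposed API above, not an existing @ipld/car export.
const PACKET_SIZE = 1024 * 1024; // e.g. target ~1 MiB per CAR "packet"
const placeholderRoot = CID.parse("bafkqaaa");

const writePackets = (blocks, send) => {
  let writer = createWriter(new ArrayBuffer(PACKET_SIZE), { roots: [placeholderRoot] });
  for (const block of blocks) {
    try {
      writer.write(block);
    } catch (_) {
      // Packet is full: flush the current CAR and start a new one.
      // (Assumes every block fits into an empty packet.)
      send(writer.close());
      writer = createWriter(new ArrayBuffer(PACKET_SIZE), { roots: [placeholderRoot] });
      writer.write(block);
    }
  }
  send(writer.close());
};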
@Gozala (Contributor, Author) commented Mar 24, 2022

I closed #68 as this provides a more adequate API with a much more predictable memory profile.

@Gozala (Contributor, Author) commented Mar 24, 2022

Quoting the relevant comment from #68:

It wouldn't really even need a byteLength argument because it's going to be holding on to a list of Uint8Arrays anyway and it just needs to create the header when it gets the roots and spit that out first. Nothing about the bytes that it buffers is impacted by the length of the header. You just get a one-shot addRoots() call to open the flood gates.

This is an interesting point! In our use cases we generally know how many roots we'll have ahead of time, even though we do not know what they're going to be. So it's fairly easy to tell how much memory needs to be allocated to fit the roots, and doing so would allow writing into the buffer without having to hold onto blocks, freeing memory sooner.

However, if the number of roots is unknown ahead of the writes, not having to allocate space for them is definitely the better option. Given the known use cases, I would still suggest going with the proposed API; one supporting unknown roots could easily be built as a layer on top when/if desired.
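For illustration, such a layer might look roughly like this (a sketch against the proposed interface above; `createDeferredWriter` and its shape are hypothetical, not part of the proposal):

// Hypothetical layer: buffers blocks in memory until the roots are known,
// then writes everything into a caller-provided buffer via the proposed sync API.
const createDeferredWriter = (buffer) => {
  const blocks = [];
  return {
    write: (block) => {
      blocks.push(block); // no header yet, so just hold on to the block
    },
    close: (roots) => {
      const writer = createWriter(buffer, { roots });
      for (const block of blocks) {
        writer.write(block);
      }
      return writer.close(); // Uint8Array view over `buffer` with the CAR bytes
    },
  };
};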
