Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Write cache large blobs in chunks to avoid write file limits in Node #9355

Merged
merged 5 commits into from
Nov 6, 2023

Conversation

JakeLane
Copy link
Contributor

@JakeLane JakeLane commented Nov 2, 2023

↪️ Pull Request

In Node, there's a limit to the size we can read at once (2 GiB). We are hitting that for large monorepos at Atlassian. This makes the cache unusable for repeated usage, as the node API for readFile will always return an error.

A mitigation for this is to chunk the files in ~2GiB segments. This means the upper limit of blob size will be the buffer size limit.

On my machine, the buffer size limit is:

node -e "console.log(require('buffer').constants.MAX_LENGTH / 1.074e+9 + 'GiB')"
3.999038450651769GiB

This gives us some headroom to resolve large blob size issues.

@@ -120,9 +124,9 @@ export function loadGraphs(cacheDir: string): {|
return {assetGraph, bundleGraph, requestTracker, bundleInfo};
}

function loadLargeBlobRequestRequestSync(cacheDir, node) {
async function loadLargeBlobRequestRequestSync(cache, node) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why this was originally written to be sync, but if we're making it async then we should rename the function.

Copy link
Member

@mischnic mischnic Nov 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I recall correctly, I made loadGraphs sync just to make it easier to use from the Node REPL. So no big deal

@@ -12,7 +12,7 @@ export interface Cache {
setBlob(key: string, contents: Buffer | string): Promise<void>;
hasLargeBlob(key: string): Promise<boolean>;
getLargeBlob(key: string): Promise<Buffer>;
setLargeBlob(key: string, contents: Buffer | string): Promise<void>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if this ever actually happens, but this is technically a breaking change for this package. Is it definitely required?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not required but I figured it would be a nice clean up to do, since it's adding complexity. Is there a convention for tightening APIs like this?

I'll remove the API change anyway, since it's relatively easy to implement

writePromises.push(
this.fs.writeFile(
this.#getFilePath(key, i),
contents.subarray(i * WRITE_LIMIT_CHUNK, (i + 1) * WRITE_LIMIT_CHUNK),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be a micro-optimization, but should we avoid this work when chunks.length === 1 seeing as that's most of the time?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah fair call, there's no copy cost as it references the same memory in the buffer, but there's no guarantee the work is trivial. Will change it when updating the rest of the code

@JakeLane JakeLane force-pushed the jlane2/chunk-large-blogs-to-disk branch from 862e9fe to 3a79744 Compare November 2, 2023 04:51
Copy link
Member

@lettertwo lettertwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! 🚀

@mischnic mischnic merged commit c54ed43 into v2 Nov 6, 2023
15 of 16 checks passed
@mischnic mischnic deleted the jlane2/chunk-large-blogs-to-disk branch November 6, 2023 17:11
@alshdavid alshdavid mentioned this pull request Nov 13, 2023
@irismoini irismoini mentioned this pull request Dec 14, 2023
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants