Skip to content

refactor(container-runtime): Stop estimating batch size when accumulating a local batch #24309

Merged
markfields merged 10 commits intomicrosoft:mainfrom
markfields:cr/batch-too-large
Apr 12, 2025
Merged

refactor(container-runtime): Stop estimating batch size when accumulating a local batch #24309
markfields merged 10 commits intomicrosoft:mainfrom
markfields:cr/batch-too-large

Conversation

@markfields
Copy link
Copy Markdown
Member

@markfields markfields commented Apr 10, 2025

Description

We are preparing to update LocalBatchMessage to hold the original runtime op, not the serialized op. So it will no longer be practical to estimate the content size of each op (we don't want to do an extra JSON stringify just for this purpose).

The downside is if compression is disabled and someone submits an op that will likely push the batch over the threshold, they won't get that error under that submit callstack like they used to. Instead it will only surface when the container closes during flush. If this is a problem at some point, we may be able to keep a lambda that will create the callstack from the original closure, like this.

Reviewer Guidance

Here's a comparison of the places we would fail before / after this change:

Before:

  • If no compression, then if est socket size is too big, throw when submitting. (in Outbox.addMessageToBatchManager)
    • "If no compression" because hardLimit would be infinity if compression was enabled
  • If yes compression (and it happens), then throw if it was insufficient (in Outbox.virtualizeBatch)

Now:

  • If yes compression (and it happens), then throw if it was insufficient (in Outbox.virtualizeBatch) -- unchanged from before
  • Otherwise (no compression happened, or it did and appeared sufficient), if est socket size is too big, throw before send (in Outbox.sendBatch)
    • This case used to be possible even after compression if there were soooo many empty placeholders that it pushed it over the limit. The log I'm deleting from there ("LargeBatch") is hit for some internal apps on old versions, when we did empty placeholders. The log has not been hit on any recent version.
    • But now, if compression is sufficient or unnecessary, this will surely pass as well. So this is just guarding the case where compression was disabled, so it's quite equivalent to the original check during submit.

Copilot AI review requested due to automatic review settings April 10, 2025 06:20
@github-actions github-actions Bot added area: runtime Runtime related issues base: main PRs targeted against main branch labels Apr 10, 2025
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot reviewed 10 out of 10 changed files in this pull request and generated no comments.

Comments suppressed due to low confidence (2)

packages/runtime/container-runtime/src/opLifecycle/outbox.ts:539

  • Please clarify the rationale behind not using socketSize in this check or update the logic to consistently rely on socketSize, ensuring the error handling is clear.
//* TODO: Why isn't this socketSize?

packages/runtime/container-runtime/src/test/opLifecycle/outbox.spec.ts:464

  • Implement tests for batch size estimation to validate the new behavior, ensuring that any edge cases arising from the removal of the estimation logic are properly covered.
//* TODO: Implement tests like these

@github-actions github-actions Bot added the area: tests Tests to add, test infrastructure improvements, etc label Apr 11, 2025
/* reentrant */ false,
),
true,
batchManager.push(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now instead of asserting, it would fail if it somehow hit an error while pushing?

Copy link
Copy Markdown
Member Author

@markfields markfields Apr 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now push doesn't do anything but put it in an array, so it wouldn't really ever throw. The change is just that there's no return value anymore (nothing to say - it always adds it)

assert.equal(batchManager.contentSizeInBytes, smallMessageSize * batchManager.length);
batchManager.push(smallMessage(), /* reentrant */ false);
batchManager.push(smallMessage(), /* reentrant */ false);
batchManager.push(smallMessage(), /* reentrant */ false);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this test and Batch metadata is set correctly [${includeBatchId ? "with" : "without"} batchId] doing the same thing now?
I don't see where you are checking the batch content size in this test.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Removed this and added a similar test to cover where this code under test moved to: localBatchToOutboundBatch

Copy link
Copy Markdown
Contributor

@MarioJGMsoft MarioJGMsoft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm!

@markfields markfields merged commit 68d81eb into microsoft:main Apr 12, 2025
33 checks passed
@markfields markfields deleted the cr/batch-too-large branch April 13, 2025 00:33
noencke added a commit to noencke/FluidFramework that referenced this pull request Apr 14, 2025
markfields added a commit to markfields/FluidFramework that referenced this pull request Apr 15, 2025
markfields added a commit that referenced this pull request Apr 15, 2025
… accumulating a local batch (#24363)

_Re-do of #24309 with less strict assertion conditions in
`messageSize.spec.ts`. See df114eac27269bd2c7b90c1e8125034c68e96c98_

## PR Description (from #24309)

We are preparing to update `LocalBatchMessage` to hold the original
runtime op, not the serialized op. So it will no longer be practical to
estimate the content size of each op (we don't want to do an extra JSON
stringify just for this purpose).

Here's a comparison of the places we would fail before / after this
change:

Before:
* If no compression, then if est socket size is too big, throw when
submitting. (in `Outbox.addMessageToBatchManager`)
* "If no compression" because `hardLimit` would be infinity if
compression was enabled
* If yes compression (and it happens), then throw if it was insufficient
(in `Outbox.virtualizeBatch`)

Now:
* If yes compression (and it happens), then throw if it was insufficient
(in `Outbox.virtualizeBatch`) -- _unchanged from before_
* Otherwise (no compression happened, or it did and appeared
sufficient), if est socket size is too big, throw before send (in
`Outbox.sendBatch`)
* This case used to be possible even after compression if there were
soooo many empty placeholders that it pushed it over the limit. The log
I'm deleting from there ("LargeBatch") is hit for some internal apps on
old versions, when we did empty placeholders. The log has not been hit
on any recent version.
* But now, if compression is sufficient or unnecessary, this will surely
pass as well. So this is just guarding the case where compression was
disabled, so it's quite equivalent to the original check during submit.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area: runtime Runtime related issues area: tests Tests to add, test infrastructure improvements, etc base: main PRs targeted against main branch Feature_StagingMode

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants