GC: Persist SweepTimeout to summary and support using a shorter duration for testing #12331
Closed
markfields wants to merge 5 commits into
markfields commented Oct 7, 2022
const pendingRuntimeState = context.pendingLocalState as IPendingRuntimeState | undefined;
const baseSnapshot: ISnapshotTree | undefined = pendingRuntimeState?.baseSnapshot ?? context.baseSnapshot;

//* Test to prove this doesn't get persisted as undefined?
Member
Author
This is a TODO that I may or may not do; it would be an e2e test. When this is undefined because storage doesn't exist yet, it shouldn't be persisted, because only the Summarizer persists these values.
markfields commented Oct 7, 2022
// If we're using TestOverrides to achieve a short Sweep Timeout, lock the buffer to 1ms. Otherwise 1 day.
this.sweepTimeoutBufferMs =
    (this.maxSnapshotCacheDurationMs === 0)
        ? 1
Contributor
Why is this 1?
Member
Author
Just felt like it should be non-zero 🤷‍♂️
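For readers following this thread, the buffer rule in the quoted diff can be sketched as a standalone function. This is only an illustration: `computeSweepTimeoutBufferMs` and `oneDayMs` are hypothetical names, and the `1 day` branch is taken from the code comment, since the quoted ternary is cut off after the `? 1` line.

```typescript
const oneDayMs = 24 * 60 * 60 * 1000;

/**
 * Hypothetical helper mirroring the quoted ternary: a snapshot cache duration
 * of 0 is treated as the signal that test overrides are in effect, so the
 * buffer shrinks to 1 ms. Any other duration keeps the 1-day buffer.
 */
function computeSweepTimeoutBufferMs(maxSnapshotCacheDurationMs: number): number {
    return maxSnapshotCacheDurationMs === 0 ? 1 : oneDayMs;
}
```

The value 1 rather than 0 carries no special meaning beyond being non-zero, as noted in the reply above.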
/** If this is present, the session for this container will expire after this time and the container will close */
readonly sessionExpiryTimeoutMs?: number;
/** Maximum duration a snapshot may be cached and then loaded from for this container */
readonly maxSnapshotCacheDurationMs?: number | "none";
Contributor
Why are there two different types?
Member
Author
There are 3 actually:
- undefined - old files created when the value was hardcoded to 5 days (or 2 days for even older files)
- none - files created at a hypothetical future time where the driver didn't implement the policy so GC shouldn't run
- number - files created by this code where the driver does implement the policy and passed in the policy's value
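The three cases listed above could be handled with a small discriminating function, sketched here. `PersistedCacheDuration`, `CachePolicy`, and `interpretCacheDuration` are hypothetical names introduced for illustration and are not part of the PR. The sketch also assumes the 5-day fallback for `undefined`, since the persisted value alone cannot distinguish the older 2-day files.

```typescript
const oneDayMs = 24 * 60 * 60 * 1000;

// The three persisted shapes described in the comment above.
type PersistedCacheDuration = number | "none" | undefined;

// Hypothetical result type, introduced only for this sketch.
type CachePolicy =
    | { kind: "legacy"; durationMs: number }
    | { kind: "noPolicy" }
    | { kind: "driver"; durationMs: number };

function interpretCacheDuration(persisted: PersistedCacheDuration): CachePolicy {
    if (persisted === undefined) {
        // Old file: assume the 5-day hardcoded value (assumption; 2-day files look identical here).
        return { kind: "legacy", durationMs: 5 * oneDayMs };
    }
    if (persisted === "none") {
        // The driver did not implement the policy, so GC should not run.
        return { kind: "noPolicy" };
    }
    // The driver implemented the policy and passed in its value.
    return { kind: "driver", durationMs: persisted };
}
```

Note that `0` is a valid driver value and must not be confused with `undefined`, which is why the checks use strict equality rather than truthiness.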
markfields commented Oct 12, 2022
 * Validates this scenario: When a GC node (data store or attachment blob) becomes inactive, i.e, it has been
 * unreferenced for a certain amount of time, using the node results in an error telemetry.
 */
describeNoCompat("GC inactive nodes tests", (getTestObjectProvider) => {
Member
Author
This file is WIP - I will not be replacing the existing tests, but rather copying them first or updating them to run in 2 modes.
Member
Author
Replaced by #12419
markfields added a commit that referenced this pull request Oct 15, 2022
## Background

As-is, the Sweep Timeout will always be at least 6 days due to the hardcoded Snapshot Cache limit (5 days) and Buffer (1 day). This means it's impossible to test the Sweep Timer and the code that runs after something is SweepReady!

There have been multiple attempts on my part to address this: #11084, #12044, #12331

Much of the difficulty has arisen from the challenge of aligning driver policy with GC config. I'm finally punting on that and tracking it as a separate item ([AB#2238](https://dev.azure.com/fluidframework/235294da-091d-4c29-84fc-cdfc3d90890b/_workitems/edit/2238)).

## This change

It's pretty simple:

* Persist SweepTimeoutMs in the summary
* For new containers, that can be overridden via the `"Fluid.GarbageCollection.TestOverride.SweepTimeoutMs"` config setting
* Default behavior for new containers (and for backfilling existing containers with no value persisted) is Session Expiry + 6 days (same as it has been)
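A rough sketch of the resolution order described in this commit message follows. It is not the code from the referenced commit: `ConfigReader`, `resolveSweepTimeoutMs`, and the `isNewContainer` flag are hypothetical, and the default is read here as session expiry plus the 5-day snapshot cache limit and 1-day buffer mentioned in the Background.

```typescript
const oneDayMs = 24 * 60 * 60 * 1000;
const overrideKey = "Fluid.GarbageCollection.TestOverride.SweepTimeoutMs";

// Hypothetical minimal config reader; the real config provider interface is not shown in this thread.
interface ConfigReader {
    getNumber(key: string): number | undefined;
}

function resolveSweepTimeoutMs(
    persistedSweepTimeoutMs: number | undefined,
    isNewContainer: boolean,
    sessionExpiryTimeoutMs: number,
    config: ConfigReader,
): number {
    // Existing containers: a value already persisted in the summary always wins.
    if (persistedSweepTimeoutMs !== undefined) {
        return persistedSweepTimeoutMs;
    }
    // New containers only: allow the test override to shorten the timeout.
    if (isNewContainer) {
        const override = config.getNumber(overrideKey);
        if (override !== undefined) {
            return override;
        }
    }
    // Default and backfill (assumption): session expiry + 5 days cache + 1 day buffer.
    return sessionExpiryTimeoutMs + 6 * oneDayMs;
}
```

Checking the persisted value first is what keeps the timeout stable across sessions for an existing container, regardless of which overrides a later session has set.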
sharptrip pushed a commit to sharptrip/FluidFramework that referenced this pull request Oct 17, 2022
markfields added a commit to markfields/FluidFramework that referenced this pull request Oct 17, 2022
markfields added a commit that referenced this pull request Oct 17, 2022
_back-port of #12419 to internal.2.0 release_
Important update
This PR seems to have a fatal problem: The initial summary (created while detached) cannot include the driver's snapshot cache policy. So we have to persist some placeholder value and depend on a later summary to fill it in. But this means that there is no authoritative value, as different sessions could have different policies in effect.
Discussing this with Navin, we'll see what the outcome is. One possibility is to close this in favor of #12419
Description
This does a few key things:
It does not change the logic around overriding Session Expiry, and uses a very simple scheme for overriding buffer, tying it to whether SnapshotCache is 0 or not.
We can iterate on strategy for setting these values for new containers over time - the important part is to get the values persisted and always use those values for existing containers.
See previous PR #12044