Skip to content

Comments

GC: Fix GC deleted stats not correctly updated for data stores#18890

Merged
agarwal-navin merged 4 commits intomicrosoft:mainfrom
agarwal-navin:fixGCDeleteStats
Dec 21, 2023
Merged

GC: Fix GC deleted stats not correctly updated for data stores#18890
agarwal-navin merged 4 commits intomicrosoft:mainfrom
agarwal-navin:fixGCDeleteStats

Conversation

@agarwal-navin
Copy link
Contributor

Bug

#18506 added deleted stats to the GarbageCollection_end logs. However, the calculation for getting deleted data stores count is incorrect. It calls this.runtime.getNodeType(nodeId) to get the node's type which checks if the data store with the given nodeId exists. Since the data store has been deleted, it ends up returning the type as other.

Fix

In case of deleted nodes, the garbage collector gets the node type itself by looking the path. This should be fine because it is only used for stats and for the current node types (data store, DDS and blob), this will work fine.

@agarwal-navin agarwal-navin requested review from a team as code owners December 18, 2023 21:08
@github-actions github-actions bot added area: runtime Runtime related issues area: tests Tests to add, test infrastructure improvements, etc base: main PRs targeted against main branch labels Dec 18, 2023
@msfluid-bot
Copy link
Collaborator

msfluid-bot commented Dec 18, 2023

@fluid-example/bundle-size-tests: +1.21 KB
Metric NameBaseline SizeCompare SizeSize Diff
aqueduct.js 742.82 KB 743.03 KB +208 Bytes
connectionState.js 680 Bytes 680 Bytes No change
containerRuntime.js 541.65 KB 541.84 KB +186 Bytes
loader.js 166.8 KB 166.82 KB +22 Bytes
map.js 560.19 KB 560.38 KB +197 Bytes
matrix.js 658.42 KB 658.61 KB +197 Bytes
odspDriver.js 90.24 KB 90.26 KB +22 Bytes
odspPrefetchSnapshot.js 41.82 KB 41.83 KB +11 Bytes
sharedString.js 677.29 KB 677.48 KB +197 Bytes
sharedTree.js 790.56 KB 790.74 KB +186 Bytes
Total Size 4.29 MB 4.29 MB +1.21 KB

Baseline commit: 90c4ad2

Generated by 🚫 dangerJS against c937ba4

loaderProps: { configProvider: mockConfigProvider(settings) },
};
mainContainer = await provider.makeTestContainer(testContainerConfig);
mainDataStore = (await mainContainer.getEntryPoint()) as ITestDataObject;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename to mainDataObject

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same for summarizerDataStore below

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

beforeEach(async function () {
provider = getTestObjectProvider({ syncSummarizer: true });
// These tests validate the GC stats in summary by calling summarize directly on the container runtime.
// These tests validate the GC stats in summary by calling summarize directly on the mainContainer runtime.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should not say "directly on the mainContainer". Maybe update this comment to say more about disabling summarizer heurestics and only calling it/GC explicitly, on a separate container.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the comment.

const summarizerDataStore = (await summarizerContainer.getEntryPoint()) as ITestDataObject;
summarizerRuntime = summarizerDataStore._context.containerRuntime as ContainerRuntime;
summarizerDataStore._root.set("write", "mode");
await waitForContainerWriteModeConnectionWrite(summarizerContainer);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this necessary? ensureSynchronized calls throughout should be sufficient, I would think. Seems fine to leave it since it's more explicit, but wanted to ask.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is needed. I added a comment to explain why.

);

// Deleted stats. Wait for sweep timeout and send an op to update the current reference timestamp. Usually,
// GC wouldn't run without ops so this step is not needed but its needed here because we are explicitly
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo

Suggested change
// GC wouldn't run without ops so this step is not needed but its needed here because we are explicitly
// GC wouldn't run without ops so this step is not needed but it's needed here because we are explicitly

mainDataStore._root.set("update", "timestamp");
await provider.ensureSynchronized();

mainContainer.close();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's this for?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a comment to explain why.

expectedGCStats.dataStoreCount -= 2;
expectedGCStats.unrefDataStoreCount -= 2;
expectedGCStats.deletedDataStoreCount += 2;
expectedGCStats.updatedDataStoreCount = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it makes more sense to include deletions in the "updated" count? It's fine either way as long as we remember how it works when looking at the data :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The updated count tells whether a node's reference state changed. It's mainly used to validate that data store are not re-summarized unexpectedly here.
If a data store is deleted, it's reference state doesn't change and it won't be re-summarized so there is no need to update it.
Like you said, we could update its meaning so that deleted in included but I don't think it adds anything yet.

@agarwal-navin agarwal-navin merged commit a61a831 into microsoft:main Dec 21, 2023
agarwal-navin added a commit to agarwal-navin/FluidFramework that referenced this pull request Jan 4, 2024
…soft#18890)

## Bug
microsoft#18506 added deleted stats to the GarbageCollection_end logs. However,
the calculation for getting deleted data stores count is incorrect. It
calls `this.runtime.getNodeType(nodeId)` to get the node's type which
checks if the data store with the given nodeId exists. Since the data
store has been deleted, it ends up returning the type as other.

## Fix
In case of deleted nodes, the garbage collector gets the node type
itself by looking the path. This should be fine because it is only used
for stats and for the current node types (data store, DDS and blob),
this will work fine.
agarwal-navin added a commit that referenced this pull request Jan 4, 2024
…enable sweep (#19087)

This PR ports the following changes to internal-8.0 branch. internal-8.0
contains changes that will help enable sweep and these are follow ups
that did not make it to the branch.
- [Initialize GC state as soon as possible and update it on
(re)connecti… · 40ad2d4
(github.com)](40ad2d4)
- [GC: Fix GC deleted stats not correctly updated for data stores by
agarwal-navin · Pull Request #18890 · microsoft/FluidFramework
(github.com)](#18890)
- [GC: Added whether tombstone is enabled to GarbageCollection_end event
by agarwal-navin · Pull Request #18927 · microsoft/FluidFramework
(github.com)](https://github.com/microsoft/FluidFramework/pull/18927/files)
agarwal-navin added a commit to agarwal-navin/FluidFramework that referenced this pull request Jan 10, 2024
…soft#18890)

## Bug
microsoft#18506 added deleted stats to the GarbageCollection_end logs. However,
the calculation for getting deleted data stores count is incorrect. It
calls `this.runtime.getNodeType(nodeId)` to get the node's type which
checks if the data store with the given nodeId exists. Since the data
store has been deleted, it ends up returning the type as other.

## Fix
In case of deleted nodes, the garbage collector gets the node type
itself by looking the path. This should be fine because it is only used
for stats and for the current node types (data store, DDS and blob),
this will work fine.
agarwal-navin added a commit that referenced this pull request Jan 23, 2024
…enable sweep (#19204)

This PR ports the following changes to internal-7.4 branch. internal-7.4
contains changes that will help enable sweep and these are follow ups
that did not make it to the branch.
- [Initialize GC state as soon as possible and update it on
(re)connecti… · 40ad2d4
(github.com)](40ad2d4)
- [GC: Fix GC deleted stats not correctly updated for data stores by
agarwal-navin · Pull Request #18890 · microsoft/FluidFramework
(github.com)](#18890)
- [GC: Added whether tombstone is enabled to GarbageCollection_end event
by agarwal-navin · Pull Request #18927 · microsoft/FluidFramework
(github.com)](https://github.com/microsoft/FluidFramework/pull/18927/files)
markfields pushed a commit to markfields/FluidFramework that referenced this pull request Feb 12, 2024
…enable sweep (microsoft#19204)

This PR ports the following changes to internal-7.4 branch. internal-7.4
contains changes that will help enable sweep and these are follow ups
that did not make it to the branch.
- [Initialize GC state as soon as possible and update it on
(re)connecti… · microsoft/FluidFramework@40ad2d4
(github.com)](microsoft@40ad2d4)
- [GC: Fix GC deleted stats not correctly updated for data stores by
agarwal-navin · Pull Request microsoft#18890 · microsoft/FluidFramework
(github.com)](microsoft#18890)
- [GC: Added whether tombstone is enabled to GarbageCollection_end event
by agarwal-navin · Pull Request microsoft#18927 · microsoft/FluidFramework
(github.com)](https://github.com/microsoft/FluidFramework/pull/18927/files)
@agarwal-navin agarwal-navin deleted the fixGCDeleteStats branch July 15, 2024 22:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area: runtime Runtime related issues area: tests Tests to add, test infrastructure improvements, etc base: main PRs targeted against main branch

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants