Skip to content
This repository has been archived by the owner on Aug 2, 2022. It is now read-only.

Fix a leak where FileWatchers are added but never removed #36

Merged

Conversation

jschmitz28
Copy link
Contributor

Description

Fix a leak where FileWatchers are added but never removed upon being expired from the cache, and some minor updates to privatize the cache by calling the public API in consumers

Prior to this change, FileWatchers to track when a KNN index is deleted are added to the ResourceWatcherService but never removed even after indexes have been deleted, which causes the set of watchers to grow unbounded over time. This change updates the logic to clean up FileWatchers upon cache eviction.

Jeff Schmitz added 4 commits February 4, 2020 11:53
…expired from the cache, and some minor updates to privatize the cache by calling the public API in consumers

Prior to this change, FileWatchers to track when a KNN index is deleted are added to the ResourceWatcherService but never
removed even after indexes have been deleted, which causes the set of watchers to grow unbounded over time. This change
updates the logic to clean up FileWatchers upon cache eviction.
Copy link
Member

@vamshin vamshin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good Catch. thanks for the PR.


private static class KNNIndexCacheEntry {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: java doc for class

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will add

private void onRemoval(RemovalNotification<String, KNNIndexCacheEntry> removalNotification) {
KNNIndexCacheEntry knnIndexCacheEntry = removalNotification.getValue();

knnIndexCacheEntry.getFileWatcherHandle().stop();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add try/finally block and move knnIndexCacheEntry.getFileWatcherHandle().stop(); to finally block to ensure the watcher is removed as file is deleted at this point?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you clarify what would be wrapped in the try{}? knnIndexCacheEntry.getFileWatcherHandle().stop() should not throw - it just removes the watcher from the set, and doesn't declare any kind of exceptions.

Copy link
Member

@vamshin vamshin Feb 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was referring to ensuring knnIndexCacheEntry.getFileWatcherHandle().stop() being called when cache invalidate happens. some thing like this

try {
        KNNIndexCacheEntry knnIndexCacheEntry = removalNotification.getValue();
               executor.execute(() -> knnIndexCacheEntry.getKnnIndex().close());
} finally {
       knnIndexCacheEntry.getFileWatcherHandle().stop();
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a perceived difference in behavior here? Both should be called regardless - in the case of the PR, FileWatcher.stop is called first, and should doesn't have an expected failure mode, and then we submit a task to gc the index.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree there is no failure mode at this point. Ideally we want to ensure both gc() and stop() always get called in this function. We can defer this until we use functions that throws.

// the entry
fileWatcher.init();

final KNNIndex knnIndex = KNNIndex.loadIndex(indexPathUrl, algoParams);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should first do resourceWatcherService.add and then load the index.

  • resourceWatcherService.add seems to already call init() on the filewatcher. So no explicit call to init()
  • this will also address the case of index not loaded if file is only present

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you clarify what you mean by the second case?

For the first case, the reason why I'm calling loadIndex in between FileWatcher.init and resourceWatcherService.add is because I'm still not totally confident about the guarantee of no search threads executing while the file is deleted.

By calling init() first, we know with certainty whether the file existed at the time we started loading the graph, and if it did not exist, we are guaranteed that loadIndex will fail. If we also start the monitor before loading the index, then in the case that we're wrong about the guarantee mentioned above, we're effectively creating a multi-second window (depending on index loading time) for which the watcher could notify about a file delete, and invalidate() would cause a no-op.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense. Thanks

Comment on lines +201 to +204
// TODO verify that this is safe - ideally we'd explicitly ensure that the FileWatcher is only checked
// after the guava cache has finished loading the key to avoid a race condition where the watcher
// causes us to invalidate an entry before the key has been fully loaded.
final WatcherHandle<FileWatcher> watcherHandle = resourceWatcherService.add(fileWatcher);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think guava cache takes the lock when loading the entry for the key. So when invalidate happens, it should happen after the graph is loaded?
https://github.com/google/guava/blob/master/guava/src/com/google/common/cache/Cache.java#L91

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See the following gist. Guava invalidate(key) does not block but is instead a no-op when executed if guava hasn't fully loaded the key into the cache: https://gist.github.com/jschmitz28/c8ff2d3b5856f625cf1f5a98095b7a51

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. You mean invalidating entry would not interrupt loading graph but no-op would make us miss gc() on the graph. Agree this minimizes the risk of race condition assuming the index deletion happens during search.

Copy link
Member

@vamshin vamshin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks for the changes

Copy link
Member

@jmazanec15 jmazanec15 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM thanks for the fix!

@vamshin vamshin merged commit 777db2a into opendistro-for-elasticsearch:master Feb 5, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants