Use a dedicated thread pool for searchable snapshot cache prewarming #59313
Merged
tlrx
merged 3 commits into
elastic:master
from
tlrx:use-dedicated-thread-pool-for-prewarming
Jul 15, 2020
tlrx
added
:Distributed/Snapshot/Restore
Anything directly related to the `_snapshot/*` APIs
v7.8.2
v7.9.0
v8.0.0
labels
Jul 9, 2020
Pinging @elastic/es-distributed (:Distributed/Snapshot/Restore)
tlrx
added
>enhancement
and removed
Team:Distributed
Meta label for distributed team
labels
Jul 9, 2020
DaveCTurner
approved these changes
Jul 14, 2020
LGTM
Thanks David!
tlrx
added a commit
to tlrx/elasticsearch
that referenced
this pull request
Jul 15, 2020
…lastic#59313) Since elastic#58728, write operations on searchable snapshot directory cache files are executed asynchronously using a dedicated thread pool. The thread pool used is searchable_snapshots, which was originally created to execute prewarming tasks. Reusing the same thread pool wasn't a good idea, as it can lead to deadlock situations. One of these situations arose in a test failure where the thread pool was full of prewarming tasks, all waiting for a cache file to be accessible, while the cache file was being evicted by the cache service. Such an eviction can only be processed once all read/write operations on the cache file have completed. In this case the deadlock occurred because the cache file was actively being read by a concurrent search, which had also won the privilege to write the range of bytes to the cache; that write could never complete because the prewarming tasks were making no progress and filling up the thread pool. This commit renames the searchable_snapshots thread pool to searchable_snapshots_cache_fetch_async. Assertions are added to check that cache writes are executed using this thread pool, and that reads on cached index inputs are executed using a different thread pool, to avoid potential deadlock situations. This commit also adds a searchable_snapshots_cache_prewarming thread pool that is used to execute prewarming tasks. It also converts the existing cache prewarming test into a more complete integration test that creates multiple searchable snapshot indices concurrently with randomized thread pool sizes, and verifies that all files have been correctly prewarmed.
This was referenced Jul 15, 2020
tlrx
added a commit
that referenced
this pull request
Jul 15, 2020
…59313) (#59590)
tlrx
added a commit
that referenced
this pull request
Jul 15, 2020
…59313) (#59595)
malpani
pushed a commit
to malpani/elasticsearch
that referenced
this pull request
Jul 17, 2020
…lastic#59313)
henningandersen
added a commit
to henningandersen/elasticsearch
that referenced
this pull request
Oct 20, 2020
The number of prewarming workers to spawn was taken from the wrong thread pool info, fixed to size number of workers after the prewarming thread pool. Relates elastic#59313
henningandersen
added a commit
that referenced
this pull request
Oct 22, 2020
The number of prewarming workers to spawn was taken from the wrong thread pool info, fixed to size number of workers after the prewarming thread pool. Relates #59313
Labels
:Distributed/Snapshot/Restore
Anything directly related to the `_snapshot/*` APIs
>enhancement
v7.8.2
v7.9.0
v8.0.0-alpha1
Since #58728, write operations on searchable snapshot directory cache files are executed asynchronously using a dedicated thread pool. The thread pool used is `searchable_snapshots`, which was originally created to execute prewarming tasks.

Reusing the same thread pool wasn't a good idea, as it can lead to deadlock situations. One of these situations arose in a test failure where the thread pool was full of prewarming tasks, all waiting for a cache file to be accessible, while the cache file was being evicted by the cache service. Such an eviction can only be processed once all read/write operations on the cache file have completed. In this case the deadlock occurred because the cache file was actively being read by a concurrent search, which had also won the privilege to write the range of bytes to the cache; that write could never complete because the prewarming tasks were making no progress and filling up the thread pool.
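The starvation pattern described above can be reproduced in miniature with plain `java.util.concurrent` executors. The sketch below is only an illustration under simplifying assumptions: it does not use Elasticsearch's `ThreadPool` or cache classes, it omits the eviction step, and the class name, method name, and pool names (`prewarming`, `cache_fetch_async`) are hypothetical. It fills a fixed-size pool with tasks that wait for a byte range to be written, then submits the write either to that same pool or to a separate one.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class PoolStarvationSketch {

    // Creates daemon threads whose names start with the pool name,
    // so a task can check which pool it is running on.
    static ThreadFactory named(String poolName) {
        AtomicInteger counter = new AtomicInteger();
        return runnable -> {
            Thread thread = new Thread(runnable, poolName + "-" + counter.incrementAndGet());
            thread.setDaemon(true);
            return thread;
        };
    }

    // Returns true if every prewarming task finishes before the timeout.
    static boolean prewarmingCompletes(int poolSize, boolean dedicatedPools) throws InterruptedException {
        ExecutorService prewarmPool = Executors.newFixedThreadPool(poolSize, named("prewarming"));
        // With dedicatedPools == false the write shares the prewarming pool (the problematic setup).
        ExecutorService fetchPool = dedicatedPools
            ? Executors.newFixedThreadPool(poolSize, named("cache_fetch_async"))
            : prewarmPool;

        CountDownLatch prewarmersRunning = new CountDownLatch(poolSize);
        CountDownLatch rangeWritten = new CountDownLatch(1);
        CountDownLatch prewarmersDone = new CountDownLatch(poolSize);

        // Occupy every prewarming thread with a task that waits for the range to be written.
        for (int i = 0; i < poolSize; i++) {
            prewarmPool.execute(() -> {
                prewarmersRunning.countDown();
                try {
                    rangeWritten.await();
                    prewarmersDone.countDown();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        prewarmersRunning.await();

        // The write that would unblock the waiters. On a shared pool it sits in the queue forever.
        fetchPool.execute(() -> {
            String threadName = Thread.currentThread().getName();
            if (dedicatedPools && !threadName.startsWith("cache_fetch_async")) {
                throw new IllegalStateException("cache write ran on unexpected thread " + threadName);
            }
            rangeWritten.countDown();
        });

        boolean completed = prewarmersDone.await(500, TimeUnit.MILLISECONDS);
        prewarmPool.shutdownNow(); // interrupts any task that is still waiting
        fetchPool.shutdownNow();
        return completed;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("shared pool completes: " + prewarmingCompletes(2, false));
        System.out.println("dedicated pools complete: " + prewarmingCompletes(2, true));
    }
}
```

With a shared pool the write never gets a thread, so the waiting tasks time out; with a second pool the write runs immediately and the waiters finish. The thread-name check inside the write task is a rough stand-in for the kind of assertion the pull request describes, not a copy of it.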
This pull request renames the `searchable_snapshots` thread pool to `searchable_snapshots_cache_fetch_async`. Assertions are added to check that cache writes are executed using this thread pool, and that reads on cached index inputs are executed using a different thread pool, to avoid potential deadlock situations.

This pull request also adds a `searchable_snapshots_cache_prewarming` thread pool that is used to execute prewarming tasks. It also converts the existing cache prewarming test into a more complete integration test that creates multiple searchable snapshot indices concurrently with randomized thread pool sizes, and verifies that all files have been correctly prewarmed.

After this pull request, other improvements could be made: