Fix a TSAN-reported bug caused by concurrent access to std::deque #9686
This pull request was exported from Phabricator. Differential Revision: D34780309
db/db_impl/db_impl_write.cc
    alive_log_files_.back().AddSize(log_entry.size());
  } else {
    assert(!two_write_queues_);
    InstrumentedMutexLock wl(&mutex_);
This seems likely to be expensive on a hot path (every WAL write). Can we save alive_log_files_.back() to another field each time we push_back()? You could probably even create a custom wrapper class that handles that, though that's arguably more complex than this direct approach.
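A minimal sketch of the wrapper idea, assuming a pointer to the last element stays valid across push_back() and pop_front() on a std::deque. `TailCachingDeque`, `EmplaceBack`, `PopFront`, and `tail` are hypothetical names for illustration, not RocksDB code.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <utility>

// Hypothetical wrapper (not part of RocksDB): remembers the address of the
// last element on every append so hot-path callers need not call back().
template <typename T>
class TailCachingDeque {
 public:
  template <typename... Args>
  T& EmplaceBack(Args&&... args) {
    items_.emplace_back(std::forward<Args>(args)...);
    tail_ = &items_.back();  // refresh the cached address after each append
    return *tail_;
  }

  // Mirrors the invariant described in the PR: the last element is never
  // popped, so the cached pointer never dangles.
  void PopFront() {
    assert(items_.size() > 1);
    items_.pop_front();
  }

  // Hot-path accessor: reads only the cached pointer, not the deque itself.
  T* tail() const { return tail_; }

  std::size_t size() const { return items_.size(); }

 private:
  std::deque<T> items_;
  T* tail_ = nullptr;
};
```

The cached pointer is itself shared state, so appends that update it would still need whatever synchronization the caller already uses when pushing a new element.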
The idea of caching sounds good, and it is worth implementing for the non-two-write-queues case. For two-write-queues, alive_log_files_ is already protected by log_mutex_.
If we cache alive_log_files_.back() by value, then we still need to sync the cached copy with alive_log_files_.back() at a later point when we need to acquire the mutex. We cannot directly cache &alive_log_files_.back() by address because it can be copied when deque grows and shrinks.
If we do caching, then I would vote for changing the type of alive_log_files_ from std::deque<LogFileNumberSize> to std::deque<std::unique_ptr<LogFileNumberSize>>, caching the last element via a member of type LogFileNumberSize* alive_log_files_tail_copy_.
> We cannot directly cache &alive_log_files_.back() by address because it can be copied when deque grows and shrinks.
Yes we can. A primary feature of deque is that you can save and access through the reference (pointer).
https://en.cppreference.com/w/cpp/container/deque: In addition, insertion and deletion at either end of a deque never invalidates pointers or references to the rest of the elements.
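The guarantee quoted above can be checked with a small sketch. The function name and the values are made up for illustration; only std::deque behavior is exercised.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

// Returns true if a pointer to the last element, taken before a burst of
// push_back() and pop_front() calls, still refers to the same live element.
bool TailPointerSurvivesEndOperations() {
  std::deque<std::uint64_t> sizes = {10, 20};
  std::uint64_t* cached = &sizes.back();  // points at the element holding 20

  // Grow at the back: iterators are invalidated, but pointers and references
  // to existing elements are not.
  for (std::uint64_t i = 0; i < 10000; ++i) {
    sizes.push_back(i);
  }
  // Shrink at the front: only references to the erased element are
  // invalidated.
  sizes.pop_front();

  *cached += 5;  // write through the cached pointer

  // The element that used to be last is now first and holds 25.
  return cached == &sizes.front() && sizes.front() == 25;
}
```

Note that this covers pointers and references only; a cached iterator would not survive the push_back() calls.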
Great, then we can avoid calling back() here.
803da10 to fbb19b8
fbb19b8 to e0c1614
Generally LGTM
// log_mutex_. If we have acquired at least one of the two mutexes, then we
// can check that alive_log_files_tail_ != alive_log_files_.rend()
if (with_db_mutex || with_log_mutex) {
  assert(alive_log_files_tail_ == alive_log_files_.rbegin());
The comments mention rend(). Is that what you meant to check here (to make sure modifying *alive_log_files_tail_ is safe)?
Would it make sense to check alive_log_files_tail_ == alive_log_files_.rbegin() without an if but with #if not TSAN?
SG. Will update to something like
#if TSAN
if (with_db_mutex || with_log_mutex) {
#endif
assert(alive_log_files_tail_ is valid);
#if TSAN
}
#endif
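A compilable sketch of that pattern, under stated assumptions: `SKETCH_TSAN`, `WalStateSketch`, and `CheckTail` are hypothetical names, the TSAN detection uses the compiler-provided `__SANITIZE_THREAD__` (GCC) and `__has_feature(thread_sanitizer)` (Clang) rather than any RocksDB macro, and a raw pointer stands in for alive_log_files_tail_.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

// Assumption: detect ThreadSanitizer via the compiler. GCC defines
// __SANITIZE_THREAD__; Clang exposes __has_feature(thread_sanitizer).
#if defined(__SANITIZE_THREAD__)
#define SKETCH_TSAN 1
#elif defined(__has_feature)
#if __has_feature(thread_sanitizer)
#define SKETCH_TSAN 1
#endif
#endif
#ifndef SKETCH_TSAN
#define SKETCH_TSAN 0
#endif

struct WalStateSketch {
  std::deque<std::uint64_t> alive_log_sizes;  // stand-in for alive_log_files_
  std::uint64_t* tail = nullptr;  // stand-in for alive_log_files_tail_
};

// Returns true if the validity check was executed on this call.
bool CheckTail(const WalStateSketch& s, bool with_db_mutex,
               bool with_log_mutex) {
  bool checked = false;
#if SKETCH_TSAN
  // Under TSAN, touching the deque without a mutex would be reported, so the
  // check only runs when the caller holds one of the two mutexes.
  if (with_db_mutex || with_log_mutex) {
#endif
    assert(!s.alive_log_sizes.empty() && s.tail == &s.alive_log_sizes.back());
    checked = true;
#if SKETCH_TSAN
  }
#endif
  (void)s;
  (void)with_db_mutex;
  (void)with_log_mutex;
  return checked;
}
```

In a non-TSAN build the assertion runs unconditionally; in a TSAN build it is skipped when neither mutex is held.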
e0c1614 to c5122ea
…acebook#9686)

Summary:
Pull Request resolved: facebook#9686

According to https://www.cplusplus.com/reference/deque/deque/back/,
"
The container is accessed (neither the const nor the non-const versions modify the container).
The last element is potentially accessed or modified by the caller. Concurrently accessing or modifying other elements is safe.
"

Also according to https://www.cplusplus.com/reference/deque/deque/pop_front/,
"
The container is modified.
The first element is modified. Concurrently accessing or modifying other elements is safe (although see iterator validity above).
"

In RocksDB, we never pop the last element of `DBImpl::alive_log_files_`. We have been exploiting this fact and the above two properties to ensure correctness when `DBImpl::alive_log_files_` may be accessed concurrently. Specifically, it can be accessed in the write path when the db mutex is released; sometimes the log_mutex_ is held. It can also be accessed in `FindObsoleteFiles()`, where the db mutex is always held, and during recovery, when the db mutex is also held.

Given the fact that we never pop the last element of alive_log_files_, we currently do not acquire additional locks when accessing it in `WriteToWAL()` as follows:
```
alive_log_files_.back().AddSize(log_entry.size());
```

This is problematic.

Check the source code of deque.h:
```
back() _GLIBCXX_NOEXCEPT {
  __glibcxx_requires_nonempty();
  ...
}

pop_front() _GLIBCXX_NOEXCEPT {
  ...
  if (this->_M_impl._M_start._M_cur != this->_M_impl._M_start._M_last - 1) {
    ...
    ++this->_M_impl._M_start._M_cur;
  }
  ...
}
```

`back()` will actually call `__glibcxx_requires_nonempty()` first. If `__glibcxx_requires_nonempty()` is enabled and not an empty macro, it will call `empty()`:
```
bool empty() {
  return this->_M_impl._M_finish == this->_M_impl._M_start;
}
```
You can see that it will access `this->_M_impl._M_start`, racing with `pop_front()`. Therefore, TSAN will actually catch the bug in this case.

To be able to use TSAN on our library and unit tests, we should always coordinate concurrent accesses to STL containers properly.

We need to pass information about db mutex and log mutex into `WriteToWAL()`; otherwise it's impossible to know which mutex to acquire inside the function.

To fix this, we can cache the tail of `alive_log_files_` by reference, so that we do not have to call `back()` in `WriteToWAL()`.

Reviewed By: pdillinger

Differential Revision: D34780309

fbshipit-source-id: 47db7b044a19d3f1f40ece1a974e30e52a777c0b
c5122ea to f74de1f
Summary:
According to https://www.cplusplus.com/reference/deque/deque/back/,
"
The container is accessed (neither the const nor the non-const versions modify the container).
The last element is potentially accessed or modified by the caller. Concurrently accessing or modifying other elements is safe.
"
Also according to https://www.cplusplus.com/reference/deque/deque/pop_front/,
"
The container is modified.
The first element is modified. Concurrently accessing or modifying other elements is safe (although see iterator validity above).
"
In RocksDB, we never pop the last element of DBImpl::alive_log_files_. We have been exploiting this fact and the above two properties to ensure correctness when DBImpl::alive_log_files_ may be accessed concurrently. Specifically, it can be accessed in the write path when the db mutex is released; sometimes the log_mutex_ is held. It can also be accessed in FindObsoleteFiles(), where the db mutex is always held, and during recovery, when the db mutex is also held.
Given the fact that we never pop the last element of alive_log_files_, we currently do not acquire additional locks when accessing it in WriteToWAL() as follows:
    alive_log_files_.back().AddSize(log_entry.size());
This is problematic.
Check the source code of deque.h: back() will actually call __glibcxx_requires_nonempty() first. If __glibcxx_requires_nonempty() is enabled and not an empty macro, it will call empty(). You can see that it will access this->_M_impl._M_start, racing with pop_front(). Therefore, TSAN will actually catch the bug in this case.
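To make the overlap concrete, here is a toy model, not libstdc++ code: a fixed-capacity queue that counts reads and writes of its start index. `ToyDeque` and all of its members are hypothetical; the `checked` flag stands in for a build where __glibcxx_requires_nonempty() is active.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Toy model only: counts how often `start_` is read or written, to show
// where a checked back() and pop_front() touch the same member.
class ToyDeque {
 public:
  bool Empty() const {
    ++start_reads_;  // empty() compares finish against start
    return finish_ == start_;
  }

  // With `checked` set, Back() first runs the nonempty check via Empty().
  int& Back(bool checked) {
    if (checked && Empty()) std::abort();
    return data_[finish_ - 1];
  }

  void PushBack(int v) {
    assert(finish_ < kCapacity);  // toy model: no growth
    data_[finish_++] = v;
  }

  void PopFront() {
    ++start_writes_;  // pop_front() advances start
    ++start_;
  }

  int start_reads() const { return start_reads_; }
  int start_writes() const { return start_writes_; }

 private:
  static constexpr std::size_t kCapacity = 16;
  int data_[kCapacity] = {};
  std::size_t start_ = 0;
  std::size_t finish_ = 0;
  mutable int start_reads_ = 0;
  int start_writes_ = 0;
};
```

An unchecked Back() never reads the start index, while a checked Back() reads the same member that PopFront() writes; run from two threads without a common lock, that read/write pair is the kind of access TSAN reports.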
To be able to use TSAN on our library and unit tests, we should always coordinate concurrent accesses to STL containers properly.
We need to pass information about db mutex and log mutex into WriteToWAL(); otherwise it's impossible to know which mutex to acquire inside the function.
Differential Revision: D34780309