
Guard max_latency_ with latencies_mutex_ #878

Merged
merged 3 commits into mlcommons:master on Mar 16, 2021

Conversation

@tjablin (Contributor) commented Mar 16, 2021

There was a race in the computation of the atomic max, and std::memory_order_release
is not a valid memory order for atomic loads. The code isn't performance critical, so
it is easier and safer to use a mutex instead.

@github-actions bot commented Mar 16, 2021

MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅

@@ -372,7 +372,7 @@ class AsyncLog {
   std::condition_variable all_latencies_recorded_;
   uint64_t latencies_first_sample_sequence_id_ = 0;
   std::vector<QuerySampleLatency> latencies_;
-  std::atomic<QuerySampleLatency> max_latency_{0};
+  QuerySampleLatency max_latency_;
Contributor:
Should we initialize this to 0?

Contributor (author):
Done.

@@ -421,7 +417,8 @@ PerfClock::time_point AsyncLog::GetMaxCompletionTime() {
 }

 QuerySampleLatency AsyncLog::GetMaxLatencySoFar() {
-  return max_latency_.load(std::memory_order_release);
+  std::unique_lock<std::mutex> lock(latencies_mutex_);
Contributor:
Just curious: why does this need to hold mutex?

Contributor (author):
If GetMaxLatencySoFar were called in one thread and RecordSampleCompletion were called concurrently in another thread, there would be a race. Looking at the surrounding code, I don't think that is possible, but if a variable is guarded by a mutex in one context, I think it ought to have an equivalent guard in all contexts to be safe. This function isn't performance critical, so I'm not worried about grabbing a lock here, particularly since I am pretty certain there will be no contention.

Contributor:
makes sense

@guschmue guschmue merged commit 63c7df5 into mlcommons:master Mar 16, 2021
@github-actions github-actions bot locked and limited conversation to collaborators Mar 16, 2021