authorize: refactor store locking #2151
Conversation
Code Climate has analyzed commit e0933b5 and detected 0 issues on this pull request. View more on Code Climate.
Codecov Report
```diff
@@            Coverage Diff            @@
##           master   #2151     +/-   ##
========================================
- Coverage    60.3%   60.2%    -0.2%
========================================
  Files         167     167
  Lines       11375   11385      +10
========================================
- Hits         6869    6859      -10
- Misses       3716    3735      +19
- Partials      790     791       +1
```
```go
func (syncer *dataBrokerSyncer) ClearRecords(ctx context.Context) {
	syncer.authorize.stateLock.Lock()
	syncer.authorize.store.ClearRecords()
	syncer.authorize.stateLock.Unlock()
}
```
```go
func (syncer *dataBrokerSyncer) UpdateRecords(ctx context.Context, serverVersion uint64, records []*databroker.Record) {
	syncer.authorize.stateLock.Lock()
	for _, record := range records {
		syncer.authorize.store.UpdateRecord(serverVersion, record)
	}
	syncer.authorize.stateLock.Unlock()
}
```
Why not expand the interface of `store` and add an `UpdateRecords()` method there? Then you don't seem to need this extra `stateLock` mutex, and you won't have to acquire the other mutex inside `dataBrokerData` for every `UpdateRecord()` call.
If we remove the state lock there's no guarantee we won't update records in the middle of an evaluation. You would then have an inconsistent view of the data: for example, you would get record version 1234 in the audit log, but would have actually used data from version 1235 in the evaluation.

The mutex around map access is necessary because maps are not thread-safe in Go. If one goroutine reads a map while another writes to it, the program will crash.

If having two locks is unacceptable, what should I do instead to preserve thread safety and consistency?
I missed that you also acquire `RLock` in `Check()`; I see why you need that extra mutex now.

My comment about acquiring the mutex in a loop

```go
for _, record := range records {
	syncer.authorize.store.UpdateRecord(serverVersion, record)
}
```

was about acquiring the mutex on each `UpdateRecord` call. You could probably change its interface to accept multiple records, and change the underlying `dataBrokerData` funcs to `setLocked` and `deleteLocked`, so that the mutex is acquired once outside the loop. But it's probably not too important in terms of optimization.
This violates separation of concerns. The reason I created this type was to avoid having to think about the lock from the store, since locks are easy to misuse. If you're really concerned about performance I can make the change, but I think it's negligible. FWIW, during sync the method is only called with a single record.
Was running some load against this branch and got a panic after some time:
Based on the line number, this is a log message and appears unrelated. I'll fix it though.
Haven't been able to reproduce the deadlock on this branch. 👍
LGTM unless there's other feedback from @wasaga.
Summary
It looks like the way we're doing locking with the OPA evaluator store is causing deadlocks. I misunderstood how `RWMutex` works, and it turns out it can't reliably be used recursively. I believe the deadlock happens like this:
1. `RLock`
2. `Lock`
3. `RLock`
Step (2) is blocked on (1) and step (3) is blocked on (2). But step (1) is blocked on (3), because the policy evaluation can't be completed until we get the record. This PR removes taking a read lock in the transaction, and instead only uses the lock for map access (which is very short and non-recursive).
The original purpose of the transaction lock was to prevent an update while we were evaluating a policy, so that the version numbers we get back from evaluation (server version / record version) reflect the data that was actually used during evaluation. To preserve this behavior, I introduced another lock in the authorize service itself. The syncer's updates and the evaluator will never run simultaneously, and since they don't directly depend on each other, we should have no issues with deadlocks.
Checklist
improvement / bug / etc