Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Audit RumorStore locking #6674

Merged
merged 21 commits into from Aug 9, 2019

Conversation

@baumanj
Copy link
Contributor

commented Jun 20, 2019

Revamp the locking in RumorStore to prevent deadlocks.

There's quite a bit here; reading the individual commit messages should make motivations fairly clear. I'd also suggest viewing the diff with whitespace changes hidden. The major points:

  • RumorStore now uses habitat_common::sync::Lock, which is a facade for parking_lot that allows us to switch between RwLock and Mutex via a feature flag to help test for recursive locking.
  • Make RumorStore lock fields private to keep management of locking code tractable.
  • Add IterableGuard and ServiceGroupRumors so consumers can use for loops and iterator adapters safely on the data behind the RumorStore lock.
  • Annotate all the callers with rsr/rsw to indicate what code paths take RumorStore locks all the way up the stack.
  • Document our approach to locking and avoiding deadlock.
  • Various cleanups, removals and simplification.

See #6435


This change is Reviewable

@baumanj baumanj added the X-fix label Jun 20, 2019
@baumanj baumanj self-assigned this Jun 20, 2019
@baumanj baumanj force-pushed the audit-rumorstore-locking branch from bced13e to e2747b2 Jun 20, 2019
@baumanj baumanj referenced this pull request Jun 25, 2019
0 of 3 tasks complete
@baumanj baumanj force-pushed the audit-rumorstore-locking branch from e2747b2 to a5e4ab2 Jun 25, 2019
@baumanj baumanj force-pushed the audit-rumorstore-locking branch from e997c89 to b0a331f Jul 3, 2019
@baumanj baumanj force-pushed the audit-rumorstore-locking branch from f352197 to 84f63ed Jul 17, 2019
baumanj added 13 commits Jun 20, 2019
Slight differences in the API vs std::sync::RwLock require changes to
various callers.

Signed-off-by: Jon Bauman <5906042+baumanj@users.noreply.github.com>
Also, add a T: Default bound for Lock<T>

Signed-off-by: Jon Bauman <5906042+baumanj@users.noreply.github.com>
Also remove superfluous Deref impl, provide a get_rumor_cloned
interface to replace public access to the list field and add some
explanatory type aliases.

Signed-off-by: Jon Bauman <5906042+baumanj@users.noreply.github.com>
- Move RumorStore into storage module to hide locking details from
  submodules (but add `pub use` to not break references)

- Implement IterableGuard to provide encapsulated, but ergonomic access
  to RumorStore::list.

- Start converting direct accesses to RumorStore::list to annotated
  methods and propagating annotations to callers.

Signed-off-by: Jon Bauman <5906042+baumanj@users.noreply.github.com>
Also, do some minor refactoring and cleanup:

- implement Deref for IterableGuard so operations on the underlying map
  can take place inside iteration.

- Make RumorStore::list private again

- Remove get_rumor_cloned in favor of map_rumor_rsr: this lets the
  caller decide whether cloning is necessary.

- Remove RumorStore::{new, clear, len_for_key}, which were unused

- Update RumorStore::encode to use map_rumor_rsr and avoid clone and
  test coverage

Signed-off-by: Jon Bauman <5906042+baumanj@users.noreply.github.com>
RumorStore::with_keys is no longer necessary, since the addition of
Iterable guard allows more ergonomic for loops to be used instead.
In general:

    rs.with_keys(|(service_group, rumors)| { ... });

can be replaced with

	for (service_group, rumors) in rs.lock_rsr().iter() { ... }

Additionally, the flexibility of Iterator, allows for the use of all
the iterator adapters, so paradigms like:

    let mut v = vec![];
    for (service_group, rumors) in rs.lock_rsr().iter() { ... }
    v

can be replaced with more functional equivalents:

    rs.lock_rsr().iter().filter_map(|(service_group, rumors)| { ... })

This change just does the former, to minimize the potential for
changing behavior.

Signed-off-by: Jon Bauman <5906042+baumanj@users.noreply.github.com>
RumorStore::with_rumors is no longer necessary since it can be replaced
with a more ergonomic iterator-based approach:

    rumor_store.with_rumors(key, |s| { ... });

becomes

	for s in rumor_store.lock_rsr().service_group(key).rumors() { ... }

As with 13e85e8, most callers could be
converted to a more function approach with iterator adapters, but that
is left for future work.

Additionally, add the ServiceGroupRumors type to support iteration over
a nested HashMap behind a lock, make IterableGuard::read private, and
remove IterableGuard::values.

Signed-off-by: Jon Bauman <5906042+baumanj@users.noreply.github.com>
This simplifies the locking story by moving towards having as much as
possible of the locking occur when acquiring IterableGuard.

Also, add IterableGuard::get_term for (only for ElectionRumors).

Signed-off-by: Jon Bauman <5906042+baumanj@users.noreply.github.com>
It can be replaced (and with more flexibility) by
`ServiceGroupRumors::map_rumor`.

This is also a win to get rid of a public method that was only used in
test code, but couldn't be removed from release builds because some
of that test code was integration tests.

Signed-off-by: Jon Bauman <5906042+baumanj@users.noreply.github.com>
Specifically `ServiceGroupRumors` and `IterableGuard`. This makes the
callers have to go through `RumorStore`'s lock_rsr to make the locking
more explicit.

Signed-off-by: Jon Bauman <5906042+baumanj@users.noreply.github.com>
Signed-off-by: Jon Bauman <5906042+baumanj@users.noreply.github.com>
This is more consistent and should reduce confusion with
`IterableGuard::contains_rumor`.

Signed-off-by: Jon Bauman <5906042+baumanj@users.noreply.github.com>
This further centralizes the locking behing `RumorStore::lock_rsr`

Signed-off-by: Jon Bauman <5906042+baumanj@users.noreply.github.com>
@baumanj baumanj force-pushed the audit-rumorstore-locking branch from 4591c31 to 473aba4 Aug 1, 2019
baumanj added 2 commits Aug 1, 2019
- Remove unnecessary `Rumor` trait bounds and separate into impl blocks
  for methods which require the bound and those which don't.

 - Make increment_update_counter private (and move update_counter test)

 - Change type variable to R when bound to Rumor and T when unbound

 - Remove RumorStore<Service> impl block in favor of adding logic to
   server.rs

 - Replace FakeRumor with () in tests that don't need a Rumor bound

 - Remove create_rumor_store in favor of RumorStore::default

Signed-off-by: Jon Bauman <5906042+baumanj@users.noreply.github.com>
Since RumorStore locks must be taken before MemberList locks, change
the ml[rw]_rs[rw] suffixes to rs[rw]_ml[rw].

Signed-off-by: Jon Bauman <5906042+baumanj@users.noreply.github.com>
@baumanj baumanj force-pushed the audit-rumorstore-locking branch from 534c24c to 19b4313 Aug 2, 2019
Relatedly, simplify doc comments related to locking in favor of a
reference to the new locking.md.

Signed-off-by: Jon Bauman <5906042+baumanj@users.noreply.github.com>
@baumanj baumanj force-pushed the audit-rumorstore-locking branch from 19b4313 to bc69789 Aug 2, 2019
Signed-off-by: Jon Bauman <5906042+baumanj@users.noreply.github.com>
@baumanj baumanj marked this pull request as ready for review Aug 2, 2019
/// Storage for Rumors. It takes a rumor and stores it according to the member that produced it,
/// and the service group it is related to.
///
/// Generic over the type of rumor it stores.
#[derive(Debug, Clone)]
pub struct RumorStore<T: Rumor> {
pub list: Arc<Lock<HashMap<String, HashMap<String, T>>>>,
list: Arc<Lock<HashMap<ServiceGroupName, HashMap<MemberId, T>>>>,

This comment has been minimized.

Copy link
@raskchanky

raskchanky Aug 5, 2019

Member

I don't think this is accurate. I can see by checking the final diff that this particular line of code didn't stay this way, but technically the RumorStore models things as HashMap<Rumor::key(), HashMap<Rumor::id(), Rumor>>. For the Service rumor, e.g. this formulation is correct because Rumor::key() returns the service group of the service and Rumor::id() returns the member ID.

For Election rumors, however, Rumor::id() returns the hardcoded string election, ServiceConfig returns the hardcoded string service_config. For Departure rumors, Rumor::key() returns the hardcoded string departure, not the service group, and Rumor::id() returns the member ID. There's quite a lot of variability.

As I said, I realize this code changed from what it is here, on this particular commit, to the end state, so I'll re-evaluate as I go.

This comment has been minimized.

Copy link
@baumanj

baumanj Aug 5, 2019

Author Contributor

I think you're right about the naming. What would be preferable is not having every RumorStore be a two-level map that takes two keys (with inconsistent naming), and instead define it like:

pub struct RumorStore<S> {
    pub list:       Arc<RwLock<HashMap<String, S>>>,
    update_counter: Arc<AtomicUsize>,
}

Then, according to the needs of the particular rumor, S can be the rumor, or else another HashMap. I've hacked around with this a bit, and it's definitely possible, but I'm not sure if it's worth the effort involved.

Assuming we don't go that route, would you suggest naming these aliases differently, or getting rid of them altogether?

This comment has been minimized.

Copy link
@raskchanky

raskchanky Aug 5, 2019

Member

I like your formulation with the single level map, but I agree that the effort involved there would be non-trivial.

As far as the naming goes, my gut reaction is to name it something like:

HashMap<RumorKey, HashMap<RumorId, T>>

only because those names map naturally to what's actually inserted into the HashMap.

{
rumor.clone().incarnation + 1
rumor.incarnation + 1

This comment has been minimized.

Copy link
@raskchanky

raskchanky Aug 5, 2019

Member

This is a really nice simplification.

for rumor in member.values() {
total += self.write_rumor(writer, rumor)?;
}
for rumor in store.lock_rsr().rumors() {

This comment has been minimized.

Copy link
@raskchanky

raskchanky Aug 5, 2019

Member

It wasn't immediately obvious when reading the definition for rumors() above, but seeing this in action makes me realize how much more ergonomic this is. This is a great improvement.

self.map_rumor_rsr(key, member_id, T::write_to_bytes)
.unwrap_or_else(|| {
Err(Error::NonExistentRumor(String::from(member_id), String::from(key)))
})

This comment has been minimized.

Copy link
@raskchanky

raskchanky Aug 5, 2019

Member

This is a great simplification that I can take advantage of in my rumor work as well (I implemented something very similar).

type ServiceGroupName = String;
type MemberId = String;
type ServiceGroupRumorMap<T> = HashMap<MemberId, T>;
type RumorMap<T> = HashMap<ServiceGroupName, ServiceGroupRumorMap<T>>;

This comment has been minimized.

Copy link
@raskchanky

raskchanky Aug 6, 2019

Member

As I said in a comment on a previous commit, I think these names are misleading. Something like the following would be more accurate:

type RumorKey = String;
type RumorId = String;
type RumorKeyRumorMap<T> = HashMap<RumorId, T>;
type RumorMap<T> = HashMap<RumorKey, RumorKeyRumorMap>;

I'm not stuck on those particular names, per se, only something that illustrates that the top level key is the result of calling Rumor::key(), the actual data of which can vary from rumor to rumor, and the second level key inside the nested map is the result of calling Rumor::id() (again, the actual data of which can vary).

This comment has been minimized.

Copy link
@baumanj

baumanj Aug 6, 2019

Author Contributor

Agreed. I'm not wedded to this naming; I just wanted some simplification and indication of intent over Arc<RwLock<HashMap<String, HashMap<String, T>>>>

This comment has been minimized.

Copy link
@baumanj

baumanj Aug 6, 2019

Author Contributor

We can't use RumorKey since that would conflict with the existing type (and putting the alias in a submodule would just be confusing), so I used the best names I could think of and integrated them into the RumorKey type as well. Let me know how you feel about 0aa2082.

This comment has been minimized.

Copy link
@raskchanky

raskchanky Aug 6, 2019

Member

Of course you're right about RumorKey. I have no idea how I missed that. I think your change looks great.

This comment has been minimized.

Copy link
@baumanj

baumanj Aug 6, 2019

Author Contributor

Actually one more thing:

It was a mistake to make RumorStore::key() return RumorKeyKey. But I realize I mad that mistake because RumorStore::key() is a misleading name for this method. I've switched it to use to_string in 038a472.

Copy link
Member

left a comment

On the whole, I think this looks great. I left one comment around the naming of some rumor type aliases that I would like to see addressed, but once that's done, I'm happy to approve.

@davidMcneil

This comment has been minimized.

Copy link
Contributor

commented Aug 6, 2019

I am not familiar enough with the code to give this official approval, but everything looked good to me. I like the idea of design documents. The locking one was very helpful.

Per PR feedback, make the names more general to reflect the varied
semantics across different rumors types. Additionally, use the type
aliases in RumorKey.

Signed-off-by: Jon Bauman <5906042+baumanj@users.noreply.github.com>
Copy link
Member

left a comment

Nicely done!

@raskchanky

This comment has been minimized.

Copy link
Member

commented Aug 6, 2019

The former is terribly misleading, since it's different from the `key`
field of the struct. Per guidelines, we implement `Display` in order to
get `to_string` rather than implementing the ToString trait directly.

See http://doc.rust-lang.org/std/string/trait.ToString.html

Signed-off-by: Jon Bauman <5906042+baumanj@users.noreply.github.com>
Copy link
Contributor

left a comment

Reviewed 24 of 27 files at r1, 1 of 2 files at r2, 3 of 3 files at r3.
Reviewable status: all files reviewed, 8 unresolved discussions (waiting on @baumanj and @raskchanky)


components/butterfly/src/rumor.rs, line 135 at r3 (raw file):

/// To keep the details of the locking from being directly accessible to all the code in the
/// rumor submodule.

Nice use of modules to hide stuff 👍


components/butterfly/src/rumor.rs, line 148 at r3 (raw file):

    /// Provides access to the rumors for a particular service group bounded by the
    /// lifetime of the service group key.
    pub struct ServiceGroupRumors<'sg_key, R>(Option<&'sg_key RumorSubMap<R>>);

I like the descriptive lifetime names 👍


components/butterfly/tests/encryption/mod.rs, line 17 at r3 (raw file):

    assert!(net[1].service_store
                  .lock_rsr()
                  .service_group("beast.prod",)

The trailing comma can be removed.


components/butterfly/tests/rumor/departure.rs, line 13 at r3 (raw file):

    assert!(net[1].departure_store
                  .lock_rsr()
                  .service_group("departure",)

The service group's name is "departure"?


components/butterfly/tests/rumor/departure.rs, line 32 at r3 (raw file):

    assert!(net[2].departure_store
                  .lock_rsr()
                  .service_group("departure",)

Same


components/sup/src/manager.rs, line 1195 at r3 (raw file):

                              .service_group(&service.service_group)
                              .map_rumor(&self.sys.member_id, |rumor| rumor.incarnation + 1)
                              .unwrap_or(1);

Nice way to eliminate the clone call 😄


docs/dev/design_documents/locking.md, line 12 at r3 (raw file):

comment. Additionally, a suffix is applied to the function name to indicate
the locks that are acquired either directly or indirectly. The suffix is an
abbreviation of the lock name (e.g., `ml` for `MemberList::entries`) and an

Is there a place where all the currently-used lock name abbreviations are listed?

- Add Const{Key,Id}Rumor traits to replace the use of string literals

- Implement IterableGuard::contains_id for ConstKeyRumor to eliminate the
  nonsensical `service_group()` calls

- Replace identical RumorStoreProxy specializations for Election,
  ElectionUpdate and ServiceConfig with a ConstIdRumor-bound one

- Remove superfluous trailing comma

- Specifically document lock name suffixes

Signed-off-by: Jon Bauman <5906042+baumanj@users.noreply.github.com>
@baumanj

This comment has been minimized.

Copy link
Contributor Author

commented Aug 8, 2019

I think aa4faaf should address all your feedback, @christophermaier.

Copy link
Contributor

left a comment

Tiny change, but otherwise 💯 👍

impl<'a, C: ConstKeyRumor> IterableGuard<'a, RumorMap<C>> {
pub fn contains_id(&self, member_id: &str) -> bool {
self.get(C::const_key())
.map(|departures| departures.contains_key(member_id))

This comment has been minimized.

Copy link
@christophermaier

christophermaier Aug 8, 2019

Contributor

Should this be called rumors, rather than departures, since this isn't specifically about Departure rumors?

This comment has been minimized.

Copy link
@baumanj

baumanj Aug 8, 2019

Author Contributor

Good catch! I initially specialized it for Departures, but now that it's using the ConstKeyRumor bound, that makes more sense. Fixed in 55254f0.

This comment has been minimized.

Copy link
@christophermaier

christophermaier Aug 8, 2019

Contributor

I figured that's what happened 😄

Signed-off-by: Jon Bauman <5906042+baumanj@users.noreply.github.com>
@baumanj baumanj merged commit a972b61 into master Aug 9, 2019
5 checks passed
5 checks passed
DCO This commit has a DCO Signed-off-by
Details
buildkite/habitat-sh-habitat-master-verify Build #2991 passed (33 minutes, 10 seconds)
Details
buildkite/habitat-sh-habitat-master-website Build #120 passed (38 seconds)
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
expeditor/config-validation Validated your Expeditor config file
Details
@baumanj baumanj deleted the audit-rumorstore-locking branch Aug 9, 2019
@baumanj baumanj referenced this pull request Aug 9, 2019
10 of 10 tasks complete
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.