Navigation Menu

Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Purge "rumor heat" information for departed members #5744

Merged
merged 2 commits into from Oct 16, 2018

Conversation

christophermaier
Copy link
Contributor

This should allow us to do a degree of cleanup of old rumor
information. Absent this, large and long-running Habitat networks can
accumulate significant amounts of wasted memory on what is effectively
useless historical information.

The RumorHeat data structure effectively maps a rumor to a mapping
of members to the number of times the member has seen the rumor. This
allows us to know who's seen a rumor already, and who hasn't.

Now, when a member departs the ring, we'll remove any of the
"member-to-number-of-times-they've-seen-the-rumor" mappings for that
member.

Additionally, we'll remove all such mappings for any rumors about
services hosted on that member. After all, the member is going away,
so all of its services will too.

We aren't removing the membership rumors, though, because that's how
we pass the word that the member is departed.

Future work may add more cleanups and rearrangements.

Signed-off-by: Christopher Maier cmaier@chef.io

This should allow us to do a degree of cleanup of old rumor
information. Absent this, large and long-running Habitat networks can
accumulate significant amounts of wasted memory on what is effectively
useless historical information.

The `RumorHeat` data structure effectively maps a rumor to a mapping
of members to the number of times the member has seen the rumor. This
allows us to know who's seen a rumor already, and who hasn't.

Now, when a member departs the ring, we'll remove any of the
"member-to-number-of-times-they've-seen-the-rumor" mappings for that
member.

Additionally, we'll remove _all_ such mappings for any rumors about
_services_ hosted on that member. After all, the member is going away,
so all of its services will too.

We aren't removing the membership rumors, though, because that's how
we pass the word that the member is departed.

Future work may add more cleanups and rearrangements.

Signed-off-by: Christopher Maier <cmaier@chef.io>
@thesentinels
Copy link
Contributor

Thanks for the pull request! Here is what will happen next:

  1. Your PR will be reviewed by the maintainers
  2. If everything looks good, one of them will approve it, and your PR will be merged.

Thank you for contributing!

@christophermaier
Copy link
Contributor Author

I added the "do not merge" label while I run this through some further local testing. Pushing it out for early feedback, though.

Copy link
Contributor

@adamhjk adamhjk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens when a departed member returns? Instinctually, the system exepcts to find a key in the rumor_heat for everyone? Or does it get recreated?

Wroth adding a test for, at least.

@christophermaier
Copy link
Contributor Author

@adamhjk The rumor heat structure would get repopulated again for that member. In that regard, it'd basically behave the same as if a totally new Supervisor joined up.

Copy link
Contributor

@baumanj baumanj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks generally good to me. Make some suggestions, but nothing I feel too strongly about.

/// memory this consumes. If that member should ever come back
/// again, all rumors would be considered "hot" for them, so they
/// will get a bit more network traffic initially.
pub fn purge(&self, id: &str) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some debug!s info about how many things were removed might be nice

@@ -57,6 +57,8 @@ impl Expire {
.members_expired_to_departed(self.timing.departure_timeout_duration());

for id in newly_departed_members {
// Purge "heat" information for a member that's gone
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Between the good naming of newly_departed_members and rumor_heat.purge(&id), I honestly think this comment is superfluous.

trace_incarnation,
trace_health
);

// Purge "heat" information for a member that's gone
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, I'm not sure this comment says anything the code doesn't convey quite clearly (ditto to the several other times this same comment is used).

What is confusing to me though is why we still start_hot_rumor even if we're purging. I'm not saying this is wrong (still trying to grok this code), but it seems strange enough that a comment about that might be warranted.

trace_incarnation,
trace_health
);

// Purge "heat" information for a member that's gone
if member_id != self.member_id() && health == Health::Departed {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is almost identical to the code in insert_member. I wonder if it's possible to factor out the commonality as this sort of duplication seems prone to bugs. I don't just mean for the added code, but for the two functions more generally.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point... I'm not clear on if there's actually a meaningful distinction between these functions; the names don't give it away, at least. This would probably be good to tackle in a follow-on PR.

@@ -744,6 +757,9 @@ impl Server {
{
// TODO (CM): Why are we inferring departure from
// a service rumor?

// Purge "heat" information for a member that's gone
self.rumor_heat.purge(&service_rumor.member_id);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this and the duplication on L804 could be avoided by putting this logic inside insert_health_by_id.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd initially wanted to do that, too, but the MemberList knows nothing of the RumorHeat data structure 😦

Copy link
Contributor

@raskchanky raskchanky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still coming to grips with this part of the codebase, so take my approval with the appropriately sized grain of salt, but it all looks good to me.

// all types of rumors.
for heat in heat_map.values_mut() {
heat.remove(id);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps either quantifying or summarizing the bytes purged might be of some use here?

+use std::mem;
...
         for heat in heat_map.values_mut() {
-            heat.remove(id);
+            if let Some(r) = heat.remove(id) {
+                debug!("Purging rumor heat of size {} bytes", mem::size_of_val(&r));
+            } else {
+                debug!("Member {} unexpectedly had no rumor heat to purge.", id);
+            }
         }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

size_of_val doesn't always give you what you think you want, since it's the size of the value, which may not be the same as the "total size" of the value, including any heap allocations. For example, you'll always get 24 if your value is a String, since a String is a 3-byte "fat pointer", regardless of how many characters are in the actual character string pointed to on the heap.

However, in this case, it's always going to be 64 bytes; 56 for the member ID (24 for the "fat pointer" and 32 for the ID string itself) and 8 for the usize counter.

Signed-off-by: Christopher Maier <cmaier@chef.io>
@christophermaier christophermaier merged commit aac5876 into master Oct 16, 2018
@christophermaier christophermaier deleted the cm/prune-rumors branch October 16, 2018 12:51
chef-ci added a commit that referenced this pull request Oct 16, 2018
Obvious fix; these changes are the result of automation not creative thinking.
@christophermaier christophermaier added Type:BugFixes PRs that fix an existing bug and removed X-fix labels Jul 24, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type:BugFixes PRs that fix an existing bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants