New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Purge "rumor heat" information for departed members #5744
Conversation
This should allow us to do a degree of cleanup of old rumor information. Absent this, large and long-running Habitat networks can accumulate significant amounts of wasted memory on what is effectively useless historical information. The `RumorHeat` data structure effectively maps a rumor to a mapping of members to the number of times the member has seen the rumor. This allows us to know who's seen a rumor already, and who hasn't. Now, when a member departs the ring, we'll remove any of the "member-to-number-of-times-they've-seen-the-rumor" mappings for that member. Additionally, we'll remove _all_ such mappings for any rumors about _services_ hosted on that member. After all, the member is going away, so all of its services will too. We aren't removing the membership rumors, though, because that's how we pass the word that the member is departed. Future work may add more cleanups and rearrangements. Signed-off-by: Christopher Maier <cmaier@chef.io>
Thanks for the pull request! Here is what will happen next:
Thank you for contributing! |
I added the "do not merge" label while I run this through some further local testing. Pushing it out for early feedback, though. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens when a departed member returns? Instinctually, the system exepcts to find a key in the rumor_heat for everyone? Or does it get recreated?
Wroth adding a test for, at least.
@adamhjk The rumor heat structure would get repopulated again for that member. In that regard, it'd basically behave the same as if a totally new Supervisor joined up. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks generally good to me. Make some suggestions, but nothing I feel too strongly about.
/// memory this consumes. If that member should ever come back | ||
/// again, all rumors would be considered "hot" for them, so they | ||
/// will get a bit more network traffic initially. | ||
pub fn purge(&self, id: &str) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some debug!
s info about how many things were removed might be nice
@@ -57,6 +57,8 @@ impl Expire { | |||
.members_expired_to_departed(self.timing.departure_timeout_duration()); | |||
|
|||
for id in newly_departed_members { | |||
// Purge "heat" information for a member that's gone |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Between the good naming of newly_departed_members
and rumor_heat.purge(&id)
, I honestly think this comment is superfluous.
trace_incarnation, | ||
trace_health | ||
); | ||
|
||
// Purge "heat" information for a member that's gone |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, I'm not sure this comment says anything the code doesn't convey quite clearly (ditto to the several other times this same comment is used).
What is confusing to me though is why we still start_hot_rumor
even if we're purging. I'm not saying this is wrong (still trying to grok this code), but it seems strange enough that a comment about that might be warranted.
trace_incarnation, | ||
trace_health | ||
); | ||
|
||
// Purge "heat" information for a member that's gone | ||
if member_id != self.member_id() && health == Health::Departed { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is almost identical to the code in insert_member
. I wonder if it's possible to factor out the commonality as this sort of duplication seems prone to bugs. I don't just mean for the added code, but for the two functions more generally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point... I'm not clear on if there's actually a meaningful distinction between these functions; the names don't give it away, at least. This would probably be good to tackle in a follow-on PR.
@@ -744,6 +757,9 @@ impl Server { | |||
{ | |||
// TODO (CM): Why are we inferring departure from | |||
// a service rumor? | |||
|
|||
// Purge "heat" information for a member that's gone | |||
self.rumor_heat.purge(&service_rumor.member_id); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this and the duplication on L804 could be avoided by putting this logic inside insert_health_by_id
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd initially wanted to do that, too, but the MemberList
knows nothing of the RumorHeat
data structure 😦
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still coming to grips with this part of the codebase, so take my approval with the appropriately sized grain of salt, but it all looks good to me.
// all types of rumors. | ||
for heat in heat_map.values_mut() { | ||
heat.remove(id); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps either quantifying or summarizing the bytes purged might be of some use here?
+use std::mem;
...
for heat in heat_map.values_mut() {
- heat.remove(id);
+ if let Some(r) = heat.remove(id) {
+ debug!("Purging rumor heat of size {} bytes", mem::size_of_val(&r));
+ } else {
+ debug!("Member {} unexpectedly had no rumor heat to purge.", id);
+ }
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
size_of_val
doesn't always give you what you think you want, since it's the size of the value, which may not be the same as the "total size" of the value, including any heap allocations. For example, you'll always get 24
if your value is a String
, since a String
is a 3-byte "fat pointer", regardless of how many characters are in the actual character string pointed to on the heap.
However, in this case, it's always going to be 64 bytes; 56 for the member ID (24 for the "fat pointer" and 32 for the ID string itself) and 8 for the usize
counter.
Signed-off-by: Christopher Maier <cmaier@chef.io>
Obvious fix; these changes are the result of automation not creative thinking.
This should allow us to do a degree of cleanup of old rumor
information. Absent this, large and long-running Habitat networks can
accumulate significant amounts of wasted memory on what is effectively
useless historical information.
The
RumorHeat
data structure effectively maps a rumor to a mappingof members to the number of times the member has seen the rumor. This
allows us to know who's seen a rumor already, and who hasn't.
Now, when a member departs the ring, we'll remove any of the
"member-to-number-of-times-they've-seen-the-rumor" mappings for that
member.
Additionally, we'll remove all such mappings for any rumors about
services hosted on that member. After all, the member is going away,
so all of its services will too.
We aren't removing the membership rumors, though, because that's how
we pass the word that the member is departed.
Future work may add more cleanups and rearrangements.
Signed-off-by: Christopher Maier cmaier@chef.io