Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

server: integrate shared mem hash set into envoy for tracking stats #2358

Merged
merged 35 commits into from
Jan 17, 2018

Conversation

jmarantz
Copy link
Contributor

@jmarantz jmarantz commented Jan 12, 2018

Description:
Adds and tests a pointer-free hash-map mapping strings to anything, suitable for using with shared memory. Fixes #1824 (O(n^2) stats creation) and helps #2063 (startup performance overall) significantly. Startup is still slowed by regexes after this PR.

Risk Level: Medium -- this is used in Envoy startup and a failure here would prevent startup. However, the new class is reasonably well tested in isolation.

Testing:
Its own new unit test, run under tsan, asan, msan, and valgrind, as well as debug and opt.

to the other handlers with a browser.

Add a private mechanism to have admin handlers supply arbitrary
headers.  Note that the handler callback doesn't expose a header
structure, so the public v2 API does not provide this header-control.
However, it was specifying text/html as content-type by default, so
I changed the default to text/plain (with nosniff).

Added sanitization of both the prefix string and the help text, which
need to be safely injected into the HTML doc.

Added some styling and a reference to the existing Envoy favicon as well.

Note that no existing handlers get any response bytes changed as a result
of this commit, just their content-type.

Signed-off-by: Joshua Marantz <jmarantz@google.com>
…ms in particular.

Signed-off-by: Joshua Marantz <jmarantz@google.com>
… can set the

content-type and/or cache-control.

Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
…ize.

This is in prepration for supporting deletion of keys.  Also, fixes a
problem in put() of a duplicate key (which needs to be tested.

Signed-off-by: Joshua Marantz <jmarantz@google.com>
…l class name, etc.

Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
@mattklein123
Copy link
Member

@jmarantz thanks a ton for working on this. before digging into review, can you give an outline of how this is going to be used for our actual use case? It's not totally clear to me that an independent implementation like this is going to be easily usable in the stats case without changing a bunch of other stuff also.

@ggreenway
Copy link
Contributor

A few things that are going to get in the way from a high level:

  • Value in this case will be a Stats::RawStatData. Note that the size of this type is determined at runtime (via command line arguments), not at compile time.
  • You're going to need a way to provide a binary-compatibility-version for this. Things like changing the hash function, or any layout change, will change the "version".
  • The lock used needs to be process-shared. See Envoy::Server::SharedMemory::initializeMutex() (source/server/hot_restart_impl.cc)
  • This is very likely to be the only shared-memory hash-table that envoy ever uses. I would consider making it not-a-template for simplicity.

@ggreenway ggreenway self-requested a review January 12, 2018 18:07
@ggreenway ggreenway self-assigned this Jan 12, 2018
@ggreenway
Copy link
Contributor

@jmarantz feel free to ping me on slack if you want to discuss

@ggreenway ggreenway removed their request for review January 12, 2018 18:29
@jmarantz
Copy link
Contributor Author

Thanks @mattklein123 & @ggreenway . High order bit is I'm going to have a go at doing the integration PR and make sure I've got that right before proceeding further with this one. I also need to understand why there are mutex problems in CI when all my testing (including tsan) worked in development.

@ggreenway
My thought about RawStatsData was that I would remove the char name_[] from it, since that is now in the key, making it fixed size. Is there some other variable sized member? Maybe making that change will be too hard and I'll change my mind.

In any case I admit I don't get huge value currently from termplating this. If it is getting in the way, or we need the payload to be resized at startup, that should be easy to remove in favor of a payload_size member variable in Options. At the moment it does not appear to be getting in the way, except that the impl is in a .h file rather than a .cc.

I thought I was using the right mutex for cross-process, but maybe CI is telling me different? I just double-checked and it's using the same attributes as SharedMemory::initializeMutex. Admittedly this should be shared via a refactor.

Adding a versioning uint32_t to Control SGTM. If the hash function changes we should bump it. Is this a potential issue for Envoy (hacking the hash function, pushing new binaries, and hot-restarting the existing ones?) I guess it could be. A version bump sounds appropriate.

My reasoning for making this hash-table general purpose:

  • it's easier to thoroughly test it when it's independent of the rest of the system, IMO
  • I like the idea of contributing a potentially useful piece of general purpose infrastructure to the OSS ecosystem, which might improve some other software where it wasn't worth crafting another one of these.

I'm not sure that will ever happen, but if someone's searching for a shared-memory hash table to borrow, there are no good hits on Google right now, and I think there should be.

Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
@ggreenway
Copy link
Contributor

My thought about RawStatsData was that I would remove the char name_[] from it, since that is now in the key, making it fixed size. Is there some other variable sized member? Maybe making that change will be too hard and I'll change my mind.

No, just the name. But keep in mind that RawStatsData is also used/created outside the shmem, so if you remove the name, you'll have to do something to fix up the other uses (I think). You'll run into that pretty fast (compiler will tell you where) when you do your integration change.

In any case I admit I don't get huge value currently from termplating this. If it is getting in the way, or we need the payload to be resized at startup, that should be easy to remove in favor of a payload_size member variable in Options. At the moment it does not appear to be getting in the way, except that the impl is in a .h file rather than a .cc.

My feeling is that if it is templated, it should be tested that it works with an arbitrary template argument, which adds some testing burden. Let's see how it goes as we iterate on this, and we can un-template it later if we need to.

I thought I was using the right mutex for cross-process, but maybe CI is telling me different? I just double-checked and it's using the same attributes as SharedMemory::initializeMutex. Admittedly this should be shared via a refactor.

I'm not sure what's going on with that then.

Adding a versioning uint32_t to Control SGTM. If the hash function changes we should bump it. Is this a potential issue for Envoy (hacking the hash function, pushing new binaries, and hot-restarting the existing ones?) I guess it could be. A version bump sounds appropriate.

The "version" is spit out as a string. My thought was to include the hash of "test string" in the version string, so changes in hash function will show up automatically.

My reasoning for making this hash-table general purpose:

it's easier to thoroughly test it when it's independent of the rest of the system, IMO
I like the idea of contributing a potentially useful piece of general purpose infrastructure to the OSS ecosystem, which might improve some other software where it wasn't worth crafting another one of these.
I'm not sure that will ever happen, but if someone's searching for a shared-memory hash table to borrow, there are no good hits on Google right now, and I think there should be.

I'm ok with keeping it separate as long as it doesn't substantially increase the complexity of either the hash table or envoy in general. I think it'll be easier to evaluate that once you've done some of the integration work.

It might be easiest if you just include that integration work in this PR. I don't think they need to be separate.

Signed-off-by: Joshua Marantz <jmarantz@google.com>
Copy link
Contributor

@ggreenway ggreenway left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found a few things to consider. But I mostly wouldn't worry about this until you've done the integration work; let's get the higher-level issues sorted out first.

* but not across machine architectures, as it doesn't use network byte order for
* storing ints.
*/
template <class Value> class SharedHashMap : public Logger::Loggable<Logger::Id::config> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think i would name this SharedMemoryHashMap or similar. When I see "SharedHashMap" it reminds me too much of shared_ptr, and makes me think of ownership or concurrent access or something.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


/**
* Constructs a map control structure given a set of options, which cannot be changed.
* Note that the map dtaa itself is not constructed when the object is constructed.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: dtaa

* backing-store (eg) in shared memory, which we do after
* constructing the object with the desired sizing.
*/
size_t numBytes() const {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the interface might be cleaner if you make this a static function that takes the Options, and then put the code in attach() into the constructor.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still figuring out how to integrate this. Making this a static function is fine, but it looks like the caller will know ahead of time whether they are attempting to attach or just init, so I'm reluctant to attempt attaching in the ctor.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not 100% sure this made it cleaner, but it's reasonable.

* Note that if mutex is in a locked state at the time of attachment, this function
* can hang.
*/
bool attach(uint8_t* memory) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For extra error checking, please pass the length of memory along with the pointer, and validate that it's the expected length

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

}

/**
* Puts a new key into the map. If successful (e.g. map has capacity)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please document that this returns existing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On second thought, I think the interface needs to indicate whether it added a new one or returned existing. Or you need to run a placement new on Value in the case that it is new. But somehow we have to be able to initialize a new one, and not initialize (ie overwrite) an existing one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Done. Returned a pair, a la STL map insert.


// It seems like this is an obvious constexpr, but it won't compile as one.
static size_t alignment() {
return std::max(alignof(Cell), std::max(alignof(uint32_t), alignof(Control)));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should alignof(Value) be included here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No,I don't think so, because it's part of Cell. I will note, however, that the CI seems to catch alignment errors that don't show up in my development environment, for some reason.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it; that seems correct.

return false;
}

// As a sanity check, makee sure there are control_->size values
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: makee

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

for (uint32_t next, cell_index = slots_[slot]; cell_index != Sentinal; cell_index = next) {
if (cell_index >= options_.capacity) {
ENVOY_LOG(error, "SharedHashMap cell index too high={}, capacity={}", cell_index,
options_.capacity);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ret = false; ?

 - renamed to SharedMemHashMap
 - more detail on the return values.

I still need to add some version-stamping.

Signed-off-by: Joshua Marantz <jmarantz@google.com>
@jmarantz
Copy link
Contributor Author

jmarantz commented Jan 13, 2018

I have this integrated now. This is easier to integrate as a hash_set rather than a hash_map; then the changes to Envoy's server stack are minimal. Also I realize that I can just punt on the mutex-locking in the class itself, since that's done in the class. I still have some cleanup to do, then I'll push the set class and the Envoy changes together in this PR.

Raw timing based on @htuch's data from #2063 , and hacking main_common.cc to exit(0) before server_run:

envoy_1k_clusters_1k_virtual_hosts.json : 30 seconds --> 3.5 seconds
envoy_10k_clusters_10k_virtual_hosts.json : 330 seconds --> 30 seconds

This is with an optimized build. Definitely needs more scrutiny on the sizes & alignments, but 'bazel test //test/...' passes.

I will re-open again when it's ready for review.

@jmarantz jmarantz closed this Jan 13, 2018
Changes hash-map implemenation to a hash-set, which makes it easier to
allow Envoy to retain its own name storage as it does today.

Signed-off-by: Joshua Marantz <jmarantz@google.com>
…ed method.

Signed-off-by: Joshua Marantz <jmarantz@google.com>
Save the hash of a bunch of non-zero 8-bit characters in the shared-memory
segment, so we can double-check that the hash algorithm hasn't changed when
we try to attach it.

Add a comment explaining how initialization works for a new RawStatData.

Signed-off-by: Joshua Marantz <jmarantz@google.com>
Copy link
Contributor

@ggreenway ggreenway left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is coming along really well. I agree that switching from a map to a set made the integration nicer. Feel free to ping me on slack if you want to discuss any of this feedback more interactively.

ASSERT(name.size() <= maxNameLength());
ASSERT(std::string::npos == name.find(':'));
ASSERT(key.size() <= maxNameLength());
ASSERT(std::string::npos == key.find(':'));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

absl::string_view::npos == key.find(.....)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done; good catch.

class Primes {
public:
/**
* Determines whether x is prime, assuming it is not even.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Insead of assuming it's not even, I'd say the API is cleaner if you just check for divisible by 2 at the beginning of the function. One less gotcha.

*/
void initialize(const std::string& name);
void initialize(absl::string_view key);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is sizeof(absl::string_view)? Should it be passed by const&, or is it normal to pass by value?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sizeof(absl::string_view)==16 for me. It's normal to pass it by value.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok; if that's the convention, seems fine to me.

static SharedMemoryHashSetOptions sharedMemHashOptions(Options& options) {
SharedMemoryHashSetOptions hash_set_options;
hash_set_options.capacity = options.maxStats();
hash_set_options.num_slots = Primes::findPrimeLargerThan(hash_set_options.capacity / 2);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's a good chance that we'll spend more time finding a prime number, than we'll save in reduced hash collisions. So I think I'd lean towards a simpler implementation that doesn't bother with primes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's yak-shaving either way, most likely. But my intuition is different. Here's why.

  • the penalty for the search on collisions is a strcmp, whereas the penalty for searching for a prime is a uint32 modulus.
  • at (say) 10k maxKeys, I'm searching for a prime >5k and I only have to look at ~5 numbers
  • the current hash function may give a really good distribution without primes, but it's really sensitive and harder to reason about.

How about we leave this as is for now, and I'll put in a TODO to do a benchmark once #2350 is in?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with that. I wasn't as worried about the time spent finding a prime, as the added code complexity for probably minimal performance savings (especially since it is only the performance of loading config). But let's wait and do a benchmark, then decide.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do primes come into this? Sorry not a hash table/set expert. If we keep the primes can you add some comments about how/whey they are used?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TL;DR: better distribution for many hash functions.

It may not be needed for the hash function I found in Envoy, but let's say were making a hash table using pointers as keys. The pointers would typically have the low order 3 bits 0, so you'd only use 1/8 of your buckets. modding with a prime solves that and gets you a better distribution.

https://stackoverflow.com/questions/3980117/hash-table-why-size-should-be-prime

I just left that as a one-line comment at the call-site.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, thanks

if (options.restartEpoch() != 0) {
parent_address_ = createDomainSocketAddress((options.restartEpoch() + -1));
need_init = !stats_set_.attach(shmem_.stats_set_data_, stats_set_data_size_);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think need_init should be set to false here, and if attach didn't work, then RELEASE_ASSERT or throw or something. There shouldn't be any case where epoch != 0 that we should init the hash table.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I switched to init/attach during construction, throwing on error.

Of Envoy's variances with the Google Style Guide, the ok-to-throw policy is the hardest for me to get used to, but I understand the motivation and I'll get used to it.

I chose to throw rather than RELEASE_ASSERT so I could test it. I'm not sure if I'm doing everything right (e.g. I'm catching in the test but not in production; is that the right thing?) and I just put a string in to describe what I'm throwing, as opposed to making a new std::exception object.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can actually test RELEASE_ASSERT. Look for death tests in the codebase. But that being said, I think it's ok to throw.

Almost every thrown exception is an instance of EnvoyException. So unless you need to be able to catch specific exceptions by type, just use that.


bool RawStatData::matches(const std::string& name) {
// In case a stat got truncated, match on the truncated name.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You've added a small regression by refactoring this. It used to be that if the stat name was too long, everything would still work (with the possibility of a name collision); now the names will be considered not matching. Not sure if it's worth fixing or not though. @mattklein123 any thoughts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually this was a big regression (assert-fail). I fixed it and added a testcase.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very good catch by the way. Thanks!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think originally when I wrote the code that way it was just for sanity/safety. I don't have a strong opinion on what we do now as long as it doesn't crash obviously. @jmarantz out of curiosity what asserts were failing when you changed the behavior?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RawStatData::initialize has
ASSERT(key.size() <= maxNameLength());


// Copy of the options in process-local memory; which is used to help compute
// the required size in bytes after construction and before init/attach.
const SharedMemoryHashSetOptions options_;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think the interface on this is a bit weird, as shown by the need for a 2nd copy of the options, and also that the object can be fully constructed but crash on use if attach/init hasn't been called yet.

I think attach/init should happen in the constructor so that the object is always valid, and then we don't need 2 copies of the options. Pass a boolean to the constructor so it knows whether we're doing attach or init.

slots_[slot] = Sentinal;
}

// Initialize all the key-char offsets.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what this comment means. Maybe change to "Initialize the free-cell list."?

}

/**
* Removes the specified key from the map, returning bool if the key
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: "returning true if the key..."

return std::max(alignof(Cell), std::max(alignof(uint32_t), alignof(Control)));
}

static uint32_t align(uint32_t size) { return (size + alignment() - 1) & ~(alignment() - 1); }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please RELEASE_ASSERT that alignment() returns a power-of-2; otherwise this math doesn't work (I think)

…throwing exceptions.

This eliminates the case where a SharedMemoryHashSet is constructed
but not yet usable.

Fix a few other style nits.

Signed-off-by: Joshua Marantz <jmarantz@google.com>
…ive.

Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
if (init) {
initialize(options, memory);
} else if (!attach(options, memory)) {
throw std::runtime_error("Incompatible memory block");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

throw EnvoyException("the text");

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

* @param memory
*/
void initialize(const SharedMemoryHashSetOptions& options, uint8_t* memory) {
mapMemorySegments(options, memory);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor cleanup: mapMemorySegments() can be done in the constructor before either initialize() or attach() is called, and then removed from both of these functions. And optionally mapMemorySegments() can be inlined into the constructor.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

return std::max(alignof(Cell), std::max(alignof(uint32_t), alignof(Control)));
}

static uint32_t align(uint32_t size) { return (size + alignment() - 1) & ~(alignment() - 1); }
static uint32_t align(uint32_t size) {
const size_t alignment = calculateAlignment();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just remembered that there's already an implementation of this in stats_impl.cc at the top: roundUpMultipleNaturalAlignment(). Can you put it in a common location and share the function between the two? Or add a TODO?

…em-version call.

Reworked the version-test to test explicitly for the version-string
being deterministic based on options.

Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking sweet. Love how this turned out. Just a few comments form a quick skim.

@@ -156,7 +164,8 @@ void HotRestartImpl::free(Stats::RawStatData& data) {
if (--data.ref_count_ > 0) {
return;
}

bool key_removed = stats_set_.remove(data.key());
RELEASE_ASSERT(key_removed);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I would probably make this a normal ASSERT

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That will fail to compile in release builds, because key_removed is assigned and not used when the ASSERT disappears (found that through CI but it's a pattern I've seen many times).

The pattern we'd use at Google for this is:

 bool key_removed = stats_set_.remove(data.key());
 if (!key_removed) {
   LOG(DFATAL) << "key not removed for key " << data.key();
}

In my vanilla C++ life I'd do:

#ifndef NDEBUG
assert(stats_set_.remove(data.key()));
#else 
stats_set_.remove(data.key());
#endif

I'd be happy with whatever you think makes the most sense in the Envoy codebase.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In these cases in the code in Envoy we generally do UNREFERENCED_PARAMETER(key_removed)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still need to switch to ASSERT :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, fixed now :) Also fixed a broken integration test due to the way SharedMemory::version() was called directly from main(). Made it call a new HotRestart::hotRestartVersion() method which also pulls in the SharedMemoryHashSet data.

static SharedMemoryHashSetOptions sharedMemHashOptions(Options& options) {
SharedMemoryHashSetOptions hash_set_options;
hash_set_options.capacity = options.maxStats();
hash_set_options.num_slots = Primes::findPrimeLargerThan(hash_set_options.capacity / 2);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do primes come into this? Sorry not a hash table/set expert. If we keep the primes can you add some comments about how/whey they are used?

@@ -473,7 +474,7 @@ void HotRestartImpl::terminateParent() {

void HotRestartImpl::shutdown() { socket_event_.reset(); }

std::string HotRestartImpl::version() { return shmem_.version(); }
std::string HotRestartImpl::version() { return shmem_.version() + stats_set_.version(); }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we put a separator between the two? space or period?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, sorry, will do.

Signed-off-by: Joshua Marantz <jmarantz@google.com>
ggreenway
ggreenway previously approved these changes Jan 17, 2018
Copy link
Contributor

@ggreenway ggreenway left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

…ry::version() and thus lacked the SharedMemoryHashSet data.

Also finally changes a RELEASE_ASSERT to ASSERT.

Signed-off-by: Joshua Marantz <jmarantz@google.com>
Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 small nits given that it looks like there is an ASAN failure.

size_t bytes = RawStatDataSet::numBytes(options);
std::unique_ptr<uint8_t> mem_buffer_for_dry_run_(new uint8_t[bytes]);
RawStatDataSet stats_set(options, true /* init */, mem_buffer_for_dry_run_.get());
return SharedMemory::version(max_num_stats, max_stat_name_len) + "." + stats_set.version();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: somehow this part and HotRestartImpl::version() would be combined to avoid an oops in the future, or I would at least drop a small comment in both functions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was bugging me too. Fixed.

// can generate the version string.
std::string HotRestartImpl::hotRestartVersion(size_t max_num_stats, size_t max_stat_name_len) {
const SharedMemoryHashSetOptions options = sharedMemHashOptions(max_num_stats);
size_t bytes = RawStatDataSet::numBytes(options);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: const

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

…edMemory.

There was significant overlap in functionality, leading to confusion
and potential misalignment.  Factored out
SharedMemory::version() (non-static version) and use a common helper
function for the static and non-static versions in HotRestartImpl.

Signed-off-by: Joshua Marantz <jmarantz@google.com>
@jmarantz
Copy link
Contributor Author

also fixed the asan issue, which was a due to using std::unique_ptr to storing an array, but not using the correct specialization.

@jmarantz
Copy link
Contributor Author

Thanks Gary & Matt for the reviews & insight into the Envoy mechanisms I needed to understand better in order to land this.

@@ -477,15 +473,24 @@ void HotRestartImpl::terminateParent() {

void HotRestartImpl::shutdown() { socket_event_.reset(); }

std::string HotRestartImpl::version() { return shmem_.version() + "." + stats_set_.version(); }
std::string HotRestartImpl::version() {
// return shmem_.version() + "." + stats_set_.version();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean for this comment to be here?

Signed-off-by: Joshua Marantz <jmarantz@google.com>
@mattklein123
Copy link
Member

@jmarantz looks like this needs a master merge.

@jmarantz
Copy link
Contributor Author

master-merged done :)

@mattklein123 mattklein123 merged commit 0d4e4b2 into envoyproxy:master Jan 17, 2018
@jmarantz jmarantz deleted the shared-mem-hash-map branch January 17, 2018 23:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants