Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix flat_hash_map when used across library boundaries #26

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

smessmer
Copy link

Singletons don't work well when used across dynamic library boundaries, each library might get its own copy of the singleton.
This means if library A creates a hash table and library B adds elements to it, causing a rehash and therefore deallocation, the empty_default_table() won't be recognized correctly because the two libraries have different default tables.

This PR fixes this by treating the default table like any other table, allocating and deallocating it. This causes additional allocations when the hashmap is created or set to empty, so it might not be the optimal solution, but at least it works. A potential better solution might involve nullptr and checking against nullptr. I don't have time to implement that though.

Singletons don't work well when used across dynamic library boundaries, each library might get its own copy of the singleton.
This means if library A creates a hash table and library B adds elements to it, causing a rehash and therefore deallocation, the `empty_default_table()` won't be recognized correctly because the two libraries have different default tables.

This PR fixes this by treating the default table like any other table, allocating and deallocating it. This causes additional allocations when the hashmap is created or set to empty, so it might not be the optimal solution, but at least it works. A potential better solution might involve nullptr and checking against nullptr.
@ezyang
Copy link

ezyang commented Dec 19, 2018

This feels like a header-only library self-inflicted wound...

@ezyang
Copy link

ezyang commented Dec 19, 2018

I'm a little confused how this ever worked. Why does a hash map library need a singleton? Aren't you going to immediately mutate the "empty" table at some later point in time?

@smessmer
Copy link
Author

@ezyang There are a few control flows where now, after this PR, additional allocations happen. For example, if you create a new hash map and immediately call reserve(some_large_number) on it. Before this PR, that would only have been one allocation for the large number. After this PR, it'll at first allocate the empty state (which has space for 4 elements) and then re-allocate for the large number.

@cebtenzzre
Copy link

Can this be adjusted to use this new behavior only on Windows, or if a certain compile-time flag is defined? As far as I understand, the existing code works just fine on Unix-like systems.
Also, this needs to be done for the other two hash tables as well, as they use similar constructs.

@antialize
Copy link

@cebtenzzre The existing code does not work on unix-like systems in all cases. Specificly when compiling a so file with "-fvisibility=hidden" parsing a hash map over the so boundary will result in different empty_blocks() in the different units.

We have worked around this issue by changing the line:
static sherwood_v8_block * empty_block()

to:
attribute((visibility("default"))) static sherwood_v8_block * empty_block()

Another less intrusive way to deal with this is to capture a pointer to the value of some empty_block() when constructing the table as a member of the table, and then alwayes use this captured pointer.

@wesleyw72
Copy link

I believe this change also helps with using the flat hash map in shared memory. Previously, the pointer returned by empty_default_table is in a particular process's address space (not in shared memory). This causes issues as other processes can't access that address, and therefore can cause a segmentation fault. The change proposed in this PR fixes that, as the provided allocator is used, so as long as that allocator is a shared memory allocator - it seems to work quite well.

william-silversmith added a commit to seung-lab/fastremap that referenced this pull request May 25, 2022
Avoids problems with dynamic library situations:
skarupke/flat_hash_map#26
william-silversmith added a commit to seung-lab/fastremap that referenced this pull request May 25, 2022
* perf: switch to ska::flat_hash_map instead of std::unordered_map

Small ~10% improvement to connectomics.npy renumber, but ~45%
improvement to a random array.

* fix: support 32-bit

skarupke/flat_hash_map#18

* fix: allocate default table

Avoids problems with dynamic library situations:
skarupke/flat_hash_map#26

* fix: use more compatible definition of void_t

* docs: update comment date
onepick added a commit to onepick/Paddle that referenced this pull request Jul 3, 2023
* declear new rccl api in paddle/fluid
* Fix build error:free(): invalid pointer by pick the fix:
skarupke/flat_hash_map#26
* Filter to check codestyle for flash_hash_map.h

Signed-off-by: jiajuku <jiajuku12@163.com>
ronny1996 pushed a commit to PaddlePaddle/Paddle that referenced this pull request Jul 4, 2023
* Fix build error on rocm4.5/rocm5 on ubuntu18.04

* declear new rccl api in paddle/fluid
* Fix build error:free(): invalid pointer by pick the fix:
skarupke/flat_hash_map#26
* Filter to check codestyle for flash_hash_map.h

Signed-off-by: jiajuku <jiajuku12@163.com>

* Apply suggestions from code review

Co-authored-by: Nyakku Shigure <sigure.qaq@gmail.com>

---------

Signed-off-by: jiajuku <jiajuku12@163.com>
Co-authored-by: Nyakku Shigure <sigure.qaq@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants