Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AddressSanitizer issue on multihreaded insert #40

Closed
chybz opened this issue Feb 27, 2020 · 7 comments
Closed

AddressSanitizer issue on multihreaded insert #40

chybz opened this issue Feb 27, 2020 · 7 comments

Comments

@chybz
Copy link

chybz commented Feb 27, 2020

Hello @greg7mdp !

I'm using a parallel_flat_hash_map<uint64_t, uint32_t> to store information about some groups, much like a SQL GROUP BY. Map key is a XXH3 cumulative hash of the grouped columns and value is the position of that group.

I submit to 16 worker threads a list of group keys to insert, following example code from your documentation:

using map_type = phmap::parallel_flat_hash_map<uint64_t, uint32_t>;

map_type map;

tv::parallel::for_each(
    map.subcnt(),
    [&](auto& ctx) {
        using hasher_type = typename map_type::hasher;

        hasher_type hasher;

        // gkeys is a std::vector<uint64_t> containing group keys
        for (auto gkey : gkeys) {
            auto map_key = hasher(gkey);
            auto idx = map.subidx(map_key);

            // ctx.local_index is the worker index: from 0 to map.subcnt() - 1
            if (idx == ctx.local_index) {
                auto gpos = current_group_count;

                // This line triggers AddressSanitizer or segfault
                auto res = map.insert(map_entry(gkey, gpos));

                if (res.second) {
                    // New element inserted
                    ++current_group_count;
                } else {
                    // Grab existing group position
                    gpos = res.first->second;
                }

                // Update size of group
                ++group_sizes[gpos];
            }
        }
    }
);

When compiled with AddressSanitizer under GCC 8.3.0, I get the attached trace. When compiled without AddressSanitizer, it segfaults at the same location. I've banged my head around (though it starts to hurt a bit now) checking and re-checking to no-avail.

Could you please help me ?

log.txt

@chybz
Copy link
Author

chybz commented Feb 27, 2020

Hi, just to let you know I finally found the issue. In documentation, you're using typename HT::hasher hasher and then hashval = hasher(key), whereas I found in bench.cc that you use hashval = hash.hash(key). When I changed to the latter, insert works fine.

@chybz chybz closed this as completed Feb 27, 2020
@greg7mdp
Copy link
Owner

@chybz where did you see that reference to typename HT::hasher in the documentation?
But you are right the original version should work as well, sorry for the issue. I'll look into fixing this.

@greg7mdp greg7mdp reopened this Feb 27, 2020
@chybz
Copy link
Author

chybz commented Feb 28, 2020

Thanks @greg7mdp for your update.

It's there in the code block right after "and here is the code for the multi-threaded insert".

I've another question, though. Could you please add an insert(...) method when map subkey is already known, similar to what find(key, subkey) does ? My use case is that I have a hierarchical and parallel aggregation on many groups, merged in parallel and when I merge, I already computed map subkeys and would like to reuse them. Note that I cannot insert directly a map in another map because I must know what entries are new.

Thanks for any help.

@greg7mdp
Copy link
Owner

Sounds good @chybz I'll try to get this done this week-end. Thanks for pointing out the location of the code!

greg7mdp added a commit that referenced this issue Mar 2, 2020
greg7mdp added a commit that referenced this issue Mar 2, 2020
@greg7mdp
Copy link
Owner

greg7mdp commented Mar 2, 2020

Hi @chybz I tried adding the insert() with the hash value provided, but in my tests it doesn't make any speed difference, so I am not sure it is worth it.

@chybz
Copy link
Author

chybz commented Mar 3, 2020

Hi @greg7mdp, thank you very much for the time you spent on the subject !

@greg7mdp
Copy link
Owner

greg7mdp commented Mar 3, 2020

Hi @chybz pas de problème, merci d'utiliser parallel_hashmap!

@greg7mdp greg7mdp closed this as completed Mar 3, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants