AddressSanitizer issue on multihreaded insert #40

chybz · 2020-02-27T09:10:13Z

I'm using a parallel_flat_hash_map<uint64_t, uint32_t> to store information about some groups, much like a SQL GROUP BY. Map key is a XXH3 cumulative hash of the grouped columns and value is the position of that group.

I submit to 16 worker threads a list of group keys to insert, following example code from your documentation:

using map_type = phmap::parallel_flat_hash_map<uint64_t, uint32_t>;

map_type map;

tv::parallel::for_each(
    map.subcnt(),
    [&](auto& ctx) {
        using hasher_type = typename map_type::hasher;

        hasher_type hasher;

        // gkeys is a std::vector<uint64_t> containing group keys
        for (auto gkey : gkeys) {
            auto map_key = hasher(gkey);
            auto idx = map.subidx(map_key);

            // ctx.local_index is the worker index: from 0 to map.subcnt() - 1
            if (idx == ctx.local_index) {
                auto gpos = current_group_count;

                // This line triggers AddressSanitizer or segfault
                auto res = map.insert(map_entry(gkey, gpos));

                if (res.second) {
                    // New element inserted
                    ++current_group_count;
                } else {
                    // Grab existing group position
                    gpos = res.first->second;
                }

                // Update size of group
                ++group_sizes[gpos];
            }
        }
    }
);

When compiled with AddressSanitizer under GCC 8.3.0, I get the attached trace. When compiled without AddressSanitizer, it segfaults at the same location. I've banged my head around (though it starts to hurt a bit now) checking and re-checking to no-avail.

Could you please help me ?

log.txt

The text was updated successfully, but these errors were encountered:

chybz · 2020-02-27T12:57:25Z

Hi, just to let you know I finally found the issue. In documentation, you're using typename HT::hasher hasher and then hashval = hasher(key), whereas I found in bench.cc that you use hashval = hash.hash(key). When I changed to the latter, insert works fine.

greg7mdp · 2020-02-27T15:07:26Z

@chybz where did you see that reference to typename HT::hasher in the documentation?
But you are right the original version should work as well, sorry for the issue. I'll look into fixing this.

chybz · 2020-02-28T11:05:07Z

Thanks @greg7mdp for your update.

It's there in the code block right after "and here is the code for the multi-threaded insert".

I've another question, though. Could you please add an insert(...) method when map subkey is already known, similar to what find(key, subkey) does ? My use case is that I have a hierarchical and parallel aggregation on many groups, merged in parallel and when I merge, I already computed map subkeys and would like to reuse them. Note that I cannot insert directly a map in another map because I must know what entries are new.

Thanks for any help.

greg7mdp · 2020-02-28T13:04:28Z

Sounds good @chybz I'll try to get this done this week-end. Thanks for pointing out the location of the code!

greg7mdp · 2020-03-02T16:38:20Z

Hi @chybz I tried adding the insert() with the hash value provided, but in my tests it doesn't make any speed difference, so I am not sure it is worth it.

chybz · 2020-03-03T15:15:18Z

Hi @greg7mdp, thank you very much for the time you spent on the subject !

greg7mdp · 2020-03-03T16:12:25Z

Hi @chybz pas de problème, merci d'utiliser parallel_hashmap!

chybz closed this as completed Feb 27, 2020

greg7mdp reopened this Feb 27, 2020

greg7mdp added a commit that referenced this issue Mar 2, 2020

fix documentation issue: #40

aa9b3f1

greg7mdp added a commit that referenced this issue Mar 2, 2020

update doc issue #40

13b3433

greg7mdp closed this as completed Mar 3, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

AddressSanitizer issue on multihreaded insert #40

AddressSanitizer issue on multihreaded insert #40

chybz commented Feb 27, 2020

chybz commented Feb 27, 2020

greg7mdp commented Feb 27, 2020

chybz commented Feb 28, 2020

greg7mdp commented Feb 28, 2020

greg7mdp commented Mar 2, 2020

chybz commented Mar 3, 2020

greg7mdp commented Mar 3, 2020

AddressSanitizer issue on multihreaded insert #40

AddressSanitizer issue on multihreaded insert #40

Comments

chybz commented Feb 27, 2020

chybz commented Feb 27, 2020

greg7mdp commented Feb 27, 2020

chybz commented Feb 28, 2020

greg7mdp commented Feb 28, 2020

greg7mdp commented Mar 2, 2020

chybz commented Mar 3, 2020

greg7mdp commented Mar 3, 2020