You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
The implementation has a bug in the case where the strings are not long enough to generate the target hash full length.
The function:
std::vector<int32_t> getAllHashes(std::vector& bytes, uint64_t k)
stops generating the hast at the last character of the string.
The function:
std::vector<int32_t> digest(uint64_t k, std::vector& bytes)
then does a vector.resize up to the target hash size. This means padding the vector with zeros.
The set intersection function has a very strong initial assumption about the set being a real set and not a vector. If there is one single duplicate value in either of the input arrays the comparison of the hashes gives wrong results.
Here is my current solution to this problem. (Space padding the input string until the target hash size is reached using the same set insert mechanism as for the input string).
std::unordered_set<int32_t> x_set;
MurmurHash3 running_hash = MurmurHash3();
for(char b : bytes)
{
int32_t hash = running_hash.pushByte(b);
if (x_set.insert(hash).second)
{
//was successfully added, so never seen it before. Put it in!
ints.push_back(hash);
running_hash.reset();
}
}
while (ints.size() < k) {
int32_t hash = running_hash.pushByte(0x20);
if (x_set.insert(hash).second)
{
//was successfully added, so never seen it before. Put it in!
ints.push_back(hash);
running_hash.reset();
}
}
return ints;
}
The text was updated successfully, but these errors were encountered:
Hello,
The implementation has a bug in the case where the strings are not long enough to generate the target hash full length.
The function:
std::vector<int32_t> getAllHashes(std::vector& bytes, uint64_t k)
stops generating the hast at the last character of the string.
The function:
std::vector<int32_t> digest(uint64_t k, std::vector& bytes)
then does a vector.resize up to the target hash size. This means padding the vector with zeros.
The set intersection function has a very strong initial assumption about the set being a real set and not a vector. If there is one single duplicate value in either of the input arrays the comparison of the hashes gives wrong results.
Here is my current solution to this problem. (Space padding the input string until the target hash size is reached using the same set insert mechanism as for the input string).
std::vector<int32_t> getAllHashes(std::vector& bytes, uint64_t k)
{
std::vector<int32_t> ints;
}
The text was updated successfully, but these errors were encountered: