Bloom filter collisions #60

EvelynSubarrow · 2021-09-07T12:17:40Z

The Bloom filters have apparently not been resized since release 1.0.17, and the collision rate inevitably keeps rising as the filters become more saturated, something massively exacerbated by the fact that your hashes are only 32 bits wide. This is unacceptable: even setting aside the obvious issues of oversight and integrity, you yourself have no meaningful control over, or insight into, how some URLs are classified. In addition, there is no mechanism to address the fact that collisions necessarily persist for as long as the hashes that produce them do.
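For reference, the usual approximation of a Bloom filter's false-positive rate with m bits, k hash functions and n stored URLs is `p ≈ (1 - e^(-kn/m))^k`, so with m and k fixed, p can only climb as n grows. A second, size-independent floor applies if (as assumed in this sketch) every bit position is derived from a single 32-bit hash: any two URLs that share that hash collide in every filter, however large, and by the birthday bound the chance of at least one such pair among n URLs is about `1 - e^(-n(n-1)/2^33)`, which passes 50% at roughly 77,000 URLs. The sketch below only evaluates these two formulas; the filter size and k in the `__main__` block are illustrative, not the extension's real parameters.

```python
import math


def bloom_fp_rate(m_bits: int, k_hashes: int, n_items: int) -> float:
    """Standard approximation of a Bloom filter's false-positive rate.

    Assumes k independent, uniformly distributed hash functions over m bits.
    """
    # 1 - e^(-x) == -expm1(-x); expm1 keeps precision when x is small.
    return (-math.expm1(-k_hashes * n_items / m_bits)) ** k_hashes


def hash_collision_prob(n_items: int, hash_bits: int = 32) -> float:
    """Birthday-bound probability that at least two of n items share a hash.

    If every filter position is derived from one hash of this width (an
    assumption here), such pairs collide no matter how large the filter is.
    """
    space = 2.0 ** hash_bits
    return -math.expm1(-n_items * (n_items - 1) / (2.0 * space))


if __name__ == "__main__":
    # Illustrative numbers only: a fixed 1 MiB filter (8 Mi bits) with k = 7.
    m, k = 8 * 1024 * 1024, 7
    for n in (100_000, 500_000, 1_000_000, 2_000_000):
        print(f"n={n:>9,}  fp~{bloom_fp_rate(m, k, n):.2e}  "
              f"32-bit pair collision~{hash_collision_prob(n):.3f}")
```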

By way of example, the normalised URL twitter.com/x0s1jpnq2sk2 is classified as both trans-friendly (since 1.0.17) and transphobic (since 1.0.16).

Bloom filters are a wholly inappropriate mechanism for this task, and this implementation is grotesquely irresponsible.

My strong recommendation is that you:

- Use a more robust cryptographic hashing algorithm, with output much wider than 32 bits
- Approximate the false-positive rate and ensure that all users are adequately informed of the risk
- Increase the size of the Bloom filter to reduce the rate of false positives
- Introduce a version-dependent salt before hashing (the version string would serve just fine; it doesn't need to be complex) to break false-positive persistence
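The hashing, sizing and salting recommendations above fit together in a few lines. This is a hypothetical sketch, not the extension's code: it sizes the filter from a target false-positive rate using the standard formulas `m = -n ln p / (ln 2)^2` and `k = (m/n) ln 2`, derives the k bit positions from one SHA-256 digest by double hashing (the Kirsch–Mitzenmacher technique), and prefixes every URL with a version string as the salt. The class name `SaltedBloomFilter` and the version strings used are illustrative assumptions.

```python
import hashlib
import math
from collections.abc import Iterator


class SaltedBloomFilter:
    """Bloom filter sized for a target false-positive rate, salted per release.

    Hypothetical sketch: the salt is simply the release version string.
    """

    def __init__(self, expected_items: int, fp_rate: float, salt: str) -> None:
        # Standard optimal sizing: m = -n ln p / (ln 2)^2, k = (m / n) ln 2.
        self.m = math.ceil(-expected_items * math.log(fp_rate) / math.log(2) ** 2)
        self.k = max(1, round(self.m / expected_items * math.log(2)))
        self.salt = salt.encode("utf-8")
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, url: str) -> Iterator[int]:
        # One SHA-256 digest over salt + NUL separator + URL.
        digest = hashlib.sha256(self.salt + b"\x00" + url.encode("utf-8")).digest()
        # Take two 64-bit words from the digest and double-hash: h1 + i*h2 mod m.
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force a non-zero step
        return ((h1 + i * h2) % self.m for i in range(self.k))

    def add(self, url: str) -> None:
        for pos in self._positions(url):
            self.bits[pos >> 3] |= 1 << (pos & 7)

    def __contains__(self, url: str) -> bool:
        return all(self.bits[pos >> 3] & (1 << (pos & 7))
                   for pos in self._positions(url))


if __name__ == "__main__":
    f = SaltedBloomFilter(expected_items=1_000_000, fp_rate=1e-6, salt="1.0.17")
    f.add("twitter.com/x0s1jpnq2sk2")
    print(f.m, f.k, "twitter.com/x0s1jpnq2sk2" in f)
```

With `expected_items=1_000_000` and `fp_rate=1e-6`, the formulas work out to about 28.8 million bits (roughly 3.4 MiB) and k = 20. Because the salt changes with every release, a false positive in one release is an independent event in the next, rather than a collision that persists for as long as the underlying hashes do.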

I am giving these recommendations to you as harm reduction. This extension, for no technically justifiable reason, centralises easily abusable power in your hands, beyond meaningful oversight. Your contempt for transparency and for those who fear being outed by your recklessness is unconscionable. If you have a shred of decency, you should discontinue this extension immediately.


elle-trudgett · 2022-04-14T04:37:04Z

+1 to reducing collisions; they're a nuisance. I don't know about all that moralistic stuff, but I'd like the Bloom filter resized.

EvelynSubarrow · 2022-04-14T09:06:29Z

I honestly think the extension might well be abandoned at this point. Between the increased scrutiny, Datatilsynet's decision, and the lack of an update for ~8 months (an unusually long gap between releases), it doesn't look good.

Ralimbahere · 2024-02-18T02:20:20Z

The bloom filter was updated recently.

lunaynx · 2024-02-18T09:12:02Z

It was updated, yes, but is the collision problem solved?

EvelynSubarrow · 2024-02-19T17:07:59Z

I've been told about the update, not looked at it yet. I'm not holding my breath, either way.

EvelynSubarrow mentioned this issue on Dec 11, 2022 in #90, "1030→1031, where are we now?" (open).