Bloom filter collisions #60

Open
EvelynSubarrow opened this issue Sep 7, 2021 · 5 comments

Comments

@EvelynSubarrow

EvelynSubarrow commented Sep 7, 2021

The bloom filters have apparently not been resized since release 1.0.17, and the collision rate is obviously increasing as the filters become more saturated, something massively exacerbated by the fact that your hashes are 32-bit. This is unacceptable: even setting aside the obvious issues of oversight and integrity, you yourself have no meaningful control over, or insight into, how some URLs are classified. In addition, there is no mechanism to address the fact that collisions necessarily persist for as long as the hashes that form them do.
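
To make the saturation argument concrete, here is a minimal sketch using the standard bloom filter approximation, where a filter of m bits holding n entries with k hash functions has a false positive rate of roughly (1 - e^(-kn/m))^k. The filter size, hash count, and entry counts below are hypothetical placeholders, not the extension's real parameters, which are not stated in this thread.

```python
import math


def false_positive_rate(m_bits: int, n_items: int, k_hashes: int) -> float:
    """Standard approximation (1 - e^(-k*n/m))^k for a bloom filter."""
    return (1.0 - math.exp(-k_hashes * n_items / m_bits)) ** k_hashes


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    m_bits, k_hashes = 1_000_000, 7
    # The filter size stays fixed while the number of entries grows.
    for n_items in (50_000, 100_000, 200_000, 400_000):
        rate = false_positive_rate(m_bits, n_items, k_hashes)
        print(f"{n_items:>7} entries -> {rate:.4%} estimated false positives")
```

With m held fixed, the estimate rises with every added entry. Separately, if the bit positions are derived only from a 32-bit hash, two URLs that share that hash are indistinguishable no matter how large the filter is.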

By way of example, the normalised URL twitter.com/x0s1jpnq2sk2 is classified as both trans-friendly (since 1.0.17) and transphobic (since 1.0.16).

Bloom filters are a wholly inappropriate mechanism for this task and this implementation is grotesquely irresponsible.

My strong recommendation is that you:

  • Use a more robust cryptographic hashing algorithm
  • Approximate the false positive rate and ensure that all users are adequately informed of the risk
  • Increase the size of the bloom filter to reduce the rate of false positives
  • Introduce a version-dependent salt before hashing (the version string would serve just fine, it doesn't need to be complex) to break false positive persistence
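
The recommendations above can be sketched roughly as follows. This is an illustration under stated assumptions, not the extension's actual code: `optimal_size` and `bit_positions` are hypothetical helper names, SHA-256 stands in for "a more robust cryptographic hashing algorithm", and deriving k positions from two halves of the digest (double hashing) is one common choice among several.

```python
import hashlib
import math


def optimal_size(n_items: int, target_fp_rate: float) -> tuple[int, int]:
    """Return (bits, hash_count) that meet a target false positive rate.

    Uses the standard formulas m = -n*ln(p) / (ln 2)^2 and k = (m/n)*ln 2.
    """
    m_bits = math.ceil(-n_items * math.log(target_fp_rate) / (math.log(2) ** 2))
    k_hashes = max(1, round(m_bits / n_items * math.log(2)))
    return m_bits, k_hashes


def bit_positions(url: str, version: str, m_bits: int, k_hashes: int) -> list[int]:
    """Derive k bit positions from a SHA-256 digest salted with the version.

    The version string is the salt (an assumption taken from the suggestion
    above), so a URL maps to different bits in every release.
    """
    digest = hashlib.sha256(f"{version}\0{url}".encode("utf-8")).digest()
    h1 = int.from_bytes(digest[:8], "big")
    h2 = int.from_bytes(digest[8:16], "big") | 1  # force a non-zero step
    return [(h1 + i * h2) % m_bits for i in range(k_hashes)]


if __name__ == "__main__":
    # Hypothetical sizing: one million entries at a 1% target rate.
    m_bits, k_hashes = optimal_size(1_000_000, 0.01)
    print(m_bits, k_hashes)
    print(bit_positions("twitter.com/x0s1jpnq2sk2", "1.0.16", m_bits, k_hashes))
    print(bit_positions("twitter.com/x0s1jpnq2sk2", "1.0.17", m_bits, k_hashes))
```

Because the salt changes with each release, a URL that happens to be a false positive in one version is very unlikely to hit the same bits in the next. The trade-off is that every filter has to be rebuilt from the source list for each release.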

I am giving these recommendations to you as harm reduction. This extension, for no technically justifiable reason, centralises easily abusable power in your hands, beyond meaningful oversight. Your contempt for transparency, and for those who fear being outed by your recklessness, is unconscionable. If you have a shred of decency, you should discontinue this extension immediately.

@elle-trudgett

+1 to reducing collisions, it is a nuisance. I don't know about all that moralistic stuff. But I'd like the bloom filter resized.

@EvelynSubarrow
Author

I honestly think the extension might well be abandoned at this point. Between the increased scrutiny, Datatilsynet's decision, and the lack of an update for ~8 months... this is an unusually long gap between releases.

@Ralimbahere

The bloom filter was updated recently.

@qtlunya

qtlunya commented Feb 18, 2024

It was updated, yes, but is the problem with the collisions solved?

@EvelynSubarrow
Author

I've been told about the update but haven't looked at it yet. I'm not holding my breath either way.


4 participants