Reduce false positives (those not caused by WAFs or bot detection) #2068

Merged
merged 24 commits into from
May 4, 2024


@ppfeister ppfeister commented Apr 8, 2024

The following targets were fixed or partially fixed:
Archive[.]org (eh. sometimes. it has a loading problem that causes other F+s... some were fixed)
Archive of Our Own
CGTrader
CNET
Contently
Eintracht Frankfurt Forum
GeeksForGeeks
Genius[.]com Artists
Genius[.]com Users
Gumroad
HackerNews
HackerRank
IFTTT
Kongregate
Linktree
OpenStreetMap
Pinkbike
Polymart (often hits bot detection, but can be bypassed with some proxies)
Slides
Splits[.]io
Strava
Telegram
xHamster
YandexMusic
eintracht
jeuxvideo

The following targets were removed:
BitcoinForum (likely defunct)
G2G
HexRPG (auth wall)
Metacritic
ModelHub (defunct)
Oracle Communities (auth wall)

Misc changes:
Default User Agent updated


ModelHub was not added to ./removed_sites.[md|json] as the platform itself is confirmed to be shutting down (and will therefore never return to Sherlock). The other removed targets were documented normally.

Fiverr, Euw, NationStates Nations, NationStates Regions, and a couple of others remain occasionally problematic because they sit behind WAFs and bot detection services. These WAF-induced false positives are addressed by sister PR #2069, and partial support for a decent proxy is being discussed in #2081.


Issues:
Fixes #904
Fixes #1966
Fixes #1999 (with sister PR #2069)
Fixes #2027 (with sister PR #2069)
Fixes #2071

Collateral (trumped or negated):
Closes #1843 // Removes jeuxvideo instead of applying a fix (a fix is being applied here).
Closes #2083 // Removes ModelHub, which is an action taken here. Would be negated by merge.
Closes #2096 // Fixes Archive[.]org, which is an action taken here. Would be negated by merge.

The following targets were fixed:
Archive[.]org
CGTrader
CNET
Contently
IFTTT
Linktree
xHamster

The following targets were removed:
HexRPG (auth wall)
ModelHub (defunct)
Oracle Communities (auth wall)

ModelHub was not added to ./removed_sites.md as the platform itself is shutting down (and will therefore never return to Sherlock). The other removed targets were documented normally.

BitcoinForum is currently down and suspected to be defunct. Since this is uncertain, however, a test condition was added to suppress false positives while allowing for normal operation upon the forum's return.
@ppfeister ppfeister changed the title Reduce false positives Reduce non-CF false positives Apr 8, 2024
@ppfeister ppfeister mentioned this pull request Apr 8, 2024
Error codes module expanded to support arrays of error codes rather than only one.
Using this new functionality, Slides was set to error codes 404 (as standard) AND 204 (non-standard), to accommodate that website's odd edge case.
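For illustration only, here is a minimal sketch of how a check against one or several error codes might look. It is not the code from this PR: `normalize_error_codes` and `is_claimed` are hypothetical names, and the rule that any other 2xx response counts as a hit is an assumption.

```python
from typing import Iterable, List, Union

# A site may configure one error code, several, or none at all.
ErrorCodes = Union[int, Iterable[int], None]


def normalize_error_codes(codes: ErrorCodes) -> List[int]:
    """Return the configured error codes as a list (hypothetical helper)."""
    if codes is None:
        return []
    if isinstance(codes, int):
        return [codes]
    return list(codes)


def is_claimed(status_code: int, error_codes: ErrorCodes = None) -> bool:
    """Assumed rule: a listed error code means the username was not found;
    otherwise any 2xx response counts as a claimed username."""
    if status_code in normalize_error_codes(error_codes):
        return False
    return 200 <= status_code < 300


# A Slides-like configuration: both 404 and 204 mean "not found".
slides_codes = [404, 204]
```

With `slides_codes`, a 204 response is treated as "not found", whereas without any configured codes the same 204 would be counted as a hit under the assumed 2xx rule.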
@ppfeister ppfeister mentioned this pull request Apr 9, 2024
@ppfeister ppfeister force-pushed the master branch 2 times, most recently from 040100e to b6564a8 Compare April 9, 2024 16:44
@ppfeister ppfeister mentioned this pull request Apr 10, 2024
Attempts were met with a Varnish error page presenting 54113 (possibly Fastly related).

Change to User Agent necessary to avoid Varnish/Fastly issues.
Change to Accept necessary to avoid infinite 302 redirection.
Without BOTH of these changes, attempts will fail.

Both changes being made also permit the use of status_code rather than message.
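A rough sketch of applying both header overrides together is shown here, under stated assumptions. `DEFAULT_HEADERS`, `build_headers`, and every header value are placeholders, since the actual User-Agent and Accept strings are not given in this thread.

```python
from typing import Dict, Optional

# Placeholder default; the real default User-Agent is not stated here.
DEFAULT_HEADERS: Dict[str, str] = {
    "User-Agent": "example-default-agent/1.0",
}


def build_headers(site_headers: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Merge per-site header overrides on top of the defaults (hypothetical)."""
    headers = dict(DEFAULT_HEADERS)      # copy so the defaults stay untouched
    headers.update(site_headers or {})   # per-site values win on conflict
    return headers


# Both overrides are supplied together, since either one alone is reported to fail.
overrides = {
    "User-Agent": "example-site-agent/1.0",  # placeholder value
    "Accept": "text/html",                   # placeholder value
}
merged = build_headers(overrides)
```

The merged dictionary could then be passed as the request headers for that one target, leaving the defaults in place for every other site.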
@ppfeister
Member Author

ppfeister commented Apr 11, 2024

A lil bit larger of a pr than first expected........

I think that's all of em, though... Well, I hope that's all of em
When combined with #2069, I'm no longer experiencing any false positives in my results. I'm sure there's some username combo that causes one, but none that I've stumbled upon

For general testing purposes, ppfeister:rc/combopoc currently reflects master with this, #2069, #2070, and #2092 all merged.

(ppfeister:rc/combopoc itself has a messy merge history though -- don't use that branch for any merging upstream)

Cheers.

@ppfeister ppfeister changed the title Reduce non-CF false positives Reduce non-WAF false positives Apr 11, 2024
I've noticed that many bot-detection pages are able to be avoided by using this UA. Unless there's a reason to stay on the old old old one, we may as well update it and reduce our WAF hits.
@ppfeister ppfeister changed the title Reduce non-WAF false positives Reduce false positives (those not caused by WAFs or bot detection) Apr 19, 2024
@sdushantha
Member

Thank you so much @ppfeister for fixing all of this, I really appreciate it!

@sdushantha sdushantha merged commit 482982c into sherlock-project:master May 4, 2024
2 checks passed
2 participants