Fix no-op chardet MINIMUM_THRESHOLD patch in dirtyPatches() (#6024) #6066

Merged
stamparm merged 1 commit into sqlmapproject:master from potato-20:fix-chardet-threshold-noop on Jun 4, 2026

@potato-20 (Contributor)

Fixes #6024.

The bug

dirtyPatches() in lib/core/patch.py does:

# to prevent too much "guessing" in case of binary data retrieval
thirdparty.chardet.universaldetector.MINIMUM_THRESHOLD = 0.90

But MINIMUM_THRESHOLD is a class attribute of UniversalDetector (thirdparty/chardet/universaldetector.py), read as self.MINIMUM_THRESHOLD. The line above instead creates an unused module-level attribute, so it has been a no-op: the effective threshold has stayed at the class default of 0.20, not the intended 0.90.

module attr set to : 0.9
value actually read : 0.2   # class default -> patch ineffective
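A minimal, self-contained sketch of why the original assignment is a no-op. It does not import sqlmap's vendored chardet; it builds a hypothetical stand-in module and class that only mimic the relevant layout (a class attribute read through self). The accepts() method is illustrative, not part of chardet's API.

```python
import types

# Hypothetical stand-in for thirdparty.chardet.universaldetector. Only the
# layout matters here: MINIMUM_THRESHOLD is a class attribute read via self.
universaldetector = types.ModuleType("universaldetector")


class UniversalDetector:
    MINIMUM_THRESHOLD = 0.20  # class-level default

    def accepts(self, confidence):
        # Illustrative: mirrors the "best prober confidence > threshold" check.
        # The lookup goes instance -> class; the module is never consulted.
        return confidence > self.MINIMUM_THRESHOLD


universaldetector.UniversalDetector = UniversalDetector

# The original patch: creates a new, unused module-level attribute.
universaldetector.MINIMUM_THRESHOLD = 0.90

detector = universaldetector.UniversalDetector()
print("module attr set to  :", universaldetector.MINIMUM_THRESHOLD)  # 0.9
print("value actually read :", detector.MINIMUM_THRESHOLD)           # 0.2
print("0.45 guess accepted :", detector.accepts(0.45))               # True
```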

The fix (1 line)

Assign the class attribute that is actually read, so the patch does what its comment says:

thirdparty.chardet.universaldetector.UniversalDetector.MINIMUM_THRESHOLD = 0.90

(The code change touches only lib/core/patch.py; thirdparty/ is left untouched. The data/txt/sha256sums.txt entry for the modified file is regenerated, and --smoke passes.)
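The same idea with the fixed assignment, again using a simplified, hypothetical stand-in class rather than the real chardet code (accepts() is illustrative). Because self.MINIMUM_THRESHOLD falls through to the class when an instance has no attribute of its own, rebinding the class attribute affects new and already-created detectors alike.

```python
# Hypothetical stand-in; the real class lives in
# thirdparty/chardet/universaldetector.py.
class UniversalDetector:
    MINIMUM_THRESHOLD = 0.20

    def accepts(self, confidence):
        # Illustrative threshold check, resolved through the class attribute.
        return confidence > self.MINIMUM_THRESHOLD


existing = UniversalDetector()  # instance created before the patch

# The fixed patch: rebind the attribute that self.MINIMUM_THRESHOLD resolves to.
UniversalDetector.MINIMUM_THRESHOLD = 0.90

fresh = UniversalDetector()
print(fresh.accepts(0.45))     # False: low-confidence guess now rejected
print(existing.accepts(0.45))  # False: older instances see the new value too
print(fresh.accepts(0.99))     # True: confident detections still pass
```

In sqlmap the patch runs once at startup, and, assuming the vendored chardet.detect() builds a fresh UniversalDetector per call as upstream chardet does, every detection after startup sees 0.90.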

Behavioural impact (kept minimal/safe)

chardet.detect() is used only as a last-resort charset fallback in lib/request/basic.py, i.e. when a response declares no charset in its Content-Type header and contains no <meta charset>. With the threshold actually applied:

  • Pure-ASCII pages: unaffected (short-circuit to ascii).
  • Multibyte encodings (UTF-8, Shift-JIS, GBK) and the single-byte Cyrillic windows-1251: still detected on any real-length body (confidence well above 0.90).
  • The only recurring change is that short single-byte (e.g. latin-1) bodies now return None instead of a ~0.2–0.7 guess. That is neutral: sqlmap then falls back to DEFAULT_PAGE_ENCODING = "iso-8859-1", which decodes such latin-1 bodies to exactly the same text. Page-comparison verdicts are unchanged.
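A rough sketch of the "neutral for latin-1" argument. choose_charset() is a hypothetical helper, not the actual logic in lib/request/basic.py; the "ISO-8859-1" string stands in for whatever low-confidence latin-1 guess chardet used to return, and DEFAULT_PAGE_ENCODING is the value quoted above.

```python
DEFAULT_PAGE_ENCODING = "iso-8859-1"  # value quoted in this PR


def choose_charset(detected):
    # Hypothetical helper: use the detector's guess if any, else the default.
    return detected or DEFAULT_PAGE_ENCODING


body = b"caf\xe9 ni\xf1o"  # short single-byte (latin-1) body

old = body.decode(choose_charset("ISO-8859-1"))  # before: low-confidence guess kept
new = body.decode(choose_charset(None))          # after: detector returns None

print(old == new)  # True

# ISO-8859-1 maps every byte 0x00-0xFF to one code point, so the fallback
# never raises UnicodeDecodeError and round-trips arbitrary bytes.
raw = bytes(range(256))
print(raw.decode(DEFAULT_PAGE_ENCODING).encode(DEFAULT_PAGE_ENCODING) == raw)  # True
```

This covers only genuinely latin-1 content; a low-confidence guess of a different single-byte codepage would have decoded differently, which is the over-guessing the threshold is meant to suppress.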

So the change suppresses exactly the low-confidence over-guessing the original comment was written to prevent, with no detection regression.

Open question for the maintainer

Since the effective threshold has been 0.20 for years, there are two valid resolutions, and I'm happy to switch to whichever you prefer:

  1. (this PR) correct the target so the intended 0.90 takes effect, or
  2. simply remove the dead line and codify the 0.20 that has shipped in practice.

I defaulted to (1) because it matches the documented intent of the code.

dirtyPatches() set MINIMUM_THRESHOLD as a module-level attribute on
thirdparty.chardet.universaldetector, but the value is read as a class
attribute (UniversalDetector.MINIMUM_THRESHOLD, accessed as self.MINIMUM_THRESHOLD
and compared against the best prober's get_confidence() value). The module-level
assignment was therefore a no-op and the effective threshold stayed at the 0.20
default instead of the intended 0.90, so low-confidence charset guesses on
binary/ambiguous response data were accepted rather than rejected.

Assign the class attribute so the patch takes effect. Verified: with the fix
an ambiguous 24-byte sample now resolves to encoding=None (was a spurious
0.45-confidence 'IBM855'/Russian guess). Regenerated the sha256sums.txt entry
for the modified file; smoke test passes.
stamparm merged commit 762037e into sqlmapproject:master on Jun 4, 2026

Linked issue: patching chardet.universaldetector.MINIMUM_THRESHOLD