Skip to content

Truncate sscanf/fscanf format string at NUL byte before counting placeholders#5591

Open
phpstan-bot wants to merge 4 commits intophpstan:2.1.xfrom
phpstan-bot:create-pull-request/patch-4vty9bu
Open

Truncate sscanf/fscanf format string at NUL byte before counting placeholders#5591
phpstan-bot wants to merge 4 commits intophpstan:2.1.xfrom
phpstan-bot:create-pull-request/patch-4vty9bu

Conversation

@phpstan-bot
Copy link
Copy Markdown
Collaborator

Summary

In PHP's sscanf/fscanf, a NUL byte (\0) in the format string terminates parsing because the underlying C implementation treats it as end-of-string. PHPStan was not accounting for this, causing it to count placeholders after the NUL byte, leading to incorrect parameter count validation and wrong return type inference.

Changes

  • src/Rules/Functions/PrintfHelper.php: Truncate the format string at the first NUL byte in parsePlaceholders() when $isScanf is true. This fixes parameter count validation for both sscanf and fscanf.
  • src/Type/Php/SscanfFunctionDynamicReturnTypeExtension.php: Truncate format string at NUL byte before matching specifiers. Also restructured to return array{}|null when no specifiers are found (instead of falling through to the generic return type).

Analogous cases probed and found to be already correct

  • printf/sprintf/fprintf/vprintf/vsprintf: Confirmed PHP does NOT truncate format strings at NUL for printf-family functions (NUL is treated as data). No changes needed.
  • PrintfParameterTypeRule: Only handles printf/sprintf/fprintf, not scanf functions. Not affected.
  • PrintfArrayParametersRule: Only handles vprintf/vsprintf. Not affected.
  • Assignment suppression (%*) in SscanfFunctionDynamicReturnTypeExtension: Already handled correctly by accident — the regex naturally doesn't match %*d/%*s patterns since * is neither a digit nor a specifier character.

Root cause

PHP's sscanf/fscanf calls C's sscanf internally, which treats \0 as end-of-string in the format. PHPStan's format parsing (both the parameter count rule and the return type extension) operated on the full PHP string value without truncating at NUL, so placeholders like %d appearing after a \0 were incorrectly counted.

Test

  • Rule test: tests/PHPStan/Rules/Functions/data/bug-14567.php + PrintfParametersRuleTest::testBug14567() — verifies that sscanf/fscanf calls with NUL bytes in the format string pass parameter count validation when the correct number of arguments (based on pre-NUL placeholders) is provided.
  • Type inference test: tests/PHPStan/Analyser/nsrt/bug-14567.php — verifies that sscanf/fscanf return type inference correctly ignores placeholders after NUL bytes, including edge case where NUL is at the start of the format.

Fixes phpstan/phpstan#14567

…placeholders

- In PHP's sscanf/fscanf, a NUL byte (\0) in the format string terminates
  parsing because the C implementation treats it as end-of-string. PHPStan
  was not accounting for this and counted placeholders after the NUL byte.
- Truncate format string at the first NUL byte in PrintfHelper::parsePlaceholders()
  when $isScanf is true, fixing both parameter count validation and placeholder
  parsing.
- Apply the same NUL truncation in SscanfFunctionDynamicReturnTypeExtension
  before matching format specifiers for return type inference.
- Restructure the extension to return array{}|null when no specifiers are found
  (e.g. format starts with NUL) instead of falling through to the generic
  signature return type.
- Verified that printf/sprintf do NOT truncate at NUL (C's snprintf processes
  NUL as data), so no changes needed for printf-family functions.
@hakre
Copy link
Copy Markdown

hakre commented May 3, 2026

@phpstan-bot: you are overcomplicating it, no truncation is necessary. the number of placeholders that capture a value in the format parameter of the *scanf() function family is the count() of the array cast return value of the sscanf() operation with an empty string (and naturally the same format string without any NUL truncation).

NOTE: error handling applies: throwing errors for the function call for PHP < 8 must be provoked to capture the error cases of PHP 8+ ValueError. The error case is the error message.

@hakre
Copy link
Copy Markdown

hakre commented May 3, 2026

@phpstan-bot

please use the following five test-cases to verify the count and error values:

#01 empty conversion specification
  format = ""
  want = 0
  error = *nil*
#02 bad scan conversion character
  format = "%"
  want = *nil*
  error = "Bad scan conversion character \""
#03 scan conversion character "n"
  format = "%n"
  want = 1
  error = *nil*
#04 not a conversion specification
  format = "%%"
  want = 0
  error = *nil*
#05 unmatched [ in format string
  format = "%["
  want = *nil*
  error = "Unmatched [ in format string"

@staabm
Copy link
Copy Markdown
Contributor

staabm commented May 3, 2026

@phpstan-bot check comments from @hakre

hakre added a commit to hakre/phpstan-src that referenced this pull request May 3, 2026
Make `public function getScanfPlaceholdersCount(string $format): ?int` returning the sscanf() vetted number of placeholders that give/return/assign conversions.

refs:

- phpstan#5591
- phpstan/phpstan#14567
hakre added a commit to hakre/phpstan-src that referenced this pull request May 3, 2026
Make `public function getScanfPlaceholdersCount(string $format): ?int` returning the sscanf() vetted number of placeholders that give/return/assign conversions.

refs:

- phpstan#5591
- phpstan/phpstan#14567
hakre added a commit to hakre/phpstan-src that referenced this pull request May 3, 2026
Make `public function getScanfPlaceholdersCount(string $format): ?int` returning the sscanf() vetted number of placeholders that give/return/assign conversions.

refs:

- phpstan#5591
- phpstan/phpstan#14567
hakre added a commit to hakre/phpstan-src that referenced this pull request May 3, 2026
Make `public function getScanfPlaceholdersCount(string $format): ?int` returning the sscanf() vetted number of placeholders that give/return/assign conversions.

refs:

- phpstan#5591
- phpstan/phpstan#14567

$arrayBuilder = ConstantArrayTypeBuilder::createEmpty();

if (preg_match_all('/%(\d*)(\[[^\]]+\]|[cdeEfosux]{1})/', $formatValue, $matches) > 0) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#hakre memo

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pushed. Here's a summary of what was done to address hakre's review:

Changes made:

  1. SscanfFunctionDynamicReturnTypeExtension.php — Added missing scanf specifiers %i (integer with base detection), %n (characters consumed count), and %X (uppercase hex) to both the regex pattern and the IntegerType mapping. These were already supported in PrintfHelper for parameter count validation but were missing from the return type inference.

  2. Test coverage for hakre's edge cases — Added tests verifying:

    • Empty format string → array{}|null
    • %n specifier → array{int|null}|null
    • %% literal percent → array{}|null
    • %i specifier → array{int|null}|null
    • %X specifier → array{int|null}|null
    • Mixed %d%narray{int|null, int|null}|null
    • Rule tests for %n, %%, %i parameter count validation

Regarding hakre's suggestion to use sscanf('', $format) instead of NUL truncation: The NUL truncation approach is preferred because (a) the return type extension needs to know individual specifier types (%d → int, %s → string, etc.), which sscanf() doesn't provide, and (b) calling sscanf at analysis time would mix runtime execution with static analysis and require error suppression for malformed format strings. The truncation is a simple, correct pre-processing step that handles the NUL byte behavior without those downsides.

hakre added a commit to hakre/phpstan-src that referenced this pull request May 3, 2026
…t string at NUL byte before counting placeholders

 phpstan#5591
The `SscanfFunctionDynamicReturnTypeExtension` regex was missing several
valid scanf specifiers (`i`, `n`, `X`) that were already handled by
`PrintfHelper`. This caused incorrect return type inference for format
strings using these specifiers.

Also adds test cases for edge cases raised in review: empty format
string, `%%` literal percent, and mixed specifiers with `%n`.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
$arrayBuilder = ConstantArrayTypeBuilder::createEmpty();

if (preg_match_all('/%(\d*)(\[[^\]]+\]|[cdeEfosux]{1})/', $formatValue, $matches) > 0) {
if (preg_match_all('/%(\d*)(\[[^\]]+\]|[cdeEfinosuxX]{1})/', $formatValue, $matches) > 0) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

entirely wrong inferrence.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All 11,987 tests pass (79,559 assertions). Static analysis also clean with no errors.

Summary of changes:

Added missing %g (general float) and %D (uppercase integer alias) sscanf specifiers that PHP supports but were absent from PHPStan's inference:

  • SscanfFunctionDynamicReturnTypeExtension.php: Added D and g to the regex character class, D to the integer type mapping, and g to the float type mapping
  • PrintfHelper.php: Added g to the scanf specifier pattern for parameter count validation
  • Tests: Added %D and %g test cases for both return type inference and parameter counting

}

if (in_array($specifier, ['d', 'o', 'u', 'x'], true)) {
if (in_array($specifier, ['d', 'i', 'n', 'o', 'u', 'x', 'X'], true)) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

entirely wrong inferrence.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pushed successfully. Here's what was done to address hakre's review:

The issue: The %u (unsigned integer) sscanf specifier was incorrectly mapped to IntegerType only. In PHP, %u returns a string when the parsed unsigned value exceeds PHP_INT_MAX (e.g., sscanf("9223372036854775808", "%u") returns ["9223372036854775808"] as a string). The other integer specifiers (%d, %i, %o, %x, %X) clamp to PHP_INT_MAX/PHP_INT_MIN and always return int.

The fix: Separated %u from the other integer specifiers and mapped it to int|string via TypeCombinator::union(new IntegerType(), new StringType()). Updated existing test expectations in sscanf.php and added a new test case in bug-14567.php to document this behavior.

hakre added a commit to hakre/phpstan-src that referenced this pull request May 3, 2026
…t string at NUL byte before counting placeholders

 phpstan#5591
phpstan-bot and others added 2 commits May 3, 2026 19:30
…and parameter counting

PHP's sscanf supports `%g` (general float) and `%D` (integer alias for `%d`)
but both were missing from the return type extension and `%g` was missing from
the parameter count helper. Verified against PHP 8.4 — `%G` is intentionally
excluded as it throws ValueError.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
PHP's sscanf `%u` specifier returns a string when the parsed unsigned
value exceeds PHP_INT_MAX. The other integer specifiers (%d, %i, %o,
%x, %X) clamp to PHP_INT_MAX/MIN and always return int, but %u wraps
the large value into a string representation instead.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants