Truncate sscanf/fscanf format string at NUL byte before counting placeholders#5591
Truncate sscanf/fscanf format string at NUL byte before counting placeholders#5591phpstan-bot wants to merge 4 commits intophpstan:2.1.xfrom
sscanf/fscanf format string at NUL byte before counting placeholders#5591Conversation
…placeholders
- In PHP's sscanf/fscanf, a NUL byte (\0) in the format string terminates
parsing because the C implementation treats it as end-of-string. PHPStan
was not accounting for this and counted placeholders after the NUL byte.
- Truncate format string at the first NUL byte in PrintfHelper::parsePlaceholders()
when $isScanf is true, fixing both parameter count validation and placeholder
parsing.
- Apply the same NUL truncation in SscanfFunctionDynamicReturnTypeExtension
before matching format specifiers for return type inference.
- Restructure the extension to return array{}|null when no specifiers are found
(e.g. format starts with NUL) instead of falling through to the generic
signature return type.
- Verified that printf/sprintf do NOT truncate at NUL (C's snprintf processes
NUL as data), so no changes needed for printf-family functions.
|
@phpstan-bot: you are overcomplicating it, no truncation is necessary. the number of placeholders that capture a value in the format parameter of the *scanf() function family is the count() of the array cast return value of the sscanf() operation with an empty string (and naturally the same format string without any NUL truncation). NOTE: error handling applies: throwing errors for the function call for PHP < 8 must be provoked to capture the error cases of PHP 8+ ValueError. The error case is the error message. |
|
please use the following five test-cases to verify the count and error values: |
|
@phpstan-bot check comments from @hakre |
Make `public function getScanfPlaceholdersCount(string $format): ?int` returning the sscanf() vetted number of placeholders that give/return/assign conversions. refs: - phpstan#5591 - phpstan/phpstan#14567
Make `public function getScanfPlaceholdersCount(string $format): ?int` returning the sscanf() vetted number of placeholders that give/return/assign conversions. refs: - phpstan#5591 - phpstan/phpstan#14567
Make `public function getScanfPlaceholdersCount(string $format): ?int` returning the sscanf() vetted number of placeholders that give/return/assign conversions. refs: - phpstan#5591 - phpstan/phpstan#14567
Make `public function getScanfPlaceholdersCount(string $format): ?int` returning the sscanf() vetted number of placeholders that give/return/assign conversions. refs: - phpstan#5591 - phpstan/phpstan#14567
|
|
||
| $arrayBuilder = ConstantArrayTypeBuilder::createEmpty(); | ||
|
|
||
| if (preg_match_all('/%(\d*)(\[[^\]]+\]|[cdeEfosux]{1})/', $formatValue, $matches) > 0) { |
There was a problem hiding this comment.
Pushed. Here's a summary of what was done to address hakre's review:
Changes made:
-
SscanfFunctionDynamicReturnTypeExtension.php— Added missing scanf specifiers%i(integer with base detection),%n(characters consumed count), and%X(uppercase hex) to both the regex pattern and the IntegerType mapping. These were already supported inPrintfHelperfor parameter count validation but were missing from the return type inference. -
Test coverage for hakre's edge cases — Added tests verifying:
- Empty format string →
array{}|null %nspecifier →array{int|null}|null%%literal percent →array{}|null%ispecifier →array{int|null}|null%Xspecifier →array{int|null}|null- Mixed
%d%n→array{int|null, int|null}|null - Rule tests for
%n,%%,%iparameter count validation
- Empty format string →
Regarding hakre's suggestion to use sscanf('', $format) instead of NUL truncation: The NUL truncation approach is preferred because (a) the return type extension needs to know individual specifier types (%d → int, %s → string, etc.), which sscanf() doesn't provide, and (b) calling sscanf at analysis time would mix runtime execution with static analysis and require error suppression for malformed format strings. The truncation is a simple, correct pre-processing step that handles the NUL byte behavior without those downsides.
…t string at NUL byte before counting placeholders phpstan#5591
The `SscanfFunctionDynamicReturnTypeExtension` regex was missing several valid scanf specifiers (`i`, `n`, `X`) that were already handled by `PrintfHelper`. This caused incorrect return type inference for format strings using these specifiers. Also adds test cases for edge cases raised in review: empty format string, `%%` literal percent, and mixed specifiers with `%n`. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
| $arrayBuilder = ConstantArrayTypeBuilder::createEmpty(); | ||
|
|
||
| if (preg_match_all('/%(\d*)(\[[^\]]+\]|[cdeEfosux]{1})/', $formatValue, $matches) > 0) { | ||
| if (preg_match_all('/%(\d*)(\[[^\]]+\]|[cdeEfinosuxX]{1})/', $formatValue, $matches) > 0) { |
There was a problem hiding this comment.
All 11,987 tests pass (79,559 assertions). Static analysis also clean with no errors.
Summary of changes:
Added missing %g (general float) and %D (uppercase integer alias) sscanf specifiers that PHP supports but were absent from PHPStan's inference:
SscanfFunctionDynamicReturnTypeExtension.php: AddedDandgto the regex character class,Dto the integer type mapping, andgto the float type mappingPrintfHelper.php: Addedgto the scanf specifier pattern for parameter count validation- Tests: Added
%Dand%gtest cases for both return type inference and parameter counting
| } | ||
|
|
||
| if (in_array($specifier, ['d', 'o', 'u', 'x'], true)) { | ||
| if (in_array($specifier, ['d', 'i', 'n', 'o', 'u', 'x', 'X'], true)) { |
There was a problem hiding this comment.
Pushed successfully. Here's what was done to address hakre's review:
The issue: The %u (unsigned integer) sscanf specifier was incorrectly mapped to IntegerType only. In PHP, %u returns a string when the parsed unsigned value exceeds PHP_INT_MAX (e.g., sscanf("9223372036854775808", "%u") returns ["9223372036854775808"] as a string). The other integer specifiers (%d, %i, %o, %x, %X) clamp to PHP_INT_MAX/PHP_INT_MIN and always return int.
The fix: Separated %u from the other integer specifiers and mapped it to int|string via TypeCombinator::union(new IntegerType(), new StringType()). Updated existing test expectations in sscanf.php and added a new test case in bug-14567.php to document this behavior.
…t string at NUL byte before counting placeholders phpstan#5591
…and parameter counting PHP's sscanf supports `%g` (general float) and `%D` (integer alias for `%d`) but both were missing from the return type extension and `%g` was missing from the parameter count helper. Verified against PHP 8.4 — `%G` is intentionally excluded as it throws ValueError. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
PHP's sscanf `%u` specifier returns a string when the parsed unsigned value exceeds PHP_INT_MAX. The other integer specifiers (%d, %i, %o, %x, %X) clamp to PHP_INT_MAX/MIN and always return int, but %u wraps the large value into a string representation instead. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
In PHP's
sscanf/fscanf, a NUL byte (\0) in the format string terminates parsing because the underlying C implementation treats it as end-of-string. PHPStan was not accounting for this, causing it to count placeholders after the NUL byte, leading to incorrect parameter count validation and wrong return type inference.Changes
src/Rules/Functions/PrintfHelper.php: Truncate the format string at the first NUL byte inparsePlaceholders()when$isScanfis true. This fixes parameter count validation for bothsscanfandfscanf.src/Type/Php/SscanfFunctionDynamicReturnTypeExtension.php: Truncate format string at NUL byte before matching specifiers. Also restructured to returnarray{}|nullwhen no specifiers are found (instead of falling through to the generic return type).Analogous cases probed and found to be already correct
printf/sprintf/fprintf/vprintf/vsprintf: Confirmed PHP does NOT truncate format strings at NUL for printf-family functions (NUL is treated as data). No changes needed.PrintfParameterTypeRule: Only handlesprintf/sprintf/fprintf, not scanf functions. Not affected.PrintfArrayParametersRule: Only handlesvprintf/vsprintf. Not affected.%*) inSscanfFunctionDynamicReturnTypeExtension: Already handled correctly by accident — the regex naturally doesn't match%*d/%*spatterns since*is neither a digit nor a specifier character.Root cause
PHP's
sscanf/fscanfcalls C'ssscanfinternally, which treats\0as end-of-string in the format. PHPStan's format parsing (both the parameter count rule and the return type extension) operated on the full PHP string value without truncating at NUL, so placeholders like%dappearing after a\0were incorrectly counted.Test
tests/PHPStan/Rules/Functions/data/bug-14567.php+PrintfParametersRuleTest::testBug14567()— verifies thatsscanf/fscanfcalls with NUL bytes in the format string pass parameter count validation when the correct number of arguments (based on pre-NUL placeholders) is provided.tests/PHPStan/Analyser/nsrt/bug-14567.php— verifies thatsscanf/fscanfreturn type inference correctly ignores placeholders after NUL bytes, including edge case where NUL is at the start of the format.Fixes phpstan/phpstan#14567