-
Notifications
You must be signed in to change notification settings - Fork 7.9k
Allow empty $escape to eschew escaping CSV #3515
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The polyfill for the Reader class uses a RFC418 Parser Iterator which is used only if the escape character is the empty string and on a PHP version where fgetcsv does not support the empty escape string see php/php-src#3515 for more info on when the patch will land on PHP master trunk and when it will be available
The polyfill for the Reader class uses a RFC418 Parser Iterator which is used only if the escape character is the empty string and on a PHP version where fgetcsv does not support the empty escape string see php/php-src#3515 for more info on when the patch will land on PHP master trunk and when it will be available
The polyfill for the Reader class uses a RFC418 Parser Iterator which is used only if the escape character is the empty string and on a PHP version where fgetcsv does not support the empty escape string see php/php-src#3515 for more info on when the patch will land on PHP master trunk and when it will be available
The polyfill for the Reader class uses a RFC418 Parser Iterator which is used only if the escape character is the empty string and on a PHP version where fgetcsv does not support the empty escape string see php/php-src#3515 for more info on when the patch will land on PHP master trunk and when it will be available
Comment on behalf of carusogabriel at php.net: Labelling |
Albeit CSV is still a widespread data exchange format, it has never been officially standardized. There exists, however, the “informational” RFC 4180[1] which has no notion of escape characters, but rather defines `escaped` as strings enclosed in double-quotes where contained double-quotes have to be doubled. While this concept is supported by PHP's implementation (`$enclosure`), the `$escape` sometimes interferes, so that `fgetcsv()` is unable to correctly parse externally generated CSV, and `fputcsv()` is sometimes generating non-compliant CSV. Since PHP's `$escape` concept is availble for many years, we cannot drop it for BC reasons (even though many consider it as bug). Instead we allow to pass an empty string as `$escape` parameter to the respective functions, which results in ignoring/omitting any escaping, and as such is more inline with RFC 4180. It is noteworthy that this is almost no userland BC break, since formerly most functions did not accept an empty string, and failed in this case. The only exception was `str_getcsv()` which did accept an empty string, and used a backslash as escape character then (which appears to be unintended behavior, anyway). The changed functions are `fputcsv()`, `fgetcsv()` and `str_getcsv()`, and also the `::setCsvControl()`, `::getCsvControl()`, `::fputcsv()`, and `::fgetcsv()` methods of `SplFileObject`. The implementation also changes the type of the escape parameter of the PHP_APIs `php_fgetcsv()` and `php_fputcsv()` from `char` to `int`, where `PHP_CSV_NO_ESCAPE` means to ignore/omit escaping. The parameter accepts the same values as `isalpha()` and friends, i.e. “the value of which shall be representable as an `unsigned char` or shall equal the value of the macro `EOF`. If the argument has any other value, the behavior is undefined.” This is a subtle BC break, since the character `chr(128)` has the value `-1` if `char` is signed, and so likely would be confused with `EOF` when converted to `int`. We consider this BC break to be acceptable, since it's rather unlikely that anybody uses `chr(128)` as escape character, and it easily can be fixed by casting all `escape` arguments to `unsigned char`. This patch implements the feature requests 38301[2] and 51496[3]. [1] <https://tools.ietf.org/html/rfc4180> [2] <https://bugs.php.net/bug.php?id=38301> [3] <https://bugs.php.net/bug.php?id=51496>
12beae6
to
39487e5
Compare
Rebased, and introduced the macro |
If there are no objections, I'm going to merge this on the next weekend. |
Applied as 3b0f051. PS: Docs: http://svn.php.net/viewvc?view=revision&revision=346346. |
Cf. <php/php-src#3515>. git-svn-id: http://svn.php.net/repository/phpdoc/en@346346 c90b9560-bf6c-de11-be94-00142212c4b1
Cf. <php/php-src#3515>. git-svn-id: https://svn.php.net/repository/phpdoc/en/trunk@346346 c90b9560-bf6c-de11-be94-00142212c4b1
From PHP 7.4, the recommended way to disable the escape character for fgetcsv() and fputcsv() is an empty string, instead of "\0". Discussed here: php/php-src#3515
From PHP 7.4, the recommended way to disable the escape character for fgetcsv() and fputcsv() is an empty string, instead of "\0". Discussed here: php/php-src#3515
Cf. <php/php-src#3515>. git-svn-id: https://svn.php.net/repository/phpdoc/en/trunk@346346 c90b9560-bf6c-de11-be94-00142212c4b1
Cf. <php/php-src#3515>. git-svn-id: https://svn.php.net/repository/phpdoc/en/trunk@346346 c90b9560-bf6c-de11-be94-00142212c4b1
The polyfill for the Reader class uses a RFC418 Parser Iterator which is used only if the escape character is the empty string and on a PHP version where fgetcsv does not support the empty escape string see php/php-src#3515 for more info on when the patch will land on PHP master trunk and when it will be available
The polyfill for the Reader class uses a RFC418 Parser Iterator which is used only if the escape character is the empty string and on a PHP version where fgetcsv does not support the empty escape string see php/php-src#3515 for more info on when the patch will land on PHP master trunk and when it will be available
Albeit CSV is still a widespread data exchange format, it has never been
officially standardized. There exists, however, the “informational” RFC
4180[1] which has no notion of escape characters, but rather defines
escaped
as strings enclosed in double-quotes where containeddouble-quotes have to be doubled. While this concept is supported by
PHP's implementation (
$enclosure
), the$escape
sometimes interferes,so that
fgetcsv()
is unable to correctly parse externally generatedCSV, and
fputcsv()
is sometimes generating non-compliant CSV. SincePHP's
$escape
concept is availble for many years, we cannot drop itfor BC reasons (even though many consider it as bug). Instead we allow
to pass an empty string as
$escape
parameter to the respectivefunctions, which results in ignoring/omitting any escaping, and as such
is more inline with RFC 4180. It is noteworthy that this is almost no
userland BC break, since formerly most functions did not accept an empty
string, and failed in this case. The only exception was
str_getcsv()
which did accept an empty string, and used a backslash as escape
character then (which appears to be unintended behavior, anyway).
The changed functions are
fputcsv()
,fgetcsv()
andstr_getcsv()
,and also the
::setCsvControl()
,::getCsvControl()
,::fputcsv()
,and
::fgetcsv()
methods ofSplFileObject
.The implementation also changes the type of the escape parameter of the
PHP_APIs
php_fgetcsv()
andphp_fputcsv()
fromchar
toint
, whereEOF
means to ignore/omit escaping. The parameter accepts the samevalues as
isalpha()
and friends, i.e. “the value of which shall berepresentable as an
unsigned char
or shall equal the value of themacro
EOF
. If the argument has any other value, the behavior isundefined.” This is a subtle BC break, since the character
chr(128)
has the value
-1
ifchar
is signed, and so likely would be confusedwith
EOF
when converted toint
. We consider this BC break to beacceptable, since it's rather unlikely that anybody uses
chr(128)
asescape character, and it easily can be fixed by casting all
escape
arguments to
unsigned char
.This patch implements the feature requests 38301[2] and 51496[3].
[1] https://tools.ietf.org/html/rfc4180
[2] https://bugs.php.net/bug.php?id=38301
[3] https://bugs.php.net/bug.php?id=51496