Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow empty $escape to eschew escaping CSV #3515

Closed
wants to merge 1 commit into from

Conversation

3 participants
@cmb69
Copy link
Contributor

commented Sep 13, 2018

Albeit CSV is still a widespread data exchange format, it has never been
officially standardized. There exists, however, the “informational” RFC
4180[1] which has no notion of escape characters, but rather defines
escaped as strings enclosed in double-quotes where contained
double-quotes have to be doubled. While this concept is supported by
PHP's implementation ($enclosure), the $escape sometimes interferes,
so that fgetcsv() is unable to correctly parse externally generated
CSV, and fputcsv() is sometimes generating non-compliant CSV. Since
PHP's $escape concept is availble for many years, we cannot drop it
for BC reasons (even though many consider it as bug). Instead we allow
to pass an empty string as $escape parameter to the respective
functions, which results in ignoring/omitting any escaping, and as such
is more inline with RFC 4180. It is noteworthy that this is almost no
userland BC break, since formerly most functions did not accept an empty
string, and failed in this case. The only exception was str_getcsv()
which did accept an empty string, and used a backslash as escape
character then (which appears to be unintended behavior, anyway).

The changed functions are fputcsv(), fgetcsv() and str_getcsv(),
and also the ::setCsvControl(), ::getCsvControl(), ::fputcsv(),
and ::fgetcsv() methods of SplFileObject.

The implementation also changes the type of the escape parameter of the
PHP_APIs php_fgetcsv() and php_fputcsv() from char to int, where
EOF means to ignore/omit escaping. The parameter accepts the same
values as isalpha() and friends, i.e. “the value of which shall be
representable as an unsigned char or shall equal the value of the
macro EOF. If the argument has any other value, the behavior is
undefined.” This is a subtle BC break, since the character chr(128)
has the value -1 if char is signed, and so likely would be confused
with EOF when converted to int. We consider this BC break to be
acceptable, since it's rather unlikely that anybody uses chr(128) as
escape character, and it easily can be fixed by casting all escape
arguments to unsigned char.

This patch implements the feature requests 38301[2] and 51496[3].

[1] https://tools.ietf.org/html/rfc4180
[2] https://bugs.php.net/bug.php?id=38301
[3] https://bugs.php.net/bug.php?id=51496

nyamsprod added a commit to thephpleague/csv that referenced this pull request Sep 14, 2018

Adding a polyfill for the Reader class
The polyfill for the Reader class uses a RFC418 Parser Iterator
which is used only if the escape character is the empty string and
on a PHP version where fgetcsv does not support the empty escape string

see php/php-src#3515 for more info on when
the patch will land on PHP master trunk and when it will be available

nyamsprod added a commit to thephpleague/csv that referenced this pull request Sep 14, 2018

Adding a polyfill for the Reader class
The polyfill for the Reader class uses a RFC418 Parser Iterator
which is used only if the escape character is the empty string and
on a PHP version where fgetcsv does not support the empty escape string

see php/php-src#3515 for more info on when
the patch will land on PHP master trunk and when it will be available

nyamsprod added a commit to thephpleague/csv that referenced this pull request Sep 14, 2018

Adding a polyfill for the Reader class
The polyfill for the Reader class uses a RFC418 Parser Iterator
which is used only if the escape character is the empty string and
on a PHP version where fgetcsv does not support the empty escape string

see php/php-src#3515 for more info on when
the patch will land on PHP master trunk and when it will be available

nyamsprod added a commit to thephpleague/csv that referenced this pull request Sep 14, 2018

Adding a polyfill for the Reader class
The polyfill for the Reader class uses a RFC418 Parser Iterator
which is used only if the escape character is the empty string and
on a PHP version where fgetcsv does not support the empty escape string

see php/php-src#3515 for more info on when
the patch will land on PHP master trunk and when it will be available
@php-pulls

This comment has been minimized.

Copy link

commented Sep 16, 2018

Comment on behalf of carusogabriel at php.net:

Labelling

Show resolved Hide resolved ext/spl/spl_directory.c Outdated

@cmb69 cmb69 changed the title Allow empty $escape to eschew escaping CSV [RFC] Allow empty $escape to eschew escaping CSV Sep 27, 2018

Allow empty $escape to eschew escaping CSV
Albeit CSV is still a widespread data exchange format, it has never been
officially standardized.  There exists, however, the “informational” RFC
4180[1] which has no notion of escape characters, but rather defines
`escaped` as strings enclosed in double-quotes where contained
double-quotes have to be doubled.  While this concept is supported by
PHP's implementation (`$enclosure`), the `$escape` sometimes interferes,
so that `fgetcsv()` is unable to correctly parse externally generated
CSV, and `fputcsv()` is sometimes generating non-compliant CSV.  Since
PHP's `$escape` concept is availble for many years, we cannot drop it
for BC reasons (even though many consider it as bug).  Instead we allow
to pass an empty string as `$escape` parameter to the respective
functions, which results in ignoring/omitting any escaping, and as such
is more inline with RFC 4180.  It is noteworthy that this is almost no
userland BC break, since formerly most functions did not accept an empty
string, and failed in this case.  The only exception was `str_getcsv()`
which did accept an empty string, and used a backslash as escape
character then (which appears to be unintended behavior, anyway).

The changed functions are `fputcsv()`, `fgetcsv()` and `str_getcsv()`,
and also the `::setCsvControl()`, `::getCsvControl()`, `::fputcsv()`,
and `::fgetcsv()` methods of `SplFileObject`.

The implementation also changes the type of the escape parameter of the
PHP_APIs `php_fgetcsv()` and `php_fputcsv()` from `char` to `int`, where
`PHP_CSV_NO_ESCAPE` means to ignore/omit escaping.  The parameter
accepts the same values as `isalpha()` and friends, i.e. “the value of
which shall be representable as an `unsigned char` or shall equal the
value of the macro `EOF`.  If the argument has any other value, the
behavior is undefined.”  This is a subtle BC break, since the character
`chr(128)` has the value `-1` if `char` is signed, and so likely would
be confused with `EOF` when converted to `int`.  We consider this BC
break to be acceptable, since it's rather unlikely that anybody uses
`chr(128)` as escape character, and it easily can be fixed by casting
all `escape` arguments to `unsigned char`.

This patch implements the feature requests 38301[2] and 51496[3].

[1] <https://tools.ietf.org/html/rfc4180>
[2] <https://bugs.php.net/bug.php?id=38301>
[3] <https://bugs.php.net/bug.php?id=51496>

@cmb69 cmb69 force-pushed the cmb69:csv-empty-escape branch from 12beae6 to 39487e5 Dec 2, 2018

@cmb69

This comment has been minimized.

Copy link
Contributor Author

commented Dec 2, 2018

Rebased, and introduced the macro PHP_CSV_NO_ESCAPE to mark no-escaping internally. I have also withdrawn the RFC, since it doesn't seem necessary to have an RFC for this small change (nobody objected to this during the discussion phase.

@cmb69 cmb69 changed the title [RFC] Allow empty $escape to eschew escaping CSV Allow empty $escape to eschew escaping CSV Dec 2, 2018

@cmb69

This comment has been minimized.

Copy link
Contributor Author

commented Dec 12, 2018

If there are no objections, I'm going to merge this on the next weekend.

@cmb69

This comment has been minimized.

Copy link
Contributor Author

commented Dec 15, 2018

@cmb69 cmb69 closed this Dec 15, 2018

@cmb69 cmb69 deleted the cmb69:csv-empty-escape branch Dec 15, 2018

svn2github pushed a commit to svn2github/phpdoc_en that referenced this pull request Dec 15, 2018

salathe pushed a commit to salathe/phpdoc-en that referenced this pull request Dec 15, 2018

Christoph Michael Becker

mauriciofauth pushed a commit to php-doc/en that referenced this pull request Dec 15, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.