Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Serializer][CSV] Add context options to handle BOM #33896

Merged
merged 1 commit into from Oct 13, 2019

Conversation

@malarzm
Copy link
Contributor

commented Oct 7, 2019

Q A
Branch? 4.4
Bug fix? no
New feature? yes
Deprecations? no
Tickets Fix #33684
License MIT
Doc PR symfony/symfony-docs#12461

This allows BOM handling in en/decoded CSV files. To keep current behaviour intact both skipping BOM at the beginning of the CSV and outputting BOM are an opt-in feature.

Personally I'd propose to make SKIP_INPUT_BOM default to false in 5.0 so the BOM is transparent and people that for some reasons expect BOM characters to be present in the parsed text explicitly opt-out of trimming it.

@malarzm malarzm requested a review from dunglas as a code owner Oct 7, 2019
@malarzm malarzm changed the title Add context options to handle BOM [Serializer][CSV] Add context options to handle BOM Oct 7, 2019
@malarzm malarzm force-pushed the malarzm:csv-bom branch from 618b90d to c1d9694 Oct 7, 2019
@nicolas-grekas nicolas-grekas added this to the next milestone Oct 8, 2019
Copy link
Member

left a comment

When decoding a value, we should always deal with the BOM to me, per the robustness principle.
Then, does it really make sense to add an option to prefix the BOM in the output, when the alternative is just a concatention away? I'm not sure...

src/Symfony/Component/Serializer/ByteOrderMark.php Outdated Show resolved Hide resolved
@stof

This comment has been minimized.

Copy link
Member

commented Oct 8, 2019

I agree with @nicolas-grekas that we should always handle BOM in the input.
And I'm not sure we need the ByteOrderMask class.

@malarzm

This comment has been minimized.

Copy link
Contributor Author

commented Oct 8, 2019

@nicolas-grekas so you're proposing we always add BOM to output and handle it in input (if present)?

@nicolas-grekas

This comment has been minimized.

Copy link
Member

commented Oct 8, 2019

I'm proposing to never add the BOM in the output, and always handle it in the input.

@malarzm

This comment has been minimized.

Copy link
Contributor Author

commented Oct 8, 2019

I'm 👎 for never adding BOM in the output as then I'm as a consumer responsible for checking whether data is serialized to CSV and if so, add BOM to it.

@stof

This comment has been minimized.

Copy link
Member

commented Oct 8, 2019

@malarzm as @nicolas-grekas said, you can still concatenate the BOM before the serialized data.

@malarzm

This comment has been minimized.

Copy link
Contributor Author

commented Oct 8, 2019

@nicolas-grekas @stof I'm not sure when exactly you want to manually concatenate BOM, may I ask for an example how do you see it?

@malarzm malarzm force-pushed the malarzm:csv-bom branch 2 times, most recently from ebe7fd3 to f6b1405 Oct 8, 2019
@malarzm

This comment has been minimized.

Copy link
Contributor Author

commented Oct 8, 2019

@nicolas-grekas I've removed the ByteOrderMask class keeping only UTF-8 BOM in the encoder itself, the BOM is always stripped while decoding and, for now, I've left a bool switch to include BOM in encoded string as I believe this is encoder's responsibility.

@dunglas
dunglas approved these changes Oct 9, 2019
@malarzm malarzm force-pushed the malarzm:csv-bom branch from f6b1405 to 424da8b Oct 9, 2019
@malarzm malarzm force-pushed the malarzm:csv-bom branch 3 times, most recently from 93602d0 to 35d8e61 Oct 9, 2019
@fabpot
fabpot approved these changes Oct 13, 2019
Copy link
Member

left a comment

Small typo

@@ -5,6 +5,7 @@ CHANGELOG
-----

* deprecated the `XmlEncoder::TYPE_CASE_ATTRIBUTES` constant, use `XmlEncoder::TYPE_CAST_ATTRIBUTES` instead
* added option to output an UTF-8 BOM in CSV encoder via `CsvEncoder::OUTPUT_UTF8_BOM_KEY` context option

This comment has been minimized.

Copy link
@fabpot

fabpot Oct 13, 2019

Member

a UTF-8

@@ -114,6 +119,14 @@ public function encode($data, $format, array $context = [])
$value = stream_get_contents($handle);
fclose($handle);
if ($outputBom) {
if (!preg_match('//u', $value)) {
throw new UnexpectedValueException('You are trying to add an UTF-8 BOM to a non UTF-8 text.');

This comment has been minimized.

Copy link
@fabpot

fabpot Oct 13, 2019

Member

a UTF-8

@malarzm malarzm force-pushed the malarzm:csv-bom branch from 35d8e61 to 3eb3668 Oct 13, 2019
@malarzm

This comment has been minimized.

Copy link
Contributor Author

commented Oct 13, 2019

@fabpot thanks for noticing, fixed :)

@fabpot

This comment has been minimized.

Copy link
Member

commented Oct 13, 2019

Thank you @malarzm.

fabpot added a commit that referenced this pull request Oct 13, 2019
…alarzm)

This PR was merged into the 4.4 branch.

Discussion
----------

[Serializer][CSV] Add context options to handle BOM

| Q             | A
| ------------- | ---
| Branch?       | 4.4
| Bug fix?      | no
| New feature?  | yes
| Deprecations? | no
| Tickets       | Fix #33684
| License       | MIT
| Doc PR        | symfony/symfony-docs#12461

This allows BOM handling in en/decoded CSV files. To keep current behaviour intact both skipping BOM at the beginning of the CSV and outputting BOM are an opt-in feature.

Personally I'd propose to make `SKIP_INPUT_BOM` default to `false` in 5.0 so the BOM is transparent and people that for some reasons expect BOM characters to be present in the parsed text explicitly opt-out of trimming it.

Commits
-------

3eb3668 Add context options to handle BOM
@fabpot fabpot merged commit 3eb3668 into symfony:4.4 Oct 13, 2019
1 of 3 checks passed
1 of 3 checks passed
continuous-integration/appveyor/pr AppVeyor build failed
Details
continuous-integration/travis-ci/pr The Travis CI build failed
Details
fabbot.io Your code looks good.
Details
@malarzm malarzm deleted the malarzm:csv-bom branch Oct 13, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
6 participants
You can’t perform that action at this time.