New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Serializer] Fix security issue on CsvEncoder about CSV injection #24508

Merged
merged 1 commit into from Feb 7, 2018

Conversation

@welcoMattic
Contributor

welcoMattic commented Oct 10, 2017

Q A
Branch? master (4.1)
Bug fix? no
New feature? yes
BC breaks? yes
Deprecations? no
Tests pass? yes
License MIT

I read this article about CSV injection and I thought it best to update the CsvEncoder so that it does not generate potentially malicious CSV files by default.

@sroze

This comment has been minimized.

Member

sroze commented Oct 10, 2017

Great initiative 👍 I'd target 3.4.

@javiereguiluz

In addition to prefixing a \t to problematic values, another solution proposed in the comments of the article is to prefix a = before the "..." because that forces Excel to interpret the value as text. Not sure if it's really better.

/**
* @param string $delimiter
* @param string $enclosure
* @param string $escapeChar
* @param string $keySeparator
* @param bool $escapeFormulaValues

This comment has been minimized.

@javiereguiluz

javiereguiluz Oct 10, 2017

Member

$escapeFormulaValues -> $escapeFormulas ?

@@ -173,6 +173,70 @@ public function testEncodeCustomHeaders()
$this->assertEquals($csv, $this->encoder->encode($value, 'csv', $context));
}
public function testEncodeFormulaValues()
{
$this->assertEquals(<<<'CSV'

This comment has been minimized.

@chalasr

chalasr Oct 10, 2017

Member

assertSame should be used as possible

*/
public function __construct($delimiter = ',', $enclosure = '"', $escapeChar = '\\', $keySeparator = '.')
public function __construct($delimiter = ',', $enclosure = '"', $escapeChar = '\\', $keySeparator = '.', $escapeFormulas = true)

This comment has been minimized.

@chalasr

chalasr Oct 10, 2017

Member

Do we want this to be optional? I mean, shouldn't true be the only possible behavior and this be merged as a bugfix in the branch that introduced the encoder? IIRC 3.3 is the lowest actively maintained branch where it exists

This comment has been minimized.

@kaznovac

kaznovac Oct 11, 2017

Contributor

i'd propose $scapeFormulas to be false by default as this can lead potentially undesired behavior (e.g. adding \t before each negative number - in current implementation!)

This comment has been minimized.

@xabbuh

xabbuh Oct 11, 2017

Member

Setting this argument to true by default would be a BC break. The author of the linked article explicitly mentions that you need to decide for each single use case whether or not you can escape formulas.

@@ -27,24 +27,28 @@ class CsvEncoder implements EncoderInterface, DecoderInterface
const ESCAPE_CHAR_KEY = 'csv_escape_char';
const KEY_SEPARATOR_KEY = 'csv_key_separator';
const HEADERS_KEY = 'csv_headers';
const FORMULAS_START_CHARACTERS = array('=', '-', '+', '@');

This comment has been minimized.

@chalasr

chalasr Oct 10, 2017

Member

I don't think this needs to be exposed. So it could be relevant as a private const, that would work as of 4.0 only. If we accept the PR as a bugfix, this would not work since using constants for arrays is possible since PHP 5.6 only, whereas our 3.* branches support PHP 5.5). should be private, or inlined

@xabbuh xabbuh added this to the 4.1 milestone Oct 11, 2017

@welcoMattic

This comment has been minimized.

Contributor

welcoMattic commented Oct 11, 2017

Thanks @chalasr @kaznovac @xabbuh for your comments and review.
I just push fixes. I'm agree to set $escapeFormulas at false by default, after all negative numbers doesn't have to be escaped. And it will avoid BC Break.

Should I replace all assertEquals by assertSame in CsvEncoderTest class?

@chalasr chalasr removed the BC Break label Oct 11, 2017

@chalasr

This comment has been minimized.

Member

chalasr commented Oct 11, 2017

@welcoMattic Nope, that would create merge conflicts when merging lower branches up to master. Thanks.

@xabbuh

This comment has been minimized.

Member

xabbuh commented Oct 11, 2017

Should we also allow to pass this option through the context so that you do not necessary have to configure this option globally?

@welcoMattic

This comment has been minimized.

Contributor

welcoMattic commented Oct 11, 2017

@xabbuh why not. We should also make the context $escapeFormulas value overrides the global $escapeFormulas value. It could be set to true globally, but for one particular encoding it could be set to false.

@chalasr

This comment has been minimized.

Member

chalasr commented Oct 11, 2017

👍 to allow overriding this option in the context.

@welcoMattic

This comment has been minimized.

Contributor

welcoMattic commented Oct 11, 2017

@chalasr @xabbuh Overriding is now possible

@dunglas

This comment has been minimized.

Member

dunglas commented Oct 12, 2017

Great!

But as stated in the article, it's definitely not a security issue on our side. The current implementation respects the RFC (Excel and Google Sheets don't), and most software consuming CSV expect the current behavior.
It should made be clear in the docs PR that this option should only be used when the generated CSV will be imported in spreadsheets or it can cause interoperability issues.

@welcoMattic

This comment has been minimized.

Contributor

welcoMattic commented Oct 12, 2017

Thanks @dunglas!

For documentation, should I create a new page under Serializer section? Named CsvEncoder, and where I explain why this options exists?

@ostrolucky

This comment has been minimized.

Contributor

ostrolucky commented Oct 16, 2017

Alternative approach might be to just add a normalizer. That would help with deserialization of such sanitized csvs too and doesn't pollute __construct arguments.

I don't like having this in core anyway though, as it's security issue in Excel and Google sheets only and this is destructive kind of escaping

} else {
$result[$parentKey.$key] = $value;
if ($escapeFormulas && \in_array(mb_substr($value, 0, 1), $this->formulasStartCharacters, true)) {

This comment has been minimized.

@ostrolucky

ostrolucky Oct 16, 2017

Contributor

What's the reason for using mb version? UTF is not used in formulasStartCharacters.

@fabpot

This comment has been minimized.

Member

fabpot commented Dec 1, 2017

Not sure if we need to support that in core. /cc @symfony/mergers

@Tobion

This comment has been minimized.

Member

Tobion commented Dec 10, 2017

👍 to include that as it's security feature that people should be aware about.

@dbu

dbu approved these changes Dec 20, 2017

i think in 95% of the use cases, the serializer will be used to export data, which might include user provided input.

there is only a problem if this component is used to generate sheets with formula. i would add an explaination in the phpdoc to explain what this flag is about and say to set it to true when you don't intend to have formula in the csv you generate.

i am no merger, but +1 on this. it is a relevant security topic and obviously excel/libreoffice/google and whoever else is affected decided they rather not properly fix it, so its up to generators of csv files to ensure security.

i think this should be against 3.3 as that is the branch for security fixes.

@dunglas

This comment has been minimized.

Member

dunglas commented Dec 20, 2017

👍 to support this feature too. But as it's definitely not a security fix (on our side), it's a new feature that should be merged in 4.1.

there is only a problem if this component is used to generate sheets with formula.

There is also a problem when the value start with '=', '-', '+', '@' and the parser supports correctly the RFC: a \t will be added, but it should not (actually, when using this flag, it's not really a true CSV file anymore).

@welcoMattic

This comment has been minimized.

Contributor

welcoMattic commented Dec 20, 2017

Thanks @dbu @dunglas for your comments. I think that we can support this "fix" and explain it explicitly in PHPDoc.

Should I update the PR now, or we need to wait other opinions?

@dbu

This comment has been minimized.

Contributor

dbu commented Dec 20, 2017

i'd update the changelog and phpdoc as that will be relevant either way. i hope one of the maintainers can decide which version this should go. i can see the point of kevin, and its a good reason why its not on by default. but i feel it still is security relevant, even if its not a bug of symfony, but symfony helping around lax security of other tools - and as such should go into 3.3 or 3.4.

@dunglas

This comment has been minimized.

Member

dunglas commented Jan 11, 2018

I think that you can update and rebase the PR @welcoMattic

@@ -11,6 +11,7 @@ CHANGELOG
* added an optional `string $format = null` argument to `AbstractNormalizer::instantiateObject`
* added an optional `array $context = array()` to `Serializer::supportsNormalization`, `Serializer::supportsDenormalization`,
`Serializer::supportsEncoding` and `Serializer::supportsDecoding`
* added optional `bool $escapeFormulas = false` argument to `CsvEncoder::__construct`

This comment has been minimized.

@nicolas-grekas

nicolas-grekas Feb 4, 2018

Member

should be moved to a new 4.1.0 section
also please mind @ostrolucky's comment below

@welcoMattic

This comment has been minimized.

Contributor

welcoMattic commented Feb 4, 2018

@nicolas-grekas Done, I also fix @ostrolucky's comment.

@nicolas-grekas

This comment has been minimized.

Member

nicolas-grekas commented Feb 4, 2018

@welcoMattic thanks. Can you rebase to get rid of the merge commit? I invite you to squash while doing so.

public function __construct(string $delimiter = ',', string $enclosure = '"', string $escapeChar = '\\', string $keySeparator = '.')
public function __construct(string $delimiter = ',', string $enclosure = '"', string $escapeChar = '\\', string $keySeparator = '.', $escapeFormulas = false)

This comment has been minimized.

@Tobion

Tobion Feb 6, 2018

Member

missing bool type declaration for the new argument

@@ -172,13 +176,17 @@ public function supportsDecoding($format)
/**
* Flattens an array and generates keys including the path.
*/
private function flatten(array $array, array &$result, string $keySeparator, string $parentKey = '')
private function flatten(array $array, array &$result, string $keySeparator, string $parentKey = '', $escapeFormulas = false)

This comment has been minimized.

@Tobion

Tobion Feb 6, 2018

Member

same bool missing

@welcoMattic

This comment has been minimized.

Contributor

welcoMattic commented Feb 6, 2018

Thanks for review @Tobion 👍

@fabpot

fabpot approved these changes Feb 7, 2018

@fabpot

This comment has been minimized.

Member

fabpot commented Feb 7, 2018

Thank you @welcoMattic.

@fabpot fabpot merged commit a1b0bdb into symfony:master Feb 7, 2018

3 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
fabbot.io Your code looks good.
Details

fabpot added a commit that referenced this pull request Feb 7, 2018

feature #24508 [Serializer] Fix security issue on CsvEncoder about CS…
…V injection (welcoMattic)

This PR was merged into the 4.1-dev branch.

Discussion
----------

[Serializer] Fix security issue on CsvEncoder about CSV injection

| Q             | A
| ------------- | ---
| Branch?       | master (4.1)
| Bug fix?      | no
| New feature?  | yes
| BC breaks?    | yes
| Deprecations? | no
| Tests pass?   | yes
| License       | MIT

I read [this article](http://georgemauer.net/2017/10/07/csv-injection.html) about CSV injection and I thought it best to update the `CsvEncoder` so that it does not generate potentially malicious CSV files by default.

Commits
-------

a1b0bdb Fix security issue on CsvEncoder

@fabpot fabpot referenced this pull request May 7, 2018

Merged

Release v4.1.0-BETA1 #27181

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment