Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

getLibXmlLoaderOptions() globally affects external XML reading (e.g. Soap) #74

Closed
amcsi opened this issue Jan 13, 2017 · 11 comments
Closed

Comments

@amcsi
Copy link

amcsi commented Jan 13, 2017

It appears that when getLibXmlLoaderOptions() is called, it globally affects external xml reading.
This is bad, because just reading excel files with this library has a global side effect, and it breaks Soap.

I created a PR with a failing test: #73

@amcsi
Copy link
Author

amcsi commented Jan 13, 2017

FYI it's because of these lines:
https://github.com/PHPOffice/PhpSpreadsheet/blob/develop/src/PhpSpreadsheet/Settings.php#L346
https://github.com/PHPOffice/PhpSpreadsheet/blob/develop/src/PhpSpreadsheet/Settings.php#L363

For anyone that's looking for a workaround for your SOAP calls or other external xml reading, call libxml_disable_entity_loader(false) before instantiating SoapClient or other things that involve reading external xml

@carlosafonso
Copy link

Can confirm. Our Symfony app breaks after using PhpSpreadsheet because it fails to properly read XML files.

@inspiran
Copy link

@amcsi thx for mentioning a work around, works fine.

@PowerKiKi
Copy link
Member

It would be appreciated it someone can come up with a proper fix that is covered by unit tests.

Without digging into it I suppose something like remembering previous global settings and re-set them afterwards might be the way to go. What do you think ?

@agopaul
Copy link
Contributor

agopaul commented Mar 7, 2017

I did a bit or research on this.

Apparently, the incriminated piece of code calls the libxml_disable_entity_loader() function which is global and affects all libxml-bases libraries, so basically, every other library/code that works with XML is impacted by this. The function call was added 3 years ago for security reasons (CVE-2014-2054).

It seems that that call should be left in the codebase, so maybe the best workaround would be to make an XML reader "factory" for each XML library (we have all of them here: SimpleXML, DOMDocument, and XMLReader) that wraps the code required to open the XML file. Here is an example taken from the last link:

$oldValue = libxml_disable_entity_loader(true);
$dom = new DOMDocument();
$dom->loadXML($xml);
libxml_disable_entity_loader($oldValue);

Settings::getLibXmlLoaderOptions() has 40 occurrences in total, so this would be a fairly big change.

Edit: Just wanted to clarify. The change wouldn't be that big per se, but of the four classes involved, only one is covered by unit tests. I'm not sure if this is something that could be changed with confidence without having all the necessary tests in place.

@PowerKiKi
Copy link
Member

Unfortunately unit tests are lacking and there isn't much we can do except to add new one progressively.

Your suggestion sounds good to me as long as we are sure that those options only have effect when loading the document, and not later when using the objects. If that would be the case, that would mean we would have to keep "our" options until the end of usage of the reader/writer, which may be hard to do. Is there are a to confirm or infirm that ?

Also on a side note and slightly out of scope, but I wouldn't mind getting rid of SimpleXML in favor of DOMDocument. If it's not too much work, not too much risky, and could simplify our code base (future factories), then that could/should be done.

@amcsi
Copy link
Author

amcsi commented Mar 8, 2017

@agopaul Would you really have to unit test every usage? I would think that you could just create a wrapper class around DOMDocument/SimpleXML, use and unit test those. You already assume that DOMDocument and SimpleXML work as expected due to being external classes, so if you just created a wrapper, you could get away with just testing the differences.

@agopaul
Copy link
Contributor

agopaul commented Mar 8, 2017

Theoretically, that is true, but there is a marginal risk of breaking something due to unexpected behavior of the XML libraries involved.

Also, I'm not 100% sure that a wrapper would do the job: like @PowerKiKi said, we don't know if the options are only used while loading the XML or if they are also used in other parts of the libraries.

Maybe I'll do some testing on the weekend.

In the meantime, if this is breaking your application, a quick workaround would be using the libxml_disable_entity_loader() function after you've used PhpSpreadsheet and re-enable the entity loader.

@agopaul
Copy link
Contributor

agopaul commented Mar 11, 2017

Coincidently I ran into an error due to this very issue just yesterday when a test suite on a project that I'm working on was failing unexpectedly and sure enough, it was because of this function calls.

Anyway, the whole point of disabling the external entities loader via libxml_disable_entity_loader() is to prevent XXE/XEE attack vectors.

Digging inside the codebase I've found that @MarkBaker actually fixed the XXE/XEE vulnerability in another way after the commit that added the calls to the libxml_disable_entity_loader() function: PHPOffice/PHPExcel@0ab614f

So there are 2 fixes for the same problem.

A few considerations on @MarkBaker's solution:

  • It's thread safe, whereas libxml_disable_entity_loader() is not: https://bugs.php.net/bug.php?id=64938
  • It doesn't touch libxml global configuration so it doesn't conflict with other libraries that interact with libxml
  • It's more aggressive since libxml_disable_entity_loader() just disables the external entity loading whereas this implementation just scans for entities and throws an Exception if one is found. That, of course, is because (I assume) none of the XML files parsed by PhpSpreadsheet should use XML entities.

I looked up how the Zend Framework team fixed this and they created a library that parses XML safely and they adopted both approaches (libxml_disable_entity_loader() and Mark's). The lib uses one or the other depending on the thread safety of the current PHP context.

Links:

In other words, I think it's safe to say that these libxml_disable_entity_loader() calls inside the Settings class can just be removed.

I think this is probably a safe change and we wouldn't need to have a complete test coverage of the readers to do it. I can work on a PR if needed.

@PowerKiKi
Copy link
Member

PowerKiKi commented Mar 12, 2017

Thanks for your in-depth analysis. If I understand things correctly we can remove libxml_disable_entity_loader() if and only if every single loaded XML is checked with BaseReader::securityScanFile() beforehand. It seems it was done by Mark, but it would be best to double-check. If you can do so, then I'll merge your PR.

agopaul added a commit to redokun/PhpSpreadsheet that referenced this issue Mar 12, 2017
…not necessary.

Prevent setLibXmlLoaderOptions() and getLibXmlLoaderOptions() to call the libxml_disable_entity_loader() function which alter the global state of libxml.
See the discussion here PHPOffice#74
PowerKiKi pushed a commit that referenced this issue Mar 12, 2017
Settings: deleted libxml_disable_entity_loader() calls since they're not necessary.
Prevent setLibXmlLoaderOptions() and getLibXmlLoaderOptions() to call the libxml_disable_entity_loader() function which alter the global state of libxml.
See the discussion here #74
@PowerKiKi
Copy link
Member

Closed via #113

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

5 participants