Bug #63380: Use PHP's allocator for libxml #223

Closed
wants to merge 1 commit into
from

Conversation

Projects
None yet
4 participants
@tstarling
Contributor

tstarling commented Oct 29, 2012

Allocation via libxml does not use PHP's per-request allocator. So any memory used by libxml will not be accounted against memory_get_usage() or memory_limit.

At Wikimedia we use libxml DOM trees to store wikitext parse trees, because they are more compact in memory than the equivalent pure-PHP data structures. When these parse trees are cached, the memory requirements can become excessive, and the memory is typically not returned to the system after request termination. Using xmlMemSetup() to set hook functions which use PHP's per-request allocation functions will allow us to more effectively monitor and limit the use of libxml in production.

The task is somewhat complicated by the fact that libxml does not use a different hook for persistent and request-local allocations. So we use the persistent allocator during MINIT, and take extra care to ensure all libxml globals are initialised during MINIT, rather than lazy-initialised on the first call to the relevant module.

The typical consequences of the initialisation of a global pointer during request execution would be a dangling pointer after deactivate, which is exploitable for memory corruption during subsequent requests. I used gdb's "info variables" command to audit all global variables in libxml, as configured in the Ubuntu package.

I tested the code with make test, and with server-tests.php against Apache with "worker" MPM. I also tested it with and without LIBXML_THREAD_ALLOC_ENABLED. There was a bug in libxml2 which prevented it from working with LIBXML_THREAD_ALLOC_ENABLED. I submitted a patch:

https://bugzilla.gnome.org/show_bug.cgi?id=687084

That configuration option appears to be rarely used.

Bug #63380: Use PHP's allocator for libxml
The task is somewhat complicated by the fact that libxml does not use a
different hook for persistent and request-local allocations. So we use
the persistent allocator during MINIT, and take extra care to ensure
all libxml globals are initialised during MINIT, rather than
lazy-initialised on the first call to the relevant module.

The typical consequences of the initialisation of a global pointer
during request execution would be a dangling pointer after deactivate,
which is exploitable for memory corruption during subsequent requests. I
used gdb's "info variables" command to audit all global variables in
libxml, as configured in the Ubuntu package.

I tested the code with make test, and with server-tests.php
against Apache with "worker" MPM. I also tested it with and without
LIBXML_THREAD_ALLOC_ENABLED. There was a bug in libxml2 which prevented
it from working with LIBXML_THREAD_ALLOC_ENABLED. I submitted a patch:

https://bugzilla.gnome.org/show_bug.cgi?id=687084

That configuration option appears to be rarely used.
@lstrojny

This comment has been minimized.

Show comment Hide comment
@lstrojny

lstrojny Nov 4, 2012

Contributor

Have you benchmarked how XML parsing/constructing performance is affected with and without the patch as I would guess that the custom allocators might slow down things a little bit?
Besides that: as your patch depends on a newer (not yet released version?) of libxml, we should definitly adjust the configure check to take that into account.

Contributor

lstrojny commented Nov 4, 2012

Have you benchmarked how XML parsing/constructing performance is affected with and without the patch as I would guess that the custom allocators might slow down things a little bit?
Besides that: as your patch depends on a newer (not yet released version?) of libxml, we should definitly adjust the configure check to take that into account.

@weltling

This comment has been minimized.

Show comment Hide comment
@weltling

weltling Nov 26, 2012

Contributor

Just tested your patch, looks good on linux. On windows these three tests fail for me:

ext\dom\tests\dom005.phpt
ext\xmlwriter\tests\004.phpt
ext\xmlwriter\tests\OO_004.phpt

I think it should be tested more stresful. Though there is no need to initialize HTTP and FTP clients as they're not used in the XML extensions.

@lstrojny The needed infos will be contained (or missing) in the xmlversion.h header, so configure must not to be changed.

Contributor

weltling commented Nov 26, 2012

Just tested your patch, looks good on linux. On windows these three tests fail for me:

ext\dom\tests\dom005.phpt
ext\xmlwriter\tests\004.phpt
ext\xmlwriter\tests\OO_004.phpt

I think it should be tested more stresful. Though there is no need to initialize HTTP and FTP clients as they're not used in the XML extensions.

@lstrojny The needed infos will be contained (or missing) in the xmlversion.h header, so configure must not to be changed.

@lstrojny

This comment has been minimized.

Show comment Hide comment
@lstrojny

lstrojny Jan 14, 2013

Contributor

@tstarling ping

Contributor

lstrojny commented Jan 14, 2013

@tstarling ping

@lstrojny

This comment has been minimized.

Show comment Hide comment
@lstrojny

lstrojny Jan 19, 2013

Contributor

@tstarling ping again

Contributor

lstrojny commented Jan 19, 2013

@tstarling ping again

@tstarling

This comment has been minimized.

Show comment Hide comment
@tstarling

tstarling Jan 21, 2013

Contributor

lstrojny: no I haven't benchmarked it. I haven't had time to work on this recently. Based on comments on the bug report, I expect this will be a configure option, off by default. That makes the performance characteristics less critical.

It doesn't require a new version of libxml, it just needs libxml to be configured without --with-thread-alloc. There's no obvious reason anyone would want to use --with-thread-alloc, and it's not enabled in the distro packages I have checked.

Contributor

tstarling commented Jan 21, 2013

lstrojny: no I haven't benchmarked it. I haven't had time to work on this recently. Based on comments on the bug report, I expect this will be a configure option, off by default. That makes the performance characteristics less critical.

It doesn't require a new version of libxml, it just needs libxml to be configured without --with-thread-alloc. There's no obvious reason anyone would want to use --with-thread-alloc, and it's not enabled in the distro packages I have checked.

@smalyshev

This comment has been minimized.

Show comment Hide comment
@smalyshev

smalyshev Aug 25, 2013

Contributor

While I like the idea, sounds like a lot of complications for the code that would not work in most environments due to libxml not compiled with right options.

Contributor

smalyshev commented Aug 25, 2013

While I like the idea, sounds like a lot of complications for the code that would not work in most environments due to libxml not compiled with right options.

@tstarling tstarling closed this Nov 8, 2013

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment