Implement thread-local memory adjustments #300

Open · wants to merge 21 commits into master · 4 participants

Conversation
gagern (Contributor) commented Apr 7, 2015

If the current thread is not the main V8 event thread, we don't report memory adjustments back to V8 immediately, but instead keep track of all the changes in a thread-local variable. The net change will be reported back after the worker is done.

All state changes are encapsulated in two RAII sentinels: WorkerParent and WorkerSentinel. The former is constructed and destroyed on the V8 thread, the latter on the worker thread. Having these in place helps to make client code thread-safe with respect to the state in question. It also provides a convenient extension point: if, at some point, we need more code for handling worker threads, all we have to do is extend these two classes.

The current delegation to the libxml2 memory management functions has been completely dropped. The main use case for these would have been memory debugging, so they do a lot of work (like maintaining linked lists between allocated blocks) that we don't need. The main thing we could have gained from them, namely the call to xmlMemUsed, was not thread-safe in any case.

With these adjustments, node-libxslt works with this code here.

gagern added some commits Apr 7, 2015

Implement thread-local memory adjustments.
If the current thread is not the main V8 event thread, we don't report
memory adjustments back to V8 immediately, but instead keep track of all the
changes in a thread-local variable.  The net change will be reported back
after the worker is done.

All state changes are encapsulated in two RAII sentinels: WorkerParent and
WorkerSentinel.  The former is constructed and destroyed on the V8 thread,
the latter on the worker thread.  Having these in place helps to make client
code thread-safe with respect to the state in question.  It also provides a
convenient extension point: if, at some point, we need more code for
handling worker threads, all we have to do is extend these two classes.

The current delegation to the libxml2 memory management functions has been
completely dropped.  The main use case for these would have been memory
debugging, so they do a lot of work (like maintaining linked lists between
allocated blocks) that we don't need.  The main thing we could have gained
from them, namely the call to xmlMemUsed, was not thread-safe in any case.
Complain when allocating without V8 instance.
This helps ensure that a worker will actually use the WorkerParent and
WorkerSentinel machinery.  So far, any memory handling on a worker which
failed to do so would fall under the clause where there is no current
instance.  That might cause memory adjustments to go unnoticed.  So it's
better to force users to update their code, to properly set up workers.
gagern (Contributor) commented Apr 7, 2015

Travis complains, since apparently TLS support in libuv is a more recent thing. To be more precise, joyent/libuv#904 added this for libuv v0.11.11. So for older versions we'll have to duplicate some of that code, I fear. Unless we can somehow use C++11 and its thread_local storage class specifier?

Backport TLS support from libuv.
This should help us get things running even with node versions prior to 0.12.
gagern (Contributor) commented Apr 7, 2015

OK, Travis is still complaining, but now only for the ancient Node 0.8, and I don't see how my code could be responsible for that. Are you willing to merge my approach?

gagern added some commits Apr 7, 2015

Enable threading.
This should be the equivalent of ./configure --with-threads.
The --with-thread-alloc option wasn't enabled before either, so we leave it off.
Add test to verify that threading support is enabled.
This adds a list of features to the libxmljs object, which may be useful for
other tests as well in the future.
defunctzombie commented on 55a6861 Apr 7, 2015

Can we do this in NaN instead?

gagern (Owner) replied Apr 7, 2015

I'm not a Nan author, and I fear that adding one more element in the chain might mean this will take even longer to get fixed. I'd say there is nothing wrong with asking the Nan people whether they'd take care of this, but I'm not sure whether I'd wait on a decision there.

gagern (Owner) replied Apr 7, 2015

I've filed nodejs/nan#306 asking whether TLS would be in scope for Nan.

defunctzombie commented on src/libxmljs.cc in 52c5eba Apr 7, 2015

We should be consistent with the feature naming, I think. We either need to make them all uppercase or all lowercase. You did lowercase, but then LZMA is uppercase at the end. If we are consistent, then we can document the field as being a simple mapping from XML_WITH_X -> x.

defunctzombie replied Apr 7, 2015

On that note, you could also make a macro to avoid some of the duplication if you wished.

defunctzombie (Collaborator) commented Apr 7, 2015

This should work in Node 0.8; you should investigate why it is failing, because master works in 0.8 on Travis, last I checked.

gagern (Contributor) commented Apr 7, 2015

Emphasis on “last I checked”. #301 fails despite the change being just one whitespace. See also this Stack Overflow question for details.

Consistent upper-case for features list
This allows us to use a simple macro for each of these.
Use memcpy instead of strcpy.
This avoids the cost of determining the string length twice, and allows
copying multiple characters at a time without looking at them.
gagern (Contributor) commented Apr 8, 2015

The Nan people have accepted nodejs/nan#307, so we can drop uvcompat.h once Nan 1.8.0 gets released. Until then we could of course depend on a specific GitHub commit, but that feels somewhat unclean. While nathan7 continues in #302 to work towards an async-based solution, I still feel that this approach here will be the more flexible solution to #296 in the long run.

defunctzombie (Collaborator) commented Apr 8, 2015

@kkoopa can you release a new nan?

defunctzombie (Collaborator) commented Apr 8, 2015

Is this PR different from what #302 fixes?

gagern (Contributor) commented Apr 8, 2015

Both fix #296, so both have the same aim. This one here requires the cooperation of whoever sets up worker threads, but on the other hand it avoids locking and signaling overhead. I guess it's time to write some demo app which can be used to benchmark the two approaches. Perhaps by simply extending libxmljs with some async parser function, and running that in a loop over some big XML files? I'll have a look at that soon.

kkoopa (Contributor) commented Apr 23, 2015

NAN 1.8.0 is now out with this addition.

gagern added some commits Apr 23, 2015

Adapt NanObjectWrapHandle to Nan 1.8.0
Since Nan commit 0bc6d599, NanObjectWrapHandle is no longer a macro but a
function, and no longer expects a pointer but a reference as its argument.
So dereference all those pointers to conform with that change.
gagern (Contributor) commented Apr 23, 2015

@kkoopa thanks for the info. nodejs/nan@0bc6d59 had caused me some grief, since it changes the argument of NanObjectWrapHandle from a pointer to a reference. Apart from that, everything looks good so far. Let's see what Travis thinks of this.

kkoopa (Contributor) commented Apr 23, 2015

Oh, that was unforeseen. It should have been a 1:1 replacement. I can change it to take a pointer instead.

gagern (Contributor) commented Apr 23, 2015

If you do, you break compatibility with the released 1.8.0. But you could add an overloaded version which accepts a pointer, and release that as 1.8.1 (hopefully very soon). If you do, I might revert my last commit on this branch, and others won't experience similar breakage.

kkoopa (Contributor) commented Apr 23, 2015

It was never intended to change the argument type, just to replace the macro with a function. 1.8.0 is incorrect in this regard. Having to dereference the pointers in that way is not convenient.

kkoopa (Contributor) commented Apr 23, 2015

There, 1.8.1 is out.

defunctzombie (Collaborator) commented Apr 27, 2015

Let me know when this PR is using the latest nan and is ready for review.

gagern added some commits Apr 29, 2015

Update to Nan 1.8.4
This undoes the NanObjectWrapHandle change from the previous commit,
since the corresponding change in NaN was accidental.
Make use of offsetof to compute header size
The old code was based on how libxml itself does similar things in its
memory debugging code.  The new code should be easier to understand.
gagern (Contributor) commented Apr 29, 2015

@nathan7: Do you have an idea how we could detect the presence or absence of malloc_usable_size in a gyp environment? I like the idea a lot, but I don't feel comfortable hardcoding its presence depending on OS or whatever, simply because I can't test on all possible platforms.

kkoopa (Contributor) commented Apr 29, 2015

You should never use malloc_usable_size; it is a nonstandard extension, not available everywhere, notably on Windows. Windows does have the same functionality in _msize, but that is beside the point.

It will return the amount of memory actually allocated by the malloc implementation (assuming that the pointer was allocated by malloc in the first place), not the amount of memory requested. For example, if you request 33 bytes it may actually allocate, say, 64 bytes, which is what malloc_usable_size would report. Nothing is specified, except that its usage will cause immense problems.

gagern (Contributor) commented Apr 29, 2015

I know malloc_usable_size is non-standard, which is the reason why I wanted to auto-detect its availability. The fact that the size it reports is different from the size requested shouldn't bother me: as long as all memory adjustments are based solely on the numbers reported this way, the result will still be sane. But if you'd rather avoid using this (for the 8 bytes of overhead it saves per allocated block), then that's fine with me, too. In that case, I think the branch is ready for review now.

kkoopa (Contributor) commented Apr 29, 2015

I don't think it can be reliably detected. For example, malloc_usable_size is available with GCC when linking against glibc, but not on Android. Then there's Windows, then clang (which I guess is compatible with gcc in this aspect). Custom malloc implementations (Valgrind, debug builds) might cause problems as well.

Finally, it might not even save 8 bytes, if it happens to drown in the alignment padding. That is, suppose alignment is at 32 bytes, and the actual data excluding the size_t takes 52 bytes. 52 + 8 = 60, giving 64 bytes allocated, which would also be allocated for 52 requested bytes.

edef1c commented May 5, 2015

@gagern I don't think we can do an autoconf-style feature test for this in gyp, so… let's stick with the header approach.

gagern added some commits Jun 6, 2015

Enable threading support for real
Without either HAVE_PTHREAD_H or HAVE_WIN32_THREADS, most parts of
threading support remain non-functional, so we need one of these.
This is disregarding BeOS, which should be no problem in practice.
Quick and dirty hack for Document.fromXmlAsync
The implementation has no proper error handling yet.
It is only intended for performance benchmarks.
Error reporting from asynchronous operations
We now have a mechanism where errors are cloned, stored in a vector,
and only turned into a v8 Array back on the main thread.

We now use RAII to manage calls to xmlSetStructuredErrorFunc both for the
synchronous and the asynchronous case, setting the function up in the
constructor and resetting it in the destructor.

Instead of calling the push function for the arrays, we simply set elements
by index.  This simplifies the code a lot, and should help performance as well.
Call xmlMemSetup in WorkerSentinel
Since memory management functions are thread-local, we need to re-establish
them for every newly created worker thread.
gagern (Contributor) commented Jun 6, 2015

Since thread-local setup of memory handlers is difficult in #302, I've adapted my work there to this branch as well. By now we have an asynchronous doc reader along with a couple of test cases for it. So far everything seems to work nicely. I'd consider it ready for review.

Merge branch 'features' into threadLocalMemAdjust
That branch originally came from this one here, so there are only redundant
changes.  Having things merged ensures that we don't see merge conflicts
when we touch that code again later on.

gagern added some commits Jul 9, 2015

Use xmlMalloc and xmlFree, not xmlMemMalloc and xmlMemFree
xmlMemMalloc and xmlMemFree are debug implementations, while xmlMalloc and
xmlFree are the (thread-local) handlers configured using xmlMemSetup.
xmlMemStrdup, on the other hand, is correct, since xmlStrdup deals in xmlChar*,
not char*, and the debug implementation is called xmlMemoryStrdup.