Prevent SEGFAULT of XmlNodes after they are detached from the tree #163

Closed
wants to merge 1 commit into
from

Conversation

Projects
None yet
8 participants
Contributor

skabbes commented Sep 5, 2012

This commit introduced a bug which allows detached XmlNodes to be double free'd. I've done some research, and this commit is definitely necessary since xmlUnlinkNode does not free that memory, and neither does xmlFreeDoc. However, we need a mechanism to prevent this behavior, since it is very difficult for end users to debug segfaults (no stack traces provided by Node).

Repro Steps are laid out in the included test case, but basically go like this:

  1. Access children of node a (so that those nodes are being reference tracked by v8)
  2. Remove node a from the dom tree (which also remove all children nodes, including attributes)
  3. Trigger a garbage collection after all those variables are no longer in scope
  4. Calling xmlFreeNode in a's destructor recursively destroys the xml tree of its children (since a->parent == NULL)
  5. Calling xmlFreeNode on a's children cause them to be freed again (since child->parent != NULL)
  6. However, xml_obj->doc->_private is no longer a valid reference to an XmlDocument, so the unref causes a SEGFAULT

Possible Solutions

  • Simply check xml_obj->type == -1, since libxml automatically resets all memory to -1 after it free's it. However, this is very much implementation dependent and doesn't allow us to property unref the parent document (since the doc pointer is also overwritten)
  • Store a pointer to the xmlDoc on the XmlNode object and compare that to xml_obj->doc before calling xmlFreeNode on that node. This assumes that the xmlDoc will never be free'd before any of its children nodes (should be correct). It also means that we need to maintain that pointer for every operation which may change the node's document (addPrevSibling, addNextSibling off the top of my head)
  • Keep track of which nodes have been removed in XmlDocument, and free them explicitly when ~XmlDocument is called. The code would have to be smart enough to remove nodes from this mapping when they are re-added to the tree.
  • Immediately call xmlUnlinkNode and xmlFreeNode inside XmlNode::remove and make the api enforce that you cannot reuse any removed node or any of its children.

I'm not super familiar with libxml or v8 or what's possible with them, so maybe someone here who understands the problem can weigh in @shtylman?

@skabbes skabbes Test exposing SEGFAULT in XmlNode destructor
When you remove a node from the XmlTree, libxml removes that node and
all of its children from the tree, but doesn't free that memory.
However, these libxml nodes maybe referenced by libxmljs XmlNode's.  So,
when those  XmlNodes' destructors are called, it causes us to try to
doulbe free that memory.
056e4ec

This pull request fails (merged 056e4ec into 2f44976).

Contributor

skabbes commented Sep 5, 2012

This manifested itself in some production code which was conditionally removing elements from a document based off its attributes. Git bisect led me to this commit. Here's the gdb backtrace, xml_obj->doc == 0xffffffffffffffff

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0xffffffffffffffff
0x00000001007e25f0 in libxmljs::XmlNode::~XmlNode (this=0x100819950) at ../src/xml_node.cc:195
195     XmlDocument* doc = static_cast<XmlDocument*>(xml_obj->doc->_private);
(gdb) bt
#0  0x00000001007e25f0 in libxmljs::XmlNode::~XmlNode (this=0x100819950) at ../src/xml_node.cc:195
#1  0x00000001007e0c54 in libxmljs::XmlElement::~XmlElement (this=0x100819950) at xml_element.h:10
#2  0x0000000100176f19 in v8::internal::GlobalHandles::Node::PostGarbageCollectionProcessing ()
#3  0x0000000100176911 in v8::internal::GlobalHandles::PostGarbageCollectionProcessing ()
#4  0x000000010017d81a in v8::internal::Heap::PerformGarbageCollection ()
#5  0x000000010017d2b8 in v8::internal::Heap::CollectGarbage ()
#6  0x000000010017d081 in v8::internal::Heap::CollectAllGarbage ()
#7  0x0000000100298255 in v8::internal::GCExtension::GC ()
#8  0x000000010013b54c in v8::internal::Builtin_HandleApiCall ()
# ... snip
Collaborator

defunctzombie commented Sep 5, 2012

Than you for the detailed explanation (very much appreciated)! I will look into the issue once I find a bit of time. Ideally it would be nice to have a node be re-used after removed from one tree but we shall see what can be done here. Certainly crashing is not the answer :)

Contributor

thomas-riccardi commented Sep 5, 2012

I don't get the same stack:

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff5870079 in libxmljs::XmlAttribute::New (attr=0xffffffffffffffff) at ../src/xml_attribute.cc:33
33      assert(attr->type == XML_ATTRIBUTE_NODE);
(gdb) bt
#0  0x00007ffff5870079 in libxmljs::XmlAttribute::New (attr=0xffffffffffffffff) at ../src/xml_attribute.cc:33
#1  0x00007ffff58755f3 in libxmljs::XmlElement::get_attrs (this=0xcd8ca0) at ../src/xml_element.cc:331
#2  0x00007ffff5874395 in libxmljs::XmlElement::Attrs (args=...) at ../src/xml_element.cc:110
#3  0x00000000005addf1 in ?? ()
...

And this is what I would expect from the code.
In your scenario, I'm ok with points 1 to 4, but I don't understand point 5:
Who calls xmlFreeNode on a's children? They do have a parent, so the xmlFreeNode in XmlNode destructor is not called.
The issue is however that the xml_obj is invalid on the XmlNode children objects, since they have already been freed.
On your test case the crash happens earlier: we read the attributes of a freed xmlNode before the destruction of the XmlNode children but the GC.

The issue is similar to the one raised on #148:
What should libxmljs do when we have node objects pointing to now invalid libxml objects? And how to detect all these cases?
In #148 the not so clean solution was to make the node object point to a new empty libxml object to avoid the crash. The detection was however easy: we know when the invalidation happens.
It would be much more difficult to do so in this case since the node objects are not at all aware of the tree structure, only the libxml objects are: when deleting the XmlNode a, we don't know that we have other XmlNodes pointing to childs of a...
Also, we should do better than creating a new empty libxml object when we detect the issue: maybe have an "invalid" state for the node objects, which would raise a javascript exception when used? Is it possible to do this directly by V8 (telling it that the node object is now invalid, and that it can free the C++ object)?

Maybe we should look at what other memory managed languages have done for this when wrapping libxml2.

Contributor

skabbes commented Sep 5, 2012

triccardi, yes, you are correct about point 5. I must have gotten myself mixed up with the late night debugging. Also, the stack trace I posted was from the original problem I encountered - the code was too complicated to reproduce in a short test and I had to change a few things to make sure that nodeunit would actually fail when there was a segfault.

I agree with the 'invalid' node idea, seems to be the cleanest. We just need to make sure that we also keep a separate pointer to the parent XmlDocument so we can unref it later. Just my 2 cents.

Collaborator

defunctzombie commented Jan 6, 2013

I tried the failing test on the latest version of the code and it produces no error for me. It may be that other commits have since fixed this. If this is still a problem which you can expose, please let me know and we can revisit the issue.

defunctzombie reopened this Jan 6, 2013

Collaborator

defunctzombie commented Jan 6, 2013

Seems that the test does fail on travis-ci (just not on my mac). Looking into it.

Contributor

skabbes commented Jan 6, 2013

It is unlikely this was fixed "accidentally". To fix this a significant change needs to be made to the reference counting strategy.

Collaborator

defunctzombie commented Jan 6, 2013

@skabbes yea, I am taking a look at it. For some reason the test worked on my mac but does fail on my linux box so I can attempt to debug it there.

Collaborator

defunctzombie commented Jan 6, 2013

Here is the problem (and one of the reasons that particular test segfaults).

When tag.childNodes() is called, the child nodes are indeed created, but they do not bump the ref count of the parent node that they were created from. This means that when the loop ends and tag (the div) runs xmlFreeNode the span's xml_obj is also removed internally to libxml. All of this is beside the issue of document ref counting.

What really should be happening is that when a child node is created, it refs the parent node to prevent it from going away. This is really a giant mess of ref craziness :) Anyhow, I do understand the issue better now. Not sure I have a clear fix for it at the moment tho but I do have some thoughts.

@polotek Do you want me to revert the failing text change so we get a passing badge back?

Collaborator

polotek commented Jan 6, 2013

Yes please. And if you can suggest improvements to our ref counting strategy, I'd like to hear them.

Is there a workaround for this issue? I'm currently experiencing this when removing nodes from a document. Is there a different way I should do it?

Contributor

skabbes commented Jul 15, 2013

I have the workaround here which works because libxml "zeros" their memory to -1 after it is freed. That check basically sees whether the node has been freed already or not.

Using this fix will leak memory, this is especially important if you are parsing large html documents since it will retain that entire document, not just the removed node.

I'd rather leak memory than crash! Sorry for the ignorance, but how can I incorporate this with my node application?

Contributor

skabbes commented Jul 15, 2013

Fork this repo, apply the patch I linked, and rather than "npm installing" from the npm registry, install it like:

npm install git+ssh://git@github.com:jonlabarge/libxmljs.git#segfault_fix

New segfault: 0 xmljs.node 0x000000010260546e libxmljs::XmlNamespace::~XmlNamespace() + 44
1 xmljs.node 0x000000010260542a libxmljs::XmlNamespace::~XmlNamespace() + 14

Collaborator

polotek commented Oct 20, 2014

I want to come back to this issue. It's one of the major sources of segfaults we have today. We've been talking about overhauling our ref counting strategy. But I want to make sure we've explored other options before doing that.

If I'm understanding the issue correctly, it centers around nodes that have been detached from their document, and get garbage collected by v8. I think we need to focus on understanding how to treat unattached nodes. Most XML implementations have this notion of node ownership. As long as a node exists, it has relationship with it's "owner" document. The document can't go away until all of the nodes it's responsible for are cleaned up. That includes unattached nodes.

So I believe what we need to do is update the node destructor with the appropriate check for unattached nodes. And then make sure we're doing the right amount of reference count updating. Unfortunately the simple checks don't work. I tried something like the following it and it produced similar segfaults.

if(xml_obj->doc->_private) {
  // get the doc and call unref()
}

I don't really understand what state the xml_obj is in when we get it. We should be able to tell that it's unattached and act accordingly. Another code smell here is that we're reaching into the doc for a private member. We should really have a better method for retrieving the libxmljs wrapper objects.

I'm going to take a deeper look at this soon. But I would appreciate any additional information about this problem.

Contributor

skabbes commented Oct 20, 2014

If you read the test case I added, the problem becomes more obvious.

Given an XML tree like so:

<root>
  <level_a>
    <level_b>Hello</level_b>
    <level_b>World</level_b>
  </level_a>
</root>

If you .find for level_a, then .find for all children of level_a. The node.js bindings now hold a reference to all the nodes level_a and lower. Then if you remove the level_a node, libxml2 also frees both level_b nodes, however V8 is still holding a reference to them (whose destructors will be called the next time V8 runs the garbage collector).

Collaborator

polotek commented Oct 20, 2014

Yes I understand how the problem is reproduced. But that doesn't fully
describe the issue or suggest the solution. The problem is not that v8 is
still holding references to removed nodes. This is fine because remove()
doesn't destroy anything. It just detaches the nodes from their owner DOM
document. Even detached nodes stick around until xmlFreeNode is called.
And I believe they still hold references to their parent document as well.

What I'm not fully understanding is where exactly the segfault happens.
I've seen it, but haven't hooked it up to a debugger to inspect it. What I
would expect is that if we're dealing with a detached node, we don't need
to Unref the document at all. But I don't understand the appropriate check
to do for that. It also leaves a remaining problem which is that the
document itself needs to deallocated at some point with xmlFreeDoc. But
it needs to hang around even for unattached nodes or that will cause
problems. The ref counting strategy needs to understand when the document
is truly free.

On Mon, Oct 20, 2014 at 4:06 PM, Steven Kabbes notifications@github.com
wrote:

If you read the test case I added, the problem becomes more obvious.

Given an XML tree like so:

Hello World

If you .find for level_a, then .find for all children of level_a. The
node.js bindings now hold a reference to all the nodes level_a and lower.
Then if you remove the level_a node, libxml2 also frees both level_b
nodes, however V8 is still holding a reference to them (whose destructors
will be called the next time V8 runs the garbage collector).


Reply to this email directly or view it on GitHub
polotek#163 (comment).

Marco Rogers
marco.rogers@gmail.com | https://twitter.com/polotek

Collaborator

polotek commented Oct 20, 2014

Here's an interesting page that addresses this same problem for wxXml2. It's a wrapper around libxml2 for the wxWidgets toolkit. http://wxcode.sourceforge.net/docs/wxxml2/refcount.html

It basically says what I've been saying. But because they also have to do manual memory management for their wrapper objects, they can control the way they're deallocated. Our situation is a little different because we're at the whims of a garbage collector.

Contributor

skabbes commented Oct 20, 2014

Ah, ya sorry. xmlUnlinkNode is fine. The reason it crashes is that some nodes can outlive the parent document, then in the XmlNode destructor, we try to dereference an already freed piece of memory.

The reason this workaround works is because if xml_obj->type == -1 that means that xml_obj->doc->_private points to garbage memory (and thus has likely already been freed).

I think to state it succinctly, we need to ensure that we never call xmlFreeDoc while there are still XmlNode's unfreed.

As a proposed solution, I would say that we separate the refcounting of libxmljs::XmlDocument from the ref counting of xmlDoc. This is how I would do that in c++11.

class DocDeleter final {
  public:
    void operator() (xmlDoc * doc) const {
        xmlFreeDoc(doc);
    }
};

shared_ptr<xmlDoc> open_doc() {
    auto * doc = xmlNewDoc(...);
    return shared_ptr<xmlDoc>(doc, DocDeleter {} );
}

class XmlDocument {
  private:
    shared_ptr<xmlDoc> xml_obj;
  public:
    ~XmlDocument() {
        // compiler automatically does this, but for completeness
        xml_obj = nullptr;
    }
};

class XmlNode {
  private:
   shared_ptr<xmlDoc> xml_obj;
  public:
    ~XmlNode() {
        // compiler automatically does this, but for completeness
        xml_obj = nullptr;
    }
    // etc. etc.
};
Contributor

skabbes commented Oct 20, 2014

Essentially having each XmlNode keep a handle (an std::shared_ptr) on the owning document. We might want to actually keep the XmlDocument wrapper around as well, but this same trick could be applied, replacing xmlDoc with XmlDocument.

Contributor

skabbes commented Oct 20, 2014

^ feel free to ignore me. I think I had some incorrect assumptions about the code. It has been over a year :(

I think I need to sleep on this problem.

Collaborator

polotek commented Oct 21, 2014

Here's how wxXml2 is determining whether a node is unlinked. Search for the definition of isUnlinked in this code.

http://svn.code.sf.net/p/wxcode/code/trunk/wxCode/components/wxxml2/src/xml2.cpp

It's pretty simple. They just check whether all of the relevant reference points are NULL; in this case parent, next, and prev ptrs. I wonder about the doc ptr on nodes though. That should never be NULL as long as one node exists that references the document. But I think this is where we are messing up. We are freeing the document as soon as v8 garbage collects the wrapper node. I'm starting to wrap my head around this problem.

Contributor

skabbes commented Oct 21, 2014

ahh, brilliant @polotek .

I think we might lose the ref to doc as soon as we call xmlUnlinkNode, but I'm unsure. What if we stored the pointer to libxmljs::XmlDocument outside of the _private section (like directly on the XmlNode instance)? Would that work?

Collaborator

polotek commented Oct 21, 2014

Another good reference for this. Here's the libxml++ library. It uses the same technique of saving wrapper objects in the _private ptr. In fact, I'm pretty sure that's where we got the idea from. But I've linked to the free_wrappers function which frees the wrapper function similar to how our destructors are supposed to. Note how they don't free document objects. I need to trace further to see what they actually do with documents.

https://git.gnome.org/browse/libxml++/tree/libxml++/nodes/node.cc#n729

Collaborator

polotek commented Oct 21, 2014

One more reference. The R language is suggesting basically the same approach to this problem.

http://www.omegahat.org/RSXML/MemoryManagement.html

One key difference is they are counting references for the document and the node itself. In their world, they can have multiple wrappers objects for the same xmlNode. We avoid that by ensuring only one object per node. (multiple references are achieved by multiple v8::Handles to the same wrapper object). So I don't think we need the node level ref counters. But I need to study a bit more. They talk about xmlNodes created with no doc attached. But I thought we always had to have a doc, so there's at least one thing I'm misunderstanding.

Contributor

hildjj commented May 29, 2015

This issue should not be freed until a patch is accepted into master.

Collaborator

rchipka commented Jul 7, 2015

Would it be possible to avoid complicated ref counting code by instead giving the user the responsibility of freeing (unrefing) the document via the API? When the user knows that they're done adding/creating elements and modifying/searching the document etc., they could simply call doc.free() or doc.unref() on the JS side. Is this at all a possibility? Could it be potentially dangerous? Maybe a safety variable could hold the nodesConstructed-nodesDestructed and the API call doc.free() would return false when it sees that there are still ref-ed nodes floating around.

Contributor

skabbes commented Jul 11, 2015

@rc0x03 I don't think that's a good idea from an API perspective. The memory management code shouldn't be that complex really - someone just needs to take the time and do it carefully.

Collaborator

rchipka commented Jul 14, 2015

That's a good point. Another reason why that might be a bad idea is if the user has complicated async code then user would then have to determine when they're completely done using the document. After looking through the source I came up with another possible solution.

Libxml has a few functions that accept callbacks. One of those functions is xmlDeregisterNodeDefault. This function sets the default callback function that is called when any node is "deregistered" (i.e. before freeing). The callback is also called for all of the node's children.

We can use this function to prevent bad access related segmentation faults by setting a flag saying that the xmlNode has already been freed. The XmlNode destructor code will then check the flag and return if it has already been freed.

In libxmljs.cc:

// set up our callback function that will set the flag

void flagNode(xmlNode* node) {
  if (node->_private) {
    (static_cast<XmlNode*>(node->_private))->isFree = true;
  }
  return;
}

// set the callback function to be called

LibXMLJS::LibXMLJS()
{
    xmlDeregisterNodeDefault(flagNode);

....

}

In xml_node.h:

class XmlNode : public node::ObjectWrap {
public:
    bool isFree;

In xml_node.cc:

// The constructor will set the default value to false

XmlNode::XmlNode(xmlNode* node) : xml_obj(node) {
    this->isFree = false;

...
}


// The destructor will check the value

XmlNode::~XmlNode() {
    if (this->isFree) {
      return;
    }

...
}

Now every time an xmlNode (or any of its children) is freed, the callback will set a flag on the libxmljs Node object saying that the xmlNode has already been freed. Since the destructor will have this information it will not attempt to free the node or access properties of the node causing a segfault. This solution works without any additional ref counting code or memory leaks.

Collaborator

rchipka commented Jul 15, 2015

Note: this only fixes cases where a parent destructor gets called before a child destructor and the child node is double freed.

This does not fix the problem in pull request #267. The problem here is that the parent is allowed to be freed even though there is still a floating reference to one of its child nodes.

Collaborator

rchipka commented Jul 16, 2015

Collaborator

rchipka commented Jul 23, 2015

Pull request #323 solves this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment