Add failing test case for Nokogiri::XML::Node#content inconsistency between Java and C #794

Closed
wants to merge 2 commits into
from

2 participants

@benlangfeld

C implementation includes content of child nodes recursively, Java does not.

@benlangfeld

Any guidance on how I should approach fixing this would be most appreciated.

@jvshahid
Sparkle Motion member

sorry for the late response. I'll take a look and get back to you shortly.

@jvshahid
Sparkle Motion member

Can you explain what's wrong with the test in commit 0e43632, i mean what are you trying to test ?

@benlangfeld

Note that this is, I hope, the last thing blocking my Nokogiri-dependent stuff from all passing on JRuby. I would love to get this in to 1.5.6, assuming that is coming soon. It'll be beers all round at that point :)

@benlangfeld

On JRuby, node.content returns "I <3 nachos". On libxml it returns "I ". A similar thing can be seen in the rendered XML. It seems libxml is assuming all instances of "<" mean there's an XML element coming and doesn't escape correctly.

@jvshahid
Sparkle Motion member

That's what I thought. I think in this case the C implementation is doing the right thing, the << method on nodes accept markup, which explains why everything after the < was dropped. This is just the way libxml handles invalid markup.

@benlangfeld

So how then do I append a string containing "<"? I can't do node.content += "<". That would work, other than the issue in the other test here where on libxml #content walks all children (which I believe is correct, though perhaps this should be optional).

@jvshahid
Sparkle Motion member

You can use node.content += on the text node or insert a new text node if there isn't one. Element nodes don't have text content by definition. If that still doesn't make sense, can you please send an example of what you're trying to do.

@benlangfeld
@jvshahid
Sparkle Motion member

No worries. I agree that the behavior should be consistent. I'll merge the changes in 311be4b on master but I won't merge the other test.

I'll work on the fix some time later today.

@jvshahid jvshahid added a commit that referenced this pull request Nov 20, 2012
@jvshahid jvshahid Initial fix for #794. I think the solution can be drastically simplif…
…ied. Details below.

This patch changes the behavior of node.content to recursively
aggregate text from children nodes, unless this is a Text node in
which case the content is returned.

In order to support recursively appending contents of children
elements and to fix the broken append behavior demonstrated by the
test in the previous commit, I had to remove the caching (which I'm
not sure why it was there in the first place) Also since we
recursively append text, we'll have to invalidate the cache when
changes deeper in the document tree occur.

Also I removed the content() from XML::Text since we don't need it
anymore.
1928899
@jvshahid jvshahid added a commit that referenced this pull request Nov 20, 2012
@jvshahid jvshahid pushed the fix for #794 to the ruby land.
This solution is much cleaner, but may not be as efficient.
This can be reverted to the previous commit which was pure
Java if there are issues with this implementation.
96f91eb
@jvshahid jvshahid referenced this pull request Nov 20, 2012
Closed

Fix for pull request #794 #797

@jvshahid
Sparkle Motion member

Sorry it took too long to push a fix. I had to wrap my head around the current implementation. I pushed another pull request with the first test and its fix. I'd like @yokolet to review the changes before it gets merged into master.

I'm going to close the issue, but feel free to comment on the new pull request if the problem isn't solved.

@jvshahid jvshahid closed this Nov 20, 2012
@benlangfeld
@yokolet yokolet added a commit that referenced this pull request Nov 23, 2012
@yokolet yokolet Merges java version of fix of a pull request #797, which incluse a te…
…st case from pull request #794.
3b905f6
@jvshahid jvshahid added a commit that referenced this pull request Nov 23, 2012
@jvshahid jvshahid Revert "pushed the fix for #794 to the ruby land."
This reverts commit 96f91eb.
b4f3f47
@patcheng patcheng added a commit that referenced this pull request Nov 26, 2012
@yokolet yokolet Merges java version of fix of a pull request #797, which incluse a te…
…st case from pull request #794.
bd44ce7
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment