Nokogiri::XML::Reader for java keeps whole document in memory #1066

Closed
pablito opened this issue Mar 14, 2014 · 1 comment
Labels
platform/jruby topic/memory Segfaults, memory leaks, valgrind testing, etc.

pablito commented Mar 14, 2014

I have the following code:

require 'nokogiri'

reader = Nokogiri::XML::Reader(File.open('5258513130_report.xml'))
i = 0
reader.each do |node|
  i += 1
  break if i == 5000
end

puts "it's time to dump the heap..."
sleep

5258513130_report.xml is:

<?xml version="1.0" encoding="utf-8"?>
<BulkDataExchangeResponses xmlns="urn:ebay:apis:eBLBaseComponents">
<ActiveInventoryReport>

<SKUDetails>
<SKU>7544569</SKU>
<Price currencyID="EUR">103.99</Price>
<Quantity>35</Quantity>
<ItemID>111298416963</ItemID>
</SKUDetails>
<SKUDetails>
<SKU>22783396</SKU>
<Price currencyID="EUR">148.35</Price>
<Quantity>9</Quantity>
<ItemID>111298416964</ItemID>
</SKUDetails>
...
</ActiveInventoryReport>
</BulkDataExchangeResponses>
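
Since the original report file is not attached, here is a minimal sketch for generating a stand-in file with the same element structure, so the script above can be run against something. The helper name `write_sample_report` and all SKU, price, quantity, and item ID values are made up for illustration; only the element names and namespace come from the excerpt above.

```ruby
require 'stringio'

# Writes a synthetic report with `count` <SKUDetails> entries to `io`.
# Hypothetical helper: the values below are invented, only the element
# names and the namespace are taken from the excerpt in this issue.
def write_sample_report(io, count)
  io.puts '<?xml version="1.0" encoding="utf-8"?>'
  io.puts '<BulkDataExchangeResponses xmlns="urn:ebay:apis:eBLBaseComponents">'
  io.puts '<ActiveInventoryReport>'
  count.times do |n|
    io.puts '<SKUDetails>'
    io.puts "<SKU>#{1_000_000 + n}</SKU>"
    io.puts format('<Price currencyID="EUR">%.2f</Price>', 10 + (n % 100))
    io.puts "<Quantity>#{n % 50}</Quantity>"
    io.puts "<ItemID>#{111_298_416_963 + n}</ItemID>"
    io.puts '</SKUDetails>'
  end
  io.puts '</ActiveInventoryReport>'
  io.puts '</BulkDataExchangeResponses>'
  io
end

# Build a small report in memory. For a real run, pass an open file
# instead, e.g. File.open('sample_report.xml', 'w') (hypothetical name).
sample = write_sample_report(StringIO.new, 3).string
```

For the reproduction, pick a `count` large enough that the reader yields more than 5000 nodes before the `break`.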

As you can see from the following heap dump snapshots, the reader keeps all 5000 nodes it has found in memory. This doesn't happen in MRI Ruby 2.0.0p247.

nodeQueue.size = 5001

ReaderNode$ClosingNode 1249 instances
ReaderNode$ElementNode 1252 instances
ReaderNode$TextNode 2502 instances

my config:
nokogiri (1.5.10 java)
jruby 1.7.4 (1.9.3p392) 2013-05-16 2390d3b on OpenJDK 64-Bit Server VM 1.7.0_51-b00 +indy [linux-amd64]

@flavorjones
Member

Closing this in favor of #2224 which describes the same issue in a bit more detail.
