Keep getting Net::HTTP::Persistent::Error: too many connection resets (due to end of file reached - EOFError) #116

Closed
patakijv opened this issue Jun 18, 2011 · 20 comments


@patakijv

I am processing multiple pages on a site for a payment processor and I run into errors like:

Net::HTTP::Persistent::Error: too many connection resets (due to end of file reached - EOFError) after 3 requests on 2195783400

I am thinking that it is because the processing is happening faster than the Net connection can be closed or released.

Is this the case? If so, how do I force the connection to close, or wait until it closes, before continuing? Or is there a better way to handle this?
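
For reference, my processing loop looks roughly like this (the host and the process_page helper below are placeholders, not my real code), and I don't currently close anything explicitly:

require 'net/http/persistent'

def process_page(body)
  # placeholder for the real parsing
  puts body.length
end

http = Net::HTTP::Persistent.new 'payment_scraper'

(1..50).each do |n|
  uri      = URI "https://example-processor.test/pages/#{n}"
  response = http.request uri   # this is where Net::HTTP::Persistent::Error gets raised
  process_page response.body
end

http.shutdown   # explicitly closes the persistent connection(s) when finished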

@patakijv
Author

I keep getting these errors during my testing... has anyone else seen this, and what needs to be done to resolve it?

Here is another example of the error with more info:

/Library/Ruby/Gems/1.8/gems/net-http-persistent-1.7/lib/net/http/persistent.rb:426:in `request': too many connection resets (due to end of file reached -
EOFError) after 12 requests on 2150615540 (Net::HTTP::Persistent::Error)

@chancancode
Contributor

After a quick peek at the code, my understanding is that when some (possibly temporary) error occurs, it retries a few more times, and if it still fails it raises this error. So this probably has little to do with too many dangling connections. My impression from scanning the code is that it already manages the connection(s?) somewhat intelligently, reusing them and shutting down those that are not needed, but I'm not sure. My guess is this is a server-specific issue?

@drbrain
Member

drbrain commented Jun 22, 2011

You will get an EOFError when the connection is closed before all the data is retrieved. Since this is happening at the network layer it's probably an issue with the server closing the connection before sending all the data.

If you run with debug mode enabled and paste the output we may be able to learn more.
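
For net-http-persistent on its own, a minimal sketch looks something like this (the URL is a placeholder, and debug_output requires a version of the gem that supports it):

require 'net/http/persistent'

http = Net::HTTP::Persistent.new 'debug_example'
http.debug_output = $stderr   # passes $stderr through to Net::HTTP#set_debug_output on each connection

uri = URI 'http://example.com/'
response = http.request uri
puts response.code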

@drbrain drbrain closed this as completed Jun 29, 2011
@metalfingers

Sorry to open this back up again, but I have the same problem. Here's my code:

require 'rubygems'
require 'mechanize'
require 'logger'

agent = Mechanize.new
agent.keep_alive = true
agent.log = Logger.new(STDOUT)
page = agent.get('http://www.lexisnexis.com/lawschool/login.aspx')

login_form = page.form('form1')
login_form.txtLoginID = 'email@example.com'
login_form.TextBox1 = 'password1234'
page = agent.submit(login_form)

# searchPage is where the error occurs #
searchPage = agent.get('https://www.lexis.com/research/xlink')
pp searchPage

I'm not quite sure what to do, so I've created a gist of my output: https://gist.github.com/2496352.

@drbrain
Member

drbrain commented Apr 27, 2012

Looking at your output, mechanize thinks it got two HTTP responses for only one request. Can you send me a log with the raw socket output to drbrain@segment7.net? The raw socket debugging will include your password unless you edit it out. Here is an example:

require 'mechanize'
require 'logger'

agent = Mechanize.new
agent.keep_alive = true
agent.log = Logger.new $stderr
agent.agent.http.debug_output = $stderr

agent.get 'http://google.com'

By default stderr is not buffered, unlike stdout.

@drbrain drbrain reopened this Apr 27, 2012
@drbrain
Member

drbrain commented Apr 28, 2012

I got your log, here's the important part, edited for clarity and brevity:

D, [2012-04-27T20:18:18.505330 #8010] DEBUG -- : response-header: transfer-encoding => chunked
[…]
-> "497\r\n"
reading 1175 bytes...
-> ""
D, [2012-04-27T20:18:18.508102 #8010] DEBUG -- : Read 0 bytes (0 total)
-> "\xC5[…]"
D, [2012-04-27T20:18:18.592762 #8010] DEBUG -- : Read 1175 bytes (1175 total)
read 1175 bytes
reading 2 bytes...
-> "\r\n"
read 2 bytes
-> "0\r\n"
Conn close because of error end of file reached

The second-to-last line is the problem: RFC 2616 states that transfer-encoding: chunked must end with "0\r\n\r\n", and the lexis-nexis servers have omitted it. Since this comes from all the way down in net/http I can't directly work around this broken server. Mechanize is supposed to raise Mechanize::ResponseReadError in a case like this, but it didn't, which is a bug.
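
To make this concrete, here is roughly what the tail of a conforming chunked response should look like versus what the log above shows (illustrative bytes only; 0x497 == 1175, matching the chunk size in the log):

# RFC 2616: a chunked body ends with a zero-size chunk followed by a blank line.
good_tail = "497\r\n" + ("x" * 0x497) + "\r\n" + "0\r\n" + "\r\n"

# What this server appears to send: the final "\r\n" after "0\r\n" never arrives,
# so net/http keeps reading until the socket is closed and EOFError is raised.
bad_tail  = "497\r\n" + ("x" * 0x497) + "\r\n" + "0\r\n"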

I'll look into fixing mechanize to raise the proper error and after that try to add detection for this type of error.

@metalfingers

Thanks for the clarification. Does this mean, ultimately, that the page can't be scraped?

Best regards,
Richard


@drbrain
Member

drbrain commented Apr 28, 2012

I need to fix some bugs, but it will be scrapable after that.

@metalfingers

Cool. Thanks again!

drbrain added a commit that referenced this issue May 4, 2012
…::ResponseReadError in case of bad servers. Issue #116
@drbrain drbrain closed this as completed in dd65e11 May 4, 2012
@drbrain
Member

drbrain commented May 4, 2012

@metalfingers the following should work for you after dd65e11:

mech = Mechanize.new
mech.ignore_bad_chunking = true

# … your script

Note that ignore_bad_chunking may cause data loss since mechanize can't tell if the EOF occurred mid-transfer or due to the missing CRLF. If you're concerned about data loss you can leave ignore_bad_chunking off and rescue an exception instead:

# … your script

begin
  page = agent.get('https://www.lexis.com/research/xlink')
rescue Mechanize::ChunkTerminationError => e
  # check e.body_io for completeness
  page = e.force_parse
end

Mechanize::ChunkTerminationError is a subclass of Mechanize::ResponseReadError so you can handle it the same as a content-length error:

http://mechanize.rubyforge.org/Mechanize.html#label-Problems+with+content-length

@metalfingers

Thanks. I'll give it a go!

@aantix

aantix commented May 14, 2012

I don't think the above exception handling helps with form submissions that hit this same error. When I encounter the EOF exception after a form submit, I receive the following backtrace:

/Users/jjones/.rvm/gems/ruby-1.9.2-p290@rbanyan/gems/net-http-persistent-2.6/lib/net/http/persistent.rb:839:in `rescue in request'
/Users/jjones/.rvm/gems/ruby-1.9.2-p290@rbanyan/gems/net-http-persistent-2.6/lib/net/http/persistent.rb:848:in `request'
/Users/jjones/.rvm/gems/ruby-1.9.2-p290@rbanyan/gems/mechanize-2.5/lib/mechanize/http/agent.rb:258:in `fetch'
/Users/jjones/.rvm/gems/ruby-1.9.2-p290@rbanyan/gems/mechanize-2.5/lib/mechanize.rb:1229:in `post_form'
/Users/jjones/.rvm/gems/ruby-1.9.2-p290@rbanyan/gems/mechanize-2.5/lib/mechanize.rb:515:in `submit'
/Users/jjones/.rvm/gems/ruby-1.9.2-p290@rbanyan/gems/mechanize-2.5/lib/mechanize/form.rb:178:in `submit'

And the exception object is of type Net::HTTP::Persistent::Error, so you can't call e.force_parse.

Still looking into this.
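
For now the only workaround I've found is to rescue the persistent-connection error around the submit and retry, roughly like this (a sketch only, where form is the Mechanize::Form being submitted; since it re-POSTs the form, it may not be safe if the server already processed the first attempt):

attempts = 0
begin
  result_page = form.submit
rescue Net::HTTP::Persistent::Error => e
  attempts += 1
  raise if attempts > 2                # give up after a couple of retries
  warn "form submit failed: #{e.message}; retrying (attempt #{attempts})"
  sleep 1
  retry
end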

@aantix

aantix commented May 14, 2012

By disabling keep_alive, we were able to alleviate the EOFErrors that we were receiving. We found the fix outlined here:

http://rubyforge.org/pipermail/mechanize-users/2010-January/000486.html

bot.keep_alive = false

@drbrain
Member

drbrain commented May 15, 2012

@aantix can you reproduce this with mechanize 2.5.1? There was an unfortunate bug with form submission introduced in 2.5 (see #229).

@SohumB

SohumB commented Jun 2, 2012

Mechanize 2.5.1? I can.

Net::HTTP::Persistent::Error: too many connection resets (due to end of file reached - EOFError) after 175 requests on 37900880, last used 0.567640916 seconds ago
        from /var/lib/gems/1.9.1/gems/net-http-persistent-2.6/lib/net/http/persistent.rb:839:in `rescue in request'
        from /var/lib/gems/1.9.1/gems/net-http-persistent-2.6/lib/net/http/persistent.rb:848:in `request'
        from /var/lib/gems/1.9.1/gems/mechanize-2.5.1/lib/mechanize/http/agent.rb:258:in `fetch'
        from /var/lib/gems/1.9.1/gems/mechanize-2.5.1/lib/mechanize.rb:1229:in `post_form'
        from /var/lib/gems/1.9.1/gems/mechanize-2.5.1/lib/mechanize.rb:515:in `submit'

Most of my form submissions go through; it's just a few that don't, so I don't think it's the same bug. Trialling disabling keep_alive now.

@drbrain
Member

drbrain commented Jun 2, 2012

What is the keep-alive timeout for this host?

Does reducing the idle_timeout to 0.5 help?
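
i.e. on your Mechanize instance, something like:

agent.idle_timeout = 0.5   # close idle connections on our side before the server's keep-alive timeout fires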

@SohumB

SohumB commented Jun 2, 2012

Disabling keep_alive worked. The header is:

Keep-Alive: timeout=5, max=75

I'll check the idle_timeout as soon as I can.

@SohumB

SohumB commented Jun 12, 2012

Setting agent.idle_timeout = 0.5 instead of agent.keep_alive = false helps in making the problem error out earlier, but there are still form submissions that don't go through.

Net::HTTP::Persistent::Error: too many connection resets (due to end of file reached - EOFError) after 10 requests on 42428560, last used 0.25454795 seconds ago
        from /var/lib/gems/1.9.1/gems/net-http-persistent-2.6/lib/net/http/persistent.rb:839:in `rescue in request'
        from /var/lib/gems/1.9.1/gems/net-http-persistent-2.6/lib/net/http/persistent.rb:848:in `request'
        from /var/lib/gems/1.9.1/gems/mechanize-2.5.1/lib/mechanize/http/agent.rb:258:in `fetch'
        from /var/lib/gems/1.9.1/gems/mechanize-2.5.1/lib/mechanize.rb:1229:in `post_form'
        from /var/lib/gems/1.9.1/gems/mechanize-2.5.1/lib/mechanize.rb:515:in `submit'
        from import.rb:134:in `block in <top (required)>'
        from import.rb:101:in `each'
        from import.rb:101:in `<top (required)>'
        from /var/lib/gems/1.9.1/gems/activesupport-2.3.12/lib/active_support/dependencies.rb:171:in `load'
        from /var/lib/gems/1.9.1/gems/activesupport-2.3.12/lib/active_support/dependencies.rb:171:in `block in load_with_new_constant_marking'
        from /var/lib/gems/1.9.1/gems/activesupport-2.3.12/lib/active_support/dependencies.rb:547:in `new_constants_in'
        from /var/lib/gems/1.9.1/gems/activesupport-2.3.12/lib/active_support/dependencies.rb:171:in `load_with_new_constant_marking'
        from (irb):1
        from /usr/bin/irb:12:in `<main>'

@SohumB

SohumB commented Jun 12, 2012

Correction: The form submissions do appear to be going through; I misread my logs.

@drbrain
Member

drbrain commented Jun 13, 2012

Since the "last used" value is 0.254 seconds, setting your idle_timeout to 0.25 may be better. It seems your server has a particularly aggressive idle timeout.
