Keep getting Net::HTTP::Persistent::Error: too many connection resets (due to end of file reached - EOFError) #116

Closed
patakijv opened this issue Jun 18, 2011 · 20 comments


@patakijv

I am processing multiple pages on a site for a payment processor and I run into errors like:

Net::HTTP::Persistent::Error: too many connection resets (due to end of file reached - EOFError) after 3 requests on 2195783400

I am thinking that it is because the processing is happening faster than the Net connection can be closed or released.

Is this the case? If so, how do I force the connection to close, or wait until it closes, before continuing? Or is there a better way to handle this?
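
For reference, my processing loop looks roughly like this (the host and the process_page helper below are placeholders, not my real code), and I don't currently close anything explicitly:

require 'net/http/persistent'

def process_page(body)
  # placeholder for the real parsing
  puts body.length
end

http = Net::HTTP::Persistent.new 'payment_scraper'

(1..50).each do |n|
  uri      = URI "https://example-processor.test/pages/#{n}"
  response = http.request uri   # this is where Net::HTTP::Persistent::Error gets raised
  process_page response.body
end

http.shutdown   # explicitly closes the persistent connection(s) when finished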

@patakijv
Author

I keep getting these errors during my testing... has anyone else seen this, and what needs to be done to resolve it?

Here is another example of the error with more info:

/Library/Ruby/Gems/1.8/gems/net-http-persistent-1.7/lib/net/http/persistent.rb:426:in `request': too many connection resets (due to end of file reached -
EOFError) after 12 requests on 2150615540 (Net::HTTP::Persistent::Error)

@chancancode
Contributor

After a quick peek at the code, my understanding is that when some (possibly temporary) error occurs, it retries a few more times, and if it still fails it raises this error. So this probably has little to do with too many dangling connections. My impression from scanning the code is that it already manages the connection(s?) somewhat intelligently, reusing them and shutting down those that are not needed, but I'm not sure. My guess is this is a server-specific issue?

@drbrain
Member

drbrain commented Jun 22, 2011

You will get an EOFError when the connection is closed before all the data is retrieved. Since this is happening at the network layer it's probably an issue with the server closing the connection before sending all the data.

If you run with debug mode enabled and paste the output we may be able to learn more.
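
For net-http-persistent on its own, a minimal sketch looks something like this (the URL is a placeholder, and debug_output requires a version of the gem that supports it):

require 'net/http/persistent'

http = Net::HTTP::Persistent.new 'debug_example'
http.debug_output = $stderr   # passes $stderr through to Net::HTTP#set_debug_output on each connection

uri = URI 'http://example.com/'
response = http.request uri
puts response.code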

@drbrain drbrain closed this as completed Jun 29, 2011
@metalfingers

Sorry to open this back up again, but I have the same problem. Here's my code:

require 'rubygems'
require 'mechanize'
require 'logger'

agent = Mechanize.new
agent.keep_alive = true
agent.log = Logger.new(STDOUT)
page = agent.get('http://www.lexisnexis.com/lawschool/login.aspx')

login_form = page.form('form1')
login_form.txtLoginID = 'email@example.com'
login_form.TextBox1 = 'password1234'
page = agent.submit(login_form)

# searchPage is where the error occurs #
searchPage = agent.get('https://www.lexis.com/research/xlink')
pp searchPage

I'm not quite sure what to do, so I've created a gist of my output: https://gist.github.com/2496352.

@drbrain
Member

drbrain commented Apr 27, 2012

Looking at your output, mechanize thinks it got two HTTP responses for only one request. Can you send me a log with the raw socket output to drbrain@segment7.net? The raw socket debugging will include your password unless you edit it out. Here is an example:

require 'mechanize'
require 'logger'

agent = Mechanize.new
agent.keep_alive = true
agent.log = Logger.new $stderr
agent.agent.http.debug_output = $stderr

agent.get 'http://google.com'

By default stderr is not buffered, unlike stdout.

@drbrain drbrain reopened this Apr 27, 2012
@drbrain
Member

drbrain commented Apr 28, 2012

I got your log, here's the important part, edited for clarity and brevity:

D, [2012-04-27T20:18:18.505330 #8010] DEBUG -- : response-header: transfer-encoding => chunked
[…]
-> "497\r\n"
reading 1175 bytes...
-> ""
D, [2012-04-27T20:18:18.508102 #8010] DEBUG -- : Read 0 bytes (0 total)
-> "\xC5[…]"
D, [2012-04-27T20:18:18.592762 #8010] DEBUG -- : Read 1175 bytes (1175 total)
read 1175 bytes
reading 2 bytes...
-> "\r\n"
read 2 bytes
-> "0\r\n"
Conn close because of error end of file reached

The second-to-last line is the problem: RFC 2616 states that transfer-encoding: chunked must end with "0\r\n\r\n", and the lexis-nexis servers have omitted it. Since this comes from all the way down in net/http I can't directly work around this broken server. Mechanize is supposed to raise Mechanize::ResponseReadError in a case like this, but it didn't, which is a bug.
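
To make this concrete, here is roughly what the tail of a conforming chunked response should look like versus what the log above shows (illustrative bytes only; 0x497 == 1175, matching the chunk size in the log):

# RFC 2616: a chunked body ends with a zero-size chunk followed by a blank line.
good_tail = "497\r\n" + ("x" * 0x497) + "\r\n" + "0\r\n" + "\r\n"

# What this server appears to send: the final "\r\n" after "0\r\n" never arrives,
# so net/http keeps reading until the socket is closed and EOFError is raised.
bad_tail  = "497\r\n" + ("x" * 0x497) + "\r\n" + "0\r\n"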

I'll look into fixing mechanize to raise the proper error and after that try to add detection for this type of error.

@metalfingers

Thanks for the clarification. Does this mean, ultimately, that the page can't be scraped?

Best regards,
Richard


@drbrain
Member

drbrain commented Apr 28, 2012

I need to fix some bugs, but it will be scrapable after that.

@metalfingers

Cool. Thanks again!

drbrain added a commit that referenced this issue May 4, 2012
…::ResponseReadError in case of bad servers. Issue #116
@drbrain drbrain closed this as completed in dd65e11 May 4, 2012
@drbrain
Member

drbrain commented May 4, 2012

@metalfingers the following should work for you after dd65e11:

mech = Mechanize.new
mech.ignore_bad_chunking = true

# … your script

Note that ignore_bad_chunking may cause data loss since mechanize can't tell if the EOF occurred mid-transfer or due to the missing CRLF. If you're concerned about data loss you can leave ignore_bad_chunking off and rescue an exception instead:

# … your script

begin
  page = agent.get('https://www.lexis.com/research/xlink')
rescue Mechanize::ChunkTerminationError => e
  # check e.body_io for completeness
  page = e.force_parse
end

Mechanize::ChunkTerminationError is a subclass of Mechanize::ResponseReadError so you can handle it the same as a content-length error:

http://mechanize.rubyforge.org/Mechanize.html#label-Problems+with+content-length

@metalfingers

Thanks. I'll give it a go!

@aantix

aantix commented May 14, 2012

I don't think the above exception handling helps with form submissions that hit this same error. When I encounter the EOF exception after a form submit, I receive the following backtrace:

/Users/jjones/.rvm/gems/ruby-1.9.2-p290@rbanyan/gems/net-http-persistent-2.6/lib/net/http/persistent.rb:839:in `rescue in request'
/Users/jjones/.rvm/gems/ruby-1.9.2-p290@rbanyan/gems/net-http-persistent-2.6/lib/net/http/persistent.rb:848:in `request'
/Users/jjones/.rvm/gems/ruby-1.9.2-p290@rbanyan/gems/mechanize-2.5/lib/mechanize/http/agent.rb:258:in `fetch'
/Users/jjones/.rvm/gems/ruby-1.9.2-p290@rbanyan/gems/mechanize-2.5/lib/mechanize.rb:1229:in `post_form'
/Users/jjones/.rvm/gems/ruby-1.9.2-p290@rbanyan/gems/mechanize-2.5/lib/mechanize.rb:515:in `submit'
/Users/jjones/.rvm/gems/ruby-1.9.2-p290@rbanyan/gems/mechanize-2.5/lib/mechanize/form.rb:178:in `submit'

And the exception object is of type Net::HTTP::Persistent::Error, so you can't call e.force_parse.

Still looking into this.
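
For now the only workaround I've found is to rescue the persistent-connection error around the submit and retry, roughly like this (a sketch only, where form is the Mechanize::Form being submitted; since it re-POSTs the form, it may not be safe if the server already processed the first attempt):

attempts = 0
begin
  result_page = form.submit
rescue Net::HTTP::Persistent::Error => e
  attempts += 1
  raise if attempts > 2                # give up after a couple of retries
  warn "form submit failed: #{e.message}; retrying (attempt #{attempts})"
  sleep 1
  retry
end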

@aantix

aantix commented May 14, 2012

By disabling keep_alive, we were able to alleviate the EOFErrors that we were receiving. We found the fix outlined here:

http://rubyforge.org/pipermail/mechanize-users/2010-January/000486.html

bot.keep_alive = false

@drbrain
Member

drbrain commented May 15, 2012

@aantix can you reproduce this with mechanize 2.5.1? There was an unfortunate bug with form submission introduced in 2.5 (see #229).

@SohumB

SohumB commented Jun 2, 2012

Mechanize 2.5.1? I can.

Net::HTTP::Persistent::Error: too many connection resets (due to end of file reached - EOFError) after 175 requests on 37900880, last used 0.567640916 seconds ago
        from /var/lib/gems/1.9.1/gems/net-http-persistent-2.6/lib/net/http/persistent.rb:839:in `rescue in request'
        from /var/lib/gems/1.9.1/gems/net-http-persistent-2.6/lib/net/http/persistent.rb:848:in `request'
        from /var/lib/gems/1.9.1/gems/mechanize-2.5.1/lib/mechanize/http/agent.rb:258:in `fetch'
        from /var/lib/gems/1.9.1/gems/mechanize-2.5.1/lib/mechanize.rb:1229:in `post_form'
        from /var/lib/gems/1.9.1/gems/mechanize-2.5.1/lib/mechanize.rb:515:in `submit'

Most of my form submissions go through; it's just a few that don't, so I don't think it's the same bug. Trialling disabling keep_alive now.

@drbrain
Member

drbrain commented Jun 2, 2012

What is the keep-alive timeout for this host?

Does reducing the idle_timeout to 0.5 help?
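
i.e. on your Mechanize instance, something like:

agent.idle_timeout = 0.5   # close idle connections on our side before the server's keep-alive timeout fires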

@SohumB

SohumB commented Jun 2, 2012

Disabling keep_alive worked. The header is:

Keep-Alive: timeout=5, max=75

I'll check the idle_timeout as soon as I can.

@SohumB

SohumB commented Jun 12, 2012

Setting agent.idle_timeout = 0.5 instead of agent.keep_alive = false helps in making the problem error out earlier, but there are still form submissions that don't go through.

Net::HTTP::Persistent::Error: too many connection resets (due to end of file reached - EOFError) after 10 requests on 42428560, last used 0.25454795 seconds ago
        from /var/lib/gems/1.9.1/gems/net-http-persistent-2.6/lib/net/http/persistent.rb:839:in `rescue in request'
        from /var/lib/gems/1.9.1/gems/net-http-persistent-2.6/lib/net/http/persistent.rb:848:in `request'
        from /var/lib/gems/1.9.1/gems/mechanize-2.5.1/lib/mechanize/http/agent.rb:258:in `fetch'
        from /var/lib/gems/1.9.1/gems/mechanize-2.5.1/lib/mechanize.rb:1229:in `post_form'
        from /var/lib/gems/1.9.1/gems/mechanize-2.5.1/lib/mechanize.rb:515:in `submit'
        from import.rb:134:in `block in <top (required)>'
        from import.rb:101:in `each'
        from import.rb:101:in `<top (required)>'
        from /var/lib/gems/1.9.1/gems/activesupport-2.3.12/lib/active_support/dependencies.rb:171:in `load'
        from /var/lib/gems/1.9.1/gems/activesupport-2.3.12/lib/active_support/dependencies.rb:171:in `block in load_with_new_constant_marking'
        from /var/lib/gems/1.9.1/gems/activesupport-2.3.12/lib/active_support/dependencies.rb:547:in `new_constants_in'
        from /var/lib/gems/1.9.1/gems/activesupport-2.3.12/lib/active_support/dependencies.rb:171:in `load_with_new_constant_marking'
        from (irb):1
        from /usr/bin/irb:12:in `<main>'

@SohumB

SohumB commented Jun 12, 2012

Correction: The form submissions do appear to be going through; I misread my logs.

@drbrain
Member

drbrain commented Jun 13, 2012

Since the "last used" value is 0.254 seconds, setting your idle_timeout to 0.25 may be better. It seems your server has a particularly aggressive idle timeout.
