SSL session reuse may fail #30

Open
nirvdrum opened this issue Jan 19, 2012 · 1 comment

Comments

@nirvdrum

I've just run into a case where reusing an SSL session raised an exception, and Spidr subsequently skipped the page. The exception is currently swallowed silently, so I modified the code to capture the following trace:

EOFError (end of file reached):
  /home/nirvdrum/.rvm/rubies/ruby-1.9.3-p0/lib/ruby/1.9.1/openssl/buffering.rb:174:in `sysread_nonblock'
  /home/nirvdrum/.rvm/rubies/ruby-1.9.3-p0/lib/ruby/1.9.1/openssl/buffering.rb:174:in `read_nonblock'
  /home/nirvdrum/.rvm/rubies/ruby-1.9.3-p0/lib/ruby/1.9.1/net/protocol.rb:141:in `rbuf_fill'
  /home/nirvdrum/.rvm/rubies/ruby-1.9.3-p0/lib/ruby/1.9.1/net/protocol.rb:122:in `readuntil'
  /home/nirvdrum/.rvm/rubies/ruby-1.9.3-p0/lib/ruby/1.9.1/net/protocol.rb:132:in `readline'
  /home/nirvdrum/.rvm/rubies/ruby-1.9.3-p0/lib/ruby/1.9.1/net/http.rb:2562:in `read_status_line'
  /home/nirvdrum/.rvm/rubies/ruby-1.9.3-p0/lib/ruby/1.9.1/net/http.rb:2551:in `read_new'
  /home/nirvdrum/.rvm/rubies/ruby-1.9.3-p0/lib/ruby/1.9.1/net/http.rb:1319:in `block in transport_request'
  /home/nirvdrum/.rvm/rubies/ruby-1.9.3-p0/lib/ruby/1.9.1/net/http.rb:1316:in `catch'
  /home/nirvdrum/.rvm/rubies/ruby-1.9.3-p0/lib/ruby/1.9.1/net/http.rb:1316:in `transport_request'
  /home/nirvdrum/.rvm/rubies/ruby-1.9.3-p0/lib/ruby/1.9.1/net/http.rb:1293:in `request'
  rest-client (1.6.7) lib/restclient/net_http_ext.rb:51:in `request'
  /home/nirvdrum/.rvm/rubies/ruby-1.9.3-p0/lib/ruby/1.9.1/net/http.rb:1026:in `get'
  spidr (0.4.1) lib/spidr/agent.rb:513:in `block in get_page'
  spidr (0.4.1) lib/spidr/agent.rb:684:in `prepare_request'
  spidr (0.4.1) lib/spidr/agent.rb:512:in `get_page'
  app/models/cookie_login_option.rb:150:in `fetch_remote_form'
  app/models/cookie_login_option.rb:158:in `block in fetch_remote_form'
  spidr (0.4.1) lib/spidr/agent.rb:518:in `block in get_page'
  spidr (0.4.1) lib/spidr/agent.rb:684:in `prepare_request'
  spidr (0.4.1) lib/spidr/agent.rb:512:in `get_page'
  app/models/cookie_login_option.rb:150:in `fetch_remote_form'
  app/models/cookie_login_option.rb:158:in `block in fetch_remote_form'
  spidr (0.4.1) lib/spidr/agent.rb:518:in `block in get_page'
  spidr (0.4.1) lib/spidr/agent.rb:684:in `prepare_request'
  spidr (0.4.1) lib/spidr/agent.rb:512:in `get_page'

If I modify the code to remove the session cache, I am able to fetch the page okay. It might be good to catch EOFError and retry with a new session in the event this happens. Catching the error all over the place could be messy though.
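
A minimal sketch of the retry idea above, kept in one place so the rescue does not have to be repeated at every call site. Everything here is hypothetical: `with_session_retry` is not part of Spidr or Net::HTTP, and a plain Hash stands in for the session cache. It assumes the block raises EOFError when the cached session is stale.

```ruby
# Hypothetical helper, not part of Spidr or Net::HTTP.
# Yields the cached session for +key+. If the block raises EOFError,
# the cached entry is dropped and the block is run again, this time
# receiving nil so the caller can open a fresh session.
def with_session_retry(cache, key, max_retries = 1)
  attempts = 0
  begin
    yield cache[key]
  rescue EOFError
    attempts += 1
    raise if attempts > max_retries # give up and re-raise the EOFError
    cache.delete(key)               # discard the possibly stale session
    retry                           # re-run the begin block
  end
end

# Simulated usage: the Hash and the :stale_session marker are stand-ins
# for a real session cache and a real reused SSL session.
cache = { 'example.com:443' => :stale_session }
status = with_session_retry(cache, 'example.com:443') do |session|
  raise EOFError, 'end of file reached' if session == :stale_session
  :fetched
end
puts status
```

The retry count is capped so a server that keeps closing the connection still surfaces the EOFError instead of looping forever.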

@a-yiorgos

Could this be a version issue? Something like this happened to me with a simple spider that printed the URLs from a site. Using REE it would fail, while with 2.0.0 it would work fine.
