100% CPU usage attempting to get() https URL without ssl verification #312

Closed
Phrogz opened this Issue May 9, 2013 · 6 comments


Phrogz commented May 9, 2013

Using Mechanize 2.6.0 on Ruby 1.9.3, I'm trying to fetch a web page over HTTPS from Windows 7 x64. When I attempt to get() the URL, CPU usage goes to 100% and the method never returns:

require 'mechanize'
uri   = "https://my.com/wiki/api.php?action=query&titles=US4&prop=info&format=xml"
agent = Mechanize.new
u, p  = %w[myusername mypassword]
agent.add_auth( uri, u, p )                               # register credentials for this URI
agent.agent.http.verify_mode = OpenSSL::SSL::VERIFY_NONE  # skip certificate verification
info = agent.get( uri )                                   # never returns; CPU stays at 100%

When I interrupt it, I get these stack traces (three different runs):

>> info = agent.get( page_api )
IRB::Abort: abort then interrupt!
    from C:/Ruby193/lib/ruby/gems/1.9.1/gems/mechanize-2.6.0/lib/mechanize/http/www_authenticate_parser.rb:27:in `call'
    from C:/Ruby193/lib/ruby/gems/1.9.1/gems/mechanize-2.6.0/lib/mechanize/http/www_authenticate_parser.rb:27:in `parse'
    from C:/Ruby193/lib/ruby/gems/1.9.1/gems/mechanize-2.6.0/lib/mechanize/http/agent.rb:716:in `response_authenticate'
    from C:/Ruby193/lib/ruby/gems/1.9.1/gems/mechanize-2.6.0/lib/mechanize/http/agent.rb:306:in `fetch'
    from C:/Ruby193/lib/ruby/gems/1.9.1/gems/mechanize-2.6.0/lib/mechanize.rb:431:in `get'
    from (irb):10
    from C:/Ruby193/bin/irb:12:in `<main>'
>> info = agent.get( page_api )
IRB::Abort: abort then interrupt!
    from C:/Ruby193/lib/ruby/gems/1.9.1/gems/mechanize-2.6.0/lib/mechanize/http/www_authenticate_parser.rb:29:in `call'
    from C:/Ruby193/lib/ruby/gems/1.9.1/gems/mechanize-2.6.0/lib/mechanize/http/www_authenticate_parser.rb:29:in `new'
    from C:/Ruby193/lib/ruby/gems/1.9.1/gems/mechanize-2.6.0/lib/mechanize/http/www_authenticate_parser.rb:29:in `parse'
    from C:/Ruby193/lib/ruby/gems/1.9.1/gems/mechanize-2.6.0/lib/mechanize/http/agent.rb:716:in `response_authenticate'
    from C:/Ruby193/lib/ruby/gems/1.9.1/gems/mechanize-2.6.0/lib/mechanize/http/agent.rb:306:in `fetch'
    from C:/Ruby193/lib/ruby/gems/1.9.1/gems/mechanize-2.6.0/lib/mechanize.rb:431:in `get'
    from (irb):11
    from C:/Ruby193/bin/irb:12:in `<main>'
>> info = agent.get( page_api )
IRB::Abort: abort then interrupt!
    from C:/Ruby193/lib/ruby/gems/1.9.1/gems/mechanize-2.6.0/lib/mechanize/http/www_authenticate_parser.rb:114:in `call'
    from C:/Ruby193/lib/ruby/gems/1.9.1/gems/mechanize-2.6.0/lib/mechanize/http/www_authenticate_parser.rb:114:in `token'
    from C:/Ruby193/lib/ruby/gems/1.9.1/gems/mechanize-2.6.0/lib/mechanize/http/www_authenticate_parser.rb:31:in `parse'
    from C:/Ruby193/lib/ruby/gems/1.9.1/gems/mechanize-2.6.0/lib/mechanize/http/agent.rb:716:in `response_authenticate'
    from C:/Ruby193/lib/ruby/gems/1.9.1/gems/mechanize-2.6.0/lib/mechanize/http/agent.rb:306:in `fetch'
    from C:/Ruby193/lib/ruby/gems/1.9.1/gems/mechanize-2.6.0/lib/mechanize.rb:431:in `get'
    from (irb):12
    from C:/Ruby193/bin/irb:12:in `<main>'

Trying the same code on OS X produces the same result.

Owner

leejarvis commented May 10, 2013

I can't reproduce this on OS X, but I think that's because I don't have the correct auth. Could you enable logging and post the results? For example: agent.log = Logger.new(STDOUT)

Phrogz commented May 13, 2013

With logging (and a ctrl-c about 15s after the last log line):

I, [2013-05-13T08:55:06.199593 #2832]  INFO -- : Net::HTTP::Get: /engwiki/api.php?action=query&titles=Devtools/UI_Composer/DesignSpec/US7294&prop=info&format=xm
D, [2013-05-13T08:55:06.200592 #2832] DEBUG -- : request-header: accept => */*
D, [2013-05-13T08:55:06.200592 #2832] DEBUG -- : request-header: user-agent => Mechanize/2.6.0 Ruby/1.9.3p194 (http://github.com/sparklemotion/mechanize/)
D, [2013-05-13T08:55:06.201591 #2832] DEBUG -- : request-header: accept-encoding => gzip,deflate,identity
D, [2013-05-13T08:55:06.201591 #2832] DEBUG -- : request-header: accept-charset => ISO-8859-1,utf-8;q=0.7,*;q=0.7
D, [2013-05-13T08:55:06.202591 #2832] DEBUG -- : request-header: accept-language => en-us,en;q=0.5
D, [2013-05-13T08:55:06.202591 #2832] DEBUG -- : request-header: host => wiki.nvidia.com
I, [2013-05-13T08:55:08.136043 #2832]  INFO -- : status: Net::HTTPUnauthorized 1.1 401 Unauthorized
D, [2013-05-13T08:55:08.143037 #2832] DEBUG -- : response-header: content-length => 1656
D, [2013-05-13T08:55:08.143037 #2832] DEBUG -- : response-header: content-type => text/html
D, [2013-05-13T08:55:08.144036 #2832] DEBUG -- : response-header: server => Microsoft-IIS/6.0
D, [2013-05-13T08:55:08.145035 #2832] DEBUG -- : response-header: www-authenticate => Negotiate, NTLM, Basic realm="nvidia.com"
D, [2013-05-13T08:55:08.145035 #2832] DEBUG -- : response-header: x-powered-by => ASP.NET
D, [2013-05-13T08:55:08.145035 #2832] DEBUG -- : response-header: date => Mon, 13 May 2013 14:55:07 GMT
D, [2013-05-13T08:55:08.146035 #2832] DEBUG -- : Read 1656 bytes (1656 total)
C:/Ruby193/lib/ruby/gems/1.9.1/gems/mechanize-2.6.0/lib/mechanize/http/www_authenticate_parser.rb:33:in `parse': Interrupt
        from C:/Ruby193/lib/ruby/gems/1.9.1/gems/mechanize-2.6.0/lib/mechanize/http/agent.rb:716:in `response_authenticate'
        from C:/Ruby193/lib/ruby/gems/1.9.1/gems/mechanize-2.6.0/lib/mechanize/http/agent.rb:306:in `fetch'
        from C:/Ruby193/lib/ruby/gems/1.9.1/gems/mechanize-2.6.0/lib/mechanize.rb:431:in `get'
        from tmp.rb:9:in `<main>'

Here is the desired result (same uri, user, and password). Perhaps of note is the (syntactically-invalid) leading whitespace before the XML declaration.

def fetch_https_without_ssl_verification( uri, user=nil, pass=nil )
  `curl -s -k #{%Q{-u "#{user}#{":"<<pass if pass}"} if user} "#{uri}"`
end
p fetch_https_without_ssl_verification(uri, u, p)
#=> "\t\t       <?xml version=\"1.0\"?><api><query><normalized><n from=\"Devtools/UI_Composer/DesignSpec/US7294\" to=\"Devtools/UI Composer/DesignSpec/US7294\" /></normalized><pages><page ns=\"0\" title=\"Devtools/UI Composer/DesignSpec/US7294\" missing=\"\" /></pages></query></api>"
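
For comparison, here is a rough sketch of the same fetch using only Ruby's standard library instead of shelling out to curl. It sends Basic credentials up front, so the client never has to parse the server's WWW-Authenticate header. The method names build_basic_auth_request and fetch_with_net_http are made up for this example, and it assumes the server accepts Basic credentials on the first request (which is what curl -u does by default). Like curl -k, it disables certificate verification, so it is only suitable for trusted networks.

```ruby
require 'net/http'
require 'openssl'
require 'uri'

# Build a GET request that carries Basic credentials from the start.
# (Hypothetical helper, not part of Mechanize.)
def build_basic_auth_request(uri, user = nil, pass = nil)
  request = Net::HTTP::Get.new(uri.request_uri)
  request.basic_auth(user, pass) if user
  request
end

# Fetch the response body over HTTPS without verifying the certificate.
def fetch_with_net_http(url, user = nil, pass = nil)
  uri  = URI.parse(url)
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl     = (uri.scheme == 'https')
  http.verify_mode = OpenSSL::SSL::VERIFY_NONE  # insecure; mirrors curl -k
  http.request(build_basic_auth_request(uri, user, pass)).body
end
```

Usage would look like fetch_with_net_http(uri, u, p), with the same uri, u, and p as in the original snippet.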

Owner

leejarvis commented May 17, 2013

OK, this is because it's authenticating over the NTLM protocol. It's essentially the same issue as #273. Unfortunately, Mechanize has no tests covering NTLM authentication; it was loosely thrown together with nothing more than user feedback to go on. I've thought about removing support for NTLM because its use is no longer recommended by Microsoft and in some cases is discouraged. Combined with my inability to test this locally, I don't think it's going to be a quick fix.

I'm going to close this issue in favour of #273 so we can keep everything there.
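
To make the header from the log concrete: the server offers three schemes in one www-authenticate value, two of them (Negotiate, NTLM) with no parameters. The sketch below lists the offered scheme names so a caller can check whether Basic is available. It is a simplified illustration, not Mechanize's parser; offered_schemes is a hypothetical name, and it assumes challenges are comma-separated with any parameter values either bare or double-quoted.

```ruby
# List the auth scheme names in a WWW-Authenticate header value.
# Simplified sketch: not a full parser for the header grammar.
def offered_schemes(header)
  header
    .scan(/(?:[^,"]|"[^"]*")+/)               # split on commas outside quotes
    .map(&:strip)
    .reject { |part| part =~ /\A[\w-]+\s*=/ } # drop continuation params (key=value)
    .map { |part| part[/\A[A-Za-z][\w-]*/] }  # keep the leading scheme token
    .compact
end

schemes = offered_schemes('Negotiate, NTLM, Basic realm="nvidia.com"')
puts schemes.include?('Basic') ? 'Basic is offered' : 'Basic is not offered'
```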

leejarvis closed this May 17, 2013

Owner

leejarvis commented Jun 10, 2013

@Phrogz I'm trying to reproduce this but I can't for the life of me find public hosts that are using NTLM. Could you help out here at all?

iGallina commented Apr 2, 2014

@leejarvis usually the NTLM hosts available are old Microsoft Intranets.
I am facing problems with NTLM authentication myself, and I am considering using some of the Rack NTLM gems.

Owner

leejarvis commented Apr 7, 2014

@iGallina I've actually removed support for NTLM from Mechanize per #321 (it's not released yet, though). I'm not keen on continuing to work around the crap for an auth system that is rarely used and no longer officially supported.

I hope you figure it out, though!
