
urllib2 uncaught exception when request is not answered #2

Closed
newlog opened this issue Sep 19, 2012 · 2 comments



newlog commented Sep 19, 2012

The problem is in the google.py file, at line 96 (the line number might differ because of some modifications I've made to the code). The exception looks like this:


Traceback (most recent call last):
  File "mini_qa.py", line 325, in <module>
    pretty_qa("Who is the world's best tennis player?")
  File "mini_qa.py", line 78, in pretty_qa
    for (j, (answer, score)) in enumerate(qa(question, source)[:num]):
  File "mini_qa.py", line 96, in qa
    gqa = google_qa(question)
  File "mini_qa.py", line 111, in google_qa
    for summary in get_summaries(query.query):
  File "mini_qa.py", line 187, in get_summaries
    results = search(query)
  File "/Users/newlog/Documents/Proyectos/misc/github/mini_qa/google.py", line 181, in search
    html = get_page(url)
  File "/Users/newlog/Documents/Proyectos/misc/github/mini_qa/google.py", line 95, in get_page
    response = urllib2.urlopen(request)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 400, in open
    response = meth(req, response)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 513, in http_response
    'http', request, response, code, msg, hdrs)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 432, in error
    result = self._call_chain(*args)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 372, in _call_chain
    result = func(*args)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 619, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 400, in open
    response = meth(req, response)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 513, in http_response
    'http', request, response, code, msg, hdrs)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 438, in error
    return self._call_chain(*args)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 372, in _call_chain
    result = func(*args)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 521, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 503: Service Unavailable

My suggested solution, in the get_page method, is:

try:
    response = urllib2.urlopen(request)
    cookie_jar.extract_cookies(response, request)
    html = response.read()
    cookie_jar.save()
    response.close()
    return html
except urllib2.URLError, err:
    print "[-] Error making the request: " + str(err)
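(As a side note: urllib2.HTTPError is a subclass of urllib2.URLError, so the except clause above should also catch the HTTP 503 from the traceback. A quick check in a Python 2 shell:)

import urllib2

# HTTPError derives from URLError, so catching URLError covers the 503 case.
print issubclass(urllib2.HTTPError, urllib2.URLError)  # prints True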

And in the search method from the same file (the last few lines):

# Request the Google Search results page.
html = get_page(url)

# Parse the response and extract the summaries.
if html:
    soup = BeautifulSoup.BeautifulSoup(html)
    return soup.findAll("div", {"class": "s"})
else:
    return []

Thanks for your work.

mnielsen (Owner) commented

If you're getting a 503, it's possible that Google is unhappy that you're making multiple requests over a short period of time. In earlier versions I had the same problem.

If that's the case, you may wish to change the line in google.py which reads

time.sleep(pause+(random.random()-0.5)*5)

to something with a longer pause, say 10 or even 20 seconds. The problem, of course, is that this slows things down. But at least with the caching of results it should only be a problem once.
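For example, a rough sketch of what that line could look like with a larger base delay (the 15 here is just an illustrative value, not something taken from the code):

import random
import time

# Pause roughly 15 seconds, plus or minus 2.5 seconds of jitter,
# between Google requests to make a 503 less likely.
pause = 15
time.sleep(pause + (random.random() - 0.5) * 5)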


newlog commented Sep 26, 2012

Hello Michael,

I know, I know. The reason is what you said ;) I just meant that the exception was not caught; I was only pointing it out to be a purist.

BTW, I commented out that line hahaha.

Again, great work.
