The problem is in the google.py file, around line 96 (the exact line may differ because of some modifications I've made to the code). The exception is:
Traceback (most recent call last):
  File "mini_qa.py", line 325, in <module>
    pretty_qa("Who is the world's best tennis player?")
  File "mini_qa.py", line 78, in pretty_qa
    for (j, (answer, score)) in enumerate(qa(question, source)[:num]):
  File "mini_qa.py", line 96, in qa
    gqa = google_qa(question)
  File "mini_qa.py", line 111, in google_qa
    for summary in get_summaries(query.query):
  File "mini_qa.py", line 187, in get_summaries
    results = search(query)
  File "/Users/newlog/Documents/Proyectos/misc/github/mini_qa/google.py", line 181, in search
    html = get_page(url)
  File "/Users/newlog/Documents/Proyectos/misc/github/mini_qa/google.py", line 95, in get_page
    response = urllib2.urlopen(request)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 400, in open
    response = meth(req, response)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 513, in http_response
    'http', request, response, code, msg, hdrs)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 432, in error
    result = self._call_chain(*args)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 372, in _call_chain
    result = func(*args)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 619, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 400, in open
    response = meth(req, response)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 513, in http_response
    'http', request, response, code, msg, hdrs)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 438, in error
    return self._call_chain(*args)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 372, in _call_chain
    result = func(*args)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 521, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 503: Service Unavailable
If you're getting a 503 it's possible that Google is unhappy you're making multiple requests over a short period of time. In earlier versions I had the same problem.
If that's the case, then you may wish to change the line in google.py which reads
    time.sleep(pause+(random.random()-0.5)*5)
to something with a longer pause, say 10 or even 20 seconds. The problem, of course, is that this slows things down. But at least with the caching of results it should only be a problem once.
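As a rough illustration of the longer pause, here is a minimal sketch of the same jittered delay pulled out into a helper. The names `jittered_pause` and `polite_sleep`, the `spread` parameter, and the 20-second default are all hypothetical and not part of google.py; the clamp at zero is an extra safeguard I've assumed, so that a small base pause can never produce a negative delay.

```python
import random
import time


def jittered_pause(base, spread=5.0):
    """Return `base` seconds plus or minus up to spread/2 seconds of jitter."""
    delay = base + (random.random() - 0.5) * spread
    # Clamp at zero so a small base can never yield a negative delay.
    return max(0.0, delay)


def polite_sleep(base=20.0):
    """Sleep for a jittered delay and return how long we slept."""
    delay = jittered_pause(base)
    time.sleep(delay)
    return delay
```

With a base of 20 the delay falls somewhere between 17.5 and 22.5 seconds, so requests stay irregular while being spaced much further apart.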
My suggested solution is:
    try:
        response = urllib2.urlopen(request)
        cookie_jar.extract_cookies(response, request)
        html = response.read()
        cookie_jar.save()
        response.close()
        return html
    except urllib2.URLError as err:
        # urllib2.HTTPError (e.g. the 503 above) is a subclass of URLError.
        print "[-] Error making the request: " + str(err)
        return None
That goes in the get_page function.
And in the search function in the same file (the last lines):
    # Request the Google Search results page.
    html = get_page(url)

    # Parse the response and extract the summaries.
    if html:
        soup = BeautifulSoup.BeautifulSoup(html)
        return soup.findAll("div", {"class": "s"})
    else:
        return []
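For anyone who wants to try the idea in isolation, the following is a minimal, self-contained sketch of the same pattern. It is written for Python 3, where urllib2 was split into urllib.request and urllib.error, so it is not a drop-in patch for google.py. The `opener` parameter is a hypothetical addition that lets the failure path be exercised without contacting Google, and the cookie handling and BeautifulSoup parsing are deliberately left out.

```python
import urllib.error
import urllib.request


def get_page(url, opener=urllib.request.urlopen):
    """Return the page body as bytes, or None if the request fails."""
    try:
        response = opener(url)
        try:
            return response.read()
        finally:
            response.close()
    except urllib.error.URLError as err:
        # HTTPError (such as a 503) is a subclass of URLError, so it lands here too.
        print("[-] Error making the request: " + str(err))
        return None


def search(url, opener=urllib.request.urlopen):
    """Return the raw page in a list, or an empty list when the fetch failed."""
    html = get_page(url, opener)
    # Parsing is omitted in this sketch; the point is the empty-list fallback.
    return [html] if html else []
```

Because search falls back to an empty list, a caller that loops over the results simply sees nothing to process instead of crashing with the HTTPError.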
Thanks for your work.