
exceeding maximum number of tries #465

Closed
1 task
gboeing opened this issue Dec 11, 2022 · 22 comments · Fixed by #475
Labels
bug · proxy (Proxy/Network issue. May not be exactly reproducible.)

Comments

@gboeing

gboeing commented Dec 11, 2022

Describe the bug
I have a simple script that runs once a week to collect citation counts. It has always worked until last night, when it started failing with the errors detailed below. I have tried several times over several hours on multiple machines.

To Reproduce

I have two machines. The following code fails with different errors on the different machines.

from scholarly import scholarly
query = scholarly.search_author('james watson')
author = scholarly.fill(next(query), ['publications'])

Error on machine 1 (ubuntu, python 3.9, scholarly 1.7.5):

Traceback (most recent call last):
  File "/home/g/gb/gboeing/apps/citations/app/citations.py", line 15, in <module>
    author = scholarly.fill(next(query), ['publications'])
  File "/home/g/gb/gboeing/apps/citations/lib/python3.9/site-packages/scholarly/_navigator.py", line 237, in search_authors
    soup = self._get_soup(url)
  File "/home/g/gb/gboeing/apps/citations/lib/python3.9/site-packages/scholarly/_navigator.py", line 226, in _get_soup
    html = self._get_page('https://scholar.google.com{0}'.format(url))
  File "/home/g/gb/gboeing/apps/citations/lib/python3.9/site-packages/scholarly/_navigator.py", line 175, in _get_page
    return self._get_page(pagerequest, True)
  File "/home/g/gb/gboeing/apps/citations/lib/python3.9/site-packages/scholarly/_navigator.py", line 177, in _get_page
    raise MaxTriesExceededException("Cannot Fetch from Google Scholar.")
scholarly._proxy_generator.MaxTriesExceededException: Cannot Fetch from Google Scholar.

Error on machine 2 (ubuntu, python 3.11, scholarly 1.7.5):

Traceback (most recent call last):
  File "/home/geoff/mambaforge/envs/citations/lib/python3.11/site-packages/fake_useragent/utils.py", line 139, in load
    browsers_dict[browser_name] = get_browser_user_agents(
                                  ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/geoff/mambaforge/envs/citations/lib/python3.11/site-packages/fake_useragent/utils.py", line 123, in get_browser_user_agents
    raise FakeUserAgentError(
fake_useragent.errors.FakeUserAgentError: No browser user-agent strings found for browser: chrome

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/geoff/mambaforge/envs/citations/lib/python3.11/urllib/request.py", line 1348, in do_open
    h.request(req.get_method(), req.selector, req.data, headers,
  File "/home/geoff/mambaforge/envs/citations/lib/python3.11/http/client.py", line 1282, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/home/geoff/mambaforge/envs/citations/lib/python3.11/http/client.py", line 1328, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/home/geoff/mambaforge/envs/citations/lib/python3.11/http/client.py", line 1277, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/home/geoff/mambaforge/envs/citations/lib/python3.11/http/client.py", line 1037, in _send_output
    self.send(msg)
  File "/home/geoff/mambaforge/envs/citations/lib/python3.11/http/client.py", line 975, in send
    self.connect()
  File "/home/geoff/mambaforge/envs/citations/lib/python3.11/http/client.py", line 1447, in connect
    super().connect()
  File "/home/geoff/mambaforge/envs/citations/lib/python3.11/http/client.py", line 941, in connect
    self.sock = self._create_connection(
                ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/geoff/mambaforge/envs/citations/lib/python3.11/socket.py", line 850, in create_connection
    raise exceptions[0]
  File "/home/geoff/mambaforge/envs/citations/lib/python3.11/socket.py", line 835, in create_connection
    sock.connect(sa)
TimeoutError: timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/geoff/mambaforge/envs/citations/lib/python3.11/site-packages/fake_useragent/utils.py", line 64, in get
    urlopen(
  File "/home/geoff/mambaforge/envs/citations/lib/python3.11/urllib/request.py", line 216, in urlopen
    return opener.open(url, data, timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/geoff/mambaforge/envs/citations/lib/python3.11/urllib/request.py", line 519, in open
    response = self._open(req, data)
               ^^^^^^^^^^^^^^^^^^^^^
  File "/home/geoff/mambaforge/envs/citations/lib/python3.11/urllib/request.py", line 536, in _open
    result = self._call_chain(self.handle_open, protocol, protocol +
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/geoff/mambaforge/envs/citations/lib/python3.11/urllib/request.py", line 496, in _call_chain
    result = func(*args)
             ^^^^^^^^^^^
  File "/home/geoff/mambaforge/envs/citations/lib/python3.11/urllib/request.py", line 1391, in https_open
    return self.do_open(http.client.HTTPSConnection, req,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/geoff/mambaforge/envs/citations/lib/python3.11/urllib/request.py", line 1351, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error timed out>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/geoff/Dropbox/Documents/School/Projects/Code/citations/citations.py", line 1, in <module>
    from scholarly import scholarly
  File "/home/geoff/mambaforge/envs/citations/lib/python3.11/site-packages/scholarly/__init__.py", line 4, in <module>
    scholarly = _Scholarly()
                ^^^^^^^^^^^^
  File "/home/geoff/mambaforge/envs/citations/lib/python3.11/site-packages/scholarly/_scholarly.py", line 34, in __init__
    self.__nav = Navigator()
                 ^^^^^^^^^^^
  File "/home/geoff/mambaforge/envs/citations/lib/python3.11/site-packages/scholarly/_navigator.py", line 26, in __call__
    cls._instances[cls] = super(Singleton, cls).__call__(*args,
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/geoff/mambaforge/envs/citations/lib/python3.11/site-packages/scholarly/_navigator.py", line 42, in __init__
    self.pm1 = ProxyGenerator()
               ^^^^^^^^^^^^^^^^
  File "/home/geoff/mambaforge/envs/citations/lib/python3.11/site-packages/scholarly/_proxy_generator.py", line 54, in __init__
    self._new_session()
  File "/home/geoff/mambaforge/envs/citations/lib/python3.11/site-packages/scholarly/_proxy_generator.py", line 454, in _new_session
    'User-Agent': UserAgent().random,
                  ^^^^^^^^^^^
  File "/home/geoff/mambaforge/envs/citations/lib/python3.11/site-packages/fake_useragent/fake.py", line 64, in __init__
    self.load()
  File "/home/geoff/mambaforge/envs/citations/lib/python3.11/site-packages/fake_useragent/fake.py", line 70, in load
    self.data_browsers = load_cached(
                         ^^^^^^^^^^^^
  File "/home/geoff/mambaforge/envs/citations/lib/python3.11/site-packages/fake_useragent/utils.py", line 209, in load_cached
    update(path, browsers, use_cache_server=use_cache_server, verify_ssl=verify_ssl)
  File "/home/geoff/mambaforge/envs/citations/lib/python3.11/site-packages/fake_useragent/utils.py", line 203, in update
    path, load(browsers, use_cache_server=use_cache_server, verify_ssl=verify_ssl)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/geoff/mambaforge/envs/citations/lib/python3.11/site-packages/fake_useragent/utils.py", line 154, in load
    jsonLines = get(
                ^^^^
  File "/home/geoff/mambaforge/envs/citations/lib/python3.11/site-packages/fake_useragent/utils.py", line 87, in get
    raise FakeUserAgentError("Maximum amount of retries reached")
fake_useragent.errors.FakeUserAgentError: Maximum amount of retries reached

Expected behavior
I expected the code to succeed without error, like it used to.

Screenshots
n/a

Desktop (please complete the following information):
(see my platform and version details above in the reproduction section)

Do you plan on contributing?
Your response below will clarify whether the maintainers can expect you to fix the bug you reported.

  • Yes, I will create a Pull Request with the bugfix.

Additional context
n/a

@gboeing gboeing added the bug label Dec 11, 2022
@loiseaujc

I have precisely the same error message for a similar piece of code. Note, however, that if I run it using a Google Colab notebook, it does work. I hope it helps :)

@arunkannawadi arunkannawadi added the proxy Proxy/Network issue. May not be exactly reproducible. label Dec 12, 2022
@swhussain110

Is there any solution for this?

raise MaxTriesExceededException("Cannot Fetch from Google Scholar.")
scholarly._proxy_generator.MaxTriesExceededException: Cannot Fetch from Google Scholar.

@arunkannawadi
Collaborator

Have you tried running this with FreeProxy or other proxy services? There are examples in the documentation.

@gboeing
Author

gboeing commented Dec 13, 2022

Yes, it is the same error with this code snippet that uses FreeProxy:

from scholarly import ProxyGenerator, scholarly

pg = ProxyGenerator()
pg.FreeProxies()
scholarly.use_proxy(pg)

query = scholarly.search_author('james watson')
author = scholarly.fill(next(query), ['publications'])
---------------------------------------------------------------------------
HTTPError                                 Traceback (most recent call last)
File ~/mambaforge/envs/citations/lib/python3.11/site-packages/fake_useragent/utils.py:64, in get(url, verify_ssl)
     61     context = None
     63 with contextlib.closing(
---> 64     urlopen(
     65         request,
     66         timeout=settings.HTTP_TIMEOUT,
     67         context=context,
     68     )
     69 ) as response:
     70     return response.read()

File ~/mambaforge/envs/citations/lib/python3.11/urllib/request.py:216, in urlopen(url, data, timeout, cafile, capath, cadefault, context)
    215     opener = _opener
--> 216 return opener.open(url, data, timeout)

File ~/mambaforge/envs/citations/lib/python3.11/urllib/request.py:525, in OpenerDirector.open(self, fullurl, data, timeout)
    524     meth = getattr(processor, meth_name)
--> 525     response = meth(req, response)
    527 return response

File ~/mambaforge/envs/citations/lib/python3.11/urllib/request.py:634, in HTTPErrorProcessor.http_response(self, request, response)
    633 if not (200 <= code < 300):
--> 634     response = self.parent.error(
    635         'http', request, response, code, msg, hdrs)
    637 return response

File ~/mambaforge/envs/citations/lib/python3.11/urllib/request.py:563, in OpenerDirector.error(self, proto, *args)
    562 args = (dict, 'default', 'http_error_default') + orig_args
--> 563 return self._call_chain(*args)

File ~/mambaforge/envs/citations/lib/python3.11/urllib/request.py:496, in OpenerDirector._call_chain(self, chain, kind, meth_name, *args)
    495 func = getattr(handler, meth_name)
--> 496 result = func(*args)
    497 if result is not None:

File ~/mambaforge/envs/citations/lib/python3.11/urllib/request.py:643, in HTTPDefaultErrorHandler.http_error_default(self, req, fp, code, msg, hdrs)
    642 def http_error_default(self, req, fp, code, msg, hdrs):
--> 643     raise HTTPError(req.full_url, code, msg, hdrs, fp)

HTTPError: HTTP Error 403: Forbidden

During handling of the above exception, another exception occurred:

FakeUserAgentError                        Traceback (most recent call last)
File ~/mambaforge/envs/citations/lib/python3.11/site-packages/fake_useragent/utils.py:139, in load(browsers, use_cache_server, verify_ssl)
    138         browser_name = browser_name.lower().strip()
--> 139         browsers_dict[browser_name] = get_browser_user_agents(
    140             browser_name,
    141             verify_ssl=verify_ssl,
    142         )
    143 except Exception as exc:

File ~/mambaforge/envs/citations/lib/python3.11/site-packages/fake_useragent/utils.py:100, in get_browser_user_agents(browser, verify_ssl)
     97 """
     98 Retrieve browser user agent strings
     99 """
--> 100 html = get(
    101     settings.BROWSER_BASE_PAGE.format(browser=quote_plus(browser)),
    102     verify_ssl=verify_ssl,
    103 )
    104 html = html.decode("iso-8859-1")

File ~/mambaforge/envs/citations/lib/python3.11/site-packages/fake_useragent/utils.py:87, in get(url, verify_ssl)
     86 if attempt == settings.HTTP_RETRIES:
---> 87     raise FakeUserAgentError("Maximum amount of retries reached")
     88 else:

FakeUserAgentError: Maximum amount of retries reached

During handling of the above exception, another exception occurred:

TimeoutError                              Traceback (most recent call last)
File ~/mambaforge/envs/citations/lib/python3.11/urllib/request.py:1348, in AbstractHTTPHandler.do_open(self, http_class, req, **http_conn_args)
   1347 try:
-> 1348     h.request(req.get_method(), req.selector, req.data, headers,
   1349               encode_chunked=req.has_header('Transfer-encoding'))
   1350 except OSError as err: # timeout error

File ~/mambaforge/envs/citations/lib/python3.11/http/client.py:1282, in HTTPConnection.request(self, method, url, body, headers, encode_chunked)
   1281 """Send a complete request to the server."""
-> 1282 self._send_request(method, url, body, headers, encode_chunked)

File ~/mambaforge/envs/citations/lib/python3.11/http/client.py:1328, in HTTPConnection._send_request(self, method, url, body, headers, encode_chunked)
   1327     body = _encode(body, 'body')
-> 1328 self.endheaders(body, encode_chunked=encode_chunked)

File ~/mambaforge/envs/citations/lib/python3.11/http/client.py:1277, in HTTPConnection.endheaders(self, message_body, encode_chunked)
   1276     raise CannotSendHeader()
-> 1277 self._send_output(message_body, encode_chunked=encode_chunked)

File ~/mambaforge/envs/citations/lib/python3.11/http/client.py:1037, in HTTPConnection._send_output(self, message_body, encode_chunked)
   1036 del self._buffer[:]
-> 1037 self.send(msg)
   1039 if message_body is not None:
   1040 
   1041     # create a consistent interface to message_body

File ~/mambaforge/envs/citations/lib/python3.11/http/client.py:975, in HTTPConnection.send(self, data)
    974 if self.auto_open:
--> 975     self.connect()
    976 else:

File ~/mambaforge/envs/citations/lib/python3.11/http/client.py:1447, in HTTPSConnection.connect(self)
   1445 "Connect to a host on a given (SSL) port."
-> 1447 super().connect()
   1449 if self._tunnel_host:

File ~/mambaforge/envs/citations/lib/python3.11/http/client.py:941, in HTTPConnection.connect(self)
    940 sys.audit("http.client.connect", self, self.host, self.port)
--> 941 self.sock = self._create_connection(
    942     (self.host,self.port), self.timeout, self.source_address)
    943 # Might fail in OSs that don't implement TCP_NODELAY

File ~/mambaforge/envs/citations/lib/python3.11/socket.py:850, in create_connection(address, timeout, source_address, all_errors)
    849 if not all_errors:
--> 850     raise exceptions[0]
    851 raise ExceptionGroup("create_connection failed", exceptions)

File ~/mambaforge/envs/citations/lib/python3.11/socket.py:835, in create_connection(address, timeout, source_address, all_errors)
    834     sock.bind(source_address)
--> 835 sock.connect(sa)
    836 # Break explicitly a reference cycle

TimeoutError: timed out

During handling of the above exception, another exception occurred:

URLError                                  Traceback (most recent call last)
File ~/mambaforge/envs/citations/lib/python3.11/site-packages/fake_useragent/utils.py:64, in get(url, verify_ssl)
     61     context = None
     63 with contextlib.closing(
---> 64     urlopen(
     65         request,
     66         timeout=settings.HTTP_TIMEOUT,
     67         context=context,
     68     )
     69 ) as response:
     70     return response.read()

File ~/mambaforge/envs/citations/lib/python3.11/urllib/request.py:216, in urlopen(url, data, timeout, cafile, capath, cadefault, context)
    215     opener = _opener
--> 216 return opener.open(url, data, timeout)

File ~/mambaforge/envs/citations/lib/python3.11/urllib/request.py:519, in OpenerDirector.open(self, fullurl, data, timeout)
    518 sys.audit('urllib.Request', req.full_url, req.data, req.headers, req.get_method())
--> 519 response = self._open(req, data)
    521 # post-process response

File ~/mambaforge/envs/citations/lib/python3.11/urllib/request.py:536, in OpenerDirector._open(self, req, data)
    535 protocol = req.type
--> 536 result = self._call_chain(self.handle_open, protocol, protocol +
    537                           '_open', req)
    538 if result:

File ~/mambaforge/envs/citations/lib/python3.11/urllib/request.py:496, in OpenerDirector._call_chain(self, chain, kind, meth_name, *args)
    495 func = getattr(handler, meth_name)
--> 496 result = func(*args)
    497 if result is not None:

File ~/mambaforge/envs/citations/lib/python3.11/urllib/request.py:1391, in HTTPSHandler.https_open(self, req)
   1390 def https_open(self, req):
-> 1391     return self.do_open(http.client.HTTPSConnection, req,
   1392         context=self._context, check_hostname=self._check_hostname)

File ~/mambaforge/envs/citations/lib/python3.11/urllib/request.py:1351, in AbstractHTTPHandler.do_open(self, http_class, req, **http_conn_args)
   1350 except OSError as err: # timeout error
-> 1351     raise URLError(err)
   1352 r = h.getresponse()

URLError: <urlopen error timed out>

During handling of the above exception, another exception occurred:

FakeUserAgentError                        Traceback (most recent call last)
Cell In[1], line 1
----> 1 from scholarly import ProxyGenerator, scholarly
      3 pg = ProxyGenerator()
      4 pg.FreeProxies()

File ~/mambaforge/envs/citations/lib/python3.11/site-packages/scholarly/__init__.py:4
      2 from .data_types import Author, Publication
      3 from ._proxy_generator import ProxyGenerator, DOSException, MaxTriesExceededException
----> 4 scholarly = _Scholarly()

File ~/mambaforge/envs/citations/lib/python3.11/site-packages/scholarly/_scholarly.py:34, in _Scholarly.__init__(self)
     32 load_dotenv(find_dotenv())
     33 self.env = os.environ.copy()
---> 34 self.__nav = Navigator()
     35 self.logger = self.__nav.logger
     36 self._journal_categories = None

File ~/mambaforge/envs/citations/lib/python3.11/site-packages/scholarly/_navigator.py:26, in Singleton.__call__(cls, *args, **kwargs)
     24 def __call__(cls, *args, **kwargs):
     25     if cls not in cls._instances:
---> 26         cls._instances[cls] = super(Singleton, cls).__call__(*args,
     27                                                              **kwargs)
     28     return cls._instances[cls]

File ~/mambaforge/envs/citations/lib/python3.11/site-packages/scholarly/_navigator.py:42, in Navigator.__init__(self)
     38 self._max_retries = 5
     39 # A Navigator instance has two proxy managers, each with their session.
     40 # `pm1` manages the primary, premium proxy.
     41 # `pm2` manages the secondary, inexpensive proxy.
---> 42 self.pm1 = ProxyGenerator()
     43 self.pm2 = ProxyGenerator()
     44 self._session1 = self.pm1.get_session()

File ~/mambaforge/envs/citations/lib/python3.11/site-packages/scholarly/_proxy_generator.py:54, in ProxyGenerator.__init__(self)
     52 self._webdriver = None
     53 self._TIMEOUT = 5
---> 54 self._new_session()

File ~/mambaforge/envs/citations/lib/python3.11/site-packages/scholarly/_proxy_generator.py:454, in ProxyGenerator._new_session(self)
    449 # Suppress the misleading traceback from UserAgent()
    450 with self._suppress_logger('fake_useragent'):
    451     _HEADERS = {
    452         'accept-language': 'en-US,en',
    453         'accept': 'text/html,application/xhtml+xml,application/xml',
--> 454         'User-Agent': UserAgent().random,
    455     }
    456 self._session.headers.update(_HEADERS)
    458 if self._proxy_works:

File ~/mambaforge/envs/citations/lib/python3.11/site-packages/fake_useragent/fake.py:64, in FakeUserAgent.__init__(self, cache, use_cache_server, path, fallback, browsers, verify_ssl, safe_attrs)
     61 # initial empty data
     62 self.data_browsers = {}
---> 64 self.load()

File ~/mambaforge/envs/citations/lib/python3.11/site-packages/fake_useragent/fake.py:70, in FakeUserAgent.load(self)
     68 with self.load.lock:
     69     if self.cache:
---> 70         self.data_browsers = load_cached(
     71             self.path,
     72             self.browsers,
     73             use_cache_server=self.use_cache_server,
     74             verify_ssl=self.verify_ssl,
     75         )
     76     else:
     77         self.data_browsers = load(
     78             self.browsers,
     79             use_cache_server=self.use_cache_server,
     80             verify_ssl=self.verify_ssl,
     81         )

File ~/mambaforge/envs/citations/lib/python3.11/site-packages/fake_useragent/utils.py:209, in load_cached(path, browsers, use_cache_server, verify_ssl)
    207 def load_cached(path, browsers, use_cache_server=True, verify_ssl=True):
    208     if not exist(path):
--> 209         update(path, browsers, use_cache_server=use_cache_server, verify_ssl=verify_ssl)
    211     return read(path)

File ~/mambaforge/envs/citations/lib/python3.11/site-packages/fake_useragent/utils.py:203, in update(path, browsers, use_cache_server, verify_ssl)
    199 def update(path, browsers, use_cache_server=True, verify_ssl=True):
    200     rm(path)
    202     write(
--> 203         path, load(browsers, use_cache_server=use_cache_server, verify_ssl=verify_ssl)
    204     )

File ~/mambaforge/envs/citations/lib/python3.11/site-packages/fake_useragent/utils.py:154, in load(browsers, use_cache_server, verify_ssl)
    152 try:
    153     data = {}
--> 154     jsonLines = get(
    155         settings.CACHE_SERVER,
    156         verify_ssl=verify_ssl,
    157     ).decode("utf-8")
    158     for line in jsonLines.splitlines():
    159         data.update(json.loads(line))

File ~/mambaforge/envs/citations/lib/python3.11/site-packages/fake_useragent/utils.py:87, in get(url, verify_ssl)
     80 logger.debug(
     81     "Error occurred during fetching %s",
     82     url,
     83     exc_info=exc,
     84 )
     86 if attempt == settings.HTTP_RETRIES:
---> 87     raise FakeUserAgentError("Maximum amount of retries reached")
     88 else:
     89     logger.debug(
     90         "Sleeping for %s seconds",
     91         settings.HTTP_DELAY,
     92     )

FakeUserAgentError: Maximum amount of retries reached

@gboeing
Author

gboeing commented Dec 16, 2022

Tested again today. Same errors persist both with and without using a proxy.

@giswqs

giswqs commented Dec 17, 2022

I just ran into the same error.

@jkbren

jkbren commented Dec 17, 2022

Same error on my end, but I got it running again by upgrading fake-useragent: pip install fake-useragent --upgrade

@arunkannawadi
Collaborator

Thank you @jkbren. I can confirm that I was getting the error and that, after updating fake-useragent, it works. There were a couple of updates to that library in the past few weeks, so this all makes sense.

Closing the issue for now, but if the issue persists even after upgrading fake-useragent, please feel free to reopen it.

@gboeing
Author

gboeing commented Dec 18, 2022

I upgraded to fake-useragent 1.1.1, free-proxy 1.0.6, and scholarly 1.7.6, but the same scholarly._proxy_generator.MaxTriesExceededException: Cannot Fetch from Google Scholar error persists, unchanged.

@arunkannawadi
Collaborator

Do you still get exactly the same errors on both machines 1 and 2 as you initially posted?

@zhubonan

zhubonan commented Dec 18, 2022

I am also having this. The problem is that the equivalent of requests.get("https://scholar.google.com/citations?hl=en&user=8XFlTFIAAAAJ") gets an HTTP/1.1 429 Too Many Requests response. This happens on multiple machines that I have.
However, the same URL works in my browser and also with the httpx library.

I tried tweaking the User-Agent but had no success. Perhaps Google is blocking certain requests?
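
For reference, a minimal diagnostic sketch of the comparison described above (not part of scholarly itself); the exact status codes will depend on your IP address and on Google's rate limiting:

import requests
import httpx

# Same public profile URL as in the comment above; purely a diagnostic check.
url = "https://scholar.google.com/citations?hl=en&user=8XFlTFIAAAAJ"

r1 = requests.get(url)
print("requests:", r1.status_code, r1.reason)          # e.g. 429 Too Many Requests

r2 = httpx.get(url)
print("httpx:   ", r2.status_code, r2.reason_phrase)   # reportedly 200 OK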

@gboeing
Author

gboeing commented Dec 19, 2022

@arunkannawadi yes, here is the other machine: today I deleted and re-created its virtualenv for scholarly, but the error persists both with and without a proxy. Here's the environment package list:

Package                       Version
----------------------------- -----------
alabaster                     0.7.12
arrow                         1.2.3
async-generator               1.10
attrs                         22.1.0
Babel                         2.11.0
beautifulsoup4                4.11.1
bibtexparser                  1.4.0
certifi                       2022.12.7
charset-normalizer            2.1.1
Deprecated                    1.2.13
docutils                      0.17.1
exceptiongroup                1.0.4
fake-useragent                1.1.1
free-proxy                    1.0.6
h11                           0.14.0
idna                          3.4
imagesize                     1.4.1
importlib-metadata            5.2.0
importlib-resources           5.10.1
Jinja2                        3.1.2
lxml                          4.9.2
MarkupSafe                    2.1.1
numpy                         1.24.0
outcome                       1.2.0
packaging                     22.0
pandas                        1.5.2
pip                           20.3.4
pkg-resources                 0.0.0
Pygments                      2.13.0
pyparsing                     3.0.9
PySocks                       1.7.1
python-dateutil               2.8.2
python-dotenv                 0.21.0
pytz                          2022.7
requests                      2.28.1
scholarly                     1.7.6
selenium                      4.7.2
setuptools                    44.1.1
six                           1.16.0
sniffio                       1.3.0
snowballstemmer               2.2.0
sortedcontainers              2.4.0
soupsieve                     2.3.2.post1
sphinx                        5.3.0
sphinx-rtd-theme              1.1.1
sphinxcontrib-applehelp       1.0.2
sphinxcontrib-devhelp         1.0.2
sphinxcontrib-htmlhelp        2.0.0
sphinxcontrib-jsmath          1.0.1
sphinxcontrib-qthelp          1.0.3
sphinxcontrib-serializinghtml 1.1.5
trio                          0.22.0
trio-websocket                0.9.2
typing-extensions             4.4.0
urllib3                       1.26.13
wrapt                         1.14.1
wsproto                       1.2.0
zipp                          3.11.0

And the error again:

Traceback (most recent call last):
  File "/home/g/gb/gboeing/apps/citations/app/citations.py", line 15, in <module>
    author = scholarly.fill(next(query), ['publications'])
  File "/home/g/gb/gboeing/apps/citations/lib/python3.9/site-packages/scholarly/_navigator.py", line 237, in search_authors
    soup = self._get_soup(url)
  File "/home/g/gb/gboeing/apps/citations/lib/python3.9/site-packages/scholarly/_navigator.py", line 226, in _get_soup
    html = self._get_page('https://scholar.google.com{0}'.format(url))
  File "/home/g/gb/gboeing/apps/citations/lib/python3.9/site-packages/scholarly/_navigator.py", line 175, in _get_page
    return self._get_page(pagerequest, True)
  File "/home/g/gb/gboeing/apps/citations/lib/python3.9/site-packages/scholarly/_navigator.py", line 177, in _get_page
    raise MaxTriesExceededException("Cannot Fetch from Google Scholar.")
scholarly._proxy_generator.MaxTriesExceededException: Cannot Fetch from Google Scholar.

@gureckis

same problem for me btw!

@arunkannawadi arunkannawadi reopened this Dec 24, 2022
@arunkannawadi
Collaborator

I can confirm that from my home network and with FreeProxies, I get the error. I can run the snippet successfully if I use ScraperAPI. I can only suppose that Google Scholar got better at detecting automated requests. If the following snippet fails, there is little chance that scholarly can fetch your page successfully.

import requests
resp = requests.get("https://scholar.google.com/citations?hl=en&view_op=search_authors&mauthors=james%20watson")
if resp.status_code != 200:
    print(f"Request failed with {resp.status_code} because {resp.reason}")

I could try to experiment with the httpx library as @zhubonan mentioned, but at the moment, using one of the premium proxies seems to be the only way.
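
For completeness, a minimal sketch of the premium-proxy route mentioned above, assuming the ProxyGenerator.ScraperAPI helper shown in the scholarly proxy documentation; YOUR_SCRAPERAPI_KEY is a placeholder for a real ScraperAPI account key:

from scholarly import ProxyGenerator, scholarly

# Route all scholarly traffic through ScraperAPI instead of free proxies.
pg = ProxyGenerator()
pg.ScraperAPI("YOUR_SCRAPERAPI_KEY")  # placeholder; requires a ScraperAPI subscription
scholarly.use_proxy(pg)

query = scholarly.search_author('james watson')
author = scholarly.fill(next(query), ['publications'])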

@zhubonan

Thanks for looking into this.
Any idea how Google detects the request? When I use the browser, or even just curl, the page loads perfectly fine.

@zhubonan

I have updated to 1.7.10 and the error no longer happens.

@arunkannawadi
Collaborator

Thank you for confirming. This issue must have been fixed since 1.7.8. I'll close this issue now.

To answer your earlier question, I do not understand how Google Scholar detects requests, but it was not responding to any requests sent from Python's requests library, even ones that are not forbidden by their policy. Changing the underlying library helped.

@gboeing
Author

gboeing commented Jan 16, 2023

Confirmed working for me now as well.

@syheliel

The problem still exists when I'm using Colab; here is my code:
https://colab.research.google.com/drive/1cJVvQBGLyKVNBI9YmgMgoW3ecNPYeFZE?usp=sharing

@arunkannawadi
Collaborator

@syheliel please read our documentation. You're running queries that Google Scholar actively blocks without using proxies, which can get your IP address banned temporarily.

@AndreaUnige

Hi @arunkannawadi, @gboeing , @zhubonan ,
I am trying the same query but got the same error:

raise MaxTriesExceededException("Cannot Fetch from Google Scholar.")
scholarly._proxy_generator.MaxTriesExceededException: Cannot Fetch from Google Scholar.

I am using scholarly 1.7.11, Ubuntu 2022, Python 3.10.

The code I'm trying is the simplest:

from scholarly import ProxyGenerator, scholarly

pg = ProxyGenerator()
pg.FreeProxies()
scholarly.use_proxy(pg)

search_query = scholarly.search_pubs('Perception of physical stability and center of mass of 3D objects')
scholarly.pprint(next(search_query))

Is anyone able to help here?
Many thanks guys!

@abubelinha

abubelinha commented May 2, 2023

Yes, I can confirm the error persists after upgrading both scholarly and fake_useragent.

I am on Windows 7, Python 3.8, scholarly 1.7.11, fake_useragent 1.1.3.

Side question: is it possible to get the scholarly version from code? (I tried scholarly.__version__ but it didn't work.)

PS - FWIW, I think updating fake_useragent might not be that relevant for this issue.
The few times I got a successful FreeProxies() scraping run, I was still using fake-useragent 0.1.11 (see #500).
That success happened yesterday, before I upgraded fake_useragent as suggested above
(and now that I have fake-useragent 1.1.3, I keep getting MaxTriesExceededException).
So I think getting FreeProxies() to work is just a matter of waiting for your lucky moment.
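
On the side question, a generic way to read an installed package's version when it does not expose __version__, using only the standard library's packaging metadata (nothing scholarly-specific):

from importlib.metadata import version  # standard library on Python 3.8+

# Reads the version recorded by pip/conda for the installed distribution.
print(version("scholarly"))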
