Raising StopIteration errors for some queries even when the HTTP requests are successful (using ScraperAPI). #508

Open
EthanC111 opened this issue Jul 21, 2023 · 3 comments
@EthanC111

Describe the bug
Both of the queries provided below raise StopIteration, even though the HTTP requests succeed.

To Reproduce

import logging
import sys

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s",
    handlers=[
        logging.StreamHandler(sys.stdout),
    ],
)

from scholarly import scholarly
from scholarly import ProxyGenerator

scraper_api_key = "YOUR_SCRAPER_API_KEY"
# query = "A Comparison of Transformer and Recurrent Neural Networks on Multilingual Neural Machine Translation"
query = "Reducing the Dimensionality of Data with Neural Networks."

pg = ProxyGenerator()
success = pg.ScraperAPI(scraper_api_key)
scholarly.use_proxy(pg)
results = scholarly.search_pubs(query)
paper_info = next(results)
print(paper_info)

Expected behavior
The paper information should be printed.
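
As a caller-side stopgap, passing a default to `next()` avoids the unhandled exception when the iterator turns out to be empty. This is a minimal sketch, not a fix for the parser: `fake_search_pubs` is a hypothetical stand-in for `scholarly.search_pubs` that yields nothing, imitating a request that succeeds but produces no parsed rows, and `first_result` is an illustrative helper name.

```python
def fake_search_pubs(query):
    """Hypothetical stand-in for scholarly.search_pubs that yields no results."""
    return iter(())


def first_result(results):
    """Return the first item of an iterator, or None if it is empty."""
    # next() with a default never raises StopIteration.
    return next(results, None)


query = "Reducing the Dimensionality of Data with Neural Networks."
paper_info = first_result(fake_search_pubs(query))
if paper_info is None:
    print("No results were parsed; the page layout may not be recognized.")
else:
    print(paper_info)
```

With the real library, the same pattern would be `next(scholarly.search_pubs(query), None)`. It hides the symptom only; the missing rows still need the parser change discussed in the comments.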

Screenshots
[Screenshot: scholary_bug]

Desktop (please complete the following information):

  • Proxy service: ScraperAPI
  • Python version: 3.11.4
  • OS: Linux
  • Version: 1.7.11

Do you plan on contributing?
Your response below will clarify whether the maintainers can expect you to fix the bug you reported.

  • Yes, I will create a Pull Request with the bugfix.

@EthanC111 EthanC111 added the bug label Jul 21, 2023
@ronny3

ronny3 commented Aug 29, 2023

I believe this happens when the result page uses the new Google Scholar UI, which appeared around June 2023. Most of the time it happens when the search returns a single result.
You can try this in publication_parser.py: append the following to line 61.
+ self._soup.find_all('div', class_='gs_r gs_or gs_scl gs_fmar')

@kostrykin
Thanks for pointing this out. Just to be clear, the full line 61 should be changed from

self._rows = self._soup.find_all('div', class_='gs_r gs_or gs_scl') + self._soup.find_all('div', class_='gsc_mpat_ttl')

to

self._rows = self._soup.find_all('div', class_='gs_r gs_or gs_scl gs_fmar') + self._soup.find_all('div', class_='gsc_mpat_ttl')

Then it works.

@gdudek

gdudek commented Apr 14, 2024

I was seeing intermittent failures again in April 2024 and needed to update that line (around line 61) to:
self._rows = self._soup.find_all('div', class_='gs_r gs_or gs_scl gs_fmar') + self._soup.find_all('div', class_='gsc_mpat_ttl') + self._soup.find_all('div', class_='gs_r gs_or gs_scl')
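
One way to read why each layout needs its own `find_all` call: when `class_` is given a string containing several class names, BeautifulSoup compares it with the exact value of the `class` attribute, so `'gs_r gs_or gs_scl'` does not match a row that also carries `gs_fmar`. The sketch below imitates that difference with the standard library only. It is not BeautifulSoup's implementation, the HTML is made up for illustration, and the helper names are hypothetical.

```python
from html.parser import HTMLParser


class DivClassCollector(HTMLParser):
    """Collect the raw class attribute of every <div> in a document."""

    def __init__(self):
        super().__init__()
        self.div_classes = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self.div_classes.append(dict(attrs).get("class") or "")


def collect_div_classes(html):
    parser = DivClassCollector()
    parser.feed(html)
    parser.close()
    return parser.div_classes


def exact_match(class_attr, wanted):
    """Match only when the whole class attribute equals the wanted string."""
    return class_attr == wanted


def has_all_classes(class_attr, wanted):
    """Match when every wanted class is present, in any order, extras allowed."""
    return set(wanted.split()) <= set(class_attr.split())


# Made-up markup: one row per class combination mentioned in this thread.
SAMPLE_HTML = """
<div class="gs_r gs_or gs_scl">row without gs_fmar</div>
<div class="gs_r gs_or gs_scl gs_fmar">row with gs_fmar</div>
<div class="gsc_mpat_ttl">gsc_mpat_ttl row</div>
"""

classes = collect_div_classes(SAMPLE_HTML)
wanted = "gs_r gs_or gs_scl"
exact = [c for c in classes if exact_match(c, wanted)]
subset = [c for c in classes if has_all_classes(c, wanted)]
print(exact)
print(subset)
```

Exact matching finds only the first row, while the subset check finds the first two. In BeautifulSoup itself, a CSS selector such as `soup.select('div.gs_r.gs_or.gs_scl')` might cover both variants in one call, though that is untested against live Google Scholar pages.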
