scholarly.search_pubs runs forever #463

Closed
1 task
kostrykin opened this issue Dec 1, 2022 · 6 comments
Labels
proxy Proxy/Network issue. May not be exactly reproducible.


kostrykin commented Dec 1, 2022

Describe the bug
scholarly.search_pubs runs forever and does not return.

To Reproduce

from scholarly import scholarly, ProxyGenerator

pg = ProxyGenerator()
assert pg.FreeProxies()
scholarly.use_proxy(pg)

print('searching')
search_query = scholarly.search_pubs('10.1007/978-3-031-09037-0_20')
pub = next(search_query)
scholarly.pprint(pub)

Expected behavior
Data associated with the publication should be printed. This was working a month ago (I used an older version of scholarly back then and also did not use proxies).

Desktop:

  • Proxy service: FreeProxies
  • python version: tested with 3.7 and 3.9
  • OS: tested with macOS and Ubuntu Linux
  • Version 1.7.5

Do you plan on contributing?
Your response below will clarify whether the maintainers can expect you to fix the bug you reported.

  • Yes, I will create a Pull Request with the bugfix.
@kostrykin kostrykin added the bug label Dec 1, 2022
@arunkannawadi
Collaborator

This is likely a transient issue caused by the unavailability of reliable proxies. If you try again without proxies (not recommended on a regular basis), it should work. Alternatively, try running the code as is after some time.
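
A minimal sketch of that advice, not part of scholarly itself: because a stalled proxy can block `next(search_query)` indefinitely, one option is to run the call in a daemon thread with a timeout and retry after a pause. The helper names `call_with_timeout` and `retry_with_timeout` and the default timing values are assumptions made for illustration.

```python
import threading
import time


def call_with_timeout(func, timeout_s):
    """Run func() in a daemon thread; raise TimeoutError if it does not finish in time."""
    outcome = {}

    def worker():
        try:
            outcome["value"] = func()
        except Exception as exc:  # hand the error back to the calling thread
            outcome["error"] = exc

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout_s)
    if thread.is_alive():
        # The worker cannot be killed; it is abandoned and ends with the process.
        raise TimeoutError(f"no result within {timeout_s} s")
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]


def retry_with_timeout(func, attempts=3, timeout_s=60.0, delay_s=30.0):
    """Call func() up to `attempts` times, pausing delay_s after each timed-out try."""
    for attempt in range(1, attempts + 1):
        try:
            return call_with_timeout(func, timeout_s)
        except TimeoutError:
            if attempt == attempts:
                raise
            time.sleep(delay_s)
```

With the snippet from the report, this could be used as `pub = retry_with_timeout(lambda: next(scholarly.search_pubs('10.1007/978-3-031-09037-0_20')))`. A timed-out attempt keeps running in the background until the process exits, so this only prevents the script from hanging; it does not make the proxies more reliable.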

@arunkannawadi
Collaborator

And whenever possible, try fetching a paper via the profile of any of its authors. In this instance, you could use the search_author_id routine to look for papers by 9TqkClQAAAAJ and iterate through the publication list.
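
A rough sketch of that route, assuming the paper is matched by a fragment of its title (matching on the title rather than the DOI is an assumption here). The helper names `find_publications` and `publications_of_author` are hypothetical; `scholarly.search_author_id` and `scholarly.fill` are the scholarly calls used.

```python
def find_publications(publications, title_fragment):
    """Return the entries whose title contains title_fragment (case-insensitive)."""
    needle = title_fragment.lower()
    return [
        pub for pub in publications
        if needle in pub.get("bib", {}).get("title", "").lower()
    ]


def publications_of_author(author_id):
    """Fetch an author's publication list from their Google Scholar profile."""
    # Imported here so the pure helper above works without scholarly installed.
    from scholarly import scholarly

    author = scholarly.search_author_id(author_id)
    # Only the publication list is needed, so fill just that section.
    author = scholarly.fill(author, sections=["publications"])
    return author["publications"]
```

For the paper in question, this could be called as `find_publications(publications_of_author('9TqkClQAAAAJ'), title_fragment)`, where `title_fragment` is part of the paper's title. A matching entry can then be passed to `scholarly.fill` to load its full details.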

@kostrykin
Author

> This is likely a transient issue caused by the unavailability of reliable proxies. If you try again without proxies (not recommended on a regular basis), it should work. Alternatively, try running the code as is after some time.

Yes, running the code without the proxies works.

> And whenever possible, try fetching a paper via the profile of any of its authors. In this instance, you could use the search_author_id routine to look for papers by 9TqkClQAAAAJ and iterate through the publication list.

Why is that? I am looking for a specific paper, which is uniquely identified by its DOI. Searching for the author instead of the unique DOI and then filtering the results seems very circuitous.

@arunkannawadi
Collaborator

arunkannawadi commented Dec 1, 2022

Google Scholar actively tries to block programmatic queries that search its publication database but allows queries that search its author database. If you want to get information about a specific publication many times over a period (say, to regularly track its citation count), I'd recommend going through the author's profile page.

@arunkannawadi arunkannawadi added the proxy label (Proxy/Network issue. May not be exactly reproducible.) and removed the bug label Dec 1, 2022
@kostrykin
Author

> Google Scholar actively tries to block programmatic queries that search its publication database but allows queries that search its author database. If you want to get information about a specific publication many times over a period (say, to regularly track its citation count), I'd recommend going through the author's profile page.

Thanks for pointing this out!

@arunkannawadi
Collaborator

I tried running your snippet again with FreeProxies, and after a few attempts it successfully printed the paper details. The hang was likely due to #465, which has now been fixed. Closing this issue as completed.
