Increase number of citations returned by citedby #446

Merged
5 commits merged into scholarly-python-package:citedby1k on Oct 19, 2022

Conversation

@jjshoots (Contributor) commented Oct 18, 2022

Fixes #444

Description

This allows citedby to return more than 1k citations, working around the fact that Google Scholar displays only 100 pages of results.
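
To make the limit concrete, here is a rough sketch of the arithmetic rather than the code in this PR. It assumes ten results per page (so 100 pages gives the ~1k cap); the per-year counts in `toy` are made up for illustration, and `capped_query` is a hypothetical stand-in for a single Scholar query.

```python
from typing import Dict

RESULTS_PER_PAGE = 10  # assumption: ten results per Google Scholar page
MAX_PAGES = 100        # from the description: only 100 pages are displayed
CAP = RESULTS_PER_PAGE * MAX_PAGES


def capped_query(per_year: Dict[int, int], year_low: int, year_high: int) -> int:
    """Citations reachable by one query over the inclusive year range."""
    total = sum(n for year, n in per_year.items() if year_low <= year <= year_high)
    return min(total, CAP)


def reachable_single(per_year: Dict[int, int]) -> int:
    """One query spanning every year: capped once, overall."""
    years = sorted(per_year)
    return capped_query(per_year, years[0], years[-1])


def reachable_per_year(per_year: Dict[int, int]) -> int:
    """One query per year: the cap applies to each year separately."""
    return sum(capped_query(per_year, year, year) for year in per_year)


# Made-up citation counts, purely illustrative.
toy = {2019: 400, 2020: 700, 2021: 1200}
print(reachable_single(toy), reachable_per_year(toy))
```

Under these assumptions, splitting by year raises the number of reachable citations, although any single year with more than 1k citations would still be truncated.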

Checklist

  • Check that the base branch is set to develop and not main.
  • Ensure that the documentation will be consistent with the code upon merging.
  • Add a line or a few lines that check the new features added. Doing this would require a large amount of scraping.
  • Ensure that unit tests pass.
    If you don't have a premium proxy, some of the tests will be skipped.
    The tests that are run should pass without raising
    MaxTriesExceededException or other exceptions.

@arunkannawadi (Collaborator)

I would normally ask people to add a unit test that covers the new addition, but I don't think we should have a test that will scrape 100+ pages :-|

scholarly/_scholarly.py Outdated Show resolved Hide resolved
scholarly/_scholarly.py Outdated Show resolved Hide resolved
@arunkannawadi linked an issue that may be closed by this pull request on Oct 18, 2022
@arunkannawadi added this to the v.1.7.3 milestone on Oct 18, 2022
@arunkannawadi (Collaborator)

I want to merge #445 into the develop branch before I merge these changes. I'll get back to this at the end of the day.

@arunkannawadi (Collaborator) left a comment

This is a good starting point for me to work on this, but I don't think I want to merge this as it is. I'm going to change the base branch to another branch so I can clean it up a bit more. I will also cook up a unit test that still covers the newly added code.
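
One way such a test could avoid scraping is to stub out the per-year query. The sketch below is only an illustration under stated assumptions: `FakeClient` and `citedby_by_year` are hypothetical names, not part of scholarly, and only the `search_citedby` keyword arguments mirror the excerpt under review.

```python
from unittest import mock


class FakeClient:
    """Hypothetical stand-in for the real client; not the scholarly API."""

    def search_citedby(self, publication_id, year_low, year_high):
        raise RuntimeError("a real implementation would hit the network here")

    def citedby_by_year(self, publication_id, year_start, year_end):
        # One query per year, both bounds set to the same year.
        for year in range(year_start, year_end + 1):
            yield from self.search_citedby(
                publication_id, year_low=year, year_high=year
            )


def run_offline_check():
    """Exercise the year loop with the network call replaced by a stub."""
    client = FakeClient()

    def stub(publication_id, year_low, year_high):
        # Return one fake citation per call so the ordering is observable.
        return iter([f"{publication_id}:{year_low}"])

    with mock.patch.object(client, "search_citedby", side_effect=stub) as patched:
        results = list(client.citedby_by_year(42, 2020, 2022))
    return results, patched.call_count


print(run_offline_check())
```

The stub keeps the test fast and deterministic while still covering the loop that splits the request by year.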

self.logger.warning("Object not supported for bibtex exportation")
return

if object["bib"]["citedby"] < 999:

This should be <= 1000
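
A small sketch of the boundary in question, assuming the `if` branch is the one that falls back to a single ordinary query (the excerpt does not show the branch body) and that 1000 is the most a single query can return:

```python
CAP = 1000  # assumption: most results one query can reach


def single_query_original(citedby: int) -> bool:
    """Condition as written in the excerpt under review."""
    return citedby < 999


def single_query_suggested(citedby: int) -> bool:
    """Condition proposed in the review comment."""
    return citedby <= CAP


for count in (998, 999, 1000, 1001):
    print(count, single_query_original(count), single_query_suggested(count))
```

Under that assumption, the original condition sends publications with 999 or 1000 citations down the slower per-year path even though one query would suffice.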

pub_id = int(object["citedby_url"].split("=")[1].split("&")[0])
iter_list = []
while year_low < year_end:
    iter_list.append(self.search_citedby(publication_id=pub_id, year_low=year_low, year_high=year_low+1))

year_high should be the same as year_low. This actually fetches citations from two years instead of one year.
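
The overlap can be seen by counting how often each year is covered. This sketch reuses the loop condition from the excerpt and assumes both year bounds are inclusive; `year_coverage` and its `width` parameter are hypothetical helpers for illustration.

```python
from collections import Counter
from typing import Dict


def year_coverage(year_start: int, year_end: int, width: int) -> Dict[int, int]:
    """Count how many queries cover each year.

    width=1 models year_high=year_low+1; width=0 models year_high=year_low.
    """
    cover: Counter = Counter()
    year_low = year_start
    while year_low < year_end:  # same loop condition as the excerpt
        for year in range(year_low, year_low + width + 1):  # inclusive window
            cover[year] += 1
        year_low += 1
    return dict(cover)


print(year_coverage(2019, 2022, width=1))
print(year_coverage(2019, 2022, width=0))
```

With `width=1`, interior years are fetched twice, so their citations would appear as duplicates. With `width=0`, each year is fetched once; whether `year_end` itself needs to be included depends on how it is computed, which the excerpt does not show.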

    iter_list.append(self.search_citedby(publication_id=pub_id, year_low=year_low, year_high=year_low+1))
    year_low += 1

return itertools.chain(*iter_list)

Using yield from syntax would be much cleaner and would avoid importing itertools.
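
As a rough comparison of the two styles, here is a self-contained sketch. `fake_search` is a hypothetical stand-in for the per-year query, and the inclusive year range is an assumption made for the example.

```python
import itertools
from typing import Iterator


def fake_search(year: int) -> Iterator[str]:
    """Hypothetical stand-in for a per-year citedby query."""
    for index in range(2):
        yield f"{year}-paper{index}"


def with_chain(year_start: int, year_end: int) -> Iterator[str]:
    """Collect the per-year iterators in a list, then chain them."""
    iter_list = []
    year = year_start
    while year <= year_end:
        iter_list.append(fake_search(year))
        year += 1
    return itertools.chain(*iter_list)


def with_yield_from(year_start: int, year_end: int) -> Iterator[str]:
    """Delegate to each per-year iterator in turn; no itertools import needed."""
    for year in range(year_start, year_end + 1):
        yield from fake_search(year)


print(list(with_chain(2020, 2021)) == list(with_yield_from(2020, 2021)))
```

Both versions produce the same sequence; the `yield from` form drops the intermediate list and the extra import.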

@arunkannawadi changed the base branch from develop to citedby1k on October 19, 2022 02:36
@arunkannawadi merged commit 0fd909e into scholarly-python-package:citedby1k on Oct 19, 2022
@jjshoots (Contributor, Author)

@arunkannawadi Gotcha, looking forward to the next release :)

Successfully merging this pull request may close these issues.

Can't scrape more than ~1000 citedby papers
2 participants