"year" column is not accurate #4

Open
lintool opened this issue Aug 24, 2019 · 1 comment

Comments

@lintool
Owner

lintool commented Aug 24, 2019

Noted by @dragomirradev

The "year" column is based on the earliest year in the citation count histogram, which is not necessarily the year of the author's earliest publication.

For example:
[Screenshot: 2019-08-24, 10:26:34 AM]

But see:

[Screenshot: 2019-08-24, 10:26:57 AM]

One reasonable hypothesis is that the histogram is capped at 20 years... but here's a counterexample:

[Screenshot: 2019-08-24, 10:28:54 AM]

No idea what's going on.

From a crawling perspective, the histogram is easy to get. Getting the actual earliest publication year requires sorting pubs by time and then "scrolling" to the end of the list.

@mahtab-nejati

One explanation for this discrepancy is that the histogram records the year in which each citation occurs, not the publication year of the paper being cited. For example, if a paper published in 2010 receives a citation in 2016, that citation is counted under 2016 in the histogram.
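To make the distinction concrete, here is a minimal sketch in Python. It assumes each citation can be represented as a `(publication_year, citation_year)` pair; the data and the function names are hypothetical and only illustrate the explanation above, not how the histogram is actually computed upstream.

```python
from collections import Counter

# Hypothetical (publication_year, citation_year) pairs; not real data.
citations = [
    (2010, 2016),
    (2010, 2017),
    (2014, 2016),
    (2015, 2018),
]


def citation_histogram(pairs):
    """Count citations by the year in which each citation occurred."""
    return Counter(cite_year for _pub_year, cite_year in pairs)


def earliest_histogram_year(pairs):
    """Earliest year that appears in the citation histogram."""
    return min(citation_histogram(pairs))


def earliest_publication_year(pairs):
    """Earliest year in which any cited paper was published."""
    return min(pub_year for pub_year, _cite_year in pairs)


print(sorted(citation_histogram(citations).items()))
# [(2016, 2), (2017, 1), (2018, 1)]
print(earliest_histogram_year(citations), earliest_publication_year(citations))
# 2016 2010
```

If this explanation holds, the first histogram year can only be a lower-quality proxy for the "year" column: it marks the first recorded citation, which can come years after the first publication.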

As for the crawling issue, I have resolved it in a Python scraper. I will link to it in a subsequent comment.
