get_citation_history(id) gives error for some id's #29

Closed
wdarling opened this issue Feb 18, 2016 · 7 comments

Comments

@wdarling

Hi. For some reason when I try to get my citation history I get the following error:
Error in data.frame(year = years, cites = vals) :
arguments imply differing number of rows: 9, 8
This doesn't happen for many other IDs that I've tried. My id is 'xpECwJQAAAAJ'... do you know why it doesn't work? Too few citations? ;-)

@jefferis
Collaborator

> Too few citations? ;-)

Sort of. The code finds the year labels and the bars indicating the number of citations separately, and assumes there is one bar per year. You have one year (2009) in your chart without citations, and there is no zero-height bar for 2009, so the two vectors end up with different lengths. Looking at the code, I don't see an easy fix without reworking it, e.g. to look at the x-axis position of the bars and figure out where the gaps are.
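To make the failure mode concrete, here is a minimal sketch that reproduces the same kind of length mismatch without touching Google Scholar. The year range and citation counts are made up for illustration; only the 9-versus-8 shape comes from the report above.

```r
# Toy data (made up): nine year labels but only eight bars, because one
# year has no citations and therefore no bar in the chart.
years <- 2008:2016
vals  <- c(3, 5, 2, 7, 4, 6, 1, 8)

# data.frame() cannot recycle a length-8 column to length 9, so it stops
# with an error. tryCatch() captures the message instead of aborting.
result <- tryCatch(
  data.frame(year = years, cites = vals),
  error = function(e) conditionMessage(e)
)
print(result)
```

Any fix therefore has to work out which year is missing its bar and insert a 0 there before building the data frame.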

@jefferis
Collaborator

Actually, there seems to be some useful information in the style attribute of each bar. This fixes it for me (see the if(length(years)>length(vals)){ block):

get_citation_history <- function(id) {

    ## Ensure only one ID
    id <- tidy_id(id)

    ## Read the page and parse the key data
    url_template <- "http://scholar.google.com/citations?hl=en&user=%s&pagesize=100&view_op=list_works"
    url <- sprintf(url_template, id)

    ## A better way would actually be to read out the plot of citations
    page <- GET(url, handle=getOption("scholar_handle")) %>% read_html()
    years <- page %>% html_nodes(xpath="//*/span[@class='gsc_g_t']") %>%
        html_text() %>% as.numeric()
    vals <- page %>% html_nodes(xpath="//*/span[@class='gsc_g_al']") %>%
        html_text() %>% as.numeric()
    if(length(years)>length(vals)){
      # Some years don't have citations. 
      # We need to match the citation counts and years
      # <a href="javascript:void(0)" class="gsc_g_a" style="left:8px;height:5px;z-index:9">\n  <span class="gsc_g_al">2</span>\n</a>
      style_tags=page %>% html_nodes(css = '.gsc_g_a') %>% 
        html_attr('style')
      # these z indices seem to be the indices starting with the last year
      zindices=as.integer(stringr::str_match(style_tags, 'z-index:([0-9]+)')[,2])
      # empty vector of 0s
      allvals=integer(length=length(years))
      # fill in 
      allvals[zindices]=rev(vals)
      # and then reverse
      vals=rev(allvals)
    }
    df <- data.frame(year=years, cites=vals)

    return(df)
}

@wdarling
Author

Yes that did the trick! Thanks 👍

@wdarling
Author

Hi, unfortunately I spoke too soon. The fix makes the function not crash, but it doesn't get the years and citations lined up properly. The problem is these two lines:

# fill in
allvals[zindices]=rev(vals)
# and then reverse
vals=rev(allvals)

When you do a reverse, the missing value then lines up with the second most recent year instead of the second oldest year. Here is some code that works (though you might have a more efficient way of doing it):

zindices=as.integer(stringr::str_match(style_tags, 'z-index:([0-9]+)')[,2])
allvals=integer(length=length(years))
i = 1
j = 1
prev = length(years)+1
for(z in zindices) {
  if((z+1) != prev) {
    i = i+1
  }
  allvals[i] = vals[j]    
  i = i+1
  j = j+1
  prev = z
}

Basically, I'm checking whether there is a break in continuity between the previous z-index and the current one. If there is, then there were no citations for that year, and we should just keep the 0 that was already placed in allvals.
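For anyone who wants to try the loop without scraping a profile, here is a self-contained sketch run on toy inputs. The years, counts, and z-indices are made up; the z-indices assume the pattern described in this thread, where the oldest bar carries the highest z-index and the value for a year without citations is simply absent.

```r
# Toy inputs (made up): years 2008..2016 with no bar for 2009.
years    <- 2008:2016
vals     <- c(3, 5, 2, 7, 4, 6, 1, 8)   # bar labels, oldest year first
zindices <- c(9, 7, 6, 5, 4, 3, 2, 1)   # assumed z-index per bar; 8 is absent

allvals <- integer(length(years))       # 0 citations for every year to start
i <- 1                                  # position in allvals (one per year)
j <- 1                                  # position in vals (one per bar)
prev <- length(years) + 1
for (z in zindices) {
  if ((z + 1) != prev) {
    i <- i + 1                          # gap in z-indices: leave a 0 for that year
  }
  allvals[i] <- vals[j]
  i <- i + 1
  j <- j + 1
  prev <- z
}
print(data.frame(year = years, cites = allvals))
```

Note that the loop advances by one slot per break, so as written it appears to assume that no two consecutive years are missing.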

@jefferis
Collaborator

Sorry, you're right: there was one rev too many. It should have been:

      # fill in 
      allvals[zindices]=vals
      # and then reverse
      vals=rev(allvals)
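As a quick sanity check of the corrected indexing, here is a small sketch on toy data. The numbers are made up, and the z-index pattern (highest value on the oldest bar, counting down to 1 on the newest) is an assumption based on the example style attribute quoted earlier in the thread.

```r
# Toy inputs (made up): years 2008..2016 with no bar for 2009.
years    <- 2008:2016
vals     <- c(3, 5, 2, 7, 4, 6, 1, 8)   # bar labels, oldest year first
zindices <- c(9, 7, 6, 5, 4, 3, 2, 1)   # assumed z-index per bar; 8 is absent

allvals <- integer(length(years))       # 0 citations for every year to start
allvals[zindices] <- vals               # drop each count into its z-index slot
vals_aligned <- rev(allvals)            # slot 1 is the newest year, so reverse

df <- data.frame(year = years, cites = vals_aligned)
print(df)
```

Because each count is placed directly by its z-index, this version should also cope with several missing years in a row, as long as the z-index assumption holds.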

@jefferis
Collaborator

Try devtools::install_github("jefferis/scholar@develop")

@wdarling
Author

Ok -- this time it works 😄 Thanks
