Fixing Wiley issues and adding file-types #73

willpearse · 2015-11-13T19:56:58Z

Where the suffix seems reasonable (*.4-letters) an attr of suffix is added to the return that is the file's type. This happens on download (...if you rename a file I can't figure out its type...), but given caching is the default it shouldn't be difficult for the user to just download, double-check the file type (...though, honestly, they should probably know this before downloading!) and then go from there. Address discussion on #3 and #69

The Wiley issue ( #71 ) is addressed in this, in that Ecology Letters articles download (which have been updated), and other Wiley journals that haven't been updated also download (with check and tests for that as well). It looks like Wiley haven't finished updating their website yet and (frankly) the new layout for Ecology Letters doesn't load on my browser, let alone in R. If they retain the backwards-compatibility page on the same link they have now, this code will continue to work (hopefully) whatever they do to the new version. I'll have to revisit this; I have at least written a number of Wiley tests now so we should know when other journals etc. break.

sckott · 2015-11-14T00:54:51Z

awesome, thanks

Testing locally, getting some failed tests, e.,g, :

<r> expect_true(file.exists(ft_get_si("10.1371/journal.pone.0127900", 1)))
#> Error in get_si_pub(x) : 
#>  Cannot find publisher for DOI: 10.1371/journal.pone.0127900 

<r> expect_true(file.exists(ft_get_si("10.1111/ele.12437", 1)))
#>  Error in get_si_pub(x) : Cannot find publisher for DOI: 10.1111/ele.12437

<r> expect_true(file.exists(ft_get_si("10.1126/science.1255768", "Appendix_BanksLeite_etal.txt")))
#> Error in get_si_pub(x) : 
#>  Cannot find publisher for DOI: 10.1126/science.1255768

Seems like this stems from https://github.com/willpearse/fulltext/blob/master/R/ft_get_si_plugins.R#L10 - the object pub is a data.frame, and at least for me that logical test sometimes gives TRUE when it shouldn't -

willpearse · 2015-11-15T19:51:13Z

Sorry for the delay getting back to you. This is disturbing to me, because these work locally for me. I do run all tests before PR-ing; clearly something is going wrong from my end. More soon; sorry about this.

sckott · 2015-11-15T19:55:03Z

thanks 👍

willpearse · 2015-11-15T20:05:44Z

I've got rid of what could be causing it, but it's hard for me to say anything as (as I say) I can't trigger those errors. I do, however, sometimes get errors when using Wiley et al., and I'm concerned that's something server-side because sometimes it works, and sometimes it doesn't. Again, could just be me - frankly, I can't get the Ecology Letters website to load sometime on my machine!

sckott · 2015-11-15T20:30:21Z

Thanks. I'll dig in to it a little bit. Just to be sure, can you paste in your R session info, just in case this is due to diff. versions of dependencies.

willpearse · 2015-11-15T20:42:03Z

No worries. Again, I'm sorry about this:

R version 3.2.2 (2015-08-14)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 15.10

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_CA.UTF-8        LC_COLLATE=en_GB.UTF-8    
 [5] LC_MONETARY=en_CA.UTF-8    LC_MESSAGES=en_GB.UTF-8   
 [7] LC_PAPER=en_CA.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] fulltext_0.1.4.9000

loaded via a namespace (and not attached):
 [1] NLP_0.1-8         Rcpp_0.12.0       plyr_1.8.2        R.methodsS3_1.7.0
 [5] R.utils_2.1.0     tools_3.2.2       digest_0.6.8      jsonlite_0.9.16  
 [9] lubridate_1.3.3   memoise_0.2.1     gtable_0.1.2      R.cache_0.10.0   
[13] rcrossref_0.3.4   aRxiv_0.5.10      bibtex_0.4.0      DBI_0.3.1        
[17] parallel_3.2.2    proto_0.3-10      xml2_0.1.1        dplyr_0.4.2      
[21] httr_1.0.0        stringr_1.0.0     rredis_1.7.0      grid_3.2.2       
[25] R6_2.0.1          rentrez_0.4.2     XML_3.98-1.1      ggplot2_1.0.1    
[29] reshape2_1.4.1    solr_0.1.4        magrittr_1.5      whisker_0.3-2    
[33] rplos_0.5.2       scales_0.2.4      MASS_7.3-43       assertthat_0.1   
[37] colorspace_1.2-6  stringi_1.0-1     munsell_0.4.2     slam_0.1-32      
[41] tm_0.6-2          rjson_0.2.15      R.oo_1.19.0

sckott · 2015-11-17T02:06:29Z

conferencing for the next 2 days, i'll get to it as soon as I can

willpearse · 2015-11-17T15:04:19Z

No worries; I completely understand and I'm grateful you're looking at it. Maybe something will occur to me...

sckott · 2015-11-26T02:06:54Z

hey @willpearse - seems like they give the entire URL in the html. are there problems with just pulling out the entire url, rather than part of it, then constructing by hand? seems like pulling out entire URL will help avoid problems where articles differ in their base urls. Perhaps this

html <- xml2::read_html(html)
urls <- xml2::xml_attr(xml2::xml_find_all(html, '//a[contains(@href,"supinfo")]'), "href")
url <- urls[si]

could replace the lines https://github.com/willpearse/fulltext/blob/master/R/ft_get_si_plugins.R#L79-L95 or some of them at least - I'm not sure if there's some checks that need to stay there

willpearse · 2015-11-27T15:29:58Z

Potentially, but that's not going to fix the tests you flag that fail locally above, right?

sckott · 2015-11-27T16:20:27Z

tests pass with that change. of course there could be unwanted behavior that the tests don;t catch

but it seems like at least the change is capturing URLs correctly

willpearse · 2015-11-27T16:27:05Z

...then I think this is something very weird because that code is in the Wiley checker, and those tests use the PLoS and Science plugins!

Either way, I'll check and then commit the changes later today...

willpearse · 2015-11-27T17:43:01Z

Checked and good to go, as far as I can see. As I say, none of these changes affect the other problems you were having that I can see, but if they're working your end and they're working my end then I'm happy :D

sckott · 2015-11-27T17:44:46Z

Okay, the changes I suggested were to make it easier to pull out the URL, and not related to the problem brought up in #73 (comment)

willpearse · 2015-11-27T17:51:22Z

Ah, OK, I see. Either way, I'm grateful of the suggestions :D

Fixing Wiley issues and adding file-types

willpearse added 3 commits October 19, 2015 14:40

Added attr for file type

85398c2

Wiley hotfix

361d5ae

Address ropensci-archive#69 (filetype) & ropensci-archive#71 (Wiley)

af2e8a9

get_si_pub fix

ddf6451

Wiley changes

c940e55

sckott added a commit that referenced this pull request Dec 2, 2015

Merge pull request #73 from willpearse/master

1e51f3e

Fixing Wiley issues and adding file-types

sckott merged commit 1e51f3e into ropensci-archive:master Dec 2, 2015

sckott mentioned this pull request Dec 2, 2015

Wiley website change --> hang on Wiley ft_get_si download #71

Closed

sckott added this to the v0.2 milestone Dec 2, 2015

sckott mentioned this pull request Jan 20, 2016

issue with win-builder? #79

Closed

sckott modified the milestones: v0.2, v0.1.6 Jan 20, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fixing Wiley issues and adding file-types #73

Fixing Wiley issues and adding file-types #73

willpearse commented Nov 13, 2015

sckott commented Nov 14, 2015

willpearse commented Nov 15, 2015

sckott commented Nov 15, 2015

willpearse commented Nov 15, 2015

sckott commented Nov 15, 2015

willpearse commented Nov 15, 2015

sckott commented Nov 17, 2015

willpearse commented Nov 17, 2015

sckott commented Nov 26, 2015

willpearse commented Nov 27, 2015

sckott commented Nov 27, 2015

willpearse commented Nov 27, 2015

willpearse commented Nov 27, 2015

sckott commented Nov 27, 2015

willpearse commented Nov 27, 2015

Fixing Wiley issues and adding file-types #73

Fixing Wiley issues and adding file-types #73

Conversation

willpearse commented Nov 13, 2015

sckott commented Nov 14, 2015

willpearse commented Nov 15, 2015

sckott commented Nov 15, 2015

willpearse commented Nov 15, 2015

sckott commented Nov 15, 2015

willpearse commented Nov 15, 2015

sckott commented Nov 17, 2015

willpearse commented Nov 17, 2015

sckott commented Nov 26, 2015

willpearse commented Nov 27, 2015

sckott commented Nov 27, 2015

willpearse commented Nov 27, 2015

willpearse commented Nov 27, 2015

sckott commented Nov 27, 2015

willpearse commented Nov 27, 2015