Fixing Wiley issues and adding file-types #73
Conversation
awesome, thanks. Testing locally, getting some failed tests, e.g.:

> expect_true(file.exists(ft_get_si("10.1371/journal.pone.0127900", 1)))
#> Error in get_si_pub(x) :
#>   Cannot find publisher for DOI: 10.1371/journal.pone.0127900
> expect_true(file.exists(ft_get_si("10.1111/ele.12437", 1)))
#> Error in get_si_pub(x) : Cannot find publisher for DOI: 10.1111/ele.12437
> expect_true(file.exists(ft_get_si("10.1126/science.1255768", "Appendix_BanksLeite_etal.txt")))
#> Error in get_si_pub(x) :
#>   Cannot find publisher for DOI: 10.1126/science.1255768

Seems like this stems from https://github.com/willpearse/fulltext/blob/master/R/ft_get_si_plugins.R#L10 - the object
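For context on the error message, get_si_pub appears to map a DOI's prefix to a publisher plugin and stop when no match is found. A minimal sketch of that kind of lookup (the mapping, function name, and logic here are illustrative guesses, not fulltext's actual internals):

```r
# Illustrative DOI-prefix -> publisher lookup; NOT the package's real code.
# get_si_pub() seems to do something similar, erroring when no match exists.
publishers <- c("10.1371" = "plos", "10.1111" = "wiley", "10.1126" = "science")

get_pub <- function(doi) {
  prefix <- sub("/.*$", "", doi)  # "10.1371/journal.pone..." -> "10.1371"
  pub <- publishers[prefix]
  if (is.na(pub))
    stop("Cannot find publisher for DOI: ", doi)
  unname(pub)
}
```

Under this sketch, the three failing DOIs above would all resolve, so the error suggests the lookup table (or whatever populates it) is not what it should be at test time.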
Sorry for the delay getting back to you. This is disturbing to me, because these work locally for me. I do run all tests before PR-ing; clearly something is going wrong on my end. More soon; sorry about this.
thanks 👍
I've got rid of what could be causing it, but it's hard for me to say anything as (as I say) I can't trigger those errors. I do, however, sometimes get errors when using Wiley et al., and I'm concerned that's something server-side, because sometimes it works and sometimes it doesn't. Again, could just be me - frankly, I can't get the Ecology Letters website to load sometimes on my machine!
Thanks. I'll dig into it a little bit. Just to be sure, can you paste in your R session info, just in case this is due to different versions of dependencies?
No worries. Again, I'm sorry about this:
Conferencing for the next 2 days; I'll get to it as soon as I can.
No worries; I completely understand and I'm grateful you're looking at it. Maybe something will occur to me...
hey @willpearse - seems like they give the entire URL in the html. Are there problems with just pulling out the entire URL, rather than part of it, then constructing by hand? Seems like pulling out the entire URL will help avoid problems where articles differ in their base URLs. Perhaps

html <- xml2::read_html(html)
urls <- xml2::xml_attr(xml2::xml_find_all(html, '//a[contains(@href,"supinfo")]'), "href")
url <- urls[si]

could replace the lines https://github.com/willpearse/fulltext/blob/master/R/ft_get_si_plugins.R#L79-L95 - or some of them at least; I'm not sure if there are some checks that need to stay there.
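To show the suggestion working on a toy page (the HTML fragment below is made up purely for illustration; the point is the XPath trick of keying on "supinfo" in the href):

```r
library(xml2)

# Made-up fragment standing in for a Wiley article page; real pages link
# supporting information with "supinfo" somewhere in the href
page <- '<html><body>
  <a href="https://onlinelibrary.wiley.com/x/supinfo/ele12437-sup-0001.pdf">SI 1</a>
  <a href="https://onlinelibrary.wiley.com/x/supinfo/ele12437-sup-0002.csv">SI 2</a>
  <a href="https://onlinelibrary.wiley.com/x/pdf">Main text</a>
</body></html>'

html <- read_html(page)
# Grab the full href of every supporting-information link in one go
urls <- xml_attr(xml_find_all(html, '//a[contains(@href,"supinfo")]'), "href")
si   <- 2          # index of the supplement the caller asked for
url  <- urls[si]
```

Because the whole URL is taken from the page, differing base URLs across journals never need to be reconstructed by hand.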
Potentially, but that's not going to fix the tests you flag that fail locally above, right?
Tests pass with that change. Of course there could be unwanted behavior that the tests don't catch, but it seems like at least the change is capturing URLs correctly.
...then I think this is something very weird, because that code is in the Wiley checker, and those tests use the PLoS and Science plugins! Either way, I'll check and then commit the changes later today...
Checked and good to go, as far as I can see. As I say, none of these changes affect the other problems you were having, as far as I can tell, but if they're working your end and they're working my end then I'm happy :D
Okay - the changes I suggested were to make it easier to pull out the URL, and not related to the problem brought up in #73 (comment).
Ah, OK, I see. Either way, I'm grateful for the suggestions :D
Fixing Wiley issues and adding file-types

Where the suffix seems reasonable (*.4-letters), an attr of suffix is added to the return value giving the file's type. This happens on download (...if you rename a file, I can't figure out its type...), but given caching is the default it shouldn't be difficult for the user to just download, double-check the file type (...though, honestly, they should probably know this before downloading!) and then go from there. Addresses discussion on #3 and #69.

The Wiley issue (#71) is addressed in this, in that Ecology Letters articles (which have been updated) download, and other Wiley journals that haven't been updated also download (with checks and tests for that as well). It looks like Wiley haven't finished updating their website yet, and (frankly) the new layout for Ecology Letters doesn't load in my browser, let alone in R. If they retain the backwards-compatibility page on the same link they have now, this code will continue to work (hopefully) whatever they do to the new version. I'll have to revisit this; I have at least written a number of Wiley tests now, so we should know when other journals etc. break.