Warning message: no plugin for Crossref member 345 yet #163
Comments
Thanks for the report @low-decarie, will have a look into this.
Hmm, it's kind of messy 😮 So I think we need a new pattern, and then I think it can be done, but it's a bit messy.
I was trying an alternative using phantomjs to download the full-text HTML page after JavaScript interpretation, but for some reason the HTML full-text section is still missing. Thanks for your efforts.
okay - try these on the command line
and now:

```r
remotes::install_github("ropensci/fulltext")
ft_get('10.1099/ijsem.0.002809')
```
That is fantastic, thank you! It worked once, but I now get:

but if I check online, I do have access to 10.1099/ijs.0.006387-0 (and many others for which I get the same error) (it's actually open access)
Looks like it's because they use a different URL pattern for that DOI, http://www.microbiologyresearch.org/docserver/fulltext/ijsem/59/8/1919.pdf, whereas we were looking for the pattern at https://github.com/ropenscilabs/pubpatterns/blob/master/src/microbiology.json#L23
Naive question: in the output from
What do you mean by "any url containing pdf"? The ftdoi API works not by scraping publisher pages, but by using rules described in the https://github.com/ropenscilabs/pubpatterns repo - so we only give back URLs from rules that we state ourselves. Or by "any url" did you mean URLs for other articles?
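For context, a pubpatterns rule roughly maps a journal to URL templates. A simplified, hypothetical sketch (these field names are illustrative only, not the real pubpatterns schema) might look like:

```json
{
  "publisher": "Microbiology Society",
  "journal": "International Journal of Systematic and Evolutionary Microbiology",
  "urls": {
    "pdf": "http://www.microbiologyresearch.org/docserver/fulltext/ijsem/{volume}/{issue}/{firstpage}.pdf"
  }
}
```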
Yeah, so publishers really suck. They have a different URL pattern for papers in press vs. papers assigned to volume/issue/page numbers. Not sure yet how I'll deal with that.
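For illustration, the split between the two patterns could be sketched like this (a hypothetical Python sketch, not ftdoi's actual code; the docserver template comes from the PDF URL quoted above, while the in-press fallback is my assumption):

```python
def ijsem_pdf_url(doi, volume=None, issue=None, first_page=None):
    """Build a candidate PDF URL for a Microbiology Society article.

    Hypothetical sketch: articles already assigned to a volume/issue
    use the docserver path seen in the comments above; papers in press
    (no volume yet) are assumed to need a DOI-based route instead.
    """
    if volume and issue and first_page:
        # pattern for papers assigned to a volume/issue/page
        return ("http://www.microbiologyresearch.org"
                f"/docserver/fulltext/ijsem/{volume}/{issue}/{first_page}.pdf")
    # assumed fallback for papers in press: the DOI-based landing page
    return f"http://ijs.microbiologyresearch.org/content/journal/ijsem/{doi}"
```

For example, `ijsem_pdf_url("10.1099/ijs.0.006387-0", 59, 8, 1919)` yields the docserver URL quoted in the comment above, while a DOI with no volume metadata falls back to the landing-page form.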
Phew, okay I think this works now:
I knew my comment was naive; the magic you are doing here escapes me. Thanks again for this great piece of work! The issues I was having were all with articles that are already in a volume. Articles more than 6 months old become OA, and those are the only articles to which I have access / want to download. A PDF file is temporarily created in the cache folder, but it gets deleted when the command fails (I guess this is planned behaviour). Here is a list of all IJSEM DOIs. There are three formats of DOI.
Yes, we don't want to cache a bad file, so we clean it up (delete it) if something goes wrong. So does
@low-decarie does
If I do ft_get() on the whole list of DOIs, I get:

If I repeatedly sample 30 DOIs from this list and apply ft_get(), it fails ~29/30 times (e.g. warnings):
I have access to most through the browser (I don't have access to 10.1099/ijsem.0.002131 as it is less than 6 months old).
...
Thanks for the details here @low-decarie - will have a look. I think the message I included
😢 Oof, another exception to the rules I thought I had figured out. So for now I changed the internals of ftdoi.org to just scrape the HTML landing page to get the PDF URL; this does mean that requests for this publisher will be a bit slower, unfortunately. I tried with up to 50 DOIs and they all work now for me. Side note: the first DOI in the list I think had a typo, instead of
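The landing-page scraping described here could look roughly like this (a minimal Python sketch under my own assumptions, not the actual ftdoi.org internals; the sample HTML is invented):

```python
from html.parser import HTMLParser

class PdfLinkFinder(HTMLParser):
    """Collect hrefs on a landing page that point at a .pdf file."""

    def __init__(self):
        super().__init__()
        self.pdf_links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value and ".pdf" in value:
                    self.pdf_links.append(value)

# an invented snippet of a landing page, for illustration only
sample_html = """
<html><body>
  <a href="/content/journal/ijsem/10.1099/ijs.0.006767-0">HTML</a>
  <a href="/docserver/fulltext/ijsem/59/6/1508.pdf?expires=123">PDF</a>
</body></html>
"""

finder = PdfLinkFinder()
finder.feed(sample_html)
print(finder.pdf_links[0])  # prints the scraped PDF path
```

In practice the real code would first fetch the landing page over HTTP (hence the extra request that makes this publisher slower), then pull the PDF link out of the returned HTML as above.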
Fantastic! 🥇 Thank you!
Should speed up in the future; will do caching (sckott/pubpatternsapi#7) of the JSON response from the API, which is used inside
Caching added to the API - the 2nd and subsequent requests to the same route will be cached for 24 hrs.
It does not seem to be possible to download full text from "International Journal of Systematic and Evolutionary Microbiology" though most articles are open access.
I guess it's similar to the previously reported issue of
example url for DOI: 10.1099/ijs.0.006767-0
main page:
http://ijs.microbiologyresearch.org/content/journal/ijsem/10.1099/ijs.0.006767-0
html
http://ijs.microbiologyresearch.org/content/journal/ijsem/10.1099/ijs.0.006767-0#tab2
i.e. adding "#tab2" to the url retrieved by dx.doi.org from the DOI finds the full html text.
pdf:
http://www.microbiologyresearch.org/docserver/fulltext/ijsem/59/6/1508.pdf? [+++ user specific bits]