Warning message: no plugin for Crossref member 345 yet #163

low-decarie · 2018-05-31T13:06:47Z

It does not seem to be possible to download full text from "International Journal of Systematic and Evolutionary Microbiology" though most articles are open access.

Warning message:
no plugin for Crossref member 345 yet

I guess this is similar to the previously reported issue:

no plugin for Crossref member 8215 yet #117

example url for DOI: 10.1099/ijs.0.006767-0
main page:
http://ijs.microbiologyresearch.org/content/journal/ijsem/10.1099/ijs.0.006767-0
html
http://ijs.microbiologyresearch.org/content/journal/ijsem/10.1099/ijs.0.006767-0#tab2
i.e. adding "#tab2" to the URL that dx.doi.org resolves from the DOI gives the full HTML text.

pdf:
http://www.microbiologyresearch.org/docserver/fulltext/ijsem/59/6/1508.pdf? [+++ user-specific bits]

sckott · 2018-05-31T16:59:53Z

thanks for the report @low-decarie

will have a look into this

sckott · 2018-06-01T06:31:03Z

Hmm, it's kind of messy 😮

So I think we need a pattern like

http://ijs.microbiologyresearch.org/deliver/fulltext/ijsem/<doi suffix dot separated>.zip/<doi suffix concatenated no spaces>.pdf

And then ijsem is also specific to the journal, as well as ijs in ijs.microbiologyresearch.org

so I think it can be done, but it's a bit messy
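To make the proposed pattern concrete, here is a minimal shell sketch that builds such a URL from a DOI. It assumes that "doi suffix dot separated" means the part of the DOI after the slash as written, and that "concatenated" means the same string with the dots removed. The function name `build_pdf_url` and its `host` and `journal` arguments are made up for illustration, and as later comments in this thread show, the pattern does not hold for every article.

```sh
# Hypothetical helper: build a candidate PDF URL from a DOI using the
# pattern proposed above. The subdomain and journal code are passed in
# because both vary by journal.
build_pdf_url() {
  doi=$1      # e.g. 10.1099/ijsem.0.002809
  host=$2     # e.g. ijs.microbiologyresearch.org
  journal=$3  # e.g. ijsem

  suffix=${doi#*/}                              # part after the first slash
  joined=$(printf '%s' "$suffix" | tr -d '.')   # assumed meaning of "concatenated"

  printf 'http://%s/deliver/fulltext/%s/%s.zip/%s.pdf\n' \
    "$host" "$journal" "$suffix" "$joined"
}

build_pdf_url 10.1099/ijsem.0.002809 ijs.microbiologyresearch.org ijsem
```

Passing the host and journal code as arguments keeps the journal-specific parts out of the pattern itself, which is where most of the messiness comes from.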

low-decarie · 2018-06-01T10:32:56Z

I was trying to get an alternative using phantomjs to download the full-text html page after javascript interpretation, but for some reason the html full-text section is still missing. Thanks for your efforts.

sckott · 2018-06-01T18:22:39Z

okay - try these on the command line

curl https://ftdoi.org/api/doi/10.1099/ijsem.0.002809/ | jq .
curl https://ftdoi.org/api/doi/10.1099/mic.0.000664/ | jq .
curl https://ftdoi.org/api/doi/10.1099/jgv.0.001056/ | jq .
curl https://ftdoi.org/api/doi/10.1099/mgen.0.000182/ | jq .
curl https://ftdoi.org/api/doi/10.1099/jmmcr.0.005152/ | jq .
curl https://ftdoi.org/api/doi/10.1099/jmm.0.000647/ | jq .

and fulltext should hopefully work on these now; try

remotes::install_github("ropensci/fulltext")
ft_get('10.1099/ijsem.0.002809')

low-decarie · 2018-06-07T13:27:28Z

That is fantastic. Thank you!

It worked once, but I now get:

Warning message:
you may not have access to 10.1099/ijs.0.006387-0 or an error occurred

but if I check online, I do have access to 10.1099/ijs.0.006387-0 (and to many others for which I get the same error); it's actually open access.

sckott · 2018-06-07T13:59:50Z

looks like it's because they use a different URL pattern for that DOI:

http://www.microbiologyresearch.org/docserver/fulltext/ijsem/59/8/1919.pdf

whereas we were looking for the pattern defined at https://github.com/ropenscilabs/pubpatterns/blob/master/src/microbiology.json#L23

low-decarie · 2018-06-07T14:38:02Z

Naive question: in the output from
curl https://ftdoi.org/api/doi/10.1099/ijsem.0.002809/ | jq .
(which is being parsed by microbiology.json), is it not possible to have a more liberal, less restrictive search for any URL containing "pdf"?

sckott · 2018-06-07T16:43:31Z

What do you mean by "any url containing pdf"? The ftdoi API works not by scraping publisher pages but by applying rules described in the https://github.com/ropenscilabs/pubpatterns repo, so we only give back URLs from rules that we state ourselves.

or by "any url" did you mean URLs for other articles?

sckott · 2018-06-07T17:47:03Z

yeah, so publishers really suck. They have a different URL pattern for papers in press vs. papers already assigned to a volume and issue with page numbers.

not sure yet how I'll deal with that.

sckott · 2018-06-07T18:12:55Z

phew, okay i think this works now:

an article not in a volume yet: https://ftdoi.org/api/doi/10.1099/ijsem.0.002809/
an article in a volume https://ftdoi.org/api/doi/10.1099/ijsem.0.002699/

low-decarie · 2018-06-08T08:32:23Z

I knew my comment was naive; the magic you are doing here escapes me. Thanks again for this great piece of work!

The issues I was having were all with articles that are already in a volume. Articles more than 6 months old become OA, and those are the only articles to which I have access and which I want to download.

A PDF file is temporarily created in the cache folder, but it gets deleted when the command fails (I guess this is planned behaviour).

Here is a list of all IJSEM DOIs. There are three DOI formats:
10.1099/ijs.
10.1099/ijsem.
10.1099/00207713
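A quick way to see how a DOI list splits across these three formats is to bucket each DOI by its prefix. The shell sketch below does that; `doi_family` and `count_families` are hypothetical helper names, and the three sample DOIs are taken from elsewhere in this thread.

```sh
# Hypothetical helper: report which of the three IJSEM DOI formats a DOI uses.
doi_family() {
  case $1 in
    10.1099/ijs.*)      echo ijs ;;
    10.1099/ijsem.*)    echo ijsem ;;
    10.1099/00207713*)  echo 00207713 ;;
    *)                  echo unknown ;;
  esac
}

# Count DOIs per format; reads one DOI per line from standard input
# and skips empty lines.
count_families() {
  while IFS= read -r doi; do
    if [ -n "$doi" ]; then
      doi_family "$doi"
    fi
  done | sort | uniq -c
}

printf '%s\n' 10.1099/ijs.0.64812-0 10.1099/ijsem.0.001928 \
  10.1099/00207713-51-3-731 | count_families
```

With the full list saved one DOI per line in a file (say `dois.txt`, an assumed name), `count_families < dois.txt` gives the per-format totals, which helps when checking whether failures cluster in one format.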

sckott · 2018-06-08T17:48:18Z

"A PDF file is temporarily created in the cache folder, but it gets deleted when the command fails (I guess this is planned behaviour)."

yes, we don't want to cache a bad file so we clean it up (delete it) if something goes wrong.

So does ft_get work then for the most part with your DOI list?

sckott · 2018-06-11T21:57:09Z

@low-decarie does ft_get work then for the most part with your DOI list?

low-decarie · 2018-06-12T21:16:33Z

If I do ft_get() on the whole list of DOIs, I get:

Error in names(z$data) <- tolower(names(z$data)) :
  attempt to set an attribute on NULL
In addition: Warning message:
404: Resource not found. - (10.1099/ijs.0-011122-0)

If I repeatedly sample 30 DOIs from this list and apply ft_get() to them, I get failures for ~29/30 (example warnings):

Warning messages:
1: you may not have access to 10.1099/ijs.0.64812-0 or an error occurred
2: you may not have access to 10.1099/ijs.0.000090 or an error occurred
3: you may not have access to 10.1099/ijs.0.020628-0 or an error occurred
4: you may not have access to 10.1099/00207713-51-3-731 or an error occurred
5: you may not have access to 10.1099/ijs.0.000125 or an error occurred
6: you may not have access to 10.1099/ijs.0.63769-0 or an error occurred
7: you may not have access to 10.1099/ijs.0.049106-0 or an error occurred
8: you may not have access to 10.1099/ijsem.0.001928 or an error occurred
9: you may not have access to 10.1099/ijs.0.65467-0 or an error occurred
10: you may not have access to 10.1099/ijsem.0.002131 or an error occurred
11: you may not have access to 10.1099/00207713-50-4-1655 or an error occurred
12: you may not have access to 10.1099/00207713-51-2-489 or an error occurred
13: you may not have access to 10.1099/ijs.0.068296-0 or an error occurred
14: you may not have access to 10.1099/ijs.0.038844-0 or an error occurred
15: you may not have access to 10.1099/ijs.0.053009-0 or an error occurred
16: you may not have access to 10.1099/ijs.0.022517-0 or an error occurred
17: you may not have access to 10.1099/ijs.0.023580-0 or an error occurred
18: you may not have access to 10.1099/ijs.0.064345-0 or an error occurred
19: you may not have access to 10.1099/ijs.0.02735-0 or an error occurred
20: you may not have access to 10.1099/ijsem.0.002212 or an error occurred
21: you may not have access to 10.1099/ijs.0.041178-0 or an error occurred
22: you may not have access to 10.1099/ijs.0.009258-0 or an error occurred
23: you may not have access to 10.1099/ijs.0.02505-0 or an error occurred
24: you may not have access to 10.1099/ijs.0.02377-0 or an error occurred
25: you may not have access to 10.1099/ijsem.0.001064 or an error occurred
26: you may not have access to 10.1099/ijs.0.056499-0 or an error occurred
27: you may not have access to 10.1099/ijs.0.001149-0 or an error occurred
28: you may not have access to 10.1099/ijs.0.000167 or an error occurred
29: you may not have access to 10.1099/ijsem.0.000979 or an error occurred

I have access to most through the browser (I don't have access to 10.1099/ijsem.0.002131 as it is less than 6 months old).

https://ftdoi.org/api/doi/10.1099/ijsem.0.001064/ gives a URL that actually works. I tried it again separately with ft_get('10.1099/ijsem.0.001064') and it worked. Same for 10.1099/ijsem.0.000979.

https://ftdoi.org/api/doi/10.1099/ijs.0.64812-0/
has faulty file link:
http://ijs.microbiologyresearch.org/deliver/fulltext/ijsem/57/7/1442_ijsem0.pdf
but the file is actually found at:
http://ijs.microbiologyresearch.org/deliver/fulltext/ijsem/57/7/1442.pdf

https://ftdoi.org/api/doi/10.1099/00207713-51-3-731
has faulty file link:
http://ijs.microbiologyresearch.org/deliver/fulltext/ijsem/51/3/731_ijsem731.pdf
but I can't identify a non-user-specific URL that works

...

sckott · 2018-06-12T21:34:31Z

thanks for the details here @low-decarie - will have a look

i think the message i included, "you may not have access to DOI or an error occurred", doesn't necessarily mean you don't have access

sckott · 2018-06-12T23:35:04Z

😢 oof, another exception to the rules i thought i had figured out. so for now i changed the internals of ftdoi.org to just scrape the html landing page to get the pdf url. this does mean that requests for this publisher will be a bit slower, unfortunately.

I tried with up to 50 DOIs and they all work now for me.
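The scraping code inside ftdoi.org is not shown in this thread. As a rough illustration of the idea only, the shell sketch below pulls a PDF link out of landing-page HTML by looking for a `citation_pdf_url` meta tag, a convention many publisher pages follow. Whether the microbiologyresearch.org pages carry that tag is an assumption, `extract_pdf_url` is a hypothetical name, and the HTML in the demonstration is made up.

```sh
# Hypothetical helper: read landing-page HTML on stdin and print the value of
# the first <meta name="citation_pdf_url" content="..."> tag, if any.
# Assumes the name attribute precedes content, both use double quotes,
# and the whole tag sits on one line.
extract_pdf_url() {
  sed -n 's/.*<meta[^>]*name="citation_pdf_url"[^>]*content="\([^"]*\)".*/\1/p' |
    head -n 1
}

# Offline demonstration with a made-up snippet of HTML.
extract_pdf_url <<'EOF'
<html><head>
<meta name="citation_title" content="Example article">
<meta name="citation_pdf_url" content="http://example.org/fulltext/1442.pdf">
</head></html>
EOF
```

Against a live page this would be fed from something like `curl -sL "https://doi.org/$doi" | extract_pdf_url`, which costs one extra request per DOI and is consistent with the slowdown mentioned above.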

side note: I think the first DOI in the list had a typo. instead of 10.1099/ijs.0-011122-0 it should be 10.1099/ijs.0.011122-0
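A typo like this can be caught before calling ft_get by checking each DOI against the shapes seen in this thread. The patterns below are inferred only from the DOIs quoted above, not from any publisher specification, so treat them as assumptions; `looks_like_ijsem_doi` is a hypothetical name.

```sh
# Hypothetical check: succeed if the DOI matches one of the three shapes
# observed in this thread (patterns inferred from the examples only).
looks_like_ijsem_doi() {
  printf '%s\n' "$1" | grep -Eq \
    -e '^10\.1099/ijs\.0\.[0-9]+(-0)?$' \
    -e '^10\.1099/ijsem\.0\.[0-9]+$' \
    -e '^10\.1099/00207713-[0-9]+-[0-9]+-[0-9]+$'
}

# The mistyped DOI is flagged as suspect; the corrected one passes.
for doi in 10.1099/ijs.0-011122-0 10.1099/ijs.0.011122-0; do
  if looks_like_ijsem_doi "$doi"; then
    echo "ok:      $doi"
  else
    echo "suspect: $doi"
  fi
done
```

A DOI flagged as suspect is not necessarily wrong, since the publisher may use shapes that do not appear in this thread; the check is only meant to surface candidates for a manual look.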

low-decarie · 2018-06-13T19:10:42Z

Fantastic! 🥇 ! Thank you!

sckott · 2018-06-13T19:19:47Z

should speed up in the future: will add caching (sckott/pubpatternsapi#7) of the JSON response from the API, which is used inside fulltext

sckott · 2018-06-13T23:16:32Z

caching added to the API - responses are cached for 24 hrs, so the 2nd and later requests to the same route are served from the cache.
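The API's caching code is not shown here, but the general idea of reusing a response for 24 hours can be sketched on the client side as a small file cache. Everything below (the `cached` function, the `FT_CACHE_DIR` and `FT_CACHE_TTL_MIN` variables, the cache location) is hypothetical and is not part of fulltext or ftdoi.org. The sketch relies on `find -mmin`, which GNU and BSD find provide.

```sh
FT_CACHE_DIR=${FT_CACHE_DIR:-${TMPDIR:-/tmp}/ftdoi-cache}
FT_CACHE_TTL_MIN=${FT_CACHE_TTL_MIN:-1440}   # 24 hours, in minutes

# cached KEY COMMAND [ARGS...]
# Print the stored output for KEY if it is younger than the TTL; otherwise
# run COMMAND, store its output under KEY, and print it.
cached() {
  key=$1; shift
  mkdir -p "$FT_CACHE_DIR" || return 1
  # Turn the key into a safe file name.
  file=$FT_CACHE_DIR/$(printf '%s' "$key" | tr -c 'A-Za-z0-9._-' '_')
  # find prints the file only if it was modified more than TTL minutes ago.
  if [ -f "$file" ] && [ -z "$(find "$file" -mmin +"$FT_CACHE_TTL_MIN")" ]; then
    cat "$file"
  elif "$@" > "$file.tmp"; then
    mv "$file.tmp" "$file" && cat "$file"
  else
    # Do not keep output from a failed command.
    rm -f "$file.tmp"
    return 1
  fi
}

# Example (not run here, needs network access):
# cached 10.1099/ijsem.0.002809 curl -s https://ftdoi.org/api/doi/10.1099/ijsem.0.002809/
```

Writing to a temporary file first and discarding it on failure mirrors the "don't cache a bad file" behaviour described earlier in the thread.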

sckott mentioned this issue Jun 1, 2018

add microbiology society sckott/pubpatterns#5

Closed

sckott added a commit to sckott/pubpatterns that referenced this issue Jun 1, 2018

added microbiology society journal mappers, via ropensci-archive/full…

5c5f5c5

…text#163 fix #5

sckott added a commit to sckott/pubpatternsapi that referenced this issue Jun 1, 2018

add microbiology society handler sckott/pubpatterns#5 and ropensci-ar…

91c11e4

…chive/fulltext#163

sckott added a commit that referenced this issue Jun 1, 2018

#163 changes for publisher 345

eb35253

sckott added a commit to sckott/pubpatterns that referenced this issue Jun 7, 2018

attempt fix for ijs microbiology journals pdf links, ropensci-archive…

0fe449a

…/fulltext#163

sckott mentioned this issue Jun 7, 2018

handle when publishers have diff. URL patterns for different articles sckott/pubpatterns#6

Open

sckott added this to the v1.1.0 milestone Jun 13, 2018

sckott closed this as completed Jun 13, 2018
