DOI-based context file is causing errors now #34

Closed
cboettig opened this issue Oct 10, 2017 · 11 comments

Comments

@cboettig
Member

Not sure why, but sometime in the last week, any code that needs to perform JSON-LD operations using the DOI-based context file seems to fail. (e.g. codemeta::crosswalk() functions).

@mbjones @gothub Would you know if anything changed with the handling of the DOI header or whatnot? I also don't think we ever got that working with DOI-based context files on the JSON-LD Playground, but it was definitely working in R a week ago (as evidenced by that Travis log!).

@cboettig
Member Author

Okay, looks like these 502 errors are random and come from the DOI server(s) throwing errors, probably due to being overloaded. @mfenner would you know anything about server overload issues on the DataCite side?

@jeroen has been helping troubleshoot this with a basic example like:

jsonld::jsonld_compact("https://raw.githubusercontent.com/codemeta/codemetar/master/codemeta.json", "https://doi.org/10.5063/schema/codemeta-2.0")


@jeroen
Member

jeroen commented Oct 10, 2017

If it's helpful, you can reproduce the 502 errors using the curl command line:

curl -Lv "https://doi.org/10.5063/schema/codemeta-2.0" -H "Accept: application/ld+json, application/json" > /dev/null

If you do it a few times you start to see random failures.
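
To put a number on "a few times", the request above can be wrapped in a small loop. This is only a sketch: `count_502` is a hypothetical helper name, not something from the thread, and it assumes a curl build that supports `-w '%{http_code}'`.

```shell
# Hypothetical helper: request a URL n times with the JSON-LD Accept header
# and print how many of those requests ended in an HTTP 502.
count_502() {
  url=$1
  n=$2
  failures=0
  i=0
  while [ "$i" -lt "$n" ]; do
    # -s silences progress output, -L follows redirects, -o discards the body,
    # -w prints only the status code of the last response in the redirect chain.
    status=$(curl -sL -o /dev/null -w '%{http_code}' \
      -H 'Accept: application/ld+json, application/json' "$url")
    if [ "$status" = "502" ]; then
      failures=$((failures + 1))
    fi
    i=$((i + 1))
  done
  echo "$failures"
}

# Usage (makes real network requests):
# count_502 "https://doi.org/10.5063/schema/codemeta-2.0" 20
```

A non-zero count on a run of 20 or so requests would match the intermittent failures described here.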

@cboettig
Member Author

Looks like the DOI resolution server is no longer overheated, tests are passing now.

@cboettig
Member Author

cboettig commented Nov 1, 2017

@mfenner looks like this issue keeps coming back: the DOI servers seem to be sufficiently overloaded that any JSON-LD operation with a DOI-based context can fail (even when more vanilla DOI resolution tends to work). Any suggestions around this?

At the moment we would get much more robust behavior with GitHub based URLs, which is what our DOI redirects to anyway. Thoughts @mbjones @gothub ?

Should we go with purl.org instead? Seems like every time I go to release this package to CRAN the DOI servers are throwing 502 errors at me....

@cboettig cboettig reopened this Nov 1, 2017
@mfenner

mfenner commented Nov 2, 2017

@cboettig we are having some issues this week, and will work on making the content negotiation infrastructure more robust in the coming weeks.

To be clear, this is not an issue of the DOI handle infrastructure (which redirects any DOI to a URL), but the DataCite infrastructure. The easiest step would be to not use content negotiation, which would give you the same behavior as using GitHub URLs or purl.org:

curl -Lv "https://doi.org/10.5063/schema/codemeta-2.0"

Content negotiation isn't really useful if you get exactly the same response as without content negotiation.

While we will of course improve the performance of DataCite DOI content negotiation (and have made good progress in the last two weeks), it will always be much slower than resolving a DOI directly, as it has to pass through an additional server and needs much more complex processing than looking up a URL for a DOI.
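
One way to check whether content negotiation changes anything for a given DOI is to compare where the redirect chain ends with and without the Accept header. The sketch below assumes that "same final URL" is a reasonable proxy for "same response"; `final_url` and `same_target` are hypothetical helper names.

```shell
# Hypothetical helper: print the last URL curl fetched after following
# redirects, optionally sending an Accept header (second argument).
final_url() {
  url=$1
  accept=${2:-}
  if [ -n "$accept" ]; then
    curl -sL -o /dev/null -w '%{url_effective}' -H "Accept: $accept" "$url"
  else
    curl -sL -o /dev/null -w '%{url_effective}' "$url"
  fi
}

# Succeeds (exit status 0) when both requests end at the same URL.
same_target() {
  [ "$(final_url "$1")" = "$(final_url "$1" "$2")" ]
}

# Usage (makes real network requests):
# same_target "https://doi.org/10.5063/schema/codemeta-2.0" "application/ld+json" \
#   && echo "content negotiation makes no difference to the target"
```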

@cboettig
Member Author

cboettig commented Nov 2, 2017

@mfenner thanks much, guess I was unlucky in timing. Good to know we can skip the CN.

@jeroen Any idea if/how I could tell your jsonld package not to use content negotiation when making calls against the URLs in a context line? Or is content negotiation needed there to tell that the URL is a JSON context?

@mbjones
Member

mbjones commented Nov 3, 2017

I don't know what @jeroen did, but we noticed that tools like the JSON-LD playground and other JSON-LD clients are configured to use content negotiation automatically in their requests, asking for Accept: application/ld+json. So I expect the majority of LD clients will be requesting the context file using content negotiation and asking for json-ld format. See discussion in codemeta/codemeta#125

@cboettig
Member Author

cboettig commented Nov 3, 2017

Right, I think content negotiation is probably required for the algorithm to work -- e.g. if the context is http://schema.org, content negotiation gives you the actual JSON file with the schema data:

curl -LH "Accept: application/ld+json" http://schema.org

whereas without it you just get the schema.org HTML homepage:

curl -Lv http://schema.org

So we probably cannot simply disable content negotiation, since that would break these cases. I'm not sure we can turn CN off on the R side anyway: the R package just wraps the JavaScript library, though it does also depend on the curl library and @jeroen is known to work magic ✨. Even if we can, we'd have to somehow do it only for DOI links(?). Sounds messy.
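
For illustration only, here is roughly what "only for DOI links" could look like at the curl level. This is a sketch, not how the jsonld package or the underlying JavaScript library works; `accept_header_for` and `fetch_context` are hypothetical names, and the list of DOI resolver hosts is an assumption.

```shell
# Hypothetical helper: print the Accept header to send for a context URL,
# or nothing for DOI resolver URLs (assumed here to be doi.org and dx.doi.org).
accept_header_for() {
  case "$1" in
    http://doi.org/*|https://doi.org/*|http://dx.doi.org/*|https://dx.doi.org/*)
      # DOI link: skip content negotiation and just follow the plain redirect.
      ;;
    *)
      printf '%s' 'Accept: application/ld+json, application/json'
      ;;
  esac
}

# Hypothetical helper: fetch a context document, negotiating only for non-DOI URLs.
fetch_context() {
  header=$(accept_header_for "$1")
  if [ -n "$header" ]; then
    curl -sL -H "$header" "$1"
  else
    curl -sL "$1"
  fi
}

# Usage (makes real network requests):
# fetch_context "http://schema.org"
# fetch_context "https://doi.org/10.5063/schema/codemeta-2.0"
```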

@mfenner I don't suppose there's any other workaround? (E.g. some way to avoid triggering the full set of content negotiation labor on the DataCite servers without having to drop the "Accept: application/ld+json" part?)

@jeroen
Member

jeroen commented Nov 3, 2017

@cboettig I have added an option that you can use to disable the Accept request header:

options(jsonld_use_accept = FALSE)

Let me know if this helps.

@cboettig
Member Author

cboettig commented Nov 3, 2017

@jeroen Thanks.

This still fails:

library(jsonld)
options(jsonld_use_accept = FALSE)
jsonld::jsonld_compact(
  "https://raw.githubusercontent.com/codemeta/codemetar/master/codemeta.json", 
  "https://raw.githubusercontent.com/codemeta/codemeta/2.0/codemeta.jsonld")

with error:

Error in context_eval(join(src), private$context) : 
  TypeError: Object #<Object> has no method 'match'

Note that the URL for the context we are compacting into is already a GitHub URL, but the current context specified in https://raw.githubusercontent.com/codemeta/codemetar/master/codemeta.json points to a DOI, and also to a schema.org URL (which will need content negotiation):

"@context": [
    "https://doi.org/doi:10.5063/schema/codemeta-2.0",
    "http://schema.org"
  ]

Even without the schema.org part, it's not clear to me whether CN is getting turned off for this internal call.

@cboettig
Member Author

cboettig commented Apr 5, 2018

resolved

@cboettig cboettig closed this as completed Apr 5, 2018
4 participants