Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: set correct Accept header for CONSTRUCT queries #48

Merged
merged 12 commits into from May 27, 2024

Conversation

simontaurus
Copy link

fixes #47

@k00ni
Copy link
Member

k00ni commented May 9, 2024

@simontaurus Thank you for the PR! I tried to reproduce the result as described in easyrdf#301 (comment), but was not able to. EasyRdf generates a Graph instance, as expected. Please see this test, which passes with and without your adaptions.

I used DBpedia's SPARQL endpoint which runs the following Virtuoso version: Virtuoso version 08.03.3330 (879a32f2c3) on Linux (x86_64-generic-linux-glibc212).

The content of $accept (the result of Format::getHttpAcceptHeader()) is application/x-httpd-php-source,application/json,application/ld+json,application/n-triples,text/json;q=0.9,application/rdf+json;q=0.9,text/plain;q=0.9,text/ntriples;q=0.9,application/ntriples;q=0.9,application/x-ntriples;q=0.9,text/turtle;q=0.8,application/rdf+xml;q=0.8,application/turtle;q=0.7,application/x-turtle;q=0.7,text/xml;q=0.5,application/xml;q=0.5,text/html;q=0.4,application/xhtml+xml;q=0.4. It looks alright.

Its good to solve old bugs, but this one seems to be fixed in the meantime. What am I missing here?

@k00ni k00ni added the fix label May 9, 2024
@TallTed
Copy link

TallTed commented May 9, 2024

Looking at the new Accept: header, I see a few weird things. It's easier for humans to decipher with one media type per line, and with some whitespace to align things --

application/x-httpd-php-source         ,
application/json                       ,
application/ld+json                    ,
application/n-triples                  ,
       text/json               ; q=0.9 ,
application/rdf+json           ; q=0.9 ,
       text/plain              ; q=0.9 ,
       text/ntriples           ; q=0.9 ,
application/ntriples           ; q=0.9 ,
application/x-ntriples         ; q=0.9 ,
       text/turtle             ; q=0.8 ,
application/rdf+xml            ; q=0.8 ,
application/turtle             ; q=0.7 ,
application/x-turtle           ; q=0.7 ,
       text/xml                ; q=0.5 ,
application/xml                ; q=0.5 ,
       text/html               ; q=0.4 ,
application/xhtml+xml          ; q=0.4

The first several lines imply q=1.0.

Now, several of these are deprecated (e.g., application/x-ntriples;q=0.9), and I would suggest that while they may be acceptable, they should have a lower q than the type(s) that replaced them.

Further, several of the higher q are not associated with RDF, and do not have standard conversion methods (e.g., text/plain;q=0.9, application/x-httpd-php-source). I suggest that these should have a lower q value than all RDF-ish types.

If I were building this list, I'd order the types in my actual order of preference, and then set the q values to step down the same way. Note that q values between 0 and 1 may have 3 digits after the decimal (i.e., q=0.001 to q=0.999 ), so there are plenty of values to use to stack this list in order of actual preference.

Giving the Accept: header (for all SPARQL queries, not just CONSTRUCT) proper consideration should radically improve user experience, especially where users target multiple SPARQL engines over time..

@simontaurus
Copy link
Author

simontaurus commented May 10, 2024

@k00ni, @TallTed: Many thanks for evaluating this further.
To summarize, easyrdf sends an ill-formed Accept: header. Technically any server returning e.g.
application/x-httpd-php-source, application/json (q=1.0) or text/html (q=0.4) would behave correct but this may lead to parsing errors on the client side.
While dbpedias endpoint implementations 'cures' this, specifically blazegraph e.g. at https://query.wikidata.org/sparql returns application/sparql-results+json confronted with the easyrdf default header (which may be ill-formed too, because application/sparql-results+json was never accepted but maybe is interpreted as subtype of the accepted application/json). In this case the following line will create a Result instead of a Graph:

if (strpos($content_type, 'application/sparql-results') === 0) {
$result = new Result($response->getBody(), $content_type);

The fix will remove application/json and non-RDF types from Accept: fixing the behavior observed for blazegraph.

@simontaurus
Copy link
Author

I've added a second test with wikidata as endpoint which fails with the current main branch, see https://github.com/RepoLab/easyrdf/actions/runs/9030250747/job/24814145081

@k00ni
Copy link
Member

k00ni commented May 10, 2024

First, thanks to both of you for taking the time to make this fork better.

@TallTed I agree with your statements. EasyRdf should use proper communication so users can get what they expect. With these changes we can take a first step in this direction.

@simontaurus thanks for the feedback, I refined your adaptions and added a few more.

I dug around and have to say EasyRdf is a freaking mess in this regard. For instance, why is SVG part of the format list (reference)? 😆

There is another instance where the "broken" getHttpAcceptHeader is used: in Graph::load > https://github.com/sweetrdf/easyrdf/blob/master/lib/Graph.php#L328-L330. I adapted it using the same code as in Client::executeQuery (ref).

This test demonstrates that the change in Graph::load works with "strict" services such as WikiData now: https://github.com/sweetrdf/easyrdf/pull/48/files#diff-4e89689682eaed7ea1f99496cf46701cf4f4f5d92f4c0e74ad35ea6ec5b48a44R400. Without this change it throws the following exception: No parser class available for format: sparql-json

I hope I didn't miss something important here.

In my opinion these fixes are more like a patch + some tape and the code should be refactored in a broader sense. But I am afraid this might need quite some time. I would rather prefer to fix usage of Format's content itself than just place a list of valid things next to it.

CC @zozlak: you also have a deep knowledge in these areas, is there anything you wanna add here?

tests/EasyRdf/Sparql/ClientTest.php Outdated Show resolved Hide resolved
tests/EasyRdf/Sparql/ClientTest.php Outdated Show resolved Hide resolved
tests/EasyRdf/GraphTest.php Outdated Show resolved Hide resolved
k00ni and others added 3 commits May 13, 2024 08:31
Co-authored-by: Ted Thibodeau Jr <tthibodeau@openlinksw.com>
Co-authored-by: Ted Thibodeau Jr <tthibodeau@openlinksw.com>
Co-authored-by: Ted Thibodeau Jr <tthibodeau@openlinksw.com>
@k00ni
Copy link
Member

k00ni commented May 22, 2024

@simontaurus is there anything you wanna add/change here?

@simontaurus
Copy link
Author

@k00ni: From my side we are good to go 👍

@k00ni k00ni merged commit b5eeeb3 into sweetrdf:master May 27, 2024
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

EasyRdf incorrectly uses 'Accept:' headers
3 participants