N3/Turtle reader does not allow non-lowercase language tags #2
Comments
For what it's worth, I made a similar improvement to RDF.rb's N-Triples parser recently, given how frequently mixed-case language tags are encountered in the wild. RDF.rb 0.3.0's parser will now accept language tags in any case, but the serializer emits them in lowercase only.
The rule is currently [a-z]+ ( "-" [a-z0-9]+ )*, and I can easily change this to allow uppercase too. I note that RDF::Literal#canonicalize should fix this. The rule is based on RFC 3066, which has actually been replaced by RFC 4646. I suspect that work out of RDF Next will update this, but for the time being, I'll just relax the parsing to allow both uppercase and lowercase language tags.
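To make the proposed relaxation concrete, here is a minimal Ruby sketch of the rule quoted above in strict and relaxed form. The constant and method names are hypothetical and chosen for illustration; this is not the rdf-n3 grammar code itself.

```ruby
# Strict rule as quoted in the comment above: lowercase letters only.
STRICT_LANG  = /\A[a-z]+(-[a-z0-9]+)*\z/

# Relaxed rule: same shape, but letters may be upper or lower case.
RELAXED_LANG = /\A[a-zA-Z]+(-[a-zA-Z0-9]+)*\z/

# Hypothetical helper: checks a bare language tag (without the leading "@").
def lang_tag_ok?(tag, relaxed: true)
  (relaxed ? RELAXED_LANG : STRICT_LANG).match?(tag)
end
```

With the relaxed pattern, tags such as EN and en-US are accepted, while the strict pattern still rejects them.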
Yes, I just had a look at In any case, we should probably have both RDF.rb's bundled N-Triples parser and the N3 parser handle language tag case-sensitivity questions consistently if possible, so let me know if something needs changing in the N-Triples reader.
The particular SPARQL test at issue checks whether lowercase and uppercase language tags are treated as the same in the endpoint. I have no idea why this decision was made, but it would seem parsers should not canonicalize this.
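As a rough illustration of what such a test exercises, the sketch below compares two language-tagged literals while ignoring tag case, without rewriting either tag. LangLiteral and same_as? are hypothetical stand-ins, not RDF::Literal or any SPARQL engine's API.

```ruby
# Hypothetical stand-in for a language-tagged literal; not RDF::Literal.
LangLiteral = Struct.new(:value, :language) do
  # Equality that ignores language-tag case but keeps each tag as written.
  def same_as?(other)
    value == other.value && language.casecmp?(other.language)
  end
end
```

Because the comparison is case-insensitive on its own, the parser can preserve the tag exactly as it appeared in the input.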
Fixed in d9726a6
Note that the fix did include the c14n. The reader canonicalizes all input, which seems to be required to pass other W3C tests. We could add a reader option to control c14n: I would enable it for tests, and leaving it off would allow your usage. Let me know how it goes after you re-check with the updated gem.
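A reader-level switch along these lines could look roughly like the following. This is a sketch under assumptions: parse_lang_literal and its canonicalize: keyword are hypothetical names, and the pattern only handles simple literals with no escapes.

```ruby
# Hypothetical sketch of an opt-in canonicalization switch; this is not
# rdf-n3's actual reader code. Matches simple tokens such as "abc"@EN.
LITERAL = /\A"([^"]*)"@([a-zA-Z]+(?:-[a-zA-Z0-9]+)*)\z/

def parse_lang_literal(token, canonicalize: false)
  m = LITERAL.match(token)
  raise ArgumentError, "not a language-tagged literal: #{token}" unless m

  value = m[1]
  lang  = m[2]
  # Lowercase the tag only when the caller asks for canonical output.
  lang = lang.downcase if canonicalize
  [value, lang]
end
```

Leaving the option off preserves the tag as written, which suits callers that need to observe the original case; turning it on gives the lowercase form that the other tests expect.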
Thanks for implementing this, Gregg. We should probably define some standard options for all RDF.rb-compatible readers, such as indeed
Yes, I think this is the right set. I'll add support for this in my readers. If you add it to RDF::Reader, that would be great.
OK, I'll add and document them shortly.
I implemented this, plus a :prefixes option, in rdf-n3, which is pushed to GitHub. I'll wait until the 0.3.0 issues are resolved across other gems before releasing to RubyGems.
I've now defined and documented five standard options for https://github.com/bendiken/rdf/commit/e7b325b9ffd445781a0390f4aab51d2625f7cd4d See http://rdf.rubyforge.org/RDF/Reader.html#initialize-instance_method for a readable summary. Reader implementations need not implement every option, but I'll work on implementing these (except for the prefixes, obviously) in RDF.rb's bundled N-Triples parser.
Also, I should mention that I will cut a 0.3.0.pre release today or tomorrow, so that all gems depending on RDF.rb have a chance to get updated before the official 0.3.0 release.
Despite the fact that "abc"@EN is invalid Turtle according to both the N3 and Turtle grammars, this form is in widespread use, including in the W3C's own SPARQL tests (what is the emoticon for 'irony'?):
http://www.w3.org/2001/sw/DataAccess/tests/r2#lang-case-insensitive-eq
The rdf-n3 gem currently, and correctly, fails to parse the data file given above, dying at the @EN. Could this be made more accepting, allowing non-lowercase language tags?