N3/Turtle reader does not allow non-lowercase language tags #2

Closed
bhuga opened this issue Nov 10, 2010 · 12 comments
Comments

@bhuga

bhuga commented Nov 10, 2010

Despite the fact that "abc"@EN is invalid Turtle according to both the N3 and Turtle grammars, uppercase language tags are in widespread use, including in the W3C's own SPARQL tests (what is the emoticon for 'irony'?):

http://www.w3.org/2001/sw/DataAccess/tests/r2#lang-case-insensitive-eq

The rdf-n3 gem currently, correctly, fails to parse the data file given above, dying at the @en. Could this be made more accepting, allowing non-lowercase language tags?

@artob
Member

artob commented Nov 10, 2010

For what it's worth, I made a similar improvement to RDF.rb's N-Triples parser recently, given how frequently mixed-case language tags are encountered in the wild. RDF.rb 0.3.0's parser will now accept language tags in any case, but the serializer emits them in lowercase only.

@gkellogg
Member

The rule is currently [a-z]+ ( "-" [a-z0-9]+ )*; I can easily change this to allow upper case too. I note that RDF::Literal#canonicalize should fix this. The rule is based on RFC3066, which has actually been replaced by RFC4646. I suspect that work out of RDF Next will update this, but for the time being, I'll just relax the parsing to allow both upper- and lower-case language tags.
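To make the proposed relaxation concrete, here is a minimal sketch of the rule quoted above as a Ruby regular expression, next to a case-insensitive variant. The constant names are hypothetical and chosen for illustration only; they are not the names used in rdf-n3, and the actual grammar change in the gem may look different.

```ruby
# Hypothetical constants for illustration; not taken from rdf-n3.

# The rule as quoted in the comment: lowercase letters, then optional
# hyphen-separated subtags made of lowercase letters and digits.
STRICT_LANGTAG = /\A[a-z]+(-[a-z0-9]+)*\z/

# The relaxed form: same shape, but upper-case letters are accepted too.
RELAXED_LANGTAG = /\A[a-zA-Z]+(-[a-zA-Z0-9]+)*\z/

%w[en en-us EN en-US].each do |tag|
  strict  = STRICT_LANGTAG.match?(tag)
  relaxed = RELAXED_LANGTAG.match?(tag)
  puts format("%-6s strict=%-5s relaxed=%s", tag, strict, relaxed)
end
```

Both patterns still reject malformed tags such as a trailing hyphen; only the letter case restriction differs.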

@artob
Member

artob commented Nov 11, 2010

Yes, I just had a look at RDF::Literal#canonicalize in RDF.rb HEAD and we do indeed downcase the language tag when canonicalized. Of course, parsers could choose to do that even earlier, when first reading in the language tag; the N-Triples parser doesn't at present, but we could change that if you think it's appropriate.

In any case, we should probably have both RDF.rb's bundled N-Triples parser and the N3 parser handle language tag case-sensitivity questions consistently if possible, so let me know if something needs changing in the N-Triples reader.

@bhuga
Author

bhuga commented Nov 11, 2010

The particular SPARQL test at issue checks whether the endpoint treats lowercase and uppercase language tags as the same. I have no idea why this decision was made, but it would seem parsers should not canonicalize this.
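One way to read this request is that the reader keeps the tag exactly as written and leaves case-insensitive matching to the comparison step. The sketch below illustrates that idea with a hypothetical `LangString` struct; it is a stand-in for a language-tagged literal and is not RDF::Literal or any other RDF.rb class.

```ruby
# Hypothetical stand-in for a language-tagged literal; not RDF::Literal.
LangString = Struct.new(:value, :language) do
  # True when the lexical values are equal and the language tags match
  # ignoring case. The stored tag itself is never modified.
  def same_language_string?(other)
    value == other.value && language.casecmp?(other.language)
  end
end

lower = LangString.new("xyz", "en")
upper = LangString.new("xyz", "EN")

puts upper.language                      # the tag is preserved as read
puts lower.same_language_string?(upper)  # compared without regard to case
```

With this arrangement the data keeps whatever case the source document used, and only the equality check decides that the two tags name the same language.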

@gkellogg
Member

Fixed in d9726a6

@gkellogg
Member

Note that the fix did include the c14n (canonicalization). The reader canonicalizes all input, which seems to be required to pass other W3C tests. We could add a reader option to control c14n: I would enable it for tests, and leaving it off would allow your usage.

Let me know how it goes after you re-check with the updated gem.

@artob
Member

artob commented Nov 14, 2010

Thanks for implementing this, Gregg.

We should probably define some standard options for all RDF.rb-compatible readers, such as indeed :canonicalize => true || false, but also e.g. :intern => true || false to control whether or not the reader instance will return interned URIs. The former could be false by default and the latter true by default, which (mostly) reflects the current default situation. What do you think?
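As a rough sketch of how these two options and their proposed defaults could behave, consider the toy reader below. `SketchReader` and its methods are hypothetical and exist only for illustration; this is not RDF::Reader's implementation, and URIs are modeled as plain strings rather than RDF.rb URI objects.

```ruby
# Hypothetical reader sketch; not RDF::Reader and not any RDF.rb API.
class SketchReader
  # Defaults as proposed in the comment: canonicalize off, intern on.
  DEFAULTS = { canonicalize: false, intern: true }.freeze

  def initialize(options = {})
    @options = DEFAULTS.merge(options)
    @uris = {} # cache used only when interning is enabled
  end

  def canonicalize?
    @options[:canonicalize]
  end

  def intern?
    @options[:intern]
  end

  # Downcase the tag only when canonicalization was requested.
  def language_tag(tag)
    canonicalize? ? tag.downcase : tag
  end

  # With interning, repeated URIs share one frozen object; without it,
  # every call returns a fresh copy.
  def uri(string)
    intern? ? (@uris[string] ||= string.dup.freeze) : string.dup
  end
end

default_reader = SketchReader.new
strict_reader  = SketchReader.new(:canonicalize => true)

puts default_reader.language_tag("EN")
puts strict_reader.language_tag("EN")
```

Keeping canonicalization off by default preserves the tag case found in the input, which matches the usage requested earlier in this thread, while test suites that need canonical output can opt in.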

@gkellogg
Member

Yes, I think this is the right set. I'll add support for this in my readers. If you add it to RDF::Reader, that would be great.

@artob
Member

artob commented Nov 14, 2010

OK, I'll add and document them shortly.

@gkellogg
Member

I implemented this, plus a :prefixes option, in rdf-n3, and pushed it to GitHub. I'll wait until 0.3.0 issues are resolved across other gems before releasing to RubyGems.

@artob
Member

artob commented Nov 15, 2010

I've now defined and documented five standard options for RDF::Reader.new in:

https://github.com/bendiken/rdf/commit/e7b325b9ffd445781a0390f4aab51d2625f7cd4d

See http://rdf.rubyforge.org/RDF/Reader.html#initialize-instance_method for a readable summary. Not all reader implementations need to necessarily implement all options, but I'll work on implementing these (except for the prefixes, obviously) in RDF.rb's bundled N-Triples parser.

@artob
Member

artob commented Nov 15, 2010

Also, I should mention that I will cut a 0.3.0.pre release today or tomorrow, so that all gems depending on RDF.rb have a chance to get updated before the official 0.3.0 release.

This issue was closed.