
Literals delimited with """ are serialized in an invalid format when they end with one or more " #16

Closed
no-reply opened this issue Feb 8, 2019 · 2 comments · Fixed by #17


no-reply commented Feb 8, 2019

The writer outputs `"""`-delimited strings in some cases (e.g. when the string contains a newline). This goes wrong when a literal serialized this way ends in `"`.

Specifically, we produce a literal ending in `""""`. I can't see that the Turtle grammar actually prohibits this, but other parsers (at least Jena and Raptor) have a problem with it. It might be better for us to avoid producing literals in this form either way.

```ruby
require 'rdf'
require 'rdf/turtle'

g = RDF::Graph.new
g << RDF::Statement.new(RDF::URI('http://example.com/moomin'), RDF.value, "has a newline\n and \"ends in a quote\"")

puts g.dump :ttl
# <http://example.com/moomin> <http://www.w3.org/1999/02/22-rdf-syntax-ns#value> """has a newline
# and "ends in a quote"""" .
# => nil
```

There may also be an issue with `'''`-delimited strings, but I don't know of any cases where we produce those by default.


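EDIT: on a closer look, the grammar does explicitly disallow a `"""`-delimited string whose content ends in `"`. In `STRING_LITERAL_LONG_QUOTE`, every character of the body must match `([^"\] | ECHAR | UCHAR)`, optionally preceded by one or two `"`, so the last character before the closing `"""` cannot be an unescaped `"`.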
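To make the rule concrete, here is a rough Ruby transcription of the production `STRING_LITERAL_LONG_QUOTE ::= '"""' (('"' | '""')? ([^"\] | ECHAR | UCHAR))* '"""'`, usable as a quick check on writer output. The constant and method names are hypothetical (not part of rdf-turtle), and this is a sketch for testing, not a full Turtle lexer.

```ruby
# Hypothetical helpers: a regex transcription of Turtle's
# STRING_LITERAL_LONG_QUOTE production, used only to check writer output.
ECHAR = /\\[tbnrf"'\\]/       # escaped character such as \" or \n
UCHAR = /\\u\h{4}|\\U\h{8}/   # \uXXXX or \UXXXXXXXX

# Each step of the body is zero to two bare quotes followed by a non-quote,
# non-backslash character or an escape, so a body can never end in a bare ".
LONG_QUOTE_BODY = /(?:"{0,2}(?:[^"\\]|#{ECHAR}|#{UCHAR}))*/
LONG_QUOTE_LITERAL = /\A"""#{LONG_QUOTE_BODY}"""\z/

def valid_long_quote?(literal)
  LONG_QUOTE_LITERAL.match?(literal)
end

p valid_long_quote?('"""ends in a "quote""""')    # => false: body ends in a bare "
p valid_long_quote?('"""ends in a \"quote\""""')  # => true: both quotes escaped
```

The first call mirrors the shape of the writer output in the report above; the second shows the same content with its quotes escaped.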


no-reply commented Feb 8, 2019

Proposing something along the lines of `string = string.gsub('\\', '\\\\\\\\').gsub('"', '\\"')` (escape backslashes first, then every `"`) at https://github.com/ruby-rdf/rdf-turtle/blob/develop/lib/rdf/turtle/writer.rb#L450
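A minimal sketch of that proposal, wrapped in a hypothetical helper `long_quote_literal` (the writer code at the linked line may be structured differently). Backslashes are escaped before quotes so that the backslashes added for `\"` are not themselves doubled.

```ruby
# Hypothetical helper illustrating the proposal; not the rdf-turtle API.
def long_quote_literal(string)
  # Double every backslash first ('\\\\\\\\' is four backslashes, which gsub
  # reads as two literal ones), then prefix every " with a backslash.
  escaped = string.gsub('\\', '\\\\\\\\').gsub('"', '\\"')
  # With every " escaped, the body can neither contain """ nor end in a bare ".
  %("""#{escaped}""")
end

puts long_quote_literal("has a newline\n and \"ends in a quote\"")
# """has a newline
#  and \"ends in a quote\""""
```

Escaping every quote is more than the grammar strictly requires, but it removes the need to reason about runs of quotes or the final character at all.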


no-reply commented Feb 8, 2019

A related issue arises with strings containing runs of four or more internal `"` whose length is not a multiple of three (e.g. `""""` or `"""""`).

For example:

```ruby
require 'rdf'
require 'rdf/turtle'

graph = RDF::Graph.new
graph << RDF::Statement.new(RDF::URI('http://example.com/moomin'), RDF.value, "has a newline\n and many internal \"\"\"\"\" quotes")

puts graph.dump :ttl
# <http://example.com/moomin> <http://www.w3.org/1999/02/22-rdf-syntax-ns#value> """has a newline
#  and many internal \""""" quotes""" .
```
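The failure pattern can be checked mechanically. This sketch reimplements the prior `.gsub('"""', '\"""')` escaping as a hypothetical `prior_escape` and uses a simplified, hypothetical `closes_early?` test: after blanking out backslash escape pairs, any remaining bare `"""` would end the literal early.

```ruby
# Hypothetical helpers; names are illustrative, not from rdf-turtle.
def prior_escape(string)
  # The prior approach: only exact """ runs are escaped, as \""".
  string.gsub('"""', '\"""')
end

# Simplified check: replace each backslash escape pair with a placeholder,
# then look for a bare """ that would terminate the literal early.
def closes_early?(body)
  body.gsub(/\\./m, '_').include?('"""')
end

broken = (1..7).select { |n| closes_early?(prior_escape('a' + '"' * n + ' b')) }
p broken  # => [4, 5, 7]
```

Runs of 3 and 6 quotes come out safe, while 4, 5 and 7 leave a bare `"""` behind the last escape, matching the `\"""""` output shown above.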

no-reply pushed a commit that referenced this issue Feb 8, 2019
Fixes two bugs associated with `"""` delimited literals.

First: such literals cannot end with a `"` (see `STRING_LITERAL_LONG_QUOTE` at
https://www.w3.org/TR/turtle/#sec-grammar-grammar).

Second: they cannot contain a sequence of three `"`. The prior `.gsub('"""',
'\"""')` approach addresses this for cases where `"` appears in multiples of
three, but fails for other cases (e.g. `""""`).

Both bugs are fixed by escaping all `"` characters.

An alternative addressing only the second bug might be: `.gsub('"""', '\""\"')`,
which would ensure three quotes aren't ever left in a row.

Closes #16.
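A similar sketch for the alternative `.gsub('"""', '\""\"')`, again with hypothetical helper names: it breaks up every run of three quotes, but leaves a trailing `"` untouched, which is why it addresses only the second bug.

```ruby
# Hypothetical helpers; names are illustrative, not from rdf-turtle.
def alt_escape(string)
  # Each """ becomes \""\" : escaped quote, bare quote, escaped quote.
  string.gsub('"""', '\""\"')
end

# Simplified check: replace each backslash escape pair with a placeholder,
# then look for a bare """ that would terminate the literal early.
def closes_early?(body)
  body.gsub(/\\./m, '_').include?('"""')
end

no_early_close = (1..9).none? { |n| closes_early?(alt_escape('a' + '"' * n + ' b')) }
p no_early_close                                  # => true
p alt_escape('ends in a "quote"').end_with?('"')  # => true, so the first bug remains
```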
gkellogg pushed a commit that referenced this issue Feb 8, 2019