Content negotiation in RDF.rb clients #12

Closed
njh opened this Issue May 27, 2010 · 8 comments

njh commented May 27, 2010

http://lists.w3.org/Archives/Public/public-rdf-ruby/2010May/0024.html

Implement content negotiation in RDF.rb clients, ideally with q= values for each of the supported parsers.

I would like to be able to do this:

repo = RDF::Repository.new
repo.load('http://www.bbc.co.uk/programmes/b00jnwlc#programme')
repo.each { |s| s.inspect! }
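
As a rough illustration of the q= idea, the following sketch builds an Accept header from a table of preferences. The `PREFERRED_TYPES` table, its weights, and the `accept_header` helper are hypothetical and are not part of RDF.rb; only the media type names are standard.

```ruby
# Hypothetical preference table: MIME type => q-value (1.0 = most preferred).
# The weights are illustrative only and are not taken from RDF.rb.
PREFERRED_TYPES = {
  'application/rdf+xml' => 1.0,
  'text/turtle'         => 0.9,
  'text/n3'             => 0.8,
  'text/plain'          => 0.5,
  '*/*'                 => 0.1
}.freeze

# Build an HTTP Accept header value, most preferred type first.
# A q-value of 1.0 is the default, so it is omitted from the output.
def accept_header(types = PREFERRED_TYPES)
  types.sort_by { |_, q| -q }.map do |type, q|
    q >= 1.0 ? type : format('%s;q=%.1f', type, q)
  end.join(', ')
end

puts accept_header
```

A client would send this value in the Accept request header and then pick a reader based on the Content-Type of the response.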

njh commented Jun 3, 2010

It would be great if multiple HTTP clients were supported, for example http://github.com/toland/patron

Owner

gkellogg commented Jun 7, 2010

Reader/format classes should implement a simple Regexp test on the content to be parsed, for use when the format is not detected from the extension, the MIME type, or an explicit request. For instance:

input.match(/<html/i) && RDF::RDFa::Format
Owner

bendiken commented Jun 11, 2010

Related issue #24 (with regard to improving the HTTP client functionality).

Owner

gkellogg commented Jun 30, 2010

More from a recent email response to hellekin at cepheide.org:

RDF::Reader.for needs to be somewhat smarter.

1. The symbol case is limited to using an element of the class name (e.g. RDF::RDFXML => :rdfxml). It would be nice to specify alternate symbols (e.g. :rdf). Of course, this can already be done through for(:extension => "rdf").
2. RDF::Reader.open, when loading a remote resource, should use the returned MIME type to match a format, rather than requiring that the format be provided explicitly. Arto seems to be of the opinion that this is done via LinkedData, but it seems a fair thing to do directly in RDF.rb.
3. I believe that Format specifications should also provide a Regexp to match against the beginning of the content (I use the first 1000 bytes in RdfContext). This would be used within RDF::Reader.open in case a format couldn't be found by other means. Consider the following:

    # Heuristically detect the input stream
    def detect_format(stream)
      # Got to look into the file to see
      if stream.respond_to?(:rewind)
        stream.rewind
        string = stream.read(1000)
        stream.rewind
      else
        string = stream.to_s
      end
      case string
      when /<(\w+:)?RDF/   then :rdfxml
      when /<(\w+:)?html/i then :rdfa
      when /@prefix/i      then :n3  # pattern lost in the original post; @prefix is an assumption
      else :ntriples
      end
    end

This could instead be found by looping through available Format subclasses and looking for a #match method. Within RDFXML::Format, I could perform the following:

    class Format < RDF::Format
      MATCH = %r(<(\w+:)?RDF)

      content_type 'text/turtle', :extension => :ttl
      content_type 'text/n3', :extension => :n3
      content_encoding 'utf-8'

      reader { RDF::N3::Reader }
      writer { RDF::N3::Writer }

      def match(content)
        content.to_s.match(MATCH)
      end
    end

In RDF::Reader.open, first look for a reader using the options. Failing that, open the file and look for a MIME type. Failing that, loop through the Format subclasses and see whether one of them matches the string content.

In most cases, this will do what the user expects.
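
The lookup order described above (explicit option, then MIME type, then content sniffing) could be sketched roughly as follows. `FormatSpec`, `FORMATS`, and `resolve_format` are hypothetical stand-ins rather than RDF.rb classes, and the `@prefix` pattern for N3 is an assumption.

```ruby
# Hypothetical stand-in for a format registry; these are NOT RDF.rb classes.
FormatSpec = Struct.new(:name, :content_types, :pattern) do
  # True if the start of the content looks like this format.
  def match?(sample)
    !pattern.nil? && !(sample.to_s =~ pattern).nil?
  end
end

FORMATS = [
  FormatSpec.new(:rdfxml,   ['application/rdf+xml'], /<(\w+:)?RDF/),
  FormatSpec.new(:rdfa,     ['text/html', 'application/xhtml+xml'], /<(\w+:)?html/i),
  FormatSpec.new(:n3,       ['text/n3', 'text/turtle'], /@prefix/), # assumed pattern
  FormatSpec.new(:ntriples, ['text/plain'], nil)                    # no sniffing pattern
].freeze

# Resolve a format: explicit name first, then MIME type, then content sniffing.
def resolve_format(format: nil, content_type: nil, sample: nil)
  by_name = FORMATS.find { |f| f.name == format }
  return by_name if by_name

  if content_type
    # Drop parameters such as "; charset=utf-8" before comparing.
    mime = content_type.split(';').first.to_s.strip.downcase
    by_mime = FORMATS.find { |f| f.content_types.include?(mime) }
    return by_mime if by_mime
  end

  # Only the first 1000 characters are inspected, as in the comment above.
  sample ? FORMATS.find { |f| f.match?(sample[0, 1000]) } : nil
end

found = resolve_format(content_type: 'application/octet-stream',
                       sample: '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">')
puts found.name
```

Here the unhelpful MIME type falls through to the content test, which is the case the Regexp proposal is meant to cover.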

Owner

gkellogg commented Aug 31, 2010

This one looks pretty interesting too:
http://github.com/eric1234/open_uri_db_cache

Owner

gkellogg commented Sep 14, 2010

Recently I needed to revisit this issue in RdfContext to support RDFa 1.1 profiles. Profiles are a mechanism for defining RDF prefixes and terms in a separate document. The spec encourages implementers to cache these vocabularies, for obvious reasons. I implemented this using a ConjunctiveGraph, which is a graph over all quads within a Store (or Repository). When I see a profile, I look for it as a context within the ProfileGraph and, if necessary, download it, parse it, and add it to the store.
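
The look-up-then-load behaviour can be pictured with a small in-memory cache keyed by profile URL. This is only a sketch: `ProfileCache` is a hypothetical class, not the ConjunctiveGraph-based implementation in RdfContext, and the loader block stands in for downloading and parsing.

```ruby
# Hypothetical in-memory cache of parsed profiles, keyed by profile URL.
class ProfileCache
  def initialize
    @profiles = {}
  end

  # Return the cached profile for +url+, or run the block once to load
  # it and remember the result for later calls.
  def fetch(url)
    @profiles.fetch(url) { @profiles[url] = yield(url) }
  end

  def cached?(url)
    @profiles.key?(url)
  end
end

cache = ProfileCache.new
loads = 0
# Stand-in loader: a real one would download and parse the profile document.
loader = lambda do |_url|
  loads += 1
  { 'foaf' => 'http://xmlns.com/foaf/0.1/' }
end

2.times { cache.fetch('http://example.org/profile') { |url| loader.call(url) } }
puts loads
```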

To do this in RDF.rb is difficult, because RDF::Reader.open inverts finding the reader and opening the resource. Ideally, the resource should be opened first so that, for example, the MIME type can be retrieved to perform content negotiation, and the resource can be inspected to see if it is up to date. The following is a potential refactor of RDF::Reader.open that extracts the open and provides the same simple Kernel.open implementation. This makes it easier for another module to override this, or perhaps to register an alternative reader to provide better HTTP semantics.

    module RDF
      class Reader
        def self.open(filename, options = {}, &block)
          resource = URLResource.new(filename)
          reader = self.for(options.slice(:format).merge(:content_type => resource.mime_type))
          reader ||= self.for(filename)
          raise FormatError.new("unknown RDF format: #{options[:format] || filename}") unless reader

          reader.new(resource.io, options, &block)
        end

        class URLResource
          attr_reader :url, :mime_type, :etag, :format
          attr_reader :modified_at, :checked_at

          def initialize(url)
            @file = Kernel.open(url, "r")
          end

          def io; @file; end
        end
      end
    end
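
One way a resource class like this might check whether a cached copy is up to date is a conditional GET. The sketch below only builds the request with Ruby's standard Net::HTTP and does not send it; `conditional_get` is a hypothetical helper, and the `etag` and `modified_at` arguments are assumed to have been remembered from an earlier fetch.

```ruby
require 'net/http'
require 'time'
require 'uri'

# Build (but do not send) a GET request that asks the server to answer
# 304 Not Modified when the cached copy is still current.
# +etag+ and +modified_at+ are hypothetical values kept from an earlier fetch.
def conditional_get(url, accept:, etag: nil, modified_at: nil)
  request = Net::HTTP::Get.new(URI(url))
  request['Accept'] = accept
  request['If-None-Match'] = etag if etag
  request['If-Modified-Since'] = modified_at.httpdate if modified_at
  request
end

request = conditional_get('http://example.org/data',
                          accept: 'application/rdf+xml, text/turtle;q=0.9',
                          etag: '"abc123"',
                          modified_at: Time.utc(2010, 9, 14))
request.each_header { |name, value| puts "#{name}: #{value}" }
```

On a 304 response the cached parse can be reused; on a 200 response the Content-Type header supplies the MIME type for selecting a reader.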

There still remains the question of how best to implement this in RDF::RDFa, but that is a different conversation.

Owner

gkellogg commented Feb 6, 2012

Since 0.3.4, RDF.rb can perform format detection in RDF::Reader.for (or RDF::Format.for with :sample option or a block which returns a sample). Of course, content-negotiation is handled using rack-linkeddata or sinatra-linkeddata (and soon rack-sparql or sinatra-sparql).

@bendiken bendiken pushed a commit that referenced this issue Jan 19, 2013

@gkellogg gkellogg Merge pull request #12 from dwbutler/master
Try running the specs in Ruby 2.0.0 and head
395f1d1

gkellogg closed this Jan 23, 2013
