RDFa Parser for java

Welcome to java-rdfa

The cruftiest RDFa parser in the world, I'll bet. Apologies that there isn't much documentation. Things may explode: you have been warned.

Currently passing all conformance tests for XHTML, and the HTML 4 and 5 tests with one exception.

This was written by Damian Steer. It is an offshoot of the Stars Project, which was funded by JISC.

Useful Links

Basic Use

$ ls
htmlparser-1.2.1.jar    java-rdfa-0.4.jar

$ java -jar java-rdfa-0.4.jar
<> <> <> .

or (equivalent):

$ java -cp '*' rdfa.simpleparse
<> <> <> .

For HTML sources, add the format argument. You will also need the HTML parser jar (htmlparser-1.2.1.jar in the listing above) on the classpath:

$ java -cp '*' rdfa.simpleparse --format HTML
<> <> <> .

The output of simpleparse is N-Triples, which is hard to read. If you have Jena, try adding it to your classpath and using rdfa.parse instead:

$ java -cp '*:/path/to/jena/lib/*' rdfa.parse --format HTML
@prefix dc:      <> .
@prefix hx:      <> .
... nice turtle output ...

Java Use

To use the parser directly, without the assistance of an RDF toolkit (a bold choice), implement a StatementSink to collect the triples, then use ParserFactory to make a reader:

XMLReader reader = ParserFactory.createReaderForFormat(sink, Format.XHTML); // or HTML, still an XMLReader
reader.parse(source); // Your sink will be sent triples
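
As a rough sketch of what a collecting sink might look like: the `TripleSink` interface below is a hypothetical stand-in defined only for this example, and `CollectingSink` is likewise an illustrative name. The real StatementSink in java-rdfa has its own method names and signatures, so check the library source before adapting this. The sketch only shows the general idea of receiving triples through callbacks and storing them as N-Triples-style lines; it assumes every subject, predicate and object is an IRI and does not handle blank nodes, language tags or datatypes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the library's StatementSink. The real interface
// has its own methods; this one exists only to keep the sketch self-contained.
interface TripleSink {
    void addObject(String subject, String predicate, String object);
    void addLiteral(String subject, String predicate, String lexicalForm);
}

// Stores each triple it is sent as a single N-Triples-style line.
class CollectingSink implements TripleSink {
    private final List<String> lines = new ArrayList<>();

    @Override
    public void addObject(String subject, String predicate, String object) {
        // Assumes all three terms are IRIs (no blank nodes).
        lines.add("<" + subject + "> <" + predicate + "> <" + object + "> .");
    }

    @Override
    public void addLiteral(String subject, String predicate, String lexicalForm) {
        // Escape backslashes first, then double quotes, so the literal stays quoted correctly.
        String escaped = lexicalForm.replace("\\", "\\\\").replace("\"", "\\\"");
        lines.add("<" + subject + "> <" + predicate + "> \"" + escaped + "\" .");
    }

    // Read-only view of everything collected so far.
    public List<String> lines() {
        return Collections.unmodifiableList(lines);
    }
}
```

With a sink of this shape, the parser does the work during `reader.parse(source)` and the caller inspects the collected lines afterwards.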

java-rdfa can be used from Jena. Simply invoke:


Which will hook the two readers into Jena, then you will be able to:

..., "XHTML"); // xml parsing
..., "HTML"); // html parsing

java-rdfa is available in the Maven Central repository. Note that it does not depend on Jena.

A Sesame reader provided by Henry Story is also available.

Open Graph Protocol

A very simple OGP reader is provided. This follows what (I think) Toby Inkster did:

    Map<String, String> prop =


    title => 'Kick-Ass' => '326803741017' => '' => ''
    image => ''
    site_name => 'Rotten Tomatoes'
    type => 'movie'
    url => '' => '1106591'
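
To illustrate the general idea of flattening Open Graph triples into a string map, here is a minimal sketch. It is not the library's OGP reader: `OgpSketch` and `toProperties` are hypothetical names, the input is assumed to be simple predicate/value pairs already extracted from a page, and the choices to use only the `http://ogp.me/ns#` namespace, drop other predicates, and keep the first value per key are all assumptions made for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: turns (predicate IRI, value) pairs into a short-name map.
class OgpSketch {
    // Namespace of Open Graph Protocol properties.
    static final String OG_NS = "http://ogp.me/ns#";

    // Each row is {predicate IRI, literal value}.
    static Map<String, String> toProperties(String[][] predicateValuePairs) {
        Map<String, String> props = new LinkedHashMap<>();
        for (String[] pair : predicateValuePairs) {
            String predicate = pair[0];
            if (predicate.startsWith(OG_NS)) {
                // Strip the namespace to get the short key (e.g. "title"),
                // and keep only the first value seen for that key.
                props.putIfAbsent(predicate.substring(OG_NS.length()), pair[1]);
            }
        }
        return props;
    }
}
```

Called with pairs such as `{"http://ogp.me/ns#title", "Kick-Ass"}` and `{"http://ogp.me/ns#type", "movie"}`, this yields a map with the keys `title` and `type`.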

Form Mode

There is a secret form mode (that prompted the development of this parser). In this mode you can generate basic graph patterns by including ?variables where CURIEs are allowed, and INPUT tags generate @name variables.

Simple example (from the tests) and the query that results.



  • (Finally) support overlapping literals. No one noticed this didn't work!
  • Added turtle-ish output. Slightly less nasty than N-Triples.
  • Bug fixes...
  • Turned OFF HTML 5 streaming. Such a bad idea on my part.
  • Started RDFa 1.1 support.
  • Added simple OGP reader.


  • Updated to current conformance tests
  • Switched to streaming mode (may live to regret this).
  • Created very simple n-triple and rdf/xml streaming serialisers.
  • Usual bug fixes etc.
  • Jena is now a provided maven dependency. Using java-rdfa won't pull in jena.
  • Sesame reader created by Henry Story added. It can't be added to the central Maven repository since Sesame isn't available there, so it was spun out into a small module.
  • Tests for query, and some utilities.