This repository has been archived by the owner on Mar 17, 2018. It is now read-only.

Non-validating tokenizer / parser for the RDF N-Triples and N-Quads serializations (or any “N-x”)

j13z/rdf-nx-parser

rdf-nx-parser

A non-validating tokenizer and parser for the RDF N-Triples and N-Quads serializations (or any “N-x”).

Provides parsing of N-Triples and N-Quads from strings, or tokenizing any “N-x” string.

Why?

Faster parsers already exist (see the last section), but a parser for Node.js is useful for building smaller tools.

Usage

npm install --save rdf-nx-parser

The module exports a parser object:

var parser = require('rdf-nx-parser');

Parsing

Use parseTriple() to parse an N-Triples statement and parseQuad() for N-Quads. Both return an object, or null if the input can't be parsed.

var quad = parser.parseQuad(
    '_:foo ' + 
    '<http://example.com/bar> ' + 
    '"\\u9B3C\\u8ECA"@jp ' + 
    '<http://example.com/baz> .'
);

console.log(JSON.stringify(quad, null, 4));

{
    "subject": {
        "type": "blankNode",
        "value": "foo"
    },
    "predicate": {
        "type": "iri",
        "value": "http://example.com/bar"
    },
    "object": {
        "type": "literal",
        "value": "鬼車",
        "language": "jp"
    },
    "graphLabel": {
        "type": "iri",
        "value": "http://example.com/baz"
    }
}

Literal objects can have an additional language or datatypeIri property.
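
As an illustration of how these term objects can be consumed, here is a minimal sketch that turns a parsed term back into N-x syntax. `formatTerm` is a hypothetical helper, not part of this module, and its escaping covers only backslashes, double quotes and line breaks.

```javascript
// Hypothetical helper (not part of rdf-nx-parser): serialize a term
// object of the shape shown above back into N-x syntax.
function formatTerm(term) {
    if (term.type === 'iri') {
        return '<' + term.value + '>';
    }
    if (term.type === 'blankNode') {
        return '_:' + term.value;
    }
    if (term.type === 'literal') {
        // Escape only the characters that would break the quoted string.
        var escaped = term.value
            .replace(/\\/g, '\\\\')
            .replace(/"/g, '\\"')
            .replace(/\n/g, '\\n')
            .replace(/\r/g, '\\r');
        var result = '"' + escaped + '"';
        if (term.language) {
            result += '@' + term.language;
        } else if (term.datatypeIri) {
            result += '^^<' + term.datatypeIri + '>';
        }
        return result;
    }
    throw new Error('Unsupported term type: ' + term.type);
}
```

Applied to the `object` of the quad above, this yields the literal followed by its `@jp` language tag; a literal with a `datatypeIri` gets a `^^<...>` suffix instead.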

The parser does not verify that the data adheres to the grammar. Instead, it parses whatever it is given as well as it can:

> parser.parseQuad('<foo> <:///baz>     "bar"  <$!#]&> .');

{ subject: { type: 'iri', value: 'foo' },
  predicate: { type: 'iri', value: ':///baz' },
  object: { type: 'literal', value: 'bar' },
  graphLabel: { type: 'iri', value: '$!#]&' } }

You can optionally pass an options object to these methods as a second parameter, shown with the defaults here:

parser.parseTriple(input, {
    // Set to `true` to get unparsed strings as `value` properties
    asString: false,

    // Include the unparsed token as `valueRaw` property
    // when returning objects
    includeRaw: false,

    // Decode unicode escapes, `\uxxxx` and `\Uxxxxxxxx`
    // (but not percent encoding or punycode)
    unescapeUnicode: true
});
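
To make the `unescapeUnicode` option more concrete, the following sketch shows one way such decoding could be done. It is not the module's actual code: `decodeUnicodeEscapes` is a hypothetical name, and it relies on `String.fromCodePoint`, which requires a newer runtime than Node.js 0.10.

```javascript
// Hypothetical sketch: decode `\uxxxx` and `\Uxxxxxxxx` escapes.
// Percent encoding and punycode are deliberately left untouched.
function decodeUnicodeEscapes(input) {
    var pattern = /\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})/g;
    return input.replace(pattern, function (match, short, long) {
        // Exactly one of the two capture groups is defined per match.
        var codePoint = parseInt(short || long, 16);
        // Throws a RangeError for values above 0x10FFFF.
        return String.fromCodePoint(codePoint);
    });
}

console.log(decodeUnicodeEscapes('\\u9B3C\\u8ECA')); // prints "鬼車"
```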

A whole file of N-Triples / N-Quads lines can be parsed line by line, e.g. with Node's readline module; see the example.
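
A line-by-line approach with readline could look roughly like the sketch below. `parseLine` and `parseFile` are hypothetical helpers, not part of this module. Skipping blank lines and lines starting with `#` is an assumption made here, and stream error handling is omitted for brevity.

```javascript
var fs = require('fs');
var readline = require('readline');

// Parse one line with the given parser object. Returns null for blank
// lines, comment lines (assumed to start with '#'), and lines the
// parser cannot handle.
function parseLine(parser, line) {
    var trimmed = line.trim();
    if (trimmed === '' || trimmed.charAt(0) === '#') {
        return null;
    }
    return parser.parseQuad(trimmed);
}

// Read `path` line by line and call `done(quads)` with every statement
// that could be parsed. `parser` is the object exported by the module.
function parseFile(parser, path, done) {
    var quads = [];
    var rl = readline.createInterface({
        input: fs.createReadStream(path)
    });
    rl.on('line', function (line) {
        var quad = parseLine(parser, line);
        if (quad !== null) {
            quads.push(quad);
        }
    });
    rl.on('close', function () {
        done(quads);
    });
}
```

A call might look like `parseFile(require('rdf-nx-parser'), 'data.nq', function (quads) { console.log(quads.length); });`, where `data.nq` is a placeholder file name.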

Tokenization

An arbitrary number of “N-x” tokens can be extracted from a string into an array of token objects with the tokenize() method:

> parser.tokenize(
    '<foo> _:bar . "123"^^<http://example.com/int> ' +
    '"\u0068\u0065\u006C\u006C\u006F"@en-US . .'
);

[ { type: 'iri', value: 'foo' },
  { type: 'blankNode', value: 'bar' },
  { type: 'endOfStatement', value: '.' },
  { type: 'literal',
    value: '123',
    datatypeIri: 'http://example.com/int' },
  { type: 'literal',
    value: 'hello',
    language: 'en-US' },
  { type: 'endOfStatement', value: '.' },
  { type: 'endOfStatement', value: '.' } ]

Each token has at least a type and a value property. There are four token types: iri, literal, blankNode and endOfStatement (can be listed with the getTokenTypes() method).
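
Because `endOfStatement` tokens mark statement boundaries, a flat token array can be regrouped into statements. The sketch below is one way to do that. `splitStatements` is a hypothetical helper, not part of this module, and dropping empty groups (such as the stray `.` at the end of the example above) is a choice made here.

```javascript
// Hypothetical helper: group a flat array of token objects into
// statements, using `endOfStatement` tokens as separators.
function splitStatements(tokens) {
    var statements = [];
    var current = [];
    tokens.forEach(function (token) {
        if (token.type === 'endOfStatement') {
            // Ignore terminators that do not close any terms.
            if (current.length > 0) {
                statements.push(current);
            }
            current = [];
        } else {
            current.push(token);
        }
    });
    // Keep trailing terms that were not closed by a '.'.
    if (current.length > 0) {
        statements.push(current);
    }
    return statements;
}
```

For the token array shown above, this returns two groups: the `foo` and `bar` terms, then the two literals.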

Implementation

The implementation uses regular expressions to split the input into tokens, which is fast on V8. It is faster than an earlier, simple state machine that read the input in a single scan, presumably because V8 compiles regexes into efficient machine code.
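
To illustrate the general idea of regex-based tokenization, here is a simplified sketch. It is not the expression this module actually uses: `TOKEN_PATTERN` and `tokenizeSketch` are hypothetical names, the pattern is far more permissive than the grammar, and no unicode unescaping is done.

```javascript
// One alternative per token type:
//   1: IRI, 2: blank node label, 3: literal value,
//   4: language tag, 5: datatype IRI, 6: end of statement
var TOKEN_PATTERN = /<([^>]*)>|_:([^\s]+)|"((?:[^"\\]|\\.)*)"(?:@([A-Za-z0-9-]+)|\^\^<([^>]*)>)?|(\.)/g;

function tokenizeSketch(input) {
    var tokens = [];
    var match;
    TOKEN_PATTERN.lastIndex = 0; // reset state of the global regex
    while ((match = TOKEN_PATTERN.exec(input)) !== null) {
        if (match[1] !== undefined) {
            tokens.push({ type: 'iri', value: match[1] });
        } else if (match[2] !== undefined) {
            tokens.push({ type: 'blankNode', value: match[2] });
        } else if (match[3] !== undefined) {
            var literal = { type: 'literal', value: match[3] };
            if (match[4] !== undefined) {
                literal.language = match[4];
            }
            if (match[5] !== undefined) {
                literal.datatypeIri = match[5];
            }
            tokens.push(literal);
        } else {
            tokens.push({ type: 'endOfStatement', value: '.' });
        }
    }
    return tokens;
}
```

Anything between matches, such as whitespace or unrecognized characters, is skipped, which mirrors the non-validating behaviour described earlier.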

Node.js version support

Works with Node.js 0.10 and higher.

Tests

Run the tests with npm test (uses Mocha, Chai and Istanbul).

Similar projects
