Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove built-in N-Triples parser/serializer #31

Closed
artob opened this issue Jun 16, 2016 · 5 comments
Closed

Remove built-in N-Triples parser/serializer #31

artob opened this issue Jun 16, 2016 · 5 comments
Assignees

Comments

@artob
Copy link
Contributor

artob commented Jun 16, 2016

As discussed in #12, @RubenVerborgh and I would propose to remove the buggy built-in N-Triples parser and serializer implementation and rely on Serd instead.

This does mean that Serd would be required in order for the command-line utilities to be of any use (and it might hence make sense to make Serd a required dependency, instead of optional), but that's practically the case right now as well for anything nontrivial, given that the built-in parser is buggy.

Also, Serd is a compact enough library that potentially (if really needed) we could import it into our source tree, at which point there would be no external dependency to install.

Thoughts?

@artob artob self-assigned this Jun 16, 2016
@RubenVerborgh
Copy link
Member

One usability issue: the built-in N-Triples parser would skip invalid lines. Many RDF dumps are invalid, and SERD just stops when this happens (without clear error message at the moment; would be useful to know which line is invalid). This means you'd have to correct the error and restart generation all over again.

That's not a real reason to keep the built-in parser though, because its definition of "invalid" was too narrow: it would happily parse some invalid IRIs, resulting in an invalid HDT file. At least with SERD, we know that the generated HDT file will be valid.

@artob
Copy link
Contributor Author

artob commented Jun 17, 2016

One minor bump in the road of removing the built-in N-Triples parser/serializer is that there are actually call sites in the code base that are hardcoded to use them, instead of cleanly going through a factory function that would encapsulate the multiple parser/serializer implementations.

For example, header parsing in src/header/PlainHeader.cpp is hardcoded to use the built-in parser. I'm working through these now.

@artob
Copy link
Contributor Author

artob commented Jun 17, 2016

This is overall a little bit of a larger task than we might have thought, since the varying reader/writer implementations aren't cleanly abstracted nor dispatched through a registry.

I think what I'll do is lift and adapt existing known-good code (that already abstracts over Serd and Raptor, and more) from our datagraph/librdf++ library (used in Dydra), and replace the I/O layer with that. That library also works on FILE handles, meaning that the transparent gzip (etc) compression code could be made to work with it more readily than with C++ iostreams.

@RubenVerborgh
Copy link
Member

Seems like a good idea, especially given the FILE handles thing.

@RubenVerborgh
Copy link
Member

Removed the built-in parser with bc7a258 and 09a1e6b because people kept hitting this issue (LinkedDataFragments/Client.js#26, LinkedDataFragments/Server.js#37, …).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants