v0.3 - Many bug fixes and important performance improvements

Pre-release
@samuell samuell released this 15 Aug 22:31
· 56 commits to master since this release

This release marks the point at which somewhat usable results, with reasonable processing times (< 20 s), have been achieved for datasets of around 0.5M triples.

See the commit history for more details, but some highlights:

  • Don't add duplicate facts or categories
  • Shorten titles to MediaWiki's max
  • Fixed silly code that allocated insane amounts of memory
  • Better RDF parsing error checking
  • Collapse multiple arguments to the same variable into a comma-separated list

The usage is also slightly updated, with a dedicated flag for the out-file:

./rdf2smw --in mydataset.nt --out mydataset.xml
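
For readers unfamiliar with how a Go program might read such flags, the following is a minimal sketch using the standard library's `flag` package, which accepts both `-in` and `--in`. It is an illustration only, not the tool's actual implementation: `parseArgs`, the help strings, and the rule that both flags are required are assumptions.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"io"
	"os"
)

// parseArgs reads the --in and --out flags from args.
// Hypothetical helper; rdf2smw's real flag handling may differ.
func parseArgs(args []string) (inFile, outFile string, err error) {
	fs := flag.NewFlagSet("rdf2smw", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep the flag package's own messages quiet
	fs.StringVar(&inFile, "in", "", "input RDF file")
	fs.StringVar(&outFile, "out", "", "output XML file")
	if err = fs.Parse(args); err != nil {
		return "", "", err
	}
	// Assumption: both flags are mandatory.
	if inFile == "" || outFile == "" {
		return "", "", errors.New("both --in and --out are required")
	}
	return inFile, outFile, nil
}

func main() {
	inFile, outFile, err := parseArgs(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
	fmt.Printf("would convert %s to %s\n", inFile, outFile)
}
```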