
Why Infoboxer in the first place?

Wikipedia has lots of information. Some would say it holds all of the world's "common" knowledge (and others would say it is misleading, incomplete, or broken in every way possible).

For many cases and topics, Wikipedia has better-organized, more up-to-date, and more reliable structured information than many specialized APIs and datasets.

Yet it is not easy to extract this information -- neither from rendered HTML pages nor from raw Wikitext. Or rather, it was not easy -- before Infoboxer. Infoboxer tries to be useful for extracting chunks (even large chunks) of uniform, clean data from whatever structures Wikipedia pages contain.

Infoboxer is neither AI nor a panacea; it is just a tool (a handy one, I hope).
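To give a feel for what this looks like in practice, here is a minimal sketch. It assumes the `Infoboxer.wikipedia` client and the `get`/`infobox`/`fetch` navigation methods shown in the README; the page title 'Argentina' and the variable name 'capital' are just illustrative:

```ruby
require 'infoboxer'

# Fetch and parse a page from English Wikipedia.
page = Infoboxer.wikipedia.get('Argentina')

# The infobox is a parsed template: fetch a variable by name
# and render its contents as plain text.
puts page.infobox.fetch('capital').text
```

One call chain goes from a page title to a clean text value, with no RDF, mappings, or dumps in between.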

So, why not DBPedia?

DBPedia is a great effort to extract data from Wikipedia and store it in structured form: it makes appropriate use of Semantic Web technologies (RDF, SPARQL), interoperates with existing ontologies, and is overall awesome.

But DBPedia is also:

  • incomplete (and the information is lost unrecoverably) -- DBPedia maps only a subset of the properties and areas of Wikipedia pages, and anything left outside that mapping cannot be retrieved through DBPedia at all;
  • ambiguous -- by trying to interweave existing ontologies, languages, and ways of representing the same properties, DBPedia leaves you with several ways to query even basic properties (like "name" or "type"), and any of them can be broken in strange ways for a very similar page;
  • complicated -- to query even the simplest data, you need some understanding of Semantic Web technologies: RDF, triples, namespaces, literal representation, sometimes SPARQL...
  • outdated -- at the time I'm writing this (May 26, 2015), the DBPedia resources accessible online are from the Wikipedia dump of May 02, 2014. Yep, 2014. More than a year old. Enough for some topics, dramatically outdated for others (governments, movies, solar eclipses, births and deaths...); UPD: I've been pointed to live.dbpedia.org, so this point seems to be obsolete.

So, I've tried to implement a simpler, cleaner (and also more up-to-date) way to grab your data.

Still, DBPedia is useful for complex (SPARQL) queries, when you need something like "all Argentinian cities with population greater than X, to the south of Y" -- something the Wikipedia API cannot do.