Turn your plain old tabular data (POTD) into Web data with web.instata: it takes CSV as input and generates an HTML document with the data items marked up with Schema.org terms.
                       +--------------------+
+-------+              |                    |            +--------------+
|  CSV  |              |                    |            |              |
|-------|              |                    |            |    HTML5     |
|       | +----------->|    web.instata     |+---------> |              |
|       |              |                    |            |  Schema.org  |
|       |              |                    |            |              |
+-------+              |                    |            +--------------+
                       +--------------------+
Note: web.instata only works for CSV files that use Schema.org types or properties as column names.
In order to publish an HTML+microdata document from a CSV file:
python web.instata.py -p {path to CSV file} {base URI for publishing}
Example:
python web.instata.py -p test/potd_0.csv http://example.org/instata/potd_0
... and you should see the following on the command line:
[web.instata] processing [test/potd_0.csv] with base URI [http://example.org/instata/potd_0]
[web.instata] loading DBpedia2Schema.org mapping ...
[web.instata] got DBpedia2Schema.org mapping!
[web.instata] trying to find a match for http://schema.org/Recipe
[web.instata] trying to find a match for http://schema.org/publishDate
[web.instata] trying to find a match for http://schema.org/name
[web.instata] trying to find a match for http://schema.org/author
[web.instata] match(es) found: {'http://schema.org/author': ('http://www.w3.org/2002/07/owl#equivalentProperty', 'http://dbpedia.org/ontology/author')}
[web.instata] result is now available at [output/potd_0.html]
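The "match(es) found" line in the log output above pairs a Schema.org term with a related DBpedia term. As a rough illustration of that kind of lookup (not web.instata's actual code), the sketch below assumes the mapping has already been loaded into a plain dictionary. The function name `find_matches` and the dictionary layout are hypothetical.

```python
# Hypothetical in-memory form of a DBpedia-to-Schema.org mapping:
# Schema.org term -> (relation, DBpedia term). Only the single entry
# that appears in the log output above is included here.
MAPPING = {
    "http://schema.org/author": (
        "http://www.w3.org/2002/07/owl#equivalentProperty",
        "http://dbpedia.org/ontology/author",
    ),
}

SCHEMA_ORG = "http://schema.org/"


def find_matches(column_names, mapping=MAPPING):
    """Return the mapping entries that apply to the given column names."""
    matches = {}
    for name in column_names:
        term = SCHEMA_ORG + name
        if term in mapping:
            matches[term] = mapping[term]
    return matches


matches = find_matches(["Recipe", "publishDate", "name", "author"])
print(matches)
```

With the four column names from the example, only the author column has an entry in this sample mapping, which is consistent with the single match reported in the log.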
Running the example command creates an HTML+microdata document, potd_0.html, in which the data items are marked up with Schema.org terms in microdata as follows:
<table id="instatable">
<thead>
<tr itemscope itemtype="http://purl.org/NET/schema-org-csv#HeaderRow">
<th itemscope itemtype="http://schema.org/Thing" itemid="http://example.org/instata/potd_0#row:1,col:1">Recipe</th>
<th itemscope itemtype="http://schema.org/Thing" itemid="http://example.org/instata/potd_0#row:1,col:2">name</th>
...
</tr>
</thead>
<tbody>
<tr itemscope itemtype="http://schema.org/Recipe" itemid="http://example.org/instata/potd_0#row:2">
<td><a href="http://example.org/instata/potd_0#row:2" itemprop="http://schema.org/url">bb</a></td>
<td itemprop="http://schema.org/name">Mom's World Famous Banana Bread</td>
<td itemprop="http://schema.org/author">John Smith</td>
<td itemprop="http://schema.org/publishDate">May 8, 2009</td>
</tr>
...
</tbody>
</table>
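The data rows in the markup above follow a regular pattern: the first column heading supplies the item type, the row number supplies the item ID, and each remaining heading becomes an itemprop. The following is a minimal sketch of that pattern, assuming the headings and values are already available as lists. The function `render_row` is hypothetical and is not taken from web.instata, which uses templates for its output.

```python
from html import escape

SCHEMA_ORG = "http://schema.org/"


def render_row(base_uri, row_number, headers, values):
    """Render one data row as a <tr> in the style of the markup above."""
    row_uri = escape(f"{base_uri}#row:{row_number}")
    lines = [
        # The first heading is treated as the Schema.org type of the row.
        f'<tr itemscope itemtype="{SCHEMA_ORG}{headers[0]}" itemid="{row_uri}">',
        # The first cell links to the row's own URI.
        f'<td><a href="{row_uri}" itemprop="{SCHEMA_ORG}url">'
        f"{escape(values[0], quote=False)}</a></td>",
    ]
    # Every remaining heading becomes the itemprop of its cell.
    for prop, value in zip(headers[1:], values[1:]):
        lines.append(
            f'<td itemprop="{SCHEMA_ORG}{prop}">{escape(value, quote=False)}</td>'
        )
    lines.append("</tr>")
    return "\n".join(lines)


print(render_row(
    "http://example.org/instata/potd_0",
    2,
    ["Recipe", "name", "author", "publishDate"],
    ["bb", "Mom's World Famous Banana Bread", "John Smith", "May 8, 2009"],
))
```

Cell text is escaped so that characters such as `<` or `&` in the CSV data cannot break the generated table.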
A more flexible but also slightly more complex case is that of using a web.instata configuration file to specify input and output as well as schema matching options. The syntax of the web.instata configuration file is Turtle.
In order to publish an HTML+microdata document from a CSV file using a configuration file:
python web.instata.py -c {path to configuration file}
Example:
python web.instata.py -c web.instata.config
... where a configuration file looks as follows:
@prefix dc: <http://purl.org/dc/terms/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix c: <#> .
c:default-config
# publishing options
c:csv_input "test/potd_0.csv" ;
c:output_base_uri <http://example.org/instata/potd_0> ;
c:schema_matching "dbpedia-2011-07-31.rdf" ;
# directory and file options
c:templates_dir "templates/" ;
c:mappings_dir "mappings/" ;
c:output_dir "output/" ;
c:base_template "base.tpl" ;
c:base_style_file "web.instata-style.css" ;
# metadata about the config file
dc:title "The default configuration for web.instata" ;
dc:modified "2011-08-01"^^xsd:date ;
dc:creator <http://sw-app.org/mic.xhtml#i> ;
.
Note that in the configuration file you can specify one or more schema matchings (via c:schema_matching) and customise the output (via c:base_template and c:base_style_file). The last block (metadata) is included for completeness and is currently not used by web.instata; you may remove it if you want.
In order to check if the input CSV file uses Schema.org terms:
python web.instata.py -v {path to CSV file} {base URI for publishing}
Example:
python web.instata.py -v test/potd_0.csv http://example.org/instata/potd_0
... and you should see the following on the command line:
[web.instata] validating schema ...
[web.instata] all column headings in the input file test/potd_0.csv seem to be valid Schema.org terms :)
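To illustrate what such a check involves, here is a minimal sketch that compares the column headings of a CSV file against a set of known terms. It is not web.instata's implementation: the function `invalid_headings` is hypothetical, the term set is a hard-coded sample containing only the headings from the example above, and the sample CSV text is made up for the demonstration.

```python
import csv
import io

# Assumption: a tiny hard-coded sample standing in for the full list of
# Schema.org terms; it holds only the headings used in the example above.
KNOWN_TERMS = {"Recipe", "name", "author", "publishDate"}


def invalid_headings(csv_text, known_terms=KNOWN_TERMS):
    """Return the column headings that are not in `known_terms`."""
    reader = csv.reader(io.StringIO(csv_text))
    headings = next(reader, [])  # first row holds the column headings
    return [h.strip() for h in headings if h.strip() not in known_terms]


# Hypothetical CSV content, for illustration only.
sample = 'Recipe,name,author,publishDate\nbb,Banana Bread,John Smith,"May 8, 2009"\n'

print(invalid_headings(sample))             # []
print(invalid_headings("Recipe,title\n"))   # ['title']
```

An empty result corresponds to the "seem to be valid" message shown above; any heading returned would need to be renamed to a Schema.org type or property.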
In order to get an RDF/Turtle data dump from a CSV file:
python web.instata.py -d {path to CSV file} {base URI for publishing}
Example:
python web.instata.py -d test/potd_0.csv http://example.org/instata/potd_0
... and you should see something like the following on the command line:
@prefix dc: <http://purl.org/dc/terms/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix scsv: <http://purl.org/NET/schema-org-csv#> .
<http://example.org/instata/potd_0#table> a <http://purl.org/NET/schema-org-csv#Table>;
scsv:row <http://example.org/instata/potd_0#row:1>,
<http://example.org/instata/potd_0#row:2>,
<http://example.org/instata/potd_0#row:3>;
dc:source <http://example.org/instata/potd_0>;
dc:title "potd_0" .
<http://example.org/instata/potd_0#row:1> a <http://purl.org/NET/schema-org-csv#HeaderRow>;
scsv:cell <http://example.org/instata/potd_0#row:1,col:1>,
<http://example.org/instata/potd_0#row:1,col:2>,
<http://example.org/instata/potd_0#row:1,col:3>,
<http://example.org/instata/potd_0#row:1,col:4>;
dc:title "header" .
<http://example.org/instata/potd_0#row:1,col:1> dc:title "Recipe" .
<http://example.org/instata/potd_0#row:2> a <http://purl.org/NET/schema-org-csv#Row>;
scsv:cell <http://example.org/instata/potd_0#row:2,col:1>,
<http://example.org/instata/potd_0#row:2,col:2>,
<http://example.org/instata/potd_0#row:2,col:3>,
<http://example.org/instata/potd_0#row:2,col:4>;
dc:title "row 2" .
<http://example.org/instata/potd_0#row:2,col:1> a <http://schema.org/Recipe>;
rdf:value "bb" .
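The dump above identifies rows and cells with fragment identifiers of the form #row:N and #row:N,col:M appended to the base URI. The sketch below reproduces that naming scheme and the shape of one row description. The helper names `row_uri`, `cell_uri` and `row_to_turtle` are hypothetical and the indentation is a formatting choice; this is an illustration of the pattern, not the code web.instata uses to serialise its output.

```python
SCSV = "http://purl.org/NET/schema-org-csv#"


def row_uri(base_uri, row):
    """URI of a table row, e.g. <base>#row:2."""
    return f"{base_uri}#row:{row}"


def cell_uri(base_uri, row, col):
    """URI of a single cell, e.g. <base>#row:2,col:1."""
    return f"{base_uri}#row:{row},col:{col}"


def row_to_turtle(base_uri, row, n_cols):
    """Describe one row and its cells, mirroring the dump shown above."""
    # Row 1 is the header row in the example dump.
    row_type = "HeaderRow" if row == 1 else "Row"
    title = "header" if row == 1 else f"row {row}"
    cells = ",\n        ".join(
        f"<{cell_uri(base_uri, row, col)}>" for col in range(1, n_cols + 1)
    )
    return (
        f"<{row_uri(base_uri, row)}> a <{SCSV}{row_type}>;\n"
        f"    scsv:cell {cells};\n"
        f'    dc:title "{title}" .'
    )


print(row_to_turtle("http://example.org/instata/potd_0", 2, 4))
```

Because both rows and cells are addressed relative to the base URI, the same identifiers can be used in the HTML output (as itemid values) and in the RDF dump.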
Thanks to asciiflow.com for providing a useful tool.
- DONE: use Bottle as templating system for output
- DONE: use DBpedia2Schema.org mapping to enrich output (related link, etc.)
- Use the JS dump from Schema.RDF.org to check if term exists
- Provide new option -c to check input data
- Provide new option -d to create data dump in RDF
This software is Public Domain.