WordNet RDF export

WordNet RDF Framework

The framework consists of the following elements

  • WNRDF.py: The module for converting from the SQLite database to RDF
  • WNRDFWeb.py: The WSGI interface for rendering pages based on RDF data
  • WNFromRDF.py: Convert the RDF data back into SQLite format
  • WNRDFTest.py: Unit tests for the conversion

Other files are

  • build_ontology.py: Generates ontology.rdf from the current state of the database
  • footer and header: The nearly-static content returned at the beginning and end of HTML pages generated by WNRDFWeb
  • index.html: The static welcome page of WNRDFWeb (without header/footer)
  • ontology.rdf: The OWL ontology for WN
  • rdf2html.xsl: XSLT for generating HTML from RDF
  • sparql.html: The static page for the SPARQL query interface
  • sparql2html.xsl: XSLT for generating HTML from SPARQL XML results
  • sparql_load.py: Script for generating database for SPARQL queries
  • wnrdf.css: CSS file for the web interface
  • wordnet_3.1+.db: Please symlink the database here
  • wordnet.nt.gz: All the RDF data
  • flag/*.gif: Flags used to show language

The following files are not necessary to deploy the web interface

  • roundtrip.sh: Test if database can be converted to RDF and reloaded into SQL
  • wn_schema.py: Contains data from some of the small tables (e.g., linktype) as Python dicts
  • write_schema.py: Generates wn_schema.py from the SQLite database
  • write_sql_schema.sh: Generates the header (SQL CREATE commands) necessary to create a new SQLite database from an existing database
  • extra_indexes.sql: Generates extra indexes in the SQLite database to speed up page load time

Deployment

The application can be deployed either as a WSGI application, by adding the following line to the apache2.conf or httpd.conf file:

WSGIScriptAlias /rdf /path/to/WNRDFWeb.py

More details here

Or by starting a standalone server, e.g.,

python WNRDFWeb.py -p 8051 
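
Both deployment modes rely on the same idea: a WSGI callable that a server invokes once per request. The sketch below is not the actual WNRDFWeb.py code. It is a minimal illustration, using only the Python 3 standard library, in which the body of `application` and the `serve` helper are hypothetical. The name `application` is the one mod_wsgi looks for by default, and the port 8051 is taken from the example above.

```python
from wsgiref.simple_server import make_server


def application(environ, start_response):
    """Hypothetical stand-in for the WSGI callable in WNRDFWeb.py."""
    # Echo the requested path; the real interface renders RDF data instead.
    path = environ.get("PATH_INFO", "/")
    body = ("Requested path: %s\n" % path).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def serve(port=8051, host="localhost"):
    """Serve the callable standalone until interrupted (hypothetical helper)."""
    with make_server(host, port, application) as httpd:
        httpd.serve_forever()
```

Calling `serve(8051)` would start a blocking development server on that port, while under Apache the same callable is invoked through the WSGIScriptAlias line.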

Requirements

To run the framework, RDFLib and LXML are required. Both should be part of most Linux distributions, e.g., in Ubuntu/Debian:

apt-get install python-rdflib python-lxml

rdflib-jsonld should also be installed; see https://github.com/RDFLib/rdflib-jsonld
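
A quick way to confirm that the requirements are installed is to ask Python whether each module can be found. This is a sketch assuming a Python 3 interpreter; the `missing_modules` helper is hypothetical, and the import names (`rdflib`, `lxml`, `rdflib_jsonld`) are assumed to correspond to the packages named above.

```python
import importlib.util

# Import names assumed to correspond to RDFLib, LXML and rdflib-jsonld.
REQUIRED_MODULES = ["rdflib", "lxml", "rdflib_jsonld"]


def missing_modules(names):
    """Return the top-level module names that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing modules: " + ", ".join(missing))
else:
    print("All required modules were found.")
```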

Mappings

All mappings are stored in the mapping folder. To run most of the mapping scripts it is necessary to create the mapping database; this can be done as follows:

gunzip wn20-30.csv.gz wn30-31.csv.gz w3c-wn20.csv.gz
sqlite3 mapping.db < mapping.sql
gzip wn20-30.csv wn30-31.csv w3c-wn20.csv
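
The three commands above decompress the CSV files, run mapping.sql, and compress the files again. The same pattern can be sketched in Python 3 as shown below. The helper names are hypothetical, the script is assumed to run inside the mapping folder, and, unlike gunzip/gzip, this variant keeps the .gz files in place and deletes the temporary plain copies afterwards.

```python
import contextlib
import gzip
import os
import shutil
import subprocess

CSV_ARCHIVES = ["wn20-30.csv.gz", "wn30-31.csv.gz", "w3c-wn20.csv.gz"]


@contextlib.contextmanager
def temporarily_gunzipped(gz_paths):
    """Write plain copies next to each .gz file and remove them on exit."""
    plain_paths = []
    try:
        for gz_path in gz_paths:
            plain_path = gz_path[:-len(".gz")]
            plain_paths.append(plain_path)
            with gzip.open(gz_path, "rb") as src, open(plain_path, "wb") as dst:
                shutil.copyfileobj(src, dst)
        yield plain_paths
    finally:
        for plain_path in plain_paths:
            if os.path.exists(plain_path):
                os.remove(plain_path)


def build_mapping_db(db="mapping.db", script="mapping.sql"):
    """Feed mapping.sql to the sqlite3 command-line tool (hypothetical helper)."""
    with temporarily_gunzipped(CSV_ARCHIVES):
        with open(script, "rb") as sql:
            subprocess.run(["sqlite3", db], stdin=sql, check=True)
```

The sqlite3 command-line tool is still used here rather than Python's sqlite3 module, since the contents of mapping.sql are not shown and the script may rely on features specific to that tool.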

The following mappings can be generated and added to the database

All files can be generated either by the appropriate .py script or by running the N-Triples file through WNFromRDF.py (see next section)

Adding Mappings to DB

First, add the extra indexes and tables by running

sqlite3 wordnet_3.1+.db < extra_indexes.sql    

Then, each of the mappings can be added as follows

zcat mapping/omwn.nt.gz | python WNFromRDF.py | sqlite3 wordnet_3.1+.db
zcat mapping/uby.nt.gz | python WNFromRDF.py | sqlite3 wordnet_3.1+.db
zcat mapping/vn.nt.gz | python WNFromRDF.py | sqlite3 wordnet_3.1+.db
zcat mapping/w3c-synsets.nt.gz | python WNFromRDF.py | sqlite3 wordnet_3.1+.db
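
Since the four commands differ only in the mapping name, they can be generated in a loop. The following is a sketch assuming Python 3 for the wrapper; `mapping_pipeline` and `add_all_mappings` are hypothetical helpers, while the file names and the pipeline itself are those shown above.

```python
import shlex
import subprocess

# Mapping names taken from the commands above.
MAPPINGS = ["omwn", "uby", "vn", "w3c-synsets"]


def mapping_pipeline(name, db="wordnet_3.1+.db"):
    """Build the shell pipeline that loads one mapping into the database."""
    source = "mapping/%s.nt.gz" % name
    return "zcat %s | python WNFromRDF.py | sqlite3 %s" % (
        shlex.quote(source), shlex.quote(db))


def add_all_mappings(db="wordnet_3.1+.db"):
    """Run the pipeline for every mapping, stopping at the first failure."""
    for name in MAPPINGS:
        # With shell=True, check=True only sees the exit status of the last
        # command in the pipeline (sqlite3), not of zcat or WNFromRDF.py.
        subprocess.run(mapping_pipeline(name, db), shell=True, check=True)
```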

Generating dumps and enabling SPARQL

The file wordnet.nt.gz should be regenerated each time the database is changed. This is done as follows

python WNRDF.py
gzip wordnet.nt

Once the dump is generated, the SPARQL index in the folder store must be generated as follows

python sparql_load.py
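
One way to decide when the dump needs regenerating is to compare file modification times. This is only a sketch under assumptions: `dump_is_stale` and `regenerate` are hypothetical helpers, the script is assumed to run in the repository folder with Python 3, and modification time is used as a rough proxy for "the database is changed".

```python
import os
import subprocess


def dump_is_stale(db_path="wordnet_3.1+.db", dump_path="wordnet.nt.gz"):
    """Return True if the dump is missing or older than the database."""
    if not os.path.exists(dump_path):
        return True
    return os.path.getmtime(db_path) > os.path.getmtime(dump_path)


def regenerate():
    """Re-run the steps above: RDF dump, compression, SPARQL index."""
    subprocess.run(["python", "WNRDF.py"], check=True)
    # -f overwrites an existing wordnet.nt.gz instead of prompting.
    subprocess.run(["gzip", "-f", "wordnet.nt"], check=True)
    subprocess.run(["python", "sparql_load.py"], check=True)
```

A deployment script could then call `regenerate()` only when `dump_is_stale()` returns True.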