valid for Ontop Version 1
NOTE: Please contact the authors if you would like to use this sample scenario in your publication.
The IMDB Movie Ontology is a sample scenario that uses a controlled vocabulary to semantically describe the movie domain in an OBDA system. The scenario uses IMDB data as its data source, which contains comprehensive movie information such as title, genre, director and actor, e.g., "Finding Nemo", "Animation", "Andrew Stanton" and "Alexander Gould".
- movieontology.owl: the movie ontology file
- dbpedia_3.7.owl: an additional ontology to complement the movie ontology.
- movieontology.obda: the mapping file
- movieontology.q: the query file
- the SQL script to generate the full IMDB database (requires a Postgres database)
- imdbpg.sql: the SQL script to generate the schema-only IMDB database (requires Postgres; it creates the tables but contains no data). Use it if you don't have space to restore the full IMDB dataset.
The ontology was developed by the Department of Informatics at the University of Zurich. It contains concept hierarchies for movie categorization that enable user-friendly presentation of movie descriptions at the appropriate level of detail.
There are several additions to the ontology terminology due to the requirements in the test cases:
- Concept: :TVSeries and :Actress
- Data property: dbpedia:productionStartYear, dbpedia:budget, dbpedia:gross, :countryCode, :companyName
- Object property: :hasCompanyLocation
- Replaced xsd:gYear with xsd:int
- Replaced xsd:date with xsd:dateTime
IMDB's data is provided as text files, which need to be converted into an SQL file using a third-party tool. Our IMDB raw data was downloaded in 2010, and the SQL script that you can download from this site was generated using IMDbPY. The tool generates an SQL schema (tables) appropriate for storing IMDB data and then reads the IMDB plain text data files to generate the SQL INSERT commands that populate the tables. It can generate Postgres, MySQL and DB2 SQL scripts. On this site you can download a Postgres-compatible script.
Please find the final SQL script, as well as the import instructions, on our site http://obdavm.inf.unibz.it/mo/.
As noted above, our copy is from 2010. You may regenerate the SQL scripts using current data; however, please be aware that the latest version of IMDbPY generates slightly different tables, and those differences may make some of our current mappings invalid. An update to the mappings may be required. If you make one, please let us know and we will update the files and acknowledge your contribution.
Copyright: Please refer to the copyright/license information for instructions on allowed usage. As stated on the IMDB website, the data is NOT FREE, but it can be used for personal and non-commercial use. Information courtesy of The Internet Movie Database (http://www.imdb.com). Used with permission.
The mappings for this scenario associate the data in the SQL database with the movie ontology's vocabulary. They are "natural" mappings in that their only purpose is to make the data queryable through the ontology; they were not designed to highlight the benefits of any algorithm or technique used in Quest.
The first version of the mappings for this scenario was developed by UNIBZ students as part of a lab assignment. The current mappings are an improved version of those, created by our development team. Both the mapping and the ontology files can be loaded using Protégé and the -ontopPro- plugin.
The total number of RDF triples (i.e. ABox assertions) coming from the dataset (the database together with the mappings) is 42,495,953 triples.
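To make the idea of "database together with the mappings" concrete, here is a minimal sketch of how a mapping turns SQL rows into RDF triples. It is not the syntax of movieontology.obda or the way Ontop processes mappings. The `MAPPINGS` structure, the `virtual_triples` and `demo` helpers, the `http://example.org/` IRIs, the `:Movie` and `:title` terms and the two-column `title` table are all assumptions made for illustration, and an in-memory SQLite database stands in for Postgres.

```python
import sqlite3

# Hypothetical mapping: a SQL source query paired with triple templates.
# Placeholders in braces are filled with column values from each row.
MAPPINGS = [
    {
        "source": "SELECT id, title FROM title",
        "target": [
            ("<http://example.org/movie/{id}>", "rdf:type", ":Movie"),
            ("<http://example.org/movie/{id}>", ":title", '"{title}"'),
        ],
    },
]


def virtual_triples(conn, mappings):
    """Yield the triples obtained by applying each mapping to the database."""
    conn.row_factory = sqlite3.Row  # lets us address columns by name
    for mapping in mappings:
        for row in conn.execute(mapping["source"]):
            values = dict(row)
            for s, p, o in mapping["target"]:
                yield (s.format(**values), p, o.format(**values))


def demo():
    # Assumed toy schema; the real IMDB schema has many more tables and columns.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE title (id INTEGER PRIMARY KEY, title TEXT)")
    conn.execute("INSERT INTO title VALUES (1, 'Finding Nemo')")
    return list(virtual_triples(conn, MAPPINGS))


if __name__ == "__main__":
    for triple in demo():
        print(triple)
```

Each row of the source query contributes one triple per template, which is how a modest number of mappings over a large database can yield tens of millions of triples.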
We included around 40 queries in this scenario. They are in the file movieontology.q and can be used to explore the data set. The queries range in complexity from very simple to fairly complex. In particular, the last 5 queries involve joins on very large datasets.
Note that some form of inference (beyond simple query evaluation) is involved in most of these queries; in particular, hierarchies are often involved.
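The effect of hierarchy-based inference on query answers can be sketched as follows. This only illustrates what answers such a query should return, not how Quest computes them. The class hierarchy shown (`:Actress` under `:Actor` under `:Person`), the individuals and the helper functions `subclasses_of` and `instances_of` are assumptions made for the example and are not taken from movieontology.owl.

```python
# Hypothetical fragment of a class hierarchy (child -> parent).
SUBCLASS_OF = {
    ":Actress": ":Actor",
    ":Actor": ":Person",
}

# Hypothetical type assertions, as they might come out of the mappings.
ASSERTED_TYPES = [
    (":person/1", ":Actress"),
    (":person/2", ":Actor"),
    (":person/3", ":Person"),
]


def subclasses_of(cls, hierarchy):
    """Return cls together with all of its direct and indirect subclasses."""
    result = {cls}
    changed = True
    while changed:  # repeat until no new subclass is discovered
        changed = False
        for child, parent in hierarchy.items():
            if parent in result and child not in result:
                result.add(child)
                changed = True
    return result


def instances_of(cls, hierarchy, assertions):
    """Individuals typed with cls or any of its subclasses, in input order."""
    wanted = subclasses_of(cls, hierarchy)
    return [ind for ind, asserted in assertions if asserted in wanted]


if __name__ == "__main__":
    print(instances_of(":Actor", SUBCLASS_OF, ASSERTED_TYPES))
```

Without the hierarchy, a query for `:Actor` would match only the individual asserted as `:Actor`; with it, the individual asserted as `:Actress` is returned as well.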
Performance measures can be found here.