Home
The RDF Data Cube vocabulary (QB) is a W3C standard for publishing statistical data on the web using the RDF (Resource Description Framework) standard. The QB4OLAP vocabulary extends QB so that OLAP cubes can be represented in RDF, and so that OLAP operators (such as Roll-up, Slice, and Dice) can be implemented as SPARQL queries directly over this RDF representation.
Business intelligence (BI) comprises a collection of techniques used for extracting and analyzing business data to support decision-making. As part of the BI machinery, On-Line Analytical Processing (OLAP) tools and algorithms allow querying large multidimensional databases called data warehouses (DW).
Since the mid-1990s, DW and BI applications have been built to consolidate enterprise business data, enabling timely and informed decisions based on these data. The availability of enormous amounts of data from different domains is calling for a shift in the way DW and BI practices are carried out. It is becoming clear that the traditional approach, where day-to-day business data produced in an organization are collected, cleansed, and consolidated in a DW for analysis, needs to be revised. We believe that the Semantic Web (SW) will most likely be a scenario where OLAP-style data analysis becomes crucial in the near future.
Therefore, SW technologies will be needed to model, manipulate, and share multidimensional data. This requires a precise vocabulary that adequately represents OLAP data on the SW; over this vocabulary, multidimensional models and OLAP operators can be defined. The Data Cube vocabulary (a W3C standard since January 2014) follows a model initially devised for analyzing statistical data, which does not cover all the needs of a vocabulary oriented to support BI analysis on the SW.
The present work addresses this need. Concretely, we propose the QB4OLAP vocabulary, which adds to QB the capability of representing dimension levels, level members, rollup relations between levels and between level members, and the association of aggregate functions with measures. QB4OLAP is compatible with QB, in the sense that QB4OLAP cube schemas can be built on top of data cube instances (observations) already published using QB. Existing applications, or applications that do not require OLAP-style analysis, can still use the QB schema and instances. Therefore, the cost of adding OLAP capabilities to existing datasets is the cost of building the new schema, in other words, the cost of building the analysis dimensions. Conversely, cubes built with QB4OLAP from scratch can be transformed into QB cubes so that existing applications supporting QB can exploit them. As in the previous case, the cube instances remain unchanged.
Although QB can define the structure of a cube (via the Data Structure Definition), it does not provide a mechanism to represent an OLAP dimension structure (i.e., the dimension levels and the relationships between levels). However, QB allows representing hierarchical relationships between level members in the dimension instances, using the SKOS vocabulary.
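To make this gap concrete, the following Python sketch encodes a few skos:broader links as plain (subject, predicate, object) tuples instead of using an RDF library; the `ex:` identifiers are hypothetical. Following skos:broader recovers the member hierarchy, but nothing in these triples states which dimension level (e.g., district, county, or region) each member belongs to.

```python
# Hypothetical dimension instances as (subject, predicate, object) tuples,
# mimicking how QB data can link members with skos:broader.
SKOS_BROADER = "skos:broader"

TRIPLES = [
    ("ex:districtA", SKOS_BROADER, "ex:countyX"),
    ("ex:districtB", SKOS_BROADER, "ex:countyX"),
    ("ex:districtC", SKOS_BROADER, "ex:countyY"),
    ("ex:countyX", SKOS_BROADER, "ex:regionN"),
    ("ex:countyY", SKOS_BROADER, "ex:regionN"),
]


def broader_of(member, triples):
    """Return the members directly linked from `member` via skos:broader."""
    return [o for (s, p, o) in triples if s == member and p == SKOS_BROADER]


def ancestors(member, triples):
    """Follow skos:broader links transitively (breadth-first).

    Note that the result carries no level information: the data only say
    that one member is broader than another.
    """
    seen, frontier = [], [member]
    while frontier:
        current = frontier.pop(0)
        for parent in broader_of(current, triples):
            if parent not in seen:
                seen.append(parent)
                frontier.append(parent)
    return seen


print(ancestors("ex:districtA", TRIPLES))  # ['ex:countyX', 'ex:regionN']
```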
Also, QB does not provide direct support for OLAP operations. OLAP operations can still be defined over a structure based on QB, although only in a limited way. For example, Roll-Up is not supported, since neither dimension levels nor the aggregate function of each measure are modeled (the latter also prevents support for the Slice operator). The same issues apply to Drill-Down. Finally, Dice is only partially supported by QB, because the first-order (FO) formula σ can only involve cube measures (again, because dimension levels are not supported). It is worth noting that Slices, as defined in QB, represent subsets of observations. Moreover, they are not defined as operators over an existing cube (as in OLAP), but as new structures and new instances (observations) that can be regarded as the result of applying constraints over an existing dataset.
The QB4OLAP vocabulary extends QB to support the following concepts, which are defined in classic multidimensional (MD) models for OLAP but not modeled in QB:
- Dimension structure: the structure of a dimension is defined in terms of levels, which are hierarchically organized through rollup relations.
- Dimension instances: level instances are called level members, and level members from different levels are related to each other. In QB4OLAP, the relationship between level members (from more specific to more general concepts) is modeled using the skos:broader property.
- Aggregate functions: aggregate functions are used to compute aggregated measure values when performing OLAP operations (e.g., Roll-Up).
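The three concepts above can be pictured with a small in-memory model. This is a hypothetical Python sketch, not the QB4OLAP RDF encoding: the `Dimension` class, its fields, the `ex:` members, and the `maxHouseholdSize` measure are assumptions made for illustration. It keeps the level structure separate from the member links and flags any skos:broader link whose levels do not follow a declared rollup relation.

```python
from dataclasses import dataclass, field


@dataclass
class Dimension:
    """Hypothetical in-memory model of a QB4OLAP-style dimension."""
    name: str
    # Dimension structure: the rollup relation between levels.
    level_rollups: dict                                  # child level -> parent level
    # Dimension instances: level members and their skos:broader links.
    member_level: dict = field(default_factory=dict)     # member -> level
    broader: dict = field(default_factory=dict)          # member -> parent member

    def inconsistent_links(self):
        """Return member links whose levels do not follow a level rollup."""
        bad = []
        for child, parent in self.broader.items():
            expected_level = self.level_rollups.get(self.member_level[child])
            if self.member_level.get(parent) != expected_level:
                bad.append((child, parent))
        return bad


geo = Dimension(
    name="geography",
    level_rollups={"district": "county", "county": "region"},
    member_level={"ex:districtA": "district", "ex:countyX": "county",
                  "ex:regionN": "region"},
    broader={"ex:districtA": "ex:countyX", "ex:countyX": "ex:regionN"},
)

# Aggregate functions: each measure is associated with the function used
# when its values are summarized (e.g., during a Roll-Up).
measure_aggregates = {"households": sum, "maxHouseholdSize": max}

print(geo.inconsistent_links())                    # []
print(measure_aggregates["households"]([10, 20]))  # 30
```

Keeping the level rollups explicit is what makes such a check possible at all; with member links alone there is no level information to validate against.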
In OLAP, data are organized as hypercubes whose axes are called dimensions. Each point in this multidimensional space is mapped into one or more spaces of measures, representing facts that are analyzed along the cube’s dimensions. Dimensions are structured in hierarchies that allow analysis at different aggregation levels. The actual values in a dimension level are called members, which can also have properties or attributes. Members in a dimension level must have a corresponding member in the upper level in the hierarchy, and this correspondence is defined through so-called rollup functions.
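As an illustration of rollup functions, the Python sketch below uses a hypothetical three-level geography hierarchy with made-up member names. `is_total` checks the requirement that every member of a level has a corresponding member in the level above, and `roll_member` composes the rollup functions to map a member to any higher level of the hierarchy.

```python
# Hypothetical hierarchy (bottom to top) and its rollup functions: each maps
# a member of a lower level to exactly one member of the level above.
HIERARCHY = ["district", "county", "region"]
ROLLUP = {
    ("district", "county"): {"d1": "c1", "d2": "c1", "d3": "c2"},
    ("county", "region"): {"c1": "r1", "c2": "r1"},
}
MEMBERS = {"district": ["d1", "d2", "d3"], "county": ["c1", "c2"], "region": ["r1"]}


def is_total(lower, upper):
    """Check that every member of `lower` has a corresponding `upper` member."""
    mapping = ROLLUP[(lower, upper)]
    return all(m in mapping and mapping[m] in MEMBERS[upper] for m in MEMBERS[lower])


def roll_member(member, from_level, to_level):
    """Compose the rollup functions from `from_level` up to `to_level`."""
    start, end = HIERARCHY.index(from_level), HIERARCHY.index(to_level)
    if end < start:
        raise ValueError("to_level must not be below from_level")
    for lower, upper in zip(HIERARCHY[start:end], HIERARCHY[start + 1:end + 1]):
        member = ROLLUP[(lower, upper)][member]
    return member


print(is_total("district", "county"))           # True
print(roll_member("d3", "district", "region"))  # r1
```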
A well-known set of operations is defined over cubes. We present some of them next; they are based on the recently proposed Cube Algebra (CA).
Roll-Up summarizes data at a higher level of a dimension hierarchy. It receives a cube $C$, a dimension $D \in C$, a dimension level $l_u \in D$ such that $l_l \rightarrow^{*} l_u$ (where $l_l$ is the level of $D$ at which $C$ is currently expressed, and $\rightarrow^{*}$ means that $l_u$ can be reached from $l_l$ by following rollup relations), and a set of aggregate functions $f$. $\text{Roll-Up}(C, D, l_u, f)$ returns a new cube $C'$ where measures are aggregated along $D$ up to the level $l_u$. Analogously, Drill-Down disaggregates previously summarized data and can be considered the inverse of Roll-Up. Note that this requires storing the aggregation path.
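A minimal Python sketch of Roll-Up over an in-memory cube follows. It assumes a toy representation (cells keyed by one member per dimension, with made-up household counts) and hypothetical hierarchies; the names `roll_up`, `HIERARCHIES`, and `ROLLUPS` are illustrative only and do not correspond to the SPARQL implementation of QB4OLAP. The check on level positions plays the role of the condition that the target level be reachable from the current level.

```python
from collections import defaultdict

# Toy cube (hypothetical data): dimension order, current level per dimension,
# and cells keyed by one member per dimension.
DIMS = ["geo", "year"]
LEVELS = {"geo": "district", "year": "year"}
CELLS = {
    ("d1", "2011"): {"households": 100},
    ("d2", "2011"): {"households": 50},
    ("d3", "2011"): {"households": 70},
    ("d1", "2021"): {"households": 110},
}
# Hypothetical hierarchies (bottom to top) and rollup functions per level.
HIERARCHIES = {"geo": ["district", "county", "region"], "year": ["year"]}
ROLLUPS = {"geo": {"district": {"d1": "c1", "d2": "c1", "d3": "c2"},
                   "county": {"c1": "r1", "c2": "r1"}}}
AGG = {"households": sum}  # aggregate function associated with each measure


def roll_up(cells, levels, dim, target, agg):
    """Aggregate the measures of `cells` along `dim` up to level `target`."""
    hierarchy = HIERARCHIES[dim]
    start, end = hierarchy.index(levels[dim]), hierarchy.index(target)
    if end < start:
        raise ValueError("target level is not above the current level")
    i = DIMS.index(dim)
    groups = defaultdict(lambda: defaultdict(list))
    for coords, measures in cells.items():
        member = coords[i]
        for level in hierarchy[start:end]:  # apply each rollup function in turn
            member = ROLLUPS[dim][level][member]
        key = coords[:i] + (member,) + coords[i + 1:]
        for name, value in measures.items():
            groups[key][name].append(value)
    new_cells = {k: {m: agg[m](vs) for m, vs in ms.items()} for k, ms in groups.items()}
    return new_cells, {**levels, dim: target}


by_county, county_levels = roll_up(CELLS, LEVELS, "geo", "county", AGG)
print(by_county)
# {('c1', '2011'): {'households': 150}, ('c2', '2011'): {'households': 70},
#  ('c1', '2021'): {'households': 110}}
```

Because the sketch returns the new current level of each dimension, the result can be rolled up again (e.g., from county to region); Drill-Down, by contrast, would need the original cells or the aggregation path.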
Slice receives a cube C and a dimension D ∈ C, and removes D from the cube. Before the dimension is removed, measure values are aggregated along D up to the level 'All', using the aggregate function associated with each measure.
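Using a toy cube (hypothetical dimensions and made-up values), Slice can be sketched in Python as aggregating every member of the removed dimension into a single group, which is what rolling up to 'All' amounts to, and then dropping that coordinate. The function is named `slice_cube` only to avoid shadowing Python's built-in `slice`.

```python
from collections import defaultdict

# Toy cube (hypothetical data): dimension names and cells keyed by one
# member per dimension.
DIMS = ["geo", "year"]
CELLS = {
    ("d1", "2011"): {"households": 100},
    ("d2", "2011"): {"households": 50},
    ("d3", "2011"): {"households": 70},
    ("d1", "2021"): {"households": 110},
}
AGG = {"households": sum}  # aggregate function associated with each measure


def slice_cube(dims, cells, dim, agg):
    """Remove `dim`, aggregating its members up to 'All' first."""
    i = dims.index(dim)
    groups = defaultdict(lambda: defaultdict(list))
    for coords, measures in cells.items():
        # At level 'All' every member of `dim` collapses into one value,
        # so dropping the coordinate groups the cells accordingly.
        key = coords[:i] + coords[i + 1:]
        for name, value in measures.items():
            groups[key][name].append(value)
    new_cells = {k: {m: agg[m](vs) for m, vs in ms.items()} for k, ms in groups.items()}
    return dims[:i] + dims[i + 1:], new_cells


print(slice_cube(DIMS, CELLS, "geo", AGG))
# (['year'], {('2011',): {'households': 220}, ('2021',): {'households': 110}})
```

Note the contrast with QB, where a slice selects a subset of observations by fixing dimension values instead of aggregating them.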
Dice receives a cube C and a first-order formula σ over measures and levels in C, and returns a new cube C′ containing only the elements of C that satisfy σ.
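In the Python sketch below, a predicate function stands in for the first-order formula σ; it can refer both to dimension members and to measures. The cube representation and values are hypothetical. Unlike Slice, Dice leaves the cube schema unchanged and only filters cells.

```python
# Toy cube (hypothetical data): dimension names and cells keyed by one
# member per dimension.
DIMS = ["geo", "year"]
CELLS = {
    ("d1", "2011"): {"households": 100},
    ("d2", "2011"): {"households": 50},
    ("d3", "2011"): {"households": 70},
    ("d1", "2021"): {"households": 110},
}


def dice(dims, cells, sigma):
    """Return the cells whose members and measures satisfy `sigma`."""
    result = {}
    for coords, measures in cells.items():
        members = dict(zip(dims, coords))  # e.g. {"geo": "d1", "year": "2011"}
        if sigma(members, measures):
            result[coords] = measures
    return result


# sigma: year = 2011 AND households > 60 (a condition on a level and a measure)
diced = dice(DIMS, CELLS, lambda mem, ms: mem["year"] == "2011" and ms["households"] > 60)
print(diced)
# {('d1', '2011'): {'households': 100}, ('d3', '2011'): {'households': 70}}
```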
C. Ciferri, R. Ciferri, L. I. Gómez, M. Schneider, A. A. Vaisman, and E. Zimányi. Cube Algebra: A Generic User-Centric Model and Query Language for OLAP Cubes. IJDWM 9(2): 39-65 (2013).
Household projections by district, England, 1991-2021. URL: http://opendatacommunities.org/data/households/projections/totalhouseolds
Linked Data. URL: http://linkeddata.org/
Ordnance Survey Administrative Geography Ontology v1. URL: http://www.ordnancesurvey.co.uk/ontology/admingeo.owl
Richard Cyganiak, Dave Reynolds. The RDF Data Cube Vocabulary. W3C Recommendation. URL: http://www.w3.org/TR/vocab-data-cube
Frank Manola, Eric Miller. RDF Primer. W3C Recommendation, 10 February 2004. URL: http://www.w3.org/TR/2004/REC-rdf-primer-20040210/
Antoine Isaac, Ed Summers. SKOS Simple Knowledge Organization System Primer. W3C Note, 18 August 2009. URL: http://www.w3.org/TR/2009/NOTE-skos-primer-20090818/
Eric Prud'hommeaux, Gavin Carothers. Turtle: Terse RDF Triple Language. W3C Working Draft, 09 August 2011. URL: http://www.w3.org/TR/2011/WD-turtle-20110809/