Support one-to-many binding between a logical dataset and the physical table, view, or query #61
One question I have here is how you think about the differences between things like your cube and the bronze layer. You can't arbitrarily swap these out, because the cube cannot answer the same questions as the raw data. A cube can only answer the pre-aggregates that were built for it, whereas the unaggregated table can answer any aggregation. So I don't think we really want the same logical table for both, do we? I had imagined that we would want different logical tables for each, describing what each can do. For a pre-aggregated table, for example, I imagined two different logical tables linked through some grain-based description. I think the modeling for those aggregation tables is going to be more intricate, because it is not just a field mapping but an operational mapping. e.g. the "mapping" from the bronze table to the aggregation table is more: For a cube it would be a little more intricate, because the LOD is more flexible. However, I think the same physics exists.
In addition to my comment on mapping from unaggregated to aggregated data, I think it is also worth thinking through what we could do by just adding general composability. The loose coupling you are looking for could come from base models that create the core logical mapping, with more specialized models pulling them in and extending them with more specialized functions.
Hey @redblackcoder, we have spun off a composability working group to address this. Are you interested in joining it? I suggest starting by joining the Slack workspace for OSI https://join.slack.com/t/opensemanticx/shared_invite/zt-3pq1j0lid-tQBbEvAngAvz0I0vZm~HJw and pinging [@dianne Wood].
Problem
The current spec couples a dataset (a logical dataset schema) to its underlying physical source in a tight 1:1 relationship. Supporting a variety of underlying physical datasets would therefore require authoring many duplicate datasets and semantic models, which adds governance overhead to keep the semantic models and metrics correct and in sync.
Consider a dataset with high-cardinality fields, kept to support more in-depth analysis in pipelines like Spark, and a pre-aggregated dataset derived from it, such as a cube. Both share the same logical semantic model and can answer the same questions through metric definitions. With the currently proposed 1:1 mapping, however, they would require two completely independent semantic models.
Proposal
Extract the binding between a logical dataset and its physical counterpart into a separate structure. The same semantic model can then be reused, layered over multiple physical datasets, which avoids duplication and provides a single high-quality semantic model that works across varied systems.
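To make the shape of the proposal concrete, here is a minimal sketch in Python of a logical dataset that carries no physical reference, plus a separate binding structure that can exist more than once per dataset. The class names, field names, and source identifiers (`bronze.orders_raw`, `olap.orders_cube`) are all hypothetical illustrations and are not taken from the spec.

```python
from dataclasses import dataclass


@dataclass
class LogicalDataset:
    """Logical schema only; it holds no reference to a physical source."""
    name: str
    fields: tuple[str, ...]


@dataclass
class PhysicalBinding:
    """Binds logical fields of one dataset to columns of one physical source."""
    dataset: str             # name of the LogicalDataset being bound
    source: str              # physical table, view, or query (hypothetical names below)
    columns: dict[str, str]  # logical field -> physical column


def bindings_for(dataset: LogicalDataset,
                 bindings: list[PhysicalBinding]) -> list[PhysicalBinding]:
    """Return every binding that targets the given logical dataset."""
    return [b for b in bindings if b.dataset == dataset.name]


# One logical dataset, authored once (hypothetical fields).
orders = LogicalDataset(
    "orders", ("order_id", "customer_id", "region", "order_date", "amount")
)

# Two physical bindings for the same logical dataset (one-to-many).
bindings = [
    PhysicalBinding("orders", "bronze.orders_raw", {f: f for f in orders.fields}),
    PhysicalBinding("orders", "olap.orders_cube",
                    {"region": "region", "order_date": "order_day",
                     "amount": "total_amount"}),
]

print([b.source for b in bindings_for(orders, bindings)])
```

The point of the sketch is only that the semantic model (`orders`) is defined once, while the list of bindings can grow as new physical systems are added.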
Partial Bindings
The binding, as defined in the modified spec, need not be complete, i.e. it need not bind every logical field in the semantic model. This gives a flexible model in which the semantic model is complete while each underlying physical entity may cover only part of it, through partial bindings.
A semantic model can be described as a whole, without requiring every aspect of it to be present in any one physical system. At runtime, the appropriate bindings can be chosen dynamically based on which aspects of the model are in use.
An example where such loose coupling is useful is the set of dimensions a metric can be sliced and diced by. In a big data pipeline, supporting a high-cardinality dimension is fine; in an OLAP dataset that needs low-latency queries for interactive visualizations, however, such dimensions would be unavailable.
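One way the runtime choice could work is sketched below, assuming a simple rule: pick the first binding, in preference order, that binds every logical field the query uses. The rule itself, the source names, and the field names are assumptions made for illustration; the proposal does not prescribe a selection strategy.

```python
from typing import Iterable, Optional

# Hypothetical partial bindings for one logical dataset, listed in order of
# preference (lowest-latency source first). Each entry maps a physical source
# to the set of logical fields it binds.
BINDINGS: dict[str, set[str]] = {
    "olap.orders_cube": {"region", "order_date", "amount"},
    "bronze.orders_raw": {"order_id", "customer_id", "region",
                          "order_date", "amount"},
}


def select_binding(fields_in_use: Iterable[str],
                   bindings: dict[str, set[str]] = BINDINGS) -> Optional[str]:
    """Return the first source whose binding covers every field in use.

    Returns None when no single binding covers the request.
    """
    needed = set(fields_in_use)
    for source, bound_fields in bindings.items():
        if needed <= bound_fields:  # subset test: all needed fields are bound
            return source
    return None


print(select_binding(["region", "amount"]))       # olap.orders_cube
print(select_binding(["customer_id", "amount"]))  # bronze.orders_raw
print(select_binding(["sku"]))                    # None
```

Here the high-cardinality `customer_id` dimension is bound only in the raw source, so a query that slices by it falls through to the big data pipeline, while a query over low-cardinality dimensions is served by the cube.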