.. index:: reportdatabasedesign
Report Database Design
With the launch of the Mean Time Before Failure (MTBF) and :ref:`topcrashersbyurl-chapter` reports, we have added 8 new database tables. They fall into the following categories:
What relational? Aren't they all?
Taking inspiration from data warehousing, we implement the datastore with dimensional modeling instead of relational modeling. The pattern used is the star schema. Our implementation is very lightweight: we don't automatically generate facts for every combination of dimensions. This is not a Pentaho competitor :)
Star schemas are optimized for:
- read-only systems
- large amounts of data
- data viewed at different levels of granularity
The dimensions and facts are the heart of the pattern.
Each dimension is a property with various attributes and values at different levels of granularity. Example:
urldims - table would have the columns: id, domain, url
- en-us.www.mozilla.com, ALL
- en-us.www.mozilla.com, http://en-us.www.mozilla.com/en-US/firefox/features/
We see a dimension that describes the property "url". This is useful for talking about crashes that happen on a specific URL. We also see two levels of granularity: a specific URL, and all URLs under a domain.
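
The two example rows can be sketched with Python's built-in ``sqlite3`` module. This is only an illustration: the column types, the in-memory database, and the helper ``url_dimension_ids`` are assumptions made for the example, not the production schema.

.. code-block:: python

    import sqlite3

    # In-memory database used purely for illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE urldims ("
        "id INTEGER PRIMARY KEY, domain TEXT NOT NULL, url TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO urldims (domain, url) VALUES (?, ?)",
        [
            # Coarse granularity: every URL under the domain.
            ("en-us.www.mozilla.com", "ALL"),
            # Fine granularity: one specific URL.
            ("en-us.www.mozilla.com",
             "http://en-us.www.mozilla.com/en-US/firefox/features/"),
        ],
    )

    def url_dimension_ids(conn, domain, url="ALL"):
        """Return the urldims ids for a domain at the requested granularity."""
        rows = conn.execute(
            "SELECT id FROM urldims WHERE domain = ? AND url = ?",
            (domain, url),
        ).fetchall()
        return [row[0] for row in rows]

    domain_level = url_dimension_ids(conn, "en-us.www.mozilla.com")
    page_level = url_dimension_ids(
        conn,
        "en-us.www.mozilla.com",
        "http://en-us.www.mozilla.com/en-US/firefox/features/",
    )

Here ``domain_level`` identifies the row that stands for all URLs under the domain, while ``page_level`` identifies the row for the single features page.
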
Dimensions give us ways to slice and dice aggregate crash data, then drill down or roll up this information.
Note: time could be a dimension (and usually is in data warehouses). For MTBF and Top Crashers By URL we don't treat it as a first-class dimension, as there are no requirements to roll it up (say, to Q1 crashes) and having it be a column in the facts table provides better performance.
A given report is powered by a main facts table.
topcrashurlfacts - table would have the columns: id, count, rank, day, productdims_id, urldims_id, signaturedims_id
A top crashers by URL fact has two key elements: an aggregate crash count and its rank relative to other facts. So if we hold the day and the other dimensions static, we can see which facts have the most crashes.
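
As a rough sketch of such a facts table, the snippet below uses Python's built-in ``sqlite3`` module. The column names come from the list above; the column types, the date format, and every id, count, and rank value are hypothetical sample data.

.. code-block:: python

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE topcrashurlfacts (
            id INTEGER PRIMARY KEY,
            "count" INTEGER NOT NULL,          -- aggregate crash count
            "rank" INTEGER NOT NULL,           -- position relative to other facts
            day TEXT NOT NULL,                 -- plain column, not a dimension
            productdims_id INTEGER NOT NULL,
            urldims_id INTEGER NOT NULL,
            signaturedims_id INTEGER NOT NULL
        )
        """
    )
    conn.executemany(
        'INSERT INTO topcrashurlfacts ("count", "rank", day, productdims_id,'
        " urldims_id, signaturedims_id) VALUES (?, ?, ?, ?, ?, ?)",
        [
            (120, 1, "2009-03-01", 1, 10, 1),
            (45, 2, "2009-03-01", 1, 11, 1),
            (9, 3, "2009-03-01", 1, 12, 1),
            (300, 1, "2009-03-02", 1, 11, 1),  # another day, filtered out below
        ],
    )

    # Hold day, product, and signature static; the rank orders the URLs.
    top = conn.execute(
        'SELECT urldims_id, "count" FROM topcrashurlfacts '
        "WHERE day = ? AND productdims_id = ? AND signaturedims_id = ? "
        'ORDER BY "rank" LIMIT 2',
        ("2009-03-01", 1, 1),
    ).fetchall()

Because ``day`` is an ordinary column of the facts table, filtering by date needs no join against a separate time dimension.
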
The general pattern for creating a report is: given a series of static dimensions and one or two variable dimensions, display the facts that meet these criteria.
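
One possible way to express this pattern in code is sketched below with Python's built-in ``sqlite3`` module. The function ``build_report``, the ``DIMENSIONS`` whitelist, and all sample values are hypothetical; the sketch handles a single variable dimension and is not how the actual reports are implemented.

.. code-block:: python

    import sqlite3

    # Dimension columns of the facts table (names taken from the column list).
    DIMENSIONS = {"productdims_id", "urldims_id", "signaturedims_id"}

    def build_report(conn, day, static, variable):
        """Return (variable dimension id, count, rank) rows for one day.

        `static` maps dimension columns to the fixed ids for this report;
        `variable` names the one dimension that is allowed to vary.
        """
        if variable not in DIMENSIONS or not set(static) <= DIMENSIONS - {variable}:
            raise ValueError("unknown or conflicting dimension")
        # Column names come from the whitelist; values are bound as parameters.
        where = ["day = ?"] + [f"{name} = ?" for name in static]
        sql = (
            f'SELECT {variable}, "count", "rank" FROM topcrashurlfacts '
            f'WHERE {" AND ".join(where)} ORDER BY "rank"'
        )
        return conn.execute(sql, [day, *static.values()]).fetchall()

    conn = sqlite3.connect(":memory:")
    conn.execute(
        'CREATE TABLE topcrashurlfacts (id INTEGER PRIMARY KEY, "count" INTEGER,'
        ' "rank" INTEGER, day TEXT, productdims_id INTEGER, urldims_id INTEGER,'
        " signaturedims_id INTEGER)"
    )
    conn.executemany(
        'INSERT INTO topcrashurlfacts ("count", "rank", day, productdims_id,'
        " urldims_id, signaturedims_id) VALUES (?, ?, ?, ?, ?, ?)",
        [
            (120, 1, "2009-03-01", 1, 10, 1),
            (45, 2, "2009-03-01", 1, 11, 1),
            (80, 1, "2009-03-01", 2, 10, 1),   # different product
            (300, 1, "2009-03-02", 1, 11, 1),  # different day
        ],
    )

    # Static: day, product, signature. Variable: url.
    rows = build_report(
        conn,
        "2009-03-01",
        {"productdims_id": 1, "signaturedims_id": 1},
        "urldims_id",
    )

Swapping which dimension is passed as ``variable`` gives a different slice of the same facts without changing the table.
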