Enhancement | Physical Model Generation
NOTE: This issue is a work in progress
The physical model design produced by Wire and Claude for the HR entities would have allowed development to start sooner, but it would have led to a subsequent refactor in dbt, the semantic layer, and the downstream dashboards.
Issues:
- Wire assumes the source of truth for the data model is siloed in the .wire/project folder.
- The design did not follow our style guide/conventions.
- The generated fact table design assumed the incorrect table grain, despite being given visibility of the source tables and outgoing dbt models.
- We were unable to refine the model successfully; Claude kept defaulting back to incorrect assumptions after one or more additional requests.
- Column naming did not follow conventions.
- The model lacked attributes to facilitate broader analytics use cases, resulting in a basic data model design.
The following is a working list of items that should help improve the quality of the output.
## Introduce questions into the workflow
Identify the prompt and template used to generate the physical model design. The workflow should gather user input by asking clarifying questions that improve design generation, such as:
- What are the key business questions the model needs to address?
- Are there undocumented requirements expected in the near future?
- What is the expected grain of each entity?
- Which entity in the conceptual model does each new entity relate to?
## Introduce example model designs
- Add an example of what good looks like.
- Add an example of what bad looks like, and why.
## Introduce a style guide
## Format
- Warehouse entity names must use the syntax `wh_<entity_group>__<entity_name>_<fact|dim|xa>`, e.g. `wh_master_data__customer_dim` (entity group `master_data`, entity name `customer`).
- `fact` indicates a fact table, `dim` indicates a dimension table, and `xa` indicates an extended aggregate table.
- Columns must follow this order: primary key, foreign key, natural key, dimensions, ranks/sequences, metrics, booleans, dates or timestamps.
- Data types should consider the data type hierarchy, e.g. `numeric` rather than `float` or `integer` for inclusivity, unless there is a reason not to, such as significant data volumes where another data type would be more performant.
- The data type `varchar` can be used in dbml instead of `string` for VS Code dbml syntax highlighting.
- Primary keys and foreign keys are defined by the suffixes `_pk` and `_fk` respectively.
- Natural keys (e.g. IDs from the source system) are defined by the suffix `_natural_key` to avoid confusion with surrogate keys.
- Date or timestamp dimensions are defined by the suffix `_dt` or `_ts` respectively. If a timestamp is not in UTC, the timezone should be indicated, e.g. `created_cet_ts`.
- Column descriptions must explain the field, including business logic and any use cases.
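As a minimal illustration of these conventions, a conforming table might look like the following dbml sketch. This is a hypothetical customer dimension invented for illustration, not an existing model:

```dbml
table wh_master_data__customer_dim [headercolor: #FFE4B2] {
  // Primary Key
  customer_pk varchar [pk, note: 'Surrogate primary key.']
  // Foreign Keys
  customer_region_fk varchar [note: 'Foreign key to a hypothetical region dimension.']
  // Natural Keys
  customer_natural_key varchar [note: 'Customer ID from the source system.']
  // Dimensions
  customer_name varchar [note: 'Customer display name.']
  // Booleans
  customer_is_active boolean [note: 'Whether the customer is currently active.']
  // Temporal
  customer_created_ts timestamp [note: 'Creation timestamp (UTC).']
}
```

Note the naming syntax, the `_pk`/`_fk`/`_natural_key`/`_ts` suffixes, and the column ordering, all following the rules above.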
## Considerations
- The physical model serves as a guide for development; refinements may be necessary.
- A table note must include sufficient information to understand the grain of the table, the source system, and any other information relevant for development.
## Physical Model Definition of Done
The physical model should capture all the requirements gathered while putting the model together. This includes defining the data sources, the proposed structure of the model, and getting sign-off from the data analytics lead or the domain stakeholder who is the intended owner of, and responsible for, the data asset(s).
- The model captures all requirements that were gathered.
- Field definitions include a clear column name, data type, and metadata (e.g. description and any constraints).
- Primary keys and foreign keys are clearly implemented.
- A clear description and data sources are documented for each table.
- Code review is completed and all feedback is addressed.
- Final review and sign-off from the data analytics lead or stakeholder is obtained.
- **Relevant** documentation is updated (e.g. Confluence).
## Example: Engineer Design
```dbml
table wh_human_resource__labour_fact [headercolor: #FFE4B2] {
  // Primary Key
  labour_pk varchar [pk, note: 'Surrogate primary key. generate_surrogate_key([labour_store_natural_key, labour_employee_natural_key, labour_hour_ts]). Composite — shift exploded to hourly grain for POS matching.']
  // Foreign Keys
  employee_fk varchar [ref: > wh_human_resource__employee_dim.employee_pk, note: 'Foreign key to wh_human_resource__employee_dim. generate_surrogate_key([labour_employee_natural_key]).']
  store_fk varchar [ref: > wh_master__store_dim.store_pk, note: 'Foreign key to wh_master__store_dim. generate_surrogate_key([labour_store_natural_key]). Store ID normalised via normalise_store_id() macro.']
  calendar_fiscal_fk varchar [ref: > wh_master__calendar_fiscal_dim.calendar_fiscal_pk, note: 'Foreign key to wh_master__calendar_fiscal_dim. generate_surrogate_key([cast(labour_dt as date)]).']
  labour_hour_type_fk varchar [ref: > wh_human_resource__labour_hour_type_dim.labour_hour_type_pk, note: 'Foreign key to wh_human_resource__labour_hour_type_dim. generate_surrogate_key([labour_hour_type_code, labour_country_code]). Country-scoped join.']
  // Natural Keys
  labour_natural_key varchar [note: 'MANUS shift record identifier. Degenerate dimension — retained for traceability to source shift. Source: manus_realized_hours_master.manus_id']
  labour_employee_natural_key varchar [note: 'Workday employee number (pre-resolved from MANUS). Source: manus_realized_hours_master.workday_employee_number']
  labour_store_natural_key varchar [note: 'Store code normalised to 4-digit zero-padded format via normalise_store_id(). Source: manus_realized_hours_master.store_id']
  labour_country_code varchar [note: 'Country code. Used for country-scoped labour hour type join. Source: manus_realized_hours_master.country_code']
  labour_hour_type_code varchar [note: 'MANUS hour code ID. Cross-reference to labour_hour_type_dim. Source: manus_realized_hours_master.hour_code_id']
  labour_hour_type_source_code varchar [note: 'MANUS hour code string value. Source: manus_realized_hours_master.hour_code']
  // Attributes
  labour_source_system varchar [note: 'Source system identifier (e.g., MANUS, fallback to hours_curr if realized_hours_master is deprecated). Source: hardcoded based on source table.']
  labour_day_part_name varchar [note: 'Named part of day (e.g., morning, afternoon, evening) derived from labour_hour_ts for easier analysis.']
  labour_status varchar [note: 'Status of the labour record (e.g., confirmed, unconfirmed) derived from source data for filtering and analysis. Source: manus_realized_hours_master.confirmed']
  // Ranks / Sequences
  labour_shift_hour_sequence numeric [note: 'Sequence number of the hour slot within the original shift segment. Used for fatigue & productivity analysis. Derived from row_number() over partition by manus_id ordered by labour_hour_ts.']
  labour_consecutive_workday_sequence numeric [note: 'Sequence number of consecutive workdays for the employee. Facilitates Working Hours Act analysis. Derived from lag() function on labour_dt per employee.']
  // Metrics
  labour_worked_hours numeric [note: 'Total hours worked by the employee in this hour slot. Decimal. Derived from shift explosion — proportion of clock hour covered by shift segment.']
  labour_worked_minutes numeric [note: 'Total minutes worked by the employee in this hour slot. As per the source data (e.g. from Manus).']
  labour_productive_hours numeric [note: 'Productive hours worked in this hour slot. Derived: labour_worked_hours WHERE hour_type is productive (hour_code_id = 1), else 0.']
  labour_productive_minutes numeric [note: 'Productive minutes worked in this hour slot. As per the source data (e.g. from Manus) for productive hour types, else 0.']
  labour_non_productive_hours numeric [note: 'Non-productive hours worked in this hour slot. Derived: labour_worked_hours WHERE hour_type is not productive, else 0.']
  labour_non_productive_minutes numeric [note: 'Non-productive minutes worked in this hour slot. As per the source data (e.g. from Manus) for non-productive hour types, else 0.']
  // Booleans
  labour_is_confirmed boolean [note: 'Whether the shift entry was confirmed by manager. Source: manus_realized_hours_master.confirmed']
  // Temporal
  labour_shift_start_ts timestamp [note: 'Start timestamp of the labour shift segment. Source: manus_realized_hours_master.from_ts']
  labour_shift_end_ts timestamp [note: 'End timestamp of the labour shift segment. Source: manus_realized_hours_master.to_ts']
  labour_hour_ts timestamp [note: 'Timestamp representing the specific hour slot for this record. Derived from labour_hour_start_ts by truncating to the hour. Used for joining with POS sales data at hour level.']
  Note: '''
  Table Grain: One row per employee per store per date per hour-of-day.
  Description: Fact table recording actual labour hours at hourly granularity, covering all workforce categories (employees, contractors, temporary staff). Shift segments are exploded into hourly buckets to support direct matching with POS sales data at hour-of-day level.
  Realized Productive Hours = SUM(labour_productive_hours), grouped by store + fiscal week.
  Materialisation: incremental (insert_overwrite), partitioned by labour_hour_ts (day), clustered by [store_fk, labour_hour_fk].
  2-day lookback window for late corrections.
  Source: pj-hkm-manus-prod.manus_realized_hours.manus_realized_hours_master (primary)
  Fallback: pj-hkm-manus-prod.LANDING_ZONE.hours_curr (if realized_hours_master deprecated)
  Considerations: Test PK unicity as soon as possible.
  Jira: DA-4445 — Create physical Data model for the hours entity
  '''
}
```
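The shift explosion described in the table note can be sketched in Python. This is an illustrative sketch only, not the dbt implementation; `explode_shift_to_hours` is a hypothetical helper showing how a shift segment maps to hourly buckets, with `labour_worked_hours` as the proportion of each clock hour covered by the segment and `labour_hour_ts` as the hour-truncated bucket timestamp:

```python
from datetime import datetime, timedelta


def explode_shift_to_hours(shift_start: datetime, shift_end: datetime):
    """Split one shift segment into hourly buckets.

    Returns a list of (labour_hour_ts, labour_worked_hours) pairs,
    where labour_worked_hours is the fraction of the clock hour
    covered by the shift segment.
    """
    rows = []
    # Truncate the shift start down to the containing clock hour.
    bucket = shift_start.replace(minute=0, second=0, microsecond=0)
    while bucket < shift_end:
        next_bucket = bucket + timedelta(hours=1)
        # Overlap between [shift_start, shift_end) and this clock hour.
        overlap_start = max(shift_start, bucket)
        overlap_end = min(shift_end, next_bucket)
        worked_hours = (overlap_end - overlap_start).total_seconds() / 3600.0
        rows.append((bucket, round(worked_hours, 4)))
        bucket = next_bucket
    return rows


# A 09:30-11:15 shift covers 0.5 h of the 09:00 bucket,
# the full 10:00 bucket, and 0.25 h of the 11:00 bucket.
rows = explode_shift_to_hours(
    datetime(2024, 1, 8, 9, 30), datetime(2024, 1, 8, 11, 15)
)
```

The summed `labour_worked_hours` across buckets always equals the shift's total duration, which is the property that lets the exploded fact roll back up to shift-level totals.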