Skip to content

refactor: decouple Record from Deposition with source-agnostic model#107

Merged
rorybyrne merged 4 commits into
mainfrom
103-refactor-make-record
Mar 25, 2026
Merged

refactor: decouple Record from Deposition with source-agnostic model#107
rorybyrne merged 4 commits into
mainfrom
103-refactor-make-record

Conversation

@rorybyrne

Copy link
Copy Markdown
Contributor

Summary

  • Replace deposition_srn + indexes on Record with a discriminated union RecordSource (DepositionSource, HarvestSource) + convention_srn, so records can originate from any pathway
  • Confine HookDefinition to the validation boundary — downstream events carry only expected_features: list[str], not OCI runtime specs
  • Remove all filesystem paths from domain events — storage adapters resolve paths from source identifiers via FeatureStoragePort.get_hook_output_root()
  • Simplify HookInputs: replace deposition_srn with run_id + files_dir

Test plan

  • 930 unit tests passing (916 baseline + 14 net new)
  • RecordSource discriminated union: serialization, deserialization, discriminator, unknown type rejection
  • RecordDraft construction and validation
  • Deposition-sourced and harvest-sourced records publish correctly via RecordService
  • Full event chain: DepositionSubmitted → Validate → Approve → Record → InsertFeatures
  • Feature insertion resolves paths from source type+id via storage adapter
  • K8s and OCI runners use run_id for job naming/labels
  • Lint (ruff) and type check (ty) clean

Replace deposition_srn + indexes on Record with a discriminated union
RecordSource (DepositionSource, HarvestSource) + convention_srn. Records
can now originate from any pathway without hardwiring to depositions.

Key changes:
- RecordSource discriminated union in shared kernel (domain/shared/model/source.py)
- RecordDraft value object as input to publish_record()
- HookDefinition confined to validation boundary — downstream events carry
  only expected_features: list[str], not OCI runtime specs
- Events tightened: files_dir/staging_dir removed from all domain events,
  storage adapters resolve paths from source identifiers
- HookInputs simplified: deposition_srn replaced with run_id + files_dir
- FeatureStoragePort.get_hook_output_root() resolves paths from source
  type + id, handler passes resolved path to FeatureService
- IndexRef and find_by_deposition() removed, find_by_source() added
- Alembic migration: drop deposition_srn/indexes, add source (JSONB) +
  convention_srn with functional unique index

Closes #103
@github-actions

github-actions Bot commented Mar 25, 2026

Copy link
Copy Markdown

Code Coverage

Package Line Rate Complexity Health
. 93% 0
application 100% 0
application.api 100% 0
application.api.rest 82% 0
application.api.v1 82% 0
application.api.v1.routes 64% 0
application.event 100% 0
domain 100% 0
domain.auth 100% 0
domain.auth.command 90% 0
domain.auth.event 100% 0
domain.auth.model 93% 0
domain.auth.port 99% 0
domain.auth.query 93% 0
domain.auth.service 91% 0
domain.auth.util 100% 0
domain.auth.util.di 79% 0
domain.curation 100% 0
domain.curation.adapter 100% 0
domain.curation.command 100% 0
domain.curation.event 100% 0
domain.curation.handler 92% 0
domain.curation.model 100% 0
domain.curation.port 100% 0
domain.curation.query 100% 0
domain.curation.service 100% 0
domain.deposition 100% 0
domain.deposition.adapter 100% 0
domain.deposition.command 87% 0
domain.deposition.event 100% 0
domain.deposition.handler 100% 0
domain.deposition.model 98% 0
domain.deposition.port 100% 0
domain.deposition.query 86% 0
domain.deposition.service 97% 0
domain.deposition.util.di 94% 0
domain.discovery 100% 0
domain.discovery.model 100% 0
domain.discovery.port 100% 0
domain.discovery.query 100% 0
domain.discovery.service 94% 0
domain.discovery.util 100% 0
domain.discovery.util.di 94% 0
domain.export 100% 0
domain.export.adapter 100% 0
domain.export.command 100% 0
domain.export.event 100% 0
domain.export.model 100% 0
domain.export.port 100% 0
domain.export.query 100% 0
domain.export.service 100% 0
domain.feature 100% 0
domain.feature.event 100% 0
domain.feature.handler 100% 0
domain.feature.model 0% 0
domain.feature.port 100% 0
domain.feature.service 96% 0
domain.feature.util 100% 0
domain.feature.util.di 100% 0
domain.index 100% 0
domain.index.event 100% 0
domain.index.handler 75% 0
domain.index.model 84% 0
domain.index.service 100% 0
domain.record 100% 0
domain.record.adapter 100% 0
domain.record.command 100% 0
domain.record.event 100% 0
domain.record.handler 100% 0
domain.record.model 100% 0
domain.record.port 100% 0
domain.record.query 100% 0
domain.record.service 90% 0
domain.search 100% 0
domain.search.adapter 100% 0
domain.search.command 100% 0
domain.search.event 100% 0
domain.search.model 100% 0
domain.search.port 100% 0
domain.search.query 100% 0
domain.search.service 100% 0
domain.semantics 100% 0
domain.semantics.command 94% 0
domain.semantics.event 100% 0
domain.semantics.handler 100% 0
domain.semantics.model 100% 0
domain.semantics.port 100% 0
domain.semantics.query 90% 0
domain.semantics.service 100% 0
domain.semantics.util 100% 0
domain.semantics.util.di 93% 0
domain.shared 94% 0
domain.shared.authorization 87% 0
domain.shared.model 94% 0
domain.shared.port 100% 0
domain.source 100% 0
domain.source.event 100% 0
domain.source.handler 92% 0
domain.source.model 100% 0
domain.source.port 100% 0
domain.source.schedule 67% 0
domain.source.service 98% 0
domain.validation 100% 0
domain.validation.adapter 100% 0
domain.validation.command 0% 0
domain.validation.event 100% 0
domain.validation.handler 90% 0
domain.validation.model 85% 0
domain.validation.port 100% 0
domain.validation.query 100% 0
domain.validation.service 89% 0
domain.validation.util.di 92% 0
infrastructure 73% 0
infrastructure.auth 56% 0
infrastructure.event 79% 0
infrastructure.http 92% 0
infrastructure.index 82% 0
infrastructure.k8s 78% 0
infrastructure.messaging 100% 0
infrastructure.oci 54% 0
infrastructure.persistence 85% 0
infrastructure.persistence.adapter 63% 0
infrastructure.persistence.mappers 62% 0
infrastructure.persistence.repository 32% 0
infrastructure.s3 11% 0
infrastructure.source 91% 0
sdk 100% 0
sdk.index 100% 0
util 100% 0
util.di 77% 0
Summary 79% (5936 / 7529) 0

@greptile-apps

greptile-apps Bot commented Mar 25, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR replaces the tight Record ↔ Deposition coupling with a source-agnostic RecordSource discriminated union, enabling records to originate from any pathway (deposition today, harvest tomorrow). The introduction of RecordDraft as a clean input value-object to RecordService.publish_record(), the confinement of HookDefinition to the validation boundary (only expected_features: list[str] leaks downstream), and the removal of filesystem paths from domain events are all well-executed architectural improvements.

Key changes:

  • New RecordSource union (DepositionSource | HarvestSource) in source.py with a Pydantic discriminator; the 14 new tests cover serialization, deserialization, and unknown-type rejection.
  • RecordDraft value object decouples RecordService from deposition-specific fields; publish_record now accepts a single typed draft.
  • RecordPublished event now carries source: RecordSource + convention_srn (made non-optional) + expected_features instead of raw HookDefinition objects and a files_dir string.
  • InsertRecordFeatures resolves hook_output_dir from the storage port at event-handling time rather than reading it from the event payload.
  • Database migration drops deposition_srn/indexes, adds source (JSONB) + convention_srn, with a globally-unique index on (source->>'type', source->>'id').
  • One P1 concern: The migration file has a down_revision/docstring Revises: mismatch ("add_device_authorizations" vs "consumer_group_delivery"), which could produce two Alembic migration heads if not corrected.

Confidence Score: 4/5

  • Safe to merge after confirming the migration chain is correct; domain model and event changes are well-tested.
  • The refactoring is thorough, well-tested (930 passing, 14 net new), and previous review concerns about storage adapter coverage have been noted. The single concrete blocker before merge is verifying — and correcting if needed — the down_revision/docstring mismatch in the Alembic migration, which could silently create two migration heads. All other findings are cosmetic (stale parameter name in job_name).
  • server/migrations/versions/source_agnostic_records.py — verify down_revision matches the actual current migration head.

Important Files Changed

Filename Overview
server/osa/domain/shared/model/source.py Introduces the RecordSource discriminated union (DepositionSource
server/migrations/versions/source_agnostic_records.py Adds the source (JSONB) and convention_srn columns, drops deposition_srn/indexes. Contains a down_revision vs docstring Revises: mismatch (add_device_authorizations vs consumer_group_delivery) that could create two migration heads.
server/osa/domain/record/model/draft.py New RecordDraft value object cleanly encapsulates the inputs to RecordService.publish_record(), carrying source, metadata, convention_srn, and expected_features.
server/osa/domain/record/event/record_published.py Replaces deposition_srn/hooks/files_dir with source: RecordSource, convention_srn: ConventionSRN (now required, not Optional), and expected_features: list[str]; the non-null convention_srn tightens the contract.
server/osa/domain/feature/handler/insert_record_features.py Resolves hook_output_dir from the storage port at runtime (via source.type + source.id) instead of reading it from the event payload; delegates cleanly to FeatureService.
server/osa/infrastructure/persistence/mappers/record.py Uses a module-level TypeAdapter(RecordSource) to deserialize the JSONB source column; _source_adapter.dump_python(…, mode="json") correctly returns a JSON-serializable dict for JSONB insertion.
server/osa/infrastructure/k8s/naming.py label_value correctly generalised to accept `str
server/osa/infrastructure/persistence/tables.py Table definition updated to match migration: source JSONB + convention_srn Text, with a globally-unique index on (source->>'type', source->>'id').

Sequence Diagram

sequenceDiagram
    participant DS as DepositionService
    participant VH as ValidateDeposition
    participant VS as ValidationService
    participant AH as AutoApproveCuration
    participant CH as ConvertDepositionToRecord
    participant RS as RecordService
    participant IH as InsertRecordFeatures
    participant FS as FeatureStoragePort

    DS->>VH: DepositionSubmittedEvent(hooks, convention_srn)
    VH->>VS: run_validation(hooks)
    Note over VS: Extracts run_id from deposition_srn<br/>Resolves files_dir via hook_storage
    VS-->>VH: ValidationRun + HookResults
    VH->>AH: ValidationCompleted(expected_features=[h.name])
    Note over VH: HookDefinitions confined here ─<br/>only feature names leak downstream

    AH->>CH: DepositionApproved(expected_features, convention_srn)
    CH->>RS: publish_record(RecordDraft(DepositionSource, metadata, expected_features))
    RS-->>IH: RecordPublished(source, convention_srn, expected_features)

    IH->>FS: get_hook_output_root(source.type, source.id)
    FS-->>IH: hook_output_dir
    IH->>FS: hook_features_exist(hook_output_dir, feature_name)
    IH->>FS: read_hook_features(hook_output_dir, feature_name)
Loading

Reviews (2): Last reviewed commit: "refactor: update RecordPublished event c..." | Re-trigger Greptile

Comment on lines +56 to +61
def get_hook_output_root(self, source_type: str, source_id: str) -> str:
"""Resolve the root directory for a given source type and id."""
if source_type == "deposition":
srn = DepositionSRN.parse(source_id)
return str(self._dep_dir(srn))
raise ValueError(f"Unknown source type: {source_type}")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 HarvestSource not supported — ValueError at runtime

Both FilesystemStorageAdapter.get_hook_output_root and S3StorageAdapter.get_hook_output_root raise ValueError(f"Unknown source type: {source_type}") for any source type other than "deposition". Because HarvestSource is now a first-class RecordSource type, any RecordPublished event emitted for a harvest-sourced record will cause InsertRecordFeatures.handle to crash at the get_hook_output_root("harvest", ...) call — before any feature storage path is even looked up.

The test TestInsertRecordFeaturesHarvestSource masks this because it mocks the storage:

storage.get_hook_output_root.return_value = "/fake/harvest/dir"

The same gap exists in S3StorageAdapter (server/osa/infrastructure/s3/storage.py, line 181).

Either add a concrete path layout for harvest sources in both adapters, or guard feature insertion with a no-op path for source types whose storage layout is not yet defined (e.g. return an empty-features path and warn instead of raising).

Comment thread server/osa/infrastructure/persistence/repository/record.py Outdated
Remove convention_srn from unique constraint and drop source_type index
to simplify database schema. Remove unused find_by_source method from
repository interface and implementation. Delete empty value objects file.
… convention fields

Add ConventionSRN import and replace deposition_srn parameter with
source and convention_srn fields in make_record_published helper
function to align with updated event structure
@rorybyrne

Copy link
Copy Markdown
Contributor Author

@greptile

@rorybyrne rorybyrne merged commit 41088be into main Mar 25, 2026
1 check passed
@rorybyrne rorybyrne deleted the 103-refactor-make-record branch March 25, 2026 11:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant