
Allow multiple transcript rows for the same Asterisk uniqueid #47

Merged: tommaso-ascani merged 7 commits into fix_transcriptions from nounique on May 4, 2026

Conversation

Stell0 (Collaborator) commented Apr 30, 2026

Summary

This change removes the UNIQUE constraint from transcripts.uniqueid and updates the persistence flow to track each stored transcript by its internal id.

Why

With recent ns8-nethvoice changes for transferred calls (nethesis/ns8-nethvoice#803), a single Asterisk call can produce multiple recording fragments that share the same uniqueid but belong to different call legs. The previous schema and write path assumed one row per uniqueid, so Satellite would either overwrite fragments through ON CONFLICT (uniqueid) or lose correct state tracking.
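To make the overwrite problem concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for Satellite's real database. The table layout, the raw_transcript column, the helper name store_fragments, and the sample uniqueid are assumptions for illustration, not the actual schema or code in db.py. It contrasts an ON CONFLICT (uniqueid) upsert against a UNIQUE column with plain inserts into a non-unique column.

```python
import sqlite3


def store_fragments(unique_constraint, fragments):
    """Persist (uniqueid, text) fragments and return the rows that survive.

    Hypothetical helper: an in-memory SQLite table stands in for the real DB.
    """
    conn = sqlite3.connect(":memory:")
    uniqueness = " UNIQUE" if unique_constraint else ""
    conn.execute(
        "CREATE TABLE transcripts ("
        " id INTEGER PRIMARY KEY,"
        f" uniqueid TEXT NOT NULL{uniqueness},"
        " raw_transcript TEXT)"
    )
    for uniqueid, text in fragments:
        if unique_constraint:
            # Old model: a second fragment with the same uniqueid overwrites the first.
            conn.execute(
                "INSERT INTO transcripts (uniqueid, raw_transcript) VALUES (?, ?) "
                "ON CONFLICT (uniqueid) DO UPDATE SET raw_transcript = excluded.raw_transcript",
                (uniqueid, text),
            )
        else:
            # New model: every fragment becomes its own row with its own id.
            conn.execute(
                "INSERT INTO transcripts (uniqueid, raw_transcript) VALUES (?, ?)",
                (uniqueid, text),
            )
    rows = conn.execute(
        "SELECT id, uniqueid, raw_transcript FROM transcripts ORDER BY id"
    ).fetchall()
    conn.close()
    return rows


# Two recordings of one transferred call sharing a (made-up) uniqueid.
legs = [("1714460000.42", "leg A"), ("1714460000.42", "leg B")]
print(store_fragments(True, legs))   # [(1, '1714460000.42', 'leg B')]
print(store_fragments(False, legs))  # [(1, '1714460000.42', 'leg A'), (2, '1714460000.42', 'leg B')]
```

With the unique constraint, the first leg's text is silently replaced; without it, both legs survive and can be told apart by id.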

What changed

  • transcripts.uniqueid is now non-unique and remains indexed.
  • Startup schema bootstrap removes the legacy unique constraint from existing databases.
  • POST /api/get_transcription creates one transcript row per persisted request.
  • Raw transcript and state transitions are now updated by transcript_id, not by uniqueid.
  • Tests and README were updated.
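The write path described in the list above can be sketched as follows, again with sqlite3 standing in for the real database. The function names (init_schema, create_transcript, complete_transcript), the index name, and the 'pending'/'done' state values are hypothetical; the point is only the pattern: uniqueid is indexed but not unique, each request gets its own row, and later updates target that row's id.

```python
import sqlite3


def init_schema(conn):
    # uniqueid stays indexed for lookups but is deliberately NOT unique.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS transcripts ("
        " id INTEGER PRIMARY KEY,"
        " uniqueid TEXT NOT NULL,"
        " state TEXT NOT NULL,"
        " raw_transcript TEXT)"
    )
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_transcripts_uniqueid ON transcripts (uniqueid)"
    )


def create_transcript(conn, uniqueid):
    """Insert one row per persisted request and return its internal id."""
    cur = conn.execute(
        "INSERT INTO transcripts (uniqueid, state) VALUES (?, 'pending')", (uniqueid,)
    )
    return cur.lastrowid


def complete_transcript(conn, transcript_id, raw_text):
    """Store the raw transcript and advance the state of exactly one row."""
    conn.execute(
        "UPDATE transcripts SET raw_transcript = ?, state = 'done' WHERE id = ?",
        (raw_text, transcript_id),
    )


conn = sqlite3.connect(":memory:")
init_schema(conn)
first_id = create_transcript(conn, "1714460000.42")
second_id = create_transcript(conn, "1714460000.42")
complete_transcript(conn, second_id, "leg B text")
print(conn.execute("SELECT id, state FROM transcripts ORDER BY id").fetchall())
# [(1, 'pending'), (2, 'done')]
```

Because the update is keyed on id, finishing the second fragment leaves the first fragment's state untouched even though both share a uniqueid.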

Impact

This preserves all transferred-call fragments, prevents transcript/state corruption when multiple uploads share the same uniqueid, and keeps AI enrichment tied to the correct stored fragment. No HTTP or MQTT payloads changed.

Stell0 (Collaborator, Author) commented Apr 30, 2026

nethcti-middleware Regressions And Fixes
Reviewed on branch fix_transcriptions at commit e4b4631.

  • Deterministic read regression: transcription and summary helpers still use an unordered LIMIT 1, so when several transcript rows share a uniqueid they can return an arbitrary fragment. Fix by selecting one canonical row per uniqueid, preferably the latest non-deleted row ordered by updated_at DESC, id DESC, unless product explicitly wants aggregation.
  • Duplicate list regression: summary/status list logic can return multiple entries for the same uniqueid because it reads every matching transcript row. Fix with a canonical-row CTE/subquery or DISTINCT ON (uniqueid).
  • Update fan-out regression: manual summary updates currently affect every row with the same uniqueid. Fix by updating only the canonical row.
  • Delete fan-out regression: summary deletion currently marks every fragment deleted. Fix by deleting only the canonical row.
  • Watch/HEAD regression: summary state checks and watcher logic can observe the wrong row and misreport done, failed, or missing summary/transcription. Fix by applying the same canonical-row rule used elsewhere.
  • Caller/callee metadata regression: transferred-call CDR metadata still uses single-row selection and can display the wrong leg. Fix with a deterministic metadata selection rule, or merge data across matching CDR rows if that is the intended UX.
  • Missing coverage: add tests with at least two transcript rows sharing the same uniqueid and verify deterministic reads, deduplicated list output, and non-fan-out update/delete behavior.
  • No change needed for authorization: participation checks already iterate all matching CDR rows, so they are not relying on a single-row assumption.
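The canonical-row rule proposed above (latest non-deleted row, ordered by updated_at DESC, id DESC) can be sketched in SQL. This is an illustration only, not nethcti-middleware code: sqlite3 stands in for the real database, and the summary, deleted, and updated_at columns plus all helper names are assumptions. SQLite has no DISTINCT ON, so the list query uses ROW_NUMBER() to express the same "one row per uniqueid" rule; the demo data has two fragments for one uniqueid, matching the test case the review asks for.

```python
import sqlite3

# Canonical order: latest non-deleted fragment first; ties on updated_at broken by id.
CANONICAL_ORDER = "ORDER BY updated_at DESC, id DESC"


def make_demo_db():
    """Build an in-memory table where uniqueid 'u1' has two fragments."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE transcripts (id INTEGER PRIMARY KEY, uniqueid TEXT NOT NULL,"
        " summary TEXT, deleted INTEGER NOT NULL DEFAULT 0, updated_at TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO transcripts (id, uniqueid, summary, updated_at) VALUES (?, ?, ?, ?)",
        [
            (1, "u1", "leg A", "2026-04-30 10:00:00"),
            (2, "u1", "leg B", "2026-04-30 10:05:00"),
            (3, "u2", "other call", "2026-04-30 11:00:00"),
        ],
    )
    return conn


def get_summary(conn, uniqueid):
    """Deterministic replacement for an unordered LIMIT 1 lookup."""
    return conn.execute(
        "SELECT summary FROM transcripts WHERE uniqueid = ? AND deleted = 0 "
        f"{CANONICAL_ORDER} LIMIT 1",
        (uniqueid,),
    ).fetchone()


def list_summaries(conn):
    """One entry per uniqueid, like DISTINCT ON (uniqueid) would give in PostgreSQL."""
    return conn.execute(
        "SELECT id, uniqueid, summary FROM ("
        " SELECT id, uniqueid, summary,"
        f" ROW_NUMBER() OVER (PARTITION BY uniqueid {CANONICAL_ORDER}) AS rn"
        " FROM transcripts WHERE deleted = 0"
        ") WHERE rn = 1 ORDER BY uniqueid"
    ).fetchall()


def delete_summary(conn, uniqueid):
    """Soft-delete only the canonical row instead of every fragment."""
    conn.execute(
        "UPDATE transcripts SET deleted = 1 WHERE id = ("
        " SELECT id FROM transcripts WHERE uniqueid = ? AND deleted = 0"
        f" {CANONICAL_ORDER} LIMIT 1)",
        (uniqueid,),
    )


conn = make_demo_db()
print(list_summaries(conn))     # [(2, 'u1', 'leg B'), (3, 'u2', 'other call')]
print(get_summary(conn, "u1"))  # ('leg B',)
delete_summary(conn, "u1")
print(get_summary(conn, "u1"))  # ('leg A',)
```

One consequence of applying the rule to deletes: once the canonical row is soft-deleted, the next-older fragment becomes canonical. Whether a delete should instead hide every fragment of the call is the same product question the review raises about aggregation.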


Copilot AI left a comment


Pull request overview

This PR updates the transcription persistence flow so Satellite can store multiple transcript fragments that share the same Asterisk uniqueid, which is needed for transferred/multi-leg calls after the related ns8-nethvoice changes.

Changes:

  • Removes the database uniqueness assumption on transcripts.uniqueid and migrates existing schemas to a non-unique indexed column.
  • Switches persistence/state tracking to use the transcript row’s internal id, while also storing optional linkedid, src_number, and dst_number.
  • Expands tests and README coverage for the new persistence behavior, including silent-audio handling.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Summary per file:

  • db.py: Updates schema bootstrap/migration and changes transcript persistence/state helpers to work with row IDs instead of assuming uniqueid uniqueness.
  • api.py: Passes new participant fields through the transcription endpoint, initializes a transcript row up front, and adds empty-audio handling.
  • tests/test_db.py: Adds coverage for schema migration, new participant columns, insert/update-by-id behavior, and latest-row state updates.
  • tests/test_api.py: Adds endpoint tests for persisted transcript IDs, linked/participant field forwarding, and silent-audio success handling.
  • README.md: Documents the non-unique uniqueid model, new optional persisted fields, and the silent-audio behavior.


Comment threads on api.py and db.py (outdated)
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
tommaso-ascani changed the base branch from main to fix_transcriptions on May 4, 2026 08:20
Stell0 and others added 3 commits May 4, 2026 10:20
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
tommaso-ascani merged commit bc11c48 into fix_transcriptions on May 4, 2026
5 checks passed
tommaso-ascani deleted the nounique branch on May 4, 2026 09:29


3 participants