Skip to content

feat: enrich feedback records with translated text#90

Merged
xernobyl merged 17 commits into
feat/ENG-1254_tenant-settings-hub-language-enrichmentfrom
feat/ENG-1255_translate-feedback-records
Jun 25, 2026
Merged

feat: enrich feedback records with translated text#90
xernobyl merged 17 commits into
feat/ENG-1254_tenant-settings-hub-language-enrichmentfrom
feat/ENG-1255_translate-feedback-records

Conversation

@xernobyl

@xernobyl xernobyl commented Jun 23, 2026

Copy link
Copy Markdown
Contributor

What does this PR do?

Adds asynchronous translation enrichment to the Hub: each eligible open-text feedback answer (value_text) is translated into the tenant's configured target_language (from the ENG-1254 tenant settings) and stored on the feedback record, so Formbricks can display translated feedback (ENG-1253).

The design mirrors the existing embedding pipeline — event → River job → worker → provider → persist — and runs fully independently of embeddings (embeddings keep operating on the source-language value_text; translation is additive).

Resolves ENG-1255.

Stacked on #89 (ENG-1254). The base of this PR is the feat/ENG-1254_tenant-settings-hub-language-enrichment branch, so the diff shows only the ENG-1255 commits. GitHub will auto-retarget the base to main once #89 merges — please review #89 first.

What's included

  • Schema (migrations/013_add_feedback_record_translation.sql): two read-only columns on feedback_recordsvalue_text_translated TEXT and translation_lang_key VARCHAR(35) (the target locale actually used). They ride the row lifecycle, so the existing tenant-data purge cleans them up — no separate teardown.
  • Provider (TranslationProvider): enqueues a feedback_translation job on create (non-empty text) and on update when value_text or language changed — translation depends on both (unlike embeddings, which ignore source language). Reads the tenant target through a short-TTL settings cache (the cache ENG-1254 deferred to this ticket). Gated by TRANSLATION_PROVIDER+TRANSLATION_MODEL (deployment) and per-tenant target_language.
  • Worker (FeedbackTranslationWorker): loads the record, translates value_text — or copies it verbatim when the source base+script already equals the target (no provider call) — and persists via a dedicated SetTranslation that does not re-publish an event (no enrichment loop). Clears a stale translation when value_text becomes empty. Not-found completes; tenant-write-conflict retries.
  • Provider abstraction (TranslationClient + factory): OpenAI (openai-go/v3 chat) and Google (genai generate), explicit TRANSLATION_PROVIDER with no default. Replicates Formbricks' "professional translator" prompt and human-readable BCP-47 language names (x/text/.../display), temperature 0.
  • Backfill command (cmd/backfill-translations): producer-only command that enqueues eligible existing records (missing/stale translations) across all tenants.
  • Settings-triggered re-translation (SettingsChangeListener port + EnrichmentSettingsListener adapter + TenantTranslationBackfillWorker): changing a tenant's target_language enqueues a durable per-tenant backfill that keyset-paginates the tenant's stale records and re-translates them, so existing records pick up a new target — not only newly ingested ones. TenantSettingsService depends only on a translation-free port; the adapter dispatches a changed key → its backfill via a config-built map, generalizing to future enrichment settings (one map entry each). The backfill job is unique per tenant (one in-flight per tenant) and crash-safe via the idempotent translation_lang_key IS DISTINCT FROM target_language query; it runs off the request path. Clearing target_language leaves existing translations in place.
  • Observability: TranslationMetrics mirroring EmbeddingMetricshub_translation_jobs_enqueued_total, hub_translation_provider_errors_total{reason}, hub_translation_outcomes_total{status}, hub_translation_worker_errors_total{reason}, hub_translation_duration_seconds{status}.
  • API: value_text_translated + translation_lang_key added as read-only fields on the feedback-record output schema in openapi.yaml (consumed by ENG-1253).

Out of scope (deferred): "other"-option free text, survey Q&A context in the prompt, and the Formbricks-side display (ENG-1253).

How should this be tested?

Deployment gate + per-tenant target. Example config:

TRANSLATION_PROVIDER=openai          # or google / google-gemini
TRANSLATION_MODEL=gpt-4o-mini
TRANSLATION_PROVIDER_API_KEY=...     # openai / google
# Google Vertex instead of an API key:
# TRANSLATION_GOOGLE_CLOUD_PROJECT=... TRANSLATION_GOOGLE_CLOUD_LOCATION=...

Automated (DATABASE_URL → a test DB with migrations + River applied):

  • make build
  • make fmt && make lint → 0 issues
  • make tests → full integration suite green, including tests/feedback_translation_test.go (repo SetTranslation, the backfill-targets query, and the end-to-end worker pipeline: translate+persist, copy-on-source==target, clear-on-empty), tests/tenant_translation_backfill_test.go (per-tenant backfill: tenant isolation, keyset pagination, idempotency, worker fan-out end-to-end), and the OpenAPI contract test
  • make check-coverage → ✅ (61.9%, threshold 15%)

Manual smoke:

  1. Set the env above and a tenant target_language via PUT /v1/tenants/{tenant_id}/settings.
  2. POST a text feedback record → run hub-workerGET the record → value_text_translated is populated and translation_lang_key equals the target.
  3. A tenant with no target_language → record left untouched.
  4. PATCH the tenant's target_language to a new locale → a per-tenant re-translation backfill is enqueued; after hub-worker runs, existing records re-translate to the new target (translation_lang_key updates). Clearing it leaves existing translations untouched.
  5. go run ./cmd/backfill-translations over existing records → enqueues the eligible ones (the guaranteed, all-tenant path).

Checklist

Required

  • Filled out the "How to test" section in this PR
  • Read Repository Guidelines
  • Self-reviewed my own code
  • Commented on my code in hard-to-understand bits
  • Ran make build
  • Ran make tests (integration tests in tests/)
  • Ran make fmt and make lint; no new warnings
  • Removed debug prints / temporary logging
  • Up to date with main (branch forks from current main; stacked on the ENG-1254 branch per the note above)
  • If database schema changed: added migration in migrations/ with goose annotations and ran make migrate-validate

Appreciated

  • If API changed: added or updated OpenAPI spec and ran contract tests (make tests)
  • If API behavior changed: only two new read-only output fields were added (documented in openapi.yaml); no request/response behavior change
  • Updated docs in docs/ if changes were necessary (a narrative concept page can follow, like ENG-1254's)
  • Ran make tests-coverage for meaningful logic changes

xernobyl added 12 commits June 22, 2026 22:14
ENG-1255 (persistence): scaffold language-enrichment storage. Adds the
value_text_translated + translation_lang_key columns (migration 013),
surfaces them as read-only fields on FeedbackRecord and the OpenAPI output
schema, includes them in every feedback-record read path, and adds a
tenant-write-locked SetTranslation repo method that does not publish a
domain event (so writing a translation can't loop the enrichment pipeline).
The async translation worker that populates them follows.

ENG-1255
ENG-1255: add TranslationConfig (TRANSLATION_* env; translation stays disabled
unless provider and model are set) and a per-process TTL+LRU tenant-settings
cache — the cache deferred from ENG-1254. The cache wraps the settings accessor
so the translation enqueue gate and worker can resolve a tenant's target
language without a DB read per feedback event; staleness is TTL-bounded and
self-corrects because the worker records the target it actually used. Registers
a bounded "tenant_settings" cache metric label. TRANSLATION_BASE_URL is
normalized/validated like EMBEDDING_BASE_URL.

ENG-1255
Re-review follow-ups on the ENG-1255 foundation:
- Unify the feedback-record column/scan duplication onto one feedbackRecordColumns
  const + the existing scanFeedbackRecord helper. The repo previously inlined 4
  column lists + 4 scans while taxonomy_repository.go had its own scanner; that
  scanner is now the single one (extended to the two translation columns) and the
  taxonomy node-records query selects them too. Collapses 8 sites to 2 so column
  and scan order cannot drift (a silent runtime scan error otherwise).
- config: default TENANT_SETTINGS_CACHE_TTL in applyDefaults too, mirroring the
  other nested DurationSec defaults, so a dropped env-default tag can't silently
  disable the cache; disable explicitly via TENANT_SETTINGS_CACHE_SIZE=0.
- observability: cache metric descriptions now list tenant_settings as a label.
- tests: add a SetTranslation integration test (persist, clear, NotFound) — the
  highest-risk new write path, covering the value/NULL round-trip.

ENG-1255
ENG-1255: the LLM seam for translation enrichment. Adds a TranslationClient
interface (Translate(TranslateRequest) -> text) and a provider-agnostic factory
mirroring the embedding factory — openai / google / google-gemini, explicit
TRANSLATION_PROVIDER (no default), validating API key, base URL, and Gemini
project+location. The OpenAI and Google SDK wrappers gain a low-level Translate
(chat completions / generate-content at temperature 0); a prompt adapter builds
a Formbricks-style "professional translator" prompt using human-readable language
names (x/text/language/display), falling back to "original language" when the
source is unknown. Unit tests cover config validation, the language-name helper,
and prompt rendering.

ENG-1255
ENG-1255: the enqueue side of translation enrichment, mirroring the embedding
provider. Adds FeedbackTranslationArgs (River job, unique by record + target +
value_text hash) and TranslationProvider — on a feedback-record create (non-empty
open text) or an update that changed value_text, it enqueues a job, but only for
text fields with non-empty value_text whose tenant has a target language
configured (read via the settings cache). Failures are logged and swallowed so
record ingestion is never blocked.

Consolidates the two identical per-feature River inserter interfaces into one
shared RiverJobInserter, and reuses TenantSettingsReader for the target lookup
(the iface linter flagged the duplicates). Unit tests cover enqueue eligibility,
target gating, update-changed-fields, and the error/skip paths.

ENG-1255
ENG-1255: the worker side. FeedbackTranslationWorker loads the record, translates
value_text into the job's target language via the TranslationClient — or copies
value_text verbatim when the source language already shares the target's base
language (no LLM call) — and persists it through FeedbackRecordsService.SetTranslation.
Mirrors the embedding worker's error handling: a missing record completes the job, a
tenant write conflict retries, a provider error retries then fails on the final
attempt; value_text that became empty since enqueue is skipped. Registered in the
River wiring, gated on a configured TranslationClient.

Adds FeedbackRecordsService.SetTranslation (+ the repository interface method) and 8
worker unit tests (translate, source==target copy, skip-empty, not-found, provider
retry/fail, tenant-write-conflict, record-gone-on-write).

ENG-1255
Address review findings on the translation pipeline:

- Clear-on-empty was unwired: editing value_text to empty left a stale
  value_text_translated forever. The provider now enqueues on an update that
  empties value_text (mirroring the embedding provider) and the worker clears the
  translation (SetTranslation nil) instead of skipping; the repository nulls both
  translation columns when the translation is nil.
- The source==target short-circuit compared only the base language, so it would
  copy zh-Hans text as a zh-Hant "translation". It now compares base AND script
  (sameLanguageAndScript): different scripts translate, regional variants
  (en-US/en-GB) still copy.
- Update the openai/googleai package docs to mention chat/generate-content.

Adds provider (update-to-empty enqueues a clear) and worker (clears on empty,
translates across scripts) tests.

ENG-1255
Address the full-feature review:

- Undetermined source ("und") was coerced to a guessed base by likely-subtags, so
  the source==target short-circuit copied text untranslated. Guard against
  language.Und in sameLanguageAndScript so an undetermined tag always translates.
- A source-language correction never re-translated: the provider only re-enqueued
  on value_text changes and the dedup hash ignored the source language. The provider
  now also enqueues on a language change, and the dedup hash folds in the source
  language (translationContentHash), so a correction produces a fresh job.
- Drop the stray "backfill" mention from the job-args doc (no translation backfill
  exists) and document the TRANSLATION_* and TENANT_SETTINGS_CACHE_* env vars in
  .env.example, mirroring EMBEDDING_*.

Tests: provider re-enqueues on a language change + the content hash varies by source
language; worker translates (not copies) an "und" source.

ENG-1255
ENG-1255: activate the translation pipeline, gated on TRANSLATION_PROVIDER +
TRANSLATION_MODEL (disabled otherwise), mirroring the embedding wiring.

- cmd/api: register the FeedbackTranslationWorker + translations queue (so the
  combined api process also translates, like embeddings) and register the
  TranslationProvider with the message manager, resolving the tenant target
  language through a short-TTL CachedTenantSettings over tenant settings.
- cmd/worker: build the TranslationClient and populate RiverDeps so the dedicated
  worker process translates jobs.

End to end: the provider enqueues on a feedback-record create/update for a
text field whose tenant has a target language; the worker translates value_text
(or copies when source==target) and persists it.

ENG-1255
ENG-1255: re-translate existing records when translation is first enabled or a
tenant changes its target language.

- repository.ListTranslationBackfillTargets joins tenant_settings to find text
  records with non-empty value_text whose tenant has a target language and whose
  stored translation_lang_key differs from it (never translated, or stale).
- FeedbackRecordsService.BackfillTranslations enqueues a "backfill" job per target;
  the inserter/queue/attempts are caller-provided (a one-off command), so the shared
  service keeps no backfill-only dependency.
- cmd/backfill-translations wires the client + a River producer and runs it,
  mirroring cmd/backfill-embeddings. Gated on TRANSLATION_PROVIDER+MODEL.

Tests: service unit (enqueues one job per target; repo error propagates) and a
Postgres integration test for the backfill query (untranslated/stale included,
already-current and no-target excluded).

ENG-1255
Drive FeedbackTranslationWorker end to end against Postgres with a fake
TranslationClient: translate + persist (source value_text preserved), copy
verbatim when source base+script matches the target (no provider call), and clear
a stale translation when value_text is empty. Complements the repo-level
SetTranslation and backfill-query integration tests.

Also clarify the cmd/api comment: the API registers the translation worker only to
satisfy River's insert-time validation; jobs are processed by hub-worker, not in
the API process (mirrors the embedding wiring).

ENG-1255
Mirror the embedding pipeline's metrics on the translation provider and worker so the
new enrichment path is observable in production:

  hub_translation_jobs_enqueued_total
  hub_translation_provider_errors_total{reason}
  hub_translation_outcomes_total{status}
  hub_translation_worker_errors_total{reason}
  hub_translation_duration_seconds{status}

TranslationMetrics mirrors EmbeddingMetrics (nil when metrics are disabled) and is wired
through NewMetrics, RiverDeps, cmd/api, and cmd/worker. The provider records enqueue
counts plus settings-read/enqueue errors; the worker records outcome + duration + worker
errors at every branch (success, skip, clear, retry, failed_final). Backfill stays
producer-only (nil metrics).

Bounded label sets gate every reason/status to "other". Unlike the embedding worker —
which emits "tenant_write_conflict" without listing it, so it buckets to "other" — the
translation worker's reason set includes it, so the conflict metric is labeled accurately.

Covered by observability metric tests (real SDK manual reader), provider/worker recording
tests, and the existing pipeline tests.

ENG-1255
@github-actions

github-actions Bot commented Jun 23, 2026

Copy link
Copy Markdown

✱ Stainless preview builds

This PR will update the hub SDKs with the following commit message.

feat: enrich feedback records with translated text

Edit this comment to update it. It will appear in the SDK's changelogs.

hub-typescript studio · code · diff

Your SDK build had at least one "note" diagnostic, but this did not represent a regression.
generate ✅build ✅lint ✅test ✅

npm install https://pkg.stainless.com/s/hub-typescript/dd72696312e1c98c5543e0ff796eed4cdbb29477/dist.tar.gz
hub-openapi studio · code · diff

Your SDK build had at least one "note" diagnostic, but this did not represent a regression.
generate ✅


This comment is auto-generated by GitHub Actions and is automatically kept up to date as you push.
If you push custom code to the preview branch, re-run this workflow to update the comment.
Last updated: 2026-06-24 15:37:12 UTC

@xernobyl xernobyl marked this pull request as draft June 23, 2026 11:10
@xernobyl xernobyl marked this pull request as ready for review June 23, 2026 11:19
@xernobyl

Copy link
Copy Markdown
Contributor Author

@coderabbitai can I have a review here?

@xernobyl xernobyl marked this pull request as draft June 23, 2026 11:26
@coderabbitai

coderabbitai Bot commented Jun 23, 2026

Copy link
Copy Markdown
🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

REPO="formbricks/hub"
PR=90

echo "=== PR Info ==="
gh pr view $PR --repo $REPO --json title,body,baseRefName,headRefName,additions,deletions,changedFiles | jq '{title,baseRefName,headRefName,additions,deletions,changedFiles}'

echo ""
echo "=== Changed Files ==="
gh pr view $PR --repo $REPO --json files | jq -r '.files[].path'

Length of output: 2093


🏁 Script executed:

#!/bin/bash
set -euo pipefail

REPO="formbricks/hub"
PR=90

echo "=== Full Diff ==="
gh pr diff $PR --repo $REPO

Length of output: 135973



@xernobyl Sure! Here's the review.


Review: feat: enrich feedback records with translated text

Overall: This is a well-structured PR. The design mirrors the embedding pipeline closely, the test coverage is thorough (unit + integration), and the observability story is complete. I found one metrics inconsistency that could mislead alerting, one unbounded query, and a handful of minor polish items.


🐛 Bug — Metrics misclassify not-found as failed_final in the worker

File: internal/workers/feedback_translation.go

record, err := w.service.GetFeedbackRecord(ctx, args.FeedbackRecordID)
if err != nil {
    if w.metrics != nil {
        w.metrics.RecordWorkerError(ctx, "get_record_failed")
        w.metrics.RecordTranslationOutcome(ctx, "failed_final")   // ← emitted before the not-found check
        w.metrics.RecordTranslationDuration(ctx, time.Since(start), "failed_final")
    }
    ...
    if errors.Is(err, huberrors.ErrNotFound) {
        return nil   // ← job completes successfully
    }
    return fmt.Errorf("get feedback record: %w", err)
}

When the record has been deleted/purged, the job returns nil (correctly) but the metrics have already emitted get_record_failed + failed_final. This will fire failed_final alerts for a normal, expected condition. Compare with handleSetTranslationError which correctly emits skipped for the not-found case on the write side.

Suggested fix:

if errors.Is(err, huberrors.ErrNotFound) {
    if w.metrics != nil {
        w.metrics.RecordTranslationOutcome(ctx, "skipped")
        w.metrics.RecordTranslationDuration(ctx, time.Since(start), "skipped")
    }
    slog.Info("translation: record gone before read, skipping", "feedback_record_id", args.FeedbackRecordID)
    return nil
}
if w.metrics != nil {
    w.metrics.RecordWorkerError(ctx, "get_record_failed")
    w.metrics.RecordTranslationOutcome(ctx, "failed_final")
    w.metrics.RecordTranslationDuration(ctx, time.Since(start), "failed_final")
}
slog.Error("translation: get record failed", ...)
return fmt.Errorf("get feedback record: %w", err)

⚠️ Performance — ListTranslationBackfillTargets has no pagination

File: internal/repository/feedback_records_repository.go

const query = `
    SELECT fr.id, ts.settings->>'target_language'
    FROM feedback_records fr
    JOIN tenant_settings ts ON ts.tenant_id = fr.tenant_id
    WHERE fr.field_type = 'text'
        AND fr.value_text IS NOT NULL AND btrim(fr.value_text) <> ''
        AND COALESCE(ts.settings->>'target_language', '') <> ''
        AND fr.translation_lang_key IS DISTINCT FROM ts.settings->>'target_language'
    ORDER BY fr.id`

This is a full table scan with no LIMIT. On a large deployment with millions of text records, this loads them all into memory at once before the backfill loop begins. Consider chunking by cursor (WHERE fr.id > $1 ORDER BY fr.id LIMIT 1000) or at minimum documenting the behaviour so operators know to expect a slow, memory-heavy start for large databases.


⚠️ Config — MaxWorkers hardcoded to 1 in the API process

File: cmd/api/app.go (Block 1, the worker registration block)

queues[service.TranslationsQueueName] = river.QueueConfig{MaxWorkers: 1}

The worker app correctly uses cfg.Translation.MaxConcurrent (via NewRiverWorkersAndQueues), but the API process hardcodes MaxWorkers: 1, ignoring TRANSLATION_MAX_CONCURRENT. If the API process ends up processing jobs (which it can, per the comment), concurrency is silently capped at 1 regardless of operator config. Either use cfg.Translation.MaxConcurrent here too or document in .env.example that only hub-worker respects the concurrency setting.


🔍 Minor — SetTranslation accepts empty langKey with non-nil translated

File: internal/repository/feedback_records_repository.go

var langKeyArg any
if translated != nil {
    langKeyArg = langKey  // langKey could be ""
}

When translated != nil and langKey == "", the update stores an empty string in translation_lang_key alongside a non-nil value_text_translated. All current call sites pass a non-empty lang key, but a defensive guard would prevent a subtle data inconsistency:

if translated != nil {
    if langKey == "" {
        return fmt.Errorf("set feedback record translation: langKey must not be empty when translated is non-nil")
    }
    langKeyArg = langKey
}

🔍 Note — VARCHAR(35) may be tight for BCP-47 private-use tags

File: migrations/013_add_feedback_record_translation.sql

ALTER TABLE feedback_records ADD COLUMN translation_lang_key VARCHAR(35);

IANA BCP-47 tags with private-use extensions (e.g. zh-Latn-TW-pinyin-x-internal-variant) can exceed 35 characters. For the supported providers and standard locale codes this is fine, but bumping to VARCHAR(63) or TEXT at migration time avoids a future schema change — migrations on large tables are expensive.


🔍 Note — Potential index gap for backfill query

File: migrations/013_add_feedback_record_translation.sql

The ListTranslationBackfillTargets query filters on field_type = 'text' and joins on tenant_id. If a partial index on (tenant_id, field_type) WHERE field_type = 'text' doesn't already exist, the backfill will do a sequential scan of the full feedback_records table. Worth adding to this migration or a follow-up.


✅ Highlights

  • BCP-47 script comparison (sameLanguageAndScript) correctly handles zh-Hanszh-Hant and und — good edge-case coverage.
  • Content hash (translationContentHash) includes source language so a language-correction re-enqueues without flooding the queue.
  • No enrichment loopSetTranslation explicitly does not publish an event. Clean.
  • feedbackRecordColumns constant — eliminates the scan-order drift risk elegantly.
  • CachedTenantSettings error-non-caching behaviour tested explicitly.
  • Test coverage — unit + integration + backfill targets + pipeline end-to-end.
✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai

coderabbitai Bot commented Jun 23, 2026

Copy link
Copy Markdown

Review Change Stack

Walkthrough

This PR adds a feedback record translation pipeline. A database migration adds value_text_translated and translation_lang_key columns to feedback_records. New TranslationConfig and TenantSettingsCacheConfig types are added to the application config. OpenAI and Google AI clients gain Translate methods; a factory creates provider-agnostic TranslationClient instances. A FeedbackTranslationInserter interface is consolidated into a shared RiverJobInserter. The FeedbackRecordsRepository gains SetTranslation and ListTranslationBackfillTargets methods, and the service layer gains SetTranslation, BackfillTranslations, and an event-driven TranslationProvider backed by a TTL LRU CachedTenantSettings. A FeedbackTranslationWorker processes River jobs with BCP-47 language matching, copy semantics, and retry/failure mapping. Five new OTel metrics track the pipeline. Both the API and worker apps conditionally wire translation support, and a new backfill-translations CLI enqueues jobs for records missing translations.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 41.03% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The pull request title 'feat: enrich feedback records with translated text' clearly and concisely summarizes the primary change. It follows Conventional Commits format and accurately reflects the main objective of adding translation enrichment to feedback records.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description check ✅ Passed The PR description provides comprehensive, well-structured documentation covering objectives, implementation details, testing approach, and a completed checklist.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 14

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cmd/api/app.go`:
- Line 306: The TranslationsQueueName queue configuration is hardcoded to
MaxWorkers: 1, which ignores the configured translation concurrency setting.
Replace the hardcoded value 1 with cfg.Translation.MaxConcurrent in the
QueueConfig for service.TranslationsQueueName so that the translation queue
respects the configured concurrency level from the configuration.

In `@cmd/backfill-translations/main.go`:
- Around line 118-120: The BackfillTranslations method in feedbackRecordsService
is loading all target records into memory before enqueueing, causing memory
exhaustion on large tenants. Refactor the BackfillTranslations method to
implement paginated or cursor-based reads of records instead of materializing
the entire result set at once, and enqueue records in configurable batch sizes
rather than all at once. Modify the method signature to accept batch size and
offset/cursor parameters, then update the call site where BackfillTranslations
is invoked (in the main function where it processes ctx, riverClient,
service.TranslationsQueueName, and maxAttempts) to handle multiple iterations
with pagination until all records are processed.

In `@internal/config/config.go`:
- Around line 326-334: The condition checking cfg.TenantSettingsCache.Size in
the if statement is using <= 0, which treats an explicitly set value of 0 (meant
to disable the cache) the same as an unset value, causing it to be reset to
2048. Change the condition from <= 0 to < 0 so that when
TenantSettingsCache.Size is explicitly set to 0, it is honored and disables the
cache, while uninitialized negative values still get the default of 2048
applied.

In `@internal/observability/aggregate.go`:
- Line 21: The doc comment for the NewMetrics function on line 21 is incomplete
and does not mention that the function also creates and returns CacheMetrics.
Update the comment to include CacheMetrics in the list of metrics that
NewMetrics creates, ensuring the documentation accurately reflects what the
function actually returns based on the implementation at lines 49-52.

In `@internal/repository/feedback_records_repository.go`:
- Around line 193-224: The query in this method loads all eligible translation
backfill targets into memory at once, which can cause excessive memory
consumption on large tenants. Add a LIMIT clause to the SQL query to process
results in bounded batches, and modify the method to implement pagination by
adding an offset parameter or cursor-based pagination so callers can iterate
through results in smaller chunks rather than loading everything at once. The
targets slice accumulation in the for loop should reflect this batched approach.
- Around line 155-170: The SetTranslation method in FeedbackRecordsRepository
currently does not validate that langKey is non-empty when a translated value is
being set. Add a validation guard that checks if translated is non-nil and
ensures langKey is non-empty before proceeding with the database operation. If
langKey is empty when translated is provided, return an appropriate error to
prevent writing inconsistent translation metadata to the database.

In `@internal/service/feedback_records_service.go`:
- Around line 354-360: The BackfillTranslations method in FeedbackRecordsService
currently loads all translation targets into memory at once by calling
ListTranslationBackfillTargets, which causes O(n) memory usage and process
stalling for large datasets. Refactor the repository contract to support
cursor-based pagination with a batch limit (e.g., using id > cursor LIMIT
batchSize pattern), then update BackfillTranslations to loop through targets
batch-by-batch, enqueueing jobs for each batch before fetching the next batch.
This streaming approach will process data incrementally without materializing
the entire result set in memory.
- Around line 131-139: In the SetTranslation method of FeedbackRecordsService,
add validation to ensure that langKey is not empty before allowing the
translation to be persisted. Before calling s.repo.SetTranslation, check if
langKey is empty and return an appropriate error if it is, preventing
inconsistent translation states where a translation exists without a valid
language key. This validation should occur at the beginning of the method,
before any repository calls.

In `@internal/service/translation_client_factory_test.go`:
- Around line 9-46: The test table in TestValidateTranslationConfig is missing
regression test cases for whitespace-only inputs. Add three new test cases to
the tests slice to verify that whitespace-only strings in ProviderAPIKey,
GoogleCloudProject, and GoogleCloudLocation fields are properly rejected. For
each field, create a test case that sets the relevant field to a whitespace-only
string (e.g., "   ") with a Provider that would normally require that field, and
set wantErr to the appropriate validation error constant to ensure the
validation catches these edge cases.

In `@internal/service/translation_client_factory.go`:
- Around line 165-174: Trim whitespace from configuration fields before
validation to fail fast on misconfigured values. In the first validation check
for RequiresAPIKey, trim cfg.ProviderAPIKey before comparing it to an empty
string. In the second validation check for RequiresGoogleGeminiConfig, trim both
cfg.GoogleCloudProject and cfg.GoogleCloudLocation before comparing them to
empty strings. This ensures that whitespace-only values are caught during
startup validation rather than failing later at runtime.

In `@internal/workers/feedback_translation_test.go`:
- Around line 205-212: The test TestFeedbackTranslationWorker_NotFoundCompletes
only verifies that the Work method returns nil when a not-found error occurs,
but it does not validate the metrics emitted in this scenario. Add a metrics
assertion to verify that the skipped metric (not failed_final) is recorded when
the worker encounters a not-found error during the get-record operation. This
requires passing a metrics mock or spy to the worker constructor and asserting
on the recorded metrics after calling Work.

In `@internal/workers/feedback_translation.go`:
- Around line 58-71: The issue is that failure metrics (RecordWorkerError,
RecordTranslationOutcome with "failed_final", and RecordTranslationDuration with
"failed_final") are recorded before checking if the error is a not-found error.
When a GetFeedbackRecord call returns ErrNotFound, this is a benign case that
should not be recorded as a terminal failure. Move the check for errors.Is(err,
huberrors.ErrNotFound) before recording any failure metrics, so that not-found
errors are handled separately without triggering the failed_final metrics
recording. Only record the failure metrics when the error is not a not-found
error.

In `@migrations/013_add_feedback_record_translation.sql`:
- Line 8: The translation_lang_key column in the ALTER TABLE feedback_records
statement is defined as VARCHAR(35), which is too restrictive for valid BCP-47
locale tags that include extensions or private-use subtags, potentially causing
write failures at runtime. Increase the column length by changing VARCHAR(35) to
VARCHAR(63) or TEXT to accommodate the full range of valid locale tag formats.

In `@tests/feedback_translation_test.go`:
- Around line 72-77: The test is currently only asserting that
ValueTextTranslated is cleared to nil when SetTranslation is called with nil,
but it does not verify that TranslationLangKey is also cleared. Add an assertion
after the existing assert.Nil call for cleared.ValueTextTranslated to also
assert that cleared.TranslationLangKey is nil, ensuring the test validates the
complete clear contract. Apply the same fix to the other clear-path test
mentioned in the comment (around lines 250-253).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 37f4d97c-a480-41e1-913f-311315c0145b

📥 Commits

Reviewing files that changed from the base of the PR and between 2e9d171 and 315b959.

📒 Files selected for processing (35)
  • .env.example
  • cmd/api/app.go
  • cmd/backfill-translations/main.go
  • cmd/worker/app.go
  • internal/config/config.go
  • internal/googleai/client.go
  • internal/models/feedback_records.go
  • internal/observability/aggregate.go
  • internal/observability/cache.go
  • internal/observability/names.go
  • internal/observability/translation.go
  • internal/observability/translation_test.go
  • internal/openai/client.go
  • internal/repository/feedback_records_repository.go
  • internal/repository/taxonomy_repository.go
  • internal/service/embedding_job_args.go
  • internal/service/embedding_provider.go
  • internal/service/feedback_records_service.go
  • internal/service/feedback_records_service_test.go
  • internal/service/job_inserter.go
  • internal/service/tenant_settings_cache.go
  • internal/service/tenant_settings_cache_test.go
  • internal/service/translation_client.go
  • internal/service/translation_client_factory.go
  • internal/service/translation_client_factory_test.go
  • internal/service/translation_client_test.go
  • internal/service/translation_job_args.go
  • internal/service/translation_provider.go
  • internal/service/translation_provider_test.go
  • internal/workers/feedback_translation.go
  • internal/workers/feedback_translation_test.go
  • internal/workers/wiring.go
  • migrations/013_add_feedback_record_translation.sql
  • openapi.yaml
  • tests/feedback_translation_test.go
💤 Files with no reviewable changes (1)
  • internal/service/embedding_job_args.go

Comment thread cmd/api/app.go
Comment thread cmd/backfill-translations/main.go
Comment thread internal/config/config.go Outdated
Comment thread internal/observability/aggregate.go Outdated
Comment thread internal/repository/feedback_records_repository.go
Comment thread internal/service/translation_client_factory.go
Comment thread internal/workers/feedback_translation_test.go
Comment thread internal/workers/feedback_translation.go Outdated
Comment thread migrations/013_add_feedback_record_translation.sql
Comment thread tests/feedback_translation_test.go
…y guard

Worker: a not-found GetFeedbackRecord is a benign delete/purge race, not a terminal
failure. Record it as "skipped" (consistent with the not-found-on-write path) instead
of "failed_final" plus a worker error, so it no longer trips failure alerts.

Config: honor an explicit TENANT_SETTINGS_CACHE_SIZE=0 to disable the cache (the
documented behavior; NewCachedTenantSettings already treats size <= 0 as "no caching").
The previous `<= 0` default reset 0 to 2048, so disable could never take effect — now
the size is defaulted only when the env var is unset.

Service: SetTranslation rejects a (translated, "") pair via ErrTranslationLangKeyRequired
so a translation can never persist without the locale it was produced in; clearing (nil
translation) still passes through.

Also: clear-path integration tests now assert translation_lang_key is nulled too, and the
NewMetrics doc comment lists CacheMetrics.

ENG-1255
@xernobyl

Copy link
Copy Markdown
Contributor Author

Thanks for the thorough pass. Addressed in eda00c4. Summary of what changed and what I intentionally left for parity with the embedding pipeline.

Fixed

  • Worker: not-found GET no longer recorded as failed_final. A not-found GetFeedbackRecord is a benign delete/tenant-purge race, not a terminal failure — and it was inconsistent with the not-found-on-write path (which already records skipped). The check now runs before the failure metrics and records skipped (no worker error), so it won't trip failure alerts. Added a metrics assertion to TestFeedbackTranslationWorker_NotFoundCompletes.
  • Config: TENANT_SETTINGS_CACHE_SIZE=0 now disables the cache. NewCachedTenantSettings already treats size <= 0 as "no caching", but if Size <= 0 { = 2048 } reset an explicit 0, so the documented disable could never take effect. The size is now defaulted only when the env var is unset (os.LookupEnv). Added a config test.
  • Service: SetTranslation rejects a (translated, "") pair via a new ErrTranslationLangKeyRequired so a translation can't persist without the locale it was produced in (clearing — nil translation — still passes through). Added at the service layer (domain boundary); the repository stays a thin mapper. Added a unit test.
  • Tests: clear-path assertions now also check translation_lang_key is nulled (both the repo and worker-pipeline integration tests).
  • Doc: NewMetrics comment now lists CacheMetrics.

Left as-is (deliberate parity with the embedding pipeline)

  • cmd/api translation queue MaxWorkers: 1 — the API process is insert-only (no river.Start); the embedding queue is configured identically (same file). Worker concurrency is set in cmd/worker via cfg.Translation.MaxConcurrent. Changing it here would have no effect and would diverge from embeddings.
  • Backfill loads all targets into memoryBackfillEmbeddings does the same ([]uuid); the rows are tiny (uuid + short locale) and this is a one-off CLI. Tracked as a follow-up to paginate both backfill commands together rather than diverge here.
  • No whitespace trim in ValidateTranslationConfigValidateEmbeddingConfig doesn't trim either, and a whitespace-only secret fails clearly at the first provider call. Kept the sibling validators consistent.
  • translation_lang_key VARCHAR(35) — deliberately matches maxTargetLanguageLen = 35, the validated cap on target_language in tenant settings (the only source of this column). It's DB-layer enforcement of the same bound; a value too long to be a valid target_language can never reach here.

Follow-ups (separate)

  • The embedding worker has the same not-found-at-GET classification (feedback_embedding.go) — will fix off main.
  • Cursor-based pagination for both backfill commands.

xernobyl added 2 commits June 23, 2026 15:24
Changing a tenant's target_language previously had no effect on existing feedback
records — only newly created/edited records were translated, and refreshing the
backlog required the global backfill CLI. A settings change now automatically
triggers a per-tenant re-translation backfill.

Design (clean separation; generalizes to future enrichment settings):
- TenantSettingsService gains a translation-free SettingsChangeListener port and
  fires it after a successful write with the keys that changed (PUT: all settable
  keys; PATCH: keys present, incl. a null removal; skipped on a no-op PATCH). A
  listener issue never fails the settings write.
- EnrichmentSettingsListener (adapter) maps a changed key to the enrichment backfill
  it triggers via a config-built handler map: target_language enqueues a durable
  TenantTranslationBackfillArgs job. Not routed through the webhook-coupled, lossy
  event bus. Adding a future setting (e.g. sentiment_enabled) is one map entry.
- TenantTranslationBackfillWorker keyset-paginates the tenant's stale records and
  enqueues the existing per-record FeedbackTranslationArgs jobs, off the request
  path. Unique by TenantID (one in-flight backfill per tenant); crash-safe via the
  idempotent "translation_lang_key IS DISTINCT FROM target_language" query.
- Repo ListTranslationBackfillTargetsForTenant (keyset, shares SQL with the global
  query) + service BackfillTranslationsForTenant.

Clearing target_language leaves existing translations in place (the query's
non-empty guard makes the triggered backfill a no-op). The global backfill CLI
remains the guaranteed recovery path.

ENG-1255
BackfillTranslations materialized every eligible target across all tenants in one
slice; on a large deployment that is a memory spike when the CLI runs. It now keyset-
paginates (id > cursor LIMIT n) like the per-tenant backfill, with the shared loop
factored into backfillTranslationsPaged so both paths stream identically.

ENG-1255
A provider 429 was treated as a generic error, so the translation worker
burned through River's retry attempts on the fast backoff and dropped the
work as failed_final, ignoring the provider's Retry-After / RetryInfo hint.

The openai and google clients now classify a 429 (RESOURCE_EXHAUSTED) as a
shared huberrors.RateLimitError carrying the provider's retry-after hint, and
the worker snoozes for that delay (river.JobSnooze) instead of failing.
Snoozing re-queues without consuming an attempt, so a burst against a
rate-limited model defers rather than drops work. The delay is clamped
(5s-5min, default 30s) and a per-job window (1h) bounds indefinite snoozing
against a standing quota; past it the job fails normally and a backfill
recovers it. A rate_limited worker-error metric records the deferral.
@xernobyl xernobyl marked this pull request as ready for review June 23, 2026 17:14
Comment thread internal/workers/feedback_translation.go
…tion

A feedback translation job carries the tenant's target language captured at
enqueue time. If the target changed (or was read from a stale settings cache),
an out-of-order older job could finish after a newer-target job and overwrite
value_text_translated / translation_lang_key with the stale target.

SetTranslation now persists only while the tenant's current target_language
still matches the job's target (atomic UPDATE ... FROM tenant_settings); a
stale write matches no row and returns huberrors.ErrTranslationSuperseded,
which the worker records as a benign skip. The clear path stays unconditional.

Adds a worker unit test, an end-to-end worker/repo regression test, and a
distinct "superseded" worker-error metric label.
@xernobyl xernobyl merged commit 4b0575d into feat/ENG-1254_tenant-settings-hub-language-enrichment Jun 25, 2026
12 checks passed
@xernobyl xernobyl deleted the feat/ENG-1255_translate-feedback-records branch June 25, 2026 10:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants