fix(gateway): deduplicate PlatformConn to prevent content multiplication#12
Closed
hrygo wants to merge 1 commit into
Closed
fix(gateway): deduplicate PlatformConn to prevent content multiplication#12hrygo wants to merge 1 commit into
hrygo wants to merge 1 commit into
Conversation
…content multiplication Bridge.Handle calls JoinPlatformSession on every message, creating a new pcEntry wrapper each time. Since each pcEntry is a distinct map key, N messages result in N copies of every event being routed to the platform connection (2x, 4x, 8x... duplication). Additionally, when a worker finishes quickly, the done event can arrive while EnsureCard is still in PhaseCreating. Close() tried to transition to PhaseCompleted but this was not allowed from PhaseCreating, so the final content flush was silently dropped. Root causes: - JoinPlatformSession did not deduplicate identical PlatformConns - PhaseCreating did not allow transition to PhaseCompleted Fixes #11
hrygo
pushed a commit
that referenced
this pull request
May 26, 2026
WARN #5/#16 — Add migrations-postgres/README.md explaining gaps (003=SQLite PRAGMAs, 008=SQLite event store optimize — PG-only skip) WARN #24 — Strip trailing semicolon before appending RETURNING id in eventstore turns.insert PG rebind (prevents syntax error) WARN #12 — Update env.example and config.yaml DSN examples from sslmode=disable → sslmode=prefer
hrygo
added a commit
that referenced
this pull request
May 27, 2026
* ✨ feat(db): add PostgreSQL dual-database support via dbutil.Dialect abstraction Add opt-in PostgreSQL backend while preserving SQLite as the default. A thin dbutil.Dialect layer (5 methods, 120-line Rebind state machine) isolates all SQL dialect differences — no ORM, no existing interface changes. Architecture: - internal/dbutil/ — Dialect type + Rebind($1..$N) state machine + DB wrapper - DBConfig split into Driver + SQLiteConfig + PostgresConfig sub-structs - sqlutil.WriteMu becomes no-op on PG (MVCC handles concurrency natively) - 9 PG migration files in migrations-postgres/ alongside SQLite originals - 5 PG Store implementations (session/cron/eventstore/chat_access/api_key) - gateway_run.go branches on db.driver: "sqlite" | "postgres" Key design decisions: - Dialect is a string constant type, not interface - Rebind uses 6-state automaton handling string literals, quotes, $$, comments - Existing Store interfaces zero-change - Phase 0 extracted 3 missing interfaces: ChatAccessStorer, APIKeyUserStorer, DBExecutor Stats: 46 files, +2103/-71, go build clean, 33/34 test suites pass Closes: #487 * 📝 docs: improve AGENTS.md coverage and remove stale line counts - Create AGENTS.md for internal/cron/ (timer engine, 3 schedule types, YAML import, backoff retry, attached session dispatch) - Create AGENTS.md for internal/dbutil/ (WriteMu serialization, PRAGMA tuning, dialect abstraction, rebind) - Add missing bot_registry.go and config.go to messaging/AGENTS.md - Remove line counts from STRUCTURE sections across all subdirectory AGENTS.md files to prevent staleness (7 files affected) * 🐛 fix(db): fix 5 issues in PostgreSQL dual-database support - Fix nil interface trap in akStore constructors (typed nil pointer stored in interface caused panic in nil checks) - Eliminate double PG connection in NewPGStore (accept shared *dbutil.DB instead of opening its own) - Wire PG admin store through DI (export NewAPIKeyUserPGStore, pass via GatewayDeps/APIKeyStore) - Fix openPostgres DSN source (use cfg.DSN() instead of cfg.Path) and honor Postgres.MaxOpenConns config - Fix turns table success column type from INTEGER to BOOLEAN * 🐛 fix(db): address 6 code review findings F1 - Prevent double-close in gatewayStores: make PGStore.Close() a no-op (gatewayStores.close() already handles s.db.Close()) F2 - Fix Validate() for PG-only configs: guard db.path check with driver=sqlite gate, check both legacy Path and SQLite.Path F3 - Fix BeginTx context cancellation: remove defer cancel() that violated database/sql contract for PG eventstore transactions F4 - Fix SQLite init: use cfg.SQLite.Path with cfg.Path fallback in dbutil/openSQLite() and session/stores.go F5 - Fix CLI OpenStore: branch on db.driver to support PostgreSQL cron commands (client.go, cron_cmd.go, cron_history.go) F6 - Fix migration 010: replace DROP TABLE IF EXISTS with CREATE TABLE IF NOT EXISTS pattern Also: add jackc/pgx/v5 stdlib import to sqlutil/driver.go update AGENTS.md PostgreSQL status line update config tests for structured DBConfig validation * 🐛 fix(db): address PR #490 review blocking issues Fix 2 ship-blocking issues from hotplex-ai review: 1. nil cache invalidator — APIKeyUserPGStore now exported and accepts dbResolver via SetInvalidator(); gateway_run.go wires it after init 2. apikey_pg_store create() hardcoded $N — uses dialect.Rebind() now Plus 3 WARN fixes: - Remove dead code var _ = (*sql.DB)(nil) from dialect.go - Wrap error in apikey_pg_store get() with context message - Export apiKeyUserPGStore → APIKeyUserPGStore * 🎨 style(db): address PR #490 WARN items - Use testify/require in rebind_test.go (was t.Errorf) — wraps table-driven tests in t.Run + require.Equal/require.True - Add t.Parallel() to all db_test.go test functions - Add TestDialectConstantsSync — compile-time check that dbutil and sqlutil dialect constants match - Update session/AGENTS.md: pgstore.go stub → pg_store.go full PG impl - Merge dual switch in migrate.go into single switch - Add ConnMaxLifetime(5min) + PingContext validation on PG pool open - Log warning when using default DSN (sslmode=disable) - Remove dead code var _ = (*sql.DB)(nil) from dialect.go * 🐛 fix(brain): remove dead nil checks in extractor_test.go NewClaudeCodeExtractor() and NewOpenCodeExtractor() always allocate and return non-nil pointers. The nil checks triggered SA5011 false positives in staticcheck. Remove the dead nil-check branch and unused extractor variable. * 🎨 style(db): address PR #490 round-3 WARN items W2 - Wrap errors in session PGStore GetExpiredMaxLifetime/GetExpiredIdle with fmt.Errorf (was raw err, inconsistent with DeleteTerminated) W3 - Add t.Parallel() to all 9 test functions in rebind_test.go (pure string functions, safe to parallelize) W4 - Extract DBConfig.EffectiveSQLitePath() to eliminate legacy path fallback duplication in dbutil/db.go + session/stores.go W7 - Set ConnMaxIdleTime(5min) in openPostgres alongside ConnMaxLifetime (was infinite → stale connections on PG restart) W12 - Add sync.Once to APIKeyUserPGStore.SetInvalidator() (prevents data race on hot-reload) * 🐳 feat(docker): add PostgreSQL init + multi-DB Docker Compose setup - Add docker/postgres-init.sql — creates hotplex DB and pgcrypto extension - Add docker/docker-entrypoint.sh — dual-mode entrypoint (gateway or cron) - Update Dockerfile — multi-stage build, pgx driver, healthcheck - Update docker-compose.yml — postgres service, healthcheck, env vars - Update docker-compose.prod.yml — production PG config with volume - Update configs/env.example — add PG DSN and db.driver examples - Update .dockerignore — exclude sql files from build context * 🔒 fix(db): address PR #490 5th-round MUST FIX items MUST FIX 1 — Default PG DSN sslmode=disable → sslmode=prefer - PostgresConfig.DSN() default now uses sslmode=prefer (was =disable) - Eliminate duplicate default DSN in dbutil/db.go openPostgres (cfg.DSN() already provides the default; the hasDefaultDSN check was dead code since DSN() never returned empty) - Detect default via cfg.Postgres.ConnStr == "" instead MUST FIX 2 — CLAUDE.md documentation sync - Session: add pg_store.go (PostgreSQL persistence) - sqlutil/: mention jackc/pgx/v5 PG driver + WriteMu PG no-op - Add new dbutil/ entry (Dialect, Rebind, BoolValue, DB wrapper) to support module list WARN — SetInvalidator via type assertion - Add SetInvalidator() to APIKeyUserStorer interface - Implement on both SQLite and PG stores - Replace type assertion in gateway_run.go with interface call * 🐳 feat(docker): add dedicated PG compose file + refine Docker configs - Add docker-compose.pg.yml — PostgreSQL-only stack for testing - Refine Dockerfile, docker-compose.yml, docker-compose.prod.yml - Update docker-entrypoint.sh for PG DSN env injection * ✅ test(db): add sqlmock tests for all 5 PG stores (round 6 MF1) MF1 - 5 PG store test files, 24 test cases, zero → covered New files: - internal/session/pg_store_test.go (6 tests) - internal/cron/pg_store_test.go (6 tests) - internal/eventstore/pg_store_test.go (3 tests) - internal/admin/apikey_pg_store_test.go (4 tests) - internal/messaging/chat_access_pg_store_test.go (5 tests) MF3 - Remove root user override from docker-compose.yml (Dockerfile already uses USER hotplex + COPY --chown) Pattern: go-sqlmock + testify/require + t.Parallel() + regexp.QuoteMeta Coverage: success paths + error paths (NotExist, duplicate, ErrNoRows) * chore: update Dockerfile and docker-compose.prod.yml * chore: update Dockerfile and docker-compose.prod.yml * 🔧 fix(db): address PR #490 WARN items (round 6) WARN #5/#16 — Add migrations-postgres/README.md explaining gaps (003=SQLite PRAGMAs, 008=SQLite event store optimize — PG-only skip) WARN #24 — Strip trailing semicolon before appending RETURNING id in eventstore turns.insert PG rebind (prevents syntax error) WARN #12 — Update env.example and config.yaml DSN examples from sslmode=disable → sslmode=prefer * fix(db): fix PostgreSQL int4 overflow, banner display, and config binding - Fix INTEGER→BIGINT for timestamp columns in PG migrations (002,005,007,009) - Add migration 012 to ALTER existing tables with BIGINT timestamps - Fix startup banner to show "PostgreSQL" instead of SQLite path when using PG - Add BindEnv for db.driver and db.postgres.* config fields - Fix ConnMaxIdleTime from 5min to 3min - Add EffectiveMaxOpenConns bridge method - Add Makefile dev-pg target for PostgreSQL dev environment - Fix docker-compose security and PG config issues * feat(db): add db-stats skill manual with go:embed integration Add database awareness and statistics analysis manual (db-stats-skill-manual.md): - 4-step database detection: process → env vars (incl. MAKEFLAGS) → .env → config - Complete schema reference for all 6 tables (SQLite/PG type differences) - 9 categories of analytics SQL templates with index-aware optimization - Fix 7 SQL issues found in audit: index prevention, sort order, PG BOOLEAN, JOIN optimization Integrate via go:embed pattern (matching cron skill manual): - internal/dbutil/skill.go: embed + SkillManual() - gateway_run.go: release to ~/.hotplex/skills/db-stats.md on startup - META-COGNITION.md: §8 B-channel directive for mandatory pre-read * refactor: streamline db-stats META-COGNITION entry and add conflict rule * refactor: scope db-stats directive to HotPlex operational data only * fix: address PR #490 review items — security, correctness, fail-fast - P0: Remove POSTGRES_ vars from envsubst to prevent password leaking into on-disk config - IsUniqueViolation: replace fragile string matching with pgx type assertion (errors.As) - apiKeyUserStore: add SQLite-only doc comment, fix LastInsertId error handling - openPostgres: fail-fast when DSN is empty instead of using insecure default - pgStore SetInvalidator: replace sync.Once with mutex to avoid silently dropping updates * fix: address PR #490 round-7 review — correctness, security, dead code - Fix env var name: HOTPLEX_DB_POSTGRES_CONNSTR → HOTPLEX_DB_POSTGRES_DSN - Add mutex to apiKeyUserStore (parity with pgStore, prevents data race) - Wrap pgStore.create error with fmt.Errorf for consistency - Add Effective* bridge methods for all SQLite pragma config fields - pragma.go: use Effective* methods instead of flat legacy fields - envsubst: replace prefix grep with explicit allowlist (YAML injection fix) - Remove dead openPostgresDB from sqlutil (dbutil.Open is the PG path) - Add writeMu nil comment for PG path in gatewayStores - Config.Validate: add PG DSN required check when driver=postgres - .dockerignore: restore configs/*.yaml exclusion, keep config.yaml * fix: remove redundant ON CONFLICT SET id in UpsertByName PG preserves conflict row columns not in SET clause; id = cron_jobs.id was a no-op. state and created_at kept as explicit runtime-state guards. * fix: address PR #490 round-8 review — writeMu serialization, CLI writeMu, migration safety, DSN cleanup P1: apiKeyUserStore write/create/update/delete now wrapped with writeMu.WithLock() for SQLite serialization P1: CLI cron path creates writeMu instead of passing nil to session/cron/event stores P1: PG migration 009 adds IF NOT EXISTS for idempotent re-runs P2: PostgresConfig.DSN() returns empty string when unconfigured instead of misleading default P3: EffectiveWALMode zero-value ambiguity documented * fix: address PR #490 round-9 review — PG BOOLEAN scan, CLI migrations, envsubst cleanup P1: Add scanJobRowPG for PostgreSQL BOOLEAN→bool scanning (pgx returns bool, not int) P1: CLI OpenStore PG path now runs goose migrations before creating stores P2: Remove HOTPLEX_DB_POSTGRES_DSN from envsubst allowlist (Viper handles it) P2: NewWriteMu empty dialect defaults to SQLite for consistent nil-safe behavior * fix: migration 012 Down path — prevent BIGINT truncation to INTEGER The Down migration used TYPE INTEGER which would silently truncate Unix ms timestamps (~1.7×10¹² exceeds int4 max). Replaced with no-op since reverting to INTEGER is unsafe. * fix: address PR #490 round-10 review — wrap all bare return nil, err with fmt.Errorf P2: Add fmt.Errorf("...: %w", err) context to 8 bare error returns across dbutil, cron, eventstore, and session PG stores per CLAUDE.md error convention. --------- Co-authored-by: 黄飞虹 <aaronwong1989@gmail.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes #11 — Feishu streaming card content duplication bug where responses appear 2x, 4x, 8x... multiplied after N messages.
Root Causes
JoinPlatformSessioncreates duplicatepcEntrywrappers —Bridge.HandlecallsJoinPlatformSessionon every incoming message, but each call creates a new&pcEntry{pc: pc}map entry. Since eachpcEntryis a distinct pointer (map key), N messages result in N event routing entries, causing N-fold content duplication.Close()silently dropped duringPhaseCreating— When a worker finishes quickly, thedoneevent arrives whileEnsureCardis still inPhaseCreating(doing HTTP round-trips).Close()tries to transition toPhaseCompleted, but this transition was not allowed fromPhaseCreating, so the final content flush was lost.Changes
internal/gateway/hub.go: Added deduplication check inJoinPlatformSession— iterates existing session entries to find an existingpcEntrywrapping the samePlatformConn, returning early if found.internal/messaging/feishu/streaming.go:PhaseCompletedtoPhaseCreating's allowed transitions in the phase state machineEnsureCard: after the card message is sent, ifClose()was already called (phase isPhaseCompleted), immediately flush optimized content via IM patch and returnTest Plan
make test-short— all packages passmake lint— 0 issues