feat(reader): add Reader Simulation Agent module #135
shenminglinyi merged 2 commits into shenminglinyi:master from
Conversation
📝 Walkthrough
Adds a reader-simulation subsystem: Pydantic schemas and DTOs, an async service that queries chapters and an LLM, a DB migration and repository for persistence, new FastAPI routes to run/list simulations and churn alerts, and unit tests for schemas/DTOs.
Sequence Diagram
sequenceDiagram
    participant Client
    participant API as "API /reader"
    participant Service as "ReaderSimulationService"
    participant Repo as "ReaderSimulationRepository"
    participant DB as "Database"
    participant LLM as "LLM Client"
    Client->>API: POST /reader/novels/{id}/chapters/{n}/simulate
    API->>Service: simulate(novel_id, chapter_number)
    Service->>Repo: fetch chapter & context
    Repo->>DB: SELECT chapter / summaries
    DB-->>Repo: chapter data
    Repo-->>Service: context
    Service->>Service: build prompt (3 personas)
    Service->>LLM: generate(prompt)
    LLM-->>Service: raw JSON text
    Service->>Service: sanitize & parse JSON
    Service->>Service: validate schema -> DTOs
    Service-->>API: ChapterReaderReportDTO
    API->>Repo: save(report) (best-effort)
    Repo->>DB: INSERT reader_simulations
    DB-->>Repo: record_id
    API-->>Client: 200 { report / meta }
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Actionable comments posted: 5
🧹 Nitpick comments (4)
infrastructure/persistence/database/reader_simulation_repository.py (2)
36-73: Consider `datetime.now(timezone.utc)` over deprecated `utcnow()`.
`datetime.utcnow()` is deprecated in Python 3.12+. While it still functions, consider using `datetime.now(timezone.utc)` for forward compatibility.
♻️ Suggested change
+from datetime import datetime, timezone
 ...
-            datetime.utcnow().isoformat(),
+            datetime.now(timezone.utc).isoformat(),
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@infrastructure/persistence/database/reader_simulation_repository.py` around lines 36 - 73, The save method in reader_simulation_repository uses the deprecated datetime.utcnow(); replace that call with datetime.now(timezone.utc) to produce an aware UTC timestamp before inserting (update the import to include timezone if needed) so in function save you should change datetime.utcnow().isoformat() to datetime.now(timezone.utc).isoformat() and ensure the module imports timezone from datetime.
109-120: Consider returning only the latest record per chapter for churn alerts.
Currently this returns all historical records above the threshold, which could include multiple records for the same chapter. If the intent is to show "which chapters currently have high churn risk," consider filtering to latest-per-chapter similar to `list_by_novel()`.
♻️ Suggested query to return latest-per-chapter only
 def get_high_churn_chapters(
     self, novel_id: str, threshold: float = 60.0,
 ) -> List[Dict]:
     """获取劝退风险高的章节(用于告警)。"""
     db = self._get_db()
     cursor = db.execute(
-        """SELECT * FROM reader_simulations
-           WHERE novel_id = ? AND avg_churn_risk >= ?
-           ORDER BY avg_churn_risk DESC""",
-        (novel_id, threshold),
+        """SELECT rs.* FROM reader_simulations rs
+           INNER JOIN (
+               SELECT novel_id, chapter_number, MAX(created_at) as max_created
+               FROM reader_simulations
+               WHERE novel_id = ?
+               GROUP BY novel_id, chapter_number
+           ) latest ON rs.novel_id = latest.novel_id
+               AND rs.chapter_number = latest.chapter_number
+               AND rs.created_at = latest.max_created
+           WHERE rs.avg_churn_risk >= ?
+           ORDER BY rs.avg_churn_risk DESC""",
+        (novel_id, threshold),
     )
     return [dict(row) for row in cursor.fetchall()]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@infrastructure/persistence/database/reader_simulation_repository.py` around lines 109 - 120, The method get_high_churn_chapters currently returns all historical rows above the threshold; change it to return only the latest record per chapter (like list_by_novel does) by modifying the SQL to filter to the most recent row per chapter_id (e.g., using a subquery or join that selects MAX(created_at) per novel_id+chapter_id) while still applying novel_id and avg_churn_risk >= threshold and ordering by avg_churn_risk DESC; update the method get_high_churn_chapters accordingly so it returns one dict per chapter (latest) instead of multiple historical records.
application/reader/services/reader_simulation_service.py (1)
97-105: Log the exception instead of silently passing.
Static analysis correctly flags the bare `except: pass`. While fetching `prev_summary` is optional, logging at DEBUG level aids troubleshooting.
♻️ Suggested change
 if self._knowledge_repo and prev_chapter:
     try:
         knowledge = self._knowledge_repo.get_by_novel_id(novel_id)
         if knowledge:
             ch_sum = knowledge.get_chapter(chapter_number - 1)
             if ch_sum:
                 prev_summary = ch_sum.summary or ""
-    except Exception:
-        pass
+    except Exception as e:
+        logger.debug("Failed to fetch prev chapter summary: %s", e)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@application/reader/services/reader_simulation_service.py` around lines 97 - 105, The bare except is swallowing errors when fetching prev_summary; replace it with catching Exception as e and log the exception at DEBUG so failures are visible. Update the block around self._knowledge_repo.get_by_novel_id(novel_id) / knowledge.get_chapter(chapter_number - 1) to use "except Exception as e:" and call the existing logger (e.g., self._logger.debug(...) or module logger via logging.getLogger(__name__).debug(...)) including context like novel_id/chapter_number and the exception, while leaving prev_summary behavior unchanged if an error occurs.
interfaces/api/v1/reader/__init__.py (1)
31-35: Narrow the knowledge-repo fallback and log the downgrade path.
The broad catch at Lines 34-35 masks unexpected runtime failures and silently drops knowledge context.
💡 Suggested refactor
 from interfaces.api.dependencies import get_chapter_repository, get_llm_service
 try:
     from interfaces.api.dependencies import get_knowledge_repository
     knowledge_repo = get_knowledge_repository()
-except Exception:
+except ImportError:
     knowledge_repo = None
+except Exception as e:
+    logger.warning("知识库依赖初始化失败,降级为无知识库模式: %s", e)
+    knowledge_repo = None
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@interfaces/api/v1/reader/__init__.py` around lines 31 - 35, The current broad except hides real errors when importing or calling get_knowledge_repository; narrow the fallback by catching only expected exceptions (e.g., ImportError, ModuleNotFoundError and the specific error(s) you expect from get_knowledge_repository such as AttributeError or a custom RepoNotAvailableError) and set knowledge_repo = None only in those cases, while re-raising any other Exception; also emit a log entry when falling back (include the exception message and context) so the downgrade path is visible—update the import/call around get_knowledge_repository and the knowledge_repo assignment accordingly and use the module logger (or logging.getLogger(__name__)) to record the fallback with exception details.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@infrastructure/persistence/database/reader_simulation_repository.py`:
- Around line 25-34: The ensure_table method currently calls
db.executescript(sql) but DatabaseConnection does not expose executescript;
modify ensure_table to call executescript on the underlying sqlite3.Connection
by using self._get_db().get_connection().executescript(sql) (locate the
ensure_table function and replace the db.executescript call with
get_connection().executescript). Keep the existing migration_path read and
encoding logic, and ensure you still call get_connection().executescript only
when migration_path.exists().
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 7801d62d-09ec-41d5-8c9e-9967113767ce
📒 Files selected for processing (11)
- application/reader/__init__.py
- application/reader/dtos/__init__.py
- application/reader/dtos/reader_feedback_dto.py
- application/reader/schema.py
- application/reader/services/__init__.py
- application/reader/services/reader_simulation_service.py
- infrastructure/persistence/database/migrations/add_reader_simulations.sql
- infrastructure/persistence/database/reader_simulation_repository.py
- interfaces/api/v1/reader/__init__.py
- interfaces/main.py
- tests/unit/application/services/test_reader_simulation_service.py
    if _repo is None:
        _repo = ReaderSimulationRepository()
        try:
            _repo.ensure_table()
        except Exception as e:
            logger.warning("读者模拟表初始化失败(首次使用时会自动重试): %s", e)
    return _repo
_get_repo() won’t actually retry table initialization after first failure.
At Line 48, _repo is set before ensure_table(). If Line 50 fails once, future calls skip initialization entirely, contradicting the retry message at Line 52.
💡 Suggested fix
def _get_repo() -> ReaderSimulationRepository:
global _repo
if _repo is None:
- _repo = ReaderSimulationRepository()
+ candidate = ReaderSimulationRepository()
try:
- _repo.ensure_table()
+ candidate.ensure_table()
except Exception as e:
logger.warning("读者模拟表初始化失败(首次使用时会自动重试): %s", e)
- return _repo
+ raise
+ _repo = candidate
+    return _repo
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
    if _repo is None:
        candidate = ReaderSimulationRepository()
        try:
            candidate.ensure_table()
        except Exception as e:
            logger.warning("读者模拟表初始化失败(首次使用时会自动重试): %s", e)
            raise
        _repo = candidate
    return _repo
🧰 Tools
🪛 Ruff (0.15.10)
[warning] 51-51: Do not catch blind exception: Exception
(BLE001)
[warning] 52-52: String contains ambiguous ( (FULLWIDTH LEFT PARENTHESIS). Did you mean ( (LEFT PARENTHESIS)?
(RUF001)
[warning] 52-52: String contains ambiguous ) (FULLWIDTH RIGHT PARENTHESIS). Did you mean ) (RIGHT PARENTHESIS)?
(RUF001)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@interfaces/api/v1/reader/__init__.py` around lines 47 - 53, The current
_get_repo() sets the module-global _repo before calling
ReaderSimulationRepository.ensure_table(), so if ensure_table() raises once
subsequent calls will return the already-set _repo and never retry; to fix,
instantiate the repository into a local variable (e.g., repo =
ReaderSimulationRepository()), call repo.ensure_table() inside the try/except,
and only assign the module-global _repo = repo after ensure_table() succeeds (or
alternatively, on exception reset _repo to None) so future calls will retry
initialization; reference the symbols _get_repo, _repo, and
ReaderSimulationRepository.ensure_table when locating the change.
    try:
        repo = _get_repo()
        repo.save(
            novel_id=novel_id,
            chapter_number=chapter_number,
            overall_readability=report.overall_readability,
            chapter_hook_strength=report.chapter_hook_strength,
            pacing_verdict=report.pacing_verdict,
            avg_scores=report._compute_avg_scores(),
            feedbacks_json=json.dumps(
                [f.to_dict() for f in report.feedbacks],
                ensure_ascii=False,
            ),
        )
    except Exception as e:
        logger.warning("读者模拟结果持久化失败: %s", e)

    return {"success": True, "data": report.to_dict()}

except Exception as e:
    logger.error("读者模拟失败 novel=%s ch=%d: %s", novel_id, chapter_number, e)
    raise HTTPException(status_code=500, detail=f"读者模拟分析失败: {e}")
POST simulate reports success even when DB persistence fails.
At Lines 89-92, save failures are logged but ignored, so the API returns success with non-persisted data.
💡 Suggested fix
# 持久化
try:
repo = _get_repo()
repo.save(
novel_id=novel_id,
chapter_number=chapter_number,
overall_readability=report.overall_readability,
chapter_hook_strength=report.chapter_hook_strength,
pacing_verdict=report.pacing_verdict,
avg_scores=report._compute_avg_scores(),
feedbacks_json=json.dumps(
[f.to_dict() for f in report.feedbacks],
ensure_ascii=False,
),
)
except Exception as e:
- logger.warning("读者模拟结果持久化失败: %s", e)
+ logger.error("读者模拟结果持久化失败 novel=%s ch=%d: %s", novel_id, chapter_number, e)
+ raise HTTPException(status_code=500, detail="读者模拟完成,但保存失败,请稍后重试") from e
return {"success": True, "data": report.to_dict()}
+ except HTTPException:
+ raise
except Exception as e:
logger.error("读者模拟失败 novel=%s ch=%d: %s", novel_id, chapter_number, e)
        raise HTTPException(status_code=500, detail=f"读者模拟分析失败: {e}")
🧰 Tools
🪛 Ruff (0.15.10)
[warning] 89-89: Do not catch blind exception: Exception
(BLE001)
[warning] 94-94: Do not catch blind exception: Exception
(BLE001)
[warning] 96-96: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@interfaces/api/v1/reader/__init__.py` around lines 75 - 97, The current code
swallows persistence errors in the inner try/except (repo = _get_repo();
repo.save(...)) by only logging via logger.warning, causing the API (return
{"success": True, "data": report.to_dict()}) to report success even when save
failed; change the inner except to propagate the failure (either re-raise the
exception or raise an HTTPException) so callers see the error: locate the
repo.save call and its surrounding try/except in
interfaces/api/v1/reader/__init__.py and replace the logger.warning branch with
code that raises the exception (or raises HTTPException with a 500 and message)
after logging, ensuring persistent save failures are not treated as successful
responses.
    except Exception as e:
        logger.error("读者模拟失败 novel=%s ch=%d: %s", novel_id, chapter_number, e)
        raise HTTPException(status_code=500, detail=f"读者模拟分析失败: {e}")
Avoid exposing internal exception text in HTTP 500 responses.
Returning str(e) / interpolated {e} at Lines 96/138/168/208 can leak backend internals. Use a stable client message and chain the original exception.
💡 Suggested fix pattern
- except Exception as e:
- logger.error("获取读者模拟记录失败: %s", e)
- raise HTTPException(status_code=500, detail=str(e))
+ except Exception as e:
+ logger.exception("获取读者模拟记录失败")
+    raise HTTPException(status_code=500, detail="服务器内部错误,请稍后重试") from e
Apply the same pattern to the other generic except Exception blocks in this file.
Also applies to: 136-138, 166-168, 206-208
🧰 Tools
🪛 Ruff (0.15.10)
[warning] 94-94: Do not catch blind exception: Exception
(BLE001)
[warning] 96-96: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@interfaces/api/v1/reader/__init__.py` around lines 94 - 97, The handlers log
the full exception and then interpolate its text into the HTTP 500 response
(e.g., logger.error(..., e) and raise HTTPException(... detail=f"...{e}")),
which leaks internals; instead, keep detailed error information in server logs
by using logger.exception(...) or logger.error(..., exc_info=True) and raise
HTTPException with a stable client-facing message (e.g., detail="读者模拟分析失败")
while chaining the original exception (raise HTTPException(status_code=500,
detail="读者模拟分析失败") from e). Apply this change to all generic except Exception
blocks that use the variable e in this file (the blocks around the logger.error
/ HTTPException uses).
        feedbacks_raw = r.get("feedbacks_json", "[]")
        try:
            feedbacks = json.loads(feedbacks_raw)
        except (json.JSONDecodeError, TypeError):
            feedbacks = []
        # 提取痛点汇总
        all_pain_points = []
        for fb in feedbacks:
            all_pain_points.extend(fb.get("pain_points", []))
Guard feedbacks_json shape before iterating pain points.
At Line 191, fb.get(...) assumes every item is a dict. Malformed-but-valid JSON can still trigger a 500 here.
💡 Suggested fix
for r in records:
feedbacks_raw = r.get("feedbacks_json", "[]")
try:
- feedbacks = json.loads(feedbacks_raw)
+ parsed = json.loads(feedbacks_raw)
except (json.JSONDecodeError, TypeError):
- feedbacks = []
+ parsed = []
+ feedbacks = [fb for fb in parsed if isinstance(fb, dict)] if isinstance(parsed, list) else []
# 提取痛点汇总
all_pain_points = []
for fb in feedbacks:
- all_pain_points.extend(fb.get("pain_points", []))
+ pain_points = fb.get("pain_points", [])
+ if isinstance(pain_points, list):
+            all_pain_points.extend(pain_points)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@interfaces/api/v1/reader/__init__.py` around lines 183 - 191, The code
assumes each item in feedbacks is a dict and calls fb.get(...), which can raise
when malformed JSON yields non-dict entries; before iterating, validate that
feedbacks is a list and inside the loop check each fb with isinstance(fb, dict)
(or skip/convert non-dicts to {}) and ensure fb.get("pain_points", []) returns a
list before extending all_pain_points; update the block that assigns feedbacks
(from feedbacks_raw) and the loop over feedbacks to perform these type guards
using the variables feedbacks_raw, feedbacks, fb, and all_pain_points.
Thanks to @JamesGoslings for the contribution! The Reader Simulation Agent module is well designed and fully matches the project's four-layer DDD architecture:
The reader-retention engine is a core feature direction of the project, so this PR is highly valuable. However, it cannot be merged yet due to merge conflicts. Please:
We will keep following this PR and look forward to merging it. Thanks again!
Simulate three reader personas (hardcore fan, casual reader, nitpicker)
evaluating each chapter across four dimensions (see the score roll-up sketch after this list):
- Suspense retention (悬疑保持度)
- Thrill score (爽感评分)
- Churn risk (劝退风险)
- Emotional resonance (情感共鸣度)
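To make the scoring concrete, here is a minimal roll-up sketch; the dataclass is an illustrative stand-in for the PR's DTOs, while the field names, the 30.0 churn default, and the `_compute_avg_scores()` role follow the review discussion below.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class DimensionScores:
    """Illustrative stand-in for ReaderDimensionScoresDTO."""
    suspense_retention: float = 50.0   # 悬疑保持度
    thrill_score: float = 50.0         # 爽感评分
    churn_risk: float = 30.0           # 劝退风险
    emotional_resonance: float = 50.0  # 情感共鸣度


def compute_avg_scores(per_persona: list[DimensionScores]) -> dict[str, float]:
    """Average each dimension across the reader personas."""
    dims = ("suspense_retention", "thrill_score", "churn_risk", "emotional_resonance")
    if not per_persona:
        return {f"avg_{d}": 0.0 for d in dims}
    return {f"avg_{d}": mean(getattr(p, d) for p in per_persona) for d in dims}


# avg_churn_risk is the value the churn-alerts endpoint compares to its threshold.
scores = [
    DimensionScores(churn_risk=20.0),  # hardcore fan
    DimensionScores(churn_risk=65.0),  # casual reader
    DimensionScores(churn_risk=70.0),  # nitpicker
]
print(compute_avg_scores(scores)["avg_churn_risk"])  # ~51.7
```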
New files:
- application/reader/ - Core service, schema, DTOs
- infrastructure/persistence/database/reader_simulation_repository.py
- infrastructure/persistence/database/migrations/add_reader_simulations.sql
- interfaces/api/v1/reader/ - REST API (4 endpoints)
- tests/unit/application/services/test_reader_simulation_service.py
API endpoints (example client call below):
- POST /api/v1/reader/novels/{id}/chapters/{n}/simulate
- GET /api/v1/reader/novels/{id}/chapters/{n}/simulation
- GET /api/v1/reader/novels/{id}/simulations
- GET /api/v1/reader/novels/{id}/churn-alerts
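A hypothetical client session against these endpoints; the base URL, novel id, and timeout are placeholders, while the `{"success": true, "data": ...}` envelope and the report fields match the router code quoted later in this review.

```python
import httpx

BASE = "http://localhost:8000/api/v1"  # placeholder host/port

# Run a simulation for chapter 3 (an LLM call, so allow a generous timeout).
resp = httpx.post(f"{BASE}/reader/novels/novel-123/chapters/3/simulate", timeout=120.0)
resp.raise_for_status()
report = resp.json()["data"]
print(report["overall_readability"], report["chapter_hook_strength"])

# Chapters whose simulations crossed the churn-risk threshold.
alerts = httpx.get(f"{BASE}/reader/novels/novel-123/churn-alerts").json()
```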
Force-pushed 4b20d3f to 539da91
Actionable comments posted: 4
♻️ Duplicate comments (1)
infrastructure/persistence/database/reader_simulation_repository.py (1)
25-34: ⚠️ Potential issue | 🔴 Critical
Call `executescript()` on the SQLite connection, not the wrapper.
At Line 34, `_get_db()` returns the `DatabaseConnection` wrapper from `infrastructure/persistence/database/connection.py`, and that type does not expose `executescript()`. This will raise `AttributeError` the first time table initialization runs. Use `db.get_connection().executescript(sql)` and commit on that connection instead.
Proposed fix
 def ensure_table(self) -> None:
     """确保表存在(首次调用时自动建表)。"""
     from pathlib import Path
     migration_path = (
         Path(__file__).parent / "migrations" / "add_reader_simulations.sql"
     )
     db = self._get_db()
     if migration_path.exists():
         sql = migration_path.read_text(encoding="utf-8")
-        db.executescript(sql)
+        conn = db.get_connection()
+        conn.executescript(sql)
+        conn.commit()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@infrastructure/persistence/database/reader_simulation_repository.py` around lines 25 - 34, ensure_table is calling executescript on the DatabaseConnection wrapper returned by _get_db(), which lacks executescript; instead call executescript on the underlying sqlite3 connection via db.get_connection().executescript(sql) and then commit that connection (e.g., db.get_connection().commit()) to persist the migration; update ensure_table to read the SQL, call executescript on db.get_connection(), and commit on that same connection.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@interfaces/api/v1/reader/__init__.py`:
- Around line 67-92: The handler currently persists and returns success even when ReaderSimulationService.simulate() returns a fallback empty DTO (e.g., _empty_report() for "chapter not found" or "empty content"), which creates fake history; change the flow after calling _get_service().simulate(...) to detect the fallback (by using an explicit flag or method on ChapterReaderReportDTO such as is_empty(), or by comparing to _empty_report()) and in that case do NOT call repo.save(...) and instead return an appropriate 4xx response (or raise an HTTP error) indicating the simulation failed; only perform repo.save(...) and return {"success": True, "data": report.to_dict()} when the report is a real analysis result.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 5e8c5c01-46be-441c-8b3a-f76de4fa30b4
📒 Files selected for processing (11)
- application/reader/__init__.py
- application/reader/dtos/__init__.py
- application/reader/dtos/reader_feedback_dto.py
- application/reader/schema.py
- application/reader/services/__init__.py
- application/reader/services/reader_simulation_service.py
- infrastructure/persistence/database/migrations/add_reader_simulations.sql
- infrastructure/persistence/database/reader_simulation_repository.py
- interfaces/api/v1/reader/__init__.py
- interfaces/main.py
- tests/unit/application/services/test_reader_simulation_service.py
✅ Files skipped from review due to trivial changes (1)
- infrastructure/persistence/database/migrations/add_reader_simulations.sql
🚧 Files skipped from review as they are similar to previous changes (1)
- interfaces/main.py
    @field_validator(
        "suspense_retention", "thrill_score", "churn_risk", "emotional_resonance",
        mode="before",
    )
    @classmethod
    def clamp_score(cls, value: object) -> float:
        """将评分归一到 0-100 范围。"""
        if value is None:
            return 50.0
        try:
            v = float(value)
        except (TypeError, ValueError):
            return 50.0
        return max(0.0, min(100.0, v))
Keep the churn_risk fallback aligned with its declared default.
This shared validator returns 50.0 for None/invalid input on every dimension, but churn_risk is declared with a default of 30.0. A malformed LLM value now gets interpreted as elevated churn, which can create false positives in the alert endpoint and persisted averages. Give churn_risk its own 30.0 fallback or split the validators by field.
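A minimal way to implement the suggested split, assuming Pydantic v2; the 50.0/30.0 fallbacks mirror the declared field defaults, and everything else is a sketch rather than the PR's exact schema.

```python
from pydantic import BaseModel, field_validator


def _clamp(value: object, fallback: float) -> float:
    """Clamp to 0-100; return the field's own default on None/invalid input."""
    if value is None:
        return fallback
    try:
        v = float(value)
    except (TypeError, ValueError):
        return fallback
    return max(0.0, min(100.0, v))


class ReaderDimensionScores(BaseModel):
    suspense_retention: float = 50.0
    thrill_score: float = 50.0
    churn_risk: float = 30.0
    emotional_resonance: float = 50.0

    @field_validator("suspense_retention", "thrill_score", "emotional_resonance", mode="before")
    @classmethod
    def _clamp_default_50(cls, value: object) -> float:
        return _clamp(value, 50.0)

    @field_validator("churn_risk", mode="before")
    @classmethod
    def _clamp_default_30(cls, value: object) -> float:
        return _clamp(value, 30.0)


assert ReaderDimensionScores(churn_risk="bogus").churn_risk == 30.0
```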
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@application/reader/schema.py` around lines 34 - 47, The shared validator
clamp_score (decorated on "suspense_retention", "thrill_score", "churn_risk",
"emotional_resonance") always returns 50.0 for None/invalid values, but
churn_risk should use its declared default 30.0; update the validation so
churn_risk gets 30.0 on None/invalid inputs—either by splitting validators
(create a separate clamp_churn_risk validator that returns 30.0 and apply it
only to "churn_risk") or by making clamp_score inspect the field name (via the
validator context or a separate decorated method) and return 30.0 when
validating "churn_risk" while keeping 50.0 for the others; adjust decorator
targets accordingly so other fields keep the 50.0 fallback.
        feedbacks = []
        for fb in payload.feedbacks:
            feedbacks.append(ReaderFeedbackDTO(
                persona=fb.persona,
                persona_label=PERSONA_LABELS.get(fb.persona, fb.persona),
                scores=ReaderDimensionScoresDTO(
                    suspense_retention=fb.scores.suspense_retention,
                    thrill_score=fb.scores.thrill_score,
                    churn_risk=fb.scores.churn_risk,
                    emotional_resonance=fb.scores.emotional_resonance,
                ),
                one_line_verdict=fb.one_line_verdict,
                highlights=list(fb.highlights),
                pain_points=list(fb.pain_points),
                suggestions=list(fb.suggestions),
            ))

        # 确保三个人设都有(缺失时填默认)
        existing_personas = {f.persona for f in feedbacks}
        for persona_key in ("hardcore", "casual", "nitpicker"):
            if persona_key not in existing_personas:
                feedbacks.append(ReaderFeedbackDTO(
                    persona=persona_key,
                    persona_label=PERSONA_LABELS.get(persona_key, persona_key),
                    scores=ReaderDimensionScoresDTO(),
                    one_line_verdict="(该读者视角的反馈未能生成)",
                ))
Normalize to exactly one feedback per canonical persona.
This loop preserves duplicate and unknown persona values from the LLM output, then only fills missing personas afterward. The result can contain 4+ feedback blocks, and ChapterReaderReportDTO._compute_avg_scores() will average over those extras, skewing both the API response and the persisted scores. Build the result from the three allowed keys (hardcore, casual, nitpicker) and ignore or overwrite anything outside that set.
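A self-contained sketch of that normalization, with the DTOs simplified to dicts; the first match per canonical persona wins and everything else is dropped.

```python
CANONICAL_PERSONAS = ("hardcore", "casual", "nitpicker")


def normalize_feedbacks(raw_feedbacks: list[dict]) -> list[dict]:
    """Return exactly one feedback per canonical persona, in canonical order."""
    by_persona: dict[str, dict] = {}
    for fb in raw_feedbacks:
        persona = fb.get("persona")
        if persona in CANONICAL_PERSONAS and persona not in by_persona:
            by_persona[persona] = fb  # duplicates and unknown personas are ignored
    return [
        by_persona.get(p, {"persona": p, "one_line_verdict": "(该读者视角的反馈未能生成)"})
        for p in CANONICAL_PERSONAS
    ]


# Two "hardcore" entries plus an unknown persona collapse into the three canonical slots.
sample = [{"persona": "hardcore"}, {"persona": "hardcore"}, {"persona": "mystery"}]
assert [f["persona"] for f in normalize_feedbacks(sample)] == list(CANONICAL_PERSONAS)
```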
🧰 Tools
🪛 Ruff (0.15.11)
[warning] 296-296: Comment contains ambiguous ( (FULLWIDTH LEFT PARENTHESIS). Did you mean ( (LEFT PARENTHESIS)?
(RUF003)
[warning] 296-296: Comment contains ambiguous ) (FULLWIDTH RIGHT PARENTHESIS). Did you mean ) (RIGHT PARENTHESIS)?
(RUF003)
[warning] 304-304: String contains ambiguous ( (FULLWIDTH LEFT PARENTHESIS). Did you mean ( (LEFT PARENTHESIS)?
(RUF001)
[warning] 304-304: String contains ambiguous ) (FULLWIDTH RIGHT PARENTHESIS). Did you mean ) (RIGHT PARENTHESIS)?
(RUF001)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@application/reader/services/reader_simulation_service.py` around lines 279 -
305, The current loop copies all payload.feedbacks (including duplicates and
unknown personas) into feedbacks and then appends missing defaults, which can
produce >3 entries and skew ChapterReaderReportDTO._compute_avg_scores();
instead, iterate only the canonical persona keys
("hardcore","casual","nitpicker") and for each key find the first matching fb in
payload.feedbacks (ignore others), build a single ReaderFeedbackDTO using that
fb's fields (persona_label via PERSONA_LABELS, scores via
ReaderDimensionScoresDTO) or a default ReaderDimensionScoresDTO/one_line_verdict
when none found, thereby ensuring exactly one feedback per canonical persona and
preventing duplicates/unknown personas from affecting averages.
    def get_high_churn_chapters(
        self, novel_id: str, threshold: float = 60.0,
    ) -> List[Dict]:
        """获取劝退风险高的章节(用于告警)。"""
        db = self._get_db()
        cursor = db.execute(
            """SELECT * FROM reader_simulations
               WHERE novel_id = ? AND avg_churn_risk >= ?
               ORDER BY avg_churn_risk DESC""",
            (novel_id, threshold),
Only alert on the latest simulation per chapter.
This query returns every historical row whose avg_churn_risk crosses the threshold. After a chapter is re-simulated, the API can surface duplicate chapters or stale alerts even if the newest run is healthy. Mirror list_by_novel() here and filter on the latest row per chapter_number before applying the threshold.
Proposed fix
def get_high_churn_chapters(
self, novel_id: str, threshold: float = 60.0,
) -> List[Dict]:
"""获取劝退风险高的章节(用于告警)。"""
db = self._get_db()
cursor = db.execute(
- """SELECT * FROM reader_simulations
- WHERE novel_id = ? AND avg_churn_risk >= ?
- ORDER BY avg_churn_risk DESC""",
- (novel_id, threshold),
+ """SELECT rs.* FROM reader_simulations rs
+ INNER JOIN (
+ SELECT novel_id, chapter_number, MAX(created_at) AS max_created
+ FROM reader_simulations
+ WHERE novel_id = ?
+ GROUP BY novel_id, chapter_number
+ ) latest ON rs.novel_id = latest.novel_id
+ AND rs.chapter_number = latest.chapter_number
+ AND rs.created_at = latest.max_created
+ WHERE rs.avg_churn_risk >= ?
+ ORDER BY rs.avg_churn_risk DESC, rs.chapter_number ASC""",
+ (novel_id, threshold),
)
    return [dict(row) for row in cursor.fetchall()]
🧰 Tools
🪛 Ruff (0.15.11)
[warning] 112-112: Docstring contains ambiguous ( (FULLWIDTH LEFT PARENTHESIS). Did you mean ( (LEFT PARENTHESIS)?
(RUF002)
[warning] 112-112: Docstring contains ambiguous ) (FULLWIDTH RIGHT PARENTHESIS). Did you mean ) (RIGHT PARENTHESIS)?
(RUF002)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@infrastructure/persistence/database/reader_simulation_repository.py` around
lines 109 - 118, get_high_churn_chapters currently returns every historical row
above the threshold, causing duplicate/stale alerts; change it to only consider
the latest simulation per chapter_number (same behavior as list_by_novel) by
filtering/joining on the max simulation_time (or max id if used as monotonic)
per (novel_id, chapter_number) before applying avg_churn_risk >= threshold and
ordering; update the SQL/query in get_high_churn_chapters to select rows whose
(novel_id, chapter_number) match the per-chapter latest timestamp, then apply
the threshold and ORDER BY avg_churn_risk DESC so only the newest run per
chapter can trigger an alert.
Fixes raised in PR shenminglinyi#135 review:
- 🔴 Critical: DatabaseConnection has no executescript() attribute. Get raw sqlite3 connection via get_connection() and call executescript on it. Wrap in try/except since CREATE TABLE IF NOT EXISTS in migration makes repeated calls safe; remaining failures are logged not silenced.
- 🟠 Major: silent LLM failure masquerading as success. Previously any LLM/parse failure returned HTTP 200 with default-value 'feedback'. Now:
  * Service catches LLM exceptions and returns fallback report tagged with is_fallback=True + error_message.
  * Empty-feedbacks parse result (LLM refusal / empty response) also marked as fallback with response preview for debugging.
  * Router rejects persistence of fallback reports (no more dirty data).
  * Router distinguishes client errors (400) from upstream LLM errors (502) instead of masking both as 200.
- 🟠 Major: persistence failures silently swallowed. Now exposed via meta.persisted/persist_error in response so clients can retry.
New DTO fields: ChapterReaderReportDTO.is_fallback, error_message.
Tests: 3 new cases covering fallback semantics. Total 19/19 passing.
Manually verified against running server:
- Non-existent chapter -> HTTP 400 (was silently 200)
- LLM failure -> HTTP 502 (was silently 200 with fake scores)
- executescript warning gone from logs
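A sketch of the fixed `ensure_table()` described in the commit message; the project's `DatabaseConnection` wrapper is replaced by a plain `sqlite3` stand-in here, so treat the class shell as illustrative.

```python
import logging
import sqlite3
from pathlib import Path

logger = logging.getLogger(__name__)


class ReaderSimulationRepositorySketch:
    """Simplified stand-in for the PR's repository class."""

    def __init__(self, db_path: str = "novel.db") -> None:  # path is a placeholder
        self._db_path = db_path

    def ensure_table(self) -> None:
        """Run the migration script; safe to call repeatedly."""
        migration_path = Path(__file__).parent / "migrations" / "add_reader_simulations.sql"
        if not migration_path.exists():
            return
        sql = migration_path.read_text(encoding="utf-8")
        try:
            conn = sqlite3.connect(self._db_path)
            # executescript() lives on sqlite3.Connection, not on the project's
            # DatabaseConnection wrapper, hence the raw-connection detour.
            conn.executescript(sql)
            conn.commit()
        except sqlite3.Error:
            # CREATE TABLE IF NOT EXISTS makes retries safe; log, don't silence.
            logger.warning("reader_simulations migration failed", exc_info=True)
```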
@shenminglinyi Hi, the merge conflicts are resolved, and CodeRabbit's 🔴 Critical and 🟠 Major issues have been fixed as well (see 442500a):
Verified locally: the startup warning is gone. Please re-review; looking forward to the merge 🙏
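A sketch of the 400-vs-502 split and the no-persist rule described above; the PR documents `is_fallback` and `error_message`, while the `error_kind` field and the service stub below are hypothetical glue added purely for illustration.

```python
from dataclasses import dataclass

from fastapi import APIRouter, HTTPException

router = APIRouter()


@dataclass
class Report:
    is_fallback: bool = False
    error_kind: str = ""    # hypothetical: "client" (bad chapter) or "llm" (upstream)
    error_message: str = ""

    def to_dict(self) -> dict:
        return {"is_fallback": self.is_fallback}


async def run_simulation(novel_id: str, chapter_number: int) -> Report:
    return Report()  # stand-in for ReaderSimulationService.simulate()


@router.post("/novels/{novel_id}/chapters/{chapter_number}/simulate")
async def simulate(novel_id: str, chapter_number: int) -> dict:
    report = await run_simulation(novel_id, chapter_number)
    if report.is_fallback:
        # Fallback reports are never persisted, so no dirty history rows.
        if report.error_kind == "llm":
            raise HTTPException(status_code=502, detail="上游 LLM 调用失败")
        raise HTTPException(status_code=400, detail=report.error_message or "章节不可用")
    return {"success": True, "data": report.to_dict()}
```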
♻️ Duplicate comments (5)
interfaces/api/v1/reader/__init__.py (3)
227-236: ⚠️ Potential issue | 🟡 Minor
Guard `feedbacks_json` shape before iterating pain points.
Line 236 assumes every item in `feedbacks` is a dict. Malformed JSON could cause a 500 if an item isn't a dict or if `pain_points` isn't a list.
Proposed fix
 # 提取痛点汇总
 all_pain_points = []
 for fb in feedbacks:
-    all_pain_points.extend(fb.get("pain_points", []))
+    if isinstance(fb, dict):
+        pain_points = fb.get("pain_points", [])
+        if isinstance(pain_points, list):
+            all_pain_points.extend(pain_points)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@interfaces/api/v1/reader/__init__.py` around lines 227 - 236, Validate and guard the shape of parsed feedbacks before iterating: ensure that after json.loads(feedbacks_raw) the variable feedbacks is a list (fallback to [] if not), and when iterating use type checks on each fb (e.g., ensure isinstance(fb, dict)) and only extend all_pain_points with fb.get("pain_points", []) when pain_points is a list (or coerce/skip non-list values); update the loop that builds all_pain_points (variables: records, feedbacks_raw, feedbacks, all_pain_points, fb) to perform these checks to avoid TypeError/AttributeError on malformed JSON items.
49-57: ⚠️ Potential issue | 🟠 Major
`_get_repo()` won't retry table initialization after first failure.
At line 52, `_repo` is assigned before `ensure_table()`. If line 54 fails, subsequent calls skip initialization entirely because `_repo` is already set.
Proposed fix
 def _get_repo() -> ReaderSimulationRepository:
     global _repo
     if _repo is None:
-        _repo = ReaderSimulationRepository()
+        candidate = ReaderSimulationRepository()
         try:
-            _repo.ensure_table()
+            candidate.ensure_table()
         except Exception as e:
             logger.warning("读者模拟表初始化失败(首次使用时会自动重试): %s", e)
+            raise
+        _repo = candidate
     return _repo
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@interfaces/api/v1/reader/__init__.py` around lines 49 - 57, _get_repo currently assigns the global _repo before calling ReaderSimulationRepository.ensure_table, so if ensure_table() raises the partially-initialized _repo remains set and future calls won't retry; fix by instantiating a local variable (e.g., repo = ReaderSimulationRepository()), call repo.ensure_table() inside the try, and only assign the global _repo = repo after ensure_table() succeeds; keep the existing except branch to log via logger.warning and do not overwrite _repo on failure so subsequent calls will retry initialization.
181-183: ⚠️ Potential issue | 🟡 Minor
Avoid exposing internal exception text in HTTP 500 responses.
Lines 183, 213, and 253 return `str(e)` or interpolated `{e}` in the HTTP response detail, which can leak backend internals to clients.
Proposed fix pattern
-except Exception as e:
-    logger.error("获取读者模拟记录失败: %s", e)
-    raise HTTPException(status_code=500, detail=str(e))
+except Exception as e:
+    logger.exception("获取读者模拟记录失败")
+    raise HTTPException(status_code=500, detail="服务器内部错误,请稍后重试") from e
Apply the same pattern to all generic `except Exception` handlers in this file.
Also applies to: 211-213, 251-253
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@interfaces/api/v1/reader/__init__.py` around lines 181 - 183, Multiple generic except Exception handlers in this module currently expose internal exception text via HTTPException(detail=str(e)); update each handler (the ones that log and raise HTTPException around the reader endpoints) to log the full exception (e.g., logger.exception(...) or logger.error(..., exc_info=True)) but return a generic client-safe message in the HTTP response (for example raise HTTPException(status_code=500, detail="Internal server error while processing request") or similar), and apply this change consistently to the three handlers that currently use str(e) so logs keep full details while responses do not leak internals.
application/reader/services/reader_simulation_service.py (1)
307-333: ⚠️ Potential issue | 🟠 Major
Normalize to exactly one feedback per canonical persona.
This loop preserves duplicate and unknown `persona` values from LLM output, then fills missing personas afterward. If the LLM returns multiple feedbacks with the same persona (e.g., two "hardcore" entries), the result can contain 4+ feedback blocks, skewing `_compute_avg_scores()`.
Consider iterating only the canonical keys and taking the first match:
Proposed fix
-        feedbacks = []
-        for fb in payload.feedbacks:
-            feedbacks.append(ReaderFeedbackDTO(
-                persona=fb.persona,
-                persona_label=PERSONA_LABELS.get(fb.persona, fb.persona),
-                scores=ReaderDimensionScoresDTO(
-                    suspense_retention=fb.scores.suspense_retention,
-                    thrill_score=fb.scores.thrill_score,
-                    churn_risk=fb.scores.churn_risk,
-                    emotional_resonance=fb.scores.emotional_resonance,
-                ),
-                one_line_verdict=fb.one_line_verdict,
-                highlights=list(fb.highlights),
-                pain_points=list(fb.pain_points),
-                suggestions=list(fb.suggestions),
-            ))
-
-        # 确保三个人设都有(缺失时填默认)
-        existing_personas = {f.persona for f in feedbacks}
-        for persona_key in ("hardcore", "casual", "nitpicker"):
-            if persona_key not in existing_personas:
-                feedbacks.append(ReaderFeedbackDTO(
-                    persona=persona_key,
-                    persona_label=PERSONA_LABELS.get(persona_key, persona_key),
-                    scores=ReaderDimensionScoresDTO(),
-                    one_line_verdict="(该读者视角的反馈未能生成)",
-                ))
+        # Build exactly one feedback per canonical persona
+        persona_fb_map = {fb.persona: fb for fb in reversed(payload.feedbacks)}
+        feedbacks = []
+        for persona_key in ("hardcore", "casual", "nitpicker"):
+            fb = persona_fb_map.get(persona_key)
+            if fb:
+                feedbacks.append(ReaderFeedbackDTO(
+                    persona=persona_key,
+                    persona_label=PERSONA_LABELS.get(persona_key, persona_key),
+                    scores=ReaderDimensionScoresDTO(
+                        suspense_retention=fb.scores.suspense_retention,
+                        thrill_score=fb.scores.thrill_score,
+                        churn_risk=fb.scores.churn_risk,
+                        emotional_resonance=fb.scores.emotional_resonance,
+                    ),
+                    one_line_verdict=fb.one_line_verdict,
+                    highlights=list(fb.highlights),
+                    pain_points=list(fb.pain_points),
+                    suggestions=list(fb.suggestions),
+                ))
+            else:
+                feedbacks.append(ReaderFeedbackDTO(
+                    persona=persona_key,
+                    persona_label=PERSONA_LABELS.get(persona_key, persona_key),
+                    scores=ReaderDimensionScoresDTO(),
+                    one_line_verdict="(该读者视角的反馈未能生成)",
+                ))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@application/reader/services/reader_simulation_service.py` around lines 307 - 333, The current builder collects all payload.feedbacks (allowing duplicates/unknown personas) then pads missing personas, which can create >3 entries and break _compute_avg_scores(); change the logic in the ReaderFeedbackDTO construction to iterate canonical persona keys ("hardcore","casual","nitpicker") and for each persona_key select the first matching entry from payload.feedbacks (match on fb.persona == persona_key), mapping it into ReaderFeedbackDTO (use PERSONA_LABELS and ReaderDimensionScoresDTO as before); if no match found, append a default ReaderFeedbackDTO with empty ReaderDimensionScoresDTO and the placeholder one_line_verdict — this guarantees exactly one feedback per canonical persona and ignores duplicates/unknowns.
infrastructure/persistence/database/reader_simulation_repository.py (1)
133-144: ⚠️ Potential issue | 🟠 Major
Only alert on the latest simulation per chapter.
This query returns every historical row whose `avg_churn_risk` crosses the threshold, which can surface duplicate/stale alerts after re-simulation. Mirror the `list_by_novel()` approach by filtering on the latest row per `chapter_number` before applying the threshold.
Proposed fix
 def get_high_churn_chapters(
     self, novel_id: str, threshold: float = 60.0,
 ) -> List[Dict]:
     """获取劝退风险高的章节(用于告警)。"""
     db = self._get_db()
     cursor = db.execute(
-        """SELECT * FROM reader_simulations
-           WHERE novel_id = ? AND avg_churn_risk >= ?
-           ORDER BY avg_churn_risk DESC""",
-        (novel_id, threshold),
+        """SELECT rs.* FROM reader_simulations rs
+           INNER JOIN (
+               SELECT novel_id, chapter_number, MAX(created_at) AS max_created
+               FROM reader_simulations
+               WHERE novel_id = ?
+               GROUP BY novel_id, chapter_number
+           ) latest ON rs.novel_id = latest.novel_id
+               AND rs.chapter_number = latest.chapter_number
+               AND rs.created_at = latest.max_created
+           WHERE rs.avg_churn_risk >= ?
+           ORDER BY rs.avg_churn_risk DESC""",
+        (novel_id, threshold),
     )
     return [dict(row) for row in cursor.fetchall()]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@infrastructure/persistence/database/reader_simulation_repository.py` around lines 133 - 144, The current get_high_churn_chapters method returns all historical rows exceeding the threshold; change it to only consider the latest simulation per chapter by filtering to the most recent row per chapter_number (mirroring list_by_novel). Modify the SQL in get_high_churn_chapters to join or subquery on reader_simulations grouped by chapter_number (e.g., using MAX(simulation_time) or MAX(id)) to select only the latest row for each chapter_number, then apply the avg_churn_risk >= ? filter and ORDER BY avg_churn_risk DESC so stale duplicates are excluded.
🧹 Nitpick comments (3)
infrastructure/persistence/database/reader_simulation_repository.py (1)
93-93: `datetime.utcnow()` is deprecated in Python 3.12+.
Consider using `datetime.now(timezone.utc)` for forward compatibility.
Proposed fix
+from datetime import datetime, timezone
-from datetime import datetime
 ...
-            datetime.utcnow().isoformat(),
+            datetime.now(timezone.utc).isoformat(),
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@infrastructure/persistence/database/reader_simulation_repository.py` at line 93, Replace the deprecated datetime.utcnow() call with a timezone-aware timestamp: change the occurrence of datetime.utcnow().isoformat() in reader_simulation_repository (where the timestamp is generated) to use datetime.now(timezone.utc).isoformat() and update imports to include timezone (e.g., import or add from datetime import timezone). Ensure any tests or callers expecting UTC strings still work with the ISO8601 output.
application/reader/services/reader_simulation_service.py (2)
342-342: `datetime.utcnow()` is deprecated in Python 3.12+.
Same as in the repository file, consider using `datetime.now(timezone.utc)` for forward compatibility.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@application/reader/services/reader_simulation_service.py` at line 342, Replace the deprecated naive UTC timestamp creation used for the analyzed_at field (datetime.utcnow()) with an aware UTC datetime: use datetime.now(timezone.utc) and update imports to include timezone (e.g., from datetime import datetime, timezone) so analyzed_at is timezone-aware; ensure any downstream code expecting naive datetimes is adjusted or cast accordingly in reader_simulation_service.py (look for the analyzed_at assignment).
104-105: Silent exception swallowing hides knowledge repository issues.
The `try-except-pass` pattern completely swallows errors from `get_knowledge_repository()`. Consider at least logging at DEBUG level so issues are discoverable during troubleshooting.
Proposed fix
 if self._knowledge_repo and prev_chapter:
     try:
         knowledge = self._knowledge_repo.get_by_novel_id(novel_id)
         if knowledge:
             ch_sum = knowledge.get_chapter(chapter_number - 1)
             if ch_sum:
                 prev_summary = ch_sum.summary or ""
     except Exception:
-        pass
+        logger.debug("获取知识仓储前章摘要失败,已忽略", exc_info=True)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@application/reader/services/reader_simulation_service.py` around lines 104 - 105, The bare "except: pass" swallowing exceptions from get_knowledge_repository() should be replaced with catching the exception (e.g., "except Exception as e:") and logging it at DEBUG level so issues are discoverable; update the exception block in the ReaderSimulationService (where get_knowledge_repository() is called) to call an available logger (e.g., self.logger.debug or process_logger.debug) and include the exception details (use exc_info=True or format the exception) instead of silently passing.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 95ba0fb2-3ae0-4cb2-97c8-755b4b79a4c5
📒 Files selected for processing (5)
- application/reader/dtos/reader_feedback_dto.py
- application/reader/services/reader_simulation_service.py
- infrastructure/persistence/database/reader_simulation_repository.py
- interfaces/api/v1/reader/__init__.py
- tests/unit/application/services/test_reader_simulation_service.py
Related to #75
Implements the first phase of the reader-retention engine: the Reader Simulation Agent,
covering evaluation from three reader perspectives, four-dimension scoring, hook-strength analysis, and churn-risk alerts.
Hard/soft constraint tiers and the debt mechanism will be implemented in follow-up PRs.
Change type
feat (new feature)
Change description
Adds a Reader Simulation Agent module that uses an LLM to simulate three reader types (hardcore fan / casual reader / nitpicker) and evaluate chapters across multiple dimensions, giving authors reader-perspective quality feedback. This capability underpins the #75 reader-retention engine; hard/soft constraint tiers and the debt mechanism can be layered on top in follow-up PRs.
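An illustrative sketch of the three-persona prompt assembly; `PERSONA_LABELS` and the feedback field names come from the PR, but the prompt wording itself is an assumption.

```python
PERSONA_LABELS = {"hardcore": "硬核粉", "casual": "休闲读者", "nitpicker": "挑刺党"}
DIMENSIONS = ("suspense_retention", "thrill_score", "churn_risk", "emotional_resonance")


def build_prompt(chapter_text: str, prev_summary: str = "") -> str:
    """Ask the LLM to score one chapter from all three reader perspectives."""
    personas = "、".join(PERSONA_LABELS.values())
    header = (
        f"请分别以{personas}三类读者的视角阅读下面的章节,"
        f"对 {', '.join(DIMENSIONS)} 四个维度打分(0-100),"
        "并以 JSON 数组返回;每个元素包含 persona、scores、one_line_verdict、"
        "highlights、pain_points、suggestions 字段。\n"
    )
    context = f"前章摘要:{prev_summary}\n" if prev_summary else ""
    return header + context + f"章节正文:\n{chapter_text}"
```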
Architecture impact
- New module across the application / infrastructure / interfaces layers
- New reader_simulations table (migration: add_reader_simulations.sql; the table is created automatically on the first API call)
- 4 new endpoints under the /api/v1/reader/ prefix
Tests