v4.0.0 — Multilingual pipeline, language-aware search, reports reorganization
Changes from v3.0.2
🌍 Fully multilingual final report
- All summary labels (outline/data/report/chapters/sources/facts/lines/chars/min) are now translated dynamically into `$LANG`: zh→中文, en→English, fr→Français, ja→日本語, ru→Русский, etc.
- The chapter list heading is also translated. Previously only zh/en were supported; now all 19 languages are covered.
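
As a rough illustration of label translation with an English fallback, here is a minimal sketch. The `LABELS` table, the `label` and `summary_line` helpers, and the French wording are all assumptions made for this example; the skill may well handle this through its prompt template rather than a lookup table.

```python
# Hypothetical label table covering three of the 19 languages.
# The real labels and template live in SKILL.md, not here.
LABELS = {
    "en": {"chapters": "chapters", "sources": "sources", "facts": "facts"},
    "zh": {"chapters": "章", "sources": "来源", "facts": "事实"},
    "fr": {"chapters": "chapitres", "sources": "sources", "facts": "faits"},
}


def label(key: str, lang: str) -> str:
    """Return the label for `key` in `lang`, falling back to English."""
    table = LABELS.get(lang, LABELS["en"])
    return table.get(key, LABELS["en"][key])


def summary_line(lang: str, chapters: int, sources: int, facts: int) -> str:
    """Build one summary line with every label taken from the same language."""
    return (
        f"{chapters} {label('chapters', lang)}, "
        f"{sources} {label('sources', lang)}, "
        f"{facts} {label('facts', lang)}"
    )


if __name__ == "__main__":
    print(summary_line("fr", 5, 40, 120))
```

Falling back to English for an unknown language keeps the summary readable instead of raising, which is one possible design choice among several.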
🔍 Language-aware search source filtering
- Non-Chinese research (`$LANG != "zh"`) now skips Chinese-only search engines (cn.bing.com, sogou, 360) and all B-class Chinese sources (zhihu, 36kr, CSDN, etc.) to eliminate irrelevant results.
- Generic search engines now get locale parameters: Brave `&country={COUNTRY}`, Mojeek `&lang={LANG}`.
- Regional engines added: Yandex for Russian (`ru`), Yahoo JP for Japanese (`ja`).
- LANG→COUNTRY mapping table added to SKILL.md for Task 2 variable replacement.
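
The filtering rules above could be sketched roughly as follows. The engine identifiers, the `select_engines` and `locale_params` helpers, and the contents of `LANG_TO_COUNTRY` are assumptions for illustration only; the actual mapping table is the one in SKILL.md.

```python
from urllib.parse import urlencode

# Assumed sample values; the authoritative LANG->COUNTRY table is in SKILL.md.
LANG_TO_COUNTRY = {"en": "US", "fr": "FR", "ja": "JP", "ru": "RU", "zh": "CN"}

GENERIC = ["brave", "mojeek"]
CHINESE_ONLY = ["cn.bing.com", "sogou", "360"]
REGIONAL = {"ru": ["yandex"], "ja": ["yahoo_jp"]}  # hypothetical identifiers


def select_engines(lang: str) -> list:
    """Pick engines for a language: Chinese-only engines are used only for zh."""
    engines = list(GENERIC)
    if lang == "zh":
        engines += CHINESE_ONLY
    engines += REGIONAL.get(lang, [])
    return engines


def locale_params(engine: str, lang: str) -> str:
    """Return the locale query-string fragment for a generic engine, or ''."""
    if engine == "brave":
        country = LANG_TO_COUNTRY.get(lang)
        return urlencode({"country": country}) if country else ""
    if engine == "mojeek":
        return urlencode({"lang": lang})
    return ""


if __name__ == "__main__":
    for engine in select_engines("ru"):
        print(engine, locale_params(engine, "ru"))
```

Returning an empty fragment for an unmapped language avoids sending a guessed country code to the engine.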
📁 Reports organized by language
- Reports are now saved to `reports/$LANG/` subdirectories (e.g., `reports/zh/`, `reports/en/`, `reports/fr/`).
- The 38 existing reports were classified and moved into their respective language directories.
🔧 Windows compatibility improvements
- Filename sanitization (dr_gen.py): Windows-invalid characters (`<>:"/\|?*`) are replaced with `-`; trailing dots/spaces are trimmed.
- Zero-byte file cleanup: Task 4 deletes stale 0-byte stubs before assembly to prevent silent failures.
- `os.makedirs(dirname(output), exist_ok=True)` added as a safety net in dr_gen.py.
- `.gitignore` updated: `tmp/`, `language.txt`, `start_time.txt` are now ignored.
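
The three file-handling changes above might look something like the sketch below. The function names are hypothetical and this is not the code in dr_gen.py; it only shows one way to implement the described behavior.

```python
import os
import re

# The nine characters listed above as invalid in Windows filenames.
_INVALID = re.compile(r'[<>:"/\\|?*]')


def sanitize_filename(name: str) -> str:
    """Replace Windows-invalid characters with '-' and trim trailing dots/spaces."""
    return _INVALID.sub("-", name).rstrip(". ")


def remove_zero_byte_files(directory: str) -> list:
    """Delete 0-byte regular files in `directory`; return their names, sorted."""
    removed = []
    # Materialize the listing first so deletion does not disturb iteration.
    for entry in list(os.scandir(directory)):
        if entry.is_file() and entry.stat().st_size == 0:
            os.remove(entry.path)
            removed.append(entry.name)
    return sorted(removed)


def write_report(output: str, text: str) -> None:
    """Create the parent directory if needed, then write the report."""
    # dirname() is '' for a bare filename, and os.makedirs('') raises,
    # so fall back to the current directory in that case.
    os.makedirs(os.path.dirname(output) or ".", exist_ok=True)
    with open(output, "w", encoding="utf-8") as fh:
        fh.write(text)
```

For example, `sanitize_filename("AI: what/why?.md")` yields `AI- what-why-.md`.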
🔄 Pipeline restructuring
- Setup phase extracted: TMPDIR creation, TOOLSDIR/PROMPTSDIR detection, and file reading now happen before Step 0 (language detection). Previously Step 0 referenced `{TMPDIR}` before it was created, causing `language.txt` to be written to the wrong location on first run.
- Language detection now announces its result after completion: `🌐 Language detected: en`.
- "禁止" ("prohibited") rule updated: clarifies that handoff file reads (outline.json, manifest.json) are allowed between tasks; only search calls and data processing must stay within sub-agents.
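
The ordering fix can be pictured with a small sketch: the temp directory is created in a setup step, and only then does Step 0 write `language.txt` into it. The function names are hypothetical, and `detect_language` is a deliberately crude placeholder, not the skill's actual detector.

```python
import tempfile
from pathlib import Path


def setup(base=None) -> Path:
    """Setup phase: make sure the temp directory exists before any step uses it."""
    tmpdir = Path(base) if base else Path(tempfile.mkdtemp(prefix="dr_"))
    tmpdir.mkdir(parents=True, exist_ok=True)
    return tmpdir


def detect_language(query: str) -> str:
    """Placeholder heuristic (assumption): any CJK ideograph means zh, else en."""
    return "zh" if any("\u4e00" <= ch <= "\u9fff" for ch in query) else "en"


def announcement(lang: str) -> str:
    """Format the message shown to the user after detection."""
    return f"🌐 Language detected: {lang}"


def step0(tmpdir: Path, query: str) -> str:
    """Step 0: detect the language and persist it inside the existing tmpdir."""
    lang = detect_language(query)
    (tmpdir / "language.txt").write_text(lang, encoding="utf-8")
    return lang


if __name__ == "__main__":
    tmp = setup()                      # must run before step0
    detected = step0(tmp, "history of the transistor")
    print(announcement(detected))
```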
✅ QA improvements
- TOC heading whitelist expanded in `dr_check.py`: now includes all 19 language variants (目次/목차/Índice/Table des matières/Inhaltsverzeichnis/etc.); previously it covered only English, Chinese, and German.
- The final report template requires all labels to be in `$LANG` (no more Chinese labels appearing in French research output).
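
A whitelist check of this kind could be sketched as below. Only the five variants named above come from the release notes; the English and Chinese entries, the set name, and the `is_toc_heading` helper are assumptions, and the real list in `dr_check.py` has 19 languages.

```python
# Partial whitelist for illustration; entries are stored case-folded.
TOC_HEADINGS = {
    "table of contents",   # assumed English variant
    "目录",                 # assumed Chinese variant
    "目次",
    "목차",
    "índice",
    "table des matières",
    "inhaltsverzeichnis",
}


def is_toc_heading(line: str) -> bool:
    """True if a markdown heading line is a known table-of-contents title."""
    text = line.lstrip("#").strip().casefold()
    return text in TOC_HEADINGS


if __name__ == "__main__":
    for heading in ("## Table des matières", "# 목차", "## Introduction"):
        print(heading, is_toc_heading(heading))
```

Case-folding both sides lets headings such as `## Índice` and `## INHALTSVERZEICHNIS` match without listing every capitalization.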
Files changed
- `SKILL.md`: Setup phase, search source filtering table, `reports/$LANG/`, final report multilingual template, updated variable mappings
- `prompts/task2_data_collection.md`: Search source language filtering, regional engines, LANG/COUNTRY variables
- `prompts/task4_assembly.md`: Output path changed to `reports/{LANG}/`
- `tools/dr_gen.py`: Filename sanitization, `os.makedirs` safety net
- `tools/dr_check.py`: TOC heading whitelist expanded to 19 languages
- `.gitignore`: New ignores for tmp/ and temp files
- `VERSION`: 3.0.2 → 4.0.0
- `reports/`: Existing 38 files reorganized by language subdirectory