[v1.3] Enhanced text decoding #1166

cyfung1031 · 2026-01-31T03:20:05Z

Descriptions

close #1163

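This mainly handles code downloaded from the web whose encoding is not UTF-8 and whose charset is not declared.
In that case, BOM detection or null-byte guessing decides between utf-8 / utf-16le / utf-16be / utf-32le / utf-32be.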
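BOM detection amounts to comparing the first few bytes against the known byte-order marks. The sketch below is a hypothetical stand-in for the PR's detectBOM (its real signature is not shown here). The one subtlety it illustrates is ordering: the UTF-32LE mark FF FE 00 00 starts with the UTF-16LE mark FF FE, so UTF-32 has to be checked first.

```typescript
// Hypothetical sketch (not the PR's detectBOM): map a leading byte-order mark
// to an encoding label, or return null when there is no BOM.
function detectBOMSketch(bytes: Uint8Array): string | null {
  if (bytes.length >= 4) {
    // UTF-32 first: FF FE 00 00 would otherwise be mistaken for UTF-16LE.
    if (bytes[0] === 0xff && bytes[1] === 0xfe && bytes[2] === 0x00 && bytes[3] === 0x00) return "utf-32le";
    if (bytes[0] === 0x00 && bytes[1] === 0x00 && bytes[2] === 0xfe && bytes[3] === 0xff) return "utf-32be";
  }
  if (bytes.length >= 3 && bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf) return "utf-8";
  if (bytes.length >= 2) {
    if (bytes[0] === 0xff && bytes[1] === 0xfe) return "utf-16le";
    if (bytes[0] === 0xfe && bytes[1] === 0xff) return "utf-16be";
  }
  return null;
}
```

With this ordering, a UTF-16LE file whose first character is U+0000 would be read as UTF-32LE; that input is rare in script files.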

Cloud-sync-related code is not changed.

Changes

Screenshots

CodFrm · 2026-01-31T03:23:25Z

src/app/service/service_worker/resource.ts

    const uint8Array = new Uint8Array(arrayBuffer);
    if (isText(uint8Array)) {
-      resource.content = await data.text();
+      resource.content = await readBlobContent(data, contentType);


@resource should be kept exactly as downloaded.

@require and @require-css are turned into code that gets executed, so decoding them is fine.

Copilot

Pull request overview

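This PR enhances ScriptCat's text decoding, mainly to fix scripts downloaded from the network whose encoding is not UTF-8 and whose charset is missing or wrong. Using BOM detection plus a heuristic based on null-byte patterns, it can automatically recognize UTF-8, UTF-16LE/BE and UTF-32LE/BE.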

Changes:

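Adds readBlobContent, which detects the encoding by trying, in order: Content-Type charset → BOM detection → null-pattern heuristic → UTF-8 validation → windows-1252 fallback
Adds the helper detectBOM to recognize the various BOMs
Adds the helper guessByNullPattern to recognize UTF-16/32 from the distribution of null bytes
Changes parseCharsetFromContentType to return an empty string instead of null, simplifying the logic
Speeds up decodeUTF32 by creating a DataView only in the BE branch
Applies the new detection when fetching scripts, loading resources and fetching favicons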
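The idea behind the null-pattern heuristic: script source is mostly ASCII, and ASCII characters encoded as UTF-16 or UTF-32 leave zero bytes at fixed offsets inside each code unit. A minimal sketch of that idea follows; guessByNullPatternSketch and its 0.7 threshold are illustrative assumptions, not the PR's guessByNullPattern.

```typescript
// Hypothetical sketch of a null-byte heuristic (not the PR's guessByNullPattern).
// Mostly-ASCII text in UTF-16/32 leaves zero bytes at predictable offsets.
function guessByNullPatternSketch(bytes: Uint8Array, checkSize: number, threshold = 0.7): string | null {
  const n = Math.min(bytes.length, checkSize);
  if (n < 4) return null; // too little data to see a pattern
  // zeros[k] counts zero bytes whose offset is congruent to k (mod 4)
  const zeros = [0, 0, 0, 0];
  for (let i = 0; i < n; i++) {
    if (bytes[i] === 0) zeros[i % 4]++;
  }
  // UTF-32: for ASCII text, three of the four bytes of every code unit are zero.
  if (n % 4 === 0) {
    const r = zeros.map((z) => z / (n / 4)); // fraction of zero bytes per offset
    const low = 1 - threshold;
    if (r[0] < low && r[1] > threshold && r[2] > threshold && r[3] > threshold) return "utf-32le";
    if (r[0] > threshold && r[1] > threshold && r[2] > threshold && r[3] < low) return "utf-32be";
  }
  // UTF-16: for ASCII text, every other byte is zero.
  const evenRatio = (zeros[0] + zeros[2]) / (n / 2);
  const oddRatio = (zeros[1] + zeros[3]) / (n / 2);
  if (oddRatio > threshold && evenRatio < 1 - threshold) return "utf-16le";
  if (evenRatio > threshold && oddRatio < 1 - threshold) return "utf-16be";
  return null;
}
```

Text that is mostly non-ASCII (for example CJK stored as UTF-16) contains few zero bytes, so a heuristic of this kind returns null and detection moves on to UTF-8 validation.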

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.

Summary per file

File	Description
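src/pkg/utils/encoding.ts	Core implementation: adds the encoding-detection function and its helper algorithms, and optimizes existing encoding functions
src/pkg/utils/encoding.test.ts	Tests updated: parseCharsetFromContentType return-value assertions change from null to the empty string
src/pkg/utils/script.ts	Integration: fetchScriptBody uses readBlobContent instead of resp.text()
src/pages/store/favicons.ts	Integration: fetchIconByDomain uses readBlobContent to decode HTML content
src/app/service/service_worker/resource.ts	Integration: loadByUrl uses readBlobContent for text resources, and the contentType variable is extracted once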
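The encoding.test.ts row reflects parseCharsetFromContentType now returning "" instead of null. A hypothetical sketch of such a parser is below; returning "" lets callers use a plain `if (charset)` check, as readBlobContent does. For brevity it assumes quoted parameter values never contain ";".

```typescript
// Hypothetical sketch (not the PR's parseCharsetFromContentType): extract the
// charset parameter from a Content-Type value. Returns "" (not null) when the
// header is missing or has no charset, so callers can simply test `if (charset)`.
function parseCharsetSketch(contentType: string | null): string {
  if (!contentType) return "";
  // Parameters follow the media type, separated by ";"
  for (const param of contentType.split(";").slice(1)) {
    const eq = param.indexOf("=");
    if (eq === -1) continue;
    const name = param.slice(0, eq).trim().toLowerCase();
    if (name !== "charset") continue;
    // Strip optional surrounding quotes, e.g. charset="utf-8"
    const value = param.slice(eq + 1).trim().replace(/^"(.*)"$/, "$1");
    return value.toLowerCase(); // encoding labels are case-insensitive
  }
  return "";
}
```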

Copilot · 2026-01-31T03:33:43Z

src/pkg/utils/encoding.ts

+export const readBlobContent = async (blob: Blob | File | Response, contentType: string | null): Promise<string> => {
+  const buffer = await blob.arrayBuffer();
+  const uint8 = new Uint8Array(buffer);
+
+  if (uint8.length === 0) {
+    return "";
+  }
+
+  // Prefer the charset from the Content-Type header
+  const headerCharset = parseCharsetFromContentType(contentType);
+  if (headerCharset) {
+    try {
+      // Validate the charset (an invalid label throws and falls through to detection)
+      return bytesDecode(headerCharset, uint8);
+    } catch (e: any) {
+      console.warn(`Invalid charset from Content-Type header: ${headerCharset}, error: ${e.message}`);
+    }
+  }
+
+  // BOM detection (highest priority)
+  const bomEncoding = detectBOM(uint8);
+  if (bomEncoding) return bytesDecode(bomEncoding, uint8);
+
+  const checkSize = Math.min(uint8.length, 16 * 1024);
+
+  if (uint8.length % 2 === 0) {
+    // Heuristic detection (first 16 KB)
+    const heuristicEncoding = guessByNullPattern(uint8, checkSize);
+    if (heuristicEncoding) return bytesDecode(heuristicEncoding, uint8);
+  }
+
+  // UTF-8 validation → legacy fallback
+  let encoding = "utf-8";
+  try {
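+    // Strict mode – throws on invalid sequences. stream: true keeps a multi-byte
+    // character cut off at the checkSize boundary from being reported as invalid.
+    new TextDecoder("utf-8", { fatal: true }).decode(uint8.subarray(0, checkSize), { stream: true });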
+  } catch {
+    // Invalid UTF-8 → most common real-world fallback
+    encoding = "windows-1252"; // OR detectEncoding(uint8, null)
+  }
+
+  return bytesDecode(encoding, uint8);
+};
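readBlobContent can pass utf-32le or utf-32be to bytesDecode, but TextDecoder cannot handle those: the WHATWG Encoding Standard defines no UTF-32 decoder, so `new TextDecoder("utf-32le")` throws a RangeError. That is why a manual decoder such as the PR's decodeUTF32 is needed. The following is a hypothetical sketch of one, not the PR's implementation; it uses one DataView for both byte orders through getUint32's littleEndian flag.

```typescript
// Hypothetical sketch of a UTF-32 decoder (not the PR's decodeUTF32).
function decodeUtf32Sketch(bytes: Uint8Array, littleEndian: boolean): string {
  // Respect byteOffset so subarray views decode correctly.
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const count = Math.floor(bytes.byteLength / 4); // ignore a trailing partial unit
  let start = 0;
  // Skip a leading BOM (U+FEFF), as TextDecoder does for UTF-8/16 by default.
  if (count > 0 && view.getUint32(0, littleEndian) === 0xfeff) start = 1;
  const parts: string[] = [];
  let chunk: number[] = [];
  for (let i = start; i < count; i++) {
    const cp = view.getUint32(i * 4, littleEndian);
    // Values beyond U+10FFFF or in the surrogate range become U+FFFD.
    const valid = cp <= 0x10ffff && (cp < 0xd800 || cp > 0xdfff);
    chunk.push(valid ? cp : 0xfffd);
    // Flush in chunks to stay well under the argument limit of fromCodePoint.
    if (chunk.length === 8192) {
      parts.push(String.fromCodePoint(...chunk));
      chunk = [];
    }
  }
  if (chunk.length) parts.push(String.fromCodePoint(...chunk));
  return parts.join("");
}
```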


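The new readBlobContent function is the core of this PR, but it has no dedicated unit tests. Because it chains several detection steps (Content-Type header → BOM → null-pattern heuristic → UTF-8 validation → fallback), consider adding test cases that cover:

A valid Content-Type charset

An invalid Content-Type charset

Each BOM (UTF-8, UTF-16LE, UTF-16BE, UTF-32LE, UTF-32BE)

UTF-16/32 without a BOM that the null-pattern heuristic can detect

UTF-8 text without a BOM

Invalid UTF-8 falling back to windows-1252

An empty Blob

Short text under 64 bytes

Other utility functions in the same directory all have test files, and encoding.test.ts already contains 510 lines of tests, which suggests the project expects good test coverage.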
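Most of these cases need UTF-16/UTF-32 byte fixtures, which TextEncoder cannot produce because it only encodes UTF-8. A minimal sketch of fixture builders for such tests follows; encodeUtf16 and encodeUtf32 are hypothetical helpers, not part of the PR.

```typescript
// Hypothetical test helpers: TextEncoder only emits UTF-8, so UTF-16/32
// fixtures are built by hand.
type Endian = "le" | "be";

function encodeUtf16(text: string, endian: Endian, withBOM = false): Uint8Array {
  const units: number[] = withBOM ? [0xfeff] : [];
  // A JS string is already a sequence of UTF-16 code units (surrogates included).
  for (let i = 0; i < text.length; i++) units.push(text.charCodeAt(i));
  const out = new Uint8Array(units.length * 2);
  const view = new DataView(out.buffer);
  units.forEach((u, i) => view.setUint16(i * 2, u, endian === "le"));
  return out;
}

function encodeUtf32(text: string, endian: Endian, withBOM = false): Uint8Array {
  const points: number[] = withBOM ? [0xfeff] : [];
  // for...of iterates by code point, joining surrogate pairs.
  for (const ch of text) points.push(ch.codePointAt(0)!);
  const out = new Uint8Array(points.length * 4);
  const view = new DataView(out.buffer);
  points.forEach((p, i) => view.setUint32(i * 4, p, endian === "le"));
  return out;
}
```

In a test, the result would be wrapped in a Blob (for example `new Blob([encodeUtf16("hello", "le", true)])`) and passed to readBlobContent with a null content type to exercise the BOM path.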

@copilot please open a PR that adds the required unit tests

@CodFrm I can't give instructions to copilot...


Probably because it's cross-repository? It doesn't work for me either.

@copilot open a new pull request to apply changes based on this feedback

src/pkg/utils/encoding.ts

Copilot · 2026-01-31T03:33:43Z

src/pkg/utils/encoding.ts

+    new TextDecoder("utf-8", { fatal: true }).decode(uint8.subarray(0, checkSize), { stream: true });
+  } catch {
+    // Invalid UTF-8 → most common real-world fallback
+    encoding = "windows-1252"; // OR detectEncoding(uint8, null)


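At line 208, the comment // OR detectEncoding(uint8, null) points to an alternative. The current implementation falls back directly to windows-1252, the most common legacy encoding for Western European languages.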

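To support a wider range of encodings (such as East Asian ones), detectEncoding(uint8, null) could be used to detect the encoding automatically via chardet, at the cost of extra performance overhead.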

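The current implementation favors performance and simplicity, which is a reasonable choice. If issues about non-Western-European encodings are reported later, switching to detectEncoding is an option. The comment can stay as a pointer for that future improvement.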
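The UTF-8 validation and windows-1252 fallback step can be sketched in isolation. decodeUtf8OrLegacy is a hypothetical name, and the 16 KB window mirrors checkSize in readBlobContent. One detail matters for CJK scripts: decoding only the first checkSize bytes with fatal: true throws if the window ends inside a multi-byte character (a CJK character is 3 bytes in UTF-8), which would wrongly select windows-1252. Passing { stream: true } makes the decoder hold back that incomplete tail instead of rejecting it.

```typescript
// Hypothetical sketch of the final fallback step: validate a window of the
// input as strict UTF-8, otherwise decode the whole input as windows-1252.
function decodeUtf8OrLegacy(bytes: Uint8Array, checkSize = 16 * 1024): { encoding: string; text: string } {
  let encoding = "utf-8";
  try {
    // fatal: true throws a TypeError on malformed UTF-8; stream: true keeps a
    // character split at the window boundary from counting as malformed.
    new TextDecoder("utf-8", { fatal: true }).decode(bytes.subarray(0, checkSize), { stream: true });
  } catch {
    // A non-fatal windows-1252 decoder maps every byte to a character.
    encoding = "windows-1252";
  }
  return { encoding, text: new TextDecoder(encoding).decode(bytes) };
}
```

For example, with a 4-byte window over the UTF-8 bytes of "中中中", the window ends inside the second character; with { stream: true } the result is still utf-8, while without it the strict decode throws and the text would be misread as windows-1252.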


src/pkg/utils/encoding.ts

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Enhanced text decoding

052f7d4

cyfung1031 changed the title ~~Enhanced text decoding~~ [v1.3] Enhanced text decoding Jan 31, 2026

CodFrm reviewed Jan 31, 2026


CodFrm requested a review from Copilot January 31, 2026 03:29

Copilot started reviewing on behalf of CodFrm January 31, 2026 03:29

Copilot AI reviewed Jan 31, 2026


cyfung1031 and others added 4 commits January 31, 2026 12:40

Update resource.ts

636089c

Update src/pkg/utils/encoding.ts

82116ec

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Update src/pkg/utils/encoding.ts

ee1ad8c

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Add unit tests

795ec3d

CodFrm merged commit f4ac909 into scriptscat:release/v1.3 Feb 1, 2026
1 of 2 checks passed

cyfung1031 deleted the pr-text-decode-101 branch February 1, 2026 07:13

2 participants
