fix(landing): render real Unicode in language tags instead of literal \u escapes by heznpc · Pull Request #74 · heznpc/skillBridge

heznpc · 2026-04-10T11:45:47Z

Summary

The landing page at https://heznpc.github.io/skillBridge/ has been displaying literal \uXXXX escape sequences for non-Latin language names since the lang-tag list was first auto-generated. Verified live before this fix.

Language	Was showing	Should show
Korean	`\ud55c\uad6d\uc5b4`	한국어
Japanese	`\u65e5\u672c\u8a9e`	日本語
Chinese (Simplified)	`\u4e2d\u6587(\u7b80\u4f53)`	中文(简体)
Chinese (Traditional)	`\u4e2d\u6587(\u7e41\u9ad4)`	中文(繁體)
Russian	`\u0420\u0443\u0441\u0441\u043a\u0438\u0439`	Русский
(also Vietnamese, Ukrainian, Czech, Turkish, Arabic, Hindi, Thai, Bengali, Hebrew, Romanian, Greek in `+ N more`)	`\u...`	actual chars

Languages whose names are pure Latin (Español, Français, Deutsch, Português (BR), etc.) rendered fine because they were already stored with literal characters.

Root cause

src/lib/constants.js stored each language label as a string literal containing \uXXXX escape sequences:

const PREMIUM_LANGUAGES = [
  { code: 'ko', label: '\ud55c\uad6d\uc5b4' },   // ← escape literal
  { code: 'ja', label: '\u65e5\u672c\u8a9e' },
  // ...
  { code: 'de', label: 'Deutsch' },              // ← already plain UTF-8
];

At runtime this is fine. The JS engine decodes \uXXXX when parsing the source, so popup.js, header-controls.js, translator.js, and the in-extension language picker all see the correct characters and display them properly.

But scripts/generate-docs.js reads constants.js as a TEXT FILE and uses a regex to extract the label between single quotes:

const entryRe = /\{\s*code:\s*'([^']+)',\s*label:\s*'([^']+)'\s*\}/g;

The regex captures the raw characters \, u, d, 5, 5, c, ... exactly as they appear in the file, and the script writes them verbatim into docs/index.html. Browsers don't decode \uXXXX in HTML text content, so users see the escape sequences literally on the live page.
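The mismatch can be reproduced without touching the repository. The sketch below is illustrative only: `sourceText` stands in for the contents of constants.js as read from disk (built with `String.raw` so the backslashes survive), and `extractLabels` is a hypothetical helper wrapped around the same regex, not the actual code in scripts/generate-docs.js.

```javascript
// Simulated file contents. String.raw keeps each backslash as a real
// character, which is what a script sees when it reads the file as text.
const sourceText = String.raw`
const PREMIUM_LANGUAGES = [
  { code: 'ko', label: '\ud55c\uad6d\uc5b4' },
  { code: 'de', label: 'Deutsch' },
];
`;

// Hypothetical helper: applies the regex quoted above to plain text.
function extractLabels(text) {
  const entryRe = /\{\s*code:\s*'([^']+)',\s*label:\s*'([^']+)'\s*\}/g;
  return [...text.matchAll(entryRe)].map(([, code, label]) => ({ code, label }));
}

const labels = extractLabels(sourceText);

// No JS parsing happened, so the label is still 18 escape characters
// (3 escapes x 6 characters), not 3 Hangul syllables.
console.log(labels[0].label);
console.log(labels[0].label.length); // 18
console.log(labels[1].label);        // Deutsch
```

Because the label never passes through the JS parser, nothing ever turns the escapes into characters before they reach the HTML.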

Fix

Convert every \uXXXX escape in PREMIUM_LANGUAGES and AVAILABLE_LANGUAGES to its literal UTF-8 character. constants.js is already a UTF-8 source file, every other label was already non-escaped (Deutsch, Italiano, Polski...), and prettier/eslint accept either form — there was no reason to use escapes.

Once the source has real characters, the script's regex captures them and the HTML output is correct UTF-8.
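For reference, a conversion like this can be done mechanically. The following is a minimal sketch of one way to do it, not necessarily how the change in this PR was produced; `decodeUnicodeEscapes` is a hypothetical helper name. It assumes the text contains only plain `\uXXXX` escapes, and it would also rewrite an escaped backslash followed by `uXXXX`, so it should only be run on input known not to contain that.

```javascript
// Hypothetical helper: replace each \uXXXX escape in source text with the
// UTF-16 code unit it names. Surrogate pairs written as two consecutive
// escapes come out correctly because the code units are emitted in order.
function decodeUnicodeEscapes(text) {
  return text.replace(/\\u([0-9a-fA-F]{4})/g, (_, hex) =>
    String.fromCharCode(parseInt(hex, 16))
  );
}

// String.raw simulates a line of the file as it exists on disk.
const before = String.raw`{ code: 'ko', label: '\ud55c\uad6d\uc5b4' },`;
const after = decodeUnicodeEscapes(before);

console.log(after); // { code: 'ko', label: '한국어' },
```

Lines that contain no escapes pass through unchanged, so running the helper a second time is a no-op.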

Why this is safe at runtime

- '\ud55c\uad6d\uc5b4' and '한국어' produce identical strings when the JS engine parses them
- All runtime consumers (popup.js:112-128, header-controls.js:78-178, translator.js:33, content.js:455) use lang.label as text and don't string-compare against escape literals
- tests/constants.test.js only checks length, code presence, and label truthiness; it makes no string content assertions
- Tests pass unchanged (309/309)
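The first point is easy to confirm in any JS console. This is a standalone check using two of the labels from the table above, not code from the repository:

```javascript
// The escaped and literal spellings are the same string once parsed.
const escapedKo = '\ud55c\uad6d\uc5b4';
const literalKo = '한국어';

const escapedRu = '\u0420\u0443\u0441\u0441\u043a\u0438\u0439';
const literalRu = 'Русский';

console.log(escapedKo === literalKo); // true
console.log(escapedRu === literalRu); // true
console.log(escapedKo.length, literalKo.length); // 3 3
```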

Verification

Check	Result
`npm test`	309/309 passing (same as baseline)
`npm run lint`	clean
`npm run format:check`	clean (Prettier accepts UTF-8)
`npm run docs`	idempotent — second run produces no further diff
Raw byte inspection of `docs/index.html`	confirmed valid UTF-8 with real Unicode characters
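The byte-level check in the last row could be scripted along these lines. This is a sketch under assumptions: `inspectHtmlBytes` is a hypothetical helper, the `lang-tag` markup in the samples is made up for illustration, and in practice the input would come from something like `fs.readFileSync('docs/index.html')`.

```javascript
// Hypothetical helper: decode bytes strictly as UTF-8 and look for any
// \uXXXX sequences that survived into the text.
function inspectHtmlBytes(bytes) {
  let text;
  try {
    // fatal: true makes decode() throw on malformed UTF-8 instead of
    // silently substituting U+FFFD.
    text = new TextDecoder('utf-8', { fatal: true }).decode(bytes);
  } catch {
    return { validUtf8: false, leftoverEscapes: [] };
  }
  const leftoverEscapes = text.match(/\\u[0-9a-fA-F]{4}/g) ?? [];
  return { validUtf8: true, leftoverEscapes };
}

// Illustrative samples only; the real page markup may differ.
const encoder = new TextEncoder();
const fixed = encoder.encode('<span class="lang-tag">한국어</span>');
const broken = encoder.encode(String.raw`<span class="lang-tag">\ud55c\uad6d\uc5b4</span>`);

console.log(inspectHtmlBytes(fixed));  // valid UTF-8, no leftover escapes
console.log(inspectHtmlBytes(broken)); // valid UTF-8, three leftover escapes
```

Note that the broken page is still valid UTF-8, since the escapes are plain ASCII, which is why the check needs to look for leftover escapes as well.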

Out of scope (separate fix candidate)

README.md:7 links to https://heznpc.github.io/skillbridge/ (lowercase b) but the actual GitHub Pages URL is https://heznpc.github.io/skillBridge/ (capital B). Lowercase returns 404. Same project area but a separate concern — will track separately if you'd like.

Test plan

- CI green (test + validate)
- After merge, GitHub Pages auto-rebuilds (a few minutes)
- Verify live page at https://heznpc.github.io/skillBridge/ shows real characters in lang-tags

🤖 Generated with Claude Code

… \u escapes

The landing page at heznpc.github.io/skillBridge/ has been displaying literal escape sequences for non-Latin language names since the lang-tag list was first auto-generated:

Korean : \ud55c\uad6d\uc5b4 (should be 한국어)
Japanese : \u65e5\u672c\u8a9e (should be 日本語)
Chinese : \u4e2d\u6587(\u7b80\u4f53) (should be 中文(简体))
Russian : \u0420\u0443\u0441\u0441\u043a\u0438\u0439 (should be Русский)
... and 5 more

Verified live at https://heznpc.github.io/skillBridge/ before this fix. Languages whose names are pure Latin (Spanish, French, German, Vietnamese post-decoding via the JS engine in popup) rendered fine.

Root cause: src/lib/constants.js stored each language label as a string literal containing \uXXXX escape sequences. At runtime this is fine, because the JS engine decodes them when parsing the source, so popup.js, header-controls.js, and the in-extension language picker all show the right characters.

But scripts/generate-docs.js reads constants.js as a TEXT FILE and uses a regex to extract the label between single quotes. The regex captures the raw bytes \, u, d, 5, 5, c, ... which then get written verbatim into docs/index.html. Browsers don't decode \uXXXX in HTML text content, so the live page shows the escape sequences literally.

Fix: convert every \uXXXX escape in PREMIUM_LANGUAGES and AVAILABLE_LANGUAGES to its literal UTF-8 character. constants.js is already a UTF-8 source file, every other label was already non-escaped (Deutsch, Italiano, etc.), and prettier/eslint accept either form, so there was no reason to use escapes in the first place. Once the source has real characters, the script's regex captures them and the HTML gets correct UTF-8 output.

Why this is safe at runtime:
- The JS engine produces an identical string from '\ud55c\uad6d\uc5b4' and '한국어'.
- All runtime consumers (popup.js:112-128, header-controls.js:78-178, translator.js:33, content.js:455) use lang.label as text and don't string-compare against escape literals.
- tests/constants.test.js only checks length, code presence, and label truthiness; no string content assertions. Tests pass unchanged.

Verification:
- npm test: 309/309 passing (same as baseline)
- npm run lint: clean
- npm run format:check: clean (Prettier accepts UTF-8 in source)
- npm run docs: idempotent; second run produces no diff
- docs/index.html: confirmed via raw byte read that the new content is valid UTF-8 with real Unicode characters

Out of scope (separate fix candidate):
- README.md L7 links to https://heznpc.github.io/skillbridge/ (lowercase b) but the actual GitHub Pages URL is https://heznpc.github.io/skillBridge/ (capital B). Lowercase returns 404. Same project area but a separate concern.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

heznpc merged commit 2d1314b into main Apr 10, 2026
2 checks passed

heznpc deleted the fix/landing-page-unicode-labels branch April 10, 2026 11:46

heznpc mentioned this pull request Apr 10, 2026

fix(cd): repair file change detection and switch CWS to draft-only upload #75

Merged

4 tasks
