
codegen --deno-esm: non-ASCII string literals lower to octal escapes, blocked by strict-mode ESM #460

@hyperpolymath


Surfaced by hyperpolymath/standards#284.

Behaviour

```affine
fn label(): String { return "❌ FAIL" } // U+274C CROSS MARK
```

Compiled with `affinescript compile --deno-esm`, the literal emits as:

```js
return "\226\157\140 FAIL";
```

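Those are the three UTF-8 bytes of U+274C (`E2 9D 8C`) written as three-digit decimal escapes. That is the `\ddd` convention of OCaml string literals, which suggests the emitter is reusing OCaml's own escaping (`String.escaped` or `Printf` `%S`). JavaScript reads `\ddd` as a legacy octal escape instead, so even in sloppy mode the bytes decode to the wrong characters, and in strict mode they are a syntax error. Running the output with `deno run` (ES modules are always strict) fails: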

```
SyntaxError: Octal escape sequences are not allowed in strict mode.
```
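
A minimal repro sketch in plain JavaScript (runs under Node or Deno, no AffineScript involved) that evaluates today's emitted literal both ways via the `Function` constructor; the variable names are illustrative only. It shows the escapes are not just rejected under strict mode: in sloppy mode they silently decode to three unrelated characters, so relaxing strictness would not be a fix either.

```javascript
// Today's emitted literal, as source text (backslashes are doubled here
// only because this is itself a JS string).
const emitted = '"\\226\\157\\140 FAIL"';

// Function-constructor bodies are sloppy unless they begin with
// "use strict", even when created from inside an ES module.
const sloppy = new Function(`return ${emitted};`)();

// Under "use strict" the same literal fails to parse.
let strictError = null;
try {
  new Function(`"use strict"; return ${emitted};`);
} catch (e) {
  strictError = e;
}

// Sloppy mode reads \226 \157 \140 as octal 0o226, 0o157, 0o140,
// i.e. U+0096, "o" and "`": nothing like the cross mark.
console.log(sloppy === "\u0096o` FAIL"); // true
console.log(strictError instanceof SyntaxError); // true
```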

Why this hurts

ES modules are always strict, so any non-ASCII character in a string literal (emoji, accented Latin, CJK, etc.) breaks the Deno build, and the error gives no hint that the codegen pass, not the source, is at fault. standards#284 worked around it by replacing its emoji status indicators with ASCII `[FAIL]` / `[OK]`.

Proposed fix

Lower each non-ASCII code point to `\uXXXX` (for BMP code points) or `\u{XXXXX}` (the ES2015 code point escape, for code points above U+FFFF) instead of one octal escape per UTF-8 byte:

```js
// Current (broken under strict mode)
"\226\157\140 FAIL"

// Proposed
"\u274C FAIL"
// or, for code points > U+FFFF:
"\u{1F602} joy"
```

Probably a small, contained change to the string-literal emitter in `lib/codegen_*.ml` (whichever file owns ESM emission for string nodes).
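
A minimal sketch of the lowering rule, written in JavaScript so it can double as a test oracle. It is not the OCaml emitter, and `escapeJsStringLiteral` is a hypothetical name. It takes an already-decoded string; an OCaml emitter would first decode the UTF-8 bytes into code points (for example with `String.get_utf_8_uchar`, OCaml 4.14+) and then apply the same split. It assumes `"` and `\` are the only printable ASCII characters that need escaping, and it escapes control characters as `\uXXXX` rather than short forms like `\n`.

```javascript
// Hypothetical helper: renders a string value as a double-quoted,
// ASCII-only JS literal that is legal in strict-mode ESM.
function escapeJsStringLiteral(value) {
  let out = '"';
  // for...of iterates by code point, so astral characters arrive whole
  // instead of as two UTF-16 surrogate halves.
  for (const ch of value) {
    const cp = ch.codePointAt(0);
    if (ch === '"' || ch === "\\") {
      out += "\\" + ch; // escape the delimiter and the escape character
    } else if (cp >= 0x20 && cp < 0x7f) {
      out += ch; // printable ASCII passes through unchanged
    } else if (cp <= 0xffff) {
      // BMP, including ASCII control characters: fixed-width \uXXXX
      out += "\\u" + cp.toString(16).toUpperCase().padStart(4, "0");
    } else {
      // Supplementary planes: ES2015 code point escape \u{XXXXX}
      out += "\\u{" + cp.toString(16).toUpperCase() + "}";
    }
  }
  return out + '"';
}

console.log(escapeJsStringLiteral("❌ FAIL")); // "\u274C FAIL"
console.log(escapeJsStringLiteral("😂 joy")); // "\u{1F602} joy"
```

Keeping the output ASCII-only makes the generated file independent of how it is later read or re-encoded. Using `\u{...}` for astral code points avoids emitting surrogate pairs, at the cost of requiring ES2015 syntax, which Deno supports.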

Acceptance

  • `"❌"` source -> Deno-run output containing the original character
  • BMP code points emit as `\uXXXX`
  • Non-BMP code points emit as `\u{XXXXX}`
  • Golden-snapshot regression test covering at least one of each class
  • `deno run` execution test of the emitted file (not just type-check)
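
A rough sketch of the round-trip check behind the snapshot criteria, in JavaScript. The `goldenCases` table and `evalStrict` helper are hypothetical, and evaluating a single literal under `"use strict"` only approximates what `deno run` does with a whole module, so this complements the execution test rather than replacing it.

```javascript
// Hypothetical golden entries: the source string value and the literal
// text the emitter is expected to produce for it.
const goldenCases = [
  { source: "❌ FAIL", emitted: '"\\u274C FAIL"' }, // BMP
  { source: "😂 joy", emitted: '"\\u{1F602} joy"' }, // non-BMP
];

// Parse and evaluate an emitted literal in strict mode, as ESM would;
// octal escapes make this throw a SyntaxError.
function evalStrict(literal) {
  return new Function(`"use strict"; return ${literal};`)();
}

for (const { source, emitted } of goldenCases) {
  const value = evalStrict(emitted);
  if (value !== source) {
    throw new Error(`round-trip mismatch: ${emitted} -> ${JSON.stringify(value)}`);
  }
}
console.log(`ok: ${goldenCases.length} cases`);
```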

Refs
