Return decoded specifiers and identifiers #61

evanw · 2021-02-19T07:26:30Z

Context: vitejs/vite#2083. Import paths are strings and can therefore contain escape sequences, like this:

import test from "./\u0442\u0435\u0441\u0442";
export default test;

Right now the documentation for this module seems to encourage using this library in a way that doesn't handle any escape sequences. Taking a raw slice of the input means that the backslashes are literal backslashes in the path instead of the actual underlying escape sequences. So in other words you would get the string "./\\u0442\\u0435\\u0441\\u0442" instead of the string "./тест" and then fail to import the path.

This can of course be worked around by interpreting the escape sequences yourself. But I thought I'd file this issue because it could be nice for either this library to do that for you or for there to be a recommended way of doing this, since one reason to use this library is to extract the import paths. If that's out of scope, maybe just calling out that you need to be sure to handle escape sequences yourself when you use this library with a link to the spec?

The text was updated successfully, but these errors were encountered:

guybedford · 2021-02-20T11:02:53Z

Thanks, yes the API of this project could be more polished to avoid these footguns. It was actually originally written for https://github.com/guybedford/es-module-shims hence why it is entirely offset based, with a heavy focus on the overall library footprint.

I've added a new section to the readme in https://github.com/guybedford/es-module-lexer#escape-sequences, I think that should cover the concerns, but further clarifications welcome.

evanw · 2021-02-20T11:16:09Z

For this reason, when handling specifiers or identifiers it can advisable to pass the full string to JSON.parse:
JSON.parse(source.substring(imports[i].s, imports[i].e));

That workaround had occurred to me too. However, it doesn't work for a few reasons:

It's missing the quote characters, so it won't actually work at all. You'd have to do this instead:
```
JSON.parse('"' + source.substring(imports[i].s, imports[i].e) + '"');
```
This is JavaScript, not JSON. They support different escape sequences. Here are some counter-examples that don't work with this approach:
```
import './\x61\x62\x63.js';
import './\u{20204}.js';
```
It will technically break for single-quoted strings containing a double quote. I assume this case would never come up in real code though.

guybedford · 2021-02-20T11:18:47Z

It's missing the quote characters, so it won't actually work at all. You'd have to do this instead:

The .s and .e offset indices include the " quote characters exactly for this reason, so you don't need additions.

This is JavaScript, not JSON. They support different escape sequences.

Both of the examples support JSON.parse in my test.

It will technically break for single-quoted strings containing a double quote

Because of the first point, this case works out fine too.

guybedford · 2021-02-20T11:20:00Z

Because of the first point, this case works out fine too.

Ah, of course this doesn't, will put some thought to this one.

evanw · 2021-02-20T11:32:17Z

Here's my test:

import lexer from 'es-module-lexer';

function tryCatch(fn) {
  console.log(`\n${fn}:`);
  try {
    return fn();
  } catch (err) {
    return err;
  }
}

let js, imp;

js = `import 'abc.js';`;
[[imp]] = await lexer.parse(js);
console.log(tryCatch(() => JSON.parse(js.substring(imp.s, imp.e))));
console.log(tryCatch(() => JSON.parse('"' + js.substring(imp.s, imp.e) + '"')));

js = `import './\\x61\\x62\\x63.js';`;
[[imp]] = await lexer.parse(js);
console.log(tryCatch(() => JSON.parse('"' + js.substring(imp.s, imp.e) + '"')));

js = `import './\\u{20204}.js';`;
[[imp]] = await lexer.parse(js);
console.log(tryCatch(() => JSON.parse('"' + js.substring(imp.s, imp.e) + '"')));

I'm running with node v15.2.0 and es-module-lexer v0.3.26 from npm. This is the output:


() => JSON.parse(js.substring(imp.s, imp.e)):
SyntaxError: Unexpected token a in JSON at position 0
    at JSON.parse (<anonymous>)
    at file:///Users/evan/Desktop/example.mjs:16:33
    at tryCatch (file:///Users/evan/Desktop/example.mjs:6:12)
    at file:///Users/evan/Desktop/example.mjs:16:13

() => JSON.parse('"' + js.substring(imp.s, imp.e) + '"'):
abc.js

() => JSON.parse('"' + js.substring(imp.s, imp.e) + '"'):
SyntaxError: Unexpected token x in JSON at position 4
    at JSON.parse (<anonymous>)
    at file:///Users/evan/Desktop/example.mjs:21:33
    at tryCatch (file:///Users/evan/Desktop/example.mjs:6:12)
    at file:///Users/evan/Desktop/example.mjs:21:13

() => JSON.parse('"' + js.substring(imp.s, imp.e) + '"'):
SyntaxError: Unexpected token { in JSON at position 5
    at JSON.parse (<anonymous>)
    at file:///Users/evan/Desktop/example.mjs:25:33
    at tryCatch (file:///Users/evan/Desktop/example.mjs:6:12)
    at file:///Users/evan/Desktop/example.mjs:25:13

Edit: I think indirect eval should work fine though:

import lexer from 'es-module-lexer';

function tryCatch(fn) {
  console.log(`\n${fn}:`);
  try {
    return fn();
  } catch (err) {
    return err;
  }
}

let js, imp;

js = `import 'abc.js';`;
[[imp]] = await lexer.parse(js);
console.log(tryCatch(() => (0, eval)(js.substring(imp.s - 1, imp.e + 1))));

js = `import './\\x61\\x62\\x63.js';`;
[[imp]] = await lexer.parse(js);
console.log(tryCatch(() => (0, eval)(js.substring(imp.s - 1, imp.e + 1))));

js = `import './\\u{20204}.js';`;
[[imp]] = await lexer.parse(js);
console.log(tryCatch(() => (0, eval)(js.substring(imp.s - 1, imp.e + 1))));

js = `import './".js';`;
[[imp]] = await lexer.parse(js);
console.log(tryCatch(() => (0, eval)(js.substring(imp.s - 1, imp.e + 1))));

The output:

() => (0, eval)(js.substring(imp.s - 1, imp.e + 1)):
abc.js

() => (0, eval)(js.substring(imp.s - 1, imp.e + 1)):
./abc.js

() => (0, eval)(js.substring(imp.s - 1, imp.e + 1)):
./𠈄.js

() => (0, eval)(js.substring(imp.s - 1, imp.e + 1)):
./".js

guybedford · 2021-02-20T12:06:42Z

Ok, I've put together a PR in #62. Let me know if that seems like it can work here.

evanw · 2021-02-20T12:20:21Z

Yeah that approach seems good to me.

guybedford · 2021-02-20T12:41:59Z

Released in 0.4.0.

evanw · 2021-02-20T12:48:55Z

I can confirm the fix. Looks good!

evanw mentioned this issue Feb 19, 2021

Broken import with unicode path evanw/esbuild#841

Closed

guybedford changed the title ~~Recommended way to handle Unicode text?~~ Return decoded specifiers and identifiers Feb 20, 2021

guybedford closed this as completed Feb 20, 2021

guybedford mentioned this issue Mar 4, 2021

Handle specifier encodings jspm/rust-lexer#2

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Return decoded specifiers and identifiers #61

Return decoded specifiers and identifiers #61

evanw commented Feb 19, 2021

guybedford commented Feb 20, 2021

evanw commented Feb 20, 2021

guybedford commented Feb 20, 2021

guybedford commented Feb 20, 2021

evanw commented Feb 20, 2021 •

edited

Loading

guybedford commented Feb 20, 2021

evanw commented Feb 20, 2021

guybedford commented Feb 20, 2021

evanw commented Feb 20, 2021

Return decoded specifiers and identifiers #61

Return decoded specifiers and identifiers #61

Comments

evanw commented Feb 19, 2021

guybedford commented Feb 20, 2021

evanw commented Feb 20, 2021

guybedford commented Feb 20, 2021

guybedford commented Feb 20, 2021

evanw commented Feb 20, 2021 • edited Loading

guybedford commented Feb 20, 2021

evanw commented Feb 20, 2021

guybedford commented Feb 20, 2021

evanw commented Feb 20, 2021

evanw commented Feb 20, 2021 •

edited

Loading