Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

encoding: make TextDecoder handle BOM correctly #30132

Closed
wants to merge 1 commit into from

Conversation

@addaleax
Copy link
Member

addaleax commented Oct 26, 2019

Do not accept the BOM if it comes from a different encoding, and
only discard the BOM after it has actually been read (including
when it is spread over multiple chunks in streaming mode).

Fixes: #25315

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • commit message follows commit guidelines
@nodejs-github-bot

This comment has been minimized.

Do not accept the BOM if it comes from a different encoding, and
only discard the BOM after it has actually been read (including
when it is spread over multiple chunks in streaming mode).

Fixes: #25315
@addaleax addaleax force-pushed the addaleax:textdecoder-bom branch from b6414f5 to a30272b Nov 1, 2019
@nodejs-github-bot

This comment has been minimized.

@nodejs-github-bot

This comment has been minimized.

@devsnek
devsnek approved these changes Nov 4, 2019
@addaleax addaleax added author ready and removed review wanted labels Nov 4, 2019
@addaleax

This comment has been minimized.

Copy link
Member Author

addaleax commented Nov 5, 2019

Landed in 237be2e

@addaleax addaleax closed this Nov 5, 2019
addaleax added a commit that referenced this pull request Nov 5, 2019
Do not accept the BOM if it comes from a different encoding, and
only discard the BOM after it has actually been read (including
when it is spread over multiple chunks in streaming mode).

Fixes: #25315

PR-URL: #30132
Reviewed-By: Gus Caplan <me@gus.host>
@addaleax addaleax deleted the addaleax:textdecoder-bom branch Nov 5, 2019
Copy link
Member

srl295 left a comment

missed this discussion. Code looks OK, I'm not familiar with this spec. But shouldn't ICU do this in ucnv_toUnicode?

@addaleax

This comment has been minimized.

Copy link
Member Author

addaleax commented Nov 5, 2019

@srl295 I don’t know, should it drop a BOM at the beginning of the text automatically? The current behaviour is that it doesn’t. Unless maybe there’s a flag in there that we could use and that I don’t know about? (And we do also at least need one way to make it not drop a BOM at the beginning of the text.)

@srl295

This comment has been minimized.

Copy link
Member

srl295 commented Nov 5, 2019

@addaleax

This comment has been minimized.

Copy link
Member Author

addaleax commented Nov 5, 2019

@srl295 I really have no idea whether ICU should or should not support automatic BOM-dropping or where to look up if it already does (other than that the implementation doesn’t look to me like it does), so I don’t feel qualified to open an issue and explain Node’s needs.

All I know is that it’s relatively easy for Node.js to work with what ICU currently provides.

@srl295

This comment has been minimized.

Copy link
Member

srl295 commented Nov 5, 2019

@srl295 I really have no idea whether ICU should or should not support automatic BOM-dropping or where to look up if it already does (other than that the implementation doesn’t look to me like it does), so I don’t feel qualified to open an issue and explain Node’s needs.

All I know is that it’s relatively easy for Node.js to work with what ICU currently provides.

OK. I'll file an issue then.

MylesBorins added a commit that referenced this pull request Nov 17, 2019
Do not accept the BOM if it comes from a different encoding, and
only discard the BOM after it has actually been read (including
when it is spread over multiple chunks in streaming mode).

Fixes: #25315

PR-URL: #30132
Reviewed-By: Gus Caplan <me@gus.host>
@BridgeAR BridgeAR mentioned this pull request Nov 19, 2019
targos added a commit that referenced this pull request Dec 1, 2019
Do not accept the BOM if it comes from a different encoding, and
only discard the BOM after it has actually been read (including
when it is spread over multiple chunks in streaming mode).

Fixes: #25315

PR-URL: #30132
Reviewed-By: Gus Caplan <me@gus.host>
@BethGriggs BethGriggs mentioned this pull request Dec 9, 2019
MylesBorins added a commit that referenced this pull request Dec 17, 2019
Do not accept the BOM if it comes from a different encoding, and
only discard the BOM after it has actually been read (including
when it is spread over multiple chunks in streaming mode).

Fixes: #25315

PR-URL: #30132
Reviewed-By: Gus Caplan <me@gus.host>
@BethGriggs BethGriggs mentioned this pull request Dec 23, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.