Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix #36951: Add the "file.encoding_candidate" attribute for selecting encoding to open file. #117053

Closed
wants to merge 5 commits into from

Conversation

a45s67
Copy link

@a45s67 a45s67 commented Feb 19, 2021

TLDR

I try to solve #36951 in the way like "set fileencodings= xxx,xxx" in vim.


How to use :

If we want to try to open a file in the order of "utf-8", "big5", "...(some codepages you want)".
In the setting.json, just add -

file.encoding_candidate: [ "utf-8", "big5", "....(some codepages you want)"]

Then the editor will try to open file with "utf-8", if it failed, try "big5", and so on.

Note :

I use "TextDecoder" to check if the file can be opened.
Thus please check that the encoding you want to use exists in "TextDecoder" before use.

Priority:

auto detected encoding < file.encoding_candidate


For #36951.

@ghost
Copy link

ghost commented Feb 19, 2021

CLA assistant check
All CLA requirements met.

@bpasero bpasero added this to the Backlog milestone Feb 19, 2021
@bpasero bpasero marked this pull request as draft February 19, 2021 16:30
@bpasero
Copy link
Member

bpasero commented Feb 19, 2021

Thanks for the PR, a little bit on the expectation: I will probably not have time to review this until we get to our yearly issue grooming iteration in October.

@a45s67 a45s67 marked this pull request as ready for review February 20, 2021 18:56
@bpasero
Copy link
Member

bpasero commented Sep 18, 2021

This seems to be building around the fatal flag on TextDecoder (which I was not aware of). But are we sure this works as we think it does, signalling whether the encoding is valid for a buffer?

https://encoding.spec.whatwg.org/#error-mode

@bpasero bpasero added the info-needed Issue requires more information from poster label Sep 18, 2021
@bpasero
Copy link
Member

bpasero commented Sep 18, 2021

Besides, we cannot really decode the entire document at once just to check on error for performance/memory reasons and we cannot decode a chunk because we don't really know the byte size...

@bpasero
Copy link
Member

bpasero commented Sep 18, 2021

Also, TextDecoder with an encoding other than UTF-8 will not work in browsers, so I am inclined to close this PR.

@a45s67
Copy link
Author

a45s67 commented Oct 1, 2021

Thanks @bpasero to give me some comments!

Yes, I did not consider such details you mentioned (fatal flag, byte size) when implementing this feature.
Some code of my commits must make some changes to be more reliable.

To make it work in browsers, maybe I will try to find other text decoder instead of TextDecoder.

But is there a way to check the encoding of a file without decoding all its content? I thought it is necessary to get the accurate result.

@github-actions github-actions bot locked and limited conversation to collaborators Nov 10, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
info-needed Issue requires more information from poster
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants