Wrong guess encoding as Windows 1252 #33720

Closed
maooyer opened this issue Sep 2, 2017 · 54 comments · Fixed by #101489
Labels: insiders-released, on-release-notes, upstream, upstream-issue-linked, verification-needed, verified

Comments

@maooyer

maooyer commented Sep 2, 2017

Upstream issue: aadsm/jschardet#48

  • VSCode Version: Version 1.15.1
  • OS Version: Windows 10.0.15063

Steps to Reproduce:

  1. My VS Code settings.json contains:
    "files.encoding": "utf8",
    "files.autoGuessEncoding": true,
  2. Create two txt files and make sure they are saved as UTF-8:

test1.txt

Created on: 2017年9月2日
测

test2.txt

Created on: 2017年9月2日
测试
  3. Reopen the files. The guessed encoding for test1.txt is Windows 1252, while the guessed encoding for test2.txt is UTF-8.

Reproduces without extensions: Yes
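
For readers who want to confirm that both reproduction files really are well-formed UTF-8, here is a minimal Node.js sketch. It assumes strict decoding with the built-in TextDecoder; isValidUtf8 is a hypothetical helper name, and this is not how jschardet or VS Code decides on an encoding. It only shows that the two inputs differ by a single three-byte character.

```javascript
// Hypothetical helper: true if the buffer decodes as strict UTF-8.
// Not part of VS Code or jschardet; for illustration only.
function isValidUtf8(buf) {
  try {
    // fatal: true makes decode() throw on any malformed byte sequence.
    new TextDecoder("utf-8", { fatal: true }).decode(buf);
    return true;
  } catch (err) {
    return false;
  }
}

// The contents of the two files from the reproduction steps.
const test1 = Buffer.from("Created on: 2017年9月2日\n测", "utf8");
const test2 = Buffer.from("Created on: 2017年9月2日\n测试", "utf8");

console.log(isValidUtf8(test1)); // true
console.log(isValidUtf8(test2)); // true
// The extra character 试 takes 3 bytes in UTF-8.
console.log(test2.length - test1.length); // 3
```

Since both files pass a strict UTF-8 check, the different guesses appear to come from the detector's statistical scoring rather than from any invalid bytes in test1.txt.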

@bpasero bpasero added this to the Backlog milestone Sep 4, 2017
@bpasero bpasero added upstream Issue identified as 'upstream' component related (exists outside of VS Code) file-explorer Explorer widget issues labels Sep 4, 2017
@rodolfomuller

I have the same problem with UTF-8 and ISO 8859-1.

@bpasero bpasero added the bug Issue identified by VS Code Team member as probable bug label Sep 25, 2017
@sou-lab

sou-lab commented Oct 5, 2017

I had the same problem.
However, it may have been fixed in 1.17.

Please try Insiders.
https://code.visualstudio.com/insiders/

@codecrafting-io

Well, I tried on 1.17 and it still happens. In my case, a new txt file containing an accented word, even when saved as UTF-8, still reopens as Western 1252 or ISO 8859-2.

@sou-lab

sou-lab commented Oct 6, 2017

Sorry, it is not fixed in 1.17.0.
It still happens for me too 😢
But in Insiders the guessed encoding is UTF-8.

@bpasero bpasero added file-io File I/O file-encoding File encoding type issues and removed file-explorer Explorer widget issues file-io File I/O workbench labels Nov 13, 2017
@bpasero bpasero removed this from the Backlog milestone Nov 16, 2017
@Yanpas
Contributor

Yanpas commented May 19, 2018

Still not fixed in 1.23:
Test case

#!/bin/sh

foo() {
	echo "starting …"
}

The ellipsis symbol makes VS Code guess cp1252.

@std4453

std4453 commented Jun 14, 2018

Any updates? I'm still getting this issue today.

@Yanpas
Contributor

Yanpas commented Jun 19, 2018

Could the fix for issue #23997 have led to this regression?

@Yanpas
Contributor

Yanpas commented Jun 19, 2018

The code from latin1prober.js:113 looks very suspicious

    this.getCharsetName = function() {
        return "windows-1252";
    }

It's the cause of the problem.

For those of you who are interested in debugging here is a snippet:

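// Debug helper: prints jschardet's guess for the file given on the command line.
const { detect } = require("./init")
const fs = require("fs")

// process.argv is [node, script, ...args], so the file name is at index 2.
const fname = process.argv[2]
const buf = fs.readFileSync(fname)

console.log(detect(buf))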

Save it as src/tst.js. To run it, call node tst.js somefile from the src folder.

@chylex

chylex commented Aug 6, 2018

Same issue here... autoGuessEncoding is disabled and yet VS Code still attempts to guess encoding as Windows-1252 even though that causes invalid characters because it's actually UTF-8...

@elmonty

elmonty commented Oct 10, 2018

It happens to me when there is a copyright symbol in the file. VSCode incorrectly guesses Windows-1252, which shows an invalid character next to the copyright symbol.
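
The extra character described here can be reproduced with a small Node.js sketch. It relies on Node's "latin1" (ISO 8859-1) decoding, which agrees with Windows-1252 for the two bytes involved; the variable names are illustrative only.

```javascript
// "©" (U+00A9) is stored as two bytes in UTF-8.
const utf8Bytes = Buffer.from("\u00a9", "utf8");
console.log(utf8Bytes.toString("hex")); // c2a9

// Reading those bytes with a single-byte Western encoding turns each byte
// into its own character, so a stray "Â" appears before the "©".
const misread = utf8Bytes.toString("latin1");
console.log(misread); // Â©
```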

@MxDany

MxDany commented Nov 28, 2019

OMG, hard to believe this problem has existed for so long. Still not fixed in 1.40.2...

@JulioNobre

How to upvote?

@codecrafting-io

codecrafting-io commented Dec 17, 2019

@JulioNobre I get that it may be ambiguous to identify the exact encoding, but something has to be done, even if it does not involve Microsoft code directly. Try creating a blank txt file with the Windows-1252 encoding and writing the word "coração". Now reopen the file, and you will see that even for something apparently simple and created by Code, the guessed encoding is still wrong. I tried to reproduce this in multiple text editors, and none of them opened the file with the wrong encoding. I also tested with ISO 8859-1 and got the same issue. This is a problem: if the application offers the option to save with multiple encodings, it should at least reopen a file it created with the same encoding. Otherwise it should not offer those encoding options.
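
The "coração" case can be inspected at the byte level with a short Node.js sketch. It assumes Node's "latin1" encoding as a stand-in for Windows-1252 and ISO 8859-1, which is safe here because both map ç and ã to the same bytes. It does not call jschardet, so it shows only what a detector has to work with, not how it scores the input.

```javascript
// "coração" with ç and ã written as escapes so the source file's own
// encoding cannot affect the result.
const word = "cora\u00e7\u00e3o";

// What ends up on disk when the word is saved in a single-byte Western encoding.
const saved = Buffer.from(word, "latin1");
console.log(saved.toString("hex")); // 636f7261e7e36f

// Reading the bytes back with the same encoding restores the word.
const asLatin1 = saved.toString("latin1");
console.log(asLatin1 === word); // true

// Reading them as UTF-8 does not: 0xE7 and 0xE3 are lead bytes with no valid
// continuation bytes, so the decoder substitutes U+FFFD.
const asUtf8 = saved.toString("utf8");
console.log(asUtf8.includes("\ufffd")); // true
```

The bytes are not valid UTF-8, so UTF-8 can be ruled out, but the same seven bytes are plausible in several single-byte encodings, which is where a statistical detector can go wrong.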

@cv0cv0

cv0cv0 commented Dec 17, 2019

VSCode 1.41.0 Still not fixed...

@deadbaed

still not fixed, version 1.42.1

@deadbaed

although i got it working with a workaround:

in my settings.json file i disabled guessing the file encoding and i force files to be encoded in utf-8:

    "files.encoding": "utf8",
    "files.autoGuessEncoding": false

@ghost

ghost commented Mar 21, 2020

although i got it working with a workaround:

in my settings.json file i disabled guessing the file encoding and i force files to be encoded in utf-8:

    "files.encoding": "utf8",
    "files.autoGuessEncoding": false

"files.autoGuessEncoding" can also be disabled per language, which mitigates the problem. It would be helpful to be able to do this per file extension.

@savioret

although i got it working with a workaround:

in my settings.json file i disabled guessing the file encoding and i force files to be encoded in utf-8:

This only makes sense if your whole project is encoded as UTF-8.
autoGuessEncoding is useful precisely for projects with mixed-encoding files.

@std4453

std4453 commented Apr 12, 2020

Still experiencing this problem.
Forced to reopen with UTF-8 every time I open the file.
At least it should try to save my choices.
Annoyed.

@haba713

haba713 commented May 17, 2020

People, on the 29th of May we will have had this issue open for 1000 days.

Celebrate... 🍰

@hmcoder-zz

1002 today..

@lingsamuel

lingsamuel commented Jun 4, 2020

This issue originally describes this bug: aadsm/jschardet#56, which should be fixed by aadsm/jschardet#57 and aadsm/jschardet#59.

@lingsamuel

I think this newest release solves this issue: https://github.com/aadsm/jschardet/releases/tag/v2.2.1

@bpasero

@bpasero
Member

bpasero commented Jun 30, 2020

We can pick up a new version for July, as we are currently closing for June endgame.

@bpasero bpasero added this to the July 2020 milestone Jul 1, 2020
bpasero added a commit that referenced this issue Jul 1, 2020
@bpasero bpasero added the verification-needed Verification of issue is requested label Jul 3, 2020
@bpasero bpasero added the on-release-notes Issue/pull request mentioned in release notes label Jul 15, 2020
@alexr00 alexr00 added the verified Verification succeeded label Aug 5, 2020
@github-actions github-actions bot locked and limited conversation to collaborators Aug 17, 2020