Character Encoding not set as specified #41

Closed
walterdrink opened this issue Nov 3, 2021 · 6 comments · Fixed by #50

Comments

@walterdrink

Hi,
I can't believe that I am the only one to realize that this is not working, so probably I am doing something wrong:

I am using Notepad++ 8.1.5 (64-bit) with the EditorConfig plugin installed, and I have created a .editorconfig file, in my case with this content:

[*]
charset = utf-8
indent_style = space
indent_size = 2

Now when I open another simple text file (no special characters or anything like that) from the same folder, it is still interpreted as ANSI.
When I click on "Show EditorConfig settings for this file" it correctly shows me the content of .editorconfig.

Any idea what could be wrong?
Walterdrink

@ffes
Member

ffes commented Nov 3, 2021

You are right that there is no support for charset at the moment. But note that charset is not listed among the Supported properties.

That said, you are right that it should do something with the charset. Notepad++ has support for various character sets.

And what should be done with a document that is in, for instance, Windows-1252 when the .editorconfig says utf-8? Should an existing document be converted? Personally I think it should, just like we do with line endings.

@walterdrink
Author

Thanks for your answer, this helps me to know what I can expect for now.

I agree with you, it should be defined what should be done.
When I have a file that "is in" Windows-1252 (which actually means that it is a file that Notepad++ would detect as Windows-1252 if "Autodetect character encoding" is on) but .editorconfig says utf-8, then I see several options:

  1. just take it as Windows-1252 - that does not make sense to me, because then the .editorconfig setting is useless.
  2. convert it to utf-8 - could be a reasonable option but could also mess it up if the detection was not correct.
  3. take the file unconverted but interpret it as utf-8 - might be the safest option but might lead to unexpected results in some cases.
  4. let Notepad++ open a dialog where the user can opt for one of the above.
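
To make the difference between options 2 and 3 concrete, here is a minimal Python sketch. It is not plugin code; the sample text and the helper name `is_valid_utf8` are made up for illustration, and Python's `cp1252` codec stands in for whatever Notepad++ would detect as Windows-1252.

```python
# Bytes as an editor might find them on disk: "café" saved as Windows-1252.
raw = "café".encode("cp1252")  # b'caf\xe9'

# Option 2: decode with the detected encoding, then re-encode as UTF-8.
converted = raw.decode("cp1252").encode("utf-8")

# Option 3: leave the bytes untouched and simply read them as UTF-8.
# The lone 0xE9 byte is not valid UTF-8, so it becomes U+FFFD.
reinterpreted = raw.decode("utf-8", errors="replace")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


print(converted)             # b'caf\xc3\xa9'
print(repr(reinterpreted))   # 'caf\ufffd'
print(is_valid_utf8(raw))    # False
```

Pure ASCII files look identical under both options, which is why the problem only shows up once a file contains accented or other non-ASCII characters.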

@ffes
Member

ffes commented Nov 3, 2021

Dutch is my native language, so I don't run into those encoding problems much. I have never had any problems with Notepad++ detecting the encoding of a document. But as said, Dutch doesn't have many special characters, mainly the regular vowels with accents.

Just setting the encoding will simply break documents. Learned that the hard way 😉

I am really no fan of the dialog, because that becomes very annoying very soon.

Based on that I would prefer option 2. There are Convert to... menu entries in the Encoding menu. I would just use those menu entries from code to let Notepad++ do the heavy lifting when the document's encoding is not the same as in the .editorconfig.
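
The "convert only when the encodings differ" policy can be sketched in Python as follows. This only illustrates the decision logic; the actual plugin would trigger Notepad++'s own Convert to... commands rather than re-encode bytes itself. The keys of `CODECS` are the charset values EditorConfig defines, while the mapping to Python codec names and the function name `convert_if_needed` are assumptions made for this example.

```python
import codecs

# EditorConfig charset values mapped to Python codec names.
# The mapping is an assumption for this sketch, e.g. whether the
# UTF-16 variants should carry a BOM is not decided here.
CODECS = {
    "latin1": "latin-1",
    "utf-8": "utf-8",
    "utf-8-bom": "utf-8-sig",
    "utf-16be": "utf-16-be",
    "utf-16le": "utf-16-le",
}


def convert_if_needed(data: bytes, detected_codec: str, wanted_charset: str) -> bytes:
    """Re-encode data only if the detected codec differs from the wanted one."""
    wanted_codec = CODECS[wanted_charset]
    # codecs.lookup() normalizes aliases such as "UTF8" and "utf-8".
    if codecs.lookup(detected_codec).name == codecs.lookup(wanted_codec).name:
        return data
    return data.decode(detected_codec).encode(wanted_codec)


# A Windows-1252 file in a folder whose .editorconfig says charset = utf-8:
print(convert_if_needed("café".encode("cp1252"), "cp1252", "utf-8"))
```

Note that the result is only as good as `detected_codec`: if the detection is wrong, the conversion faithfully writes the wrong characters, which is the risk raised for option 2 earlier in the thread.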

@walterdrink
Author

I see your point and I agree.
You intend to do the conversion when saving the file, not when opening it, right? So your approach makes sense to me.

@pryrt

pryrt commented Aug 10, 2022

Based on the discussion in the Notepad++ community, there are users in the wild who would like to at least have the charset attribute available to the Notepad++ EditorConfig plugin.

But further, it would be nice if EditorConfig would recognize the -*- encoding: "IBM850" -*- and # encoding=ibm850 first-or-second-line comments inside the files as well... (I don't know whether this is a plugin-specific request, or would need to be at the core library level; and I don't know whether the EditorConfig ever looks that deeply at the file contents; if it looks at the contents, I would think it would be doable; if it just looks at the filename, then maybe this "further" request goes too far.)
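
For illustration, spotting such a declaration in the first two lines could look roughly like the Python sketch below. The regular expression and the function name `declared_encoding` are assumptions made for this example; they are not taken from EditorConfig, the plugin, or Emacs, and they only cover the two comment styles quoted above.

```python
import re

# Matches 'encoding: "IBM850"' and 'encoding=ibm850' (quotes optional).
DECLARATION = re.compile(r"coding[:=]\s*[\"']?([-\w.]+)[\"']?")


def declared_encoding(text: str):
    """Return the encoding named in the first two lines, lowercased, or None."""
    for line in text.splitlines()[:2]:
        match = DECLARATION.search(line)
        if match:
            return match.group(1).lower()
    return None


print(declared_encoding('-*- encoding: "IBM850" -*-\nsome text'))  # ibm850
print(declared_encoding("#!/bin/sh\n# encoding=ibm850\n"))          # ibm850
print(declared_encoding("no declaration here"))                     # None
```

Unlike EditorConfig, which as far as this thread establishes works from the file's path, this approach has to read the file's contents before the encoding is known.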

@ffes
Member

ffes commented Aug 11, 2022

Supporting -*- encoding is not related to this plugin. It is different functionality. What you are looking for is support for Emacs File Variables.

Funnily enough, it is a concept I looked at a while ago. I have written a plugin that supports some basic vim modeline settings (a similar concept for vim). On the Linux CLI I am a vim user and have never used Emacs. But that plugin is "designed" to support multiple file settings formats. Adding support for Emacs File Variables should be possible, but I need someone who can provide the right information and is willing to test it.

Note that that plugin has never been released, and I assume I am the only user. It does not support those encoding settings yet, because I don't have a personal need for them, but it can be done, just like it needs to be done for this editorconfig plugin. If there is broader interest in my file settings plugin, I will gladly put some effort into it and release it. But in that case open an issue in that repo, and let's not "spam" this issue with it too much.
