Can't load .cs files containing "special" characters (like ç è ì) in comments or strings. #27083

Open
guineapenguin opened this issue Mar 15, 2019 · 19 comments

Comments

@guineapenguin

Godot version:
3.1 Stable Mono (at least since 3.1 Beta 3)
Official builds, x64 versions.

OS/device including version:
Windows 7 Ultimate x64

Issue description:
Can't load .cs files containing "special" characters in comments or strings.
Example of such characters: ç è ì.

Unicode error: invalid utf8
modules/mono/utils/string_utils.cpp:181 - Method/Function Failed, returning: ERR_INVALID_DATA
Script 'res://Button03.cs' contains invalid unicode (utf-8), so it was not loaded. Please ensure that scripts are saved in valid utf-8 unicode.
modules/mono/csharp_script.cpp:2898 - Condition ' err != OK ' is true. returned: RES()
Failed loading resource: res://Button03.cs
scene/resources/resource_format_text.cpp:175 - Couldn't load external resource: res://Button03.cs
editor/editor_data.cpp:564 - Index p_idx=1 out of size (edited_scene.size()=1)

I'm using Visual Studio 2017 with default encoding settings; those characters are valid UTF-8 (even ANSI).

Steps to reproduce:
Open any .cs script,
add any of those characters to a string or a comment,
try to build/run the game or load the scene containing said script.

Minimal reproduction project:

@karliss
Contributor

karliss commented Mar 16, 2019

Could you attach an example containing one of your problematic source files, so that someone can verify whether it really is valid UTF-8?

@guineapenguin
Author

Sorry for the late reply, I've been trying to understand what was happening, because when I tried using an empty project in order to create those files, I was getting different results from before.
Short version: those files weren't UTF-8, but I'm leaving this open to let you guys decide if it's something to look at.
The reason why they weren't UTF-8 is this:
When you create a new .cs file in Godot, it doesn't save it as UTF-8; it only does so if you add a non-ASCII character, even if that character is still within CP-1252, like the ones I've listed in the OP.
When you open a file with VS, it keeps the encoding the file already has, unless it needs to change it. By default, it uses UTF-8 with signature.
You can set the encoding for all documents in a project to UTF-8 (or whatever), but it turns out this applies only to already existing files and to new ones created in VS itself.
New ones created in Godot are still non-UTF-8 by default, so VS will leave them as such, even after adding characters like the ones mentioned earlier, because they don't require UTF-8.
Then Godot complains because it wants files containing such characters to be encoded as UTF-8.
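
To make the byte-level difference concrete, here is a minimal sketch of a UTF-8 encoder. The function name `encode_utf8` is made up for illustration and is not part of Godot or Visual Studio; it also does not reject surrogate code points. In CP-1252, ç (U+00E7) is stored as the single byte 0xE7, whereas UTF-8 needs two bytes for it.

```cpp
#include <string>

// Encode one Unicode code point as UTF-8 (illustrative helper, assumes
// cp <= 0x10FFFF and does not reject surrogates).
std::string encode_utf8(char32_t cp) {
    std::string out;
    if (cp < 0x80) {
        // ASCII: one byte, identical in UTF-8 and CP-1252.
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        // Two bytes: 110xxxxx 10xxxxxx.
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        // Three bytes: 1110xxxx 10xxxxxx 10xxxxxx.
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        // Four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx.
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

With this sketch, `encode_utf8(0xE7)` yields the bytes C3 A7, while a CP-1252 file stores the same ç as the lone byte E7, which is not a complete UTF-8 sequence.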

@akien-mga akien-mga added this to the 3.2 milestone Mar 16, 2019
@karliss
Contributor

karliss commented Mar 16, 2019

Saying that Godot doesn't save a file as UTF-8 unless it contains non-ASCII characters isn't quite right, since a file containing only ASCII characters is a valid UTF-8 file. The problem is that VS doesn't handle UTF-8 files without a BOM well: it assumes they are in the system encoding (at least it used to, probably for backwards-compatibility reasons). That is somewhat annoying, as most other code editors (in my personal experience) just assume that a file is UTF-8, and the use of UTF-8 with BOM is discouraged.

For a mixed Godot/VS workflow you can try a VS extension like this
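
As a rough sketch of the two properties discussed here (the helper names are hypothetical, not any editor's real API): a file made only of ASCII bytes is byte-for-byte identical in UTF-8 and CP-1252, so without a BOM there is nothing in the file that identifies its encoding.

```cpp
#include <string>

// True if the buffer starts with the UTF-8 byte order mark EF BB BF.
bool has_utf8_bom(const std::string &bytes) {
    return bytes.size() >= 3 &&
           static_cast<unsigned char>(bytes[0]) == 0xEF &&
           static_cast<unsigned char>(bytes[1]) == 0xBB &&
           static_cast<unsigned char>(bytes[2]) == 0xBF;
}

// True if every byte is below 0x80. Such a buffer decodes to the same
// text whether it is read as UTF-8 or as CP-1252.
bool is_pure_ascii(const std::string &bytes) {
    for (unsigned char b : bytes) {
        if (b >= 0x80) {
            return false;
        }
    }
    return true;
}
```

A freshly generated script typically satisfies `is_pure_ascii` and has no BOM, which is exactly the ambiguous case an editor has to guess about.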

@guineapenguin
Author

Thank you, karliss!
Apologies for making the wrong assumption; I've always thought a file's encoding was "explicit".
I'm leaving this open in case it's still considered something to look into.

@Calinou
Member

Calinou commented Jan 14, 2020

Is there anything we can do to ease the loading of UTF-8 files with BOM?

Edit: It seems String::parse_utf8() is supposed to handle BOM after all…

@akien-mga akien-mga modified the milestones: 3.2, 4.0 Jan 15, 2020
@evan-boissonnot

Hi, about this error: when I use VS Code, it detects UTF-8.
VS 2019 does not.
So I use VS Code to detect these problems :)

@raulsntos
Member

Can anyone still reproduce this issue in Godot 3.5 or any later release?

I was unable to reproduce this on Linux using VSCode to create a file with UTF-8 with BOM encoding.

@RobTF

RobTF commented Nov 12, 2022

Hi, I've reproduced it independently and found this issue whilst I was checking before posting my own. Affected demo project attached.

This project was built in Godot 4 Beta 4. I'm running Windows 11, the .cs file was written using VS2022 Pro.

ErrorTest.zip

The game opens a window but immediately closes and no game starts. If I remove the special character (© in this case) the game launches normally.

@RobTF

RobTF commented Nov 12, 2022

Oh, ouch - I just re-opened my demo project and the problem manifests differently. If you open the demo project I supplied in Godot, it actually reports that Game.cs (the only C# file in the project) is missing when in fact it's not. If you remove the © character from the file and re-open the project it works.

So something in the guts of Godot is simply ignoring the C# file entirely.

@RobTF

RobTF commented Nov 12, 2022

Final update -> re-saving the file in VS using Save As -> with encoding -> UTF-8 (65001) seems to fix the file. This ties in with the comments above.

@raulsntos
Member

raulsntos commented Nov 12, 2022

@RobTF I tried your project, this is the console output:

Unicode parsing error, some characters were replaced with spaces: Invalid UTF-8 leading byte (a9)
ERROR: Method/function failed. Returning: ERR_INVALID_DATA
   at: read_all_file_utf8 (modules/mono/utils/string_utils.cpp:163)
ERROR: Script 'res://Game.cs' contains invalid unicode (UTF-8), so it was not loaded. Please ensure that scripts are saved in valid UTF-8 unicode.
   at: load_source_code (modules/mono/csharp_script.cpp:2637)
ERROR: Cannot load C# script file 'res://Game.cs'.
   at: load (modules/mono/csharp_script.cpp:2707)
ERROR: Failed loading resource: res://Game.cs. Make sure resources have been imported by opening the project in the editor at least once.
   at: _load (core/io/resource_loader.cpp:227)
ERROR: res://game.tscn:3 - Parse Error: [ext_resource] referenced nonexistent resource at: res://Game.cs
   at: load (scene/resources/resource_format_text.cpp:493)
ERROR: Failed loading resource: res://game.tscn. Make sure resources have been imported by opening the project in the editor at least once.
   at: _load (core/io/resource_loader.cpp:227)
ERROR: Failed loading scene: res://game.tscn
   at: start (main/main.cpp:2941)

> So something in the guts of Godot is simply ignoring the C# file entirely.

Yes, as described by the error, the file was not loaded because it contains invalid UTF-8 characters, so later, when the scene tries to use it, it can't be found.

> re-saving the file in VS using Save As -> with encoding -> UTF-8 (65001) seems to fix the file

Files should be encoded in UTF-8. If your file doesn't contain valid UTF-8 characters, I think failing to load the file is expected and not a bug. Not sure why VS2022 Pro would write invalid characters to a UTF-8 encoded file though; could it be that you had selected a different encoding? VSCode detects the file encoding as UTF-8 though.
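
The "Invalid UTF-8 leading byte (a9)" message in the log above is consistent with the © being stored as the single CP-1252 byte 0xA9, which cannot start a UTF-8 sequence. Below is a simplified sketch of such a check; it is not Godot's `String::parse_utf8()`, the function name is invented for illustration, and it does not catch every overlong or surrogate sequence.

```cpp
#include <cstddef>
#include <string>

// Returns the offset of the first byte that breaks UTF-8 decoding, or
// std::string::npos if the buffer passes this simplified check.
std::size_t first_invalid_utf8(const std::string &s) {
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        std::size_t len = 0;
        if (lead < 0x80) {
            len = 1; // ASCII
        } else if (lead >= 0xC2 && lead <= 0xDF) {
            len = 2;
        } else if (lead >= 0xE0 && lead <= 0xEF) {
            len = 3;
        } else if (lead >= 0xF0 && lead <= 0xF4) {
            len = 4;
        } else {
            return i; // stray continuation byte or invalid lead, e.g. 0xA9
        }
        if (i + len > n) {
            return i; // sequence is cut off by the end of the buffer
        }
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80) {
                return i; // expected a 10xxxxxx continuation byte
            }
        }
        i += len;
    }
    return std::string::npos;
}
```

For a comment such as `// © 2022` saved as CP-1252, the check stops at the 0xA9 byte; the same comment saved as UTF-8 stores © as C2 A9 and passes.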

@RobTF

RobTF commented Nov 12, 2022

Ah got it, so I think the issue is that I didn't create the .cs file; Godot did. I selected the node and did Attach Script, selected C# etc. and it created the .cs file for me in what appears to be a Windows character set. If I manually create the .cs file instead of having Godot use a template everything is fine.

By default, VS seems to maintain the character set it finds when it opens the file, but I can use VS to "fix" a Godot generated .cs file by forcing it to save specifically as UTF-8.
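
For anyone who prefers to fix such files outside the editor, the conversion that "Save As -> with encoding -> UTF-8" performs can be approximated as below. This is a sketch under stated assumptions: the input really is CP-1252, the function name `cp1252_to_utf8` is made up, and the five bytes CP-1252 leaves undefined are passed through as the C1 control characters with the same value.

```cpp
#include <string>

// Unicode code points for CP-1252 bytes 0x80..0x9F. The undefined bytes
// (0x81, 0x8D, 0x8F, 0x90, 0x9D) map to the matching C1 controls here,
// which is an assumption of this sketch.
static const char16_t kCp1252High[32] = {
    0x20AC, 0x0081, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008D, 0x017D, 0x008F,
    0x0090, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x009D, 0x017E, 0x0178,
};

// Re-encode a CP-1252 byte string as UTF-8 (no BOM is added).
std::string cp1252_to_utf8(const std::string &in) {
    std::string out;
    for (unsigned char b : in) {
        // Bytes 0x00..0x7F and 0xA0..0xFF equal their code point.
        unsigned cp = b;
        if (b >= 0x80 && b <= 0x9F) {
            cp = kCp1252High[b - 0x80];
        }
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

Running this on a file that is already UTF-8 would double-encode its non-ASCII characters, so it should only be applied to files known to be CP-1252.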

@Zireael07
Contributor

Hmm. Why does Godot not use UTF-8 when creating the file then?

@RobTF

RobTF commented Nov 12, 2022

I think that's the crux of it, at least it appears to be on my end. I get the feeling that full-fat Visual Studio handles these non-UTF8 encodings perhaps more completely/correctly than Godot or VSCode; maybe if you open the Godot generated .cs file in one of the latter two and re-save it, it silently converts to UTF-8 as that's all the tool understands which in turn makes the problem less clear to some users.

This also might be a Windows only problem, as the .cs files Godot creates appear to be in encoding Western European (Windows) - Codepage 1252 which I doubt MacOS or Linux would use as any sort of default.

@CordellierPaul

> If I manually create the .cs file instead of having Godot use a template everything is fine.

If I create the file with VS 2022, it works, thanks RobTF!

@Novido

Novido commented Jan 26, 2024

I had this issue in Godot 4.2 and VS 2022. I created a .cs file from inside Godot, and when I used Swedish special characters in a comment in the code, Godot could no longer load the script file or the scene it was attached to.

@bruvzg
Member

bruvzg commented Jan 26, 2024

> Why does Godot not use UTF-8 when creating the file then?

Godot should always use UTF-8, but it's possible that VS expects a byte order mark to identify a UTF-8 file (and Godot is not adding one).

@bruvzg
Member

bruvzg commented Jan 26, 2024

Yep, VS 2022 still assumes that a file without a BOM and without any Unicode characters is 1252, and saves it as 1252 if you add characters from this encoding. At least it asks to convert the file to UTF-8 if you add other characters, and it seems to detect files that already have Unicode characters in them. But we should probably add a BOM; Godot (and most other software) does not care if it's there, so it should not cause any issues.

Adding it for .cs files specifically should be easy, something like:

diff --git a/modules/mono/csharp_script.cpp b/modules/mono/csharp_script.cpp
index 33fef2d58c..21826f7b3f 100644
--- a/modules/mono/csharp_script.cpp
+++ b/modules/mono/csharp_script.cpp
@@ -2950,6 +2950,9 @@ Error ResourceFormatSaverCSharpScript::save(const Ref<Resource> &p_resource, con
 		Ref<FileAccess> file = FileAccess::open(p_path, FileAccess::WRITE, &err);
 		ERR_FAIL_COND_V_MSG(err != OK, err, "Cannot save C# script file '" + p_path + "'.");
 
+		file->store_8(0xEF); // Store UTF-8 BOM.
+		file->store_8(0xBB);
+		file->store_8(0xBF);
 		file->store_string(source);
 
 		if (file->get_error() != OK && file->get_error() != ERR_FILE_EOF) {
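
Outside of Godot, the same idea can be sketched with the C++ standard library. This is only an illustration of the patch's approach: `save_with_utf8_bom` is a hypothetical helper and `std::ofstream` stands in for Godot's `FileAccess`.

```cpp
#include <fstream>
#include <string>

// Writes `source` to `path`, preceded by the UTF-8 byte order mark
// (EF BB BF). Returns false if the file could not be opened or written.
bool save_with_utf8_bom(const std::string &path, const std::string &source) {
    // Binary mode keeps the bytes exactly as given on every platform.
    std::ofstream file(path, std::ios::binary | std::ios::trunc);
    if (!file) {
        return false;
    }
    file.write("\xEF\xBB\xBF", 3);
    file.write(source.data(), static_cast<std::streamsize>(source.size()));
    file.flush();
    return file.good();
}
```

The three BOM bytes come first so that an editor sniffing the start of the file can identify it as UTF-8 before it sees any non-ASCII character.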

@Xiphereal

> This also might be a Windows only problem, as the .cs files Godot creates appear to be in encoding Western European (Windows) - Codepage 1252 which I doubt MacOS or Linux would use as any sort of default.

> Final update -> re-saving the file in VS using Save As -> with encoding -> UTF-8 (65001) seems to fix the file. This ties in with the comments above.

Hi there! Here to say that this was exactly my problem too, and it was solved exactly as mentioned :). I don't know anything about this matter, but I suppose it's not as easy as making Godot force the conversion of scripts to UTF-8 :/
