Faulty wide char conversion on Windows #733
Comments
The functions are supposed to convert to and from UTF8, and I think I would have tested them when I wrote them for at least some non-standard characters. iPlug2 strings in general should be UTF8. Can you say more about how you are inspecting the file name, as well as the setup of the system you are on? The fix suggested doesn't seem like the correct approach to me, but it would be good to know more and see if we can figure out what is going on here. |
Hmm, maybe I missed something here, sorry. |
First, there is nothing wrong with the current UTF8ToUTF16/UTF16ToUTF8 implementation! But currently we cannot use it in this way, as there are some things to consider, on which you may want to comment:
So I think we first need to switch our file i/o from char to wchar_t and then force UTF8 instead of ANSI CP. |
For Mac you can use fopen without issue. You have a couple of options here. One is simply to do your own conversion from UTF8 to the ANSI CP on Windows before you open. Another is to wrap your file reading routines for each platform to handle the different routes more generally. I have a library for plugins that does this and I use std::ifstream and std::ofstream. Files are opened with wide (16-bit) strings on Windows post conversion and simply with the UTF8 path on Mac. Obviously, UTF8ToUTF16 can be used for the conversion when you need to do it. |
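The per-platform wrapper described above could look roughly like the following. This is only a sketch: `Utf8ToWide` and `OpenUtf8File` are hypothetical names made up for illustration (they are not iPlug2 or WDL functions), and it uses `fopen`/`_wfopen` rather than the stream classes mentioned in the comment. The only real Windows calls are `MultiByteToWideChar` with `CP_UTF8` and `_wfopen`.

```cpp
#include <cstdio>
#include <string>

#ifdef _WIN32
#include <windows.h>
#include <cwchar>

// Hypothetical helper: convert a UTF-8 string to UTF-16 via the Win32 API.
static std::wstring Utf8ToWide(const std::string& utf8)
{
  if (utf8.empty()) return std::wstring();
  // First call asks for the required length in wchar_t units.
  const int len = MultiByteToWideChar(CP_UTF8, 0, utf8.data(),
                                      static_cast<int>(utf8.size()), nullptr, 0);
  if (len <= 0) return std::wstring();
  std::wstring wide(static_cast<std::size_t>(len), L'\0');
  // Second call performs the actual conversion into the buffer.
  MultiByteToWideChar(CP_UTF8, 0, utf8.data(),
                      static_cast<int>(utf8.size()), &wide[0], len);
  return wide;
}
#endif

// Hypothetical wrapper: takes a UTF-8 path on every platform.
// On Windows the path is converted to UTF-16 and opened with _wfopen,
// so the ANSI code page is never involved. Elsewhere the UTF-8 path
// is passed to fopen unchanged.
std::FILE* OpenUtf8File(const std::string& utf8Path, const char* mode)
{
#ifdef _WIN32
  const std::wstring widePath = Utf8ToWide(utf8Path);
  const std::wstring wideMode = Utf8ToWide(mode);
  return _wfopen(widePath.c_str(), wideMode.c_str());
#else
  return std::fopen(utf8Path.c_str(), mode);
#endif
}
```

With a wrapper like this, the rest of the code can keep all paths as UTF8 `char` strings and only the open call differs per platform.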
BTW - WDL_String is part of WDL, rather than iPlug2, so we aren't likely to add comments there - not sure if the UTF8 thing is documented anywhere - @olilarkin? |
Thank you Alex. BTW: As WDL_String misses wchar_t support we created a small class to do the conversion in a smart way:
Maybe someone likes it. |
Seems to be my fault, sorry. Works as expected :-) [One more question: What is the correct way to show "ä","ö","ü" with g.DrawText(...)? BTW: the IPLUG2 popup menu seems to have problems handling "ä","ö","ü", while the system popup shows them correctly ....] |
Hi Alex, After some test on Mac I found a curious issue (for me): in some strings (char *) a special character is encoded with 2 bytes, in some 3 bytes are used. In the debugger both look the same, but have different length. Do you know how to detect the 2 byte or 3 byte scenario? And how to convert properly? |
It is usual for UTF8 to use a variable number of bytes for encoding characters (between 1 and 4) - this should happen on all platforms. |
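To make the variable length concrete, here is a small sketch that reads the sequence length off the lead byte and counts code points in a string. The function names are made up for illustration, and the code does not validate the input (it does not reject overlong or truncated sequences).

```cpp
#include <cstddef>
#include <string>

// Number of bytes in the UTF-8 sequence that starts with `lead`,
// or 0 if `lead` is a continuation byte or not a valid lead byte pattern.
std::size_t Utf8SequenceLength(unsigned char lead)
{
  if (lead < 0x80) return 1;            // 0xxxxxxx: ASCII
  if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx
  if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx
  if ((lead & 0xF8) == 0xF0) return 4;  // 11110xxx
  return 0;                             // 10xxxxxx or other
}

// Counts code points by counting every byte that is not a continuation byte.
std::size_t CountCodePoints(const std::string& utf8)
{
  std::size_t count = 0;
  for (unsigned char c : utf8)
    if ((c & 0xC0) != 0x80) ++count;
  return count;
}
```

For example, `"\xC3\xB6"` (ö as U+00F6) is one code point in 2 bytes, while `"o\xCC\x88"` (o followed by a combining diaeresis) is two code points in 3 bytes, which is one way to tell such strings apart.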
Thanks, complicated stuff :-) |
Which backends and platforms? |
Sorry, I forgot: Mac, nanovg, e.g. APP (but I guess the other plug formats as well) |
one more note: |
Are you able to test with skia, and (just for sanity) with a different control (one that just draws a problem string would be fine)? |
BTW - I've just tried putting the letters into the text in iPlugEffect and everything draws correctly with NanoVG, so there's some step we've not got the same - this could be control-specific or to do with how the string is being generated. FWIW as far as I can tell ä should be encoded as 2 bytes in UTF8. |
yes, I will try with skia. Just to sum up (all for Mac, nanovg): |
yes, UTF8 2 bytes works |
Yes - I've managed to confirm that here now - this seems like a potential bug, although I guess we should also check that the font you have supports the character in question - which might involve looking in a font editor. |
I get a space for a 3-byte char BTW, rather than "garbage" - hence my question about missing characters. |
correct, a space is drawn. This is why I asked if there is a method available on Mac to convert 3 bytes special characters to 2 bytes ... |
ö should not be a three byte encoding - it should have 2 bytes only: https://www.compart.com/en/unicode/U+00F6 Each Unicode encoding should (as far as I know) be unique. |
It looks possible that the umlaut might have been separately encoded in your example as a combining diaeresis (2 bytes): https://www.compart.com/en/unicode/U+0308 and a lowercase o (1 byte). I tried that here and I also get a space, but it's intriguing that it is encoded that way, rather than as the Unicode code point I linked above. |
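If the 3-byte form really is a base letter plus U+0308, one workaround is to replace that pair with the precomposed character before drawing. The following is a minimal sketch that only handles a, o, u and their capitals; `ComposeUmlauts` is a hypothetical name, and this is not full Unicode normalization (NFC), which would need a Unicode library or a platform API.

```cpp
#include <cstddef>
#include <string>

// Precomposed UTF-8 bytes for a base letter followed by U+0308,
// or nullptr if this sketch does not handle the letter.
static const char* PrecomposedFor(char base)
{
  switch (base)
  {
    case 'a': return "\xC3\xA4"; // U+00E4
    case 'o': return "\xC3\xB6"; // U+00F6
    case 'u': return "\xC3\xBC"; // U+00FC
    case 'A': return "\xC3\x84"; // U+00C4
    case 'O': return "\xC3\x96"; // U+00D6
    case 'U': return "\xC3\x9C"; // U+00DC
    default:  return nullptr;
  }
}

// Replaces "letter + combining diaeresis (bytes 0xCC 0x88)" with the
// 2-byte precomposed form; all other bytes are copied unchanged.
std::string ComposeUmlauts(const std::string& in)
{
  std::string out;
  out.reserve(in.size());
  for (std::size_t i = 0; i < in.size(); ++i)
  {
    const char* composed = nullptr;
    if (i + 2 < in.size() &&
        static_cast<unsigned char>(in[i + 1]) == 0xCC &&
        static_cast<unsigned char>(in[i + 2]) == 0x88)
      composed = PrecomposedFor(in[i]);

    if (composed) { out += composed; i += 2; } // skip the combining mark
    else          { out += in[i]; }
  }
  return out;
}
```

Strings that already use the precomposed form pass through unchanged, so the function is safe to apply to every string before it is drawn.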
Windows only!
The functions UTF8ToUTF16 and UTF16ToUTF8 seem to produce garbage if language-specific characters like "ä","ö","ü" etc. are involved.
Currently the Windows functions MultiByteToWideChar and WideCharToMultiByte are called with the parameter "CP_UTF8" which seems to be the problem. Using "CP_ACP" instead seems to fix the issue.
All Windows based formats are affected (vst2, vst3, aax).
To Reproduce:
Make a folder with the name "Öffnen" (engl. open) and create a file in this folder. Call PromptForFile and open this file. The full file name shows garbage.
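One plausible reading of the "garbage", consistent with the comments on this issue, is that the string is valid UTF8 but is later interpreted byte by byte in a single-byte code page. The sketch below simulates that misreading in a portable way. It uses Latin-1 as a stand-in for the ANSI code page, which is an assumption (Windows-1252 differs from Latin-1 in the 0x80 to 0x9F range), and `MisreadAsLatin1` is a hypothetical name.

```cpp
#include <string>

// Treats every input byte as one Latin-1 character and re-encodes the
// result as UTF-8. Feeding it UTF-8 text reproduces the typical
// "two strange characters per umlaut" effect.
std::string MisreadAsLatin1(const std::string& bytes)
{
  std::string out;
  for (unsigned char b : bytes)
  {
    if (b < 0x80)
    {
      out += static_cast<char>(b);                 // ASCII stays as it is
    }
    else
    {
      out += static_cast<char>(0xC0 | (b >> 6));   // lead byte
      out += static_cast<char>(0x80 | (b & 0x3F)); // continuation byte
    }
  }
  return out;
}
```

Here the 2 bytes of "Ö" (0xC3 0x96) turn into two separate characters, "Ã" followed by a second unrelated character, while plain ASCII such as "ffnen" is unaffected. That would match a path that looks fine except for the umlauts.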