Faulty wide char conversion on Windows #733

TBProAudio · 2021-04-18T07:03:20Z

Windows only!

The functions UTF8ToUTF16 and UTF16ToUTF8 seem to produce garbage if language-specific characters like "ä", "ö", "ü" etc. are involved.
Currently the Windows functions MultiByteToWideChar and WideCharToMultiByte are called with the parameter "CP_UTF8" which seems to be the problem. Using "CP_ACP" instead seems to fix the issue.

All Windows based formats are affected (vst2, vst3, aax).

To Reproduce:
Make a folder with the name "Öffnen" (English: "open") and create a file in this folder. Call PromptForFile and open this file. The full file name shows garbage.

AlexHarker · 2021-04-18T18:48:27Z

The functions are supposed to convert to and from UTF8, and I think I would have tested them when I wrote them for at least some non-standard characters. iPlug2 strings in general should be UTF8.

Can you say more about how you are inspecting the file name, as well as the setup of the system you are on? The fix suggested doesn't seem like the correct approach to me, but it would be good to know more and see if we can figure out what is going on here.

TBProAudio · 2021-04-19T05:18:04Z

Hmm, maybe I missed something here, sorry.
Maybe I have overlooked the fact that IPLUG2 is now fully UTF8, which means that old libraries/code using fopen/fread/fwrite need more attention. I need to look into this in much more detail ...

TBProAudio · 2021-04-19T08:34:47Z

First, there is nothing wrong with the current UTF8ToUTF16/UTF16ToUTF8 implementation!

But currently we cannot use it in this way, as there are some things to consider that you may want to comment on:

Currently we use fopen under Win/Mac to open files. _wfopen seems to be missing under Mac.
WDL_String seems to lack wchar_t interfaces (e.g. SetFormatted).

So I think we first need to switch our file i/o from char to wchar_t and then force UTF8 instead of ANSI CP.

AlexHarker · 2021-04-19T11:49:40Z

For Mac you can use fopen without issue. You have a couple of options here. One is simply to do your own conversion from UTF8 to ANSI CP on Windows before you open. Another is to wrap your file reading routines for each platform to handle different routes more generally. I have a library for plugins that does this and I use std::ifstream and std::ofstream. Files are opened with wide (16-bit) strings on Windows post conversion and simply with the UTF8 path on Mac.

Obviously, UTF8ToUTF16 can be used for the conversion when you need to do it.
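
The per-platform wrapping described above could look roughly like the sketch below. `OpenUTF8Path` is a hypothetical helper name, not an iPlug2 or WDL function. It assumes the incoming path is UTF-8 and, on Windows, that the standard library accepts `wchar_t` paths (true for the MSVC library; other toolchains may differ).

```cpp
#include <fstream>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper (not an iPlug2 or WDL function): opens a file for
// binary reading from a UTF-8 encoded path.
inline std::ifstream OpenUTF8Path(const std::string& utf8Path)
{
#ifdef _WIN32
  // The first call only asks for the required size in wchar_t units,
  // including the terminating null.
  const int n = MultiByteToWideChar(CP_UTF8, 0, utf8Path.c_str(), -1, nullptr, 0);
  if (n <= 0)
    return std::ifstream(); // conversion failed: return a closed stream
  std::wstring wide(static_cast<size_t>(n), L'\0');
  MultiByteToWideChar(CP_UTF8, 0, utf8Path.c_str(), -1, &wide[0], n);
  // Assumption: the standard library in use accepts wchar_t paths.
  return std::ifstream(wide.c_str(), std::ios::binary);
#else
  // macOS and Linux take the UTF-8 bytes as they are.
  return std::ifstream(utf8Path.c_str(), std::ios::binary);
#endif
}
```

Calling code then passes the same UTF-8 path on every platform and checks `is_open()` on the returned stream.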

AlexHarker · 2021-04-19T11:50:35Z

BTW - WDL_String is part of WDL, rather than iPlug2, so we aren't likely to add comments there - not sure if the UTF8 thing is documented anywhere - @olilarkin?

TBProAudio · 2021-04-19T14:38:01Z

Thank you Alex.
So as a first step we enabled wchar_t for all Windows-based file I/O, but still with CP_ACP. As soon as all file I/O supports wchar_t we can switch to UTF8.

BTW: As WDL_String misses wchar_t support we created a small class to do the conversion in a smart way:

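// Holds a temporary UTF-16 copy of a UTF-8 WDL_String.
class cwchar_t
{
public:
	// Take the string by const reference to avoid copying it.
	cwchar_t(const WDL_String& str) : m_wc(NULL)
	{
		Convert(str.Get(), str.GetLength());
	}

	cwchar_t(const WDL_String* str) : m_wc(NULL)
	{
		Convert(str->Get(), str->GetLength());
	}

	~cwchar_t()
	{
		delete[] m_wc; // delete[] on a null pointer is a no-op
	}

	operator const wchar_t* () const { return m_wc; }

private:
	// The object owns m_wc, so copying is disabled to prevent a double delete.
	cwchar_t(const cwchar_t&);
	cwchar_t& operator=(const cwchar_t&);

	void Convert(const char* utf8, int utf8Len)
	{
		// UTF-16 never needs more code units than UTF-8 needs bytes;
		// add one element for the terminating null.
		int len = utf8Len + 1;
		m_wc = new wchar_t[len];
		UTF8ToUTF16(m_wc, utf8, len);
	}

	wchar_t* m_wc;
};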

Maybe someone likes it.

TBProAudio · 2021-04-20T08:58:47Z

Seems to be my fault, sorry. Works as expected :-)

[One more question:

What is the correct way to show "ä","ö","ü" with g.DrawText(...).

BTW: IPLUG2 popup menu seems to have problems to handle "ä","ö","ü", system popup shows it correctly ....]

TBProAudio · 2021-04-22T08:52:54Z

> For Mac you can use fopen without issue. You have a couple of options here. One is simply to do your own conversion from UTF8 to ANSI CP on Windows before you open. Another is to wrap your file reading routines for each platform to handle different routes more generally. I have a library for plugins that does this and I use std::ifstream and std::ofstream. Files are opened with wide (16-bit) strings on Windows post conversion and simply with the UTF8 path on Mac.
>
> Obviously, UTF8ToUTF16 can be used for the conversion when you need to do it.

Hi Alex,

After some tests on Mac I found a curious issue (for me): in some strings (char *) a special character is encoded with 2 bytes, in others 3 bytes are used. In the debugger both look the same, but they have different lengths. Do you know how to detect the 2-byte or 3-byte scenario? And how to convert properly?
Thank you

AlexHarker · 2021-04-23T08:43:45Z

It is usual for UTF8 to use a variable number of bytes for encoding characters (between 1 and 4) - this should happen on all platforms.

https://en.wikipedia.org/wiki/UTF-8
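
As a rough illustration of the variable-length encoding, the sketch below classifies a lead byte and counts code points in a string. Both function names are made up for this example and are not part of iPlug2 or WDL. The code assumes valid UTF-8 input and does not reject overlong or otherwise invalid sequences.

```cpp
#include <cstddef>
#include <string>

// Length in bytes of the UTF-8 sequence that starts with this lead byte,
// or 0 if the byte is a continuation byte (10xxxxxx) or matches no pattern.
// This is a rough classifier, not a validator.
inline int UTF8SequenceLength(unsigned char lead)
{
  if (lead < 0x80) return 1;           // 0xxxxxxx: ASCII
  if ((lead & 0xE0) == 0xC0) return 2; // 110xxxxx
  if ((lead & 0xF0) == 0xE0) return 3; // 1110xxxx
  if ((lead & 0xF8) == 0xF0) return 4; // 11110xxx
  return 0;
}

// Counts code points by counting every byte that is not a continuation byte.
// Assumes the input is valid UTF-8.
inline std::size_t CountCodePoints(const std::string& utf8)
{
  std::size_t count = 0;
  for (unsigned char c : utf8)
    if ((c & 0xC0) != 0x80)
      ++count;
  return count;
}
```

Comparing `size()` with `CountCodePoints()` for two strings that look identical in the debugger shows whether they really contain the same sequence of code points.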

TBProAudio · 2021-04-23T08:51:24Z

Thanks, complicated stuff :-)
In any case it seems that 3 byte unicode shows garbage with g.DrawText(), 2 byte seem to work ...
Any idea?

AlexHarker · 2021-04-23T09:18:46Z

Which backends and platforms?

TBProAudio · 2021-04-23T09:32:56Z

Sorry, I forgot: Mac, nanovg, e.g. APP (but I guess the other plug formats as well)

TBProAudio · 2021-04-23T14:38:33Z

one more note:
pGraphics->AttachPopupMenuControl(DEFAULT_LABEL_TEXT) is enabled.
If disabled (aka use the Mac platform popup-menu), strings with 3 bytes special characters are displayed properly.

AlexHarker · 2021-04-23T15:48:34Z

Are you able to test with skia, and (just for sanity) with a different control? One that just draws a problem string would be fine.

AlexHarker · 2021-04-23T15:58:18Z

BTW - I've just tried putting the letters into the text in iPlugEffect and everything draws correctly with NanoVG, so there's some step we've not got the same - this could be control-specific or to do with how the string is being generated.

FWIW as far as I can tell ä should be encoded as 2 bytes in UTF8.

TBProAudio · 2021-04-23T16:00:00Z

yes, I will try with skia.

Just to sum up (all for Mac, nanovg):
IPLUG2 popup: does not work
system popup: works
g.DrawText(): does not work (any control)

TBProAudio · 2021-04-23T16:04:05Z

yes, UTF8 2 bytes works
UTF8 3 bytes does not. But should or not?
I think it is a 3 byte UTF8 problem ...

AlexHarker · 2021-04-23T16:07:03Z

Yes - I've managed to confirm that here now - this seems like a potential bug, although I guess we should also check that the font you have supports the character in question - which might involve looking in a font editor.

AlexHarker · 2021-04-23T16:07:47Z

I get a space for a 3-byte char BTW, rather than "garbage" - hence my question about missing characters.

TBProAudio · 2021-04-23T16:20:11Z

correct, a space is drawn.
So, "böse" becomes "bo se".
OK I see, could be font thing ...

This is why I asked if there is a method available on Mac to convert 3 bytes special characters to 2 bytes ...

AlexHarker · 2021-04-23T16:32:28Z

ö should not be a three byte encoding - it should have 2 bytes only:

https://www.compart.com/en/unicode/U+00F6

Each Unicode encoding should (as far as I know) be unique.

AlexHarker · 2021-04-23T16:36:41Z

It looks possible that the umlaut might have been separately encoded in your example as a combining diaeresis (2 bytes):

https://www.compart.com/en/unicode/U+0308

and a lowercase o (1 byte). I tried that here and I also get a space, but it's intriguing that it is encoded that way, rather than as the unicode point I linked above.
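
For the earlier question about detecting and converting the 3-byte form, a minimal sketch follows. `ComposeUmlauts` is a hypothetical helper, not an iPlug2 or WDL function, and it only handles the six German umlauts. A general solution would be Unicode NFC normalisation through a library or an OS facility (on macOS, for example, `precomposedStringWithCanonicalMapping` on `NSString`).

```cpp
#include <string>

// Hypothetical helper: replaces a/o/u/A/O/U followed by U+0308 (combining
// diaeresis, UTF-8 bytes CC 88) with the precomposed two-byte character.
// Covers only these six letters; it is not a general NFC normaliser.
inline std::string ComposeUmlauts(const std::string& utf8)
{
  std::string out;
  out.reserve(utf8.size());
  for (std::size_t i = 0; i < utf8.size(); ++i)
  {
    const char c = utf8[i];
    // True when the next two bytes are the UTF-8 encoding of U+0308.
    const bool diaeresisNext = i + 2 < utf8.size()
      && static_cast<unsigned char>(utf8[i + 1]) == 0xCC
      && static_cast<unsigned char>(utf8[i + 2]) == 0x88;
    const char* composed = nullptr;
    if (diaeresisNext)
    {
      switch (c)
      {
        case 'a': composed = "\xC3\xA4"; break; // U+00E4
        case 'o': composed = "\xC3\xB6"; break; // U+00F6
        case 'u': composed = "\xC3\xBC"; break; // U+00FC
        case 'A': composed = "\xC3\x84"; break; // U+00C4
        case 'O': composed = "\xC3\x96"; break; // U+00D6
        case 'U': composed = "\xC3\x9C"; break; // U+00DC
        default: break;
      }
    }
    if (composed)
    {
      out += composed;
      i += 2; // skip the two bytes of the combining mark
    }
    else
    {
      out += c;
    }
  }
  return out;
}
```

With this, the decomposed "böse" (6 bytes) becomes the precomposed form (5 bytes), while strings without a combining diaeresis pass through unchanged.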

TBProAudio closed this as completed Apr 19, 2021
