Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Full Unicode Support #2541

Open
wants to merge 7 commits into
base: master
from

Conversation

Projects
None yet
3 participants
@cloudwu
Copy link

commented May 7, 2019

This PR add full unicode support, See #2538

  1. I defined two types: ImWchar16 and ImWchar32, and the macro ImWchar is ImWchar16 by default. We can change it in imconfig.h .

  2. ImTextCharFromUtf8 replace the code point above 0x10000 to replacement character U+FFFD when we use ImWchar16. So that we can remove some range checks outside.

  3. ImTextCharToUtf8 and ImTextCountUtf8BytesFromChar can manage full unicode set now. btw, There is a bug (for surrogate) in ImTextCountUtf8BytesFromChar before.

  4. GetClipboardTextFn_DefaultImpl and SetClipboardTextFn_DefaultImpl have some issues with surrogate of UTF-16 before, I reimplement them by ::WideCharToMultiByte and ::MultiByteToWideChar.

  5. I introduce a new API ImGuiIO::AddInputCharacterUTF16, it can handle UTF-16 correctly. examples/imgui_impl_win32.cpp assumed that one code point per WM_CHAR message before. It's not correct, because the wParam can be surrogate, we should combine two WM_CHAR message when the wParam is surrogate.

  6. For the ImFontGlyphRangesBuilder, UsedChars may 17x larger then before if we use ImWchar32.

I hope I haven't left something out.

@ocornut

This comment has been minimized.

Copy link
Owner

commented May 7, 2019

Thank you @cloudwu !
It may take me a while to chew through this as I am a little busy and not very familiar with all of this.

Quick notes:

(0) Do you have realistic expectation as to which ranges you or other people would consider using 0x10000..0x10FFFF ?
Could you give me instructions to test the input of some of those characters? (e.g. U+20628 𠘨) or whatever is easier, so I could later test with multiple back-ends.

(1) I think we should remove the filtering from the back-end/examples code. I have a small local patch over master to change the signature to AddInputCharacter(unsigned int). Then the ImGui function can freely perform filtering (e.g. if < 0x10000 for current code) and the back-end doesn't need to know about it. I can commit that already with my patch if you agree with this (may relate to point (2)).

(2) Do you see any issue with making AddInputCharacter() behave like your AddInputCharacterUTF16() and remove that new function? How do popular back-ends behave in respect to surrogate pairs? GLFW seems to pass the WM_CHAR/UNICHAR data straight to the user. SDL in its WIN_ConvertUTF32toUTF8 function appears to expect characters above 0x10000 passed to WM_CHAR and converts them to 4-byte UTF-8 sequences, which unless I am wrong seems to contradict what you are doing in imgui_impl_win32.cpp. Is SDL wrong, or are those messages affected by MSVC compilation settings?

(3) I haven't tried the patch yet but I suspect maybe ImFileOpen() would be broken as we cast ImWchar to wchar_t?

(4) I guess ImFontGlyphRangesBuilder will overflow when used with ImWchar16.

(5) If your git-fu makes you able to rebase this into two commits, you can separate the Win32 clipboard stuff into a separate commit (and maybe ImFileOpen fix) and i'll cherry-pick the first one faster.

Thanks!

@cloudwu cloudwu force-pushed the cloudwu:unicode branch from 42aacb9 to a9c741c May 8, 2019

@cloudwu

This comment has been minimized.

Copy link
Author

commented May 8, 2019

  1. Unicode has 17 planes now ( https://en.wikipedia.org/wiki/Plane_(Unicode) ), 0-FFFF is BMP (plane 0), SIP (plane 2) is rarely uncommon Chinese characters/ Kanji, I have seen some of them in my friends' name. SMP( plane 1) maybe more useful for emoji . For example, U+1F604 (😀) . You can find them here : https://unicode.org/emoji/charts/emoji-list.html .

If you want to test rendering SIP character, you may need find a font including them. I guess you can got one from your windows system. Have a look to your registry with the key HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\LanguagePack\SurrogateFallback . In my computer, C:\Windows\fonts\simsunb.ttf is a font for plane 2.

Testing WM_CHAR for surrogate pair is a little complicated. You should know how to input the character or emoji from IME. I make an user-defined word with character in plane 2 in my IME to test.

  1. You can commit your patch in master, I can rebase it.

  2. My opinion is reserving the AddInputCharacter for unicode code point. I think UTF-16's surrogate pair is not a good solution for Unicode. There is a similar issue for surrogate pairs : SFML/SFML#366

I read the document of glfw, the glfwSetCharCallback of glfw treat the input as the native endian UTF-32 . https://www.glfw.org/docs/latest/input_guide.html

Because an unsigned int is 32 bits long on all platforms supported by GLFW, you can treat the code point argument as native endian UTF-32.

I don't know why SDL pass the UTF-32 to WM_CHAR, but MSDN suggest using WM_UNICHAR instead. And UTF-32 has no surrogate pair, so it can be converted to UTF-8 directly.

  1. I fixed it.

  2. I haven't seen the overflow issue, I'll check it again. I think if the user use ImWchar16 , they would not use the range over 0x10000. btw, Maybe you can manage ranges internal by plane. Each plane is a 16bit
    space. Most of users use only one plane (BMP), some of us may want to use SMP for emoji or SIP for some rarely name.

  3. It's done. I think 2cba94a cf4c041 3bedb30 can be merged first.

@cloudwu

This comment has been minimized.

Copy link
Author

commented May 8, 2019

I don't know why AppVeyor build failed, it's strange that it can't find MultiByteToWideChar but OpenClipboard is ok. I use mingw and haven't visual studio environment to test .

@nicolasnoble

This comment has been minimized.

Copy link
Contributor

commented May 8, 2019

Try including stringapiset.h after windows.h.

@cloudwu cloudwu force-pushed the cloudwu:unicode branch from eaf080d to 01c4d56 May 8, 2019

@cloudwu

This comment has been minimized.

Copy link
Author

commented May 8, 2019

Try including stringapiset.h after windows.h.

It doesn't work.

@nicolasnoble

This comment has been minimized.

Copy link
Contributor

commented May 8, 2019

Oh. Reading my own code, I just realized these functions are defined behind #if defined UNICODE || defined _UNICODE

@cloudwu

This comment has been minimized.

Copy link
Author

commented May 9, 2019

Oh. Reading my own code, I just realized these functions are defined behind #if defined UNICODE || defined _UNICODE

It seems that the imgui project has already defined UNICODE.

@nicolasnoble

This comment has been minimized.

Copy link
Contributor

commented May 9, 2019

It seems that the imgui project has already defined UNICODE.

What? Where? The only reference in the code I could find was in some build_win32.bat files that aren't used by the appveyor script.

@cloudwu

This comment has been minimized.

Copy link
Author

commented May 9, 2019

I haven't a visual studio , but I guess the IDE will define UNICODE if charset is set to UNICODE. https://github.com/ocornut/imgui/blob/master/examples/example_win32_directx11/example_win32_directx11.vcxproj#L29

@nicolasnoble

This comment has been minimized.

Copy link
Contributor

commented May 9, 2019

You are correct, but only for the projects that do define this. The OpenGL projects that the solution is also trying to build don't have this setting.

But after double checking appveyor, it's the DX project that's failing, not the OpenGL one... So... I dunno.

@nicolasnoble

This comment has been minimized.

Copy link
Contributor

commented May 9, 2019

@ocornut you think you could increase the verbosity of the msbuild command on appveyor so we can see what's going on here with the command-line arguments?

@ocornut

This comment has been minimized.

Copy link
Owner

commented May 9, 2019

It doesn’t matter what each examples/ project uses, we should get imgui to compile/link properly on every combination of user’s settings?

@nicolasnoble

This comment has been minimized.

Copy link
Contributor

commented May 9, 2019

Yes, but right now, I am confused as to why it currently doesn't build exactly.

ocornut added a commit that referenced this pull request May 11, 2019

IO: changed AddInputCharacter(unsigned short c) signature to AddInput…
…Character(unsigned int c).

Examples/Backends: Don't filter characters under 0x10000 before calling io.AddInputCharacter(), the filtering is done in io.AddInputCharacter() itself. This is in prevision for fuller Unicode support. (#2538, #2541)

@ocornut ocornut force-pushed the ocornut:master branch from 1b84280 to aca6ee1 May 11, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.