Add mrb_utf8_from_locale, mrb_utf8_free, mrb_locale_from_utf8, mrb_locale_free #1822
If you are worried about needless memory allocation on non-UTF-8 locales, I can add |
Sorry about noisy commits. |
Nice idea, but direct usage of malloc/strdup/strndup/free is unacceptable in mruby. |
How about this? |
Much better. I prefer the no-overhead way. |
I named these after the glib string functions: http://www.gtk.org/api/2.6/glib/glib-Character-Set-Conversion.html#g-locale-to-utf8 BTW, there seems to be a conflict. |
Patched again and force-pushed. |
Which naming do you prefer? BEFORE
AFTER
|
I suppose these APIs will be used in mrbgems, so we must decide the names carefully. |
I am not sure, but I think it must be possible to do this using the C99 standard functions mbrtowc/mbtowc/wctomb, etc.? I don't much like platform-specific code if there is a standard way to do it. |
Unfortunately, C99 mbtowc etc. can only convert strings between the locale-dependent multibyte encoding (which may or may not be UTF-8) and an opaque wide character encoding (which may or may not be UTF-32). We cannot switch locale in the middle of execution in C99 either. So those functions are too weak to implement locale to/from UTF-8 conversion. |
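To make the limitation concrete, here is a minimal sketch of what the standard API offers. `locale_to_wide` is a hypothetical helper name, not something from this PR. It decodes a multibyte string with `mbrtowc`, and the input encoding is whatever the current locale says it is; nothing in the call lets the caller ask for UTF-8 on either side.

```c
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: decode a locale-encoded multibyte string into
 * wide characters using only standard C. The source encoding is fixed
 * by the current locale and the meaning of each wchar_t value is
 * implementation-defined, so this cannot target UTF-8 by itself.
 * Returns the number of wide characters stored, or (size_t)-1 on an
 * invalid or truncated sequence. */
size_t locale_to_wide(const char *s, wchar_t *out, size_t cap)
{
  mbstate_t st;
  size_t n = 0;
  size_t len = strlen(s);

  memset(&st, 0, sizeof st);            /* initial conversion state */
  while (len > 0 && n < cap) {
    size_t used = mbrtowc(&out[n], s, len, &st);
    if (used == (size_t)-1 || used == (size_t)-2) return (size_t)-1;
    if (used == 0) break;               /* embedded NUL */
    s += used;
    len -= used;
    n++;
  }
  return n;
}
```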
I can write code for converting wide char code points to UTF-8 bytes.
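For reference, the code point to UTF-8 step mentioned here could look roughly like the following. This is only a sketch: `utf8_encode` is a made-up name, and it assumes the input is already a full Unicode code point (on Windows, where `wchar_t` holds UTF-16 code units, surrogate pairs would have to be combined first).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one Unicode code point as UTF-8.
 * buf must have room for 4 bytes. Returns the number of bytes written,
 * or 0 if cp is a surrogate or lies beyond U+10FFFF. */
size_t utf8_encode(uint32_t cp, unsigned char *buf)
{
  if (cp < 0x80) {                      /* 1 byte: 0xxxxxxx */
    buf[0] = (unsigned char)cp;
    return 1;
  }
  if (cp < 0x800) {                     /* 2 bytes: 110xxxxx 10xxxxxx */
    buf[0] = (unsigned char)(0xC0 | (cp >> 6));
    buf[1] = (unsigned char)(0x80 | (cp & 0x3F));
    return 2;
  }
  if (cp < 0x10000) {                   /* 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx */
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0;  /* lone surrogate */
    buf[0] = (unsigned char)(0xE0 | (cp >> 12));
    buf[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    buf[2] = (unsigned char)(0x80 | (cp & 0x3F));
    return 3;
  }
  if (cp <= 0x10FFFF) {                 /* 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
    buf[0] = (unsigned char)(0xF0 | (cp >> 18));
    buf[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    buf[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    buf[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
  }
  return 0;
}
```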
|
Hmm, I see. Windows i18n & l10n seems really complicated... Too bad the ANSI functions are not powerful enough. If that's the case, then please carry on. However, I do wonder how plain Ruby handled this problem in the old 1.8.x days, before we had Encoding? |
@beoran good question. Back in the 1.8 days, strings did not handle any multibyte encoding, but Regexp did. Besides that, So if you wanted to handle multibyte strings in 1.8, you had to use Regexp matching with your locale. |
Hmmm, interesting. Currently an mruby string is just an arbitrary byte buffer. Wouldn't it be possible to do any conversions on the mruby side? Especially since we don't have working regexps yet. Also, we have to keep issue #1715 in mind. What should strings in mruby be? Dumb byte buffers like in Ruby 1.8 or in Lua? Always UTF-8 encoded, with a separate byte buffer class, like it is in Python? Or support some form of Encoding...? We need to think this through carefully. The balance is to keep mruby small while at the same time helping portability and i18n. |
At least mruby-onig-regexp can support non-utf8 strings easily because Oniguruma has great support of many encodings. |
I'm not sure about this exact use case, but I think this is to input/output on Windows Console. |
Are you talking to me? |
See #1715, where we made a spec for how to store UTF-8 bytes in RString. |
On output, you convert UTF-8 strings to (wide characters and then to) ANSI strings (SJIS strings) and call fwrite(3). This loses Unicode characters. CRuby, for example, converts UTF-8 strings to wide characters and uses WriteConsoleW(). |
Ah, I understand it now. The issue is, if anything, how to handle non-UTF-8 strings with minimal changes in mruby. What you describe only concerns mrb_p, right? And I guess it's an easy improvement after merging this. Thanks. |
I am sorry, I don't understand. How can we convert a locale string to UTF-8 with the ANSI API? |
It's possible, but it requires a call to |
What I am describing is not conversion from/to locale strings. I pointed out that we can get UTF-16 strings from the console and output UTF-16 strings to the console.
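The wide-character output path described above might be sketched as follows. `write_utf8` is a hypothetical helper, not the PR's code or CRuby's: on Windows it converts UTF-8 to UTF-16 with `MultiByteToWideChar` and tries `WriteConsoleW`, falling back to `fwrite` when that fails (for example because the stream is redirected to a file). It uses plain `malloc`/`free` only to stay standalone; inside mruby the allocation would have to go through mruby's own allocator instead.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#ifdef _WIN32
#include <windows.h>
#include <io.h>
#endif

/* Hypothetical helper: write len bytes of UTF-8 text to out.
 * On Windows, try the console's wide-character API first so that no
 * characters are lost to the ANSI code page; otherwise write the
 * bytes unchanged. Returns the number of input bytes consumed. */
size_t write_utf8(FILE *out, const char *s, size_t len)
{
#ifdef _WIN32
  if (len > 0) {
    HANDLE h = (HANDLE)_get_osfhandle(_fileno(out));
    int wlen = MultiByteToWideChar(CP_UTF8, 0, s, (int)len, NULL, 0);
    if (wlen > 0) {
      wchar_t *w = (wchar_t *)malloc(sizeof(wchar_t) * (size_t)wlen);
      if (w != NULL) {
        DWORD written = 0;
        BOOL ok;
        MultiByteToWideChar(CP_UTF8, 0, s, (int)len, w, wlen);
        fflush(out);  /* keep ordering with earlier buffered output */
        ok = WriteConsoleW(h, w, (DWORD)wlen, &written, NULL);
        free(w);
        if (ok) return len;
        /* WriteConsoleW fails for non-console handles: fall through. */
      }
    }
  }
#endif
  return fwrite(s, 1, len, out);
}
```

On a non-Windows build this reduces to a plain `fwrite`, which is consistent with the thread's assumption that the conversion is only needed on Windows.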
|
What is the problem we're trying to solve here anyway? If I understand correctly, the problem is that currently it's not possible to output UTF-8 encoded strings to the console on Windows using I read a bit here: http://stackoverflow.com/questions/1371012/how-do-i-print-utf-8-from-c-console-application-on-windows, and I found that just entering But, if it's really a problem, we could use Edit: I compiled mruby using mingw and then mruby simply crashed on utf-8 input.. >_< |
BTW, tty of libuv treats |
It's not a good way to solve this. Changing the console code page affects the console font, so the window will be resized. And if we don't have Unicode fonts, we can't display any UTF-8 strings. For example, I want to use mruby as a scripting language.
The console window will be resized for each file. Asking for @nurse's comment. Below is what my patch is doing:
You say that I can go from step 2 to step 3 above.
But WriteConsoleW should be used only when the output handle is a console.
In this case, the output handle isn't a console. @matz what are your worries or questions? |
As @mattn says, if you use cp65001 you invite other issues.
Good point, you can check it with _isatty( _fileno( stdout ) ). |
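A small sketch of that check, wrapped so it also builds outside Windows. `stream_is_console` is an illustrative name; the non-Windows branch uses POSIX `isatty`/`fileno` purely as a stand-in and is not part of what was proposed here.

```c
#if !defined(_WIN32) && !defined(_POSIX_C_SOURCE)
#define _POSIX_C_SOURCE 200809L  /* for fileno() under strict C modes */
#endif
#include <stdio.h>
#ifdef _WIN32
#include <io.h>
#define SKETCH_ISATTY(f) _isatty(_fileno(f))
#else
#include <unistd.h>
#define SKETCH_ISATTY(f) isatty(fileno(f))
#endif

/* Hypothetical helper: nonzero if the stream is attached to a
 * console/terminal, zero if it is a file or a pipe. A caller could use
 * this to choose between a console-specific output path and a plain
 * byte write. */
int stream_is_console(FILE *f)
{
  return SKETCH_ISATTY(f) != 0;
}
```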
@nurse do you mean that it should put |
@mattn If mruby supports Unicode on Windows, it should. But mruby has an option to split such features into mrbgems. |
I like the idea of making an mrbgem for windows-specific code too. |
it's possible to implement |
Add mrb_utf8_from_locale, mrb_utf8_free, mrb_locale_from_utf8, mrb_locale_free. Just works for windows.
rebased. |
Add mrb_utf8_from_locale, mrb_utf8_free, mrb_locale_from_utf8, mrb_locale_free
Add mrb_cstr_from_locale/mrb_cstr_to_locale. ARGV should be utf8 strings converted from locale strings. And printstr should print locale strings.
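As a rough illustration of the kind of conversion these commits describe, here is a standalone sketch of a locale-to-UTF-8 helper. The names `sketch_utf8_from_locale` and `sketch_utf8_free` are invented for this example and the body is an assumption, not the code merged in this PR. On Windows it goes through UTF-16 using `MultiByteToWideChar(CP_ACP, ...)` and `WideCharToMultiByte(CP_UTF8, ...)`; elsewhere it just returns a copy, on the assumption that the string is already usable as-is. It uses `malloc`/`free` for self-containment, which would not be acceptable inside mruby itself.

```c
#include <stdlib.h>
#include <string.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper: return a newly allocated UTF-8 copy of a
 * NUL-terminated locale (ANSI code page) string, or NULL on failure.
 * Release the result with sketch_utf8_free(). */
char *sketch_utf8_from_locale(const char *s)
{
#ifdef _WIN32
  int wlen, ulen;
  wchar_t *w;
  char *u = NULL;

  /* Step 1: ANSI code page -> UTF-16 (length includes the NUL). */
  wlen = MultiByteToWideChar(CP_ACP, 0, s, -1, NULL, 0);
  if (wlen <= 0) return NULL;
  w = (wchar_t *)malloc(sizeof(wchar_t) * (size_t)wlen);
  if (w == NULL) return NULL;
  MultiByteToWideChar(CP_ACP, 0, s, -1, w, wlen);

  /* Step 2: UTF-16 -> UTF-8. */
  ulen = WideCharToMultiByte(CP_UTF8, 0, w, -1, NULL, 0, NULL, NULL);
  if (ulen > 0) u = (char *)malloc((size_t)ulen);
  if (u != NULL) WideCharToMultiByte(CP_UTF8, 0, w, -1, u, ulen, NULL, NULL);
  free(w);
  return u;
#else
  /* Non-Windows: no conversion in this sketch, only a copy. */
  size_t n = strlen(s) + 1;
  char *copy = (char *)malloc(n);
  if (copy != NULL) memcpy(copy, s, n);
  return copy;
#endif
}

/* Hypothetical counterpart that releases the converted string. */
void sketch_utf8_free(char *p)
{
  free(p);
}
```

The reverse direction would swap the two code pages. A caller such as an ARGV setup routine would convert each argument on entry and free the temporary afterwards.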