Windows Traditional Chinese Edition: unknown encoding name - MS950 #5707
@umairsair Thanks for the analysis! I think you're right, we just need an alias. @lopex does that seem right to you? |
I see no such alias in CRuby. Does CRuby work properly in your environment? |
From the Wikipedia page, it sounds like MS950 is not exactly the same as Big5, but it's probably close enough to alias? |
I haven't tried it extensively. I just tried simple commands like reading a file. I tried reading the file with the encodings "MS950" and "x-windows-950", and it just gave me a warning.
Maybe we should do the same in JRuby: instead of throwing an exception, just use some default encoding.
Yes, it seems MS950 is a bit different, but I couldn't find any details of the actual difference. I agree that it is close enough to alias. If we don't get details on the encoding difference, then we should probably add it as an alias until someone comes up with a problem with it. WDYS? |
I think we should warn and use the default encoding then. Otherwise, maybe this could be raised as an MRI issue? |
@lopex That's a good point. @umairsair Can you confirm whether CRuby/MRI has the same issue? If so, we probably should coordinate with them on a fix. |
IMO we should do both: instead of just failing, we should warn and move on, the same as CRuby, and also open an MRI issue to support the MS950 encoding. I am using JRuby in Eclipse as a plugin dependency, and there is no way to get around this problem except setting Java's "file.encoding" property to some other encoding. That changes the complete Eclipse environment, which is not acceptable.
It doesn't support the MS950 encoding, but that is not blocking in any way as long as I am working with non-Chinese content on this Windows edition. Is there anything specific that you want me to try out with MRI? |
@umairsair It helps to know that it still works. What encoding does it end up choosing? |
@umairsair Can you show us the full backtrace for that exception please? If we can determine what CRuby falls back on we can make this change fairly quickly. |
I read a file created with Chinese characters in it.
Following is the backtrace. Exception is thrown from here.
Update: removing unnecessary frame. |
Oh CP950! |
So do you have a possible solution to fix it? Anything else I can help you with? |
I'm discussing it on matrix now with @lopex. We could just add the alias, but our list of encodings is generated from CRuby. We'd rather figure out how they're falling back and why they warn but still apparently pick the right encoding. |
Oh one thing you might be able to do is force the JVM to use CP950 instead of MS950 by passing |
@umairsair I just noticed something odd about your stack trace above: JRubyParser. There is no such class in JRuby...the only place such a class exists is in the external jruby-parser project, which I believe has not been updated in some time (@enebo knows better than I). I would not at all be surprised to find that it's having trouble with unknown encodings since it doesn't use the same mechanisms as JRuby proper to deal with them. Please provide an example of how you're running JRuby to trigger this error. At this point we have been unable to reproduce your issue on any current version of JRuby, and that JRubyParser line in the stack trace is highly suspect. |
I think we don't have a JRuby-specific property for file encoding. I see
Sorry for causing confusion, it's my own class that just calls
A very simple example.
Can you please tell me how you are trying to reproduce it? |
Well so far I've just been trying to get JRuby to run with MS950 as the system encoding, but it doesn't seem to trigger any issues. I'm thinking this may be specific to the |
On a non-Chinese edition of Windows 7, I am able to reproduce this issue by forcing Java's file.encoding to MS950. So I guess you will be able to reproduce it too. |
@umairsair Sorry for the delay. I'm back to work this week and looking into this. |
So running the following code with
container = org.jruby.embed.ScriptingContainer.new(
    org.jruby.embed.LocalContextScope::SINGLETHREAD)
container.runScriptlet("require %{FileUtils}")
A workaround for you might be to set |
@umairsair Ok so I'm not sure we know how to proceed at this point. For all cases I have tested with MS950 as an encoding, we behave the same as CRuby. When I run your ScriptingContainer code with At this point I have two suggestions for you:
Sorry we have been unable to help you, but without a reproduction we would have to blindly guess at what's wrong. |
Are you running it from a Java application? From the snippet, it seems that you are running it from the JRuby terminal. If you are unable to reproduce it using a Java application, I'll share a Java application that reproduces this issue.
This workaround will work, but as I mentioned earlier, we cannot force any encoding because it would change the whole Java environment. |
@headius, I have pushed a sample to the following location and added instructions to the README. |
Reproduced! |
Ok so now I see where it's happening and why it only affects Windows: jruby/core/src/main/java/org/jruby/Ruby.java, lines 1488 to 1495 in a7d0c0c |
@umairsair Ok so my suggested workaround of setting
Short term fix in JRuby will be to simply fall back on default external, but I'm not certain this is the right fix just yet. cc @lopex @enebo |
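To illustrate the "warn and fall back on default external" idea discussed above, here is a minimal Ruby sketch. The helper name `find_encoding_or_default` is hypothetical and this is not the actual JRuby patch; it only shows the general shape of the behavior, assuming an unknown name surfaces as an `ArgumentError` from `Encoding.find`.

```ruby
# Hypothetical helper (not part of JRuby): look up an encoding by name,
# and instead of failing on an unknown name, warn and use the default
# external encoding.
def find_encoding_or_default(name)
  Encoding.find(name)
rescue ArgumentError
  # Encoding.find raises ArgumentError for names it does not know.
  warn "unknown encoding name - #{name}; using #{Encoding.default_external}"
  Encoding.default_external
end

enc = find_encoding_or_default("MS950")
puts enc.name
```

The trade-off is that a silent fallback may pick an encoding that does not match the file contents, which is presumably why it is described as a short term fix.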
Thanks @headius for the quick fix. In basic testing, I have verified the fix on the Windows Traditional Chinese edition and it is working fine; ASCII-8BIT is the default external encoding. I'll backport this fix and try to build JRuby (a quick guide to building only the JRuby jar would be helpful :)). BTW, when is the 9.2.8.0 release expected? |
If you need the complete jar, run:
The complete jar will be built into the lib/ dir.
9.2.8.0 could probably go any time but there's a large rework of load/require I'd hoped to finish. We will discuss today. |
I looked into how CRuby does this. Basically the piece we're missing is the ability to get the exact code page number and then look up based on that. Both MS950 and CP950 are names for code page 950, so MRI never sees the "MS" part when picking the default filesystem encoding. We could of course bind those methods via FFI but I'd rather have a consistent way to do this without a native dependency. That might be as simple as looking for encoding patterns of "MS####" and swapping them for "CP####". |
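The "MS####" to "CP####" swap mentioned above could look roughly like the following Ruby sketch. The helper name `normalize_codepage_name` is an assumption for illustration only, and the real change in JRuby may be structured differently.

```ruby
# Hypothetical helper: rewrite a Java-style "MS<digits>" charset name to
# the "CP<digits>" form, leaving every other name untouched.
def normalize_codepage_name(name)
  # Anchored so that only names consisting entirely of "MS" + digits match.
  name.sub(/\AMS(\d+)\z/i, 'CP\1')
end

puts normalize_codepage_name("MS950")
puts normalize_codepage_name("UTF-8")
```

The normalized name can then be handed to the usual lookup, for example `Encoding.find(normalize_codepage_name(name))`, so no native call is needed to obtain the code page number.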
I pushed an update that will attempt to translate |
FWIW the logic for this is in localeinit.c in MRI:
int
Init_enc_set_filesystem_encoding(void)
{
    int idx;
#if NO_LOCALE_CHARMAP
    idx = ENCINDEX_US_ASCII;
#elif defined _WIN32
    char cp[SIZEOF_CP_NAME];
    const UINT codepage = ruby_w32_codepage[1] ? ruby_w32_codepage[1] :
        AreFileApisANSI() ? GetACP() : GetOEMCP();
    CP_FORMAT(cp, codepage);
    idx = rb_enc_find_index(cp);
    if (idx < 0) idx = ENCINDEX_ASCII;
#elif defined __CYGWIN__
    idx = ENCINDEX_UTF_8;
#else
    idx = rb_enc_to_index(rb_default_external_encoding());
#endif
    return idx;
} |
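For readers less familiar with the C source, the `_WIN32` branch above can be paraphrased in Ruby as follows. This is only a sketch: the method name is hypothetical, the code page number is passed in rather than obtained from `GetACP()`/`GetOEMCP()`, and it assumes `ENCINDEX_ASCII` refers to ASCII-8BIT (for which `Encoding::BINARY` is an alias), which would match the ASCII-8BIT default reported earlier in this thread.

```ruby
# Hypothetical paraphrase of the _WIN32 branch: build "CP<number>" from the
# numeric code page, look it up, and fall back to ASCII-8BIT when the
# lookup fails (assumed meaning of ENCINDEX_ASCII).
def filesystem_encoding_for_codepage(codepage)
  Encoding.find("CP#{codepage}")
rescue ArgumentError
  Encoding::BINARY
end

puts filesystem_encoding_for_codepage(950).name
puts filesystem_encoding_for_codepage(99999).name
```

Because the lookup starts from the number, the "MS" versus "CP" prefix never enters the picture on the MRI side.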
I'm going to call this fixed with the merge of #5733. It will be in 9.2.8.0. |
Thanks a lot once again @headius ! I'll try it on my end. |
Environment
JRuby 9.2.6.0
Windows 7 Traditional Chinese Edition
I am getting the following error with JRuby.
org.jruby.exceptions.MainExitException: unknown encoding name - MS950
Quick analysis shows that we don't have an entry for MS950 in org.jcodings.EncodingList.
This problem was not present in very old JRuby versions. It seems that the following commits changed the behavior on this Windows edition.
1ceddaa
239c726
This problem is a blocker on this Windows edition.
It seems to me that MS950 is an alias of Big5. If this is correct, then we just need a one-line change in the EncodingList class.