Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

'あ'.to_sym == :'あ' is false #3035

Closed
yousuketto opened this Issue Jun 11, 2015 · 10 comments

Comments

Projects
None yet
7 participants
@yousuketto
Copy link
Contributor

yousuketto commented Jun 11, 2015

jruby version is 9.0.0.0.rc1.
'あ'.to_sym and :'あ' should be the same object, but I got the following result.

# coding: utf-8
p 'a'.to_sym == :'a'
p ''.to_sym == :'あ'
$ jruby example.rb 
true
false
@Ch4s3

This comment has been minimized.

Copy link

Ch4s3 commented Jun 11, 2015

I just ran this on jruby-9.0.0.0.pre2and it evals to true.

@billdueber

This comment has been minimized.

Copy link

billdueber commented Jun 11, 2015

Evaluates to true in 1.7.18 as well, and verified false on rc1.

@trejkaz

This comment has been minimized.

Copy link

trejkaz commented Jun 17, 2015

:'あ' comes via RubySymbol.SymbolTable#getSymbol(String)

name = {java.lang.String@7561} "�"
 value = {char[3]@7582} 
  0 = 'ã' 227
  1 = '\u0081' 129
  2 = '\u0082' 130
 hash = 222276

'あ'.to_sym comes via RubySymbol.SymbolTable#getSymbol(ByteList):

bytes = {org.jruby.util.ByteList@7670} "�"
 bytes = {byte[15]@7674} 
 begin = 0
 realSize = 3
 encoding = {org.jcodings.specific.UTF8Encoding@7675} "UTF-8"
 hash = -255662183
 stringValue = {java.lang.String@7671} "�"
  value = {char[3]@7677} 
    0 = 'ã' 0xE3
    1 = '\u0081' 0x81
    2 = '\u0082' 0x82
  hash = 222276
  hash (local var) = -31932

getSymbol(ByteList) appears to be simulating String#hashCode() somehow by calling javaStringHashCode, but this method has not returned the same hash as String#hashCode which is probably why it couldn't find the symbol in the table. So you end up with two entries for the same symbol in the symbol table. Oops.

@trejkaz

This comment has been minimized.

Copy link

trejkaz commented Jun 17, 2015

The bug is that String#hashCode is working with only positive numbers, but RubySymbol.SymbolTable#getSymbol(ByteList) is working with a byte[] which can (and does) contain negative numbers. I think it has to cast to int and mask again before it can use any of the values in that. I'm surprised it didn't break for some other case sooner.

Also, in both cases, stringValue seems suspicious. The surrounding ByteList claims to be UTF-8, but that is not how you decode UTF-8...

@yousuketto

This comment has been minimized.

Copy link
Contributor Author

yousuketto commented Jun 26, 2015

@trejkaz
Thank you for explanation. I understood.
If the hash code can not be calculated from Bytelist, it seems that the way to calculate hash code should be changed to via String.

@k77ch7

This comment has been minimized.

Copy link
Contributor

k77ch7 commented Jun 26, 2015

Hi all,

The following patch fixed this issue for me on the master branch.

diff --git a/core/src/main/java/org/jruby/RubySymbol.java b/core/src/main/java/org/jruby/RubySymbol.java
index c4f8186..5a37881 100644
--- a/core/src/main/java/org/jruby/RubySymbol.java
+++ b/core/src/main/java/org/jruby/RubySymbol.java
@@ -42,6 +42,7 @@ import org.jruby.anno.JRubyClass;
 import org.jruby.anno.JRubyMethod;
 import org.jruby.ast.util.ArgsUtil;
 import org.jruby.compiler.Constantizable;
+import org.jruby.RubyEncoding;
 import org.jruby.parser.StaticScope;
 import org.jruby.runtime.Arity;
 import org.jruby.runtime.Binding;
@@ -956,13 +957,21 @@ public class RubySymbol extends RubyObject implements MarshalEncoding, Constanti
     }

     private static int javaStringHashCode(String str) {
-        return str.hashCode();
+        byte val[] = str.getBytes(RubyEncoding.ISO);
+        int h = 0;
+        int length = val.length;
+        if (length > 0) {
+            for (int i = 0; i < length; i++) {
+                h = 31 * h + val[i];
+            }
+        }
+        return h;
     }

     private static int javaStringHashCode(ByteList iso8859) {
         int h = 0;
         int length = iso8859.length();
-        if (h == 0 && length > 0) {
+        if (length > 0) {
             byte val[] = iso8859.getUnsafeBytes();
             int begin = iso8859.begin();

I will test and send a pull request with the bug. Please let me know what you think.

@trejkaz

This comment has been minimized.

Copy link

trejkaz commented Jun 27, 2015

I think the name of the methods would have to change because it no longer matches the behaviour of Java's String class, but the two implementations match, so it would certainly fix it. I don't know whether there is a particular point in having it match String's hash (and if I remember correctly, weren't there issues with String's hashing algorithm anyway? I don't remember the details.)

@k77ch7

This comment has been minimized.

Copy link
Contributor

k77ch7 commented Jul 2, 2015

@trejkaz thanks. I change the above my patch.
A new patch uses Java's String class hashCode() method.

new patch:

diff --git a/core/src/main/java/org/jruby/RubySymbol.java b/core/src/main/java/o
rg/jruby/RubySymbol.java
index c4f8186..8f3aa9f 100644
--- a/core/src/main/java/org/jruby/RubySymbol.java
+++ b/core/src/main/java/org/jruby/RubySymbol.java
@@ -42,6 +42,7 @@ import org.jruby.anno.JRubyClass;
 import org.jruby.anno.JRubyMethod;
 import org.jruby.ast.util.ArgsUtil;
 import org.jruby.compiler.Constantizable;
+import org.jruby.RubyEncoding;
 import org.jruby.parser.StaticScope;
 import org.jruby.runtime.Arity;
 import org.jruby.runtime.Binding;
@@ -65,6 +66,7 @@ import org.jruby.util.SipHashInline;

 import java.lang.ref.WeakReference;
 import java.util.concurrent.locks.ReentrantLock;
+import java.util.Arrays;

 import static org.jruby.util.StringSupport.codeLength;
 import static org.jruby.util.StringSupport.codePoint;
@@ -962,13 +964,11 @@ public class RubySymbol extends RubyObject implements MarshalEncoding, Constanti
     private static int javaStringHashCode(ByteList iso8859) {
         int h = 0;
         int length = iso8859.length();
-        if (h == 0 && length > 0) {
+        if (length > 0) {
             byte val[] = iso8859.getUnsafeBytes();
             int begin = iso8859.begin();
-
-            for (int i = 0; i < length; i++) {
-                h = 31 * h + val[begin + i];
-            }
+            byte copyVal[] = Arrays.copyOfRange(val, begin, begin + length);
+            h = new String(copyVal, RubyEncoding.ISO).hashCode();
         }
         return h;
     }
@headius

This comment has been minimized.

Copy link
Member

headius commented Jul 2, 2015

Patch looks right. Will merge after review.

@headius headius closed this in 3530b24 Jul 2, 2015

headius added a commit that referenced this issue Jul 2, 2015

@headius

This comment has been minimized.

Copy link
Member

headius commented Jul 2, 2015

Thanks for the fix, @k77ch7!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.