Update PrimitiveFieldType$STRING.java for UTF-8 #19
Conversation
Add support of UTF-8! For i18n (eg. TTS). "A string must always contain UTF-8 encoded or 7-bit ASCII text." https://developers.google.com/protocol-buffers/docs/proto#scalar "unicode strings are currently not supported as a ROS data type. utf-8 should be used to be compatible with ROS string serialization. " http://wiki.ros.org/msg but it works. (testing with python)
@damonkohler will this have any unforeseen side effects? |
LGTM but please also fix the serialize method. |
@@ -581,7 +581,7 @@ public int getSerializedSize() { | |||
public String deserialize(ChannelBuffer buffer) { | |||
int length = buffer.readInt(); | |||
ByteBuffer stringBuffer = buffer.readSlice(length).toByteBuffer(); | |||
return Charset.forName("US-ASCII").decode(stringBuffer).toString(); | |||
return Charset.forName("UTF-8").decode(stringBuffer).toString(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You also need to specify "UTF8" to getBytes() on line 574. See http://docs.oracle.com/javase/tutorial/i18n/text/string.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my apologies...
I have not tested the serializer in every context.
I use ROS on Android, which by default is UTF-8
add symmetric encoding transformation (serializer/deserializer)
Code Refactoring
Remove dead code
@@ -679,6 +680,7 @@ public String getJavaTypeName() { | |||
}; | |||
|
|||
private static final ImmutableSet<String> TYPE_NAMES; | |||
private static final Charset DEFAULT_CHARSET = Charset.forName("utf-8"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use "UTF-8"
Last nit and then LG. |
Change parameter of constant
@stonier : I testing the change with common case. Do you have other ideas for test cases ? in org.ros.internal.message.MessageSerializationTest @Test
public void testStringUTF8() {
std_msgs.String message = defaultMessageFactory.newFromType(std_msgs.String._TYPE);
message.setData("éêè €àáßëœ 文字化け");
checkSerializeAndDeserialize(message);
// i18n test case
// base on http://www.inter-locale.com/whitepaper/learn/learn-to-test.html
// Combining Marks and Accents test
message.setData("àéîōũ");
checkSerializeAndDeserialize(message);
// DOS 860 test
message.setData("você nós mãe avô irmã criança");
checkSerializeAndDeserialize(message);
// Windows-1252 test
message.setData("€ŒœŠš™©‰ƒ");
checkSerializeAndDeserialize(message);
// Turkish test
message.setData("ışık bir İyi Günler");
checkSerializeAndDeserialize(message);
// Dakuten and handakuten marks test
message.setData("がざばだぱか゛さ゛た゛は");
checkSerializeAndDeserialize(message);
// Combining Grapheme Joiner character
message.setData("אִ͏ַ");
checkSerializeAndDeserialize(message);
// Bidi with Latin test
message.setData("abcאבגדabc ");
checkSerializeAndDeserialize(message);
message.setData("אבגדabcאבגד");
checkSerializeAndDeserialize(message);
message.setData("אבגד012אבגד");
checkSerializeAndDeserialize(message);
message.setData("אבגד 012 012");
checkSerializeAndDeserialize(message);
// Complex Scripts test
message.setData("สวัสดี");
checkSerializeAndDeserialize(message);
message.setData("டாஹ்கோ");
checkSerializeAndDeserialize(message);
message.setData("بِسْمِ اللّهِ الرَّحْمـَنِ الرَّحِيمِ");
checkSerializeAndDeserialize(message);
// Numeric Shaping test
message.setData("عدد مارس ١٩٩٨");
checkSerializeAndDeserialize(message);
// Common Scripts and Encodings test
message.setData("Слава Жанна Ювеналий Ярополк");
checkSerializeAndDeserialize(message);
} |
Can you please add those tests to this change? |
for [rosjava_bootstrap](rosjava/rosjava_bootstrap#19)
I am not sure of the location. (I use the groovy branch.) |
Sorry, I don't understand the problem. Can you please add the message serialization tests you pasted above to this patch? |
Add the message serialization tests for UTF-8.
The problem was : I use groovy stack and the test is available in https://github.com/rosjava/rosjava_core/blob/master/rosjava/src/test/java/org/ros/internal/message/MessageSerializationTest.java (always available in master) It's done in rosjava_core and rosjava_bootstrap repository. It's ok for you? |
Raw format style.
Apologies I just see the difference (Raw vs Std_msg) |
@@ -679,6 +680,7 @@ public String getJavaTypeName() { | |||
}; | |||
|
|||
private static final ImmutableSet<String> TYPE_NAMES; | |||
private static final Charset DEFAULT_CHARSET = Charset.forName("UTF-8"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please move this to the top of the class (line 34).
Looks good, thanks! Just one last nit. The constant DEFAULT_CHARSET belongs at the top of the class. |
Clean code
Done. |
Update PrimitiveFieldType$STRING.java for UTF-8
@mh0rst is this the error you were also seeing? This was coming from a ros answers post: http://answers.ros.org/question/140371/compilation-problem-in-rosjava-master-branch/ |
@stonier yes, exactly. The fix is trivial, I created a pull request if that helps. |
Definitely. @damonkohler will come undoubtedly across it when he's free. |
Add support of UTF-8!
For ~i18n (eg. TTS).
Docs
Quote
It works.