Result not right when dealing with bytes in a string #256

Closed
chenxuuu opened this issue Jun 24, 2019 · 5 comments

@chenxuuu

chenxuuu commented Jun 24, 2019

I have two functions:

-- @usage
-- string.toHex("\1\2\3") -> "010203" 3
-- string.toHex("123abc") -> "313233616263" 6
-- string.toHex("123abc"," ") -> "31 32 33 61 62 63 " 6
function string.toHex(str, separator)
    return str:gsub('.', function(c)
        return string.format("%02X" .. (separator or ""), string.byte(c))
    end)
end

-- @usage
-- string.fromHex("010203")       ->  "\1\2\3"
-- string.fromHex("313233616263") ->  "123abc"
function string.fromHex(hex)
    local hex = hex:gsub("[%s%p]", ""):upper()
    return hex:gsub("%x%x", function(c)
        return string.char(tonumber(c, 16))
    end)
end

When I use them in NLua or other implementations such as eLua, I get the correct result:

local bytes = ("0002405060ffff"):fromHex()
print(bytes:toHex())
--result: 0002405060FFFF	7

But when I run the same code in MoonSharp, the result is wrong:

local bytes = ("0002405060ffff"):fromHex()
print(bytes:toHex())
--result: 00024050603F3F	7
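For reference, here is a self-contained version of the two helpers with the expected round trip written out. It is a minimal sketch that reuses the functions from this issue unchanged apart from added comments; the expected values are what reference Lua produces. Note that gsub returns two values (the new string and the number of substitutions), which is where the trailing 7 in the printed results comes from; wrapping the call in parentheses keeps only the string.

```lua
-- The two helpers from this issue, with comments added.
function string.toHex(str, separator)
    -- Replace every byte with its two-digit uppercase hex code.
    return str:gsub('.', function(c)
        return string.format("%02X" .. (separator or ""), string.byte(c))
    end)
end

function string.fromHex(hex)
    -- Drop whitespace and punctuation, then turn each hex pair back into a byte.
    local hex = hex:gsub("[%s%p]", ""):upper()
    return hex:gsub("%x%x", function(c)
        return string.char(tonumber(c, 16))
    end)
end

-- gsub returns (string, count), which is why print() shows a trailing 7.
-- Parentheses around a call keep only its first return value.
local bytes = ("0002405060ffff"):fromHex()
local hex = (bytes:toHex())

print(hex)             -- 0002405060FFFF on reference Lua
print(#bytes)          -- 7
print(bytes:byte(-1))  -- 255 on reference Lua; this issue reports 63 (0x3F) in MoonSharp
```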
@ebeem
Contributor

ebeem commented Apr 10, 2020

Yup, you're right. I can confirm this too. To be more specific, the bug is in string.char.
I did a simple test:

for i = 0, 255 do
    print(i .. " => " .. string.byte(string.char(i)))
end

You can add an equality check to print only the characters that don't map back correctly through string.byte.
It turned out that, for some reason, only 255 fails, and it outputs 63:
255 => 63

That's why your FF was translated into 3F. To fix this, we need to fix the string.char function so that 255 gets mapped to the correct character code instead of 63.
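The equality check mentioned above can be sketched like this: a minimal loop that prints only the values where string.byte(string.char(i)) differs from i. On reference Lua it should print no mismatches; according to the test in this comment, MoonSharp printed 255 => 63 at the time.

```lua
-- Round-trip every byte value through string.char / string.byte and
-- print only the values that do not come back unchanged.
local mismatches = {}
for i = 0, 255 do
    local back = string.byte(string.char(i))
    if back ~= i then
        mismatches[#mismatches + 1] = i
        print(i .. " => " .. back)
    end
end
print(#mismatches .. " mismatching value(s)")
```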

@ebeem
Contributor

ebeem commented Apr 10, 2020

After more investigation, it seems the problem is not with string.char but with string.byte.

string.byte calls this method in StringModule.cs:

		private static int Unicode2Ascii(int i)
		{
			if (i >= 0 && i < 255)
				return i;

			return (int)'?';
		}

It returns the character '?', which maps to 63, whenever the input is < 0 or >= 255. That logic is incorrect: 255 should be included:

		private static int Unicode2Ascii(int i)
		{
			if (i >= 0 && i <= 255)
				return i;

			return (int)'?';
		}

I will do more testing to confirm this.
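To see that the bound change affects exactly one input, the two versions of the C# method quoted in this comment can be mirrored in Lua and compared over 0 to 255. This is a hypothetical port for illustration only: the names unicode2AsciiOld and unicode2AsciiFixed are made up for this sketch, and MoonSharp's real implementation is the C# code. The only differing input should be 255, which matches the 255 => 63 result from the earlier test.

```lua
-- Hypothetical Lua ports of the C# Unicode2Ascii helper, for illustration only.
local QUESTION_MARK = string.byte("?") -- 63

local function unicode2AsciiOld(i)
    -- Original bound: i < 255 excludes 255 itself.
    if i >= 0 and i < 255 then return i end
    return QUESTION_MARK
end

local function unicode2AsciiFixed(i)
    -- Proposed bound: i <= 255 keeps the full 0..255 byte range.
    if i >= 0 and i <= 255 then return i end
    return QUESTION_MARK
end

-- Collect every input in 0..255 where the two versions disagree.
local differing = {}
for i = 0, 255 do
    if unicode2AsciiOld(i) ~= unicode2AsciiFixed(i) then
        differing[#differing + 1] = i
    end
end
print("inputs that differ: " .. table.concat(differing, ", "))
```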

@ebeem
Contributor

ebeem commented Apr 10, 2020

Check #273 in case you're still interested.

@chenxuuu
Author

Maybe the real reason is that MoonSharp stores Lua strings as C# strings, and it uses UTF-8 encoding...

return DynValue.NewString(sb.ToString());

@ebeem
Contributor

ebeem commented Apr 11, 2020

Yeah, you're right. I think this also causes other problems. I will need to investigate more to confirm the root cause of this issue (Lua strings being stored as C# strings).
Right now, it seems that sharing ASCII strings between Lua and C# doesn't always work.

Here is one scenario: if you share a string that contains the character 3F (ASCII), C# interprets it as E9. But I think this is a separate issue, since it's related to how the string is handled and not to a bug in string.byte.

I called a C# function from Lua. This function accepts a string, and I am passing ASCII.
I will write the values as hex to make the characters easier to see:
Lua ASCII in hex: 40000003F0300000
C# ASCII in hex: 4000000E90300000

I ran into this issue at first, so I decided to convert strings into hex and then pass them to my C# code. It seems like you went through the same thing.
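The hex workaround described in this comment can be sketched on the Lua side as below. It is an illustration under assumptions: toAsciiHex and isAscii7 are hypothetical helper names, and the C# side is assumed to decode the hex string back into bytes (not shown). The idea is that the encoded payload only contains the characters 0-9 and A-F, all below 128, so it should pass through the Lua/C# string conversion unchanged, while the raw bytes (here, the 0002405060FFFF example from this issue) include 255.

```lua
-- Hypothetical helper: encode arbitrary bytes as uppercase hex so that only
-- the ASCII characters 0-9 and A-F cross the Lua/C# boundary.
local function toAsciiHex(bytes)
    -- Extra parentheses drop gsub's second return value (the count).
    return (bytes:gsub(".", function(c)
        return string.format("%02X", c:byte())
    end))
end

-- Hypothetical helper: true if every byte of s is below 128 (7-bit ASCII).
local function isAscii7(s)
    return not s:find("[\128-\255]")
end

-- Raw bytes 00 02 40 50 60 FF FF, written as decimal escapes.
local raw = "\0\2\64\80\96\255\255"
local payload = toAsciiHex(raw)
print(payload, isAscii7(raw), isAscii7(payload))
```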

LimpingNinja added a commit that referenced this issue Nov 14, 2021
fix Unicode2Ascii does not return char of ascii code 255 (fixes #256)