This repository has been archived by the owner on Nov 22, 2023. It is now read-only.

Add 'utf8Size' opcode #437

Closed
lock9 opened this issue Oct 7, 2021 · 4 comments

Comments

@lock9
Contributor

lock9 commented Oct 7, 2021

Hi,

The compiler/VM does not process strings properly. It always assumes one byte per character, which is not true for UTF-8 strings. We should probably add a 'utf8Size' opcode in addition to the current 'size' one. The compiler must use the correct opcode when dealing with strings.

There may be other places and opcodes to be added, like the 'substring' method (I don't know how it is implemented, but I can confirm that it treats each character as 1 byte).
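
To make the mismatch concrete, here is a minimal sketch in Python (not Neo or NeoVM code; the sample string and variable names are only illustrative). It compares the character count of a string with the byte count of its UTF-8 encoding, and shows how a byte-based substring can cut a multibyte character in half:

```python
# Illustrative only: plain Python, not the Neo compiler or NeoVM.
text = "héllo"                  # 'é' (U+00E9) takes two bytes in UTF-8
raw = text.encode("utf-8")

char_count = len(text)          # number of Unicode code points
byte_count = len(raw)           # number of encoded bytes

# A byte-based "substring" of the first two bytes ends in the middle of 'é'.
truncated = raw[:2]
try:
    truncated.decode("utf-8")
    split_is_valid = True
except UnicodeDecodeError:
    split_is_valid = False

print(char_count, byte_count, split_is_valid)
```

A size operation that counts bytes would report the second number for this string, while most developers expect the first.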

Related to neo-project/neo-devpack-dotnet#681

@shargon
Member

shargon commented Oct 8, 2021

If we use another encoding, we will have the same error.

@lock9
Contributor Author

lock9 commented Oct 8, 2021

What do you mean by another encoding? We only support UTF-8, right? We can call it unicodeSize if that is the concern, but there is a problem here that has to be fixed.

@erikzhang
Member

I'm not sure it's necessary, because it is also possible to operate on the byte length.

@devhawk
Contributor

devhawk commented May 9, 2022

This seems more appropriate to add to neo than neo-vm. NeoVM doesn't have a string type, so it seems odd to have a string length opcode in the VM.

Since Neo leverages NeoVM ByteStrings and Buffers to store UTF-8 strings, it makes sense to me that there should be a mechanism in Neo.dll to calculate the length of a UTF-8 string that is multibyte-character aware.
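
As a rough sketch of what such a mechanism could look like, the following Python function counts code points directly from raw UTF-8 bytes by skipping continuation bytes (those of the form `10xxxxxx`). The name `utf8_length` is hypothetical and is not an existing Neo or NeoVM API, and the sketch assumes the input is already valid UTF-8:

```python
def utf8_length(data: bytes) -> int:
    """Count Unicode code points in UTF-8 encoded bytes.

    Hypothetical helper for illustration; assumes `data` is valid UTF-8.
    Every code point starts with exactly one non-continuation byte, so
    counting the bytes that are NOT of the form 10xxxxxx gives the count.
    """
    return sum(1 for b in data if (b & 0xC0) != 0x80)


ascii_bytes = "neo".encode("utf-8")      # 3 bytes, 3 code points
cjk_bytes = "日本語".encode("utf-8")      # 9 bytes, 3 code points

print(len(ascii_bytes), utf8_length(ascii_bytes))
print(len(cjk_bytes), utf8_length(cjk_bytes))
```

This counts code points, not user-perceived characters: a letter followed by a combining accent is still two code points, so a real implementation would need to decide which notion of "length" it wants to expose.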
