Reverse a ByteArray #17

notatestuser · 2017-09-25T21:39:53Z

Like CAT, SUBSTR and LEFT, it would be nice to have an opcode for reversing a byte array. Useful for script hashes.

igormcoelho · 2018-09-29T18:04:39Z

Although I would also like this feature, the use case for script hashes is not clear to me.

ixje · 2018-09-30T13:58:17Z

Back when I participated in the CoZ dApp contest I needed to reverse script_hashes before feeding them to "dynamic APPCALLs" or the contract would not be found (e.g. as a user during runtime I provide a script_hash of an oracle contract that gets called by the smart contract). That always required me to reverse the bytearray. Due to the time constraints I never investigated why this was needed, but I'm assuming that's going to be the use case here as well. Interested to see the use-case of @notatestuser

shargon · 2018-09-30T14:31:08Z

I think we should remove this requirement to reverse the address in Neo 3.0.

notatestuser · 2018-09-30T14:32:31Z

Thanks @ixje, that's exactly the issue I faced too. It had something to do with the varying endianness of the script hash, which was represented in some places as a 160-bit uint and in others as a byte array.

igormcoelho · 2018-09-30T19:01:20Z

@shargon one thing that I've been trying to clarify in my scripts is exactly where things are little-endian and where they are big-endian. One thing still bothers me: I still don't know the precise endianness of the "scripthash" we use. Since all machine processing is big-endian on neovm, I believe it should be big-endian inside and little-endian outside. Is that not the case?
I finally created a JavaScript library that simulates EXACTLY the C# BigInteger behavior in JavaScript, called csBigInteger.js: https://github.com/igormcoelho/csBigInteger.js
It works as big-endian from inside, and using this, I intend to precisely rebuild NeoVM from scratch in JavaScript in the following days: https://github.com/neoresearch/neovm.js

igormcoelho · 2018-10-01T15:58:17Z

Guys, we could reuse the REVERSE opcode, which is currently used for arrays only. I guess it makes sense to allow this operation on ByteArrays too.

https://github.com/neo-project/neo-vm/blob/master/src/neo-vm/OpCode.cs#L429

https://github.com/neo-project/neo-vm/blob/master/src/neo-vm/ExecutionEngine.cs#L837-L849
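
To illustrate the proposal, here is a minimal Python sketch of a REVERSE handler that accepts byte arrays as well as arrays. The function name `op_reverse` and the choice to reverse arrays in place while returning a reversed copy for byte arrays are assumptions made for illustration; this is not the neo-vm implementation.

```python
from typing import Union


def op_reverse(item: Union[list, bytes, bytearray]) -> Union[list, bytes]:
    """Hypothetical REVERSE handler (illustration only, not neo-vm code)."""
    if isinstance(item, list):
        # In this sketch, arrays are reversed in place.
        item.reverse()
        return item
    if isinstance(item, (bytes, bytearray)):
        # In this sketch, byte arrays are values: return a reversed copy.
        return bytes(item[::-1])
    raise TypeError("REVERSE expects an array or a byte array")


# 01 followed by 19 zero bytes, i.e. a 20-byte script hash.
script_hash = bytes.fromhex("01" + "00" * 19)
print(op_reverse(script_hash).hex())  # 19 zero bytes followed by 01
print(op_reverse([1, 2, 3]))          # [3, 2, 1]
```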

shargon · 2018-10-01T19:18:47Z

@erikzhang what is the reason for reversing the addresses?

ixje · 2018-10-01T20:05:43Z

Since we're in question mode on reversing data: why do we reverse the data of UInts in the string representation (and not in ToArray())? See https://github.com/neo-project/neo/blob/9e3c08f4f411e0a9557a96de6def9bcef1ee6c3f/neo/UIntBase.cs#L76-L79

igormcoelho · 2018-10-01T20:38:35Z

In fact this last one is easier @ixje: Neo represents big endian bytearrays with "0x" on front... that's on website documentation. So I guess it's a little endian becoming big endian. And I think that also solves my eternal mystery, because the Neo compiler puts 0x in front of the scripthash, meaning it's big endian (that explains the reverse on Address, @shargon).

This is a possible solution to the issue raised at neo-vm page: neo-project#17

ixje · 2018-10-02T08:31:39Z

@igormcoelho

> Neo represents big endian bytearrays with "0x" on front... that's on website documentation. So I guess it's a little endian becoming big endian.

This is how they present the data; I'm interested in why it was chosen to represent it like that. It even conflicts with the documentation, which states:

> All integer types of NEO are Little Endian except for IP address and port number, these 2 are Big Endian.

Anything deriving from UIntBase is clearly an integer type, so my question is still "why does this reversing happen?"

Then to elaborate on 0x as a way of saying big endian (btw, can you link where that's in the documentation?): I think that's a really misleading approach, as 0x is just a prefix indicating the number is hexadecimal/base16. It's intended to let you differentiate between 10 meaning 10 in base10 and 0x10 meaning 16 in base10.

igormcoelho · 2018-10-02T14:12:32Z

@ixje OK, let's try to clarify everything here :) This is the reference to "0x": http://docs.neo.org/en-us/exchange/v2.7.4.html

> Note that for the hexadecimal string with "0x" prefix, it is processed as big endian; otherwise, it is processed as small endian.

However, I tried to use RPC getrawtransaction with both formats (with and without "0x"), but "0x" does not seem to work in this context, and I never found code in Neo that performs an "if begin == 0x, reverse", so I'm guessing this is just for information/documentation purposes.

UIntBase works as little endian (at least it looks like it), and the ToString method prepends 0x and reverses it (to display as "big endian"). This part is fine to me.
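
A minimal Python sketch of the behavior described here, assuming that `ToArray()` returns the bytes in stored order while `ToString()` reverses them and adds the "0x" prefix. The names `UInt160Sketch` and `parse_hex` are hypothetical; `parse_hex` only follows the rule quoted from the exchange documentation, which may not exist anywhere in Neo's code.

```python
class UInt160Sketch:
    """Toy stand-in for a UIntBase-derived type (not the Neo class)."""

    def __init__(self, data: bytes):
        if len(data) != 20:
            raise ValueError("expected 20 bytes")
        self._data = bytes(data)  # kept in stored (little-endian) order

    def to_array(self) -> bytes:
        # No reversal: bytes come back in storage order.
        return self._data

    def __str__(self) -> str:
        # Reversed for display and prefixed with "0x".
        return "0x" + self._data[::-1].hex()


def parse_hex(text: str) -> bytes:
    """Hypothetical parser: a "0x" prefix marks big-endian text, so reverse it."""
    if text.startswith("0x"):
        return bytes.fromhex(text[2:])[::-1]
    return bytes.fromhex(text)


h = UInt160Sketch(bytes.fromhex("01" + "00" * 19))
print(h.to_array().hex())  # 01 followed by 19 zero bytes
print(h)                   # 0x, then 19 zero bytes followed by 01
print(parse_hex(str(h)) == h.to_array())  # True
```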
One thing that just got me confused, and that I thought I had understood, is related to C# BigInteger. I was pretty sure it worked as big endian. In the Microsoft documentation, they claim it works as little endian, which is weird because all my examples work as big endian: https://docs.microsoft.com/en-us/dotnet/api/system.numerics.biginteger?redirectedfrom=MSDN&view=netframework-4.7.2

But even their example is crazy if it's not big endian:

// The example displays the following output:
//       Positive value: 15,777,216
//       C0 BD F0 00

using System;
using System.Numerics;

public class Test
{
    public static void Main()
    {
        byte[] b = new byte[] { 0x00, 0x01 };
        BigInteger bi = new BigInteger(b);
        Console.Write(bi); // 256.  How can it be little endian??
    }
}

Shouldn't little endian store the last element as the least significant one? Shouldn't the decimal value be 1 in this example?

ixje · 2018-10-02T14:44:06Z

> UIntBase works as little endian (at least it looks like it), and the ToString method prepends 0x and reverses it (to display as "big endian"). This part is fine to me.

Without taking into account what some document says about how they differentiate between little- and big-endian, why would it be fine to change the representation of a bytearray that is internally one endianness and then only when represented as a string becomes another? That makes zero sense in my opinion.

I'm going to comment on BigInteger just once, because it deviates from the topic if you ask me.
I agree with MS that it is little-endian, and that example looks fine to me:

>>> int.from_bytes(bytes.fromhex("C0BDF000"),byteorder='little')
15777216

To quote MS from your link:

> For example, 0xC0 0xBD 0xF0 0xFF is the little-endian hexadecimal representation of either -1,000,000 or 4,293,967,296.

I still agree with them on that:

>>> int.from_bytes(bytes.fromhex("C0BDF0FF"),byteorder='little')
4293967296

I believe this is the crucial part for your BigInteger question:

> If you convert byte arrays to BigInteger values, you must consider the order of bytes. The BigInteger structure expects the individual bytes in a byte array to appear in little-endian order
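
Both readings in the Microsoft quote, and the 256 from the earlier C# snippet, can be reproduced with Python's `int.from_bytes`. This is only a sketch of the byte-order arithmetic; it does not call the .NET BigInteger type, and reading the bytes as two's complement via `signed=True` is the assumption that yields the negative value.

```python
raw = bytes.fromhex("C0BDF0FF")

# Same bytes, little-endian, read as unsigned and as two's-complement signed.
unsigned = int.from_bytes(raw, byteorder="little", signed=False)
signed = int.from_bytes(raw, byteorder="little", signed=True)
print(unsigned)  # 4293967296
print(signed)    # -1000000

# The bytes {0x00, 0x01} from the C# test: the first byte is least significant.
pair = bytes([0x00, 0x01])
little = int.from_bytes(pair, byteorder="little")
big = int.from_bytes(pair, byteorder="big")
print(little)  # 256
print(big)     # 1
```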

localhuman · 2018-10-02T14:55:09Z

I would just like to chime in that, regardless of the script hash discussion, this functionality will be very useful! There have been many times where I wished I could reverse a byte array.

I should note as well that as currently implemented, it will also be able to reverse strings!

igormcoelho · 2018-10-02T14:59:16Z

@ixje thanks for pointing out this Python example... now I'm sure I was understanding everything upside down xD hahahah I'll need some time to recover, and to update all my scripts again ;)

And now that you have explained this to me and I'm convinced BigInteger is little endian, I can give you an answer to:

> Without taking into account what some document says about how they differentiate between little- and big-endian, why would it be fine to change the representation of a bytearray that is internally one endianness and then only when represented as a string becomes another? That makes zero sense in my opinion

I think it makes full sense, because numbers are only represented internally as little endian, but for "public" representation they are seen as big endian (that's why I believe this "0x" prefix is quite important to make sure of the big-endianness in the ToString() method).

@shargon In the end, the "reverse" on Address makes a lot of sense, because the ToAddress function takes a little-endian UInt160 as its parameter:
https://github.com/neo-project/neo/blob/c64748ecbac3baeb8045b16af0d518398a6ced24/neo/Wallets/Helper.cs#L16

Suppose we take an AVM, apply HASH160 (SHA256 + RIPEMD160) and get 0100...0000 (20 bytes), which is the number 1 in UInt160. We present the ScriptHash as 0x0000...0001 (big-endian visual representation on frontends). So, when we want to create an Address in frontend tools, we just ignore this 0x, reverse the rest (getting the original 0100...0000), merge it with the Neo hash code and checksum, and that's it. So, why is this reverse needed? Because the original scripthash number was little-endian: we transformed it to big-endian to display it (in ToString methods), but transformed it back to little-endian to create the Base58 Address. I'm pretty sure now that's correct.
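
The round trip described above can be sketched in Python. This assumes a Neo 2 style address layout: one version byte, the 20-byte script hash in little-endian order, and a 4-byte checksum taken from a double SHA-256, all Base58-encoded. The version byte value 0x17 (taken here to be the "Neo hash code" mentioned above) and the helper names are assumptions, not details from this thread.

```python
import hashlib

B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"
ADDRESS_VERSION = 0x17  # assumed Neo 2 address version byte


def base58_encode(data: bytes) -> str:
    """Plain Base58 encoding (Bitcoin alphabet)."""
    number = int.from_bytes(data, byteorder="big")
    encoded = ""
    while number > 0:
        number, remainder = divmod(number, 58)
        encoded = B58_ALPHABET[remainder] + encoded
    # Each leading zero byte is encoded as a leading '1'.
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * leading_zeros + encoded


def display_to_script_hash(displayed: str) -> bytes:
    """Drop the "0x" prefix and reverse back to little-endian bytes."""
    return bytes.fromhex(displayed[2:])[::-1]


def script_hash_to_address(script_hash: bytes) -> str:
    """Version byte + script hash + 4-byte double-SHA-256 checksum, Base58."""
    payload = bytes([ADDRESS_VERSION]) + script_hash
    checksum = hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4]
    return base58_encode(payload + checksum)


little_endian = bytes.fromhex("01" + "00" * 19)  # UInt160 value 1
displayed = "0x" + little_endian[::-1].hex()     # what frontends show
recovered = display_to_script_hash(displayed)
print(recovered == little_endian)  # True
print(script_hash_to_address(recovered))
```

Displaying and then un-reversing returns the original bytes, so only the stored little-endian form ever reaches the address encoding.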

igormcoelho · 2018-10-03T15:10:58Z

@ixje I hope this documentation helps clarify UInt160 for all of us: neo-project/neo#405

ixje · 2018-10-03T17:35:52Z

@igormcoelho your description only explains that the code is functional. I wasn't doubting that part, but functional != logical. If you look closely at the following text you'll see we do useless back and forth manipulation of the data in the bytearray.

> why is this reverse needed? Because the original scripthash number was little-endian: we transformed it to big-endian to display it (in ToString methods), but transformed it back to little-endian to create the Base58 Address.

If we did not do the two transformations described in the quoted text, we would get the same result. That's the point I'm trying to make. It just adds unnecessary overhead, has no added value, and is actually confusing, as proven by the fact that we're discussing it.

Let me ask you this

Eriks-Air:~ erik$ md5 some_random_file.txt
MD5 (some_random_file.txt) = d41d8cd98f00b204e9800998ecf8427e

Can you tell the endianness of this hash? Do you think a user cares what endianness it is?
I believe the answer is "no" for both questions. Because it actually doesn't matter. The user just sees a bunch of numbers and characters.

How about this:

using System;
using System.Text;
using Neo; // UInt160 comes from the Neo library

public class Program
{
    // Hex-encode the bytes in array order, without any reversal.
    public static string ByteArrayToString(byte[] ba)
    {
        StringBuilder hex = new StringBuilder(ba.Length * 2);
        foreach (byte b in ba)
            hex.AppendFormat("{0:x2}", b);
        return hex.ToString();
    }

    public static void Main(string[] args)
    {
        byte[] internal_data = new byte[] { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01 };
        UInt160 script_hash = new UInt160(internal_data);

        // ToArray() keeps the stored order; ToString() reverses it and adds "0x".
        Console.WriteLine(ByteArrayToString(script_hash.ToArray()));
        Console.WriteLine(script_hash.ToString());

        // Output
        //  0000000000000000000000000000000000000001
        //0x0100000000000000000000000000000000000000
    }
}

Saying the above makes sense is like saying that you can represent the decimal number 5000 as 0x0005 because you can just tell the user that "0x" means you need to reverse number pairs. It seriously can't get any worse than that.

igormcoelho · 2018-10-03T18:46:21Z

No @ixje, that makes sense, it's just that we learned that in a bad way, but people who learn that right now "correctly" will see it actually makes sense.

The point is that a ScriptHash is a UInt160 number on Neo, and its internal mechanics make it work as little-endian. Do you agree on that? For example, ScriptHash UInt160 0100000000000000000000000000000000000000 is in fact the decimal number 1. If you take this UInt160, put it inside a BigInteger, and multiply by two, you need to get decimal 2; if we invert the logic at this point, internal arithmetic will break.
In the real world, our tools are made for big-endian numbers, and that's why ScriptHash is represented as 0x0000000000000000000000000000000000000001. This 0x does not change any hex interpretation, because it assumes the value is big-endian, which is the standard way we usually write numbers. Example: 0x1388 means decimal 5000, but 0x0005 just means decimal 5, as usual. Nothing changed about it.
Note that I don't agree with that (inverting the endianness), because we don't need to operate on big integers in our interfaces, so it is just "a bunch of hex digits", and the most natural way would be to preserve little-endianness, without any reverse.
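
The numbers in this comment can be checked with a short Python sketch. It uses Python integers as a stand-in for C# BigInteger, which is an assumption made for illustration only.

```python
# 0x1388 is ordinary (big-endian) hex notation for decimal 5000.
print(int("1388", 16))  # 5000
print(int("0005", 16))  # 5

# The same value stored as little-endian bytes reads "8813" in array order.
stored = (5000).to_bytes(2, byteorder="little")
print(stored.hex())  # 8813

# UInt160 value 1, stored little-endian: 01 followed by 19 zero bytes.
one = bytes.fromhex("01" + "00" * 19)
value = int.from_bytes(one, byteorder="little")
doubled = (value * 2).to_bytes(20, byteorder="little")
print(value)              # 1
print(doubled.hex()[:4])  # 0200

# Reading the same bytes as big-endian gives a very different number.
misread = int.from_bytes(one, byteorder="big")
print(misread == 1 << 152)  # True
```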

So, answering to your question:

> Can you tell the endianness of this hash? Do you think a user cares what endianness it is?

No, users don't care, and I don't want to care either. I just needed to go deeper to clarify the precise endianness of everything, to be able to explain that no "extra reverse" is being done. In fact, I believe we should start presenting ScriptHash in little-endian, so no reverse will ever be needed. That's what I'm proposing to the neon-js team, and to everyone else who uses this notation.

In my opinion, ScriptHash should never be exposed to users; we have the Base58 Address for that, which is legible and safe. Even Neon Wallet presents the scripthash, which is strange, so I'll open an issue about it. So even when invoking contracts, we should seriously think about adopting Address for this, because it's much safer and 100% clear (with checksum and other protective stuff).

What do we need to change to accomplish that? Nothing. Because Neo was well designed to put this heavenly-sent "0x" in front of big-endian numbers, we can just abolish this 0x on ScriptHash, use little-endian on our frontends, and not worry about doing any reverse. And I'm pretty sure that if C# BigInteger were big-endian we wouldn't be having this conversation, but since we cannot change that, at least we need to know it.

ixje · 2018-10-04T14:11:31Z

@igormcoelho if you want to discuss it further DM me on Discord. We reason from 2 different perspectives. I reason from a generic computer science perspective and point to unexpected behavior/handling in the NEO implementation. You reason from the NEO implementation perspective.

igormcoelho · 2018-12-01T21:33:05Z

This problem is currently solved in C# (in PR neo-project/neo-devpack-dotnet#37): you can use the Reverse() method.
Example:

public static byte[] Main(byte[] v)
{
    return v.Reverse();
}

For this to work, you need the latest C# compiler, after this commit: https://github.com/neo-project/neo-compiler/commit/d6e18710a43b1fbd83452f6a3e3f8e8bb5b7fa98

gsmachado · 2022-03-24T11:30:24Z

Guys, we are in 2022 and big endian and little endian in Neo still confuse me. ❤️

Especially the decision to add "0x" as the prefix to represent big endian, which is not a standard in the software industry....... 🤣 😄

roman-khimov · 2022-03-25T14:49:32Z

Each time we have some issue with endianness I link neo-project/neo#938. The list of links grows with time. But I don't know how we can fix it now; N3 is out and it works the way it works.

erikzhang added the enhancement label Oct 1, 2017

igormcoelho added a commit to igormcoelho/neo-vm that referenced this issue Oct 1, 2018

Applying REVERSE opcode for byte arrays too

b99d242

This is a possible solution to the issue raised at neo-vm page: neo-project#17

This was referenced Oct 1, 2018

Applying REVERSE opcode for byte arrays too #57

Closed

Create function bigEndianToLittleEndian CityOfZion/neon-js#322

Closed

Explaning endianess of ToScriptHash function neo-project/neo-devpack-dotnet#28

Merged

igormcoelho mentioned this issue Oct 3, 2018

Improve PACK/UNPACK opcodes #59

Closed

igormcoelho mentioned this issue Oct 3, 2018

Present Address instead of ScriptHashes on transfer history CityOfZion/neon-wallet#1480

Closed

erikzhang added discussion and removed enhancement labels Oct 8, 2018

igormcoelho mentioned this issue Dec 1, 2018

method to reverse byte[] neo-project/neo-devpack-dotnet#37

Merged

igormcoelho mentioned this issue Dec 3, 2018

Create CONVERT opcode #70

Closed

erikzhang closed this as completed in neo-project/neo-devpack-dotnet#37 Dec 3, 2018
