This repository has been archived by the owner on Nov 22, 2023. It is now read-only.

Reverse a ByteArray #17

Closed
notatestuser opened this issue Sep 25, 2017 · 21 comments

Comments

@notatestuser

Like CAT, SUBSTR and LEFT, it would be nice to have an opcode for reversing a byte array. Useful for script hashes.

@igormcoelho
Contributor

Although I would also like this feature, the use case for script hashes is not clear to me.

@ixje
Contributor

ixje commented Sep 30, 2018

Back when I participated in the CoZ dApp contest, I needed to reverse script_hashes before feeding them to "dynamic APPCALLs", or the contract would not be found (e.g. as a user, I provide at runtime the script_hash of an oracle contract that the smart contract then calls). That always required me to reverse the bytearray. Due to time constraints I never investigated why this was needed, but I'm assuming that's the use case here as well. I'm interested to hear @notatestuser's use case.

@shargon
Member

shargon commented Sep 30, 2018

I think we should remove this requirement to reverse the address in Neo 3.0.

@notatestuser
Author

Thanks @ixje, that's exactly the issue I faced too. It had something to do with the varying endianness of the script hash, which was represented in some places as a 160-bit uint and in others as a byte array.

@igormcoelho
Contributor

igormcoelho commented Sep 30, 2018

@shargon, one thing I've been trying to clarify in my scripts is exactly where things are little-endian and where they are big-endian. One thing still bothers me: I still don't know the precise endianness of the "scripthash" we use. Since all machine processing on NeoVM is big-endian, I believe it should be big-endian inside and little-endian outside; is that not the case?
I finally created a JavaScript library that simulates EXACTLY the C# BigInteger behavior in JavaScript, called csBigInteger.js: https://github.com/igormcoelho/csBigInteger.js
It works as big-endian internally, and using it, I intend to precisely rebuild NeoVM from scratch in JavaScript in the coming days: https://github.com/neoresearch/neovm.js

@igormcoelho
Contributor

Guys, we could reuse the REVERSE opcode, which is currently used only for arrays. I think it makes sense to allow this operation on ByteArrays too.

https://github.com/neo-project/neo-vm/blob/master/src/neo-vm/OpCode.cs#L429

https://github.com/neo-project/neo-vm/blob/master/src/neo-vm/ExecutionEngine.cs#L837-L849
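A toy Python model of this proposal (not NeoVM's actual ExecutionEngine code): a hypothetical `op_reverse` handler that keeps the array case and adds a byte-array case. It assumes the existing array case reverses the popped array in place and pushes nothing back; because a byte array is immutable, the new branch has to push a reversed copy instead.

```python
from typing import List, Union

# Toy stack item types: a mutable list models a VM Array (a shared reference),
# immutable bytes model a ByteArray.
StackItem = Union[list, bytes]


def op_reverse(stack: List[StackItem]) -> None:
    """Hypothetical REVERSE handler that also accepts byte arrays."""
    item = stack.pop()
    if isinstance(item, list):
        # Assumed existing behaviour: reverse the shared array in place,
        # push nothing back (other references see the change).
        item.reverse()
    elif isinstance(item, bytes):
        # Byte arrays cannot be mutated, so push a reversed copy.
        stack.append(item[::-1])
    else:
        raise TypeError("REVERSE expects an array or a byte array")


arr = [1, 2, 3]
stack: List[StackItem] = [arr]
op_reverse(stack)          # arr is now [3, 2, 1]; stack is empty

stack = [b"\x01\x02\x03"]
op_reverse(stack)          # stack is now [b"\x03\x02\x01"]
```

The split matters for contract authors: the array form leaves nothing on the stack, while the byte-array form would leave its result there.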

@shargon
Member

shargon commented Oct 1, 2018

@erikzhang what is the reason for reversing the addresses?

@ixje
Contributor

ixje commented Oct 1, 2018

Since we're in question mode on reversing data: why do we reverse the data of UInts in the string representation (and not in ToArray())? See https://github.com/neo-project/neo/blob/9e3c08f4f411e0a9557a96de6def9bcef1ee6c3f/neo/UIntBase.cs#L76-L79

@igormcoelho
Contributor

igormcoelho commented Oct 1, 2018

In fact this last one is easier, @ixje: Neo represents big-endian bytearrays with "0x" in front... that's in the website documentation. So I guess it's little endian becoming big endian. And I think that also solves my eternal mystery, because the Neo compiler puts 0x in front of the scripthash, meaning it's big endian (that explains the reverse on Address, @shargon).

@ixje
Copy link
Contributor

ixje commented Oct 2, 2018

@igormcoelho

Neo represents big endian bytearrays with "0x" on front... that's on website documentation. So I guess it's a little endian becoming big endian.

This is how they present the data; I'm interested in why it was chosen to represent it like that. It even conflicts with the documentation, which states:

All integer types of NEO are Little Endian except for IP address and port number, these 2 are Big Endian.

Anything deriving from UIntBase is clearly an integer type, so my question is still "why does this reversing happen?"

Then, to elaborate on 0x as a way of saying big endian (btw, can you link where that's in the documentation?): I think that's a really misleading approach, as 0x is just a prefix indicating the number is hexadecimal/base16. It's intended to let you differentiate between 10 meaning 10 in base10 and 0x10 meaning 16 in base10.

@igormcoelho
Contributor

igormcoelho commented Oct 2, 2018

@ixje Ok, let's try to clarify everything here :) This is the reference to "0x": http://docs.neo.org/en-us/exchange/v2.7.4.html

Note that for the hexadecimal string with "0x" prefix, it is processed as big endian; otherwise, it is processed as small endian.

However, I tried RPC getrawtransaction with both formats (with and without "0x"), and "0x" does not seem to work in that context. I also never found code in Neo that does "if it begins with 0x, reverse", so I'm guessing this is just for information/documentation purposes.
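The documented rule quoted above can be written as a small parser. This is only a sketch of the documentation's convention: `parse_script_hash` is a hypothetical helper, not part of Neo, and (as noted) no such check was found in Neo's own code.

```python
def parse_script_hash(text: str) -> bytes:
    """Return a 20-byte script hash in internal (little-endian) byte order.

    Hypothetical helper following the documented convention: a "0x"-prefixed
    string is big-endian and gets reversed; an unprefixed one is taken as-is.
    """
    if text.startswith("0x"):
        data = bytes.fromhex(text[2:])[::-1]  # big-endian display -> internal
    else:
        data = bytes.fromhex(text)            # already little-endian
    if len(data) != 20:
        raise ValueError("a script hash must be exactly 20 bytes")
    return data


# Both spellings describe the same internal bytes.
a = parse_script_hash("0x" + "00" * 19 + "01")
b = parse_script_hash("01" + "00" * 19)
```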

UIntBase works as little endian (at least it looks like), and the ToString method appends 0x and reverses it (to display as "big endian"). This part is fine to me.
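The ToArray()/ToString() behaviour described here can be modelled in a few lines. `UInt160Model` is a hypothetical name, and this mirrors only the behaviour described in this thread, not Neo's C# source: the stored bytes come back unchanged, while the string form reverses them and prepends "0x".

```python
class UInt160Model:
    """Hypothetical stand-in for UIntBase: bytes are stored little-endian."""

    def __init__(self, value: bytes):
        if len(value) != 20:
            raise ValueError("UInt160 needs exactly 20 bytes")
        self._data = bytes(value)

    def to_array(self) -> bytes:
        # Like ToArray(): the internal byte order, unchanged.
        return self._data

    def __str__(self) -> str:
        # Like ToString(): reverse the bytes, then prefix "0x".
        return "0x" + self._data[::-1].hex()


h = UInt160Model(b"\x01" + b"\x00" * 19)
print(h.to_array().hex())  # 0100000000000000000000000000000000000000
print(h)                   # 0x0000000000000000000000000000000000000001
```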
One thing that just confused me, and that I thought I had understood, is C# BigInteger. I was pretty sure it worked as big endian. Microsoft's documentation claims it works as little endian, which is weird because all my examples work as big endian: https://docs.microsoft.com/en-us/dotnet/api/system.numerics.biginteger?redirectedfrom=MSDN&view=netframework-4.7.2

But even their example is crazy if it's not big endian:

// Microsoft's documentation example displays the following output:
//       Positive value: 15,777,216
//       C0 BD F0 00
using System;
using System.Numerics;

public class Test {
    public static void Main() {
        byte[] b = new byte[] { 0x00, 0x01 };
        BigInteger bi = new BigInteger(b);
        Console.Write(bi); // 256.  How can it be little endian??
    }
}

Shouldn't little endian store the last elements as the smallest? Isn't the decimal value 01 in this example?

@ixje
Contributor

ixje commented Oct 2, 2018

UIntBase works as little endian (at least it looks like), and the ToString method appends 0x and reverses it (to display as "big endian"). This part is fine to me.

Without taking into account what some document says how they differentiate between little- and big-endian, why would it be fine to change the representation of a bytearray that is internally one endianness and then only when represented as a string becomes another? That makes zero sense in my opinion.

I'm going to comment on BigInteger just once, because it deviates from the topic if you ask me.
I agree with MS that it is little-endian, and that example looks fine to me:

>>> int.from_bytes(bytes.fromhex("C0BDF000"),byteorder='little')
15777216

To quote MS from your link:

For example, 0xC0 0xBD 0xF0 0xFF is the little-endian hexadecimal representation of either -1,000,000 or 4,293,967,296.

I still agree with them on that:

>>> int.from_bytes(bytes.fromhex("C0BDF0FF"),byteorder='little')
4293967296

I believe this is the crucial part for your BigInteger question:

if you convert byte arrays to BigInteger values, you must consider the order of bytes. The BigInteger structure expects the individual bytes in a byte array to appear in little-endian order
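To tie this back to the `{0x00, 0x01}` example earlier in the thread: in little-endian order the byte at index 0 is the least significant, so that array is 0x00 + 0x01 × 256 = 256, exactly what the C# snippet printed. A minimal Python sketch, assuming (as the quoted documentation says) that C#'s `BigInteger(byte[])` constructor reads little-endian two's complement; `biginteger_from_bytes` is a hypothetical name.

```python
def biginteger_from_bytes(b: bytes) -> int:
    """Mimic C#'s BigInteger(byte[]): little-endian, two's complement.

    Index 0 is the least significant byte; the top bit of the last byte
    is the sign bit.
    """
    return int.from_bytes(b, byteorder="little", signed=True)


print(biginteger_from_bytes(bytes([0x00, 0x01])))          # 256
print(biginteger_from_bytes(bytes.fromhex("C0BDF000")))     # 15777216
print(biginteger_from_bytes(bytes.fromhex("C0BDF0FF")))     # -1000000
print(int.from_bytes(bytes.fromhex("C0BDF0FF"), "little"))  # 4293967296
```

The last two lines show the two readings Microsoft mentions for `C0 BD F0 FF`: signed gives -1,000,000 and unsigned gives 4,293,967,296.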

@localhuman
Contributor

I would just like to chime in that regardless of the script hash discussion this functionality will be very useful! There's been many times where I wished I could reverse a byte array.

I should note as well that as currently implemented, it will also be able to reverse strings!

@igormcoelho
Contributor

igormcoelho commented Oct 2, 2018

@ixje thanks for pointing out this Python example... now I'm sure I was understanding everything upside down xD hahahah. I'll need some time to recover and to update all my scripts again ;)

And now that you've explained this to me and I'm convinced BigInteger is little endian, here is my answer to:

Without taking into account what some document says how they differentiate between little- and big-endian, why would it be fine to change the representation of a bytearray that is internally one endianness and then only when represented as a string becomes another? That makes zero sense in my opinion

I think it makes complete sense, because numbers are only represented internally as little endian, but in their "public" representation they are shown as big endian (that's why I believe this "0x" prefix is quite important to make the big endianness of the ToString() output explicit).

@shargon In the end, the "reverse" on Address makes a lot of sense, because the ToAddress function takes as parameter a little-endian UInt160:
https://github.com/neo-project/neo/blob/c64748ecbac3baeb8045b16af0d518398a6ced24/neo/Wallets/Helper.cs#L16

Suppose we take an AVM, apply HASH160 (SHA256 + RIPEMD160), and get 0100...0000 (20 bytes), which is the number 1 as a UInt160. We present the ScriptHash as 0x0000...0001 (a big-endian visual representation on frontends). So, when we want to create an Address in frontend tools, we ignore the 0x, reverse the rest (getting back the original 010000...), prepend the address version byte, append the checksum, and that's it. So, why is this reverse needed? Because the original scripthash number was little-endian, we transformed it to big-endian to display it (in ToString methods), but transformed it back to little-endian to create the Base58 Address. I'm pretty sure now that's correct.
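The address steps just described can be sketched in Python, starting from the displayed "0x..." script hash (the HASH160 step is skipped). This assumes Neo 2.x conventions: an address version byte of 0x17 and Base58Check with a 4-byte double-SHA256 checksum. `address_from_display` and `base58check_encode` are hypothetical helpers, not Neo APIs.

```python
import hashlib

# Bitcoin-style Base58 alphabet (no 0, O, I, l).
ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"
# Assumption: the Neo 2.x default address version byte.
ADDRESS_VERSION = 0x17


def base58check_encode(payload: bytes) -> str:
    """Base58-encode payload + first 4 bytes of SHA256(SHA256(payload))."""
    checksum = hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4]
    data = payload + checksum
    n = int.from_bytes(data, "big")
    out = ""
    while n > 0:
        n, rem = divmod(n, 58)
        out = ALPHABET[rem] + out
    # Each leading zero byte is written as a literal "1".
    zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * zeros + out


def address_from_display(script_hash: str) -> str:
    """Hypothetical helper: '0x...' big-endian display string -> address."""
    if not script_hash.startswith("0x"):
        raise ValueError("expected a 0x-prefixed (big-endian) script hash")
    internal = bytes.fromhex(script_hash[2:])[::-1]  # undo the display reversal
    if len(internal) != 20:
        raise ValueError("a script hash must be exactly 20 bytes")
    return base58check_encode(bytes([ADDRESS_VERSION]) + internal)


print(address_from_display("0x" + "00" * 19 + "01"))
```

The `[::-1]` in `address_from_display` is the only reversal: it undoes the one applied for display, so the address encodes the original little-endian bytes.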

@igormcoelho
Contributor

@ixje I hope this documentation helps clarifying UInt160 for all of us: neo-project/neo#405

@ixje
Contributor

ixje commented Oct 3, 2018

@igormcoelho your description only explains that the code is functional. I wasn't doubting that part, but functional != logical. If you look closely at the following text you'll see we do useless back and forth manipulation of the data in the bytearray.

why this reverse is needed? Because the original scripthash number was little-endian, we transformed it to big-endian to display it (on ToString methods), but transformed back to little-endian to create Base58 Address.

If we skipped the two conversions in that quote (to big-endian for display, then back to little-endian for the address), we would get the same result. That's the point I'm trying to make. It just adds unnecessary overhead, has no added value, and is actually confusing, as proven by the fact that we're discussing it.

Let me ask you this

Eriks-Air:~ erik$ md5 some_random_file.txt
MD5 (some_random_file.txt) = d41d8cd98f00b204e9800998ecf8427e

Can you tell the endianness of this hash? Do you think a user cares what endianness it is?
I believe the answer is "no" for both questions. Because it actually doesn't matter. The user just sees a bunch of numbers and characters.

How about this:

using System;
using System.Text;
using Neo;

public static class Program
{
    // Formats bytes as lowercase hex in their stored order (no reversal).
    public static string ByteArrayToString(byte[] ba)
    {
        StringBuilder hex = new StringBuilder(ba.Length * 2);
        foreach (byte b in ba)
            hex.AppendFormat("{0:x2}", b);
        return hex.ToString();
    }

    public static void Main(string[] args)
    {
        byte[] internal_data = new byte[] { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01 };
        UInt160 script_hash = new UInt160(internal_data);

        Console.WriteLine(ByteArrayToString(script_hash.ToArray())); // stored order
        Console.WriteLine(script_hash.ToString());                   // reversed, "0x"-prefixed

        // Output
        //  0000000000000000000000000000000000000001
        //0x0100000000000000000000000000000000000000
    }
}

Saying the above makes sense is like saying that you can represent the decimal number 5000 as 0x0005 because you can just tell the user that "0x" means you need to reverse number pairs. It seriously can't get any worse than that.

@igormcoelho
Contributor

igormcoelho commented Oct 3, 2018

No @ixje, it makes sense; it's just that we learned it in a bad way, but people who learn it "correctly" now will see that it actually makes sense.

  1. The point is that a ScriptHash is a UInt160 number in Neo, and its internal mechanics make it work as little-endian; do you agree? For example, the ScriptHash UInt160 0100000000000000000000000000000000000000 is in fact the decimal number 1. If you take this UInt160, put it inside a BigInteger, and multiply by two, you should get decimal 2; if we inverted the logic at this point, internal arithmetic would break.
  2. In the real world, our tools are made for big-endian numbers; that's why the ScriptHash is represented as 0x0000000000000000000000000000000000000001. This 0x does not change how the hex is interpreted: it assumes the value is big-endian, which is the usual convention. Example: 0x1388 means decimal 5000, but 0x0005 just means decimal 5, as usual. Nothing changes there.
    Note that I don't agree with that (inverting the endianness), because we don't need to operate on BigIntegers in our interfaces, so it is just "a bunch of hex digits", and the most natural way would have been to preserve little-endianness, without any reverse.
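Both points can be checked numerically. A short Python sketch, assuming a script hash is simply 20 bytes read as an unsigned little-endian integer, as described in point 1:

```python
# Internal (little-endian) bytes of the script hash from point 1.
internal = bytes.fromhex("01" + "00" * 19)

value = int.from_bytes(internal, "little")    # 1
doubled = (value * 2).to_bytes(20, "little")  # arithmetic stays consistent
display = "0x" + doubled[::-1].hex()          # big-endian display: 0x00...02

# Reading the same bytes as big-endian gives a very different number.
wrong = int.from_bytes(internal, "big")       # 2**152

# Point 2: "0x" only marks base 16, read most significant digit first.
print(int("0x1388", 16), int("0x0005", 16))   # 5000 5
```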

So, answering to your question:

Can you tell the endianness of this hash? Do you think a user cares what endianness it is?

No, users don't care, and I don't want to care either. I just needed to dig deeper to clarify the precise endianness of everything, so I could explain that no "extra reverse" is being done. In fact, I believe we should start presenting ScriptHash in little-endian, so no reverse will ever be needed. That's what I'm proposing to the neon-js team, and to everyone else who uses this notation.

In my opinion, ScriptHash should never be exposed to users; we have the Base58 Address for that, which is legible and safe. Even Neon Wallet presents the scripthash, which is strange; I'll open an issue about it. So even when invoking contracts, we should seriously think about adopting Address, because it's much safer and 100% clear (with a checksum and other protective features).

What do we need to change to accomplish that? Nothing. Because Neo was well designed to put this heaven-sent "0x" in front of big-endian numbers, if we simply drop the 0x on ScriptHash, we can use little-endian in our frontends and never worry about doing any reverse. And I'm pretty sure that if C# BigInteger were big-endian we wouldn't be having this conversation, but since we cannot change that, at least we need to know it.

@ixje
Contributor

ixje commented Oct 4, 2018

@igormcoelho if you want to discuss it further DM me on Discord. We reason from 2 different perspectives. I reason from a generic computer science perspective and point to unexpected behavior/handling in the NEO implementation. You reason from the NEO implementation perspective.

@igormcoelho
Contributor

igormcoelho commented Dec 1, 2018

This problem is now solved in C# (in PR neo-project/neo-devpack-dotnet#37): you can use the Reverse() method.
Example:

public static byte[] Main(byte[] v)
{
    return v.Reverse();
}

In order to work, you need to have latest C# compiler, after this commit: https://github.com/neo-project/neo-compiler/commit/d6e18710a43b1fbd83452f6a3e3f8e8bb5b7fa98

@gsmachado

gsmachado commented Mar 24, 2022

Guys, we are in 2022 and big endian and little endian in Neo still confuses me. ❤️

Especially the decision to add "0x" as the prefix to represent big endian, which is not a standard in the software industry....... 🤣 😄

@roman-khimov
Contributor

Each time we have an issue with endianness I link neo-project/neo#938. The list of links grows over time. But I don't know how we can fix it now; N3 is out and it works the way it works.


8 participants