Add several varint prefix codes + lengths for Holochain IDs #111

Merged
merged 11 commits into from Feb 12, 2019

pjkundert
Contributor

@pjkundert pjkundert commented Jan 16, 2019

Holochain is an open source framework for building fully distributed, peer-to-peer applications. See https://holo.host for more background.

We will be launching in the next few months, and would like our Holochain node Addresses to have the 0x86 prefix, if possible. That, along with our 70-byte-long core keys+parity data, will result in a multicodec prefix of 0x8646. When base-64 encoded, this yields Addresses with prefix "hkZ...", which helps our users quickly identify potential Holochain Addresses.

Thanks for your consideration!

@pjkundert
Contributor Author

Just a quick query to see if there is anything we can do to help this move ahead... Do you have any recommendations?

Thanks!

@vmx
Member

vmx commented Jan 25, 2019

It would be great if you could pick a number that is > 0x80. We'd like to keep some numbers that can be encoded into a single-byte varint for widely used codecs.

@pjkundert
Contributor Author

Thanks @vmx; I've changed the multicodec prefix from 0x1E to 0x86, which will also work for us.

@pjkundert pjkundert changed the title Add the 0x1e prefix code for Holochain Addresses Add the 0x86 prefix code for Holochain Addresses Jan 25, 2019
@neonphog

fyi - here is a test suite exercising this multihash format, including reed-solomon ecc: https://github.com/holochain/n3h/blob/master/packages/hc-dpki/lib/util.test.js

An example agent address:

hkZsq1BnQrinojTyYwWD3uhxVzJrz13sFl6Y3AAn5pG0j1Ghj8Kye5aNX5-7THvIoC76armV_cyNSSeffI0O6gp8WYxE1vI-

@pjkundert
Contributor Author

pjkundert commented Jan 31, 2019

Hi, @Stebalien, @vmx;

Thanks for considering this addition; we really appreciate your time and effort on this project. Any more recommendations for us, to make this easier for you to merge?

Member

@Stebalien Stebalien left a comment

LGTM. Given that this is now in the two-byte range, I don't see any reason to debate this.

@Stebalien
Member

@neonphog Is this really a multihash format?

@Stebalien
Member

Stebalien commented Jan 31, 2019

> We will be launching in the next few months, and would like our Holochain node Addresses to have the 0x86 prefix, if possible. That, along with our 70-byte long core keys+parity data will result in a multicodec prefix of 0x8646

Note: Prefixes are usually encoded using base-128 varints. That way one prefix won't be a prefix of another prefix. 0x86 is going to encode to [0x86, 0x01], or hgE= in base64 (which, IIRC, will change depending on the following byte).

FYI, you can get base64 encoded prefix of "holo" if you use 0x0d0243. This will encode to a 3 byte varint: [0x86, 0x89, 0x68]. This should also be stable because base64 encodes every 3 byte string to a 4 byte string.
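
To illustrate the two points above, here is a minimal sketch using only the Go standard library. The helper names `UvarintBytes` and `PrefixedBase64` are hypothetical and the payload bytes are made up; the 3-byte prefix is taken as the raw bytes [0x86, 0x89, 0x68] rather than derived from a code. It shows that a 2-byte varint prefix shares a base64 group with the byte that follows it, while a 3-byte prefix fills a whole 4-character group by itself.

```go
package main

import (
	"encoding/base64"
	"encoding/binary"
	"fmt"
)

// UvarintBytes encodes code as an unsigned base-128 varint.
func UvarintBytes(code uint64) []byte {
	buf := make([]byte, binary.MaxVarintLen64)
	n := binary.PutUvarint(buf, code)
	return buf[:n]
}

// PrefixedBase64 returns the standard base64 encoding of prefix followed by payload.
func PrefixedBase64(prefix, payload []byte) string {
	buf := append(append([]byte{}, prefix...), payload...)
	return base64.StdEncoding.EncodeToString(buf)
}

func main() {
	two := UvarintBytes(0x86)         // two varint bytes: 0x86, 0x01
	three := []byte{0x86, 0x89, 0x68} // the three bytes quoted above

	// Vary the first payload byte (arbitrary values) and print the first
	// four base64 characters for each prefix.
	for _, next := range []byte{0x00, 0x40, 0xff} {
		payload := []byte{next, 0x00, 0x00}
		fmt.Printf("next=0x%02x  2-byte prefix: %s  3-byte prefix: %s\n",
			next,
			PrefixedBase64(two, payload)[:4],
			PrefixedBase64(three, payload)[:4])
	}
}
```

The third and fourth characters of the 2-byte case move with the payload, so only the first two characters are a dependable vanity prefix there.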

@Stebalien
Member

You may also want to consider using multibase to allow alternative encodings. However, that may cause some issues with vanity prefixes.

@neonphog

@Stebalien

> @neonphog Is this really a multihash format?

I see what you are saying. You are correct, this is not a hash. It is an identifier comprising two 256-bit (32-byte) public keys and 48 bits (6 bytes) of Reed-Solomon parity data, 70 bytes in total.

But these IDs will be used to address agent entries in a content addressable store that also includes data entries that will be keyed using sha256 hashes. It will be useful for us to easily identify the "hash" type.

@neonphog

> FYI, you can get base64 encoded prefix of "holo" if you use 0x0d0243

Thanks for the suggestion! Discussing this now 👍

@pjkundert
Contributor Author

@Stebalien We are having trouble confirming that 0x0d0243 converts into a varint as [0x86, 0x89, 0x68]; can you recommend a varint implementation (e.g. in Python, Go, Rust, JavaScript, ...) that we can confirm that with?

Also; does this mean that we would actually reserve 0x0d0243 in table.csv?

@Stebalien
Member

Thanks for checking that! I was encoding it as a signed varint, not an unsigned varint. The code should have been 0x1a0486, which encodes to [0x86, 0x89, 0x68].

Code:

package main

import (
	"encoding/base64"
	"encoding/binary"
	"fmt"
)

// CodeToBase64 encodes code as an unsigned varint and returns its base64 form.
func CodeToBase64(code uint64) string {
	buf := make([]byte, binary.MaxVarintLen64)
	n := binary.PutUvarint(buf, code)
	return base64.StdEncoding.EncodeToString(buf[:n])
}

// Base64ToCode decodes s and parses the bytes as a single unsigned varint.
func Base64ToCode(s string) (uint64, error) {
	buf, err := base64.StdEncoding.DecodeString(s)
	if err != nil {
		return 0, err
	}
	code, n := binary.Uvarint(buf)
	if n == 0 {
		// binary.Uvarint returns 0 when the buffer ends mid-varint.
		return 0, fmt.Errorf("truncated varint")
	} else if n < 0 {
		// A negative count signals a value that overflows 64 bits.
		return 0, fmt.Errorf("code too large")
	} else if n != len(buf) {
		return 0, fmt.Errorf("invalid varint: trailing bytes")
	}
	return code, nil
}

func main() {
	fmt.Println(CodeToBase64(0x1a0486))
	fmt.Println(Base64ToCode("holo"))
	fmt.Println(base64.StdEncoding.DecodeString("holo"))
}

> Also; does this mean that we would actually reserve 0x0d0243 in table.csv?

Yes. Well, 0x1a0486.
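
For anyone retracing the mix-up, a small sketch of the signed/unsigned difference, using only Go's `encoding/binary`. The helper names `SignedVarintHex` and `UnsignedVarintHex` are hypothetical. Go's signed encoder applies zigzag coding, which doubles a non-negative value before writing it, so the signed encoding of 0x0d0243 yields the same bytes as the unsigned encoding of 0x1a0486.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// SignedVarintHex returns the hex form of Go's signed (zigzag) varint encoding of v.
func SignedVarintHex(v int64) string {
	buf := make([]byte, binary.MaxVarintLen64)
	n := binary.PutVarint(buf, v)
	return fmt.Sprintf("%x", buf[:n])
}

// UnsignedVarintHex returns the hex form of the unsigned varint encoding of v.
func UnsignedVarintHex(v uint64) string {
	buf := make([]byte, binary.MaxVarintLen64)
	n := binary.PutUvarint(buf, v)
	return fmt.Sprintf("%x", buf[:n])
}

func main() {
	fmt.Println(SignedVarintHex(0x0d0243))   // signed: zigzag doubles the value first
	fmt.Println(UnsignedVarintHex(0x1a0486)) // unsigned: same bytes as the line above
	fmt.Println(UnsignedVarintHex(0x0d0243)) // unsigned: different bytes
}
```

Since multicodec prefixes are unsigned varints, the value to register is the one whose unsigned encoding gives the desired bytes.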

@pjkundert pjkundert changed the title Add the 0x86 prefix code for Holochain Addresses Add several varint prefix codes + lengths for Holochain IDs Feb 4, 2019
@pjkundert
Contributor Author

Sorry, don't merge yet -- still not correct!

@vmx
Member

vmx commented Feb 4, 2019

@pjkundert As you are playing around with getting a nice prefix, I'd just like to mention that IPFS is moving towards base32 encoding things (ipfs/kubo#4143), you might want to take that into account.

@pjkundert
Contributor Author

Yes, base-32 is nice. It also lets us add Reed-Solomon error detection/correction using 5-bit symbols directly to the end of the base-32 hash (instead of to the original payload). That provides greater error/erasure detection and correction power per R-S symbol, because erroneous input symbols are not blurred across multiple Reed-Solomon symbols. Base64 URL encoding would also allow this, with 6-bit R-S symbols, but the inclusion of -_ in the symbol set was just too hard to handle for keys that users are going to occasionally copy and paste.

Our Holochain keys are already very large (102 Base58 symbols), and further expansion with a less powerful encoding would make them just too large.

@pjkundert
Contributor Author

Sheesh! No joy. Do not merge. This approach won't work for us, either. Base58 propagates errors into all subsequent octets output during a decode -- error correction built into the underlying payload is worthless...
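
A rough sketch of why this happens, assuming the Bitcoin-style Base58 alphabet and a deliberately simplified codec (no leading-zero handling); none of this is Holochain's actual code, and all function names are hypothetical. Base58 treats the payload as one big integer, so a single wrong symbol changes the value by a multiple of a power of 58 and can disturb many decoded bytes. A base-32 symbol carries exactly 5 bits, so one wrong symbol touches at most two bytes.

```go
package main

import (
	"encoding/base32"
	"fmt"
	"math/big"
	"strings"
)

const (
	b58Alphabet = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz" // assumed alphabet
	b32Alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"
)

var b32 = base32.StdEncoding.WithPadding(base32.NoPadding)

// B58Encode writes data, read as one big-endian integer, in base 58.
func B58Encode(data []byte) string {
	n := new(big.Int).SetBytes(data)
	radix, rem := big.NewInt(58), new(big.Int)
	var out []byte
	for n.Sign() > 0 {
		n.DivMod(n, radix, rem)
		out = append(out, b58Alphabet[rem.Int64()])
	}
	for i, j := 0, len(out)-1; i < j; i, j = i+1, j-1 {
		out[i], out[j] = out[j], out[i] // digits were produced least significant first
	}
	return string(out)
}

// B58Decode parses s and returns the value left-padded to width bytes.
func B58Decode(s string, width int) []byte {
	n, radix := new(big.Int), big.NewInt(58)
	for i := 0; i < len(s); i++ {
		n.Mul(n, radix)
		n.Add(n, big.NewInt(int64(strings.IndexByte(b58Alphabet, s[i]))))
	}
	return n.FillBytes(make([]byte, width))
}

// Corrupt replaces the symbol at index i with a different symbol from alphabet.
func Corrupt(s, alphabet string, i int) string {
	repl := alphabet[0]
	if s[i] == repl {
		repl = alphabet[1]
	}
	return s[:i] + string(repl) + s[i+1:]
}

// CountDiff counts the positions at which two equal-length slices differ.
func CountDiff(a, b []byte) int {
	d := 0
	for i := range a {
		if a[i] != b[i] {
			d++
		}
	}
	return d
}

// DamagedBytes58 reports how many decoded bytes change when Base58 symbol i is wrong.
// One spare byte of width leaves room for a corrupted value that grows.
func DamagedBytes58(data []byte, i int) int {
	s, w := B58Encode(data), len(data)+1
	return CountDiff(B58Decode(s, w), B58Decode(Corrupt(s, b58Alphabet, i), w))
}

// DamagedBytes32 reports how many decoded bytes change when base-32 symbol i is wrong.
func DamagedBytes32(data []byte, i int) int {
	s := b32.EncodeToString(data)
	good, _ := b32.DecodeString(s)
	bad, _ := b32.DecodeString(Corrupt(s, b32Alphabet, i))
	return CountDiff(good, bad)
}

func main() {
	data := make([]byte, 32) // arbitrary demo payload
	for i := range data {
		data[i] = byte(i*7 + 3)
	}
	for _, i := range []int{0, 10, 20} {
		fmt.Printf("symbol %2d wrong: base58 damages %d bytes, base32 damages %d\n",
			i, DamagedBytes58(data, i), DamagedBytes32(data, i))
	}
}
```

With damage this widespread, parity computed over the decoded payload has far more corrupted bytes to repair than the single mistyped symbol would suggest.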

@pjkundert
Contributor Author

pjkundert commented Feb 7, 2019

At long last... We've validated the round-tripping of these Varint prefixes with our encoding, and everything is good to merge! Thanks so much for your patience...

@vmx We did end up settling on Base-32 with 63-symbol encoded values, so that we could use our values as DNS labels.

@Stebalien
Member

I'll leave this open for a few days in case someone else wants to chime in. Please bug me if I don't merge it within a week.

@Stebalien Stebalien merged commit 1c8ef85 into multiformats:master Feb 12, 2019
@pjkundert pjkundert deleted the feature-holochain-address branch February 12, 2019 02:05
@pjkundert
Contributor Author

Thanks!

4 participants