Compress public key for chat #1937

Closed · hesterbruikman opened this issue Apr 14, 2020 · 50 comments

hesterbruikman (Contributor) commented Apr 14, 2020

Problem

The chat key is currently shown (and used) as the full public key. This is unnecessarily daunting. Compressing the public key used for chat makes it less daunting and more space efficient in the UI, and reduces the file size and visual complexity of the QR code.

Implementation

Implement two functions:

  • decompress public key - anywhere the full public key is required
  • request compressed key - places where the key is shown or shared in the UI

Use the 02 and 03 prefixes for the compressed key and keep the 04 prefix for the decompressed key.

status-react should always receive the decompressed key (no changes to current implementation) unless the compressed key is requested. Changes for this will be documented in a separate issue and include:

  • Profile > Share icon
  • Invite friends url from Invite friends button and + -button > Invite friends
  • Other user Profile details and Profile details > Share icon

Acceptance Criteria

Documentation is updated if needed: https://specs.status.im/spec/2

Notes

for guidance:
https://stackoverflow.com/questions/43629265/deriving-an-ecdsa-uncompressed-public-key-from-a-compressed-one
https://bitcoin.stackexchange.com/questions/69315/how-are-compressed-pubkeys-generated
https://crypto.stackexchange.com/questions/8914/ecdsa-compressed-public-key-point-back-to-uncompressed-public-key-point

Future Steps

Implementation in status-react to request compressed key

gravityblast (Member) commented Apr 14, 2020

for reference, since we use go-ethereum we can use its functions to compress and decompress keys:

https://github.com/ethereum/go-ethereum/blob/master/crypto/secp256k1/secp256.go#L126

Samyoul (Member) commented Apr 28, 2020

Some Code

A little update on this. I've spent a little bit of time experimenting with this and here is the play code I wrote:

// imports added for completeness; the base58 and spew packages are assumed to be
// github.com/btcsuite/btcutil/base58 and github.com/davecgh/go-spew/spew
import (
	"crypto/elliptic"
	"encoding/base32"
	"encoding/base64"
	"encoding/hex"
	"testing"

	"github.com/btcsuite/btcutil/base58"
	"github.com/davecgh/go-spew/spew"
	"github.com/ethereum/go-ethereum/crypto/secp256k1"
)

func Test(t *testing.T) {
	pk := "04261c55675e55ff25edb50b345cfb3a3f35f60712d251cbaaab97bd50054c6ebc3cd4e22200c68daf7493e1f8da6a190a68a671e2d3977809612424c7c3888bc6"

	// decode hex string to bytes
	pkb, err := hex.DecodeString(pk)
	if err != nil {
		t.Error(err)
	}

	// encode bytes into various base encodings
	b32pk := base32.StdEncoding.EncodeToString(pkb)
	b58pk := base58.Encode(pkb)
	b64pk := base64.StdEncoding.EncodeToString(pkb)

	// create crypto public key from decoded bytes
	x, y := elliptic.Unmarshal(secp256k1.S256(), pkb)

	// compress public key
	cpkb := secp256k1.CompressPubkey(x, y)

	// encode compressed key to hex string
	cpk := hex.EncodeToString(cpkb)

	// encode compressed bytes into various base encodings
	b32cpk := base32.StdEncoding.EncodeToString(cpkb)
	b58cpk := base58.Encode(cpkb)
	b64cpk := base64.StdEncoding.EncodeToString(cpkb)

	// Dump values
	spew.Dump(
		pk,
		b32pk,
		b58pk,
		b64pk,

		cpk,
		b32cpk,
		b58cpk,
		b64cpk,
	)

	// decode base58 compressed public key
	b58cpkb := base58.Decode(b58cpk)

	// decompress the decoded base58 compressed public key
	dx, dy := secp256k1.DecompressPubkey(b58cpkb)

	// marshal the decompressed public key
	mpk := elliptic.Marshal(secp256k1.S256(), dx, dy)

	// hex encode the marshalled decompressed public key
	dpk := hex.EncodeToString(mpk)

	spew.Dump(dpk)
}

The Results

Length | String | Description
130 | 04261c55675e55ff25edb50b345cfb3a3f35f60712d251cbaaab97bd50054c6ebc3cd4e22200c68daf7493e1f8da6a190a68a671e2d3977809612424c7c3888bc6 | My starting point, using my public key
104 | AQTBYVLHLZK76JPNWUFTIXH3HI7TL5QHCLJFDS5KVOL32UAFJRXLYPGU4IRABRUNV52JHYPY3JVBSCTIUZY6FU4XPAEWCJBEY7BYRC6G | base32 encoded public key
88 | NEdXDe56CBU5hkxoCd2cnBZytiE5t2QrVT4YnUvcG4E6UABMr95f5ugLEKbmSv2ayspqb9mk3U2u3kceyRoFhMK3 | base58 encoded public key
88 | BCYcVWdeVf8l7bULNFz7Oj819gcS0lHLqquXvVAFTG68PNTiIgDGja90k+H42moZCmimceLTl3gJYSQkx8OIi8Y | base64 encoded public key
66 | 02261c55675e55ff25edb50b345cfb3a3f35f60712d251cbaaab97bd50054c6ebc | hex encoded compressed public key
56 | AITBYVLHLZK76JPNWUFTIXH3HI7TL5QHCLJFDS5KVOL32UAFJRXLY=== | base32 encoded compressed public key
44 | e2QHwp5qjYj6i3jTCfzKVdB1k1dy7NDuoRngzTrARkpT | base58 encoded compressed public key
44 | AiYcVWdeVf8l7bULNFz7Oj819gcS0lHLqquXvVAFTG68 | base64 encoded compressed public key
130 | 04261c55675e55ff25edb50b345cfb3a3f35f60712d251cbaaab97bd50054c6ebc3cd4e22200c68daf7493e1f8da6a190a68a671e2d3977809612424c7c3888bc6 | decoded, decompressed, marshalled, hex encoded string, which matches the starting public key 🎉

QR Code Comparison

Uncompressed hex encoded QR code with full https URL

[QR code image: uncompressed, hex encoded]

Compressed base58 encoded QR code with full https URL

[QR code image: compressed, base58 encoded]

Compressed base58 encoded QR code text only public key

[QR code image: compressed, base58 encoded, text only]

Summary

We can reduce the size of the key from 130 characters to 44 characters in this sample case, a reduction in size of 66%!

@gravityblast (Member):

Good job! I think the only one we need is the normal hex format, so it's basically the original without the y value, with 02 or 03 added as a prefix.

Samyoul (Member) commented Apr 28, 2020

I think displaying the compressed key as a hex encoded string would allow the key to keep its "Ethereumy" appearance, but if our aim is to reduce the size of the QR code as much as possible then the best choice in my mind is to use the compressed base58 encoded key.

A compromise approach would be to display the hex encoding in plain text and show the base58 encoded QR code. Something like below:

0x02261c55675e55ff25edb50b345cfb3a3f35f60712d251cbaaab97bd50054c6ebc

gravityblast (Member) commented Apr 28, 2020

I think having the standard hex format is important for compatibility. For the QR code we can ask @hesterbruikman how compressed it needs to be. In general I think if we can keep it in hex it will be easily compatible with other clients.

@gravityblast (Member):

Anyway, good ideas and good job! I think we can decide this and implement it easily!

@hesterbruikman (Contributor, Author):

Compatibility is important, especially if we are ever to sync with other identity solutions (as far as I know there are no real standards there yet. Not sure what's going on with Iden3, but I imagine we want to keep some doors open).

Let me check what this image compression can get us and follow up. This is already a huge step and an improvement to legibility and URL sharing.

Samyoul (Member) commented Apr 28, 2020

FYI, we can do a lot of things to make the QR code look less complex, but it will require additional specs. Once the specs are in place there should be very few compatibility problems. Decoding hex vs decoding base58 is a fairly minor issue, and I don't see any developers struggling with the concept. But we can have that discussion.

In the meantime here is a brief overview of my ideas for reducing the QR code complexity.

  • Reduce the QR code's error correction to L
  • Encode the key using base58 🧙
  • Only include the key in plain text, without the URL data. Example e2QHwp5qjYj6i3jTCfzKVdB1k1dy7NDuoRngzTrARkpT rather than https://join.status.im/user/e2QHwp5qjYj6i3jTCfzKVdB1k1dy7NDuoRngzTrARkpT
All the compressions! [QR code images: compressed hex encoded with M error correction; compressed hex encoded with L error correction]
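
As an illustration of the first two points, a minimal sketch of generating such a QR code, assuming the github.com/skip2/go-qrcode package (any QR library with a selectable error correction level would do):

package main

import (
	"log"

	qrcode "github.com/skip2/go-qrcode"
)

func main() {
	// base58 encoded compressed public key from the example above
	key := "e2QHwp5qjYj6i3jTCfzKVdB1k1dy7NDuoRngzTrARkpT"

	// qrcode.Low is the "L" error correction level (~7% recovery),
	// which produces the least dense code for a given payload.
	if err := qrcode.WriteFile(key, qrcode.Low, 256, "chat-key.png"); err != nil {
		log.Fatal(err)
	}
}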

@hesterbruikman (Contributor, Author):

Sorry for the delay on this @Samyoul !

@andremedeiros can you please take a look as well for review of the compression approach and planning based on effort required to implement?

Your brief looks good to me @Samyoul, going with the most compressed version, provided it is documented well enough to keep compatibility with any applications or interfaces requiring the full key.

I think we should keep the full URL, as we don't know which application will scan the QR code and there's logic in there to drive Status installs.

Samyoul (Member) commented May 6, 2020

No problem. I just wondered if there were any blockers I wasn't aware of.

API

There needs to be some clarity on the API signature.

  • decompress public key - anywhere the full public key is required
  • request compressed key - places where the key is shown or shared in the UI

In my mind the functions should be:

func DecompressPublicKey(compressedPublicKey string) (publicKey string) {...}
func CompressPublicKey(publicKey string) (compressedPublicKey string) {...}

This would allow the API to compress and decompress any given public key and not just the key(s) belonging to the user / client.
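
For concreteness, a minimal sketch of that pair, assuming hex encoded strings in and out and go-ethereum's crypto/secp256k1 helpers; error returns are added for invalid input, and the names follow the proposal above rather than any final status-go API:

package keys // illustrative package name

import (
	"crypto/elliptic"
	"encoding/hex"
	"errors"

	"github.com/ethereum/go-ethereum/crypto/secp256k1"
)

// DecompressPublicKey takes a 02/03-prefixed compressed key, hex encoded,
// and returns the full 04-prefixed key, hex encoded.
func DecompressPublicKey(compressedPublicKey string) (string, error) {
	raw, err := hex.DecodeString(compressedPublicKey)
	if err != nil {
		return "", err
	}
	x, y := secp256k1.DecompressPubkey(raw)
	if x == nil {
		return "", errors.New("invalid compressed public key")
	}
	return hex.EncodeToString(elliptic.Marshal(secp256k1.S256(), x, y)), nil
}

// CompressPublicKey does the reverse: full 04-prefixed key in, compressed key out.
func CompressPublicKey(publicKey string) (string, error) {
	raw, err := hex.DecodeString(publicKey)
	if err != nil {
		return "", err
	}
	x, y := elliptic.Unmarshal(secp256k1.S256(), raw)
	if x == nil {
		return "", errors.New("invalid uncompressed public key")
	}
	return hex.EncodeToString(secp256k1.CompressPubkey(x, y)), nil
}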

I'd estimate implementing the code for this in status-go would take about 1.5 hours with tests. It isn't a hard job at all.

Specification

If we want to compress the QR code as much as possible, and retain the URL, then I'd first implement a compressed base58 encoded public key with a modified url.

From :

https://join.status.im/u/e2QHwp5qjYj6i3jTCfzKVdB1k1dy7NDuoRngzTrARkpT

To:

https://j.status.im/u/e2QHwp5qjYj6i3jTCfzKVdB1k1dy7NDuoRngzTrARkpT

Removing an additional (3 × 8 =) 24 bits of data from the QR code.

Set the QR code error correction to L.

I'd estimate that writing the initial specs for this would take about 1 hour, although I don't know how long the review would take. I suppose it depends on how well the specs are written and how much consensus there is around the new specs.

Infra

We'd need to register a new j subdomain on the DNS and set up redirection to the join subdomain.

Additionally we'd need to make a change to the website so that it could interpret the new compressed public key.

I do not think this would take much time, but @jakubgs would be best placed to estimate how long it would take.

jakubgs (Member) commented May 6, 2020

The domain redirect is trivial. Parsing compressed keys also seems trivial if all we're doing is decoding base58. Probably a few hours of work including updating tests.

@jakubgs jakubgs closed this as completed May 6, 2020
@jakubgs jakubgs reopened this May 6, 2020
jakubgs (Member) commented May 6, 2020

Mis-clicks, sorry. But there's one tiny clash we would have which I don't think matters.

Right now the /u/ path supports both keys as /u/0x04... and ENS names like /u/jakubgs. Now, if the base58 string is always 45 characters long, then someone with a 45-character ENS name could have it treated as a compressed key. But that's a very edgy edge case.

Oh, and if the compressed key is mis-copied, for example one character short, it will be treated as an ENS name too.

Samyoul (Member) commented May 6, 2020

Consistent key length is an interesting consideration. From my experimenting the compressed keys are 44 chars, but I will generate 100,000+ keys and see if any produce a length that is not 44 chars. I will publish the results here after I've done this.

if the base58 string is always 45 characters long then if someone had a 45 character long ENS name that could cause it to be treated as a compressed key

Interesting point. I think that if we use the following process flow we should have few problems.

  • check string length
  • if len == 132 - Parse as a standard key
  • else - Perform ENS lookup (possibly introducing an additional Ethereum node call per URL hit, without caching.)
    • If found parse as ENS name
    • else parse as compressed public key
      • if parse succeeds present results
      • else fail gracefully

By parsing as an ENS first we should reduce the chances of collision significantly. Although there is still the chance that someone's ENS name is exactly the same as a compressed public key.

if the compressed key is mis-copied as for example 1 character short it will be treated as ENS name too.

This should hopefully not be a problem as the compressed key would only be used with QR codes. But if in some future application the compressed key isn't only used with QR codes, then perhaps we can formulate a checksum for the compressed keys.
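
If that ever becomes necessary, one familiar option would be a base58check-style checksum, as used for Bitcoin addresses; a purely illustrative sketch (base58 import path assumed, as before):

package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"

	"github.com/btcsuite/btcutil/base58"
)

// withChecksum appends the first 4 bytes of SHA-256(SHA-256(payload)) before
// base58 encoding, so a mistyped or truncated key fails checksum validation.
func withChecksum(payload []byte) string {
	first := sha256.Sum256(payload)
	second := sha256.Sum256(first[:])
	return base58.Encode(append(payload, second[:4]...))
}

func main() {
	compressed, _ := hex.DecodeString("02261c55675e55ff25edb50b345cfb3a3f35f60712d251cbaaab97bd50054c6ebc")
	fmt.Println(withChecksum(compressed))
}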

jakubgs (Member) commented May 6, 2020

We don't do ENS lookups on the universal links backend.

Samyoul (Member) commented May 6, 2020

Would it be a problem to introduce ENS lookups?

If we want to implement a version of ID parsing that doesn't require outbound calls, we could do the following:

  • check string length
  • if len == 132 - Parse as a standard key
  • else attempt to parse as compressed public key
    • if parse succeeds present results
    • else - presume it is an ENS name

jakubgs (Member) commented May 6, 2020

Yes it would be. Because right now the universal links service doesn't depend on any other service. It's stateless, which keeps it simple, robust, and stable. By introducing something like ENS lookup you introduce complexity, delays due to slow lookups which result in delays in rendering the page, and so on. Sounds like a bad idea to me.

And the flow you proposed makes sense. Except we attempt to parse only if it's 44 chars as you said.
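
For illustration, a conceptual sketch of that length-based dispatch in Go (the concrete lengths, 132 for a 0x-prefixed uncompressed hex key and 44 or 45 for a base58 compressed key, are taken from this thread rather than a spec):

package main

import (
	"fmt"
	"strings"
)

// classify decides how a /u/<id> path segment should be interpreted,
// following the flow discussed above.
func classify(id string) string {
	switch {
	case len(id) == 132 && strings.HasPrefix(id, "0x04"):
		return "uncompressed hex public key"
	case len(id) == 44 || len(id) == 45:
		// here the real code would base58 decode and decompress the key,
		// falling back to treating the string as an ENS name on failure
		return "compressed base58 public key (fallback: ENS name)"
	default:
		return "ENS name"
	}
}

func main() {
	fmt.Println(classify("jakubgs"))
	fmt.Println(classify("e2QHwp5qjYj6i3jTCfzKVdB1k1dy7NDuoRngzTrARkpT"))
}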

Samyoul (Member) commented May 6, 2020

Ok, I will probably have the tests on the public keys done some time tomorrow. So I can see if there is any deviation from 44 chars, in that case we can implement some additional code to handle that. Hypothetically if the variation is +-2 chars we can do something like len >= 44-2 && len <= 44+2. I'll know more after the tests.

Samyoul (Member) commented May 6, 2020

Test results:

I ran a test on 10 million random public keys and the summary of the tests was as follows:

Number of chars | Count
44 | 6,832,205
45 | 3,167,795

So 68% of the time keys will be 44 chars long and 32% of the time keys will be 45 chars long. There were no other compressed key lengths.
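
A back-of-the-envelope check (my own reasoning, not from the tests above) that matches this split:

package main

import (
	"fmt"
	"math"
)

func main() {
	// A compressed key is a 33-byte value whose first byte is 0x02 or 0x03,
	// i.e. roughly uniform over [2^257, 2^258). Its base58 encoding has
	// 44 characters exactly when the value is below 58^44.
	threshold := 44 * math.Log2(58) // log2(58^44) ≈ 257.75

	// fraction of [2^257, 2^258) that lies below 2^257.75
	share44 := math.Exp2(threshold-257) - 1
	fmt.Printf("expected share of 44-char keys: %.1f%%\n", share44*100) // ≈ 68.3%
}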

The code
// imports added for completeness; base58 and spew paths assumed as in the earlier snippet
import (
	"crypto/elliptic"
	"crypto/rand"
	"fmt"
	"math/big"
	"testing"

	"github.com/btcsuite/btcutil/base58"
	"github.com/davecgh/go-spew/spew"
	"github.com/ethereum/go-ethereum/crypto/secp256k1"
)

func Test(t *testing.T) {
	tot := map[int]int{}
	for j := 0; j < 100; j++ {
		fmt.Printf("test number : %d\n", j+1)

		out := map[int]int{}
		for i := 0; i < 100000; i++ {
			_, x, y, _ := elliptic.GenerateKey(secp256k1.S256(), rand.Reader)
			cpk := compress(x, y)
			l := len(cpk)

			out[l]++
			tot[l]++
		}
		spew.Dump(out)
	}
	spew.Dump(tot)
}

func compress(x, y *big.Int) string {
	// compress public key
	cpkb := secp256k1.CompressPubkey(x, y)

	// encode compressed bytes into base58
	return base58.Encode(cpkb)
}
Full test results
=== RUN   Test
test number : 1
(map[int]int) (len=2) {
 (int) 44: (int) 68120,
 (int) 45: (int) 31880
}
test number : 2
(map[int]int) (len=2) {
 (int) 44: (int) 68264,
 (int) 45: (int) 31736
}
test number : 3
(map[int]int) (len=2) {
 (int) 44: (int) 68454,
 (int) 45: (int) 31546
}
test number : 4
(map[int]int) (len=2) {
 (int) 45: (int) 31514,
 (int) 44: (int) 68486
}
test number : 5
(map[int]int) (len=2) {
 (int) 44: (int) 68148,
 (int) 45: (int) 31852
}
test number : 6
(map[int]int) (len=2) {
 (int) 44: (int) 68537,
 (int) 45: (int) 31463
}
test number : 7
(map[int]int) (len=2) {
 (int) 44: (int) 68263,
 (int) 45: (int) 31737
}
test number : 8
(map[int]int) (len=2) {
 (int) 44: (int) 68233,
 (int) 45: (int) 31767
}
test number : 9
(map[int]int) (len=2) {
 (int) 44: (int) 68076,
 (int) 45: (int) 31924
}
test number : 10
(map[int]int) (len=2) {
 (int) 45: (int) 31985,
 (int) 44: (int) 68015
}
test number : 11
(map[int]int) (len=2) {
 (int) 44: (int) 68171,
 (int) 45: (int) 31829
}
test number : 12
(map[int]int) (len=2) {
 (int) 44: (int) 68245,
 (int) 45: (int) 31755
}
test number : 13
(map[int]int) (len=2) {
 (int) 45: (int) 31722,
 (int) 44: (int) 68278
}
test number : 14
(map[int]int) (len=2) {
 (int) 45: (int) 31643,
 (int) 44: (int) 68357
}
test number : 15
(map[int]int) (len=2) {
 (int) 44: (int) 68158,
 (int) 45: (int) 31842
}
test number : 16
(map[int]int) (len=2) {
 (int) 45: (int) 31609,
 (int) 44: (int) 68391
}
test number : 17
(map[int]int) (len=2) {
 (int) 44: (int) 68366,
 (int) 45: (int) 31634
}
test number : 18
(map[int]int) (len=2) {
 (int) 44: (int) 68748,
 (int) 45: (int) 31252
}
test number : 19
(map[int]int) (len=2) {
 (int) 45: (int) 31997,
 (int) 44: (int) 68003
}
test number : 20
(map[int]int) (len=2) {
 (int) 44: (int) 68730,
 (int) 45: (int) 31270
}
test number : 21
(map[int]int) (len=2) {
 (int) 45: (int) 31467,
 (int) 44: (int) 68533
}
test number : 22
(map[int]int) (len=2) {
 (int) 45: (int) 31730,
 (int) 44: (int) 68270
}
test number : 23
(map[int]int) (len=2) {
 (int) 44: (int) 68377,
 (int) 45: (int) 31623
}
test number : 24
(map[int]int) (len=2) {
 (int) 44: (int) 68485,
 (int) 45: (int) 31515
}
test number : 25
(map[int]int) (len=2) {
 (int) 45: (int) 31671,
 (int) 44: (int) 68329
}
test number : 26
(map[int]int) (len=2) {
 (int) 44: (int) 68449,
 (int) 45: (int) 31551
}
test number : 27
(map[int]int) (len=2) {
 (int) 44: (int) 68079,
 (int) 45: (int) 31921
}
test number : 28
(map[int]int) (len=2) {
 (int) 45: (int) 31814,
 (int) 44: (int) 68186
}
test number : 29
(map[int]int) (len=2) {
 (int) 44: (int) 68372,
 (int) 45: (int) 31628
}
test number : 30
(map[int]int) (len=2) {
 (int) 44: (int) 68262,
 (int) 45: (int) 31738
}
test number : 31
(map[int]int) (len=2) {
 (int) 45: (int) 31430,
 (int) 44: (int) 68570
}
test number : 32
(map[int]int) (len=2) {
 (int) 44: (int) 68280,
 (int) 45: (int) 31720
}
test number : 33
(map[int]int) (len=2) {
 (int) 45: (int) 31567,
 (int) 44: (int) 68433
}
test number : 34
(map[int]int) (len=2) {
 (int) 44: (int) 68190,
 (int) 45: (int) 31810
}
test number : 35
(map[int]int) (len=2) {
 (int) 44: (int) 68244,
 (int) 45: (int) 31756
}
test number : 36
(map[int]int) (len=2) {
 (int) 45: (int) 31762,
 (int) 44: (int) 68238
}
test number : 37
(map[int]int) (len=2) {
 (int) 45: (int) 31642,
 (int) 44: (int) 68358
}
test number : 38
(map[int]int) (len=2) {
 (int) 45: (int) 31840,
 (int) 44: (int) 68160
}
test number : 39
(map[int]int) (len=2) {
 (int) 45: (int) 31477,
 (int) 44: (int) 68523
}
test number : 40
(map[int]int) (len=2) {
 (int) 44: (int) 68126,
 (int) 45: (int) 31874
}
test number : 41
(map[int]int) (len=2) {
 (int) 45: (int) 31738,
 (int) 44: (int) 68262
}
test number : 42
(map[int]int) (len=2) {
 (int) 44: (int) 68273,
 (int) 45: (int) 31727
}
test number : 43
(map[int]int) (len=2) {
 (int) 44: (int) 68181,
 (int) 45: (int) 31819
}
test number : 44
(map[int]int) (len=2) {
 (int) 44: (int) 68313,
 (int) 45: (int) 31687
}
test number : 45
(map[int]int) (len=2) {
 (int) 44: (int) 68308,
 (int) 45: (int) 31692
}
test number : 46
(map[int]int) (len=2) {
 (int) 44: (int) 68469,
 (int) 45: (int) 31531
}
test number : 47
(map[int]int) (len=2) {
 (int) 45: (int) 31499,
 (int) 44: (int) 68501
}
test number : 48
(map[int]int) (len=2) {
 (int) 44: (int) 68236,
 (int) 45: (int) 31764
}
test number : 49
(map[int]int) (len=2) {
 (int) 44: (int) 68458,
 (int) 45: (int) 31542
}
test number : 50
(map[int]int) (len=2) {
 (int) 44: (int) 68008,
 (int) 45: (int) 31992
}
test number : 51
(map[int]int) (len=2) {
 (int) 44: (int) 68273,
 (int) 45: (int) 31727
}
test number : 52
(map[int]int) (len=2) {
 (int) 44: (int) 68632,
 (int) 45: (int) 31368
}
test number : 53
(map[int]int) (len=2) {
 (int) 44: (int) 68190,
 (int) 45: (int) 31810
}
test number : 54
(map[int]int) (len=2) {
 (int) 45: (int) 32017,
 (int) 44: (int) 67983
}
test number : 55
(map[int]int) (len=2) {
 (int) 44: (int) 68398,
 (int) 45: (int) 31602
}
test number : 56
(map[int]int) (len=2) {
 (int) 44: (int) 68505,
 (int) 45: (int) 31495
}
test number : 57
(map[int]int) (len=2) {
 (int) 44: (int) 68353,
 (int) 45: (int) 31647
}
test number : 58
(map[int]int) (len=2) {
 (int) 44: (int) 68197,
 (int) 45: (int) 31803
}
test number : 59
(map[int]int) (len=2) {
 (int) 44: (int) 68380,
 (int) 45: (int) 31620
}
test number : 60
(map[int]int) (len=2) {
 (int) 44: (int) 68535,
 (int) 45: (int) 31465
}
test number : 61
(map[int]int) (len=2) {
 (int) 44: (int) 68260,
 (int) 45: (int) 31740
}
test number : 62
(map[int]int) (len=2) {
 (int) 45: (int) 31676,
 (int) 44: (int) 68324
}
test number : 63
(map[int]int) (len=2) {
 (int) 45: (int) 31808,
 (int) 44: (int) 68192
}
test number : 64
(map[int]int) (len=2) {
 (int) 45: (int) 31629,
 (int) 44: (int) 68371
}
test number : 65
(map[int]int) (len=2) {
 (int) 44: (int) 68224,
 (int) 45: (int) 31776
}
test number : 66
(map[int]int) (len=2) {
 (int) 45: (int) 31634,
 (int) 44: (int) 68366
}
test number : 67
(map[int]int) (len=2) {
 (int) 44: (int) 68382,
 (int) 45: (int) 31618
}
test number : 68
(map[int]int) (len=2) {
 (int) 45: (int) 31758,
 (int) 44: (int) 68242
}
test number : 69
(map[int]int) (len=2) {
 (int) 44: (int) 68447,
 (int) 45: (int) 31553
}
test number : 70
(map[int]int) (len=2) {
 (int) 44: (int) 68531,
 (int) 45: (int) 31469
}
test number : 71
(map[int]int) (len=2) {
 (int) 45: (int) 31787,
 (int) 44: (int) 68213
}
test number : 72
(map[int]int) (len=2) {
 (int) 45: (int) 31718,
 (int) 44: (int) 68282
}
test number : 73
(map[int]int) (len=2) {
 (int) 44: (int) 68171,
 (int) 45: (int) 31829
}
test number : 74
(map[int]int) (len=2) {
 (int) 44: (int) 68098,
 (int) 45: (int) 31902
}
test number : 75
(map[int]int) (len=2) {
 (int) 44: (int) 68461,
 (int) 45: (int) 31539
}
test number : 76
(map[int]int) (len=2) {
 (int) 44: (int) 68458,
 (int) 45: (int) 31542
}
test number : 77
(map[int]int) (len=2) {
 (int) 45: (int) 31555,
 (int) 44: (int) 68445
}
test number : 78
(map[int]int) (len=2) {
 (int) 44: (int) 68277,
 (int) 45: (int) 31723
}
test number : 79
(map[int]int) (len=2) {
 (int) 44: (int) 68355,
 (int) 45: (int) 31645
}
test number : 80
(map[int]int) (len=2) {
 (int) 44: (int) 68448,
 (int) 45: (int) 31552
}
test number : 81
(map[int]int) (len=2) {
 (int) 45: (int) 31461,
 (int) 44: (int) 68539
}
test number : 82
(map[int]int) (len=2) {
 (int) 44: (int) 68312,
 (int) 45: (int) 31688
}
test number : 83
(map[int]int) (len=2) {
 (int) 44: (int) 68169,
 (int) 45: (int) 31831
}
test number : 84
(map[int]int) (len=2) {
 (int) 44: (int) 68291,
 (int) 45: (int) 31709
}
test number : 85
(map[int]int) (len=2) {
 (int) 45: (int) 31753,
 (int) 44: (int) 68247
}
test number : 86
(map[int]int) (len=2) {
 (int) 45: (int) 31732,
 (int) 44: (int) 68268
}
test number : 87
(map[int]int) (len=2) {
 (int) 44: (int) 68293,
 (int) 45: (int) 31707
}
test number : 88
(map[int]int) (len=2) {
 (int) 45: (int) 31604,
 (int) 44: (int) 68396
}
test number : 89
(map[int]int) (len=2) {
 (int) 45: (int) 31471,
 (int) 44: (int) 68529
}
test number : 90
(map[int]int) (len=2) {
 (int) 44: (int) 68254,
 (int) 45: (int) 31746
}
test number : 91
(map[int]int) (len=2) {
 (int) 44: (int) 68395,
 (int) 45: (int) 31605
}
test number : 92
(map[int]int) (len=2) {
 (int) 44: (int) 68492,
 (int) 45: (int) 31508
}
test number : 93
(map[int]int) (len=2) {
 (int) 45: (int) 31765,
 (int) 44: (int) 68235
}
test number : 94
(map[int]int) (len=2) {
 (int) 44: (int) 68230,
 (int) 45: (int) 31770
}
test number : 95
(map[int]int) (len=2) {
 (int) 45: (int) 31769,
 (int) 44: (int) 68231
}
test number : 96
(map[int]int) (len=2) {
 (int) 45: (int) 31599,
 (int) 44: (int) 68401
}
test number : 97
(map[int]int) (len=2) {
 (int) 44: (int) 68435,
 (int) 45: (int) 31565
}
test number : 98
(map[int]int) (len=2) {
 (int) 44: (int) 68458,
 (int) 45: (int) 31542
}
test number : 99
(map[int]int) (len=2) {
 (int) 44: (int) 68375,
 (int) 45: (int) 31625
}
test number : 100
(map[int]int) (len=2) {
 (int) 44: (int) 68146,
 (int) 45: (int) 31854
}
(map[int]int) (len=2) {
 (int) 44: (int) 6832205,
 (int) 45: (int) 3167795
}
--- PASS: Test (3641.31s)
PASS

hesterbruikman (Contributor, Author) commented May 7, 2020

That sounds solid. @Samyoul @jakubgs @gravityblast what do you think? Seems to me like this is the proposal:

  • Reduce the QR code's error correction to L
  • Encode the key using base58, resulting in 44 or 45 character keys (instead of 130)
  • Use of j.status.im instead of join.status.im to further shorten the URL. Result: https://j.status.im/u/e2QHwp5qjYj6i3jTCfzKVdB1k1dy7NDuoRngzTrARkpT
  • join.status.im checks for key length and parses as regular or compressed key

Result: https://join.status.im/u/e2QHwp5qjYj6i3jTCfzKVdB1k1dy7NDuoRngzTrARkpT

@yenda This issue looks at compression of the chat key to shorten url and decrease QR code size. Do you foresee any issues requesting compressed keys in status-react? status-im/status-mobile#10325

@hesterbruikman (Contributor, Author):

Also @jakubgs, can join.status.im and j.status.im be used in parallel? Keycard is looking at printing QR codes.

jakubgs (Member) commented May 7, 2020

Using j.status.im will mean that the app won't open from it if it's installed. The domain implemented in the App is join.status.im. That means that I can only do an HTTP redirect from j.status.im to join.status.im, which means the user will have to open the browser first, click on the button, and then it will take him to the app.

@hesterbruikman (Contributor, Author):

Ok, sounds like it's better to leave out that change as we already have enough challenges getting universal link behavior in line across platforms

Samyoul (Member) commented May 7, 2020

I hate iOS

Samyoul (Member) commented May 7, 2020

@status-im/status-core-ui Does anyone know if it is possible to register more than one app url in an iOS app?

Samyoul (Member) commented May 7, 2020

I've just been working in the status specs and I noticed the 3 word pseudonym.

We'd have to ensure that parsing the compressed public key also derives the 3 word pseudonym. Presumably not an issue.

jakubgs (Member) commented May 7, 2020

There's https://github.com/status-im/js-status-chat-name which does that for normal public keys, it would be trivial to add a method for doing the same for base58 encoded version.

Samyoul (Member) commented May 13, 2020

@status-im/status-core Could you confirm the following:

  • Does the iOS app implement something similar to the Android intent filter for opening apps from a browser URL? (example, navigating to join.status.im/0x... opens the app)
  • Is it possible to register more than one URL and have the intent functionality work for both URLs?

Yesterday @andremedeiros said he'd be surprised if this functionality wasn't in place but could I get a solid confirmation from anyone that knows for certain?
Chat from yesterday

Thank you 😄

jakubgs (Member) commented May 13, 2020

The first point is obviously true. We already have that working.

Samyoul (Member) commented May 21, 2020

I did a little bit of digging and it seems possible to have multiple url > app links with iOS.

In the current code we have:

https://github.com/status-im/status-react/blob/c68f734b983d9366d63dd87b42e07934674312ea/ios/StatusIm/StatusIm.entitlements

  
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
	<key>aps-environment</key>
	<string>development</string>
	<key>com.apple.developer.associated-domains</key>
	<array>
		<string>applinks:join.status.im</string>
	</array>
	<key>keychain-access-groups</key>
	<array>
		<string>$(AppIdentifierPrefix)im.status.ethereum</string>
	</array>
</dict>
</plist>

Which, according to https://stackoverflow.com/questions/39959347/mutiple-associated-domains-with-one-ios-app, allows us to add multiple applinks.

0xc1c4da (Contributor) commented May 25, 2020

@hesterbruikman has let me know that @Samyoul and @cammellos had raised the question of whether introducing compression is worth a 'breaking change'.

The meta-problem here is we don't know how to handle breaking changes. This could be a great opportunity for the team to learn how to do this which is going to make life easier in the future.

Without having the breaking change concerns specified I can only assume it's a matter of backwards compatibility. Without thinking about the solution deeply, it seems we could introduce client read support in an earlier release, and introduce user-facing support in a future release once the majority of clients have upgraded, thereby minimizing backwards compatibility issues.

Since I posted last, I'm unsure if version info is useful, rather the read function should handle all forms of input, trying newest to oldest.

Also this feature isn't purely aesthetic: we encode public keys on-chain. Every on-chain byte costs money and we have an opportunity to halve those costs. That's reason enough, no?

This is a miniature version of the same issue we faced introducing the last breaking change in beta, where we had the opportunity to solve the migration path but didn't, and rationalized the problem away by saying we're in beta. That did come back and bite us, and we had to offer support to users who had difficulties accessing their funds. Until we address that issue and think deeply about how we handle migrations, we will keep running into the same problem until we're forced to handle it, or make a big mistake and lose reputation with our users.

@arnetheduck has experience and even communicated solutions to this concern last time this issue came up, but the team wasn't in a position to listen, I'd definitely hit him up to get deeper on the topic.

@cammellos (Member):

The meta-problem here is we don't know how to handle breaking changes. This could be a great opportunity for the team to learn how to do this which is going to make life easier in the future.

I am not sure that this is accurate. It is often repeated, but I think we have gathered a fair amount of experience in handling breaking changes, and repeating this point ignores (or is unaware of) all the effort and thought that went into this topic.

As a note, we broke compatibility twice, once when the "new protocol" was introduced (just a week after I joined in spring 2018), and with v1, which was agreed on by the team collectively (and we also proposed ways to maintain compatibility, as we knew how, but it was decided otherwise).
Multi account is one of those cases where we pushed down a huge change without reflecting too much, and worked on it in isolation (core was not involved for the most part), and the problem, it is fair to say, was mainly one of management and coordination rather than technical implementation.

We made a huge number of changes to the protocol without really breaking it, or at the very least handled them in a graceful way: PFS, Waku, the negotiated topic, and most recently images are all examples where we didn't break compatibility, and those are fundamental changes to the protocol.
So one side is saying we don't know how to handle breaking changes, the other is building a lot of solutions that are technically complex and handling them gracefully.
That's not to say that we have always been successful, but I think we should acknowledge the successes of the team as much as when it didn't work as well.

One side of this is how to technically introduce a new change; the other is thinking long and hard before making a change. It's not only a question of how to handle changes, but also which changes we want to push for.

By now we also understand what kind of changes are breaking and which one is technically impossible to make non-breaking.
Any time you change (not add) the format of something and you are not aware of the clients consuming that piece of data, unless the clients are already able to deal with the new format, it will inevitably be a breaking change.

This is one of those cases.
If a user is allowed to paste the new format in a public chat (as I understand it, QR codes are broken at the moment, so they are less of a concern), any user of the current version who copies it and adds it to the "start new chat" box will not be able to use it, as current clients only support non-compressed pubkeys. This is a purely UI concern, not a protocol level concern, and it is the one we raised.

The best we can do, and that's what we also suggested, is to add the parsing now, and delay going live with the feature until we are fairly certain that old clients are not around anymore. Nonetheless this is a breaking change, but we can minimize the impact.

Also this feature isn't purely aesthetic, we encode public keys on-chain. Every on-chain byte costs money and we have an opportunity to half those costs. That's reason enough, no?

There seems to be a mix of concerns here: judging from the tasks created (here and status-im/status-mobile#10325), nothing is related to the blockchain; all the impact is purely UI/UX.

If we want to save bytes on the blockchain, we should address the specific problem, and not necessarily change the UI (which is the concern we raised).

First we should understand the problem, which no one has done here, before jumping to a solution, and then see how to address it, and whether it needs addressing.

How to store keys on-chain is not necessarily related to how we display them to the user, and might not be a breaking change, or it might have a different impact.

As a side note, we already use compressed keys widely in the protocol, so it might be that those are also used when writing to the blockchain.

If on-chain space is an issue, I take it that this is related to ENS management; as status-go doesn't deal with the blockchain for the most part, we should involve the people familiar with those parts (neither I nor Samuel are probably very familiar with this).

So my suggestion would be:

  1. We should clearly identify the user stories first, for example:
As a user I want to store my public key on ENS in the most efficient format so that I can spend less money when registering one.
As a user I want to see a shorter public key as my contact code so that it's less intimidating and easier to type.
  2. Understand the current state of things (how and where keys are stored on the blockchain, which format, is there an advantage for the user if we reduce the size, quantify the advantage)

  3. Understand the implications of changing them (i.e changing the ui is a breaking change as described above, ENS might be or not)

  4. Decide whether we should go ahead, with one or both (they are not really dependent on each other)

  5. Decide on a format

I think we focused on 5, and really haven't spent enough time on 1-4, so I would spend more time on the requirements phase.

Samyoul (Member) commented May 25, 2020

I have little of substance that I can add to Andrea's comment, apart from emphasising a few points.

seems we could introduce client read support in an earlier release, and introduce user-facing support in a future release once the majority of clients have upgraded, thereby minimizing backwards compatibility issues.

As Andrea has stated, we discussed strategies for mitigating compatibility issues, and coincidentally this very strategy came up.

The breaking change element is not the real problem; we can implement sensible steps to mitigate the impact of changes we introduce.

Also this feature isn't purely aesthetic, we encode public keys on-chain. Every on-chain byte costs money and we have an opportunity to half those costs.

I don't have any problem with aesthetics being the only reason for changing the public key format, and if there are also functional reasons that is great too.

The core of our questions was centred around "Why is this change needed?". If we know the why, we can work out everything else from there.

@arnetheduck (Member):

I'm unsure if version info is useful, rather the read function should handle all forms of input, trying newest to oldest.

if we're going in the direction of multiple key formats, or above all, self-describing key formats, the problem is somewhat addressed in https://github.com/multiformats/multihash / multicodec - maybe we can extend that one?

oldest-to-newest schemes tend to work cleverly only as long as there's no ambiguity - it's often a nice save in case you didn't consider the problem of upgradeability in the first implementation - one would have to "price" the extra type bytes in terms of UX degradation to make a good call here.

cost

would we run an A/B test on things like this? it has the additional advantage that the feature implicitly gets developed behind a toggle, meaning the upgrade/downgrade path is baked in from the start, additionally de-risking the final decision, and crucially, not disabling support for the old behaviour until it's comfortable to do so.

I ran a test on 10 million

the key conversion algorithm is deterministically bounded - I would not rely on specific lengths until the code that generates the key has been analysed - in particular, I would not build any assumptions about this into code that gets distributed to users based on an exploratory test alone.

compressed or not

has the performance angle of decompressing keys been explored? for a QR code it won't matter, but for a group chat or your address book it might
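
To put a number on that, a minimal benchmark sketch assuming go-ethereum's crypto/secp256k1 package and a hypothetical keybench test package (benchmarking the real status-go code paths would of course be more representative):

package keybench

import (
	"crypto/elliptic"
	"crypto/rand"
	"testing"

	"github.com/ethereum/go-ethereum/crypto/secp256k1"
)

func BenchmarkDecompressPubkey(b *testing.B) {
	// generate one key and compress it up front
	_, x, y, err := elliptic.GenerateKey(secp256k1.S256(), rand.Reader)
	if err != nil {
		b.Fatal(err)
	}
	compressed := secp256k1.CompressPubkey(x, y)

	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		if px, _ := secp256k1.DecompressPubkey(compressed); px == nil {
			b.Fatal("decompression failed")
		}
	}
}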

Samyoul (Member) commented Jun 3, 2020

Edit:

This research may be redundant in the case we implement Multiformat public keys. 😭

At least the maths is interesting 😎


the key conversion algorithm is deterministically bounded - I would not rely on specific lengths until the code that generates the key has been analysed - in particular, I would not build in any assumptions about this into code that gets distributed to users based on an exploratory test alone.

I've done a little research on this and I've found that, deterministically, the length of a base58 encoded compressed secp256k1 public key will always be 44 to 46 characters.

Basically, the compression algorithm will always give a result of 33 bytes for a given secp256k1 pk.

The variation is introduced by the way base58 encoding works. Base58 converts the byte array into a big integer and then repeatedly divides it by 58. After each round, the remainder is used as an index into the base58 character table, and the quotient is used as the input for the next round.

The final round is reached when the quotient is 0.

The theoretical maximum number of characters is ceil(len(b) * (log(256) / log(58))), where 256 is the number of states a byte can have and 58 is the number we divide the input by every round.

For a 33 byte array we get 33 * (log(256) / log(58)) = 45.0667, rounding up we get the theoretical maximum of 46 chars long.

The theoretical minimum number of characters in a base58 encoded string is the number of bytes in the input array: any leading zero byte is encoded as index 0 of the base58 character table, which is the character 1. Combined with the compression algorithm this theoretical minimum is not reachable, as the first byte of a compressed key is always greater than 0.

Meaning the minimum length would be 44, as even the lowest possible 33-byte value with a non-zero first byte, 256^32, i.e. the byte array

[]byte{
	1,0,0,0,0,0,0,0,0,0,0,
	0,0,0,0,0,0,0,0,0,0,0,
	0,0,0,0,0,0,0,0,0,0,0,
}

would give a value of 32 * (log(256) / log(58)) = 43.701, which rounds up to 44. So, combined with the compression algorithm, the theoretical minimum is 44 characters for a base58 encoded byte array like the one above.
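
Those bounds can be sanity-checked numerically; a small sketch of the arithmetic:

package main

import (
	"fmt"
	"math"
)

func main() {
	ratio := math.Log(256) / math.Log(58) // ≈ 1.3657 base58 characters per byte

	// 33-byte compressed key: theoretical maximum length
	fmt.Println(math.Ceil(33 * ratio)) // 46

	// the lowest 33-byte value with a non-zero first byte is 256^32, which
	// needs only as many digits as a 32-byte value: the theoretical minimum
	fmt.Println(math.Ceil(32 * ratio)) // 44
}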

Samyoul (Member) commented Jun 4, 2020

Multiformat keys

if we're going in the direction of multiple key formats, or above all, self-describing key formats, the problem is somewhat addressed in https://github.com/multiformats/multihash / multicodec - maybe we can extend that one?

I've explored multiformats and I think we can make use of it:

A secp256k1 public key can be described with a prepended byte code 0xe7. I've realised that the good thing about secp256k1 is that it self-describes whether it is compressed or not: the first byte is either 0x04 for uncompressed or 0x02/0x03 for compressed.

Implementation

My current key

Hex encoded, secp256k1 public key

04261c55675e55ff25edb50b345cfb3a3f35f60712d251cbaaab97bd50054c6ebc3cd4e22200c68daf7493e1f8da6a190a68a671e2d3977809612424c7c3888bc6

In raw bytes

|04:26:1c:55:67:5e:55:ff|25:ed:b5:0b:34:5c:fb:3a|
|3f:35:f6:07:12:d2:51:cb|aa:ab:97:bd:50:05:4c:6e|
|bc:3c:d4:e2:22:00:c6:8d|af:74:93:e1:f8:da:6a:19|
|0a:68:a6:71:e2:d3:97:78|09:61:24:24:c7:c3:88:8b|
|c6                     |                       |

Compressed bytes

|02:26:1c:55:67:5e:55:ff|25:ed:b5:0b:34:5c:fb:3a|
|3f:35:f6:07:12:d2:51:cb|aa:ab:97:bd:50:05:4c:6e|
|bc                     |                       |

Compressed bytes encoded into base58 (44 characters)

e2QHwp5qjYj6i3jTCfzKVdB1k1dy7NDuoRngzTrARkpT

Multikey version

Hex encoded uncompressed secp256k1 public key

fe704261c55675e55ff25edb50b345cfb3a3f35f60712d251cbaaab97bd50054c6ebc3cd4e22200c68daf7493e1f8da6a190a68a671e2d3977809612424c7c3888bc6

In raw bytes

|e7:04:26:1c:55:67:5e:55|ff:25:ed:b5:0b:34:5c:fb|
|3a:3f:35:f6:07:12:d2:51|cb:aa:ab:97:bd:50:05:4c|
|6e:bc:3c:d4:e2:22:00:c6|8d:af:74:93:e1:f8:da:6a|
|19:0a:68:a6:71:e2:d3:97|78:09:61:24:24:c7:c3:88|
|8b:c6                  |                       |

Compressed bytes

|e7:02:26:1c:55:67:5e:55|ff:25:ed:b5:0b:34:5c:fb|
|3a:3f:35:f6:07:12:d2:51|cb:aa:ab:97:bd:50:05:4c|
|6e:bc                  |                       |

Compressed bytes encoded into base58 (48 characters)

z6DthaqgddwEpYrT8o18jY6XzTuPUXr5WNxx9AaouvUVbxHq

Analysis

Notice the difference between the two keys

Current hex encoded uncompressed pk

  • 04261c55675e55ff25edb50b345cfb3a3f35f60712d251cbaaab97bd50054c6ebc3cd4e22200c68daf7493e1f8da6a190a68a671e2d3977809612424c7c3888bc6
    • 04 - uncompressed
    • 261c55675e55ff25edb50b345cfb3a3f35f60712d251cbaaab97bd50054c6ebc3cd4e22200c68daf7493e1f8da6a190a68a671e2d3977809612424c7c3888bc6 - x and y data

Multiformat hex encoded uncompressed pk

  • fe704261c55675e55ff25edb50b345cfb3a3f35f60712d251cbaaab97bd50054c6ebc3cd4e22200c68daf7493e1f8da6a190a68a671e2d3977809612424c7c3888bc6
    • f - multibase hex identifier
    • e7 - multikey secp256k1 identifier
    • 04 - compression identifier (uncompressed)
    • 261c55675e55ff25edb50b345cfb3a3f35f60712d251cbaaab97bd50054c6ebc3cd4e22200c68daf7493e1f8da6a190a68a671e2d3977809612424c7c3888bc6 - x and y data

Current base58 encoded compressed pk

  • e2QHwp5qjYj6i3jTCfzKVdB1k1dy7NDuoRngzTrARkpT
    • all encoded data, decoding is required to read the compression state
    • Ambiguous whether it is base58 or base64

Multiformat base58 encoded compressed pk

  • z6DthaqgddwEpYrT8o18jY6XzTuPUXr5WNxx9AaouvUVbxHq
    • z - multibase base58 bitcoin identifier
    • 6DthaqgddwEpYrT8o18jY6XzTuPUXr5WNxx9AaouvUVbxHq - encoded data, including the multikey identifier

Summary

  • Implementing Multiformat will cut the specification R&D to practically zero and will allow our public keys to be more easily parsed by 3rd parties.
  • The computation cost of implementing Multiformat is negligible
  • The key size trade-off is noticeable, but not disruptive.
    • Increasing base58 encoded keys from theoretical maximum of 46 chars to 50 chars
    • Increasing hex encoded keys from 130 char to 133 chars without the 0x prefix
      • If we assume the 0x is also a hex identifier the change is 132 chars to 134 chars
  • Significant advantage to determining encoding type over counting the key's characters, or trial and error.
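
As a rough illustration of the multiformat route, a sketch using the go-varint and go-multibase packages (assumed dependencies here; note that multicodec strictly encodes the 0xe7 code as an unsigned varint, i.e. the bytes e7 01, whereas the example above prepends a single e7 byte, so the exact output differs and the eventual spec is what pins this down):

package main

import (
	"encoding/hex"
	"fmt"

	"github.com/multiformats/go-multibase"
	"github.com/multiformats/go-varint"
)

func main() {
	// 33-byte compressed secp256k1 public key from the example above
	compressed, _ := hex.DecodeString("02261c55675e55ff25edb50b345cfb3a3f35f60712d251cbaaab97bd50054c6ebc")

	// multicodec code for secp256k1-pub (0xe7), encoded as an unsigned varint
	prefix := varint.ToUvarint(0xe7)

	// multibase base58btc encoding prepends the 'z' identifier
	out, err := multibase.Encode(multibase.Base58BTC, append(prefix, compressed...))
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}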

Samyoul (Member) commented Jun 4, 2020

After speaking with @jarradh today we've settled that the main reason for making these changes is for UX. Any functional benefit is coincidental and is not the focus of this change.

All we want to do is:

  • make chat keys smaller and less daunting for the users.
  • make the key format more robust for any future change, and/or interoperability requirements with 3rd parties.

arnetheduck (Member) commented Jun 12, 2020

With regards to on-chain formats, it should generally be less necessary to use something like multiformats or self-identifying keys - as long as the contract can be uniquely identified, that is enough to disambiguate the data within - among other things, this is why ethereum itself for example strips the compression type from the secp signatures it stores - it's just waste - a similar argument can be applied here.

Samyoul (Member) commented Jun 16, 2020

FYI, I've got a working unspecced version of this: #1990.

This supports

  • secp256k1 pks
  • bls12-381 g1 pks
  • bls12-381 g2 pks

Uses multicodec identifiers for key types, and multibase encoding of bytes.

Samyoul (Member) commented Jun 17, 2020

Spec is now submitted for peer review status-im/specs#137

Samyoul added a commit that referenced this issue Jun 23, 2020
## What has changed?

I've introduced public binding functionality that will compress and decompress public keys across a variety of encodings and key types. This functionality supports all major byte encoding formats and the following EC public key types:

- `secp256k1` pks
- `bls12-381 g1` pks
- `bls12-381 g2` pks

## Why make the change?

We want shorter public (chat) keys and we want to be future proof and encoding agnostic. See the issue here #1937

---

* Added basic signature for compresspk and uncompresspk

* Added basic encoding information

* make vendor

* formatted imports for the linter

* Reformatted imports hoping linter likes it

* This linter is capricious

* Added check that the secp256k1 key is valid

* Added test for valid key

* Added multiformat/go-varint dep

* Added public key type handling

* Added key decompression with key type handling

* Added handling for '0x' type indentifying

* Added more robust testing

* Less lint for the linting gods

* make vendor for bls12_381

* Added bls12-381 compression tests

* Added decompress key expected results

* Refactor of typed and untyped keys in tests

* Lint god appeasment

* Refactor of sample public keys

* Implemented bls12-381 decompression

* gofmt

* Renamed decode/encode funcs to be more descriptive

* Added binary bindings for key de/compression

* Refactor of func parameters

gomobile is a bit tempermental using raw bytes as a parameter, so I've decided to use string only inputs and outputs

* gofmt

* Added function documentation

* Moved multiformat de/compression into api/multiformat ns

* Moved multiformat de/compression into api/multiformat ns

* Changed compress to serialize on API
Samyoul (Member) commented Jun 23, 2020

#1990 Has been merged.

Samyoul added a commit to status-im/specs that referenced this issue Jun 24, 2020
## What has changed?

I've added detailed specs for the implementation of public key compression and decompression. The specifications detail the use of the following `multiformat` features:

- `multibase`
- `multicodec`
- `unsigned-varint`

`multiformat` is used to ensure that the implementation has as much flexibility and robustness as feasible.

## Why make the change?

The usage of key de/compression is outside the typical usage of public keys and requires a degree of background knowledge to correctly implement. The purpose of this specification change is to provide this needed background knowledge.

Please also see

- status-im/status-go#1937
- status-im/status-go#1990

---

* Added Public key compression specs

* Added recommendation for encoding type of compressed keys

* Added unrecognised words to wordlist

* Add multibase to the wordlist

* Added a basic example of the multiformat EC key compression concept

* Added parsable to wordlist

* Hex is the only Lingua Franca we need

* Language to make pk de/compression SHOULD implement

* Added terms to glossary explaining key de/compression

* Change terminology from compress to serialise

* Added rationale for public key compression

* Added deserialization to the wordlist

* Concise sentence

* Added url to the wordlist
Samyoul (Member) commented Jun 24, 2020

status-im/specs#137 Has been merged.

Now readable at https://specs.status.im/spec/2#public-key-serialization

Samyoul (Member) commented Jul 1, 2020

As far as status-go is concerned, this issue is resolved. There still remain consuming or dependent implementations:
