Enable multihash support for swarm root hashes & ENS #186

cobordism · 2018-01-02T12:40:57Z

Goal: as described in #166 we want to be able to request swarm data using URLs of the form bzz://<multi-hash>/path/in/manifest.

The reason is that this will allow people to store multi-hashes in the ENS resolver contracts at "content", thereby allowing swarm, ipfs, and other systems to exist side by side.

This change also allows us to add to ENS Swarm content that has been uploaded with the --encrypt flag. In the current system that is not possible.

Enable retrieval of swarm-content using a multi-hash in the URL
Generate a multi-hash when uploading swarm content
Document the functionality in the swarm docs
Notify the ENS guys -> Need new resolver and new ENS tools.
Update all our own ENS names to use a multi-hash


gbalint · 2018-01-04T09:07:40Z

Is this surely needed for 0.3?

cobordism · 2018-01-04T11:05:36Z

I bundled all the backward compatibility breaking changes under the 0.3 label.

cobordism · 2018-01-04T11:06:06Z

let's not change the hashes that go into ENS twice. Once for BMT and once more for multihash.

gbalint · 2018-01-04T14:53:20Z

Ok, I get it, let's not break compatibility again after 0.3 if possible

cobordism · 2018-04-26T14:31:12Z

I'll repeat my comment from Issue 440 - because the --encrypt flag is new.

With the advent of encrypted swarm uploads (swarm up --encrypt) requiring double length hashes, it is time to update swarm to use multi-hash.

Adding the double-length encrypted swarm hashes to ENS would require a lot of changes. According to Arachnid:
"Presently the return type for content hashes is 'bytes32'. Changing this would require writing a new resolver (ideally, after writing an EIP describing the new type). Existing tools that use the resolver (including manager.ens.domains) would need updating to support the new type."

And if we are already in need of changing so much, this is the perfect opportunity to:

"Instead of bytes32, we can use 'bytes', and start using multihash, so we can support Swarm, IPFS, and any other content-addressed system."
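To make the size problem concrete, here is a minimal sketch. It assumes an encrypted reference is laid out as a 32-byte hash followed by a 32-byte decryption key (the thread only says "double length"), and `fits_bytes32` is a hypothetical helper for illustration, not part of Swarm or ENS.

```python
BYTES32_LEN = 32  # size of the resolver's current 'bytes32' content field


def fits_bytes32(reference: bytes) -> bool:
    """Return True if the reference fits into a fixed-size bytes32 field."""
    return len(reference) <= BYTES32_LEN


# Placeholder values (all zero bytes), not real swarm references.
plain_ref = bytes(32)
# Assumed layout for an encrypted upload: hash followed by decryption key.
encrypted_ref = bytes(32) + bytes(32)

print(fits_bytes32(plain_ref))      # True
print(fits_bytes32(encrypted_ref))  # False
```

A variable-length 'bytes' return type has no such limit, which is why the quoted suggestion would cover both cases.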

Neurone · 2018-11-24T02:12:12Z

Enable retrieval of swarm-content using a multi-hash in the URL

Saying multihash does not seem to me enough to set a standard for URLs, because a multihash is binary.
I suggest choosing a base58 representation using the IPFS dictionary because:

even with hashes 2 bytes longer (34 vs 32 in this case), it is shorter than the hex representation currently used by Swarm URLs
it can work seamlessly with IPFS
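The length claim can be checked with a small sketch. The alphabet below is the base58 alphabet used by Bitcoin and IPFS; `b58encode` is a hand-written helper for illustration, the digest is a placeholder rather than a real swarm hash, and the 0x1b20 prefix is the keccak256 multihash prefix discussed later in this thread.

```python
B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def b58encode(data: bytes) -> str:
    """Encode bytes as base58 (big-endian), keeping leading zero bytes as '1'."""
    n = int.from_bytes(data, "big")
    out = ""
    while n > 0:
        n, rem = divmod(n, 58)
        out = B58_ALPHABET[rem] + out
    pad = len(data) - len(data.lstrip(b"\x00"))
    return "1" * pad + out


digest = bytes(range(32))                 # placeholder 32-byte digest
multihash = bytes([0x1B, 0x20]) + digest  # 34 bytes: type, length, digest

hex_form = digest.hex()           # what Swarm URLs carry today
b58_form = b58encode(multihash)   # the suggested representation

print(len(hex_form), len(b58_form))  # 64 46
```

So even with the two extra prefix bytes, the base58 string is 18 characters shorter than the plain hex digest.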

acud · 2018-11-24T14:59:39Z

I suggest to choose a base58 representation using the IPFS dictionary because ...

All well and good, but Keccak256 hashes were picked in order to provide seamless interoperability with the Ethereum blockchain (which uses keccak256). This allows better and easier integration with smart contracts for various purposes. And since Swarm is, after all, a core project of the Ethereum Foundation, this is highly unlikely to change.

@homotopycolimit to veer back into your questions, there's a few considerations to give room for:

1. Should we allow both the normal hash format and multihash?
2. I still don't understand how specifically multihashes would solve the problem with ENS.
3. Multihash for keccak256 is prefixed with 0x1b, are we supposed to add just two more bytes (1b) or four more bytes (0x1b)?
4. I don't understand how multihashes would solve the problem of publishing content uploaded with --encrypt. You'd still have to include the decryption key, unless there were another layer of encapsulation by another manifest, which would generate the shorter hash.
5. This could go in as part of rewriting manifests. Most of the changes go there, since the code basically has to support traversal of multihash manifests and content. I have already written most of the spec for that; however, we've de-prioritised it until we finish the xmas edition sprint.

cobordism · 2018-11-24T17:15:02Z

A few things to note:
Since creating this issue, there has been discussion at ENS about changes to the default resolver and we should incorporate those changes. In particular I don't think the 'content' field is going to be used in future.

1(one) I thought yes.
2(two) I had been under the impression that a multi hash describes itself i.e. can identify itself as a swarm or ipfs hash... but I guess that's not quite accurate.

4(four) It doesn't. I'm just saying that in order to save to ENS the reference to an encrypted swarm upload, we have to change the ENS resolvers anyway.
5(five). I don't envision any new hashes in any manifest. This is only about requesting something via the http interface. Internally we continue to use standard swarm hashes.

This all came about when Swarm Feeds were introduced. They were (are?) using multihashes. Do you know the current state of affairs there?

edit: autoformatting messed up my numbering.

jpeletier · 2018-11-24T17:25:10Z

should we allow both normal hash format and multihash? I still don't understand how specifically multihashes would solve the problem with ENS

I don't see how multihash is helping us except by complicating things around Swarm code. Swarm works using 32-byte hashes only, so why does it need to understand what type of hash it is reading out of ENS? It should just expect 32 bytes out of the resolver, and if those 32 bytes interpreted as a Keccak256 don't yield content after looking up, well, so that content does not exist.

Multihash for keccak256 is prefixed with 0x1b, are we supposed to add just two more bytes (1b) or four more bytes (0x1b)?

I think you are confusing a hex representation of a byte array with how a byte array is stored. The 0x portion is not stored. And 2 hex digits are stored in ENS as 1 byte. Thus, 1b takes 1 byte.

Multihash for Keccak256 is actually prefixed with 0x1b20 which is 2 bytes. 1 byte hash type and 1 byte hash length (0x20 = 32 ). Again, the 0x is just part of the printable, human-readable representation. This is not stored.
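A minimal sketch of the byte layout described above. `to_multihash` is a hypothetical helper written for this example (it is not the multihash library API), and the digest is a placeholder value.

```python
KECCAK256_CODE = 0x1B  # hash type byte, as quoted in this thread
DIGEST_LEN = 0x20      # hash length byte: 0x20 == 32


def to_multihash(digest: bytes) -> bytes:
    """Prefix a 32-byte digest with its 2-byte multihash header."""
    if len(digest) != DIGEST_LEN:
        raise ValueError("expected a 32-byte digest")
    return bytes([KECCAK256_CODE, DIGEST_LEN]) + digest


digest = bytes.fromhex("ab" * 32)  # placeholder digest
mh = to_multihash(digest)
printable = "0x" + mh.hex()        # human-readable form only

print(len(mh))         # 34 bytes stored: 2 prefix bytes + 32 digest bytes
print(printable[:6])   # 0x1b20
print(len(printable))  # 70 characters on screen: "0x" + 68 hex digits
```

The stored value is the 34-byte array; the "0x" and the two-characters-per-byte expansion exist only in the printed string.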

I don't understand how multihashes would solve the problem of publishing content uploaded with --encrypt. You'd still have to include the decryption key, unless there were another layer of encapsulation by another manifest, which would generate the shorter hash.

What is the point of encrypting something and then publishing the decryption key somewhere like ENS where everyone can see it? When browsing such an encrypted website, only the content hash should be in ENS, and the decryption key should be something you have in some sort of wallet. Perhaps the browser should prompt you for it, like when you try to access a username/password protected URL.

If you insist on publishing the decryption key, then some sort of manifest should do the trick.

jpeletier · 2018-11-24T17:36:29Z

This all came about when Swarm Feeds were introduced. They were (are?) using multihashes. Do you know the current state of affairs there?

Swarm Feeds does not use multihashes anymore. It works like a key-value store in which the key is YourEthAddress | Topic and the value is an arbitrary byte array of up to 3963 bytes. You can store arbitrary data there. It can be a multihash, a small JSON file or whatever.

If, however, you choose to store in that arbitrary data something that looks like a multihash, then you can use that Feed in combination with bzz:. When a bzz: URL is requested with a Feeds manifest hash, the referenced feed is looked up and the value is checked to see if it is a multihash for content. If so, that multihash is looked up to see if it points to content and that content is returned.

Note that for the above scheme to work, the hash stored in the Feed would not need to be a multihash. It could be a straightforward regular Swarm hash if we make the necessary change.
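The check described above could look roughly like the sketch below. This is not the go-ethereum code: `content_hash_from_feed_value` and its `accept_plain` parameter are hypothetical names, with `accept_plain=True` standing in for the proposed change of accepting a regular 32-byte Swarm hash.

```python
from typing import Optional

MULTIHASH_PREFIX = bytes([0x1B, 0x20])  # keccak256 type byte + length byte
SWARM_HASH_LEN = 32


def content_hash_from_feed_value(value: bytes, accept_plain: bool = False) -> Optional[bytes]:
    """Return the 32-byte swarm hash carried by a feed value, or None."""
    expected_len = len(MULTIHASH_PREFIX) + SWARM_HASH_LEN
    # Behaviour described in the thread: the value must look like a multihash.
    if len(value) == expected_len and value.startswith(MULTIHASH_PREFIX):
        return value[len(MULTIHASH_PREFIX):]
    # Proposed change (assumption): also accept a bare 32-byte swarm hash.
    if accept_plain and len(value) == SWARM_HASH_LEN:
        return value
    # Anything else (JSON, arbitrary bytes) is not a content reference.
    return None


swarm_hash = bytes(range(32))  # placeholder, not a real swarm hash
print(content_hash_from_feed_value(MULTIHASH_PREFIX + swarm_hash) == swarm_hash)
print(content_hash_from_feed_value(swarm_hash))
print(content_hash_from_feed_value(swarm_hash, accept_plain=True) == swarm_hash)
```

With `accept_plain` left off, a feed update holding only the 32-byte hash yields `None`, which matches the failed lookups people run into when they forget the prefix.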

The whole 0x1b20 prefix thing is driving most people using Feeds + bzz: + ENS nuts. I am in favor of removing it altogether. If you agree, I'll be more than happy to open a PR. This is just a few lines of code.

cobordism · 2018-11-24T17:40:55Z

one of the points of encrypting something is so that the storers don't know the contents of what they are storing.
Even if you publicise the decryption key of the root chunk on ENS, you still gain something.
If you are running a swarm node and receive an encrypted chunk, you will not know what root hash it belongs to, and you will not know if the decryption key is in ENS unless you try downloading everything registered there.

Indeed in future the encrypted upload might be the only one. No more plaintext.

cobordism · 2018-11-24T17:43:01Z

@ Javier, what is the best way forward for feeds?

jpeletier · 2018-11-24T17:43:01Z

one of the points of encrypting something is so that the storers don't know the contents of what they are storing.

ok.

Indeed in future the encrypted upload might be the only one. No more plaintext.

I would then store the encrypted key in a manifest. The hash of that manifest would go to ENS. That way no changes to ENS are required. This is a huge advantage.

jpeletier · 2018-11-24T17:45:20Z

@ Javier, what is the best way forward for feeds?

@homotopycolimit regarding what? Regarding multihash, Feeds does not use/have any multihash anywhere in its code.

I posted above another comment about Feeds. Does that answer your question?

cobordism · 2018-11-24T17:52:07Z

I saw your comment and thus asked what is the best way forward? Shall we close this issue and change swarm feeds to just use regular swarm hashes?

in regards to decryption keys -- the manifests are themselves encrypted. Everything is. You need a decryption key just to get started. (calling @nagydani - what do you make of unencrypted manifests containing references to encrypted content?)

jpeletier · 2018-11-24T18:02:55Z

I saw your comment and thus asked what is the best way forward? Shall we close this issue and change swarm feeds to just use regular swarm hashes?

TL;DR. Yes

Longer read:

Just a clarification: Feeds does not "use" multihashes, it blindly stores arbitrary data. Whether that data is a multihash or not, Feeds does not know or care.

bzz: is a "client" of Feeds. In that usage, it expects Feeds to return a 34-byte multihash (2 bytes 0x1b20 prefix + 32 bytes swarm hash). So somehow, you should have stored those 34 bytes in a feed update before you use bzz, or your lookup will fail.

I would change bzz: to expect a regular 32-byte swarm hash instead of a multihash. I can do this pretty quickly and am more than happy to, since it would reduce confusion everywhere.

jpeletier · 2018-11-24T18:08:51Z

in regards to decryption keys -- the manifests are themselves encrypted. Everything is. You need a decryption key just to get started.

I would add an additional manifest that points to the actually encrypted manifest+data, because even if you do not store the decryption key in Swarm you would then be storing it in ENS: nodes could also scan ENS for decryption keys too (?), so what is the point?

cobordism · 2018-11-24T23:46:19Z

I think using the same hashes for feeds as for regular content sounds good, but is not my decision to make.
let's discuss at next round-table on Tuesday.

To clarify one more point about encryption: every chunk in the chunk tree of a dataset is encrypted with a different key. Only the decryption key of the root chunk would be visible in ENS. You'd have to recursively download the entire trie in order to decrypt the data chunks. In particular, if you hold one chunk and do not know what dataset it belongs to, you cannot decrypt it.

acud · 2018-11-25T04:57:41Z

I think you are confusing a hex representation of a byte array with how a byte array is stored. The 0x portion is not stored. And 2 hex digits are stored in ENS as 1 byte. Thus, 1b takes 1 byte.

I am facepalming myself 😭

Multihash for Keccak256 is actually prefixed with 0x1b20 which is 2 bytes. 1 byte hash type and 1 byte hash length (0x20 = 32 ). Again, the 0x is just part of the printable, human-readable representation. This is not stored.

I remembered this but somehow missed the point where the length is written into the multihash. Again facepalm.

in regards to decryption keys -- the manifests are themselves encrypted. Everything is. You need a decryption key just to get started. (calling @nagydani - what do you make of unencrypted manifests containing references to encrypted content?)

That's a bit of a problem actually, because when you create an unencrypted manifest with that reference, it would be possible for a node to intercept that manifest (it would probably fit into one chunk) and the decryption key could leak. The same would apply to retrieval requests, as forwarding nodes could intercept decryption keys and encrypted references.

@homotopycolimit I think this issue overlaps with #940 (which will actually give a real solution to the problem; well, it won't solve storing decryption keys on ENS).

Multihashes are just half of a solution since they only describe the type of the hashing algorithm; they don't describe the storage infrastructure on which the content hash resides. Unless web3 storage and delivery providers (swarm, ipfs, storj) work out a convention between themselves for embedding Dstorage provider identifiers into the multihash, this won't really solve anything.

zelig · 2018-11-25T05:38:22Z

In particular, if you hold one chunk and do not know what dataset it belongs to,

Reverse indexing ENS is a pretty simple thing to do, so only those who do not want to know will not know.
But yes, it is better than nothing.
BTW 'root access' manifests (the ones that contain the encrypted reference and a link to an ACT) are of course unencrypted so they can be referenced by 32 bytes.
I remember when we ditched this whole topic with ENS because of this.

Now as for supporting multihash on bzz, i am ok with that but find it a bit pointless, since if any client knows that 0x1b20.... hashes need swarm to resolve they can just as well trim this prefix before handing it to bzz ...

acud · 2018-11-25T05:38:38Z

Whoops, it seems that in the last few weeks the discussion has progressed in the provided EIP.

See:
https://ethereum-magicians.org/t/eip1577-multiaddr-support-for-ens/1969
status-im/status-mobile#6688
https://eips.ethereum.org/EIPS/eip-1577
ethereum/EIPs#1577

New contract: https://github.com/ensdomains/resolvers/blob/master/contracts/PublicResolver.sol

@homotopycolimit I think we can close this issue

jpeletier · 2018-11-25T09:03:50Z

Now as for supporting multihash on bzz, i am ok with that but find it a bit pointless, since if any client knows that 0x1b20.... hashes need swarm to resolve they can just as well trim this prefix before handing it to bzz ...

Given what I have read by exploring the links posted by @justelad regarding multiaddr, I think we should remove the whole multihash stuff from `bzz:`, since it is confusing people using Feeds, and any indicator that a hash is intended for Swarm will live in ENS anyway. This will reduce our complexity. If you agree, I'll open a PR to do this.

acud · 2018-11-25T10:49:38Z

@jpeletier my vote is 👍 for getting rid of multihashes at this point. They provide no real benefit in the codebase for our current use cases.

cobordism · 2018-11-25T11:29:35Z

Ok. Let's get rid of multihash in swarm and work towards using https://eips.ethereum.org/EIPS/eip-1577 for ENS.

I will now close this issue. We can create new ones as needed.

jpeletier · 2018-11-26T00:13:50Z

PR to remove multihash: ethereum/go-ethereum#18175

cobordism added the enhancement label Jan 2, 2018

cobordism added this to To do in Swarm Core - Sprint planning via automation Jan 2, 2018

cobordism mentioned this issue Jan 2, 2018

multihash support in swarm #166

Closed

cobordism assigned zelig Jan 2, 2018

cobordism added this to the 0.3 milestone Jan 2, 2018

gbalint moved this from To do to Backlog in Swarm Core - Sprint planning Jan 2, 2018

cobordism assigned nolash Jan 25, 2018

cobordism mentioned this issue Apr 26, 2018

Multihash support for root-manifests and ENS #440

Closed

This was referenced Apr 26, 2018

ENS resolver for mutable resource updates #205

Closed

swarm/api: Multihash handler for resource updates in swarm api #400

Closed

gbalint added the http/rpc/ipc label Apr 27, 2018

cobordism modified the milestones: 0.3 breaking changes, 0.3.1 Jun 7, 2018

cobordism moved this from Backlog to To do in Swarm Core - Sprint planning Jun 14, 2018

gbalint moved this from To do to Backlog in Swarm Core - Sprint planning Jun 20, 2018

gbalint removed this from the 0.3.1 milestone Aug 2, 2018

cobordism unassigned nolash Nov 25, 2018

cobordism closed this as completed Nov 25, 2018

Swarm Core - Sprint planning automation moved this from Backlog to Done Nov 25, 2018

This was referenced Nov 29, 2018

database rewrite - meta + timeline #1027

Closed

Hashing refactor - meta + timeline #1039

Open
