if the Permanent Web is “content-addressable”, could it be designed so that each file has only one address? #126
Recently at ipfs/kubo#875 (comment) I have once again encountered the following fact:
- one file might have several different IPFS hashes,
- if several files have different IPFS hashes, then these files still might be the same (content-wise),
- the result of `ipfs cat $HASH | ipfs add` is not always the original hash
(it's all the same fact, just rephrased). Apparently there are several factors (encoding, sharding, IPLD) that influence the IPFS hash and make it different even if the file's contents are not different at all.
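To make the effect concrete, here is a toy sketch (not the real IPFS chunker or merkledag format; `toy_merkle_root` and its chunk sizes are made up for illustration). It only shows the general mechanism: a hash computed over a chunked tree depends on how the content was split, while a flat hash of the bytes does not.

```python
import hashlib


def flat_sha256(data: bytes) -> str:
    """Plain SHA-256 over the raw bytes; independent of any chunking."""
    return hashlib.sha256(data).hexdigest()


def toy_merkle_root(data: bytes, chunk_size: int) -> str:
    """Toy two-level tree: hash each fixed-size chunk, then hash the
    concatenation of the chunk digests. This is an illustrative stand-in,
    NOT the actual IPFS object layout."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)] or [b""]
    leaf_digests = [hashlib.sha256(chunk).digest() for chunk in chunks]
    return hashlib.sha256(b"".join(leaf_digests)).hexdigest()


content = b"the same file contents " * 1000

# Same bytes, two different (hypothetical) chunk sizes.
root_small = toy_merkle_root(content, chunk_size=256)
root_large = toy_merkle_root(content, chunk_size=1024)

print("tree roots equal: ", root_small == root_large)
print("flat hashes equal:", flat_sha256(content) == flat_sha256(bytes(content)))
```

The two tree roots differ although the content is byte-for-byte identical, which is the same kind of divergence described above.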
I have then found myself questioning whether the current IPFS URIs (designed in ipfs/kubo#1678) are useful in hyperlinks of the Permanent Web. After all, if a hyperlink references an IPFS hash, then that hyperlink by design becomes broken (forever) once that hash can no longer be used to retrieve the file. Even if someone somewhere discovers such a lost file in some offline archive and decides to upload that file (or the whole archive) to the Permanent Web, the file is likely to yield a different IPFS hash, and thus an old hyperlink (which references the original IPFS hash) is still doomed to remain broken forever. Such behaviour is not okay for the Permanent Web.
What can be done to improve it?
After a few minutes of hectic googling I've encountered @btrask's design of “Common content address URI format”, which uses URIs such as `hash://sha256/98493caa8b37eaa26343bbf7` that are based on cryptographic hashes of the addressed content. As long as the hash algorithm (“sha256”) stays the same, each file has only one address.
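A minimal sketch of how such a URI could be derived, assuming the hash string is the full lowercase hex digest (the example above appears shortened, and the exact encoding rules belong to @btrask's design, not to this snippet). The helper name `hash_uri` is hypothetical.

```python
import hashlib


def hash_uri(content: bytes, algorithm: str = "sha256") -> str:
    """Build a hash://<algorithm>/<hex digest> URI from raw content.
    Assumption: the digest is written as full-length lowercase hex."""
    digest = hashlib.new(algorithm, content).hexdigest()
    return f"hash://{algorithm}/{digest}"


# The same bytes always give the same URI, no matter who computes it or where.
print(hash_uri(b"hello world"))
```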
Besides its main advantage (the improved immutability of the addresses), it has a couple of additional advantages:
- Different content-addressable P2P-distributed storages can use mutually compatible URIs.
- Well-known cryptographic tools can be used to calculate hashes (and subsequently generate addresses) of files even if an implementation of a specific P2P-distributed storage is not readily available. For example, consider `SubtleCrypto.digest`, available in some Web browsers before JS IPFS is completed.
Therefore here's a proposal: implement such addressing on top of IPFS to ensure that each file has only one address (minor correction: “only one” until multihash is upgraded from `sha256` to another algorithm and changes the address inevitably), an address that is determined only by the cryptographic hash of the file's content.
As an address, @btrask's scheme of `hash://algorithm/hashString` is too long and also not similar to the other IPFS addresses. I propose the form `/ipmh/hashString`, where `hashString` is a base58-encoded multihash of the file's content (not of the file's merkledag!) and `ipmh` means “InterPlanetary Multihash”. It's better to refrain from the idea of `/iphs/` (“InterPlanetary Hash System”) because `iphs` and `ipns` are visually alike (their likeness might cause perception errors in OCR and human vision).
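As a rough sketch of what computing such an address could look like: the snippet below builds a sha256 multihash (function code `0x12`, then the digest length, then the digest) over the raw file bytes and base58-encodes it with the Bitcoin alphabet. The `/ipmh/` prefix and the function names are only the proposal's and this example's assumptions; nothing here is an existing IPFS API.

```python
import hashlib

# Bitcoin-style base58 alphabet (no 0, O, I, l).
BASE58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58_encode(raw: bytes) -> str:
    """Encode bytes as base58; each leading zero byte becomes a '1'."""
    number = int.from_bytes(raw, "big")
    encoded = ""
    while number > 0:
        number, remainder = divmod(number, 58)
        encoded = BASE58_ALPHABET[remainder] + encoded
    leading_zeros = len(raw) - len(raw.lstrip(b"\x00"))
    return "1" * leading_zeros + encoded


def sha256_multihash(content: bytes) -> bytes:
    """Multihash layout: <hash function code><digest length><digest>."""
    digest = hashlib.sha256(content).digest()
    return bytes([0x12, len(digest)]) + digest


def ipmh_address(content: bytes) -> str:
    """Hypothetical /ipmh/ address: hash of the file's bytes, not of a merkledag."""
    return "/ipmh/" + base58_encode(sha256_multihash(content))


print(ipmh_address(b"hello world"))
```

Because the input is the plain byte content, re-adding a recovered copy of a lost file would reproduce the same address regardless of chunking or encoding options.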
I am certain that an implementation won't be an easy task and would need at least the following:
- some DHT to track a multitude of IPFS hashes that correspond to cryptographic IPMH hashes
  - Note 1. These are currently `sha256` hashes. Such (or similar) DHT would also eventually be necessary to find new (upgraded) multihashes that correspond to the current `sha256` multihashes.
  - Note 2. If a system starts with an `/ipfs/` address which it cannot resolve (the “forever dead hyperlink” case, discussed above), it should try using DHT backwards (to find an IPMH for such IPFS) and then use IPMH to look for equivalent IPFS hashes (where “equivalent” means that they designate the same content as the original IPFS).
- changes in `ipfs add` to ensure that `/ipmh/` addresses are issued by default
- changes in `ipfs get` and `ipfs cat` to ensure that `/ipmh/` addresses can be used to retrieve files
- changes in `ipfs mount` to ensure that `/ipmh` mountpoint is mounted
- similar changes in other commands
- changes in the main gateway to ensure that `https://ipfs.io/ipmh/` addresses are served
- similar changes in the local gateways listening on `/ip4/127.0.0.1/tcp/8080`
- changes in Firefox addon and Chrome extension to redirect `https://ipfs.io/ipmh/` addresses (and, optionally, also @btrask's `hash://sha256/` addresses).
  - Here “optionally” means that `hash://` (unlike `ipmh://` or `https://ipfs.io/ipmh/`) is not necessarily IPFS-related and thus the user might want another application (such as StrongLink) to handle it. (Such ambiguity is similar to the case of `magnet:` hyperlinks.)
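The lookup described in the first item and its Note 2 could be sketched roughly as follows. An in-memory dictionary stands in for the DHT, and the class name, method names and hash strings are all hypothetical; the sketch also does not verify that an announced pair really designates the same content, which a real network would have to do.

```python
class EquivalenceIndex:
    """Toy stand-in for a DHT mapping a content multihash (IPMH)
    to every IPFS hash known to designate the same content."""

    def __init__(self):
        self._ipfs_by_ipmh = {}   # IPMH -> set of IPFS hashes
        self._ipmh_by_ipfs = {}   # IPFS hash -> IPMH (the backward direction)

    def announce(self, ipmh: str, ipfs_hash: str) -> None:
        """Record that ipfs_hash resolves to content whose multihash is ipmh."""
        self._ipfs_by_ipmh.setdefault(ipmh, set()).add(ipfs_hash)
        self._ipmh_by_ipfs[ipfs_hash] = ipmh

    def ipfs_hashes_for(self, ipmh: str) -> set:
        """Forward lookup: all IPFS hashes announced for one IPMH."""
        return set(self._ipfs_by_ipmh.get(ipmh, set()))

    def equivalents(self, dead_ipfs_hash: str) -> set:
        """Note 2: go backwards from an unresolvable IPFS hash to its IPMH,
        then forwards to the other IPFS hashes for the same content."""
        ipmh = self._ipmh_by_ipfs.get(dead_ipfs_hash)
        if ipmh is None:
            return set()
        return self.ipfs_hashes_for(ipmh) - {dead_ipfs_hash}


index = EquivalenceIndex()
# Placeholder strings, not real hashes.
index.announce("ipmh-of-file", "ipfs-hash-original")
index.announce("ipmh-of-file", "ipfs-hash-after-reupload")

print(index.equivalents("ipfs-hash-original"))
```

With such an index, an old hyperlink to the original IPFS hash could be redirected to the re-uploaded copy instead of staying dead.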
However, it really seems that there's no other way to make the Permanent Web more permanent, to prevent dead hyperlinks from staying dead.
(Everything that is said here about the files can probably be also said about IPFS directory structures; but I am not sure.)