Standard URI for ipfs and ipns protocols (Discussion) #1678

Closed
Tracked by #248
larsks opened this issue Sep 10, 2015 · 90 comments
Labels
need/community-input Needs input from the wider community

Comments

@larsks

larsks commented Sep 10, 2015

I would like to add ipfs support to a tool that expects a URL-format specification. Hypothetically, let's say I wanted to add ipfs support to curl. I would need a scheme:data format specification that follows the standard url format.

I asked about this on irc and immediately folks started trying to direct me away from URLs to the multiaddr spec. Setting aside for the moment that I'm not clear what problem multiaddr is trying to solve or why URLs aren't appropriate, some tools will simply require a URL format to operate.

In the absence of any other suggestions, I would like to suggest that we document the following standard forms:

  • ipfs:<hash>[/<path>] for IPFS objects, as in:

    ipfs:QmPXME1oRtoT627YKaDPDQ3PwA8tdP9rWuAAweLzqSwAWT/readme
    
  • ipns:<hash>[/path] for IPNS names:

    ipns:QmXfrS3pHerg44zzK6QKQj6JDk8H6cMtQS7pdXbohwNQfK/pages/gpg.md
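
A minimal sketch of how a URL-oriented tool might split the two proposed forms into scheme, hash, and optional path. The names `IpfsRef` and `parse_ref` are hypothetical, and the hash is treated as an opaque string rather than validated as a multihash.

```python
from typing import NamedTuple, Optional


class IpfsRef(NamedTuple):
    scheme: str          # "ipfs" or "ipns"
    hash: str            # opaque identifier, not validated here
    path: Optional[str]  # None when no path follows the hash


def parse_ref(uri: str) -> IpfsRef:
    """Parse the proposed `ipfs:<hash>[/<path>]` and `ipns:<hash>[/<path>]` forms."""
    scheme, sep, rest = uri.partition(":")
    if not sep or scheme not in ("ipfs", "ipns"):
        raise ValueError(f"not an ipfs:/ipns: reference: {uri!r}")
    hash_, _, path = rest.partition("/")
    if not hash_:
        raise ValueError(f"missing hash in {uri!r}")
    return IpfsRef(scheme, hash_, path or None)


print(parse_ref("ipfs:QmPXME1oRtoT627YKaDPDQ3PwA8tdP9rWuAAweLzqSwAWT/readme"))
```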
    
@willglynn

I strongly agree that IPFS objects should be identifiable by URI, mostly because of uniformity as described by RFC 3986:

  Uniformity provides several benefits.  It allows different types
  of resource identifiers to be used in the same context, even when
  the mechanisms used to access those resources may differ.  It
  allows uniform semantic interpretation of common syntactic
  conventions across different types of resource identifiers.  It
  allows introduction of new types of resource identifiers without
  interfering with the way that existing identifiers are used.  It
  allows the identifiers to be reused in many different contexts,
  thus permitting new applications or protocols to leverage a pre-
  existing, large, and widely used set of resource identifiers.

I don't care if go-ipfs uses URIs internally, or if browsers will support it, or anything like that – there should be a canonical, standard way to refer to IPFS objects using URIs.

The above suggestion seems entirely reasonable to me:

ipfs:QmPXME1oRtoT627YKaDPDQ3PwA8tdP9rWuAAweLzqSwAWT/readme

I could also see an argument for IPFS identifiers being URNs per RFC 2141:

urn:ipfs:QmPXME1oRtoT627YKaDPDQ3PwA8tdP9rWuAAweLzqSwAWT/readme

IPNS is separate. I could see an argument for it being a separate scheme, as proposed above:

ipns:QmXfrS3pHerg44zzK6QKQj6JDk8H6cMtQS7pdXbohwNQfK/pages/gpg.md

On the other hand, I could see the IPNS public key hash being a naming authority per RFC 3986.

 Many URI schemes include a hierarchical element for a naming
 authority so that governance of the name space defined by the
 remainder of the URI is delegated to that authority (which may, in
 turn, delegate it further).  The generic syntax provides a common
 means for distinguishing an authority based on a registered name or
 server address, along with optional port and user information.

Authorities are preceded by a double slash, so:

ipns://QmXfrS3pHerg44zzK6QKQj6JDk8H6cMtQS7pdXbohwNQfK/pages/gpg.md

…and if IPNS public key hashes are interpreted as an authority, distinct from the global (no-authority) IPFS paths, IPFS and IPNS could be viewed as two halves of one scheme:

ipfs:QmPXME1oRtoT627YKaDPDQ3PwA8tdP9rWuAAweLzqSwAWT/readme
ipfs://QmXfrS3pHerg44zzK6QKQj6JDk8H6cMtQS7pdXbohwNQfK/pages/gpg.md

@larsks
Author

larsks commented Sep 10, 2015

@willglynn, thanks for your comments.

While working with this in practice, I realized that one may want to provide IPFS gateway information as part of the URL. Again, using the hypothetical example of adding IPFS support to something like curl, I need a way to tell the utility which IPFS endpoint to use. If I'm not running one locally, the utility needs to know where to find an API to fetch ipfs:QmPXME1oRtoT627YKaDPDQ3PwA8tdP9rWuAAweLzqSwAWT/readme.

Should this information always be specified external to the URL (e.g., configuration options to the tool)? Or does permitting this in the URL make sense? That would give us something like:

ipfs://<host>:<port>/QmPXME1oRtoT627YKaDPDQ3PwA8tdP9rWuAAweLzqSwAWT/readme

Where <host> and <port> specify an endpoint, and can be omitted, making the typical URL look like:

ipfs:///QmPXME1oRtoT627YKaDPDQ3PwA8tdP9rWuAAweLzqSwAWT/readme

@willglynn

In my view, IPFS gateway information is external to the URI. I think it's analogous to HTTP/FTP/SOCKS proxy information for which tools like curl are configured using separate parameters or environment variables. An IPFS object should always have the same identifier regardless of how it is accessed, and especially in the case of IPFS, I think identity is more fundamental than location.

@larsks
Author

larsks commented Sep 10, 2015

Yeah, that was mostly my inclination as well. Externally configured it is, then.

@willglynn

The more I think about it, the more I favor treating IPNS names as a URI authority. Consider:

ip[nf]s://multihash/object

Retrieval would be processed as "resolve IPNS multihash, retrieve object".

http://domain.name/object

Retrieval would be processed as "DNS look up domain.name IN A, connect to resulting IP, HTTP GET /object".

ip[nf]s://<domain.name>/object

Retrieval would be processed as "DNS look up domain.name IN TXT, resolve resulting IPFS/IPNS identifier, retrieve object".
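
The three retrieval flows above can be sketched as a small dispatcher. This only builds a textual plan and performs no network access. The function name `retrieval_steps` is hypothetical, and treating any authority that contains a dot as a domain name is an assumption made for illustration (base58 multihashes never contain a dot).

```python
from urllib.parse import urlsplit


def retrieval_steps(uri: str) -> list[str]:
    """Return the resolution plan for a URI whose authority is a name."""
    parts = urlsplit(uri)
    authority, obj = parts.netloc, parts.path
    if parts.scheme == "http":
        return [f"DNS look up {authority} IN A",
                "connect to resulting IP",
                f"HTTP GET {obj}"]
    if "." in authority:  # assumption: a dot means a DNS name
        return [f"DNS look up {authority} IN TXT",
                "resolve resulting IPFS/IPNS identifier",
                f"retrieve {obj}"]
    return [f"resolve IPNS multihash {authority}", f"retrieve {obj}"]


for step in retrieval_steps("ipns://ipfs.io/docs/install"):
    print(step)
```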

@lidel
Member

lidel commented Sep 10, 2015

I feel this quote belongs here :-)

I want to remind everyone here that we're not actually limited by any rules. It is of course convenient and nice to work productively with everybody else, but there are certain mistakes we should not continue to make.

I'll give you an example of a break from tradition (or rather... a return to even older tradition). it is a strong goal to mend the rift between UNIX and the Web. That is, "ipfs links" should be exactly the same in both the Web, and UNIX. meaning: /ipfs/<hash>/<path>, NOT ipfs://<hash>/<path> (explicitly disobeying the scheme that the W3C insists on). Luckily for us, this is technically feasible, though it does have its bumps to work around. (and easiest done with a TLD :) ). That's fine for us as the upside of mending part of this awful rift is worth a lot.

@jbenet in ipfs/ipfs-companion#16 (comment)

So the canonical way of representing IPFS address is /ipfs/<hash>/<path> (at least that was the consensus in April, maybe it changed since then? :).

That being said, IMHO it would not hurt to additionally provide a silent support of URLs like ipfs://<hash>/<path> for interoperability with legacy software that can't use /ipfs/<hash>/<path> natively.

@whyrusleeping
Member

@lidel nothing has changed since that quote :)

@willglynn

Indeed; IPFS can choose whatever syntax it wants for canonical use. I'm suggesting that there be an official, standardized way of encoding IPFS identifiers into URIs, since it is better to interoperate with URI-centric tools than not, and since it's better to have one way of encoding IPFS identifiers into URIs than multiple incompatible methods.

I would favor ipfs:/ipfs/<hash>/<path> over ipfs://. RFC 3986's URI grammar would permit ipfs:object, ipfs:/object, and ipfs://object, but the first two constructs are interpreted as paths, while the third would be interpreted as an authority with no path. Per RFC 2718:

2.1.2 Improper use of "//" following "<scheme>:"

   Contrary to some examples set in past years, the use of double
   slashes as the first component of the <scheme-specific-part> of a URL
   is not simply an artistic indicator that what follows is a URL:
   Double slashes are used ONLY when the syntax of the URL's <scheme-
   specific-part> contains a hierarchical structure as described in RFC
   2396.  In URLs from such schemes, the use of double slashes indicates
   that what follows is the top hierarchical element for a naming
   authority.  (See section 3 of RFC 2396 for more details.)  URL
   schemes which do not contain a conformant hierarchical structure in
   their <scheme-specific-part> should not use double slashes following
   the "<scheme>:" string.

If the IPFS namespace is one big path hierarchy, then mapping IPFS / to URI ipfs:/ seems appropriate, and conversion to/from URIs is just a matter of a five-character prefix.

@lidel
Member

lidel commented Sep 10, 2015

Ok, so as long as this discussion is about interoperability layer,
I like the idea of going with a single prefix for all IPFS resources (dropping ipns://).

Starting with <hash> IPFS is hierarchical, so perhaps we should go with

  • ipfs://ipfs/<hash>/<path> (explicit /ipfs/)
  • ipfs://ipns/<hash>/<path>

Additionally we could agree to default to ipfs resource if the first segment does not match ipfs nor ipns:

  • ipfs://<hash>/<path> (implicit /ipfs/)

What are your thoughts on this?

@willglynn

A goal from @jbenet's post:

please don't do this. please please please have identifiers exactly how we have them, everywhere. Simply /ipfs/... and /ipns/....

If this is the objective, then the corresponding URIs should be ipfs:/ipfs/… and ipfs:/ipns/…. URI-centric tools would behave properly with both absolute and relative paths of the above canonical (scheme-less) IPFS form. A resource retrieved from the URI ipfs:/ipfs/hash/A referring to an absolute /ipfs/hash/B would be understood as ipfs:/ipfs/hash/B, which is consistent. Such a resource referring to a relative B would be understood to be ipfs:/ipfs/hash/B, which is also consistent.

Notably, these same references – /ipfs/hash/B and B – would work unchanged whether the underlying resource appears at the filesystem's /ipfs/hash/A, at http://gateway.ipfs.io/ipfs/hash/A, at http://localhost:8080/ipfs/hash/A, or in URI-space at ipfs:/ipfs/hash/A.

Using double-slashes (//) would sacrifice this property, since a resource at ipfs://hash/A referring to /ipfs/hash/B would resolve to ipfs://hash/ipfs/hash/B.
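
The resolution behaviour described above can be sketched as follows. This is a deliberately simplified version of RFC 3986 reference resolution: it handles only plain path references (no `..` segments, queries, or fragments) and assumes the base has an absolute path. It is written by hand because Python's `urllib.parse.urljoin` returns the reference unchanged for schemes it does not know, such as `ipfs`. The name `resolve` is hypothetical.

```python
import posixpath
from urllib.parse import urlsplit


def resolve(base: str, ref: str) -> str:
    """Resolve a path-only reference against a base URI (simplified)."""
    b = urlsplit(base)
    # An authority is present only if "//" directly follows "<scheme>:".
    has_authority = base[len(b.scheme) + 1:].startswith("//")
    prefix = b.scheme + ":" + ("//" + b.netloc if has_authority else "")
    if ref.startswith("/"):
        path = ref  # absolute reference replaces the whole path
    else:
        base_dir = posixpath.dirname(b.path) or "/"
        path = posixpath.join(base_dir, ref)  # relative: swap the last segment
    return prefix + path


print(resolve("ipfs:/ipfs/hash/A", "B"))             # ipfs:/ipfs/hash/B
print(resolve("ipfs:/ipfs/hash/A", "/ipfs/hash/B"))  # ipfs:/ipfs/hash/B
print(resolve("ipfs://hash/A", "/ipfs/hash/B"))      # ipfs://hash/ipfs/hash/B
```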

@lidel
Member

lidel commented Sep 11, 2015

Makes sense. This way a rule for legacy layer would be super simple: always add ipfs: as a prefix to the canonical name. That is all.
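
A minimal sketch of that prefix rule, assuming canonical names always begin with /ipfs/ or /ipns/. The function names are hypothetical, and percent-encoding is ignored here.

```python
PREFIX = "ipfs:"  # the five-character prefix


def path_to_uri(path: str) -> str:
    """Turn a canonical IPFS path into a URI by prepending the prefix."""
    if not path.startswith(("/ipfs/", "/ipns/")):
        raise ValueError(f"not a canonical IPFS path: {path!r}")
    return PREFIX + path


def uri_to_path(uri: str) -> str:
    """Strip the prefix to recover the canonical path."""
    if not uri.startswith(PREFIX + "/"):
        raise ValueError(f"not an ipfs:/ URI: {uri!r}")
    return uri[len(PREFIX):]


print(path_to_uri("/ipns/ipfs.io/docs/install"))  # ipfs:/ipns/ipfs.io/docs/install
```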

@willglynn

Great! Sounds like a workable direction.

There are a couple other possible pain points I can think of when mapping IPFS paths <-> URIs. Mostly, URIs are specified to be a particular subset of US-ASCII and certain characters have specific meanings, so unless IPFS conventions happen to be compatible, we'll need to use percent-encoding to bridge the gap. The specifics depend on some gory details:

@btrask

btrask commented Sep 13, 2015

I have a proposal for a content addressing URI scheme here: https://bentrask.com/notes/content-addressing.html. I traded emails with Juan Benet about it a year ago when I first wrote it. Obviously I couldn't convince him.

RFC 6920 proposes a different but similar scheme.

It would be nice if the URI scheme was common between different projects (IPFS, Camlistore, my own). A single content address could be resolved over various different systems depending on context.

Edit: I compared four different proposals here.

@jbenet
Member

jbenet commented Sep 13, 2015

Sorry, I'm late to the party.

Thanks very much @lidel for representing my viewpoint :)

I'm going to try responding only to things i think are unresolved. ask again if i missed something

ipfs://<hash>/<path>

I think @lidel elucidated my viewpoint excellently, and i do not need to express again that this is not desired because it complicates things, and forces us to add 2+ protocol identifiers.

In my view, IPFS gateway information is external to the URI. I think it's analogous to HTTP/FTP/SOCKS proxy information for which tools like curl are configured using separate parameters or environment variables. An IPFS object should always have the same identifier regardless of how it is accessed, and especially in the case of IPFS, I think identity is more fundamental than location.

Exactly right. URLs on the HTTP gateways ARE NOT IPFS Paths/URIs (they contain one, in a larger HTTP URL).

I would favor ipfs:/ipfs/<hash>/<path> over ipfs://. RFC 3986's URI grammar would permit ipfs:object, ipfs:/object, and ipfs://object, but the first two constructs are interpreted as paths, while the third would be interpreted as an authority with no path. Per RFC 2718:

Hmmmm, i'm not sure. I understand the spec... it may be ok to consider /ipfs and /ipns as naming authorities for the purposes of UX. Most people who ever see/use URLs always see them in http://. I think we should support all :, :/, ://, but actually redirect all to :// as the canonical one.

Makes sense. This way a rule for legacy layer would be super simple: always add ipfs: as a prefix to the canonical name. That is all.

I like this, maybe we make :/ canonical? it just looks so odd. and people will be weirded out by it... (we must support :// at least)

What characters are permitted in IPFS path segments? UNIX is arguably way too permissive, Windows reserves far more characters plus certain entire filenames, and various other systems are somewhere in the middle. (Edit: am I reading this right? Any byte sequence in a string is a legal path, provided it starts with e.g. /ipfs//?)
Do IPFS paths use a specific character encoding, and if so, which? UNIX doesn't, while Windows and OSX do; potentially ill-formed UTF-16 (prompting creation of WTF-8) and a particular normalization of UTF-8 respectively. (Edit: Links seem to be UTF-8. Is this enforced? Where and how?)

Yes, the paths are supposed to be UTF-8 strings. We should be enforcing it, though i don't think it's being enforced atm.
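
If paths are UTF-8 strings, mapping them into URI path syntax is a matter of percent-encoding the UTF-8 bytes. A minimal sketch using Python's standard library follows. The function names and the strict-decoding policy are assumptions for illustration, not a description of what go-ipfs does.

```python
from urllib.parse import quote, unquote


def encode_path(path: str) -> str:
    """Percent-encode a UTF-8 path for use in a URI, keeping "/" as separator."""
    return quote(path, safe="/")


def decode_path(encoded: str) -> str:
    """Decode a URI path, rejecting byte sequences that are not valid UTF-8."""
    return unquote(encoded, errors="strict")


encoded = encode_path("/ipfs/hash/na\u00efve file.txt")
print(encoded)  # /ipfs/hash/na%C3%AFve%20file.txt
```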

Is the concept of a query string (?foo) meaningful to a resource addressable at an IPFS URI? (I think probably no.)

Not at this time, though it may become relevant.

Is the concept of a fragment (#foo) meaningful to a resource addressable at an IPFS URI? (I think probably yes.)

yes.

I have a proposal for a content addressing URI scheme here: https://bentrask.com/notes/content-addressing.html. I traded emails with Juan Benet about it a year ago when I first wrote it. Obviously I couldn't convince him.

Sorry @btrask :/ -- i just disagree :)

Edit: I compared four different proposals here.

you didn't compare the proper IPFS URIs, which are paths:

/ipfs/QmeeQhGoyMQc7eQWERE88kFFq4WbdVRrjHctZhH1hPHNds/006/mdag.waist.png
/ipns/QmfVrBzjaXjWWfC7UhFrZnnFFMYA1ENPCjzxAtREaQz8MS/006/mdag.waist.png
/ipns/ipfs.io/docs/install

These are all valid in IPFS. Soon we should also have:

/dns/ipfs.io/docs/install

The first component is the protocol "scheme".


One remaining thing to address: the protocol scheme

I've known for some time now that we're going to need to have + support a protocol scheme identifier, for all the use cases that absolutely require one. Instead of using only ipfs: for everything, I'm planning to use something that's valid for the entire "Unix Web", that is, a suite of protocols that want to work both on the web and on unix and want the "same identifier" niceness.

I think we should use one of:

unixweb:
nixweb:
nix:
x:

As in:

x:/ipfs/QmeeQhGoyMQc7eQWERE88kFFq4WbdVRrjHctZhH1hPHNds/006/mdag.waist.png
x:/ipns/QmfVrBzjaXjWWfC7UhFrZnnFFMYA1ENPCjzxAtREaQz8MS/006/mdag.waist.png
x:/ipns/ipfs.io/docs/install
x:/dns/ipfs.io/docs/install
x:/bitcoin/<bitcoin-txn>
x:/bittorrent/<magnet-hash>

but happy to hear more suggestions. I know it's rude to use a one-letter scheme identifier... but hey... nobody else is using it.

@btrask

btrask commented Sep 13, 2015

I understand Juan. Sorry for the misleading comparison.

If IPFS addresses are paths, what would you think about simply using file:// URLs?

file:///ipfs/QmeeQhGoyMQc7eQWERE88kFFq4WbdVRrjHctZhH1hPHNds/006/mdag.waist.png
file:///ipns/QmfVrBzjaXjWWfC7UhFrZnnFFMYA1ENPCjzxAtREaQz8MS/006/mdag.waist.png
file:///ipns/ipfs.io/docs/install

@lidel
Member

lidel commented Sep 13, 2015

From my experience file:// proved to be problematic:

👍 Yes, it makes IPFS work out-of-the box with legacy software (as long as you have IPFS filesystem mounted via FUSE driver provided by go-ipfs).

👎 ..but if you don't have root access and/or can't set up FUSE -- bad luck
👎 if you use MS Windows or other non-unix system -- bad luck
👎 a lot of confusion due to "File not found" errors.

IMO file:// should be left as a workaround for people who can set up FUSE on local system and we should come up with a new protocol scheme for canonical use.

As a minimalist I really like the x: scheme described by @jbenet :-)

@longears

The x:/ipfs/... scheme might be similar enough to a Windows file path (e.g. "drive X", which many people have) that it would be misinterpreted by browsers and auto-converted to file:///X:/ipfs/.... Can someone on Windows check the behavior of browsers when you type that into the URL bar or use it as a link href?

Windows is supposed to use backslashes but forward slashes are often accepted and silently corrected to backslashes.

@jbenet
Member

jbenet commented Sep 14, 2015

@gatesvp or someone else using windows, could you please check what happens with x:// above?

@mappum
Contributor

mappum commented Sep 14, 2015

Just tested on Windows, x:/ becomes file:///x:/ (I tried it in the URL bar and as a link href in Chrome). However, strings longer than one character (xx:/) are kept as a protocol.
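
The ambiguity is easy to reproduce with Windows path rules. The sketch below uses Python's `ntpath` module (the Windows flavour of `os.path`, importable on any platform) to show that a one-letter scheme is indistinguishable from a drive letter while a two-letter one is not. It illustrates the path-parsing rule only and makes no claim about how Chrome implements its URL bar. The helper name is hypothetical.

```python
import ntpath


def looks_like_windows_drive(candidate: str) -> bool:
    """True if Windows path rules would read the leading part as a drive."""
    drive, _rest = ntpath.splitdrive(candidate)
    return bool(drive)


for candidate in ("x:/ipfs/QmHash/file", "xx:/ipfs/QmHash/file", "nix:/ipfs/QmHash/file"):
    print(candidate, looks_like_windows_drive(candidate))
```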

@jbenet
Member

jbenet commented Sep 14, 2015

@mappum is that the case even if you install a protocol handler for x:// ?

@mappum
Contributor

mappum commented Sep 14, 2015

is that the case even if you install a protocol handler for x:// ?

Yes, just tried adding a protocol handler to the registry for x:, the URL was still transformed.

@davidar
Member

davidar commented Sep 14, 2015

I'm using both ?foo#bar for the ia book reader, so think they should both be supported :)

also 👍 for minimalism

@lidel
Member

lidel commented Sep 14, 2015

Hm.. xx: is not bad, but how about xn: ? (uniXNamespace)

@jbenet
Member

jbenet commented Sep 14, 2015

may be worth doing:

nix://
nixweb://

wish unix:// wasn't taken by unix sockets.

@davidkwast

I'd go with
nix://

@willglynn

I like this, maybe we make :/ canonical? it just looks so odd. and people will be weirded out by it... (we must support :// at least)

nix:/ipfs/base58/resource parses as { scheme: "nix", authority: null, path: "/ipfs/base58/resource" }. A single slash denotes an absent authority followed by an absolute path. I think this matches the intent of IPFS (a single global namespace using absolute paths assigned by no central authority), which is why I suggest it as the canonical form.

nix://ipfs/base58/resource parses { scheme: "nix", authority: "ipfs", path: "/base58/resource" }. This breaks IPFS absolute paths since the first path component has moved into the authority part of the URI. I think that makes this a non-starter.

nix:///ipfs/base58/resource parses as { scheme: "nix", authority: "", path: "/ipfs/base58/resource" }. Triple slashes denote an empty authority followed by an absolute path, which is equivalent enough to the no-authority URI that it's not wrong. :/// should be supported, either in addition to or in lieu of :/. Note also that at least one library conflates these two address forms.
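
The three parses above can be checked with Python's `urllib.parse.urlsplit`. That function reports an empty string for both a missing and an empty authority, so the hypothetical helper `split_uri` below looks at the raw text after the scheme to tell them apart. It assumes the input always carries a scheme.

```python
from urllib.parse import urlsplit


def split_uri(uri: str) -> dict:
    """Split a URI into scheme, authority (None when absent), and path."""
    parts = urlsplit(uri)
    rest = uri[len(parts.scheme) + 1:]  # text after "<scheme>:"
    authority = parts.netloc if rest.startswith("//") else None
    return {"scheme": parts.scheme, "authority": authority, "path": parts.path}


for uri in ("nix:/ipfs/base58/resource",
            "nix://ipfs/base58/resource",
            "nix:///ipfs/base58/resource"):
    print(uri, "->", split_uri(uri))
```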

@jbenet
Member

jbenet commented Sep 14, 2015

Yesterday we had thought of using get://, but that's not good for non-read functionality. writes, etc.

some more:

nix:// nixweb:// unixweb:// uweb:// dweb:// path:// endpoint:// ep:// unixpath:// up:// fp:// web3://  

nix://ipfs/base58/resource parses { scheme: "nix", authority: "ipfs", path: "/base58/resource" }. This breaks IPFS absolute paths since the first path component has moved into the authority part of the URI. I think that makes this a non-starter.

I don't think it does, the browser tools could undo that change for the user.

my problem with :, :/, and :/// is that it's not what 90% of users will expect to see. regular users have no idea what the hell all of these are for, but they do know http:// and that's what they're going to type. so we have to support it regardless.

@jbenet
Member

jbenet commented Sep 14, 2015

a worry with nix:// is that it means

  • noun 1. nothing.
    • "apart from that, nix"
  • exclamation 1. expressing denial or refusal.
    • "“I owe you some money.” “Nix, nix.”"
  • verb (NORTH AMERICAN) 1. put an end to; cancel.
    • "he nixed the deal just before it was to be signed"

which is not ideal. this is the sort of thing 99% of internet users will be confused by, so it should be as clear as we can make it

@willglynn

Attaching IPFS / to :// does break references.

Picture a /ipfs/base58/document referring to related-resource, /ipfs/otherbase58/linked-document, and /ipns/domain.name/. If IPFS were mounted to a UNIX filesystem, these resolve as:

  • related-resource => /ipfs/base58/related-resource
  • /ipfs/otherbase58/linked-document => /ipfs/otherbase58/linked-document
  • /ipns/domain.name/ => /ipns/domain.name/

These work as expected – i.e. as paths.

If that same document were retrieved from http://gateway.ipfs.io/ipfs/base58/document, those references become:

  • related-resource => http://gateway.ipfs.io/ipfs/base58/related-resource
  • /ipfs/otherbase58/linked-document => http://gateway.ipfs.io/ipfs/otherbase58/linked-document
  • /ipns/domain.name/ => http://gateway.ipfs.io/ipns/domain.name/

These work as expected because gateway.ipfs.io is treated as an authority and the IPFS paths are treated as URI paths.
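
The gateway case can be reproduced with the standard library, since `urllib.parse.urljoin` knows how to resolve references for http. A short sketch using the same placeholder names as above (base58, otherbase58, domain.name):

```python
from urllib.parse import urljoin

BASE = "http://gateway.ipfs.io/ipfs/base58/document"

# The three references found in the document, resolved against the gateway URL.
references = [
    "related-resource",
    "/ipfs/otherbase58/linked-document",
    "/ipns/domain.name/",
]

for ref in references:
    print(ref, "=>", urljoin(BASE, ref))
```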

If that same document were retrieved from foo:/ipfs/base58/document or foo:///ipfs/base58/document, those references become:

  • related-resource => foo:/ or foo:///ipfs/base58/related-resource
  • /ipfs/otherbase58/linked-document => foo:/ or foo:///ipfs/otherbase58/linked-document
  • /ipns/domain.name/ => foo:/ or foo:///ipns/domain.name/

Again these work because the authority is constant, and the IPFS paths are mapped to URI paths.

If that same document were retrieved from foo://ipfs/base58/document those references become:

  • related-resource => foo://ipfs/base58/related-resource
  • /ipfs/otherbase58/linked-document => foo://ipfs/ipfs/otherbase58/linked-document (note foo://ipfs/ipfs)
  • /ipns/domain.name/ => foo://ipfs/ipns/domain.name/ (note foo://ipfs/ipns)

Please do not make foo://ipfs/base58/document the canonical IPFS URI format.

Can this be worked around client-side? Yes, but there are many more clients that assume paths are paths than there are IPFS implementations that would assume something different. I don't want to patch wget and curl and Heritrix and Scrapy and every other tool I use that follows links to have special awareness of IPFS paths just because users are used to typing foo://bar into a browser window.

the browser tools could undo that change for the user

If users typing in foo://bar is an important enough use case to suggest browser tools dedicated to fixing it, then those browser tools should redirect foo://bar to foo:/bar or foo:///bar instead, rather than trying to support base URIs of foo://bar.
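
A minimal sketch of such a redirect, assuming the tool rewrites whatever the user typed before using it as a base URI. The function name `canonicalize` is hypothetical, and collapsing every variant (including the slash-less foo:bar) to the single-slash form is an assumption made for illustration.

```python
def canonicalize(uri: str) -> str:
    """Rewrite foo:bar, foo:/bar, foo://bar and foo:///bar to foo:/bar."""
    scheme, sep, rest = uri.partition(":")
    if not sep or not scheme:
        raise ValueError(f"missing scheme in {uri!r}")
    # Drop however many slashes were typed, then add exactly one back, so the
    # would-be authority becomes the first path segment again.
    return scheme + ":/" + rest.lstrip("/")


for typed in ("foo://ipfs/base58/document",
              "foo:///ipfs/base58/document",
              "foo:/ipfs/base58/document"):
    print(typed, "->", canonicalize(typed))
```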

@jbenet
Member

jbenet commented Sep 14, 2015

Please do not make foo://ipfs/base58/document the canonical IPFS URI format.

we have to make this work. it's not an option. 99% of people on the internet will try it. I believe that we can teach the browser's nix protocol resolver how to make it work. it may be hacky, but it will prevent massive confusion. (try explaining to your grandmother why foo: and foo:/ and foo:/// work but not foo://, which is, coincidentally, everything she may be used to).

I will add that i understand your post well. i'm saying we have to work around the limitations.

@momack2 momack2 added this to Inbox in ipfs/go-ipfs May 9, 2019
@ipfs ipfs unlocked this conversation May 16, 2019
@lidel
Member

lidel commented May 16, 2019

This issue is very old and should be closed. A lot changed since 2015:

@vanrein

vanrein commented May 18, 2019

ip[nf]s://<domain.name>/object

Retrieval would be processed as "DNS look up domain.name IN TXT, resolve resulting IPFS/IPNS identifier, retrieve object".

I was reading this, hoping to find just that. IPFS wants to get away from hosting locations for reasons of persistency, but IPNS is different. I love the idea of using DNS (and DNSSEC) and getting very powerful bookmarks:

  1. Stored domain name, for human reference where it once originated from
  2. Stored IPNS key, so we can look up the names from anyone who pins them, along with new versions from the same origin
  3. Stored IPFS hash, so we can retrieve the version that we bookmarked for as long as it is pinned

I will add the spec to my reading list, it's useful stuff!
