XOR-URLs: resolvable XOR-name based URLs #281

Open
wants to merge 5 commits into
base: master
from

@bochaco
Member

bochaco commented Nov 13, 2018

Proposal for a resolver function in the SAFE app client API that gives us standardised safe:// URLs generated from the XOR address of the content being referenced.

For an overview of the proposal and a look at its potential, this screencast video explores some ideas around it with a working proof of concept.

@ustulation

If we are going for XOR URIs, would you think this would be useful? People might want to type in the URIs themselves. It's pretty difficult to remember the entire 32-byte representation, but 4 or 6 bytes could easily be remembered. So could the CID take only 6 bytes, for example, and try to fetch the data? Of course the network needs to support this, and the GET query might fail if there are multiple matches. So as long as there is no ambiguity, and the given prefix length uniquely identifies a group (e.g. if the section prefix length is <= that) and the data in that group, the resolution by the network succeeds and the data is returned. This is much like git commit hashes, where you type until uniqueness is achieved, and it's usually achieved pretty soon.
Maybe that would require a different RFC if it's useful to have, but this is just a brain dump right now.
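
The git-like prefix lookup described above can be sketched as follows. This is only an illustration under stated assumptions: `resolvePrefix` is a hypothetical helper, the stored names are made up and shortened, and a plain in-memory array stands in for the network lookup.

```javascript
// Hypothetical helper: resolve a shortened hex prefix against a set of known
// xornames, the way git resolves abbreviated commit hashes.
function resolvePrefix(prefixHex, knownNames) {
  const matches = knownNames.filter((name) => name.startsWith(prefixHex));
  if (matches.length === 0) {
    throw new Error(`no content found for prefix ${prefixHex}`);
  }
  if (matches.length > 1) {
    throw new Error(`ambiguous prefix ${prefixHex}: ${matches.length} matches`);
  }
  return matches[0];
}

// Made-up, shortened names for illustration (real xornames are 32 bytes).
const stored = ['54856abcdef5', '5485ffaa0011', 'a1b2c3d4e5f6'];
console.log(resolvePrefix('a1b2', stored));   // unique match
console.log(resolvePrefix('54856a', stored)); // unique once enough characters are given
// resolvePrefix('5485', stored) would throw, because two names share that prefix.
```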

```
<path-query-fragment> := ['/' <path>] <query-fragment>
<query-fragment> := ['?' <query>] ['#' <fragment>]
```


@ustulation

ustulation Nov 14, 2018

Member

Should there also be a way to retrieve the value corresponding to a specific key in its binary format? So for mutable data, cid:15000+22+54856abcdef5645... would say: for the MD at cid with tag=15000 and version=22, get me the value corresponding to key=54856abcdef5645.... This is just an example, of course.


@bochaco

bochaco Nov 14, 2018

Member

I'm not sure what you mean by a key. Are you by any chance thinking along the lines of what they defined as human-readable CIDs?
If so, we could consider having the browser show a tooltip with the human-readable CID when hovering over a XOR-URL in the address bar, if that happens to be useful. We could also have the browser automatically convert a human-readable CID into the CID string when the user pastes or enters it in the address bar, which can be good for those who understand the format and want to fetch content at any given xorname. I would say this can all be done, but I doubt it should be part of the resolver itself; it could instead be provided by the client app, e.g. the browser in our case.

As for having shorter CIDs like you suggested above @ustulation, it would be nice, but shortening it the way you seem to imply would break the encoding of the CID string. You'd need some type of encoding which can be shortened that way and still lets you get the (shortened) XOR address out of it.


@ustulation

ustulation Nov 14, 2018

Member

I meant MData Key from the key-value pair.

On shortening, I don't know how it breaks it. Say in base x you get hsh121difvoc... etc. If you trimmed it to hsh1, you could convert that back to its representation in base 16 (which is what the network currently understands) and use that portion. That would give you a shorter XOR address, as explained in the overall comment here. But again, that might be too much of a digression and could hijack comments more directly related to this RFC.


@bochaco

bochaco Nov 14, 2018

Member

For getting a specific MData key, then yes, you could use the #fragment, which is the standard way of referring to a specific part of the document you are targeting with the URL. For example, the WebID spec also makes use of the #fragment to refer to one of the subgraphs. I wouldn't make that part of the resolver's job, though, since the #fragment is usually used by the app consuming the content.
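
As a rough illustration of that split of responsibilities, the sketch below has the app strip the #fragment, hand the rest to a resolver, and then use the fragment to pick an entry. Everything here is an assumption for illustration: `splitFragment`, `selectEntry` and the stubbed `fakeFetch` are hypothetical names, and the entries are made up.

```javascript
// Split a URL into the part the resolver sees and the #fragment the app keeps.
function splitFragment(url) {
  const hashIndex = url.indexOf('#');
  if (hashIndex === -1) return { target: url, fragment: null };
  return { target: url.slice(0, hashIndex), fragment: url.slice(hashIndex + 1) };
}

// The app, not the resolver, uses the fragment to select one MData key.
function selectEntry(url, fetchEntries) {
  const { target, fragment } = splitFragment(url);
  const entries = fetchEntries(target); // stand-in for the resolver call
  return fragment === null ? entries : entries[fragment];
}

// Stubbed resolver returning made-up entries for any target.
const fakeFetch = () => ({ '54856abcdef5645': 'value-for-key', other: 'x' });
console.log(selectEntry('safe://<cid>:15000+22#54856abcdef5645', fakeFetch));
```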

Shortening the xorname to 4 or 6 bytes still wouldn't give you a 4- or 6-byte XOR-URL. Remember that the CID encodes some other bytes (hash type, CID version, content type and base type). So yes, it sounds like it can be done, depending on which encoding you use (e.g. CID in this case), but you wouldn't be able to shorten it to 6 bytes, only to a few more, which could still be good I guess.

Anyway, I think there is a major concern with shortening it as you suggest: it could break the immutability of the relationship between the XOR-URL and the target content. For example, you give me a short XOR-URL and I can fetch an immutable content, since it resolves to a unique xorname, until one day someone stores another piece of data at a xorname which has the same initial bytes. That makes the XOR-URL unresolvable, as it no longer resolves to a unique address, right?
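
To put rough numbers on that overhead, here is a back-of-the-envelope sketch. It assumes a CIDv1 layout (version, content type, then a multihash made of hash type, digest length and digest), assumes each header field fits in one byte, and assumes a base32-style encoding of 5 bits per character with a one-character multibase prefix and no padding. `cidStringLength` is a hypothetical helper, not part of any library.

```javascript
// Estimate the length of the encoded CID string for a given digest size.
function cidStringLength(digestBytes, bitsPerChar = 5) {
  const headerBytes = 1   // CID version
    + 1                   // content type (codec)
    + 1                   // hash type
    + 1;                  // digest length
  const totalBits = (headerBytes + digestBytes) * 8;
  return 1 /* multibase prefix character */ + Math.ceil(totalBits / bitsPerChar);
}

console.log(cidStringLength(32)); // full 32-byte xorname → 59
console.log(cidStringLength(6));  // xorname truncated to 6 bytes → 17
```

Under these assumptions a 6-byte xorname still costs 17 characters, so the saving is real but smaller than the raw byte count suggests.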


@ustulation

ustulation Nov 15, 2018

Member

as it doesn't resolve to an unique address, right?

Ya that's true (it's same as git, as mentioned), but probably not worth the hassle as you say.

For getting a specific MData key, then yes you could use the #fragment which is what it's the standard way of referring to a specific part of the document you are targeting with the URL

Would it be better if you exemplified it in the RFC ? Perhaps a few more examples to illustrate would give an easier understanding while reading the RFC. I would also propose to explain the deconstruction in examples if it's needed. Something like:

to do X: <abcd>://<efgh>:<ijkl>?<mnop>#<qrst>
where:
   abcd=cid
   efgh=blah
 ... etc.


@bochaco

bochaco Nov 15, 2018

Member

Agreed, I also think it will help to have some details of the decoding/resolution process, I'll work on adding it @ustulation

@bochaco bochaco force-pushed the bochaco:rfc-xor-urls branch from 384d50c to f5256a9 Nov 14, 2018

@bochaco bochaco force-pushed the bochaco:rfc-xor-urls branch from efbce9b to 3573841 Nov 14, 2018

`safe://hygjdkfty6m7ag3bckq7eqgeizbtjk915c3jbrcgtisad8iikbk4xws4jbpky`

MutableData XOR-URL:
`safe://hyfktce8j75yhmj1dbi1xw5wnb4m3zdydr7wpbzf1a16hc3sbxzu8a9hiqw:15000`


@krishnaIndia

krishnaIndia Nov 16, 2018

Member

If we have a new data type which has a XOR name and tag type but behaves differently from a MutableData (for example, suppose StructuredData is revived and supported), how can we represent it as a XOR-URL?
If my understanding is right, it would have a URL similar to the one a MutableData would have.

Potential options:

  • The CID generation takes a registered codec as an input, but I'm not sure whether custom codecs for our data types will be accepted into the codec list of the multicodec and cids repositories. The other option is to fork the repository, which comes with maintenance overhead.
  • Use different hashing techniques to differentiate data types. When the URL is decoded, webFetch can switch on the hashing technique to resolve the corresponding data type.
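
Either option ends in the same shape of code: decode the URL, read a discriminator, and dispatch on it. The sketch below uses the codec as the discriminator. The codec names are hypothetical and not registered in the multicodec table, and `decoded` stands in for whatever a CID decoder would return.

```javascript
// Hypothetical codec names, used only for illustration.
const fetchers = new Map([
  ['safe-immutable-data', ({ xorName }) => ({ kind: 'ImmutableData', xorName })],
  ['safe-mutable-data', ({ xorName, typeTag }) => ({ kind: 'MutableData', xorName, typeTag })],
  ['safe-structured-data', ({ xorName, typeTag }) => ({ kind: 'StructuredData', xorName, typeTag })],
]);

// Pick the fetch routine that matches the decoded codec.
function dispatchFetch(decoded) {
  const fetcher = fetchers.get(decoded.codec);
  if (!fetcher) throw new Error(`unsupported codec: ${decoded.codec}`);
  return fetcher(decoded);
}

console.log(dispatchFetch({ codec: 'safe-structured-data', xorName: 'abcd', typeTag: 15000 }));
```

With this layout, a MutableData and a StructuredData sharing the same XOR name and tag type would still resolve differently, because the codec carried in the CID differs.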


@bochaco

bochaco Nov 19, 2018

Member

Yes, this is a good point. I also thought a bit about it, and I was leaning towards your first option. If the target content is an ImmutableData then the codec will likely be the one corresponding to the file itself, but if it's another type of 'structured' data like MutableData, then the codec can come from our own custom set of types. I think that should be possible, since I see there are already a few custom ones in the table, like the ones for Ethereum: https://github.com/multiformats/multicodec/blob/master/table.csv#L416

<immutable-data-uri> := <cid> <query-fragment>
<mutable-data-uri> := <cid> ':' <type-tag> ['+' <content-version>] <path-query-fragment>
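
A minimal sketch of how the grammar rules quoted in this thread could be parsed, for readers who want to see the deconstruction spelled out. It assumes the `safe://` prefix seen in the example URLs, assumes the CID string is lowercase alphanumeric, and assumes the type tag and content version are decimal integers. `parseXorUrl` is a hypothetical name, not the proposed API.

```javascript
// <cid> [':' <type-tag> ['+' <content-version>] ['/' <path>]] ['?' <query>] ['#' <fragment>]
const XOR_URL = /^safe:\/\/([a-z0-9]+)(?::(\d+)(?:\+(\d+))?(\/[^?#]*)?)?(?:\?([^#]*))?(?:#(.*))?$/;

const orNull = (value) => (value === undefined ? null : value);

function parseXorUrl(url) {
  const match = XOR_URL.exec(url);
  if (!match) throw new Error(`not a valid XOR-URL: ${url}`);
  const [, cid, typeTag, version, path, query, fragment] = match;
  return {
    cid,
    // No type tag means the URL targets ImmutableData.
    typeTag: typeTag === undefined ? null : Number(typeTag),
    // No version means "fetch the latest".
    version: version === undefined ? null : Number(version),
    path: orNull(path),
    query: orNull(query),
    fragment: orNull(fragment),
  };
}

console.log(parseXorUrl(
  'safe://hyfktcenm57js4bm3owhez9td9pi3t8bzk1crqp7mr5865c15ih3yxpz68w:15008/some/folder/index.html'
));
```

Note that a path is only accepted after a type tag, which mirrors the grammar: the immutable-data rule has no path component.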


@krishnaIndia

krishnaIndia Nov 16, 2018

Member

While generating the digest of the multihash, we are using only the XOR name, and then we are adding the tagType and version to the URL. For example, safe://xor-address-of-md:tagType+version/sample.png. This convention would make it difficult to parse, IMO.
Is it possible to have a JSON-like representation?

{
  name: xor-name, // multibase encoded
  tagType: 15000,
  version: '' // string or number
}

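This could be used to generate the CID like this:

const CID = require('cids');
const mHash = require('multihashes');
// Serialise the address and tag type into the bytes carried by the multihash
const getData = (address, tagType) => {
    return Buffer.from(JSON.stringify({address, tagType}));
};
// Wrap the bytes in a multihash header labelled 'sha1' (no hashing is performed)
const encodedHash = mHash.encode(getData('somedata', 1000), 'sha1');
const cid = new CID(1, 'raw', encodedHash);
const encodedCID = cid.toBaseEncodedString('base32');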

When decoded we have a defined structure to resolve.
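
The decoding side of that idea might look like the sketch below. It is only an illustration: `packAddress` and `unpackAddress` are hypothetical helpers, and the CID wrapping and unwrapping is left out so that the example depends on nothing beyond Node's `Buffer`.

```javascript
// Pack the structure into bytes (what would be carried inside the CID).
function packAddress(name, tagType, version) {
  return Buffer.from(JSON.stringify({ name, tagType, version }), 'utf8');
}

// Unpack the bytes back into the defined structure.
function unpackAddress(bytes) {
  const { name, tagType, version } = JSON.parse(bytes.toString('utf8'));
  return { name, tagType, version };
}

const packed = packAddress('xor-name', 15000, 22);
console.log(unpackAddress(packed)); // same fields that were packed
console.log(packed.length);         // byte size of the JSON payload
```

One trade-off worth keeping in mind is that the JSON payload is longer than the xorname alone, so the resulting URL would be longer too.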


@bochaco

bochaco Nov 19, 2018

Member

It could be, although I thought having the type tag and version exposed in the URL would be better for the user. The type tag maps quite well onto ports on the clearnet, if you think of that analogy. Having the version available for the user to change, to get a different version of the same content, also seemed good to me. As mentioned in the RFC, if you want the latest version you omit the version from the URL. Basically, seeing the version explicitly in the URL is probably better for the user, who then knows which version is being fetched and can even change it.


## Unresolved questions

The current list of supported codecs in the [multicodec project](https://github.com/multiformats/multicodec) (which is part of the proposed CID format) doesn't include the MIME types that can be used for encoding in an ImmutableData XOR-URL. [A discussion](https://github.com/multiformats/multicodec/issues/4) has been opened about adding such support (see the [PR sent](https://github.com/multiformats/multicodec/pull/84)), but it hasn't been approved yet.


@krishnaIndia

krishnaIndia Nov 16, 2018

Member

Does the multicodec project allow adding multiple prefixes? That could be helpful to represent cases like this. I can think of a scenario where a video player app gets a XOR URI and needs to know both the MIME type and the codec. My knowledge of codecs is limited, so my use case could be wrong too 😉.


@bochaco

bochaco Nov 19, 2018

Member

It currently doesn't support it. But if relying on several codecs for content is a common practice among apps, we should be able to propose that to the multicodec project and add such support. It sounds to me like it shouldn't be difficult at all to support.

NFS/Files container MutableData XOR-URL:
`safe://hyfktcenm57js4bm3owhez9td9pi3t8bzk1crqp7mr5865c15ih3yxpz68w:15008/some/folder/index.html`

## Drawbacks


@krishnaIndia

krishnaIndia Nov 16, 2018

Member

Could the XOR URI being limited to public content sharing be a limitation/drawback?


@bochaco

bochaco Nov 19, 2018

Member

Good point, although I'm not sure it's accurate to put it like that, since it's probably not about the URL itself but about what the resolver supports.
The URL would never contain decryption keys for something private, just as the mechanism to fetch a private MD requires the decryption keys to be obtained from the account out of band, or with a separate mechanism not supported by the MutableData API itself. So for private content I think it could still be possible, but the resolver function would need to accept the decryption keys as args, just like the newPrivate function of the MD API, and they would need to be fetched from the account somehow.
At the same time, it's true we'd need either a fallback mechanism or an additional bit encoded in the URL to tell whether private or public content is being targeted, and act accordingly. I will add a paragraph explaining all this in the "Drawbacks" or "Alternatives" section @krishnaIndia
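
A rough sketch of what such a resolver entry point could look like, purely as an illustration of the idea above. `resolveTarget` is a hypothetical name, `isPrivate` stands in for the extra bit that might be encoded in the URL, and `decryptionKeys` stands in for keys the app obtained from the account out of band.

```javascript
// Resolve a decoded target, requiring keys only when the content is private.
function resolveTarget(target, decryptionKeys = null) {
  if (!target.isPrivate) {
    return { xorName: target.xorName, access: 'public' };
  }
  if (decryptionKeys === null) {
    // The URL never carries keys, so the caller has to supply them.
    throw new Error('private content: decryption keys must be supplied by the caller');
  }
  return { xorName: target.xorName, access: 'private', keys: decryptionKeys };
}

console.log(resolveTarget({ xorName: 'abcd', isPrivate: false }));
console.log(resolveTarget({ xorName: 'abcd', isPrivate: true }, { secretKey: 'made-up' }));
```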

@bochaco


Member

bochaco commented Nov 20, 2018

@ustulation @krishnaIndia I just pushed a new commit with more details, as agreed with you following your questions and feedback. I also assigned #53 as requested, @david-maidsafe.

@david-maidsafe

Initial commit with RFC number assignment
