Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Intiial exploration report for IPLD URL Scheme #195

Merged
merged 5 commits into from Jun 1, 2022
Merged

Conversation

RangerMauve
Copy link
Contributor

@RangerMauve RangerMauve commented Mar 29, 2022

Starting a PR so that folks can start bikeshedding early on.

Not ready for review until I fill out a bunch of TODOs.

The `Accept` header may be optionally provided to specify the codec that should be used in the response.
This follows the same conventions as in the [IPFS gateway](https://github.com/ipfs/go-ipfs/pull/8758).
Other codecs that may be targetted are DAG-CBOR.
TODO: `application/vnd.ipld.dag-json` or just `application/json`?
Copy link
Contributor

@lidel lidel Apr 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To maximize interop with other tools both could be supported in requests (dag-json is a strict subset of json), but when it comes to responses, it is better to go with more specific application/vnd.ipld.dag-json.

fysa my plan is to register it as part of ipfs/kubo#8823 (ETA go-ipfs 0.14), would be cool if I could link to this spec along with https://ipld.io/docs/codecs/known/dag-json/ 👍 (but also not a blocker if it won't be ready on time)

This comment has been minimized.

Copy link
Contributor Author

@RangerMauve RangerMauve Apr 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My only worry with regards to using application/vnd.ipld.dag-json is that it wouldn't get recognized by common browser tools like syntax highlighting extensions and the such. I think it could still work with await (await fetch(url)).json() in browsers, though.

Copy link
Contributor

@lidel lidel May 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RangerMauve fair point! so.. application/json everywhere + also support application/vnd.ipld.dag-json in requests? that should maximize reuse of things that have nice UX/support for JSON.

Copy link
Contributor Author

@RangerMauve RangerMauve May 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'm into that. I'll add it to the doc. 😁


The `{CID}` should be the "base" CID to extend data from.
Sadly if we are to use URLs (TODO: why should we use URLs?), we must have some sort of CID in the hostname.
TODO: Document some inline CIDs to use. IPFS has this empty directory CID `ipfs://bafyaabakaieac` which would be cool to have analogs for for empty `{}` object and `[]` array nodes.
Copy link
Contributor

@rvagg rvagg Apr 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, we now have the concept of "unit" type in schema-land:

## UnitRepresentation is an enum for describing how a TypeDefnUnit should be represented in the data model.
##
## Unit types are commonly seen represented in several ways.
## A null token is a common one.
## A true token is sometimes seen (especially, when people encode "sets" in json:
## often this will be seen as a map where the values are keys and the map values are 'true').
## Also, an empty map can be a useful unit value;
## an empty map accurately communicates a lack of data.
## (The emptymap strategy can be a particularly interesting choice if you want to
## have a schema that is evolvable in the future to start using a struct or map
## in the same place as the unit type currently stands, while having older documents
## continue to be parsable by the evolved schema.)
##
## Unlike many of the other representation information types seen in the schema-schema,
## this one is just an enum, rather than being a union.
## That's because there's no possibility of every needing to annotate more customization
## onto values in the unit type... because there are no possible values in the unit type.
##
## Note that there is no discernible logical difference between
## `type Foo struct {}` and `type Foo unit representation emptymap`;
## only that the latter can be said to be a more explicit description of intent.
## Both will result in identical representations, and both have identical cardinality (which is 1).
##
type UnitRepresentation enum {
| Null ("null")
| True ("true")
| False ("false")
| Emptymap ("emptymap")
}

And we have identity CIDs, we could just pick one to standardise on as a base.

https://cid.ipfs.io/#baeaaaapw might be a good choice, null is a commonly understood unit value and might be a good "there's nothing here to start from".

Or are you wanting to convey codec information with the "base"? So, Content-Type tells the receiver what format the data is being provided in, but the base CID says what codec should be used for the encoded block? Do we want that kind of flexibility? And if so, what about base CIDs that have some data because they're CIDs that reference some existing block or have some data in them as identity CIDs?

Yet another possibility here is to just use identity CIDs to send the data and Content-Type to say what codec to encode into—what limitations do we run into with hostnames here?

Copy link
Contributor

@rvagg rvagg Apr 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, maybe the semantics of a PUT could all fall under IPLD Patch since it already has all needed modification primitives (and more)? Otherwise this would be introducing a mutation operation that is limited to only adding or changing data from existing blocks, whereas Patch gives us full modification potential.

Copy link
Contributor Author

@RangerMauve RangerMauve Apr 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or are you wanting to convey codec information with the "base"?

I think keeping the codec information in the Content-Type header would be best to keep things URL-y. ipld://baeaaaapw/ seems like a great place to start.

what about base CIDs that have some data because they're CIDs that reference some existing block or have some data in them as identity CIDs?

I think this would be useful. e.g. It'd be cool to have some base CIDs with examples like an empty array or an empty map. Is there an existing list I could reference somewhere? Should I just generate these myself with the ipld CLI?
'

maybe the semantics of a PUT could all fall under IPLD Patch since it already has all needed modification primitives

I was actually thinking that the PATCH HTTP method could be used there instead. I think it'd be cool to add support for it as a first-class feature, but since Patch is still a bit of a WIP I didn't want to block this to wait on it.

One thing to consider here is that we just need to have something in the hostname portion of the URL since in my experience most parsers (like browsers) will freak out if it's only sometimes there. 😅

Alternately, in IPNS land we were looking at using localhost as a "no-op" name for when we want to do something with data that isn't already tied to a particular CID. @lidel might have more to say on that front.

Copy link
Contributor

@lidel lidel May 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • I think my idea was that if we want to invent some name to indicate "empty" state, the localhost is not a bad choice because it is something devs understand, resolves safely to loopback device if DNS is ever involved (either on purpose, or as a bug), and also hints that data is added to local IPLD node, and not some magical cloud:

    • PUT ipld://localhost → get CID
  • In case a drive-by reader is confused by "JSON Patch", it is different thing from "HTTP PATCH" method:

    • The "JSON Patch" @rvagg mentioned is a convention (RFC 6902 from the IETF) for representing JSON diffs (http://jsonpatch.com) – see relevant example in ipfs dag diff ipfs/kubo#4801 (comment) (and the discussion that follows).
    • Wherever possible, we should reuse RFCs like this, maximizing interop with existing tools, and/or promoting existing standards (even if they are not popular, it is still better than inventing something new).

Copy link
Contributor Author

@RangerMauve RangerMauve May 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm into having ipld://localhost/ be a special thing for when there's no base CID to build off of. @rvagg @warpfork Any strong feelings on that matter?

Regarding PATCH, I've added a link to the GH issue where the implementation is being tracked: ipld/go-ipld-prime#350

Copy link
Contributor

@rvagg rvagg May 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems reasonable to me; it is slightly arbitrary (e.g. why not :: to be uber-modern?) but I guess anything we choose is!

The `Content-Type` may be optionally provided to specify the encoding of the `Body`.
By default it will be assumed to be `DAG-JSON`.

If a `Path` is specified, intermediate Nodes will be created and added to the Node specified by the CID.
Copy link
Contributor

@rvagg rvagg Apr 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this strictly stick to the integer components of a path, where if it encounters a 0 it'll create an array for you? And what if the path has a 1 (or later) in it? Do we make an array that has that many elements that are ... empty? Because we'd have to know what "empty" is (back to the unit discussion).

Copy link
Contributor Author

@RangerMauve RangerMauve Apr 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think to be safe it would be good to generate intermediate Map Nodes if a node doesn't already exist at that path otherwise it might be too magical. 😅 E.g. what if somebody is iterating through a list of names and one of them just happens to be an integer.

Also, should we error out if somebody tries to add a named key to a node which is an array?

Copy link
Contributor Author

@RangerMauve RangerMauve Apr 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this could also be augmented down the line with Schemas so you can give better type hints for individual segments of the path.

Copy link
Contributor Author

@RangerMauve RangerMauve Apr 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think erroring out when there isn't something obvious and having fancier behavior in the future would be best for this initial version. I'll add some notes about how lists should be handled in the meantime though.

Copy link
Contributor Author

@RangerMauve RangerMauve May 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking more on this, maybe we should error out when there's no intermediate node? E.g., users would need to manually add intermediate nodes with separate requests as they're building up the DAG.

Copy link
Contributor Author

@RangerMauve RangerMauve May 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rvagg Mind weighing in on whether we should error out when there's no intermediate node?

Copy link
Contributor

@rvagg rvagg May 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmmm, I don't have a strong opinion. There's a nice convenience to automatically handling intermediate nodes (mkdir -p, rm -r - but notice they require special flags!). It might be tedious to have to do them all, one-by-one—round-trips get expensive in content-addressed land. But on the other hand, does it indicate a possible error if you're encountering missing intermediates? Perhaps so.

Your call if you're able to form a firmer opinion.

Copy link
Contributor Author

@RangerMauve RangerMauve May 31, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been chewing on it and I think the convenience outweighs the downsides. 😅

I've got auto-generation (-p) for ipfs/ipns/hyper in Agregore at the moment and not having it made a lot of things more annoying. Ultimately, I think developer UX is really important for this stuff and streamlining is usually worth it.

@rvagg
Copy link
Contributor

rvagg commented Apr 20, 2022

This steps on the /ipld/ convention that was somewhat formalised in the go-ipfs v0.10 release in the DAG API, where it's not an explicit fallback to IPLD pathing, avoiding the UnixFS magic that has got in the way previously. /ipld/ was an existing prefix but now it's a specific tool to opt for pure IPLD pathing. I think/thought there was moves to do this on the gateway too, which is the same as the /ipld/ CID prefix suggested here.

That may not be a conflict, however, ipfs dag get /ipld/bafy.../foo/bar/baz should be the same as here, ipfs dag put ... doesn't have the concept of a "base" but could probably be made to do so, thereby enabling (roughly?) the same PUT operation being proposed here.

Either way, it might be worth looking at that and making sure we're not doubling up on something that's not quite the same and may end up being confusing to users. If there's a path to harmonisation then that'd be awesome.

@RangerMauve RangerMauve marked this pull request as ready for review Apr 25, 2022
@RangerMauve
Copy link
Contributor Author

RangerMauve commented Apr 25, 2022

@lidel Would you be able to comment on @rvag's comment regarding conflicts with what is already in the IPFS gateway?

My gut feeling is that the ipfs dag API wouldn't be affected by these changes since we would likely be exposing it on the gateway API instead.

@BigLep
Copy link
Contributor

BigLep commented May 3, 2022

@RangerMauve : do you want to incorporate any more feedback here, or are you good if we merge now?

@RangerMauve
Copy link
Contributor Author

RangerMauve commented May 6, 2022

@BigLep Gonna go over this weekend to see if there are more notes and TODOs to add and then I think I'll be ready to merge next week. 👍

@RangerMauve
Copy link
Contributor Author

RangerMauve commented May 10, 2022

Based on the discussion we had at the end of the last IPLD weekly meeting, I've added a section on escaping special characters and reserved some future syntax for Lenses based on the suggestions from @aschmahmann

One thing I'd like resolved before we merge is to get some more example unit CIDs to base off of.

Ideally it'd be nice to have the following:

  • null
  • {} empty Map
  • [] empty List

Would somebody mind linking me to info about how I may generate these or find an existing set? The encoding doesn't matter too much, but DAG-JSON might be easiest to mess with manually.

@rvagg
Copy link
Contributor

rvagg commented May 11, 2022

Here's how I make these things, I have the codecs and js-multiformats repos locally and built because I need to require() from the dist/ directories in the repl, but you can do the same from npm installed versions since those are packaged dist/ directories.

> var { CID } = require('./js-multiformats/dist/')
> var dagJSON = require('./js-dag-json/dist')
> var dagCBOR = require('./js-dag-cbor/dist')
> var { identity } = require('./js-multiformats/dist/hashes/identity')
> var raw = require('./js-multiformats/dist/codecs/raw')
> var digest = require('./js-multiformats/dist/hashes/digest')
> CID.create(1, raw.code, digest.create(identity.code, new Uint8Array([0])))
bafkqaaia
> CID.create(1, raw.code, digest.create(identity.code, new Uint8Array([])))
bafkqaaa
> CID.create(1, dagJSON.code, digest.create(identity.code, dagJSON.encode(true)))
baguqeaaeorzhkzi
> CID.create(1, dagJSON.code, digest.create(identity.code, dagJSON.encode([])))
baguqeaaclnoq
> CID.create(1, dagJSON.code, digest.create(identity.code, dagJSON.encode({})))
CID(baguqeaacpn6q)
> CID.create(1, dagJSON.code, digest.create(identity.code, dagJSON.encode('')))
CID(baguqeaaceira)
> CID.create(1, dagCBOR.code, digest.create(identity.code, dagCBOR.encode(true)))
bafyqaapv
> CID.create(1, dagCBOR.code, digest.create(identity.code, dagCBOR.encode([])))
bafyqaama
> CID.create(1, dagCBOR.code, digest.create(identity.code, dagCBOR.encode({})))
bafyqaana
> CID.create(1, dagCBOR.code, digest.create(identity.code, dagCBOR.encode('')))
CID(bafyqaala)
> CID.create(1, dagJSON.code, digest.create(identity.code, dagJSON.encode('banananana, do do de do do')))
CID(baguqeaa4ejrgc3tbnzqw4ylomewcazdpebsg6idemuqgi3zamrxse)

@lidel
Copy link
Contributor

lidel commented May 16, 2022

@RangerMauve would be useful to add any missing edge cases to https://github.com/ipld/codec-fixtures (if not present already) – we point more and more people at fixtures like these, so it is good return of time invested.

@RangerMauve
Copy link
Contributor Author

RangerMauve commented May 31, 2022

I think this is good to merge 👍

@rvagg rvagg mentioned this pull request Jun 1, 2022
19 tasks
rvagg
rvagg approved these changes Jun 1, 2022
Copy link
Contributor

@rvagg rvagg left a comment

👌 good by me to merge

@RangerMauve RangerMauve merged commit 7b77028 into master Jun 1, 2022
2 checks passed
@RangerMauve RangerMauve deleted the ipld-url-schemes branch Jun 1, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: 🎉 Done
Development

Successfully merging this pull request may close these issues.

None yet

5 participants