This repository was archived by the owner on Jan 8, 2026. It is now read-only.

@rvagg
Member

@rvagg rvagg commented Aug 29, 2019

Biting the bullet and coming up with a formal realisation of what "advanced layouts" might mean in terms of schema DSL.

I've opted to overload representation here rather than use the alternative generics-style syntax of <Foo>; I think it fits better, but of course I'm happy to discuss, along with everything else in here! I know we all have slightly different things in our heads for this stuff, but we need to get some basics laid out at least.

Will follow up with an additional idea for parameterisation that goes a bit too far beyond this most basic level and is likely to be in discussion-limbo for much longer. I don't want to pollute this one with too many additional ideas that might hold it up.

It may then be used as a `representation` for a `type`, from which we can infer the `kind` that we expect.

```ipldsch
type MyString string representation ROT13
```
Contributor

Okay, so, I'm sold on using some specialization of the representation clause on the grounds that something certainly can't have another meaningful representation directive if it also has an advanced layout.

Two other issues with that, though:

  • Needs a separate name plane. (Thinking about how this should encode in the schema-schema as json rather than DSL may be helpful here.) Two reasons: 1, it terrifies me that advanced layouts would be visually indistinctive on the same line they're used (I value lexical-locality highly for ease of fast reading), so let's avoid that by adding a sigil or something; 2, we don't want built-in representations and user-named advanced layouts to be able to get into name conflicts. Implication: the current schema-schema json is also probably Insufficient to describe advanced layouts -- one possible solution is to turn the "representation" field there into another union of "core"|"advanced", and move the current unions to be under "core".

  • Because this uses the type clause, this only works for named types. Doesn't answer what we'd do if there's an inline usage. E.g. type Foo struct { wizardField {X:Y}<ShardedMap> }. Maaaaybe this is actually a usage we can just... not. But it's worth a thought. (For a fuller example using this syntax: this unixfs schema spike uses inline markers for Dir.members and File.content -- it's interesting to note that this is 100% of the usages; and in both cases, the wrapper types would've existed and needed names anyway (for their relationship to unions), so requiring a named type for the advanced layout to be applied would have strictly increased the number of types needed.)

Contributor

Hrm, I'll take my own wind back out of my sails a bit for that second point. Does having a named type mean we can produce better error messages? Maybe. Will codegen in practice often need to produce a type for this node and thus need to name it something? Uff, probably. Can we solve those by munging a field name? I guess, but it's not necessarily good looking. Needs thought.

Contributor

... turning the schema-schema json into a union nested in a union sounds gross, so maybe a better idea there would be to add a member to all the current representation unions that's just called "goto_adv" and has a string saying the name of the advanced layout; and then the top-level schema objects should now contain a map of types and also a map of advanced layout details. That'd be a lot less twisty than a union-in-a-union, maps well to the DSL syntax, and still demonstrates the point that advanced layouts are Distinct in name plane from other representations.


```ipldsch
advanced ChainedBytes {
root Chunk
```
Contributor

@warpfork warpfork Aug 29, 2019

I'm scared of this syntax in some way I can't quite put my finger on yet.

Glazing one's eyes over, the pattern of keyword TitleCase { linebreak fieldname TypeName } looks like a struct; but fieldname isn't a field name, is it? This isn't a struct.

I wonder if some more visual distinctiveness wouldn't be a pleasant idea here. I'm worried that someone might skim this and infer that there's literally a field named "root" somewhere in the data, which would be incorrect.

(I don't have a suggestion at the moment, either, sorry. Just thinking out loud here.)

Contributor

(I know, I've also drafted sketches with roughly this syntax before. I'm not sure why I'm more triggered now. Possibly because in other drafts, there have been more keywords (second word of line being lowercase rather than titlecase) and other errata that made it more clearly not a struct. Possibly because the brainmeats are just more critical when someone else does it :))

Member Author

Ahh, so this kind of thinking is what led you to put a | in front of union members (personal *click* moment). Remember that your representation {} block does the same thing as this, discriminantKey being a glaring example.
I have no strong feelings either way so I'll wait for suggestions.

Contributor

Yeah.

Worth noting perhaps that the syntax for the representation block has also drifted back and forth a bit over time. Earlier drafts had key="value" pairs -- thus feeling more like value assignment, and distinct from "fieldname TypeName" declarations -- and I think might've even used parens rather than squirrely braces. They were originally even one-liners, I think... but started looking too long that way.

Member

I find it OK for the first keyword (i.e. advanced, struct) to indicate what the key-value pairs inside mean, i.e. whether they are fixed parameters or definitions. I agree that readability is more important than writability, as you write once and read often.

Though if I think about writing a schema, I would learn what the different keywords advanced and struct mean and what they do. I wouldn't need to remember which syntax goes with them; I'd just need to remember a single concept: "key-value pairs are represented as key <space> value in ipldsch."

Though we already have a different syntax for unions and enums, which breaks the "single concept" argument.

If we went for a different syntax, = would make sense, as it is an assignment. Though I'm not sure I like this myself:

```ipldsch
advanced ROT13 {
  = root String
}
```

Member Author

Or a .? That has "property" connotations:

```ipldsch
advanced ROTX {
  .characters Int
}
```

I find it all a bit ugly and don't really need the | in unions or for this to be different. They're all struct-style definitions anyway. But that's a weakly held opinion.

Contributor

I think I withdraw most of my concerns here. I'm much less fazed when adding in other examples, like adding the param name "value" triples mentioned in a subsequent PR. It's mostly the specifics of "Chunk" in root Chunk being a TypeName but having radically different semantics from any other appearance of a TypeName that gives me pause... which is maybe an issue better regarded on its own.

@rvagg
Member Author

rvagg commented Aug 30, 2019

Pulling up a couple of @warpfork's points to top level so they don't get lost as the source changes:

  1. advanced "needs a separate name plane". My reading: representation ROT13 in the example is not distinctive enough from, say, representation stringjoin, and there's potential for user conflicts (I assume in the sense that we want to reserve a keyword space for future expansion of the core representation types).

I'd say we just mandate prepending $ to any of these, e.g. advanced $ROT13 and representation $ROT13.

Re "encode in the schema-schema as json rather than DSL". And the follow-up: "a better idea there would be to add a member to all the current representation unions that's just called "goto_adv" and has a string saying the name of the advanced layout". I think you're suggesting difficulty around this from schema-schema:

```ipldsch
type StructRepresentation union {
	| StructRepresentation_Map "map"
	| StructRepresentation_Tuple "tuple"
} representation keyed
```

Can we not just add an "advanced" here?

```ipldsch
type StructRepresentation union {
	| StructRepresentation_Map "map"
	| StructRepresentation_Tuple "tuple"
	| StructRepresentation_Advanced "advanced"
} representation keyed

type StructRepresentation_Advanced struct {
	name String
}
```

So that this:

```ipldsch
type MyString string representation advanced {
  name $ROT13
}
```

reifies as:

```json
"MyString": {
  "kind": "string",
  "representation": {
    "advanced": {
      "name": "$ROT13"
    }
  }
}
```

Let's say that parameterization from #183 was a thing:

```ipldsch
type MyPredictableMap { String : Name } representation advanced {
  name $HashMap
  hashAlg "murmur3-32"
  bucketSize 2
  bitWidth 10
}
```

schema-schema could be:

```ipldsch
type StructRepresentation_Advanced struct {
	name String
	parameters { String: Any }
}
```

reified as:

```json
"MyPredictableMap": {
  "kind": "map",
  "keyType": "String",
  "valueType": "Name",
  "representation": {
    "advanced": {
      "name": "$HashMap",
      "parameters": {
        "hashAlg": "murmur3-32",
        "bucketSize": 2,
        "bitWidth": 10
      }
    }
  }
}
```
  2. "Doesn't answer what we'd do if there's an inline usage. E.g. type Foo struct { wizardField {X:Y}<ShardedMap> }". My answer is implicit: we don't support that. My preference is toward explicitness in most things, and this is one such case. You have to write it with much more clarity, and that's better for everyone; we're not doing CoffeeScript, or this abomination.

```ipldsch
type Foo struct {
  wizardField WizardMap
}

type WizardMap {X:Y} representation advanced { name $ShardedMap }
```

I have a second objection to the <> syntax: it's traditionally understood to signify generics/templating, at least in the most popular languages that support such features, and that's not what we're doing here, so I'd rather not introduce that kind of confusion.

Now, if we wanted to do something like this, I might be a little more amenable because it leans on existing mindshare about what <> does.

```ipldsch
type Foo struct {
  wizardField $WizardMap<X, Y>
}
```

But that would mean revisiting a bunch of other stuff, so probably not.
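As a rough illustration of how an ADL implementation might consume the parameterized form from point 1 (the #183 proposal, not part of this PR), here is a TypeScript sketch. The `{ name, parameters }` shape is taken from the reified JSON in point 1. `AdvancedRepresentation`, `HashMapParams` and `readHashMapParams` are hypothetical names, and the sketch only checks the three example parameters.

```typescript
// Reified `representation.advanced` object, per the #183-style JSON in point 1.
interface AdvancedRepresentation {
  name: string;
  parameters?: { [key: string]: unknown };
}

// Hypothetical parameter set for the $HashMap example.
interface HashMapParams {
  hashAlg: string;
  bucketSize: number;
  bitWidth: number;
}

// Checks the ADL name and the kind of each parameter before a
// (hypothetical) HashMap implementation uses them.
function readHashMapParams(repr: AdvancedRepresentation): HashMapParams {
  if (repr.name !== "$HashMap") {
    throw new Error(`expected $HashMap, got ${repr.name}`);
  }
  const p: { [key: string]: unknown } = repr.parameters ?? {};
  const { hashAlg, bucketSize, bitWidth } = p;
  if (typeof hashAlg !== "string") throw new TypeError("hashAlg must be a string");
  if (typeof bucketSize !== "number" || !Number.isInteger(bucketSize)) {
    throw new TypeError("bucketSize must be an integer");
  }
  if (typeof bitWidth !== "number" || !Number.isInteger(bitWidth)) {
    throw new TypeError("bitWidth must be an integer");
  }
  return { hashAlg, bucketSize, bitWidth };
}

const params = readHashMapParams({
  name: "$HashMap",
  parameters: { hashAlg: "murmur3-32", bucketSize: 2, bitWidth: 10 },
});
console.log(params.bitWidth); // 10
```

Keeping the parameters as an open `{ String: Any }` map in schema-schema means each ADL has to do its own checking like this. The schema itself cannot say which parameters a given layout expects.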


## Root node type definitions

Advanced layouts are designed to abstract data that exists at the data model layer. As such, they may also dictate what they expect from the data that exists at the node their _root_ resides at.

Contributor

Is this... a direction we want to go at all?

I think this basically means the schema for the innards of the advanced layout would be "vendored in" (so to speak) the schema of the application using it.

This would be possible, but I don't know if it strikes me as useful. The functionally relevant relationships to those structures are all inside the advanced layout logic. So I expect when writing advanced layout logic, we'll often find it useful to have a schema for its insides. But having that internal schema be replicated/vendored in an application schema that's just consuming the advanced layout is redundant and/or at the wrong level for those purposes.

Having it in the application/consuming schema means some forms of validation could be done without having the advanced layout code. But I'm not at all sure that's useful. I think mostly not. I can't really imagine an application that's going to be able to make useful partial progress or degraded but-still-useful functionality out of that.

(We could still execute the idea of "send a request for data that treats the advanced layout as transparent and reaches in and only pulls out a subset of its content"... but I'd expect to do that by sending a different schema that uses those snippets of advanced-layout internals. Different-schemas-on-same-data is sufficient to do that and seems clear.)

Contributor

An idea that's tempting to toy with is having a link to a separate schema that describes the internals of the ADL.

But, shucks, I think that's going to be a lunge we don't want to make. That would imply cross-schema linking, and that implies either naively pasting a CID hash (a UX I don't want to put on people) or opening the door to requiring some sort of naming system that we... are not going to want to expand our scope to include just for this purpose.

So I think having a separate schema, and handling it with the same mechanism we use to handle the rendezvous problem for finding the ADL code at all, is probably the best we can do.

Contributor

> I think this basically means the schema for the innards of the advanced layout would be "vendored in" (so to speak) the schema of the application using it.

Not necessarily. A single schema may contain the types that are required by the layout. Once complete, I would expect that the unixfsv2 schema contains the layout schema necessary for its advanced types except for HAMT which will be loaded separately.

I’d like to see this be as flexible as possible. If the schema defines the layout schemas for an advanced layout, great. If it doesn’t and instead just has a string identifier for it, great.

Member Author

> has a string identifier for it

Back to the identification problem: what is this identifier? Earlier incarnations of this had:

```ipldsch
advanced HashMap {
  identifier "IPLD/HashMap/0"
}
```

But then we got all tangled and it resulted in #130 (which continues in this thread).

@warpfork wants to punt this whole identify-code-with-schema-definition thing down the road, and I agree that it would be good to postpone that as much as possible and hardwire things in the meantime, or add mapping mechanisms to what we build: something like compile-thing foo.ipldsch --map HashMap:/path/to/hashmap/code (or whatever) that maps the name appearing in the schema (HashMap) to the logic that runs it. So for now, the only signal we have is the term that comes after advanced.
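The `--map` idea could be as simple as parsing `Name:path` pairs into a lookup table that codegen consults whenever it meets an `advanced` name. Below is a TypeScript sketch. `compile-thing` and its flag are only floated hypothetically in the comment above, and `parseAdlMappings` is likewise a hypothetical name.

```typescript
// Collects repeated `--map Name:/path/to/code` flags into Name -> path.
// Splits on the first ':' only, so the path part may itself contain ':'.
function parseAdlMappings(argv: string[]): Map<string, string> {
  const mappings = new Map<string, string>();
  for (let i = 0; i < argv.length; i++) {
    if (argv[i] !== "--map") continue;
    const spec = argv[i + 1];
    if (spec === undefined) throw new Error("--map needs a Name:path argument");
    const sep = spec.indexOf(":");
    if (sep <= 0 || sep === spec.length - 1) {
      throw new Error(`malformed --map value: ${spec}`);
    }
    const name = spec.slice(0, sep);
    if (mappings.has(name)) throw new Error(`duplicate mapping for ${name}`);
    mappings.set(name, spec.slice(sep + 1));
    i++; // skip the value we just consumed
  }
  return mappings;
}

const table = parseAdlMappings([
  "foo.ipldsch",
  "--map",
  "HashMap:/path/to/hashmap/code",
]);
console.log(table.get("HashMap")); // /path/to/hashmap/code
```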

Contributor

Honestly, the name you give the advanced layout is already an identifier as far as libraries that parse the schema and generate a library from it are concerned. I wouldn't put the identifier keyword in the critical path right now; that's much more helpful down the line, when we have some way to resolve globally unique names to implementations.

@warpfork
Contributor

warpfork commented Sep 2, 2019

Various syntaxes:

Alpha:

```ipldsch
type MyString string representation $ROT13
```

Beta:

```ipldsch
type MyString string representation advanced {
  name $ROT13
}
```

Gamma:

```ipldsch
type MyString string representation advanced {
  name ROT13 # don't actually need the $ to discriminate anymore
}
```

Delta:

```ipldsch
type MyString string representation advanced ROT13
```

Delta's pretty terse and quite unambiguous and reuses the representation keyword... but has a totally different style for parsing the representation "block".

@rvagg
Member Author

rvagg commented Sep 2, 2019

(from conversation with @warpfork elsewhere) going to remove references to "root" (and "rootType") in here, that's for the implementation of the ADL to deal with. A user of the ADL simply needs to point to something that handles that logic. We're still punting on that linking mechanism, it'll likely be language specific for now (e.g. in Go when doing codegen you may provide a map that links these ADL names to importable packages).

@rvagg
Member Author

rvagg commented Sep 2, 2019

Re type MyString string representation advanced ROT13: this is the same as inline representations with discriminantKey, or stringjoin with join: a single parameter. But for this one I'm also proposing parameters in #183. I think we should punt on this shorthand and make the verbose form the standard way, at least for now:

```ipldsch
type MyString string representation advanced {
  name ROT13
}
```
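For the string case, a ROT13 layout would store rotated text at the data model layer and hand the decoded text to the application. Below is a minimal TypeScript sketch. It assumes a hypothetical `AdvancedLayout` interface with `read`/`write` methods; nothing in this thread defines how ADL code is shaped.

```typescript
// Hypothetical contract: convert between the data-model form and the typed view.
interface AdvancedLayout<Repr, View> {
  read(repr: Repr): View;
  write(view: View): Repr;
}

// Rotate ASCII letters by 13 places; everything else passes through unchanged.
function rot13(s: string): string {
  return s.replace(/[A-Za-z]/g, (c) => {
    const base = c <= "Z" ? 65 : 97; // 'A' or 'a'
    return String.fromCharCode(((c.charCodeAt(0) - base + 13) % 26) + base);
  });
}

// ROT13 is its own inverse, so read and write use the same transform.
const ROT13: AdvancedLayout<string, string> = {
  read: rot13,
  write: rot13,
};

const stored = ROT13.write("Hello, World"); // "Uryyb, Jbeyq" at the data model layer
console.log(ROT13.read(stored)); // Hello, World
```

Because ROT13 is symmetric, both directions share one function. A layout such as a sharded map would have distinct read and write paths.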

@rvagg rvagg force-pushed the rvagg/schema-advanced-layouts branch 2 times, most recently from 91c9bc1 to 378a2ad September 10, 2019 06:22
@rvagg
Member Author

rvagg commented Sep 10, 2019

This is ready for (hopefully final) review @warpfork @mikeal @vmx.

  1. Landed on the basic syntax from our team sync today:

```ipldsch
advanced ShardedMap
type MyMap { String : &Any } representation advanced ShardedMap
```

  2. documented in schemas/advanced-layouts.md
  3. removed the root Type bit but retained the text in design/history/exploration-reports/2019.09-adl-schema-root-type-defn.md because it may be interesting in the future.
  4. rebased on top of schemas: complete map and struct representations for schema-schema #186 (because they both make representation changes to schema-schema, and that one is simpler and likely closer to landing than this)
  5. attempted to introduce advanced and representation advanced into schema-schema. An AdvancedDataLayout struct can hold advanced Foo, while AdvancedRepresentation is now an option in every basic type's kinded *Representation union as "advanced". This has meant giving many things a representation where they were previously implied: TypeBool, TypeString, TypeBytes, TypeInt, TypeFloat, TypeLink (we can play with mutable links now ...), and I even gave one to TypeEnum, although that one feels a little strange. Needs lots of sanity checking from @warpfork.
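With the one-line syntax settled in item 1, a tool can pick out ADL declarations and their uses with a simple scan. This TypeScript sketch is hypothetical and line-oriented only. It does not handle comments, multi-line type bodies, or anything else a real DSL parser would need.

```typescript
interface AdvancedScan {
  declared: Set<string>;       // names from `advanced Foo` lines
  usages: Map<string, string>; // type name -> ADL name from `representation advanced Foo`
}

function scanAdvanced(source: string): AdvancedScan {
  const declared = new Set<string>();
  const usages = new Map<string, string>();
  for (const raw of source.split("\n")) {
    const line = raw.trim();
    const decl = /^advanced\s+(\w+)$/.exec(line);
    if (decl) {
      declared.add(decl[1]);
      continue;
    }
    const use = /^type\s+(\w+)\s.*\brepresentation\s+advanced\s+(\w+)$/.exec(line);
    if (use) usages.set(use[1], use[2]);
  }
  return { declared, usages };
}

const scan = scanAdvanced(
  "advanced ShardedMap\n" +
    "type MyMap { String : &Any } representation advanced ShardedMap",
);
console.log(scan.usages.get("MyMap")); // ShardedMap
```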

Member

@vmx vmx left a comment

I haven't checked schemas/schema-schema.ipldsch.json though.

@mikeal
Contributor

mikeal commented Sep 10, 2019

I gave it a cursory review and it LGTM. I don’t have the time right now to really drill into the schema-schema for a proper review but would prefer to defer to @warpfork on that anyway :)

@rvagg rvagg force-pushed the rvagg/schema-advanced-layouts branch from 378a2ad to c3dd708 September 13, 2019 06:35
@rvagg
Member Author

rvagg commented Sep 13, 2019

I'm starting to play with the implementation of this in the parsers and have got to "how does an 'advanced' get represented in the JSON version when we assume all the top-level items are 'type's?". schema-schema doesn't really have to touch this, so it's not answered there.

Does it work to have a kind: 'advanced'? So this:

```ipldsch
advanced Foo
type FooMap {String:Int} representation advanced Foo
```

reifies to:

```json
{
  "schema": {
    "Foo": {
      "kind": "advanced"
    },
    "FooMap": {
      "kind": "map",
      "keyType": "String",
      "valueType": "Int",
      "representation": {
        "advanced": {
          "name": "Foo"
        }
      }
    }
  }
}
```

one for you @warpfork

@warpfork
Contributor

warpfork commented Sep 15, 2019

To the question of how the IPLD 'AST' should be shaped: I'd like to float the idea that maybe it's time to add another top-level map to the 'AST'.

We have:

```ipldsch
type Schema union {
	| SchemaMap "schema"
} representation keyed
```

Instead, maybe we should have:

```ipldsch
type Schema struct {
	types SchemaMap
	advancedLayouts {ADLRef:AdvancedDataLayout}
}
```

(Also possibly rename s/SchemaMap/TypesTable/ or similar. ADLRef also doesn't exist yet if I've read correctly, but it would just be a typedef of string, like TypeName.)

Two maps would align well with the two major keywords in the DSL. It also has some of the same readability features: I can scroll down a JSON file very quickly and eyeball if it contains any ADLs or not just by keeping my eyes at a fixed offset from the left, which seems like a virtue. And perhaps most importantly, it's just plain keeping apples separated from oranges (ADLRef is a different pool than TypeName; best to make the maps say so).


(The suggestion above was to replace the top-level union with a struct, but another option would be to do mostly the same thing: add a struct while keeping the single-member union too. I'm not sure I have a concluded opinion about this yet. The original idea of the single-member union was version hinting. A struct can do that too. It's just a question of how pleasant/obvious/redundant/excessive we regard this. Leaning towards thinking the struct is sufficient, but haven't given it a long think yet.)

```ipldsch
## schema.
##
type AdvancedRepresentation struct {
name String
```
Contributor

Would recommend adding a type AdvancedLayoutRef string declaration and using that here (and in the other places it occurs).

It should show up in roughly two places (which makes sense, because it's sort of like a primary key in one position and a foreign key reference in the other, roughly speaking).

```ipldsch
## connection with the algorithm/logic behind this ADL. Future iterations may
## formalize this connection by some other means.
##
type AdvancedDataLayout struct {
```
Contributor

Might be a naming bikeshed, and feel free to dismiss, but I might like to try sneaking one more word into the name of this one: it took me a second to sort out whether this was the block with the details, or the much smaller thing that refers to it.

(I came up with ADLBlock and ADLRef in my scratchpad, but I'm not attached to those (nor even particularly pleased with that much acronym).)

```ipldsch
## formalize this connection by some other means.
##
type AdvancedDataLayout struct {
name String
```
Contributor

This field might disappear entirely if the two maps proposal makes sense. Or if not, would become AdvancedLayoutRef per https://github.com/ipld/specs/pull/182/files#r324478807 .

@rvagg
Member Author

rvagg commented Sep 17, 2019

Changes

OK, @warpfork I've done this in schema-schema:

```ipldsch
type AdvancedDataLayoutName string
type AdvancedDataLayoutMap {AdvancedDataLayoutName:AdvancedDataLayout}
type Schema struct {
  types SchemaMap
  advanced AdvancedDataLayoutMap
}
type AdvancedDataLayout struct {}
type AdvancedRepresentation struct {
  name String
}
```

So you get {"types": { ... }, "advanced": { ... }}. With the elements of "advanced" being empty maps.
I've also scaled back use of AdvancedRepresentation to just TypeMap, TypeList and TypeBytes, the easy ones for now and the ones we actually have use-cases for. We can expand later if/as we need.
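One benefit of the separate "advanced" map is that every representation advanced use becomes a foreign key that can be checked against it. The TypeScript sketch below works over the JSON shape described above: AdvancedRepresentation { name String } and empty AdvancedDataLayout entries. The interface and function names are hypothetical.

```typescript
interface TypeDefn {
  kind: string;
  representation?: { [variant: string]: unknown };
}

interface Schema {
  types: { [typeName: string]: TypeDefn };
  advanced: { [adlName: string]: Record<string, never> }; // AdvancedDataLayout is empty for now
}

// Lists "TypeName -> ADLName" for each type whose advanced representation
// names an ADL that is not declared in the top-level `advanced` map.
function danglingAdvancedRefs(schema: Schema): string[] {
  const problems: string[] = [];
  for (const [typeName, defn] of Object.entries(schema.types)) {
    const adv = defn.representation?.["advanced"];
    if (adv === undefined) continue;
    const ref =
      typeof adv === "object" && adv !== null ? (adv as { name?: unknown }).name : undefined;
    const declared =
      typeof ref === "string" && Object.prototype.hasOwnProperty.call(schema.advanced, ref);
    if (!declared) problems.push(`${typeName} -> ${String(ref)}`);
  }
  return problems;
}

const example: Schema = {
  types: {
    FooMap: { kind: "map", representation: { advanced: { name: "Foo" } } },
    BarMap: { kind: "map", representation: { advanced: { name: "Bar" } } },
    Plain: { kind: "string" },
  },
  advanced: { Foo: {} },
};
console.log(danglingAdvancedRefs(example)); // [ 'BarMap -> Bar' ]
```

The own-property check matters because a plain `in` test would also accept inherited keys such as "toString".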

Status

While working through the practicalities of this for various use-cases, @warpfork has expressed concern that the current incarnation may not quite work for some, including encryption, where you want to compose and unpack types. This question arises out of what we allow representation advanced to be attached to beyond just map, list, bytes.

Here's the way I'm thinking about it (still inspired by @Gozala's seeding of this topic).

Consider the following "secret box" style message encoding (loosely inspired by SSB).

```ipldsch
# a list of messages
type MessageStream [&DecryptedMessage]

# the message you want to read in the format your application can do something with
type DecryptedMessage struct {
  authorKey Bytes
  timestamp Int
  type MessageType
  text String
}

# the opaque encoded transport form of the message
type EncryptedBox struct {
  nonce Int
  pubkey Bytes
  encryptedContents Bytes
}
```

How do you connect these things so that the data model list of EncryptedBox structs pass through an ADL to become a list of DecryptedMessage structs?

Does this work?

```ipldsch
advanced SecretBoxEncryption

type MessageStream [&DecryptedMessage]

type DecryptedMessage struct {
  authorKey Bytes
  timestamp Int
  type MessageType
  text String
} representation advanced SecretBoxEncryption
```

Meanwhile, the SecretBoxEncryption ADL implementation would have its own schema, used internally to validate, read and write blocks (i.e. this is at the data model layer):

```ipldsch
type EncryptedBox struct {
  nonce Int
  pubkey Bytes
  encryptedContents Bytes
}
```
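Here is a sketch of how the pieces could connect: the ADL reads EncryptedBox nodes from the data model and hands DecryptedMessage values to the application. It is written in TypeScript under several assumptions that the thread does not settle:

- The cryptography is delegated to a caller-supplied `open` function, which is a placeholder and not a real secret-box implementation.
- The plaintext is JSON carrying `timestamp`, `type` and `text`.
- `authorKey` is taken from the box's `pubkey`.
- MessageType is treated as a string.

```typescript
// Data-model form, mirroring the EncryptedBox schema above.
interface EncryptedBox {
  nonce: number;
  pubkey: Uint8Array;
  encryptedContents: Uint8Array;
}

// Application-facing form, mirroring DecryptedMessage (MessageType assumed to be a string).
interface DecryptedMessage {
  authorKey: Uint8Array;
  timestamp: number;
  type: string;
  text: string;
}

// Placeholder for the real cryptography: returns plaintext bytes or throws.
type OpenFn = (box: EncryptedBox) => Uint8Array;

// Builds the read side of a hypothetical SecretBoxEncryption ADL.
function secretBoxReader(open: OpenFn): (box: EncryptedBox) => DecryptedMessage {
  return (box) => {
    const plaintext = new TextDecoder().decode(open(box));
    // Assumption: the sealed payload is JSON with these three fields.
    const body = JSON.parse(plaintext) as { timestamp: number; type: string; text: string };
    return { authorKey: box.pubkey, timestamp: body.timestamp, type: body.type, text: body.text };
  };
}

// Demo only: an identity "open" over unencrypted JSON bytes. This is NOT encryption.
const readMessage = secretBoxReader((box) => box.encryptedContents);
const msg = readMessage({
  nonce: 1,
  pubkey: new Uint8Array([1, 2, 3]),
  encryptedContents: new TextEncoder().encode(
    JSON.stringify({ timestamp: 1568700000, type: "post", text: "hello" }),
  ),
});
console.log(msg.text); // hello
```

The write side would run in the opposite direction, sealing a DecryptedMessage into a new EncryptedBox. The application schema only names SecretBoxEncryption, while EncryptedBox stays internal to the ADL.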

Implementations

I'm going to go ahead and start working on js-ipld-schema and go-ipld-schema implementations of the current form of this PR, but print a WARNING to stderr whenever advanced is used, saying that the implementation is unstable. It's a blocker for a couple of things, so getting a placeholder in at least will be helpful.

@rvagg
Member Author

rvagg commented Sep 18, 2019

I have support for this current incarnation over at ipld/js-ipld-schema#12

The test fixture is interesting and its JSON form is worth consideration: https://github.com/ipld/js-ipld-schema/blob/rvagg/advanced/test/fixtures/bulk/advanced.yml

The thing that stands out to me is: "representation":{"advanced":{"name":"Foo"}}, which doesn't seem quite right now that "advanced" are top-level things. The "name" doesn't match how we refer to types: "keyType" and "valueType" (note the "Type" in those). Wondering if "representation":{"advanced":{"Foo":{}}} gets us closer? But then what would schema-schema look like? type AdvancedRepresentation {AdvancedDataLayoutName:Null} and use "representation":{"advanced":{"Foo":null}}. Or is this unimportant?

@rvagg rvagg closed this Sep 19, 2019
@rvagg rvagg deleted the rvagg/schema-advanced-layouts branch September 19, 2019 02:31
@rvagg
Member Author

rvagg commented Sep 19, 2019

OK this is merged after some discussion on IRC. The JSON form is now "representation":{"advanced":"Foo"}. I'm not sure I like the symmetry (or lack of) in this:

```ipldsch
type BytesRepresentation union {
	| BytesRepresentation_Bytes "bytes"
	| AdvancedDataLayoutName "advanced"
} representation keyed
```

Where type AdvancedDataLayoutName string. But so be it. I thought that maybe if/when we introduce aliases, type AdvancedDataLayoutRepresentation AdvancedDataLayoutName might not be a terrible thing to do. But as far as I can tell this is just aesthetics.
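With the merged form, representation is a keyed union whose single key picks the variant, and the value of the "advanced" variant is just the ADL name string. The TypeScript sketch below reads that shape for bytes. The type and function names are hypothetical, and the empty-object "bytes" variant is an assumption about BytesRepresentation_Bytes.

```typescript
// Keyed union: exactly one key, naming the variant.
type BytesRepresentation =
  | { bytes: Record<string, never> } // assumed empty struct for BytesRepresentation_Bytes
  | { advanced: string };            // AdvancedDataLayoutName is a plain string

// Returns the ADL name for {"advanced": "Foo"}, or undefined for the plain bytes form.
function advancedLayoutName(repr: BytesRepresentation): string | undefined {
  const keys = Object.keys(repr);
  if (keys.length !== 1) {
    throw new Error(`keyed union needs exactly one key, got ${keys.length}`);
  }
  return "advanced" in repr ? repr.advanced : undefined;
}

console.log(advancedLayoutName({ advanced: "Foo" })); // Foo
console.log(advancedLayoutName({ bytes: {} }));       // undefined
```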

@mikeal
Contributor

mikeal commented Sep 19, 2019

In case anyone is interested in how this is being used, I’ve started to add support for advanced layouts to the JS API generation: https://github.com/mikeal/ipld-schema-gen/blob/master/index.js#L321

As you can see, when using an Advanced Layout the user must also pass in an implementation of the class, otherwise the schema parsing will throw.

The cool thing about this approach is, you could generate an API for a schema, say a Struct, and then add properties to the class, or subclass it, and pass it into a subsequent API generation as an advanced layout. The composability ends up being pretty nice.
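Here is a minimal TypeScript sketch of that contract. Codegen looks up each ADL name in a registry of user-supplied classes and refuses to proceed when one is missing. A subclass of a previously generated class can itself be registered. `AdlRegistry`, `GeneratedStruct` and `WithHelpers` are hypothetical illustrations, not the ipld-schema-gen API.

```typescript
// Any class can serve as an implementation in this sketch.
type AdlClass = new (...args: any[]) => object;

class AdlRegistry {
  private readonly impls = new Map<string, AdlClass>();

  register(name: string, impl: AdlClass): void {
    this.impls.set(name, impl);
  }

  // A schema that names an ADL without a supplied implementation is rejected.
  resolve(name: string): AdlClass {
    const impl = this.impls.get(name);
    if (impl === undefined) {
      throw new Error(`no implementation supplied for advanced layout ${name}`);
    }
    return impl;
  }
}

// Stand-in for a class produced by an earlier generation pass.
class GeneratedStruct {
  constructor(readonly data: Record<string, unknown>) {}
}

// Extend it with extra behaviour, then feed it back in as an ADL.
class WithHelpers extends GeneratedStruct {
  get fieldCount(): number {
    return Object.keys(this.data).length;
  }
}

const registry = new AdlRegistry();
registry.register("Foo", WithHelpers);
const Impl = registry.resolve("Foo");
console.log(new Impl({ a: 1, b: 2 }) instanceof WithHelpers); // true
```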

5 participants