-
Notifications
You must be signed in to change notification settings - Fork 93
schemas: define advanced layouts #182
Conversation
schemas/advanced-layouts.md
Outdated
| It may then be used as a `representation` for a `type`, from which we can infer the `kind` that we expect. | ||
|
|
||
| ```ipldsch | ||
| type MyString string representation ROT13 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, so, I'm sold on using some specialization of the representation clause on the grounds that something certainly can't have another meaningful representation directive if it also has an advanced layout.
Two other issues with that, though:
-
Needs a separate name plane. (Thinking about how this should encode in the schema-schema as json rather than DSL may be helpful here.) Two reasons: 1, it terrifies me that advanced layouts would be visually indistinctive on the same line they're used (I value lexical-locality highly for ease of fast reading), so let's avoid that by adding a sigil or something; 2, we don't want built-in representations and user-named advanced layouts to be able to get into name conflicts. Implication: the current schema-schema json is also probably Insufficient to describe advanced layouts -- one possible solution is to turn the "representation" field there into another union of "core"|"advanced", and move the current unions to be under "core".
-
Because this uses the
typeclause, this only works for named types. Doesn't answer what we'd do if there's an inline usage. E.g.type Foo struct { wizardField {X:Y}<ShardedMap> }. Maaaaybe this is actually a usage we can just... not. But it's worth a thought. (For a fuller example using this syntax: this unixfs schema spike uses inline markers forDir.membersandFile.content-- it's interesting to note that this is 100% of the usages; and in both cases, the wrapper types would've existed and needed names anyway (for their relationship to unions), so requiring a named type for the advanced layout to be applied would have strictly increased the number of types needed.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hrm, I'll take my own wind back out of my sails a bit for that second point. Does having a named type means we can do better error messages? Maybe. Will codegen in practice often need to produce a type for this node and thus need to name it something? Uff, probably. Can we solve those by munging a field name? I guess, but it's not necessarily good looking. Needs thought.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... turning the schema-schema json into a a union nested in a union sounds gross, so maybe a better idea there would be to add a member to all the current representation unions that's just called "goto_adv" and has a string saying the name of the advanced layout; and then the top level schema objects should now contain a map of types and also a map of advancedlayout details. That'd be a lot less twisty than a union-in-a-union, and maps well to the DSL syntax, and also still demonstrates the point that advanced layouts are Distinct in name plane from other representations.
schemas/advanced-layouts.md
Outdated
|
|
||
| ```ipldsch | ||
| advanced ChainedBytes { | ||
| root Chunk |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm scared of this syntax in some way I can't quite put my finger on yet.
Glazing one's eyes over, the pattern of keyword TitleCase { linebreak fieldname TypeName } looks like -- well, fieldname isn't a fieldname, is it. This isn't a struct.
I wonder if some more visual distinctiveness wouldn't be a pleasant idea here. I'm worried that someone might skim this and infer that there's literally a field named "root" somewhere in the data, which would be incorrect.
(I don't have a suggestion at the moment, either, sorry. Just thinking outloud here.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I know, I've also drafted sketches with roughly this syntax before. I'm not sure why I'm more triggered now. Possibly because in other drafts, there have been more keywords (second word of line being lowercase rather than titlecase) and other errata that made it more clearly not a struct. Possibly because the brainmeats are just more critical when someone else does it :))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh, so this kind of thinking is what lead you to put a | in front of union members, (personal *click* moment). Remember that your representation {} block does the same thing as this, discriminatorKey being a glaring example.
I have no strong feelings either way so I'll wait for suggestions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah.
Worth noting perhaps that the syntax for the representation block has also drifted back and forth a bit over time. Earlier drafts had key="value" pairs -- thus feeling more like value assignment, and distinct from "fieldname TypeName" declarations -- and I think might've even used parens rather than squirrely braces. They were originally even one-liners, I think... but started looking too long that way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find it OK having the first keyword (i.e. advanced, struct) indicating what the key value pairs inside mean. Whether they are fixed parameters of definitions. I agree the readability is more important then writability as your write once, read often.
Though if i think about writing a schema, I would learn what the different keywords advanced and struct mean and what they are doing. I wouldn't need to remember which syntax goes with them, I just need to remember a single concept "key-value pairs are represented as key <space> value in ipldsch.
Though we already have a different syntax for unions and enums, which breaks the "single concept" argument.
When going for a different syntax = would make sense as it is an assignment. Though I'm not sure if I like this myself:
advanced ROT13 {
= root String
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or a .? That has "property" connotations:
advanced ROTX {
.characters Int
}
I find it all a bit ugly and don't really need the | in unions or for this to be different. They're all struct-style definitions anyway. But that's a weakly held opinion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I withdraw most of my concerns here. I'm much less phased when adding in other examples like adding the param name "value" triples mentioned in a subsequent PR. It's mostly the specifics of "Chunk" in root Chunk being both a TypeName but having radically different semantics than any other appearance of a TypeName that gives me pause... which is maybe an issue better regarded on its own.
|
Pulling up a couple @warpfork's points to top level so they don't get lost as source changes:
I'd say we just mandate prepending Re "encode in the schema-schema as json rather than DSL". And the follow-up: "a better idea there would be to add a member to all the current representation unions that's just called "goto_adv" and has a string saying the name of the advanced layout". I think you're suggesting difficulty around this from schema-schema: Can we not just add an "MyString": {
"kind": "string",
"representation": {
"advanced": {
"name": "$ROT13"
}
}
}Let's say that parameterization from #183 was a thing: schema-schema could be: reified as: "MyPredictableMap": {
"kind": "map",
"keyType": "String",
"valueType": "Name",
"representation": {
"advanced": {
"name": "$HashMap",
"parameters": {
"hashAlg": "murmur3-32",
"bucketSize": 2,
"bitWidth": 10
}
}
}
}
I have a second objection to the Now, if we wanted to do something like this, I might be a little more amenable because it leans on existing mindshare about what But that would mean revisiting a bunch of other stuff, so probably not. |
schemas/advanced-layouts.md
Outdated
|
|
||
| ## Root node type definitions | ||
|
|
||
| Advanced layouts are designed to abstract data that exists at the data model layer. As such, they may also dictate what they expect from the data that exists at the node their _root_ resides at. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this... a direction we want to go at all?
I think this basically means the schema for the innards of the advanced layout would be "vendored in" (so to speak) the schema of the application using it.
This would be possible, but I don't know if it strikes me as useful. The functionally relevant relationships to those structures are all inside the advanced layout logic. So I expect when writing advanced layout logic, we'll often find it useful to have a schema for its insides. But having that internal schema be replicated/vendored in an application schema that's just consuming the advanced layout is redundant and/or at the wrong level for those purposes.
Having it in the application/consuming schema means some forms of validation could be done without having the advanced layout code. But I'm not at sure that's useful. I think mostly not. I can't really imagine an application that's going to be able to make useful partial progress or degraded but-still-useful functionality out of that.
(We could still execute the idea of "send a request for data that treats the advanced layout as transparent and reaches in and only pulls out a subset of its content"... but I'd expect to do that by sending a different schema that uses those snippets of advanced-layout internals. Different-schemas-on-same-data is sufficient to do that and seems clear.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An idea that's tempting to toy with is having a link to a separate schema that describes the internals of the ADL.
But, shucks, I think that's going to be a lunge we don't want to make. That would implies cross-schema linking, and that implies either naively pasting a CID hash -- which is a UX that I don't want to put on people -- or opening the door to requiring some sort a naming system that we... are not going to want to expand our scope to include just for this purpose.
So I think having a separate schema, and handling it in the same mechanism as we use to handle the rendevouz problem for finding the ADL code at all is probably the best we can do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this basically means the schema for the innards of the advanced layout would be "vendored in" (so to speak) the schema of the application using it.
Not necessarily. A single schema may contain the types that are required by the layout. Once complete, I would expect that the unixfsv2 schema contains the layout schema necessary for its advanced types except for HAMT which will be loaded separately.
I’d like to see this be as flexible as possible. If the schema defines the layout schemas for an advanced layout, great. If it doesn’t and instead just has a string identifier for it, great.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
has a string identifier for it
back to the identification problem - what is this identifier? earlier incarnations of this had:
advanced HashMap {
identifier "IPLD/HashMap/0"
}
but then we got all tangled and it resulted in #130 (which continues in this thread).
@warpfork wants to punt this whole identify-code-with-schema-definition thing down the road, and I agree that it would be good to postpone that as much as possible and hardwire things in the meantime or add mapping mechanisms to what we build: compile-thing foo.ipldsch --map HashMap:/path/to/hashmap/code or whatever does that mapping of the name that appears in the schema (HashMap) and the logic that runs it. So for now, the only signal we have is the term that comes after advanced.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
honestly, the name you give the advanced layout is already an identifier as far as libraries that parse the schema and generate a library from it are concerned. i wouldn’t put the identifier keyword in the critical path right now, that’s much more helpful down the line when we have some way to resolve globally unique names to implementations.
|
Various syntaxes: Alpha: Beta: Gamma: Delta: Delta's pretty terse and quite unambiguous and reuses the representation keyword... but has a totally different style for parsing the representation "block". |
|
(from conversation with @warpfork elsewhere) going to remove references to "root" (and "rootType") in here, that's for the implementation of the ADL to deal with. A user of the ADL simply needs to point to something that handles that logic. We're still punting on that linking mechanism, it'll likely be language specific for now (e.g. in Go when doing codegen you may provide a map that links these ADL names to importable packages). |
|
re |
schemas/advanced-layouts.md
Outdated
|
|
||
| ```ipldsch | ||
| advanced ChainedBytes { | ||
| root Chunk |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find it OK having the first keyword (i.e. advanced, struct) indicating what the key value pairs inside mean. Whether they are fixed parameters of definitions. I agree the readability is more important then writability as your write once, read often.
Though if i think about writing a schema, I would learn what the different keywords advanced and struct mean and what they are doing. I wouldn't need to remember which syntax goes with them, I just need to remember a single concept "key-value pairs are represented as key <space> value in ipldsch.
Though we already have a different syntax for unions and enums, which breaks the "single concept" argument.
When going for a different syntax = would make sense as it is an assignment. Though I'm not sure if I like this myself:
advanced ROT13 {
= root String
}
91c9bc1 to
378a2ad
Compare
|
This is ready for (hopefully final) review @warpfork @mikeal @vmx.
|
vmx
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't checked schemas/schema-schema.ipldsch.json though.
|
I gave it a cursory review and it LGTM. I don’t have the time right now to really drill into the schema-schema for a proper review but would prefer to defer to @warpfork on that anyway :) |
378a2ad to
c3dd708
Compare
|
I starting to play with the implementation of this in the parsers and have got to "how does an 'advanced' get represented in the JSON version when we assume all the top level items are 'type's?". schema-schema doesn't really have to touch this so it's not answered there. Does it work to have a reifies to: {
"schema": {
"Foo": {
"kind": "advanced"
},
"FooMap": {
"kind": "map",
"keyType": "String",
"valueType": "Int",
"representation": {
"advanced": {
"name": "Foo"
}
}
}
}
}one for you @warpfork |
|
To the question of how the IPLD 'AST' should be shaped: I'd like to float the idea that maybe it's time to add another top-level map to the 'AST'. We have: Instead, maybe we should have: (Also possibly rename Two maps would align well with the two major keywords in the DSL. It also has some of the same readability features: I can scroll down a JSON file very quickly and eyeball if it contains any ADLs or not just by keeping my eyes at a fixed offset from the left, which seems like a virtue. And perhaps most importantly, it's just plain keeping apples separated from oranges ( (The suggestion above was to replace the top-level union with a struct, but another option would be do mostly the same thing but adding a struct rather and keeping the single-member union too. I'm not sure I have a concluded opinion about this yet. The original idea of the single-member union was version hinting. A struct can do that too. It's just a question of how pleasant/obvious/redundant/excessive we regard this. Leaning towards thinking the struct is sufficient, but haven't given it a long think yet.) |
schemas/schema-schema.ipldsch
Outdated
| ## schema. | ||
| ## | ||
| type AdvancedRepresentation struct { | ||
| name String |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would recommend adding a type AdvancedLayoutRef string type and using that here (and then as well in other places it occurs).
Should show up roughly two places (which should make sense because it's sort of like a primary key in one of the positions, and a foreign key reference in the other, roughly speaking).
schemas/schema-schema.ipldsch
Outdated
| ## connection with the algorithm/logic behind this ADL. Future iterations may | ||
| ## formalize this connection by some other means. | ||
| ## | ||
| type AdvancedDataLayout struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be a naming bikeshed, and feel free to dismiss, but I might like to try sneaking one more word into the name of this one -- it took me a second to sort out whether these was the block with the details, or the much smaller thing thing that refers to it.
(I came up with ADLBlock and ADLRef in my scratchpad, but I'm not attached to those (nor even particularly pleased with that much acronym).)
schemas/schema-schema.ipldsch
Outdated
| ## formalize this connection by some other means. | ||
| ## | ||
| type AdvancedDataLayout struct { | ||
| name String |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This field might disappear entirely if the two maps proposal makes sense. Or if not, would become AdvancedLayoutRef per https://github.com/ipld/specs/pull/182/files#r324478807 .
ChangesOK, @warpfork I've done this in schema-schema: So you get StatusWhile working the practicalities of this through for various use-cases @warpfork has expressed concerned that the current incarnation may not quite work for some, including encryption where you want to compose and unpack types. This question arises out of what we allow Here's the way I'm thinking about it (still inspired by @Gozala's seeding of this topic). Consider the following "secret box" style message encoding (loosely inspired by SSB). How do you connect these things so that the data model list of Does this work? While the ImplementationsI'm going to go ahead and start working on js-ipld-schema and go-ipld-schema implementations of the current form of this PR but put a stderr WARNING when you use |
|
I have support for this current incarnation over at ipld/js-ipld-schema#12 The test fixture is interesting and its JSON form is worth consideration: https://github.com/ipld/js-ipld-schema/blob/rvagg/advanced/test/fixtures/bulk/advanced.yml The thing that stands out to me is: |
|
OK this is merged after some discussion on IRC. The JSON form is now Where |
|
In case anyone is interested in how this is being used, I’ve started to add support for advanced layouts to the JS API generation: https://github.com/mikeal/ipld-schema-gen/blob/master/index.js#L321 As you can see, when using an Advanced Layout the user must also pass in an implementation of the class, otherwise the schema parsing will throw. The cool thing about this approach is, you could generate an API for a schema, say a Struct, and then add properties to the class, or subclass it, and pass it into a subsequent API generation as an advanced layout. The composability ends up being pretty nice. |
Biting the bullet and coming up with a formal realisation of what "advanced layouts" might mean in terms of schema DSL.
I've opted to overload
representationhere, rather than the alternative genetics style syntax of<Foo>, I think it fits better but of course am happy to discuss, along with everything else in here! I know we all have slightly different things in our head for this stuff but we need to get some basics laid out at least.Will follow up with an additional idea for parameterisation that goes a bit too far beyond this most basic level and is likely to be in discussion-limbo for much longer. I don't want to pollute this one with too many additional ideas that might hold it up.