I was so burned out last night that I forgot to reference the current implementation (https://github.com/ipld/js-unixfsv2/blob/master/src/data.js). This is using a slightly different schema because
I've only got higher-level nitpicks.
I'm not sure if "Data" is a good name. What about "Binary" or "Binary Data"?
Also please lowercase the filename to be consistent with the repo :)
I struggled a lot with the naming, not just the union key names but also pluralization. I must have changed some of these names 2 or 3 times throughout the course of development. As a result, I'm going to be a real pushover on this: I'm basically willing to change the naming convention to whatever the first person suggests, because I'm not happy with my existing choices here and am happy to defer to the first person I hear ;)
I have some mild observations at most, but overall assume you've thought about this and it represents progress... therefore, go ahead :) Observations:
Theoretically, this could just be one Advanced Layout. The reason I made it 3 is so that the It also makes the underlying data schemas and advanced layouts more usable independently. If I knew all my data had a low max chunking I could write a schema that just directly uses the ByteList. I couldn't do that easily if all the read code was in a single place in the root Advanced Layout.
Ok, pushed a bunch of changes that are now ready for review. I'm also going to need to go back and work on the implementation a bit to make sure none of these changes cause any problems. The current type generation code doesn't have tuple representation either, so I'm going to need to implement that now ;)
Wow, ok, way simpler now. One thing led to another and I was able to remove a bunch of things.
I'd like to declare exactly how/where we should return an error in the spec. If the "part" part of a
Agree, but "when we resolve the part" is a bit too ambiguous. We "resolve" the entire graph when we replicate it, but because replicators work on generic IPLD graphs they won't be able to produce errors during replication: they won't understand this data structure or know when to apply this schema. I think "when we read the part" is less ambiguous and doable, but as @ribasushi has mentioned before, this still leaves you open to an odd situation: reading a slice of the data that occurs after a misalignment of the size will not cause an error and will return the data slice based on the encoded size information rather than the raw part length. This is probably fine, but if we're going to document it as part of the spec it should be this detailed, because we're effectively saying that "when reading a slice of the data, the encoded size information is relied upon and misalignments will only cause errors if the misaligned part is being read."
Fair enough. With respect to reading parts that are correctly aligned when some aren't, my mental model here is that the "bad" sections are like corrupt disk sectors. You don't get an error until you actually try to read them.
See my next comment below.
By "fork", I assume you mean that reading sequentially yields different results than seeking? That's a bug in the current implementation, not something that has to be true. We can validate offsets as we read and return errors when we encounter misaligned sections. |
While trying to explain/illustrate my concern I think I may have arrived at a better formalization and something approaching a solution to the issue I keep bringing up. I need to let this "bake" over the weekend, and will update the thread. For the time being assume I didn't make the above comment 🤔
Is this design finalized? I have a need for something like this and am hoping to use this if it can be finalized.
We've spent far too much time bikeshedding on "naming things." I've pushed a few changes to naming and am closing the loop on bikeshedding the names. I've also fixed the issue @riba found (the reference name wasn't migrated when we changed the name of the data structure). I'm going to merge this in 24h unless there's a strong objection. I'm also going to get the JS implementation in line with the latest schema today.
One tiny ask: can we add a line of prose that states clearly that Otherwise my green checkmark from ages ago still applies; this all looks reasonable to me.
This comment is made against this version of the proposed spec: https://github.com/ipld/specs/blob/1345e737ecdeff3ee182a7458c4ce0667a43770b/data-structures/data.md
I posit that the "bikeshedding on naming" is a symptom of an underlying lack of shared context. There are two parts to that:
If the answer to 1. above is "no not really, this is definitely not the final version for streams, this is something we are playing with" - then this is fine, and my only request is that the spec reflects that ( If the answer however is "yeah, we need to nail this down and try not to change this anymore", we get to:
I think this covers everything that bothers me :)
Yes, and this is something we've not got good answers to with schemas yet; we either need some kind of signal indicating top-level elements or a convention. For now I suggest we adopt the convention of putting top-level elements at the top of a schema doc and ordering things in reverse dependency order (the opposite of what you'd do in a C program).
This is supported by the current iteration just fine. There's no tie between Schemas and CBOR; it's just a matter of shape and type. So, to do my exercise again and look at the kinds of shapes this Schema might allow:
And that's it, I think. Possible drawbacks:
I'm fine with this but would prefer two minor changes:
- Flip the ordering of the types, putting the top-level type at the top and the dependents in order below it.
- Add a note about `length` being advisory. Something like: `length` is a cumulative value across potentially nested multi-block chunks; therefore it should be considered advisory-only by applications using `FlexibleByteLayout`, as it cannot be fully validated until all nested blocks are available. (A sketch of why is below.)
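To illustrate that advisory-only point, here's a rough JavaScript sketch. The shapes are hypothetical stand-ins: a part is raw bytes, a nested list, or a string playing the role of a link into a block store; none of this is the real API.

```js
// Sketch: declared `length` values can be summed immediately, but they can
// only be trusted once every nested block is actually available to check.

const checkLength = (node, blockstore) => {
  if (node instanceof Uint8Array) return { length: node.length, verified: true }

  let total = 0
  let verified = true
  for (const [declared, part] of node) {
    // A string stands in for a link (CID) to a block that may not be local yet.
    const resolved = typeof part === 'string' ? blockstore.get(part) : part
    if (resolved === undefined) {
      total += declared // missing block: trust the declared value for now
      verified = false
      continue
    }
    const child = checkLength(resolved, blockstore)
    if (child.length !== declared) {
      throw new Error(`declared ${declared} bytes, found ${child.length}`)
    }
    total += child.length
    verified = verified && child.verified
  }
  return { length: total, verified }
}

// One child block hasn't been fetched, so the cumulative length is reported
// but cannot yet be fully validated.
const blockstore = new Map()
const node = [
  [2, new Uint8Array([1, 2])],
  [8, 'link-to-child-block'] // hypothetical reference, not in the store
]
console.log(checkLength(node, blockstore)) // { length: 10, verified: false }
```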
How would we go about enforcing this? The schema can be applied to any node. Whether or not the top-level type is a discrete block is really a decision that libraries are going to have to make when they decide how they want to support it. It's part of the implementation logic, which isn't defined in the spec or the schema. I don't see a clear problem with applying this schema to a node that is within a block rather than the root of the block, but maybe I'm missing something.
@rvagg pushed fixes for your comments.
Present state looks good; we can fix nits as they arise once more alignment takes place 🎉
You can use existing
The only opportunity you have to use a binary value without linking to a
🚢
Right, and I agree that it's not enforcement that we need here. It's more about clarity about the expected entry point for the primary purpose of this schema. I expect we're going to find codegen and code interface reasons why we need more than just a convention (like the

As it is, this document has the same title as the top-level type, so it's now clear in that way as well as in the presented order, so that's good enough for me for now. We can work on this challenge over time.
Thrilled that this is merged. Quick recap on the discussion about "entry point types" in the schemas and how I feel about that: yeah, I think y'all kinda covered all facets of it already... 👍
🚢
If the entrypoint matters, it should be in prose. It wasn't mentioned in the spec, so I wasn't sure; I left thinking "it didn't matter, I can choose any." It would be nice to have examples of FBL implementations in different languages, and of course a reader, perhaps as part of an ADL utility library. If anyone has ideas on what would be a good example use of this, please let me know, as I'll probably work on this today. A few other cleanup items:
I'll submit a PR for these along with some prose on the entrypoint.
(I think we're probably oscillating around the same understanding here, but just in case a little more discussion and alternative examples help...) I think "I can choose any" is also still technically true, just de facto not so much in this case... because the "choose" is at the scale of "any ADL implementation that wants to be interoperable", so it's kinda just one big single incident of "choose" in practice.

Example in another domain that's an interesting contrast: in the schema for Selectors, I introduced a But that sentence is in prose in the docs and is heavy in conditionals on purpose: I simultaneously consider it totally reasonable that if someone else is designing a larger schema that uses Selectors somewhere inside their messages, they might just use the
* wip: Data
* fix: typo
* fix: rename to lowercase file
* fix: various changes for code review
* fix: factor out double offset indexing
* fix: clearer naming
* fix: chop down to only nested byte lists
* feat: use kinded union
* fix: remove reference to old advanced layout
* fix: remove extra advanced layout
* fix: remove old type references
* fix: remove usage from schema, renames
* fix: remove ADL definition, cleanup description
* fix: remove unnecessary link in union
* fix: latest name change
* fix: moving to new directory
* fix: bring union to top level
* fix: remove option algorithm field
* fix: reduce to single list tuple
* fix: name lengthHint since the property is insecure
* fix: going back to length
* fix: move to new naming
* fix: @rvagg's review comments
* fix: add document status, thanks @ribasushi
I got the new `js-unixfsv2` working using `ipld-schema-gen`. It's awesome, but now my brain is mush and I need a rest. I wanted to get this posted before I crash, though. The structure I landed on for the layout of the byte data is very interesting and general purpose.
I figured out a way, using recursive unions, to create large DAGs without encoding the layout algorithm into the data structure. This means that we can create a general read implementation for any layout we decide to implement in the future.
All you do is implement `read()` methods for 3 advanced layouts and those readers essentially call each other all the way through the DAG. So you can mix and match all the basic layout components on the `write()` side to your heart's content. Since it's a union, we also wouldn't have trouble adding another one in the future, but the problem of "representing a linear list of bytes" isn't so large that I anticipate needing another one.
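As a rough illustration of the recursive-reader idea, here's a small JavaScript sketch using a simplified nested-list shape. The node layout, the string-as-link convention, and the loader are stand-ins for this discussion, not the actual js-unixfsv2 API:

```js
// Sketch: because the structure is a recursive union (raw bytes, or a list of
// [length, part] tuples whose parts are the same union again), one generic
// reader can walk any layout the writer chose on the write side.

const read = async function * (node, load) {
  if (node instanceof Uint8Array) { // leaf: raw bytes
    yield node
    return
  }
  for (const [, part] of node) {    // nested list: recurse into every part
    // A string stands in for a link to another block that has to be loaded.
    const child = typeof part === 'string' ? await load(part) : part
    yield * read(child, load)
  }
}

// Writers are free to mix leaf sizes, nesting depth and linked blocks;
// the reader doesn't care which layout algorithm produced the DAG.
const main = async () => {
  const blocks = new Map([['ref-1', new Uint8Array([3, 4])]])
  const load = async (ref) => blocks.get(ref)
  const node = [
    [2, new Uint8Array([1, 2])],
    [2, 'ref-1'],                   // part behind a (pretend) link
    [1, [[1, new Uint8Array([5])]]] // part that is itself a nested list
  ]
  for await (const chunk of read(node, load)) console.log(chunk)
}
main()
```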