Update UnixFS specification #316

Open
Jorropo opened this issue Sep 1, 2022 · 5 comments

@Jorropo
Contributor

Jorropo commented Sep 1, 2022

We need:

  • A proper unixfs spec
    • how to calculate offsets
      • document the difference between Tsize (total sub-DAG size: raw data + envelopes) and raw file data size (without IPFS metadata), and how to read and interpret each.
    • how to read & create HAMT directories
    • protobufs
  • Some testing fixtures.
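
To make the Tsize versus raw file size distinction concrete, here is a minimal sketch. It is a toy model, not the real encoding: actual UnixFS nodes are dag-pb protobufs, and `ENVELOPE_OVERHEAD`, `Node`, `block_size`, `file_size` and `tsize` are hypothetical names chosen for illustration. The only point is that the two numbers diverge as soon as any envelope bytes exist.

```python
from dataclasses import dataclass, field
from typing import List

# Assumption: a made-up constant standing in for per-block protobuf framing.
# Real overhead varies per block and depends on the links it carries.
ENVELOPE_OVERHEAD = 8


@dataclass
class Node:
    data: bytes = b""  # raw file bytes held by a leaf
    children: List["Node"] = field(default_factory=list)

    def block_size(self) -> int:
        """Size of this one serialized block: payload plus toy envelope."""
        return len(self.data) + ENVELOPE_OVERHEAD

    def file_size(self) -> int:
        """Raw file bytes under this node; this is what offsets are based on."""
        return len(self.data) + sum(c.file_size() for c in self.children)

    def tsize(self) -> int:
        """Total sub-DAG size: every block under this node, envelopes included."""
        return self.block_size() + sum(c.tsize() for c in self.children)


root = Node(children=[Node(data=b"hello "), Node(data=b"world")])
print(root.file_size(), root.tsize())
```

In this model the file is 11 bytes long while the sub-DAG occupies 35 bytes, so any offset computed from the second number would be wrong.
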
Jorropo self-assigned this Sep 1, 2022
lidel changed the title from "Modern unixfs spec" to "Update UnixFS specification" Sep 1, 2022
@BigLep
Contributor

BigLep commented Sep 1, 2022

Cc @rvagg @dignifiedquire: Kubo maintainers are going to take the first stab at getting this written in September. Feel free to watch or leave any notes.

@b5
Contributor

b5 commented Sep 1, 2022

Our latest set of trials & tribulations from Iroh: n0-computer/iroh#198
and our running doc of papercuts: https://number-zero.notion.site/UnixFs-742339892d9c47d5b79f4f942e661bbf

@Jorropo
Contributor Author

Jorropo commented Sep 1, 2022

@b5 about n0-computer/iroh#198: I think the balanced tree is not in the spec. Or at least, if someone really cares about it, it's a non-authoritative part of the spec.

As long as you get your file sizes right and the Merkle DAG is correct (meaning that a correctly built decoder successfully rebuilds the original content), you can use whatever scheme you like.
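
A minimal sketch of that claim, under simplifying assumptions: the tree is modeled as nested Python lists of chunk bytes rather than real dag-pb blocks, and `chunk`, `build_balanced` and `decode` are hypothetical helper names. It uses a fixed-size chunker and shows that a flat layout and a balanced layout both decode to the same bytes, because the decoder only relies on child order.

```python
from typing import List, Union

# A leaf is a chunk of raw bytes; a branch is an ordered list of children.
Tree = Union[bytes, List["Tree"]]


def chunk(data: bytes, size: int) -> List[bytes]:
    """Fixed-size chunker: every chunk is `size` bytes except possibly the last."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def build_balanced(leaves: List[Tree], width: int) -> Tree:
    """Group nodes `width` at a time, level by level, until one root remains."""
    if width < 2:
        raise ValueError("width must be at least 2")
    if not leaves:
        return b""
    level: List[Tree] = list(leaves)
    while len(level) > 1:
        level = [level[i:i + width] for i in range(0, len(level), width)]
    return level[0]


def decode(node: Tree) -> bytes:
    """Depth-first, left-to-right traversal that concatenates leaf bytes."""
    if isinstance(node, bytes):
        return node
    return b"".join(decode(child) for child in node)


data = bytes(range(256)) * 4
leaves = chunk(data, 100)
flat = leaves                          # one branch holding every leaf
balanced = build_balanced(leaves, 3)   # at most 3 children per branch
assert decode(flat) == decode(balanced) == data
```

The two trees have different shapes (and, in a real implementation, would have different root CIDs), yet a decoder that walks children in order recovers identical content from either.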

@b5
Contributor

b5 commented Sep 2, 2022

Sure, maybe not an authoritative part of the spec, but as Lidel pointed out in the implementers call yesterday, there are many things that would be good to suggest within spec documents that give implementers hints so they don't footgun themselves.

No one says the dag needs to be balanced. Everyone ends up implementing a balanced tree at some point.

@lidel
Member

lidel commented Sep 5, 2022

Some additional asks, based on real world problems I've seen:

  • make it clear that the chunking strategy and the way the DAG is constructed / balanced are up to the implementation, but...
    • "notes for implementers" section should give an example of basic implementation (size-based chunker, balanced tree) so people who are in a rush and don't care about performance end up with a sane default and don't reinvent a square wheel
  • make it clear what Tsize means in the context of UnixFS and non-UnixFS sub-DAGs (total size, including all IPFS/IPLD envelopes)
    • add a "note for implementers" on how reading a byte range of a big file should be done
      • this is ridiculously important: some of our own Go libraries did not do this correctly and used Tsize instead of the raw file size
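
As a rough illustration of the byte-range point, the sketch below seeks through a tree using per-child raw file sizes (modeled on the UnixFS `blocksizes` field) and never consults Tsize. It is a simplified in-memory model, not a real decoder: `Node`, `leaf`, `branch`, `file_size` and `read_range` are hypothetical names, and block fetching is left out entirely.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    data: bytes = b""  # raw bytes, leaves only
    children: List["Node"] = field(default_factory=list)
    blocksizes: List[int] = field(default_factory=list)  # raw file bytes per child


def file_size(node: Node) -> int:
    """Raw file bytes under a node (never the Tsize of its links)."""
    return sum(node.blocksizes) if node.children else len(node.data)


def leaf(data: bytes) -> Node:
    return Node(data=data)


def branch(children: List[Node]) -> Node:
    return Node(children=children, blocksizes=[file_size(c) for c in children])


def read_range(node: Node, offset: int, length: int) -> bytes:
    """Return file bytes [offset, offset + length), descending only into
    children whose byte span overlaps the requested range."""
    if not node.children:
        return node.data[offset:offset + length]
    out = []
    start = 0  # file offset at which the current child begins
    for child, size in zip(node.children, node.blocksizes):
        end = start + size
        if end > offset and start < offset + length:
            lo = max(offset, start) - start       # range start, relative to child
            hi = min(offset + length, end) - start  # range end, relative to child
            out.append(read_range(child, lo, hi - lo))
        start = end
    return b"".join(out)


data = b"abcdefghijklmnopqrstuvwxyz"
leaves = [leaf(data[i:i + 4]) for i in range(0, len(data), 4)]
root = branch([branch(leaves[0:3]), branch(leaves[3:6]), branch(leaves[6:])])
assert read_range(root, 10, 7) == data[10:17]
```

If the child boundaries were computed from Tsize instead, every envelope byte would shift the offsets of all following children, so the reader would land in the wrong leaf.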
