This repository has been archived by the owner on Dec 6, 2022. It is now read-only.

UnixFS Reboot #28

Closed
mikeal opened this issue Aug 8, 2019 · 1 comment

Comments

@mikeal
Contributor

mikeal commented Aug 8, 2019

TL;DR

I’ve listed every feature I can find that has been considered for UnixFSv2 below. We discussed this in a short meeting (notes at the end of the document, recording posted soon) and the following action items surfaced:

  • @mikeal will kick off an issue in ipfs/spec to add file metadata to UnixFSv1
  • @mikeal will kick off an issue in this repo to define and scope a UnixFSv2* we can ship on a reasonable timeline.

UnixFS vNext Reboot

For some time we’ve been directing issues, feature requests, and the general future of UnixFS at “UnixFSv2.” Because the size and scope of this future version were never locked down, this has delayed improvements to UnixFSv1 and has left UnixFSv2 without a clear deadline or feature set.

The goal of this document is to describe the various issues and features we’d like to see in UnixFS and link to the historical discussions about those features. We can then use this document to discuss and prioritize each feature and find the best path to development whether it be improvements to UnixFSv1, an incremental UnixFSv2 on dag-cbor, or a bigger future version built on features that are still being researched.

General Links

Development Targets

This section briefly describes the difficulties and limitations of the different development strategies, which should help inform how best to approach solving each issue.

Improvements to UnixFSv1

One problem with improving UnixFSv1 is that none of the generic improvements we make can be leveraged by applications outside of IPFS. For instance, the work we’ve done for directory sharding lives in UnixFSv1 and can’t be used for other generic sharding problems. This means that solving fairly generic problems via UnixFSv1 is less valuable and eventually leads to duplicated effort.

The other problem is dag-pb, best summarized by @stebalian. In short, it’s very rigid, and adding fields and other features is more cumbersome than in dag-cbor.

UnixFSv2 on dag-cbor soonish

This development route solves the dag-pb related issues and makes some of the generic improvements usable outside of IPFS.

However, there is one major problem remaining: upgradability. All new features and improvements must exist and be relatively consistent between two versions of IPFS manipulating the same data. There is no good way to ensure this without future IPLD features that are still in the research phase.

This route of development is most problematic when tackling the “Reproducible Hashes” issue.

It should also be noted that there is future, not-yet-developed IPLD work that we want to leverage for UnixFS. We can therefore be fairly certain that, if we were to release this version of UnixFSv2, we would still face another major version migration at some point in the future.

The actual development time for this would not be very long. @mikeal has already written draft implementations of several iterations of the UnixFSv2 spec in JS. A much more important factor to consider is the upgrade cost to IPFS users.

UnixFSv2 on “IPLD Future”

Most of the big problems facing UnixFS are problems facing IPLD generally. These problems are all being actively worked on in the form of engineering and research and at some future date can be leveraged for an ideal, future-proof (upgradable), version of UnixFS. However, when this will be available can’t be predicted with a high level of certainty.

Issues

Standard File/Directory metadata

Links

Arbitrary file metadata

The ability for users to add their own optional metadata to files could be very useful. However, doing arbitrary anything in dag-pb is problematic.

Reproducible Hashing

Put simply, this is the ability for a given UnixFS implementation to look at an existing UnixFS encoded file and a file on a traditional file system and to reproduce the UnixFS encode identically.

This feature is relatively simple if there is no optionality and every version of IPFS is in perfect alignment. However, this is almost never the case.

IPFS has several options that can be used when encoding a file that alter the encode.

One path is to encode all options into the encoded version of the file. This would work as long as both versions of IPFS are in alignment, which means it can often fail to produce identical hashes in upgrade scenarios. The only way to completely guarantee reproducible hashing is to guarantee that the applications are also identical, but this is very difficult without “IPLD Future.”
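
To illustrate why encoding options matter, here is a minimal sketch in Python. It is a toy model, not the real UnixFS or dag-pb encoding: the names `chunk_fixed` and `toy_root_hash`, the two-level layout, and the chunk sizes are all assumptions made up for this example. It shows the same bytes producing different root hashes when only the chunk size changes, while reading the bytes back is unaffected.

```python
import hashlib


def chunk_fixed(data: bytes, size: int) -> list[bytes]:
    """Split data into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def toy_root_hash(data: bytes, chunk_size: int) -> str:
    """Hash each chunk, then hash the concatenated chunk digests.

    Hypothetical two-level layout for illustration only; this is NOT
    how UnixFS actually lays out or hashes a file.
    """
    leaf_digests = [hashlib.sha256(c).digest() for c in chunk_fixed(data, chunk_size)]
    return hashlib.sha256(b"".join(leaf_digests)).hexdigest()


data = bytes(range(256)) * 64  # 16 KiB of sample content

root_a = toy_root_hash(data, chunk_size=1024)
root_b = toy_root_hash(data, chunk_size=4096)

# Same bytes, different option, different root hash.
print(root_a == root_b)  # False

# Re-encoding with the recorded option reproduces the hash.
print(toy_root_hash(data, chunk_size=1024) == root_a)  # True

# Reading is unaffected: concatenating the chunks yields the same bytes.
print(b"".join(chunk_fixed(data, 1024)) == b"".join(chunk_fixed(data, 4096)))  # True
```

Recording `chunk_size` next to the data only helps if the reader implements the same chunker in the same way, which is the alignment problem described above.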

“Inline” files and directories

For small files and directories the benefits of de-duplication are often outweighed by the cost of retrieving additional blocks.

There are also use cases, like websites, where it may be highly beneficial to inline certain data into the root block of the directory tree for faster early rendering.
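
The trade-off can be sketched roughly as follows. This is a toy model under stated assumptions: `ToyDirectory`, `INLINE_THRESHOLD`, and the entry shapes are hypothetical names invented for this example, and the 64-byte cut-off is arbitrary. Small files are stored directly in the directory node, so reading them needs no extra block fetch; larger files are stored as separate blocks and linked by hash.

```python
import hashlib
from dataclasses import dataclass, field

INLINE_THRESHOLD = 64  # bytes; arbitrary cut-off chosen for this sketch


@dataclass
class ToyDirectory:
    """Hypothetical directory node plus a stand-in block store."""
    entries: dict = field(default_factory=dict)
    blocks: dict = field(default_factory=dict)

    def add_file(self, name: str, data: bytes) -> None:
        if len(data) <= INLINE_THRESHOLD:
            # Small file: keep the bytes inside the directory node itself.
            self.entries[name] = {"inline": data}
        else:
            # Larger file: store a separate block and link to it by hash.
            digest = hashlib.sha256(data).hexdigest()
            self.blocks[digest] = data
            self.entries[name] = {"link": digest}

    def read_file(self, name: str) -> tuple[bytes, int]:
        """Return the file bytes and the number of extra block fetches."""
        entry = self.entries[name]
        if "inline" in entry:
            return entry["inline"], 0
        return self.blocks[entry["link"]], 1


site = ToyDirectory()
site.add_file("index.html", b"<h1>hello</h1>")
site.add_file("video.bin", b"\x00" * 4096)

print(site.read_file("index.html")[1])  # 0
print(site.read_file("video.bin")[1])   # 1
```

The cost of inlining is that identical small files stored in different directories are no longer shared as a single block.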

Support for non-utf8 Filenames

Link

Seeking in large directories

It’s often necessary to paginate through large directories and the current implementations do not easily support this.

Question: Given that you can only paginate through a randomized ordering using the current sharding data structure, how useful would this be without ordered collections?
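
As a rough sketch of what pagination over a hash-ordered structure looks like, the Python example below pages through names in the order of a hash of each name. Everything here is an assumption for illustration: `name_key` and `list_page` are hypothetical helpers, SHA-256 stands in for whatever hash a real sharding structure uses, and a flat sorted list stands in for the sharded tree. The cursor is stable across calls, but the resulting order has nothing to do with the names themselves.

```python
import hashlib


def name_key(name: str) -> str:
    """Position of an entry in the toy structure: a hash of its name."""
    return hashlib.sha256(name.encode("utf-8")).hexdigest()


def list_page(names, after_key=None, limit=3):
    """Return one page of names in hash order plus a cursor for the next page.

    The cursor is the hash key of the last entry returned, or None when
    there are no further entries.
    """
    ordered = sorted(names, key=name_key)
    if after_key is not None:
        ordered = [n for n in ordered if name_key(n) > after_key]
    page = ordered[:limit]
    next_cursor = name_key(page[-1]) if len(ordered) > limit else None
    return page, next_cursor


names = [f"file-{i:02d}.txt" for i in range(8)]

seen = []
cursor = None
while True:
    page, cursor = list_page(names, after_key=cursor)
    seen.extend(page)
    if cursor is None:
        break

# Every entry is visited exactly once...
print(sorted(seen) == sorted(names))  # True
# ...but in hash order, which is unrelated to the order of the names.
print(seen)
```

Paging like this lets a client resume a listing, but it cannot answer "give me the entries after `file-03.txt`" in any order a user would recognize, which is the concern raised in the question above.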

Symlinks

Link

Protobuf Performance

While I’ve heard people say on numerous occasions that dag-pb performance is an issue (compared to dag-cbor), I can’t find any good links or resources showing what the real impact is.

Miscellaneous

Meeting Notes: August 8th 2019

  • performance things

    • issues with old unixfs hamt
      • batching issues
      • fans out at the bottom way too fast
      • really deep tree even in cases where that's unnecessary
  • questions about external information we can feed into priorities

    • some other major user stories about high level apis have also come up...
      • it's hard to add directories to ipfs currently without re-scanning all files... incremental adds wanted
        • this is very much edge tooling and not unixfsv2 asks
    • "we took everything that was blocked on unixfsv2 off our q3 list"
      • doesn't mean we don't still want it, just choose to route elsewhere in other teams :)
      • ... additional comments about "these workaround are terrible"
  • generation style versioning?

  • more worried about changes to things like rabin chunking than anything else

    • moves (cancels dedup of) vast amounts of the data
    • changing metadata much lighter comparatively (still not free)
  • some kinds of data might be easier to maintain read of and maybe that's useful?

    • e.g. concatenating all the bytes in a [][]byte is easy, even if chunker to write it changed
  • worth mentioning that dir list order in most existing filesystems isn't... really specified.

    • you can't seek it -- there are no syscalls for that.

anyone wanna talk about attribs?

https://gist.github.com/warpfork/3948bd951e93c0f0b4e355d78b736f83

  • we should ping djdv on this as well

@rvagg
Member

rvagg commented Dec 6, 2022

closing for archival

@rvagg rvagg closed this as completed Dec 6, 2022