This repository has been archived by the owner on Jun 2, 2020. It is now read-only.

Research current IPFS docs situation and users’ learning/working experiences #52

Closed
Mr0grog opened this issue Mar 13, 2018 · 8 comments

Mr0grog commented Mar 13, 2018

We need to do some honest research to create a cohesive picture of things like:

  • What are the general problems people are running into trying to use IPFS?
  • What concepts are confusing, problematic, or just poorly understood and communicated?
  • What are people trying to do with IPFS and how?

…which should give us a clearer, more concrete picture of what problems we need to solve.

@Mr0grog Mr0grog self-assigned this Mar 13, 2018

Mr0grog commented Mar 13, 2018

This is largely done. Research notes and output (that can be made public) can be browsed at:

This also includes:

  • An inventory of discuss.ipfs.io topics in Google sheets: https://docs.google.com/spreadsheets/d/1Z0Wy_jSS-42gXP9n1MA9JifZ7PDvNEVbpbfTFTVlz8M/edit

    This is a list of all topics on https://discuss.ipfs.io as of 2018-03-05. All the columns from “custom tags” onwards are tags I’ve added to help get a sense of what kinds of topics are covered. The “custom tags” tab lists each tag and the number of topics that use it. (“All tags” is the same, but includes the topic author’s tags as well.) I’ve only gotten through about half the topics, but focused on ones with the most discussion. @pmthomps has been a great help in this effort (thanks!!). Here's a quick histogram of those tags:

    Custom Tag Count
    troubleshooting 66 //////////////////////////////////////////////////////////////////
    basics 46 //////////////////////////////////////////////
    ipns 32 ////////////////////////////////
    use-cases 31 ///////////////////////////////
    explanation 22 //////////////////////
    api 20 ////////////////////
    features 20 ////////////////////
    announcement 20 ////////////////////
    go-ipfs 18 //////////////////
    tools 17 /////////////////
    cli 17 /////////////////
    js-ipfs 16 ////////////////
    status 15 ///////////////
    concepts 15 ///////////////
    privacy 12 ////////////
    networking 11 ///////////
    dynamic-data 11 ///////////
    ideas 11 ///////////
    pinning 11 ///////////
    performance 10 //////////
    filecoin 10 //////////
    security 9 /////////
    files 9 /////////
    community 8 ////////
    uploading 8 ////////
    cluster 8 ////////
    bug 8 ////////
    gateway 7 ///////
    windows 7 ///////
    private-networks 7 ///////
    pubsub 7 ///////
  • An inventory of all the repos on GitHub: https://docs.google.com/spreadsheets/d/1IDVAGfniyHCJLIxLc3y7K7YTOFGCtgwTVCZEojtNLlw

    This has one tab for each org (ipfs, ipld, libp2p, multiformats, ipfs-shipyard), but I was only able to fully get through ipfs:

    • Gray lines are deprecated repos.
    • Orange lines are repos of unknown status (can't tell if they are active, kinda sleepy, deprecated, intended to be moved, etc.) We might know the status of some (but surprisingly not all!), but community members and people new to IPFS definitely don’t.
    • Red lines are repos with significant problems. Usually this means either they lack a license (and so are not open source) or they are deprecated but not marked in any way as such (someone could easily waste a lot of time trying to use them before realizing they are a dead end).
    • Green is organizing/discussion, blue is docs/specs.
    • For a full key, see the “key” tab.

More notes to come here momentarily.

Mr0grog commented Mar 13, 2018

Pulling in some bits from https://gateway.ipfs.io/ipns/ipfs-docs-research.robbrackett.com/html/General-Notes.html:

Common Questions

Where do I start? What’s what?

Where is the spec? What’s the current canonical version of it?

What’s the status of _____? (Piece of software, a spec, a concept [e.g. IPNS vs. IPRS], a document, a repo -- clear indicators and roadmaps are often missing. See above about spec!)

  • Should I depend on or trust it [yet]? (If not yet, when?)
  • Has it become deprecated? (Projects might appear solid, but there's an issue you might not see initially about deprecating the whole thing)

How can I help the network? Is there data I can store?

I followed the “getting started” guide; what do I do now?

How do I know the progress or completion of a pin or a get?

I can't see my file on some machine, but I can in the Gateway (networking issues)

What ports do I need to open/what firewall settings do I need? How do I do that? (Related to the above)

What happens when I close my laptop (that the daemon is running on) for the day?

How do I share keys?

How do I change a file?

Why is the IPFS hash (CID) not just the hash of the file?

How do I choose JS vs. Go, how are they different?

Can I limit what I store or share?

Can I share data with only certain other people?

The concept of “uploading” content:

  • My content disappeared! Where did it go? It was on the Gateway yesterday.
  • How much will my content be replicated?

How do I remove a file?

How do I host more than one thing with IPNS?

How do I modify IPNS from more than one computer/node?

Can I keep my files (or their contents) secret? What's the story around privacy?

What is…

  • MFS
  • IPLD
  • Libp2p
  • (And how do they relate?)

Can other people see all my files? Can I see all theirs?

Can I pin a file later? i.e. can I add a pin to a queue, but not block while it is getting?

How do I know which pins are which?

[How] can I do dynamic data?

Where are some good examples of things built with IPFS?

Who's using IPFS?

Are there legal concerns?

What's the difference between http://localhost:8080/ipfs/<cid> and https://gateway.ipfs.io/ipfs/<cid>? Why do I care?
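
As a rough illustration of the last question, here is a minimal sketch of how the two URLs are built. It assumes the path-style gateway layout shown above (`<gateway>/ipfs/<cid>`); the helper name `gateway_url` and the two constants are hypothetical, and the CID is the one from the research-docs link later in this thread. Both URLs name the same content, since the CID identifies the data itself; what differs is which node fetches and serves it (your own daemon versus a public gateway operated by someone else).

```python
# Hypothetical helper for illustration; not part of any IPFS library.
LOCAL_GATEWAY = "http://localhost:8080"     # gateway served by your own daemon
PUBLIC_GATEWAY = "https://gateway.ipfs.io"  # public gateway run by someone else


def gateway_url(gateway: str, cid: str, path: str = "") -> str:
    """Build a path-style gateway URL: <gateway>/ipfs/<cid>[/<path>]."""
    url = f"{gateway.rstrip('/')}/ipfs/{cid}"
    if path:
        url += "/" + path.lstrip("/")
    return url


if __name__ == "__main__":
    cid = "QmV5QdWAVbYCpcEspWu4tiCtVfdL9SESidLmnXLEJ82gdL"
    # Same content address, two different places to ask for it.
    print(gateway_url(LOCAL_GATEWAY, cid, "html/"))
    print(gateway_url(PUBLIC_GATEWAY, cid, "html/"))
```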

Concepts & Terms that Trip People Up

There are enough new concepts and IPFS is different enough from general server/client systems that we should probably have some kind of “concepts dictionary.” We don’t need to define all these terms there (e.g. CBOR is a pretty standard thing, though it may be obscure to many people), but we do need to be careful to always provide some reference for these terms wherever we use them, whether a link (e.g. http://cbor.io), a quick description (e.g. “CBOR (a binary version of JSON)”), or something else.

  • DWeb
  • DHT
  • Graph
  • DAG
  • Merkle Tree/DAG
  • Merkle “Forest”
  • Hash
  • Gateway
  • Pinning
  • Transport
  • Swarm
  • Information Space
  • MFS
  • (Cryptographic) Signing
  • Peer
  • Peer ID
  • CRDT
  • Repo
  • DataStore
  • Node, Daemon
  • CID (v0, v1, …)
  • Path/Address
  • DNSLink
  • CBOR
  • Bitswap
  • Blocks
  • Bootstrap Node
  • Listening
  • Dialing
  • Announcing
  • Relay
  • GC/Garbage Collection
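
To make the “CBOR (a binary version of JSON)” description above a little more concrete, here is a minimal sketch of a hand-written encoder for a small subset of CBOR (integers, booleans, text strings, lists, and maps), following the initial-byte layout in RFC 8949. The function names are hypothetical, and real projects would use an existing CBOR library rather than this.

```python
import struct


def _head(major: int, n: int) -> bytes:
    """Encode a CBOR head: 3-bit major type plus a length or value argument."""
    if n < 24:
        return bytes([(major << 5) | n])
    if n < 2**8:
        return bytes([(major << 5) | 24, n])
    if n < 2**16:
        return bytes([(major << 5) | 25]) + struct.pack(">H", n)
    if n < 2**32:
        return bytes([(major << 5) | 26]) + struct.pack(">I", n)
    return bytes([(major << 5) | 27]) + struct.pack(">Q", n)


def cbor_encode(value) -> bytes:
    """Encode a tiny subset of CBOR: bool, int, str, list, dict."""
    if isinstance(value, bool):  # check before int: bool is a subclass of int
        return b"\xf5" if value else b"\xf4"
    if isinstance(value, int):
        # Major type 0 holds non-negative ints; major type 1 holds -1 - n.
        return _head(0, value) if value >= 0 else _head(1, -1 - value)
    if isinstance(value, str):
        data = value.encode("utf-8")
        return _head(3, len(data)) + data
    if isinstance(value, list):
        return _head(4, len(value)) + b"".join(cbor_encode(v) for v in value)
    if isinstance(value, dict):
        out = _head(5, len(value))
        for key, item in value.items():
            out += cbor_encode(key) + cbor_encode(item)
        return out
    raise TypeError(f"unsupported type: {type(value).__name__}")


if __name__ == "__main__":
    # The JSON text {"a": 1} becomes four bytes: a1 61 61 01
    print(cbor_encode({"a": 1}).hex())
```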

Common Comparisons

We don’t need to speak too actively toward these, but these are the sorts of things people are actively comparing IPFS to. We should have ready responses or some kind of “how does it compare to…” doc — there’s no reason to come up with a unique description of the differences/pros/cons every time someone asks.

  • Dat
  • Bittorrent
  • Webtorrent
  • Solid
  • Freenet
  • i2p
  • emu
  • kazaa
  • gun
  • Storj
  • Maidsafe
  • Blockchain-y filestore things!

Mr0grog commented Mar 13, 2018

GitHub/Source-Related Issues

Licensing: for the most part, this is good, but:

  • Some repos are unlicensed or only refer to their license by name (e.g. “MIT”) without the actual license text or a link in the readme or source, which is effectively unlicensed. These repos are not actually open-source in any legal sense.

  • Discussion/organizing repos have inconsistent licensing. It seems like, at some point, there was a push to make them Creative Commons-licensed (which seems more reasonable for discussion than MIT), but there are quite a few that are MIT licensed instead.

Organization:

  • There are lots of repos, but a huge number of them are deprecated. Not all the deprecated ones are marked as such, which adds more confusion. Archiving these repos can help reduce noise (they don’t show up in most views), but even just making sure they are marked as deprecated, and in a consistent way, would reduce confusion.

  • Empty READMEs and repo description lines: both of these are important cues for people to understand what a repo is and what it’s for, but are often empty. Additionally, some READMEs are empty in value even though they appear structured and have text — it looks like there was a push to validate READMEs with standard-readme at some point, and people reacted by filling in the required structure with “TODO” or placeholder content, which defeats the point of having the check and makes it easy for the READMEs to continue to fail in their purpose.

  • Naming conventions are often inconsistent. Sometimes there’s even more than one attempt at a convention for a given type of thing (e.g. *-ds-* vs. *-store-* for datastore implementations). This is a tough one to fix (renaming a project is all kinds of messy).
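
One way to get a quick sense of how the naming conventions split is to group repo names by glob pattern. The sketch below uses the two datastore patterns mentioned above; the function name and the sample repo names in the usage section are hypothetical and chosen only to illustrate the patterns.

```python
from fnmatch import fnmatchcase

# The two competing datastore naming patterns mentioned above.
CONVENTIONS = {"ds": "*-ds-*", "store": "*-store-*"}


def group_by_convention(repo_names):
    """Map each convention label to the repo names matching its glob pattern."""
    groups = {label: [] for label in CONVENTIONS}
    groups["other"] = []
    for name in repo_names:
        for label, pattern in CONVENTIONS.items():
            if fnmatchcase(name, pattern):
                groups[label].append(name)
                break
        else:
            # No pattern matched this name.
            groups["other"].append(name)
    return groups


if __name__ == "__main__":
    # Hypothetical repo names, for illustration only.
    names = ["go-ds-example", "js-store-example", "docs"]
    for label, matched in group_by_convention(names).items():
        print(label, matched)
```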

Contributing:

  • Contribution Instructions: There are two standard styles of listing contribution instructions:

  • Contributors: most (but not all) js-* repos list contributors in package.json. NO other projects list contributors anywhere, but instead rely on Git/GitHub metadata. (Worth noting: this misses out on contributions that do not show up as Git commits)

See also https://docs.google.com/spreadsheets/d/1IDVAGfniyHCJLIxLc3y7K7YTOFGCtgwTVCZEojtNLlw

Mr0grog commented Mar 14, 2018

And my last bit here, before moving on to concrete actions (will be a separate issue)…

Use Cases

These use cases focus on what people perceive they are using IPFS for, rather than how they are using it. There are a lot of cross-cutting concerns in this view, but I think it’s important to think from both perspectives.

  • File Storage

    • Small scale — a distributed version of Dropbox
      • Concerns:
        • How do I make sure it gets pinned permanently somewhere else?
        • How do I remove things?
        • How do I update things?
        • What the heck is MFS?
        • I want a nicer UI than the CLI
    • Big Data/Archiving/Libraries
      • Concerns:
        • Performance
          • What should I expect?
          • How do I measure it?
          • What can I tweak/what are my tradeoffs?
          • How do I make changes to large datasets cheap/fast? (e.g. patching objects)
          • What should I consider around frequency of change?
        • Organization/Storage
          • How do I best manage data + metadata?
          • How do I need to organize my data (or not) to suit the grain of IPFS? (vs. other common archiving systems/tools)
            • Directory/disk layout
            • Naming
            • Indexing
            • File size/granularity
            • Custom schema/IPLD stuff?
            • Custom storage engine? (e.g. URLStore)
          • How do I optimize storage space usage?
            • In particular, how do I avoid duplicating data? (There are plenty of other questions here, too.)
          • How do I spread data across many locations? (locally, e.g. servers or disks for huge datasets)
          • How do I ensure redundancy and safety?
          • How do I manage fixity checks/guarantees?
            • Many institutions have policies around this; how can they be supported in IPFS?
        • Workflows
          • How do I fit this into my institution’s workflow for ingesting, verifying, cleaning, categorizing, tagging, etc.?
          • How do I manage changes/updates/fixes/versions?
        • Providing access
          • Should IPFS be merely about storing and sharing raw datasets? (vs. user interfaces for searching or browsing)
          • How do I control access? (Not all collections are fully open-access public data)
          • How do I not become an access portal for data my institution is not comfortable with? (or limit us to only publishing content we feel is explicitly within our purview?)
  • Websites -------> Complicated Webapps

    • This is a big spectrum, from simple static sites to complicated webapps with realtime data transfer
    • Concerns:
      • How do I make my stuff available so it doesn't go away when I turn off my computer?
        • Pinning services
        • Operate a node on a server
        • Managing IPNS republishing! Most people don't even know they need to do this!
      • How do I make updates? How do I make them timely?
      • How do I make content available at a sane human address?
        • Hashes are long
        • Hashes are hard to read, speak, and type (easy to make a typo, impossible to find)
        • DNSLink, other approaches?
        • I get URIs, why don't they work here? Why not ipfs://<cid>???
      • How do I do dynamic data? User accounts? Documents? etc.
        • CRDTs
        • PubSub
        • IPNS
        • IPRS
        • (When do I use which for what?)
      • [How] can I do user accounts or federated identities?
      • How do I manage permissions, ACLs, etc.?
      • How do I block/allow specific kinds of content?
      • What are the security concerns?
      • When, and for what, do I use libp2p vs. IPFS tools/libraries? Can I use both, and how?
      • How do I make sure all the nodes using my app can see each other?
  • Local/native Apps

    • Pretty tightly related to the above (webapps) in knowledge requirements, expertise, area of concern
    • People often think of it differently, though — here, they are providing an interface/tool around IPFS, while with webapps, they exist within IPFS.
      • In reality, though, the complex end of webapps is exactly the same! The main difference is in how they are loaded.
      • How do we help people converge these ideas?
  • Data Modeling/Distributed Database

    • This mostly feeds into IPLD
    • Concerns:
      • IPLD spec doc out of date (but it's the only obvious resource)
      • Which parts (if any?) are stable?
      • How do I keep up with this concept/tech that is constantly in flux?
      • Linking to outside systems that are hash-based but aren't expressed as just hashes (e.g. Stellar, Scuttlebutt) is confusing and complicated
  • Working around censorship

    • Concerns:
      • What are the privacy issues? What do I need to do to be generally anonymous?
      • How do I make content secret?
      • How do I make public content well distributed?
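
The “sane human address” concern under Websites above mentions DNSLink. As a rough sketch of the idea: DNSLink stores a content path in a DNS TXT record whose value looks like `dnslink=/ipfs/<cid>` or `dnslink=/ipns/<name>`, so a readable domain can stand in for a hash. The function below only parses such a TXT value. It does no DNS lookup, and its name is hypothetical.

```python
def parse_dnslink(txt_value: str):
    """Return (namespace, rest_of_path) from a DNSLink TXT value, or None."""
    prefix = "dnslink="
    # TXT values are often shown wrapped in double quotes; drop them.
    value = txt_value.strip().strip('"')
    if not value.startswith(prefix):
        return None
    target = value[len(prefix):]
    # "/ipfs/<cid>/..." splits into ["", "ipfs", "<cid>/..."]
    parts = target.split("/", 2)
    if len(parts) < 3 or parts[0] != "" or not parts[1] or not parts[2]:
        return None
    return parts[1], parts[2]


if __name__ == "__main__":
    record = "dnslink=/ipfs/QmNj68gTzAs9QbfMKzMGurXP2WCmA6GTcKuUkWm4kBV1Qn/html/"
    print(parse_dnslink(record))
```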

Another formulation of the above focuses on how people are using IPFS (see note at top):

  • Files, Sharing, Communication (Using IPFS as app/service via CLI, browser plugin, or some other OS integration/UI)

    • This largely couples the “file storage” and “censorship” uses above
  • Distributed [Data] Application Development

    • This is about building any kind of application that sits on top of and uses IPFS or the communications tools in Libp2p.
    • Covers the “websites,” “webapps,” and “native apps” above, plus some aspects of “censorship” and “data modeling.”
  • Deep Protocol Development and Integration

    • This is mainly the “data modeling” case above, but the complicated end of “webapps” and “native apps” live here, too.
    • Focused on development and innovation in the protocols and models of IPFS, not just usage of its top-level APIs or file-system-like features.
    • Finding novel ways to store, access, and query data

Mr0grog commented Mar 16, 2018

See #58 for plan of action based on this research.

Mr0grog commented Mar 21, 2018

Updated research docs to add notes from an interview with Geoff Hayes at Compound.Finance.

Living link: https://gateway.ipfs.io/ipns/ipfs-docs-research.robbrackett.com/html/
This version: https://gateway.ipfs.io/ipfs/QmV5QdWAVbYCpcEspWu4tiCtVfdL9SESidLmnXLEJ82gdL/html/

Mr0grog commented Aug 20, 2018

Updated research docs to add notes from an interview with Eric Tang at Livepeer.

Living link: https://ipfs.io/ipns/ipfs-docs-research.robbrackett.com/html/
Permalink: https://ipfs.io/ipfs/QmNj68gTzAs9QbfMKzMGurXP2WCmA6GTcKuUkWm4kBV1Qn/html/

Mr0grog commented Aug 24, 2018

I’m closing this issue — it’s not actionable anymore, and it will still be linked and findable from the README in #114.
