Design: Local vs Remote Storage, Whither The BlockService #7960

Open

hannahhoward opened this issue Mar 2, 2021 · 3 comments
Labels
kind/feature (A new feature), need/triage (Needs initial labeling and prioritization)

Comments

@hannahhoward
Contributor

hannahhoward commented Mar 2, 2021

This issue is designed to provoke discussion around design questions about local vs remote storage as they relate to efforts to write new interfaces for working with DAG structures in go-ipld-prime. It's mostly documentation and context for the team working on this, but I think laying out these concepts may help others in the future when thinking about designing abstractions for IPFS.

Overview

In a sense, the entire IPFS network can be thought of as a single, distributed file storage system. In fact, this is the ideal goal we are always striving for. We should be able to imagine a single user of IPFS having access to the entirety of what is stored on the network as if it were local.

In practice, keeping a local copy of some data makes sense both for fast access and because the user may want to take responsibility for providing data to the network. In the IPFS software, the local copy is the blockstore.

We might think of the blockstore as similar to the web browser cache. Just as a user typing in a URL doesn't specify whether the page is cached locally or must be fetched remotely, at the command line or HTTP API level the user should never have to worry about where data is coming from.

At the same time, unlike a web browser cache, when we receive data or add it from the command line, we may also become a host for that data on the network. Writing to the local blockstore is therefore only one part of a write operation. The second part is to provide the written CIDs to the DHT.

So to illustrate the symmetry of local / remote on read vs write:

Operation | Local Component | Remote Component
--- | --- | ---
Read | Load from blockstore | Read from network
Write | Write to blockstore | Provide to network

Importantly, all new interfaces that work with DAGs, for reading or writing, must ensure they perform both parts of a read or write (unless specifically told not to).

Level Of Abstraction, Designing New Services

Currently the key service that performs the local/remote abstraction is the BlockService. The block service abstracts at the level of block reads/writes. When I add a DAG, I call AddBlocks / AddBlock on the block service for all blocks in the DAG, which writes each block individually and then calls out to Bitswap to provide each block individually. When I fetch a DAG, I call GetBlock / GetBlocks on the block service for all blocks in the DAG, which attempts to load each block locally and then calls out to Bitswap to fetch the blocks from the network (which ultimately saves the blocks to the local store and provides them as well).

So we have:
Blockstore -> Local Block Read/Write
Bitswap -> Remote Block Read/Write
BlockService -> Local/Remote Block Read/Write Abstraction
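
To make this local-then-remote symmetry concrete, here is a minimal sketch of how such a block service composes the two halves. The interfaces and names are hypothetical (this is not the actual go-blockservice API), but the control flow matches the description above.

```go
package blockservicesketch

import (
	"context"

	blocks "github.com/ipfs/go-block-format"
	"github.com/ipfs/go-cid"
)

// LocalStore is the local half (the blockstore); names here are illustrative.
type LocalStore interface {
	Has(ctx context.Context, c cid.Cid) (bool, error)
	Get(ctx context.Context, c cid.Cid) (blocks.Block, error)
	Put(ctx context.Context, b blocks.Block) error
}

// RemoteExchange is the remote half (e.g. Bitswap plus DHT providing).
type RemoteExchange interface {
	Fetch(ctx context.Context, c cid.Cid) (blocks.Block, error)
	Provide(ctx context.Context, c cid.Cid) error
}

// blockService performs both halves of every read and write.
type blockService struct {
	local  LocalStore
	remote RemoteExchange
}

// GetBlock tries the local blockstore first and falls back to the network.
func (bs *blockService) GetBlock(ctx context.Context, c cid.Cid) (blocks.Block, error) {
	if ok, err := bs.local.Has(ctx, c); err == nil && ok {
		return bs.local.Get(ctx, c)
	}
	blk, err := bs.remote.Fetch(ctx, c)
	if err != nil {
		return nil, err
	}
	// Blocks fetched remotely are also written locally, so later reads are local.
	if err := bs.local.Put(ctx, blk); err != nil {
		return nil, err
	}
	return blk, nil
}

// AddBlock writes locally, then provides the CID to the network.
func (bs *blockService) AddBlock(ctx context.Context, b blocks.Block) error {
	if err := bs.local.Put(ctx, b); err != nil {
		return err
	}
	return bs.remote.Provide(ctx, b.Cid())
}
```

Note how providing happens per block inside AddBlock; that per-block coupling is exactly what makes "provide only the root" awkward today.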

The system is coherent and works as long as local and remote fetching or providing works at the block level. However, consider the things this locks us into if we care about DAGs:

  • How do we provide only the root of a DAG? (Often something we want to do to make the DHT less crowded.)
  • How would this work if we're fetching with Graphsync, or with a combination of Graphsync and Bitswap?

We've actually tried to move providing to a different place and provide DAG roots only (see github.com/ipfs/go-ipfs-provider), but this is still experimental. And the tight integration of blockstore / bitswap / blockservice has made it hard to deliver on projects like this.

As we design our interfaces, I don't think we should try to change any of this immediately, but we should design them with an eye towards moving abstractions up in the future.

Hard questions in interface design

Currently we have go-fetcher and go-dagservice, two in-progress repos for working with go-ipld-prime data. Reads, however, often correspond with writes, and I'm not 100% sure about keeping these separate.

Here are some questions I am thinking about as we look at designing these two services:

  • How would I add a whole DAG and specify that we provide only the root? (See the sketch after this list.)
  • Assuming eventually IPLD prime finishes implementation of FocusTransform and WalkTransforming, how should I be able to access these interfaces?
    • Important question: should I be able to perform a transform (which potentially digs deep into the DAG) on a DAG I haven't fetched entirely locally? I think maybe I should be able to do this.
  • Are reads and writes separate enough that we can make them separate interfaces?
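
To ground the first question, here is one hypothetical shape an answer could take: a DAG-level write API where the providing policy is an option on the call. None of these names exist in go-dagservice or go-fetcher; this is a sketch of one possible direction, not a proposal for the actual interface.

```go
package dagwritersketch

import (
	"context"

	"github.com/ipfs/go-cid"
	ipld "github.com/ipld/go-ipld-prime"
)

// ProvideStrategy says which CIDs get announced to the DHT after a write.
type ProvideStrategy int

const (
	ProvideAll   ProvideStrategy = iota // announce every block written
	ProvideRoots                        // announce only the DAG root
	ProvideNone                         // write locally, announce nothing
)

type addConfig struct {
	strategy ProvideStrategy
}

// AddOption configures a single AddDag call.
type AddOption func(*addConfig)

// WithProvideStrategy selects which CIDs are provided for this write.
func WithProvideStrategy(s ProvideStrategy) AddOption {
	return func(c *addConfig) { c.strategy = s }
}

// DagWriter is one possible shape for a DAG-level writer.
type DagWriter interface {
	// AddDag persists the DAG rooted at root, returns the root CID, and
	// provides CIDs according to the configured strategy.
	AddDag(ctx context.Context, root ipld.Node, opts ...AddOption) (cid.Cid, error)
}
```

A caller adding a large DAG could then write `w.AddDag(ctx, root, WithProvideStrategy(ProvideRoots))`, and the providing system, however complex, would only ever see the root.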
@hannahhoward added the need/triage and kind/feature labels Mar 2, 2021
@willscott added this to In Progress in IPLD-Prime-in-IPFS Mar 2, 2021
@willscott
Contributor

It would be great to get thoughts from @aschmahmann and @Stebalien on this, since it'll leave visible traces through a lot of places in IPFS.

  • The 'transform over a not-fully-local DAG' use case is far enough off in the integration of the various moving pieces that I wouldn't spend too many interface design cycles on it right now; I suspect we'll need to change things a couple more times before we have a clear idea of how that would actually be enabled.
  • I don't see much harm in single vs. paired interfaces for reading and writing.

@aschmahmann
Contributor

Will try and think about this a bit more, but here are some rough thoughts.

> How would I add a whole DAG and specify that we provide only the root?

The way I'm thinking this should work is that decisions on what to provide, how frequently, etc. should be able to live in a totally separate system of arbitrary complexity. To handle this efficiently, the providing system somehow needs to be triggered when new data is added to the system and given some context as to why it was added. Whether this trigger is the responsibility of the writer implementation or the thing that calls the writer seems debatable (especially since the caller could always do WriterWrapper(writer).Write(data)); I'm not sure which approach is best at the moment.
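
For illustration, a minimal sketch of the "wrap the writer" option mentioned above: a decorator that forwards writes to the underlying writer and notifies a separate providing system, carrying along some context about why the data was written. All names here are hypothetical.

```go
package providerwrapsketch

import (
	"context"

	blocks "github.com/ipfs/go-block-format"
)

// BlockWriter is whatever low-level write interface we settle on.
type BlockWriter interface {
	Write(ctx context.Context, b blocks.Block) error
}

// ProvideReason is the "context as to why it was added".
type ProvideReason string

const (
	ReasonUserAdd ProvideReason = "user-add" // e.g. an explicit `ipfs add`
	ReasonFetched ProvideReason = "fetched"  // pulled in from the network
)

// Provider is the separate providing system of arbitrary complexity.
type Provider interface {
	BlockWritten(ctx context.Context, b blocks.Block, why ProvideReason) error
}

type providingWriter struct {
	inner  BlockWriter
	prov   Provider
	reason ProvideReason
}

// WriterWrapper decorates a writer so every successful write is reported to
// the providing system along with the reason for the write.
func WriterWrapper(w BlockWriter, p Provider, why ProvideReason) BlockWriter {
	return &providingWriter{inner: w, prov: p, reason: why}
}

func (pw *providingWriter) Write(ctx context.Context, b blocks.Block) error {
	if err := pw.inner.Write(ctx, b); err != nil {
		return err
	}
	// The providing system decides what, whether, and when to announce.
	return pw.prov.BlockWritten(ctx, b, pw.reason)
}
```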

> Assuming eventually IPLD prime finishes implementation of FocusTransform and WalkTransforming, how should I be able to access these interfaces? Important question: should I be able to perform a transform (which potentially digs deep into the DAG) on a DAG I haven't fetched entirely locally? I think maybe I should be able to do this.

Hard for me to say since I haven't really played with these interfaces much. However, I'm pretty sure that our interfaces should support operating on local and remote graphs. We should be able to operate on local-only data if we want to, but I don't think we should be boxed into that at all.

> Are reads and writes separate enough that we can make them separate interfaces?

I suspect so. For example, the Go standard library's io package has a whole slew of small interfaces (Reader, Writer, Closer, Seeker, etc.) and then combines them into higher-level interfaces (ReadWriteCloser, etc.). If someone only needs a reader for their application, but all of our code takes read+write interfaces, then they'll just have to implement a write interface that does nothing and/or errors.
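
As a sketch of what that io-style composition could look like for DAG interfaces (illustrative names only, assuming go-ipld-prime nodes and go-cid):

```go
package daginterfacesketch

import (
	"context"
	"errors"

	"github.com/ipfs/go-cid"
	ipld "github.com/ipld/go-ipld-prime"
)

// NodeReader only reads.
type NodeReader interface {
	GetNode(ctx context.Context, c cid.Cid) (ipld.Node, error)
}

// NodeWriter only writes.
type NodeWriter interface {
	PutNode(ctx context.Context, n ipld.Node) (cid.Cid, error)
}

// NodeReadWriter composes the two, the way io.ReadWriter composes
// io.Reader and io.Writer.
type NodeReadWriter interface {
	NodeReader
	NodeWriter
}

// If every API instead demanded a NodeReadWriter, read-only callers would be
// forced to ship a stub writer like this one.
type readOnly struct{ NodeReader }

func (readOnly) PutNode(ctx context.Context, n ipld.Node) (cid.Cid, error) {
	return cid.Undef, errors.New("read-only: writes not supported")
}
```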

@warpfork
Member

warpfork commented Mar 8, 2021

> The way I'm thinking this should work is that decisions on what to provide, how frequently, etc. should be able to live in a totally separate system of arbitrary complexity.

+1, my vote and bets are here too. I don't know what this looks like. But I think there may end up being more than one system of decisions for providing, and different approaches might not even have a particularly uniform interface, so it would be good if they can be decorated over other core concepts rather than needing to be tightly bound together.

> To handle this efficiently the providing system needs to be [...] given some context as to why it was added

Yeah, I think the real knot at the heart of the matter is here. That context and "why", if we could ferret out exactly what form that information takes, would probably make a lot of the rest of the design decisions unfold more clearly almost immediately. (But, again, I bet there might be more than one possible form to this information, per different approaches.)
