
Sharing IPFS node across browsing contexts (tabs) on same origin #3022

Closed
Gozala opened this issue May 5, 2020 · 36 comments · Fixed by #3081

@Gozala
Contributor

Gozala commented May 5, 2020

cc ipfs/in-web-browsers#158

When a site on some origin, e.g. foo.com, uses JS-IPFS, a new node is bootstrapped for every browsing context (tab, iframe). This implies:

  • All the network connections need to be reestablished.
  • General memory / CPU overhead is multiplied by the number of browsing contexts.
  • A possibility of race conditions is created (two nodes reading / writing data concurrently).

There is an opportunity to improve this by offloading most of the JS-IPFS work into a SharedWorker in browsers where the API is available and falling back to a dedicated Worker elsewhere (basically Safari).

We can explore ServiceWorker as a better fallback mechanism in the future. Unlike dedicated workers they can be shared, but they also come with enough challenges to justify keeping them out of scope initially.
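A minimal sketch of that fallback, assuming the node is hosted by a separate worker script (the file name and the returned endpoint shape are placeholders):

// Prefer a SharedWorker (one IPFS node per origin); fall back to a
// dedicated Worker (one node per tab, but still off the main thread).
function spawnIpfsWorker () {
  if (typeof SharedWorker === 'function') {
    const worker = new SharedWorker('./ipfs-worker.js', { name: 'ipfs' })
    return worker.port // MessagePort shared with every tab on this origin
  }
  // Safari and other browsers without SharedWorker
  return new Worker('./ipfs-worker.js') // also postMessage-capable
}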

However, use of SharedWorker implies some trade-offs:

Worker script distribution

A worker needs to be instantiated with a separate JS script, which leaves us with the following options:

  1. Create a separate "worker" dist as an opt-in for the shared node. That still leaves two more options:
    1. Spit out two scripts, one for the main context and the other for the worker context.
    2. Use something like worker-loader to bundle the worker script as a blob embedded in the main script.
  2. Self-load the main script into the worker and use context detection to choose between main and worker operational modes (sketched below).
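A sketch of option 2, where a single script detects which context it was loaded into (startServer / startClient are hypothetical entry points):

// Order matters: SharedWorkerGlobalScope extends WorkerGlobalScope.
if (typeof SharedWorkerGlobalScope !== 'undefined' && self instanceof SharedWorkerGlobalScope) {
  startServer(self) // shared worker: operate the IPFS node
} else if (typeof WorkerGlobalScope !== 'undefined' && self instanceof WorkerGlobalScope) {
  startServer(self) // dedicated worker fallback
} else {
  startClient() // regular browsing context: talk to the worker instead
}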

Lack of WebRTC in workers

This is a major problem, as WebRTC is largely replacing the WebSocket transport, and it requires more research. A pragmatic approach could be to tunnel WebRTC traffic into a worker, however that would still imply reestablishing existing connections, which would be a major drawback.

@Gozala Gozala added the need/triage Needs initial labeling and prioritization label May 5, 2020
@Gozala
Contributor Author

Gozala commented May 6, 2020

Another thing I'm recognizing now is that two different browsing contexts (tabs) could start an IPFS node with different configurations, which needs to be handled in some way.

@Gozala
Contributor Author

Gozala commented May 6, 2020

I have started hacking on the ipfs-message-port-client, which is in spirit very similar to ipfs-http-client except it wraps around a MessagePort.

However, I'm realizing that the API surface is pretty large, which reminded me of js-ipfs-lite, which is a lot slimmer but also well typed (TS ❤️). This, combined with the configuration point above, got me wondering if this may be a good opportunity to consider stripping out some of the convenience stuff, which could be pulled in separately (maybe even an opportunity to converge code bases?)

@Gozala
Contributor Author

Gozala commented May 6, 2020

I had some discussion with @hugomrdias last night about aegir not yet supporting multiple webpack configs, where he pointed out the shiny package.json exports field, which could be a better solution going forward.

After spending all day yesterday running into walls, I was desperate for some progress, so I have come up with the following plan and would really love your feedback @achingbrain

  • Create an ipfs-message-port-client sub-package (with its own .aegir.js) that is in the spirit of ipfs-http-client except it wraps around a MessagePort. The idea is that it will be loaded in the main thread and will RPC with the IPFS node through the MessagePort.
  • Create an ipfs-message-port-host sub-package (with its own .aegir.js) that wraps around a MessagePort, starts an IPFS node and executes incoming commands.
  • I think you wanted to keep it all in the same package and just spit out different dists, but I hope it's reasonable to start those as separate sub-packages first and then figure out how to include them in js-ipfs afterwards. Especially since it would involve changes to aegir, and I would much rather defer that until we know exactly what we want.
  • I would like to reduce scope as much as possible for an MVP and iterate on features incrementally. Hence #3022 (comment) above. @hugomrdias also mentioned efforts to strip everything down to essentials and Dependency injection patterns for reducing bundle size ipld/js-block#10. That, coupled with the fact that nodes are highly configurable, challenges the general idea of "node sharing". I would really, really like to define what the bare minimum API is and make it the MVP.
  • As I started work on ipfs-message-port-client (Gozala@f2994c7) I've attempted to use TS JSDoc, but it seems that it does not yet support class properties (Support JSDoc @property microsoft/TypeScript#15715), nor does the type checker seem to be much help. This got me wondering if using TS here would be acceptable? I know it's not the first time I'm bringing up the type checker, but it really does help with large code bases.
  • APIs like the current ipfs.add are somewhat tricky due to a highly polymorphic API and streaming.
    • e.g. normalise-input takes almost anything and turns it into AsyncIterable<{ path, content: AsyncIterable<Buffer> }>, which adds a ton of complexity, especially when things need to be moved across the wire. It also makes it easy to pass sub-optimal input. I think it would be nice if the bare minimum API was optimized for better performance. That is to suggest using ArrayBuffers, as they can be transferred across threads without encoding / decoding or copying. Support for more exotic inputs could then be added via helper libs that users could choose to use.
    • It is not exactly clear to me yet how important support for streaming is. If absolutely necessary we could map AsyncIterators to MessagePorts and transfer those (a rough sketch follows), or alternatively do multiplexing over the same MessagePort. But if it's mostly for convenience, I think it would be more justified to do it as another helper lib, e.g. from what I can tell ipfs.files.write could effectively be used for writing out stream input, which again could reduce the complexity of the RPC protocol and could be implemented only on the client end.
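To illustrate the AsyncIterator-to-MessagePort mapping mentioned in the last bullet, here is a rough pull-based sketch (one request message per chunk); the protocol here is made up purely for illustration:

// Wrap an AsyncIterable<Uint8Array> in a MessagePort that can be
// transferred to another thread.
function exposeIterable (iterable) {
  const { port1, port2 } = new MessageChannel()
  const iterator = iterable[Symbol.asyncIterator]()
  // Treat every incoming message as a request for the next chunk.
  port1.onmessage = async () => {
    const { done, value } = await iterator.next()
    // Transfer the chunk's underlying buffer instead of copying it.
    port1.postMessage({ done, value }, done ? [] : [value.buffer])
  }
  return port2 // hand this port to the other thread via postMessage
}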

@lidel
Member

lidel commented May 12, 2020

  • I like the idea of ipfs-message-port-* packages.

    • this work could help us improve the modularity of js-ipfs itself, dogfooding the "core library" in packages/ipfs and using parts of it to assemble a runtime-optimized distribution in packages/ipfs-web etc
      • would be good to get early feedback from @achingbrain about this direction ;^)
    • (loose thought) if those are generic enough we could reuse them for window.ipfs interface in IPFS Companion (I had some prior discussions about something like that with @alanshaw, and we always wanted similar sandboxing and limited API surface for window.ipfs anyway)
  • Agree the API surface is too big for an MVP, IMO.

    • I suggest we aim at the most basic file and DAG operations, punting things like Peer Identity, MFS/key "namespacing" etc to a later phase.
      • For an extremely lean MVP the supported APIs would be: ipfs.add, ipfs.get, ipfs.cat, ipfs.dag.* (unsure about the unixfs-specific ipfs.object.*)
    • This would also enable us to test those browser optimizations for reading big files during add
  • TS JSDoc is something @hugomrdias experimented with before, not sure what the status is, though. Perhaps drop it for now if it gets too distracting.

  • Note that ipfs files write produces different DAGs than ipfs add (Upload via WebUI (ipfs.files.write --create) produces different hash than CLI (ipfs.add) ipfs-webui#676)

@Gozala
Contributor Author

Gozala commented May 13, 2020

punting things like Peer Identity, MFS/key "namespacing" etc to a later phase.

I think those should not matter for same-origin node sharing unless I'm overlooking something.

For an extremely lean MVP the supported APIs would be: ipfs.add, ipfs.get, ipfs.cat

Unfortunately those are the most complicated and problematic ones; getting ipfs.files.* in would be a lot easier. I wrote down notes explaining why https://gozala.io/pl-notes/JSCoffee-2020-05-13 and hope to discuss them in more detail during the call.

ipfs.dag.* (unsure about the unixfs-specific ipfs.object.*)

The complexity is mostly in having jQuery-style APIs where functions take pretty much anything. ipfs.dag.* and ipfs.object.* seem to have a pretty straightforward API, so it should not be a big deal.

This would also enable us to test those browser optimizations for reading big files during add

Could you please explain what you mean here? I'm not sure I follow.

TS JSDoc is something @hugomrdias experimented with before, not sure what the status is, though. Perhaps drop it for now if it gets too distracting.

He actually filled me in on this today, very exciting stuff! We also took the same approach, so things should merge nicely. I don't think it's a distraction; it makes navigating a large codebase like js-ipfs a lot more manageable.

Note that ipfs files write produces different DAGs than ipfs add (ipfs/ipfs-webui#676)

I don't think that matters, because the client mostly forwards ArrayBuffers / Blobs to the worker thread, which then runs the corresponding API calls.

@carsonfarmer

This is exciting! I'm looking forward to contributing here more as time permits. In the meantime, the minimal API alluded to above is very close/similar to ipfs-lite. There is a (currently on the back burner) TS implementation here: https://github.com/textileio/js-ipfs-lite. Just for reference!

@Gozala
Contributor Author

Gozala commented May 19, 2020

I have outlined some high-level questions I would love to get answered by surveying our community / users. I think it would help to ensure that the effort is focused on user needs (in the browser context) as opposed to API compatibility with IPFS in runtimes which have different constraints.

  1. What are the essential configuration options for JS-IPFS in browsers?
    It is important to distinguish essential configuration from incidental, where it is a means to an end. In other words, it is worth pondering whether existing use cases could be addressed through different means.

  2. What subset of the JS-IPFS API is used in browsers?
    It is worth considering this so we can better prioritize efforts and optimize the experience of what is important, as opposed to aiming for API compatibility.

  3. Why are users choosing to use JS-IPFS in the main thread over a worker thread?
    Given that this effort effectively moves JS-IPFS to a worker thread, we need to understand why users chose to use JS-IPFS in the main thread in the first place. (Validate the working assumption that doing it on the main thread was just easier.) It is also worth considering how important a high-level JS-IPFS API in the main thread is (it may be that users would be perfectly happy with using JS-IPFS in a worker thread).

  4. How do user priorities fall with regards to a convenient API vs a bare-bones API?
    JS-IPFS prioritizes convenience. At the same time, >60% of the code written for this effort deals with accepting variadic inputs, effectively encoding / decoding streams etc… It is worth checking our assumptions about users' needs / wants to avoid bloat that may be irrelevant in practice.

  5. What are other important things (not on the list) that JS-IPFS does or you wish it did for your browser use case?
    E.g. Textile chose to invest in js-ipfs-lite, which was probably motivated by a mismatch between what JS-IPFS provides and what the team needed. By asking, we hope to identify other high-priority needs / wants for JS-IPFS in the browser that we may have otherwise overlooked.

If you spot something overlooked or missed please let me know.

@Gozala
Contributor Author

Gozala commented May 19, 2020

@carsonfarmer thanks for joining the thread and the prompt to look closer at the js-ipfs-lite API. I have a few questions / remarks based on what I see there, which I would like to understand:

  1. The library uses a modular architecture, which is a great way to manage complexity. However, I would like to understand how important that is for Textile. The reason is that pluggable components are at odds with the idea of sharing a node across multiple browsing contexts (tabs, applications), because they can end up with different setups, which would prevent sharing.

    An argument could be made towards, say, sharing the networking stack and allowing pluggable stores. However, that also has complications. For one, tabs (on different origins) would not be able to access data that is on disk without tunneling it over the network. From a UX perspective it also means there is no canonical place for seeing all the data.

    That being said, I do recognize value in e.g. an in-memory store which is not persisted, which could be achieved differently, e.g. a DAG staging area (something along the lines of Discuss: improving ipfs.add #3033 (comment)), which could address the use case without complicating node sharing.

  2. I see that node Buffer is used as the canonical representation for data. Is that motivated by convenience or some other reason?

    The current approach attempts to avoid bringing node baggage into the browser; that is, ArrayBuffer|ArrayBufferView captures all binary types (including node Buffer, it being a subclass of Uint8Array), and given that all native browser APIs return those, there appears to be no good reason to demand Buffer in any of the user-facing APIs (removing it from the internals is more complicated).
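For illustration: Buffer is already a Uint8Array subclass, and native browser APIs hand back the generic types, so typing the public API against ArrayBuffer | ArrayBufferView loses nothing:

// In node (or with a bundler's Buffer polyfill):
const buf = Buffer.from('hello')
console.log(buf instanceof Uint8Array) // true

// In any browser, without polyfills:
const bytes = new TextEncoder().encode('hello') // Uint8Array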

@carsonfarmer

I would say that, in the interest of sharing the node across multiple browsing contexts, a less modular design might be ideal. Additionally, while there are other considerations in favor of the modular design (it allows much smaller packages and dynamic imports if done properly), the benefits from reuse would certainly be larger than the benefits of smaller one-off bundles. So to answer the question: not that important for us (though I would still advocate for a modular design in terms of its development).

As far as Buffer goes, that is purely convenience. All the existing tooling expects/supports it. But we (Textile) internally have been trying to move most of our code and APIs to Uint8Array. So happy to support that push here as well.

@oed
Contributor

oed commented May 27, 2020

Thanks for the discussion @Gozala. Here are details as we discussed :)

  1. What are the essential configuration options for JS-IPFS in browsers?

The only configuration option that we rely on is passing a rendezvous server as a swarm address. Previously this was the only way to connect to a rendezvous server, but another way to do this has since been added: #2508 (comment)
Other than that we disable preload and bootstrap, in order to create the instance faster. This should, however, be less of a problem with a shared instance.
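For reference, a hedged sketch of the kind of configuration described above (the rendezvous multiaddr is a placeholder):

const IPFS = require('ipfs')

const ipfs = await IPFS.create({
  preload: { enabled: false }, // skip preload nodes for faster startup
  config: {
    Bootstrap: [], // don't connect to the default bootstrap peers
    Addresses: {
      // the rendezvous / signaling server passed as a swarm address
      Swarm: ['/dns4/rendezvous.example.com/tcp/443/wss/p2p-webrtc-star']
    }
  }
})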

  2. What subset of the JS-IPFS API is used in browsers?

We use the following:

  • ipfs.add
  • ipfs.cat
  • ipfs.dag
  • ipfs.pubsub
  • ipfs.libp2p

I believe that's all, but I might be missing something. The main reason we're using ipfs.libp2p is because of ipfs-pubsub-room. Important to note here is that this is only supported in js-ipfs and not in ipfs-http-client. This should probably be added to the readme of ipfs-pubsub-room as well, in order to not confuse people. cc @achingbrain

Another thing to note is that we don't use garbage collection right now. This was only recently introduced into js-ipfs. Instead we assume that all data that we add to ipfs will just remain in storage, and sync it from the network if it's not there.

  3. Why are users choosing to use JS-IPFS in the main thread over a worker thread?

We are currently in a situation where our software package (3box) is run on multiple tabs across multiple domains. There are two main things that could be solved using worker threads:

  • Multiple instances across tabs cause connectivity issues.
    If you have two instances of ipfs on the same domain but in multiple tabs, they will get the same PeerId. This causes issues when connecting to remote nodes (especially with pubsub), where only one of the nodes becomes fully connected. We haven't figured out exactly what causes this; instead, our workaround involves creating a random PeerId every time a new instance is created.
  • Sharing data across origins.
    We want to be able to share data across origins. Currently we rely on the data being available in the network, but this is less than ideal. Instead it would be nice if nodes on multiple origins could share block storage. We are currently working on a solution for this by putting the blockstore API into an iframe.
  4. How do user priorities fall with regards to a convenient API vs a bare-bones API?

Don't have strong opinions here. On the security of things like the pinning API on multiple origins, I tend to lean towards the side of usability for now rather than origin-specific pins. This gives us the chance to experiment more.

  5. What are other important things (not on the list) that JS-IPFS does or you wish it did for your browser use case?

The question of codecs is important. An initial requirement from our side would be that we can provide a custom codec to the worker. We are working on codecs for signed and encrypted data, which we will start using as soon as they're ready.

@Gozala
Contributor Author

Gozala commented May 28, 2020

Thanks @oed for taking the time to discuss this with me & for publishing more details here. I would like to follow up on a few points:

I believe that's all, but I might be missing something. The main reason we're using ipfs.libp2p is because of ipfs-pubsub-room. Important to note here is that this is only supported in js-ipfs and not in ipfs-http-client. This should probably be added to the readme of ipfs-pubsub-room as well, in order to not confuse people. cc @achingbrain

This appears to be a known issue (ipfs-shipyard/ipfs-pubsub-room#28), so maybe we should revive the conversation about this in that thread.

ipfs.libp2p itself has a pretty large API surface and I think it would really help if we could:

  1. Figure out what 3Box needs vs what it's currently using. Your comment leads me to believe that ipfs.pubsub lacks some of the functionality of ipfs-pubsub-room, and if so, maybe it would be best to assess how this functionality can be provided instead of trying to cover the whole ipfs.libp2p API surface. From what I recall during our discussion, the missing functionality was the ability to send messages to peers directly. Is there anything else?
  2. Figure out how, and if, the missing functionality could be made possible over ipfs-http-client.

We want to be able to share data across origins. Currently we rely on the data being available in the network, but this is less then ideal. Instead it would be nice if nodes on multiple origins could share block storage. We are currently working on a solution for this by putting the blockstore api into an iframe.

I put some more thought into this after our conversation, and came to the conclusion that in the first iteration it would be best to do the following:

  1. Create an ipfs-message-port-client library that can be connected to an ipfs-message-port-server over a MessagePort instance.
  2. Create an ipfs-message-port-server library that can take an IPFS node instance and provide it remotely to the ipfs-message-port-client.

This should keep pretty much the same flexibility as IPFS has today (other than WebRTC) and overcome:

  1. The problems you've encountered with cross-tab connectivity (I'm assuming message-port-server + js-ipfs running inside a SharedWorker).
  2. This would defer solving custom ipld codec loading until cross-origin sharing is here.
  3. This, I think, would still allow the blockstore sharing you've described with minimal changes. You'd have to pass a message port providing remote block storage to the worker, but from what I recall you already do that, so it would be just one more postMessage.
  4. This would in fact allow 3Box to even share server + ipfs-node across origins and just pass message ports, more or less how I imagine you do that with block storage (I'm assuming you use a certain origin for the actual store). A rough sketch of the tab-side wiring follows.
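Here is what that tab-side wiring could look like (IPFSClient.attach follows the shape proposed later in this thread; the package and worker file names are placeholders):

const { IPFSClient } = require('ipfs-message-port-client')

// Every tab connects to the same SharedWorker hosting the node
const worker = new SharedWorker('./ipfs-worker.js', { name: 'ipfs' })
const ipfs = IPFSClient.attach(worker.port)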

@Gozala
Contributor Author

Gozala commented May 29, 2020

@oed You also mentioned some inconvenience with using certain API(s); as I recall it was the fact of having to iterate over things. I'm afraid I lost those details; could you please share the specifics? It would be a real help in informing our decisions.

@oed
Contributor

oed commented May 29, 2020

From what I recall during our discussion, the missing functionality was the ability to send messages to peers directly. Is there anything else?

I believe that's it but I haven't looked too deeply at the pubsub-room implementation.

Don't have a strong opinion on the message-port stuff.

could you please share the specifics?

Previously we did:

const cid = (await ipfs.add(buffer)).hash

Now we have to do:

const cid = (await (await ipfs.add(buffer)).next()).value.cid

Perhaps there is a way to do this using the DAG API? I've been unable to add the same buffer and get the same CID using dag.put and various configs, however.

@Gozala
Contributor Author

Gozala commented Jun 3, 2020

I have encountered obstacles when running the ipfs.dag API tests on my message-port based implementation. That is because ipfs.dag.put can take either a plain data structure, e.g. {foo: "bar"}, or something like a DAGNode from ipld-dag-pb:

const node = new DAGNode(data)
let cid = await ipfs.dag.put(node, { format: 'dag-pb', hashAlg: 'sha2-256' })

And I'm guessing raw buffers as well.

By looking at what ipfs-http-client does, it appears to me that there is no generic API to encode / serialize a node:

if (options.format === 'dag-cbor') {
  serialized = dagCBOR.util.serialize(dagNode)
} else if (options.format === 'dag-pb') {
  serialized = dagNode.serialize()
} else {
  // FIXME Hopefully already serialized...can we use IPLD to serialise instead?
  serialized = dagNode
}

This is problematic, as the main thread needs to pass the dagNode somehow to the worker thread (ideally without having to copy things). Therefore some generic way to serialize nodes seems necessary.

It is also worth taking this moment to consider the following:

  1. It is much more efficient to transfer array buffers / typed arrays across threads than it is to copy them. However, with the current API it is impossible to tell when that is ok and when it is not (because a transferred buffer will be zeroed out in the main thread).
  2. It would make sense to have a dag API that encourages doing the right thing by accepting the right arguments. I am not certain what the right argument in this context is, however I'm guessing it's something more specific.
  3. I am wondering if a generic IPLDNode could have something like a toBlob(): Blob method (sketched below) which:
    1. Could be assembled out of pieces of data without loading them into memory.
    2. Has random access reads, also without loading the whole thing into memory.
    3. Can be transferred across threads without loading into memory.
    4. Can also have some metadata attached via the type field.
    5. Conveniently, it could also be given a blob: URL.
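A rough sketch of what point 3 could enable; toBlob itself and the metadata type are the hypothetical parts, everything else is the standard Blob API:

// Hypothetical: a node's chunks become Blob parts without being copied
// into one contiguous buffer.
const chunks = [new Uint8Array([1, 2, 3]), new Uint8Array([4, 5])]
const blob = new Blob(chunks, { type: 'application/octet-stream' })

// Random access reads without loading the whole thing into memory:
const head = await blob.slice(0, 2).arrayBuffer()

// Blobs cross postMessage cheaply (references, not buffer copies):
const { port1 } = new MessageChannel()
port1.postMessage({ blob })

// And they can be given a blob: URL:
const url = URL.createObjectURL(blob)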

@oed
Contributor

oed commented Jun 3, 2020

I believe the IPLD team is working on something along the lines of (3): https://github.com/multiformats/js-multiformats

@Gozala
Contributor Author

Gozala commented Jun 4, 2020

@haadcode I would love to get some input from you as well. I am also happy to jump on a call to walk through it more closely.

@Gozala
Contributor Author

Gozala commented Jun 9, 2020

Had a discussion with @achingbrain @lidel @mikeal @autonome today, where I proposed to loosen up the backwards-compatibility constraint for this work in favor of:

  1. Building upon newer IPLD stack.
  2. Reducing scope / complexity of the effort.
  3. Freedom to design for web workers (as opposed to shoehorning the current design onto web workers)

The hope is that this would provide the following benefits:

  1. Unique constraints could be taken into account in new IPLD stack.
  2. This would allow us to prototype and validate the new stack without disturbing all the JS-IPFS users. Once we're (this includes stakeholders like 3Box, Textile and others) happy with the result, we can work towards integrating this back into JS-IPFS proper.

We discussed the idea of using js-ipfs-lite as a baseline, but collectively decided that I would instead propose a specific API (inspired by ipfs-lite) which we can discuss further to reach consensus (feedback from the community is welcome).

@Gozala
Contributor Author

Gozala commented Jun 9, 2020

import Block from '@ipld/block'
import CID from 'cids'


// This is in a nutshell `Promise<A>`, but with an `X` type parameter added to
// communicate errors that may occur. Feel free to ignore the interface
// definition itself.
interface Future<X, A> {
  // Maps the success value
  then<B>(onfulfilled: (value: A) => B | Future<X, B>, onrejected: (error:X) => B | Future<X, B>):Future<X, B>
  // Maps error value
  then<Y>(onfulfilled: (value: A) => Future<Y, A>, onrejected: (error:X) => Future<Y, A>):Future<Y, A>
  // Maps both error & value
  then<Y, B>(onfulfilled: (value: A) => B | Future<Y, B>, onrejected: (error:X) => B | Future<Y, B>):Future<Y, B>
  
  catch<B>(onrejected?: (error:X) => B | Future<never, B>): Future<never, B>
}


interface IPFSService extends CoreService {
  dag:DAGService
  blob:BlobService
}


interface DAGService {
  // Note: path isn't optional; higher-level APIs wrap this to deal with
  // param skipping etc. Get returns a `Block`; if the block can't be (fully)
  // resolved the result is a `ResolveError`, which will contain the
  // `remainderPath` and the `Block` that was resolved during this get.
  get(cid:CID, path:string, options?:GetOptions):Future<GetError, Block>
  // Note: Block knows about format / hashAlg so those options are gone.
  put(block: Block, options?: PutOptions):Future<PutError, CID>
  // Note: path isn't optional; a higher-level API can deal with that.
  // Instead of an async iterator you can request ranges so that:
  // 1. The amount of coordination between client & server can be adjusted.
  // 2. Dangling references across client / server boundaries are avoided.
  tree(cid: CID, path:string, options?:EnumerateOptions): Future<TreeError, string[]>
}

interface BlobService {
  // Note: This is instead of `ipfs.add`; whether it should be called `add`
  // instead is up for debate. However it is deliberately proposed with a
  // different name because:
  // 1. It just produces a CID, and unwrapping an async iterator to get it is
  //    an inconvenience that has been pointed out.
  // 2. It is a personal preference to have a function that always works, as
  //    opposed to one that fails due to unsupported inputs most of the
  //    time.
  // 3. In the browser this is almost always what you want to do, as opposed
  //    to streaming inputs.
  put(blob:Blob, options?:PutBlobOptions):Future<PutBlobError, CID>
  // Note: This is in place of `ipfs.cat`; whether it should be called `cat`
  // instead is up for debate. It is deliberately proposed with a different
  // name because it returns a `Blob` instead of `AsyncIterable<Buffer>`. That
  // is because blobs are well supported in browsers, can be read in different
  // ways, and avoid server/client coordination. Furthermore, a higher-level
  // API providing API compat could be added on top, which would create
  // `AsyncIterable`s all on the client and perform range reads with this
  // API under the hood.
  get(ipfsPath:string, options?:GetBlobOptions):Future<GetBlobError, Blob>
}

interface AbortOptions {
  timeout?: number
  signal?: AbortSignal
}

interface GetOptions extends AbortOptions {
  localResolve?: boolean
}

interface PutOptions extends AbortOptions {
  pin?: boolean
}

interface EnumerateOptions extends AbortOptions {
  recursive?: boolean
  // will skip `offset` entries if provided
  offset?: number
  // Will include at most `limit` number of entries
  limit?: number
}

interface PutBlobOptions extends AbortOptions {
  chunker?:string,
  cidVersion?: 0|1,
  enableShardingExperiment?: boolean,
  hashAlg?: string,
  onlyHash?: boolean
  pin?: boolean
  rawLeaves?: boolean
  shardSplitThreshold?: number
  trickle?: boolean
  wrapWithDirectory?: boolean
}

interface GetBlobOptions extends AbortOptions {
  offset?: number
  length?: number
}

type PutBlobError =
  | WriteError

type GetBlobError =
  | NotFound
  | ReadError

type GetError =
  | ResolveFailure
  | NotFound
  | DecodeError
  | ReadError
  | AbortError
  | TimeoutError

type PutError =
  | EncodeError
  | WriteError
  | AbortError
  | TimeoutError

type EnumerationError =
  | DecodeError
  | ResolveFailure
  | NotFound
  | ReadError
  | AbortError
  | TimeoutError

interface TreeError extends Error {
  error: EnumerationError
  // Entries that were successfully enumerated before encountering the error
  value: string[]
}

interface ResolveFailure extends Error {
  remainderPath: string
  value: Block
}

interface NotFound extends Error {}
interface NoSuchPath extends Error {}
interface EncodeError extends Error {}
interface DecodeError extends Error {}
interface WriteError extends Error {}
interface ReadError extends Error {}
interface PartialFound extends Error {}
interface AbortError extends Error {}
interface TimeoutError extends Error {}

@Gozala
Contributor Author

Gozala commented Jun 9, 2020

A few extra notes I'd like to make:

  1. It feels like dag.get / dag.tree could just take an ipldPath: string argument instead of CID + path, since CIDs get serialized over the wire anyway.
  2. It might be worth considering how IPLD Selectors bear on the dag interface, especially since they could reduce the amount of chatter between server and client across the message channel.
  3. There has been a separate discussion about an alternative to ipfs.add (see: Discuss: improving ipfs.add #3033) which attempts to do everything that ipfs.add does, with the added message channel constraints. However, it turns out both 3Box and Textile just add a file to IPFS and want a CID back (in browsers), which is why the blob API is proposed as a very effective and simple API to do just that.
  4. This proposal does not (yet) attempt to provide an API to differentiate between transferring vs copying the underlying ArrayBuffers for DAG blocks, mostly because I suspect it might be best to make that part of the Block API.

@Gozala
Contributor Author

Gozala commented Jun 9, 2020

I want to elaborate a bit more on my "transfer vs copy" point. Consider the following example, which assumes the current API:

const dagNode = {
  version: 1,
  message: new TextEncoder().encode("Hello World")
}

const cid = await ipfs.dag.put(dagNode)

As things stand today, the following assertion will hold:

new TextDecoder().decode(dagNode.message) === "Hello World"

However, if things are moved across a message channel to a worker, that is not going to be the case if the underlying ArrayBuffers are transferred (which is the most efficient way) instead of being copied. Instead, the following will be true:

dagNode.message.byteLength === 0

This is obviously a simplified example. If you imagine that the dagNode is assembled in one part of the program from bytes received from another part of the program, with multiple references to them, a transfer (which effectively empties the buffers) may lead to bugs in user space.

That is why I think it is important to have an API that:

  1. Clearly differentiates between transfer & copy (illustrated below).
  2. Does not perform copies unnecessarily.
  3. Provides better support for other JS primitives like Blobs, which copy references to a read-only memory segment and not the memory itself.
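To make the distinction concrete, here is what both behaviours look like with a bare MessageChannel (the payload shape is arbitrary):

const { port1 } = new MessageChannel()
const bytes = new TextEncoder().encode('Hello World')

// Copy: structured clone duplicates the buffer, `bytes` stays intact here.
port1.postMessage({ bytes })
console.log(bytes.byteLength) // 11

// Transfer: ownership moves to the receiver, the local view is detached.
port1.postMessage({ bytes }, [bytes.buffer])
console.log(bytes.byteLength) // 0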

@Gozala
Contributor Author

Gozala commented Jun 9, 2020

@carsonfarmer @oed it would be good to get your eyes on #3022 (comment) as well.

@oed
Contributor

oed commented Jun 10, 2020

To me #3022 (comment) makes sense as an interface to ipfs data @Gozala
A few questions that arise:

  1. Is the idea to have another abstraction above the DAGService api that allows you to use the codec that you need? (since you are putting raw blocks)
  2. Does this only cover a shared storage layer? (i.e. no shared networking)
  3. If there is shared networking, how can I interact with the libp2p node? (connect to nodes etc)
  4. What about the pin api? Looks like you can only pin with the dag.put right now?
  5. Can the blob api be implemented on top of the dag api? (asked this before)

@Gozala
Contributor Author

Gozala commented Jun 10, 2020

To me #3022 (comment) makes sense as an interface to ipfs data @Gozala
A few questions that arise:

Hey @oed, thanks for taking the time. I should have provided more context; there is some in #3022 (comment). I will try to address all of the questions below.

  1. Is the idea to have another abstraction above the DAGService api that allows you to use the codec that you need? (since you are putting raw blocks)

Not exactly. The new IPLD stack changes things a bit: the imported Block refers to the Block coming from https://github.com/ipld/js-block (not to be mistaken for https://github.com/ipld/js-ipld-block), which provides an interface to encode / decode blocks. DAGService here will just provide a way to store / retrieve blocks to / from the IPFS node.

I do, however, refer to another abstraction layer in the comments. That is mostly to say that we can provide API compatibility with the current / future JS-IPFS API by wrapping this DAGService API, which is intended to be as lean a layer as possible on top of the raw MessagePort.

  2. Does this only cover a shared storage layer? (i.e. no shared networking)

There are more details on the current plan here: #3022 (comment), which addresses specifically this. The idea is that:

  • There will be an IPFSClient class that will provide an API to the remote IPFS node you can attach to:
     class IPFSClient implements IPFSService {
       static attach(port:MessagePort):IPFSClient {
          // ...
       }
       // ...
    }
    The current discussion is about what the API of the IPFSClient instance should be.
  • There will also be an IPFSServer class that can provide an IPFS node (which could be either js-ipfs-lite or a full-fledged js-ipfs) over the message port:
    const main = async () => {
      const ipfs = await IPFS.create({ /* ... */})
      const server = new IPFSServer(ipfs)
      // self is the SharedWorker global scope. For every tab connecting
      // to it we wire the connecting port up with the server
      self.onconnect = ({ ports }) => server.connect(ports[0])
    }
    In phase 1 (the current focus) you are free to configure the IPFS node as you please. In phase 2 (future), when we start sharing across origins, that will no longer be the case; however, that is also where we intend to have shared storage, so it would not be a problem.
  3. If there is shared networking, how can I interact with the libp2p node? (connect to nodes etc)

In phase 1 you're operating an IPFS node, so you're free to do anything. I am also happy to assist in trying to work out the details of how to provide some convenience in terms of wiring things up across the worker / main thread.

  4. What about the pin api? Looks like you can only pin with the dag.put right now?

There is also ipfs.blob.put(blob, { pin: true }), which is essentially a simplified version of ipfs.add that does only what your team and Textile are doing with ipfs.add. If that is not the case, this would be a good time to point it out.

If you are referring to the ipfs.pin.* APIs, those didn't make the list because neither your team nor the Textile team mentioned using them. If it is something that 3Box depends on, I'll make sure to add it.

5. Can the blob api be implemented on top of the dag api? (asked this before)

I am not sure I understand what you are referring to as the blob API, could you elaborate please?

@oed
Contributor

oed commented Jun 11, 2020

Thanks for clarifying @Gozala, this being a new ipfs interface makes sense.

Not exactly. The new IPLD stack changes things a bit: the imported Block refers to the Block coming from https://github.com/ipld/js-block

This is clear, so developers using this new interface are expected to import @ipld/block in order to be able to put objects into the dag?

If you are referring to the ipfs.pin.* APIs, those didn't make the list because neither your team nor the Textile team mentioned using them.

Sorry about that, slipped my mind. Pinning is definitely something we need. At various points it might be useful to pin objects that you already have locally.

I am not sure I understand what you are referring to as the blob API, could you elaborate please?

I'm basically wondering why you can't use ipfs.dag.put instead of ipfs.blob.put? Does the blob API do something that the dag API is not capable of?

@mikeal
Contributor

mikeal commented Jun 11, 2020

This is clear, so developers using this new interface are expected to import @ipld/block in order to be able to put objects into the dag?

js-ipfs should probably expose the Block interface it is using internally for others to use. If it doesn't, people will end up importing their own, and then they are going to end up re-loading a bunch of codecs and other multiformats plugins that IPFS had already loaded.

@Gozala
Contributor Author

Gozala commented Jun 11, 2020

This is clear, so developers using this new interface are expected to import @ipld/block in order to be able to put objects into the dag?

I was imagining that Block would be exposed by IPFSService, maybe as IPFSService.Block or IPFSService.dag.Block.

Sorry about that, slipped my mind. Pinning is definitely something we need. At various points it might be useful to pin objects that you already have locally.

I'll make sure to incorporate it!

I'm basically wondering why you can't use ipfs.dag.put instead of ipfs.blob.put? Does the blob api do something that the dag api is not capable of?

ipfs.blob.put(blob, {...}) does essentially what ipfs.add(blob, {...}) does, except it takes just one kind of input, a Blob (well, a File as well, which is a subclass), and returns you a CID. This is based on the feedback I got from talking to you and @carsonfarmer, where both teams seem to use a very tiny subset of the .add functionality. It also addresses the inconvenience you've mentioned in #3022 (comment).

You could technically do it with the ipfs.dag API, but you'd need to chunk, create unixfs blocks, etc...

@Gozala Gozala self-assigned this Jun 13, 2020
@achingbrain
Member

js-ipfs should probably expose the Block interface it is using internally for others to use

It would need to be added to the exports for core and the http client.

@Gozala it would be preferable if ipfs-message-port-client used the same mechanism.

Nb. exposing dependencies like this ties our hands around versioning, as the module's exposed API essentially becomes our API: when they break, we break, even if the only place they are exposed is via these exports. It also complicates the TS effort, as these modules start needing type definitions where they don't already have them.

If pressed, I'd rather not export anything like this for that reason, though I can see the utility.

Pinning is definitely something we need. At various points it might be useful to pin objects that you already have locally.

@oed I'm curious as to why you are pinning; you say further up the thread that you aren't using GC, and pinning is really only there to stop blocks from being garbage collected.

Now we have to do:
const cid = (await (await ipfs.add(buffer)).next()).value.cid

Maybe do this instead:

const { cid } = await last(ipfs.add(buffer))

You can use it-last or something from streaming-iterables to make working with async iterators less verbose.
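Spelled out with the import, the suggestion above becomes (buffer stands for whatever input is being added):

const last = require('it-last')

const { cid } = await last(ipfs.add(buffer))
console.log(cid.toString())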

#3022 (comment)

My feeling on this API is that we've always tried to ensure that the API between ipfs-core and the ipfs-http-api remains the same (within reason). That lets the user switch between ipfs implementations largely by changing their setup only; they should not need to make structural changes to their application code.

It also means that we can service both implementations with the same test suite, the same documentation, the same tutorials, and that having a canonical way of performing certain actions is possible which helps immeasurably when responding to user questions here, on discuss, IRC & stack overflow.

If we do things like introduce ipfs.blob.* when there's no analog in core/http, change which positionals are optional[^1] or start adopting new modules that are incompatible with the rest of the IPFS ecosystem, all that starts to go out of the window.

I think for an initial release it would be acceptable to say that not all methods or argument types are accepted, but the API that is supported should have the same methods and argument types/names/positions.

So, for example, keep the ipfs.add method, but maybe in the first iteration it only accepts a String, Uint8Array or a Blob, and the async iterator it returns is just a hack to maintain compatibility with accepting streams of these things.
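A sketch of what such a hack could look like; addSingle is a hypothetical internal that accepts exactly one String, Uint8Array or Blob:

// Keep the streaming signature while only supporting single inputs.
async function * add (input, options) {
  const { cid, size } = await addSingle(input, options)
  yield { path: cid.toString(), cid, size }
}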

We ship that, perhaps add some notes to the docs about what is supported where, and people can start to experiment with it, so we can get some feedback on our ideas when they use actual working code.

Maybe the performance is bad in the first release; I think that's ok. But I don't think it's ok to have an incompatible API for the message port client, because it increases the cognitive load on the user and the maintenance/support/documentation load on the maintainers: we want this to be a first class implementation and not an experimental client API.


Footnotes

  1. FWIW I think the only positional that should be optional is the options object and any optional arguments should be in the options object.

@oed
Contributor

oed commented Jun 16, 2020

I'm curious as to why you are pinning; you say further up the thread that you aren't using GC, and pinning is really only there to stop blocks from being garbage collected.

When GC was introduced in js-ipfs it was disabled by default. If this setting has been changed, it has not been properly communicated. Please let me know what the default is; it could definitely be a problem if this has changed!

Maybe do this instead

Makes sense 👍

@achingbrain
Member

achingbrain commented Jun 16, 2020

The only way GC runs is if it is manually triggered with await ipfs.repo.gc().

There are no plans to run this on a timer or similar right now, but if that changes you can bet we'll shout about it from the rooftops first!

@oed
Contributor

oed commented Jun 16, 2020

Thanks for clarifying @achingbrain 🙏
As to your question, we started pinning some things in preparation for GC, because back then we didn't know it was not enabled by default. It seems like good practice to pin things you want to keep around anyhow.

@Gozala
Contributor Author

Gozala commented Jun 16, 2020

My feeling on this API is that we've always tried to ensure that the API between ipfs-core and the ipfs-http-api remains the same (within reason). That lets the user switch between ipfs implementations largely by changing their setup only; they should not need to make structural changes to their application code.

It also means that we can service both implementations with the same test suite, the same documentation, the same tutorials, and that having a canonical way of performing certain actions is possible which helps immeasurably when responding to user questions here, on discuss, IRC & stack overflow.

There are undoubtedly a lot of benefits to doing it that way, and I would wholeheartedly agree in a broader context. Context here is, however, crucial. Going with the same interface comes with tradeoffs, like never really being able to change / improve things without disrupting the whole community.

In this context we are working in tandem with the group, so we can validate all our assumptions and also use this as an opportunity to:

  1. Drastically reduce complexity.
  2. Validate whether the new simple API has merit and, if so, gradually simplify the js-ipfs / http-client APIs as well to arrive at the same interface.

If we do things like introduce ipfs.blob.* when there's no analog in core/http, change which positionals are optional[^1] or start adopting new modules that are incompatible with the rest of the IPFS ecosystem, all that starts to go out of the window.

As I tried to capture extensively in the interface comments, having a simple bare-bones API does not necessarily mean API incompatibility with the IPFS ecosystem. We could, and probably should, top it with a layer that provides API interop. What it merely provides is a choice for the community to ditch all that complexity or stick with the convenience of the existing API.

We also have a 👍 from the most prominent would-be users of this work, who are interested in the former choice. With this context in mind, I would argue that there is a benefit in trying things with willing members of the community and using that as an opportunity to inform the work on js-ipfs proper.

I think for an initial release it would be acceptable to say that not all methods or argument types are accepted, but the API that is supported should have the same methods and arguments types/names/positions.

I think the proposed API does that to some degree, which is also why I called out that the blob.* methods might end up being top-level add / cat. However, I also think that is a bad idea because:

The API is going to be incompatible, and masking that under a familiar API is just deceiving. It creates the false assumption that things are compatible, in practice leading to errors. Worse yet, some of the errors could hide under rarely exercised code paths, surfacing issues only in production. Sure, we can document the differences and assume that users notice, or we could just give things different names and be sure that users will notice.

@Gozala
Contributor Author

Gozala commented Jun 16, 2020

I would like to offer one extra argument in favor of simplification here: I think this can also help us in the "IPFS in browsers natively" endeavor. I do not believe there is any chance the API as-is could make it into browsers. This is also an opportunity to consider what such an API could be, and even to exercise it in close collaboration. In other words, to inform ourselves on that axis as well.

@carsonfarmer

carsonfarmer commented Jun 16, 2020

As a likely consumer of the proposed work/APIs here, I'd just like to voice an opinion in favor of simplification over compatibility. The IPFS API as it is now is vast, and includes APIs that are needed when building and testing IPFS but are less likely to be used by apps that are simply using IPFS as a library. (Certainly I'm biased here... we only use a small subset of the API, and I don't have much experience with other projects beyond the few mentioned previously in this thread... so take that comment with a grain of salt.) An extensible compatibility layer sounds appealing to me, not least insofar as it helps keep the surface area of the core APIs small and robust.

@achingbrain achingbrain added kind/feature and removed need/triage Needs initial labeling and prioritization labels Jun 17, 2020
@achingbrain
Member

I think what I've said is a simplification of the existing API. That is, don't implement it all, just implement the top-level files API (e.g. ipfs.add, ipfs.cat, ipfs.ls, ipfs.get etc) and the DAG API, and limit the inputs these take in the first release, if it helps you get something out of the door quicker.

Introducing a new ipfs.blob API does not make it simpler, because now you have ipfs.add and ipfs.blob, which sort of do the same job and sort of take the same arguments, but sort of don't, and they're not supported everywhere and don't really behave the same.

The API can be improved, redundancy can be removed and it can be made leaner, more intuitive and more straightforward and we should talk about this but I feel it's outside the scope of this piece of work.

@Gozala
Contributor Author

Gozala commented Jun 22, 2020

I would like to share some updates on this work:

  • The implementation of ipfs-message-port-client / -server has been submitted for code review: feat: share IPFS node between browser tabs #3081
    • If you are interested and have time to test it in your product / library, this might be a good time to do so (feel free to reach out, I'd be happy to be of assistance).
  • The current implementation ended up following the js-ipfs API as per the documentation. Please note that js-ipfs is in practice less strict than the docs, implying that usage that works with js-ipfs but is invalid according to the docs will error here.
  • Even though ipfs.add is compatible with js-ipfs, it's best to really just pass Blobs.
  • Everywhere you'd get a node Buffer you should expect a Uint8Array instead. This is not exactly true everywhere right now, but that is certainly the direction.
  • The current implementation provides the following API subset
  • ipfs-message-port-server takes a running IPFS node and exposes it over a message channel.
    • I would like to make passing ipfs-lite instead of js-ipfs an option, but only after what's there lands.
  • The API surface does not cover the libp2p / pubsub stuff which was identified as important. That is because in this iteration you are providing the server an IPFS node, so you can use pubsub and anything else out of band.
    • Each client / server component is decoupled from the others, and adding one is fairly straightforward. If you would like to expose an API, e.g. pubsub, through the client, I'm happy to help with that.

On the API discussion:

@oed
Contributor

oed commented Jun 23, 2020

Using just the ipfs.block API makes sense to me. That + the ipfs.pin API seem to be the only things needed for us on the ipfs side.

On the libp2p side our long-term needs are still a bit unclear; it likely depends on how performant the libp2p.pubsub API is with regards to peer discovery.

achingbrain added a commit that referenced this issue Jul 27, 2020
This pull request adds 3 (sub)packages:

1. `ipfs-message-port-client` - Provides an API to an IPFS node over the [message channel][MessageChannel].
2. `ipfs-message-port-server` - Provides an IPFS node over [message channel][MessageChannel].
3. `ipfs-message-port-protocol` - Shared code between client / server mostly related to wire protocol encoding / decoding.

Fixes #3022

[MessageChannel]:https://developer.mozilla.org/en-US/docs/Web/API/MessageChannel

Co-authored-by: Marcin Rataj <lidel@lidel.org>
Co-authored-by: Alex Potsides <alex@achingbrain.net>
SgtPooki referenced this issue in ipfs/js-kubo-rpc-client Aug 18, 2022
This issue was closed.