Orbit-DB as PouchDB backend #4

Open
lgleim opened this issue May 12, 2016 · 23 comments

Comments

@lgleim

lgleim commented May 12, 2016

Hi,

I'm a big fan of the PouchDB project for the development of offline first apps but somewhat dislike its fundamental dependence on a Server backend.

I was really psyched to find out about your project and its ties to IPFS but am wondering whether you ever considered the possibility to implement Orbit-DB as a PouchDB backend.
This would spare you large parts of the development effort going into the JS database frontend and open up Orbit-DB to a much larger audience.

Looking forward to some feedback!

@jbenet

jbenet commented May 17, 2016

it would be super useful to do this, and to make it a level backend.

@haadcode
Member

haadcode commented May 20, 2016

@lgleim Thanks for the proposal. I'm unfamiliar with PouchDB and its ecosystem, so I was wondering what in your opinion are the killer features that make PouchDB great to work with and your choice of frontend DB? Is it because it's a document store or something else?

When starting orbit-db, I had the idea of making it levelUP compatible so it can be used as a backend for leveldb. Same idea can probably be applied to PouchDB, and others. I didn't keep to the idea so as of today there's no easy and simple integration to level/pouch, but it wouldn't be hard either to do that. That said, I'm not sure orbit-db (the interface) is the right component to be the backend as it does what level/pouch do and provide a data/query API to the database. However, the data stores under orbit-db (orbit-db-kvstore or orbit-db-eventstore or counterstore) could probably be integrated to level/pouch as backends.

I don't have time right now to take this forward but will definitely get back to this at some point. Meanwhile, @lgleim if you have motivation to work on this, I'd be happy to help and work with you to provide what is needed to integrate the data stores.

@MikeFair

MikeFair commented Feb 2, 2017

@haadcode

I'll take a crack at pitching in on this one; PouchDB is part of the CouchDB family. PouchDB runs entirely in JavaScript on the browser. It speaks the Couch replication protocol for documents.

So it:
A) can "sync" documents with CouchDB, PouchDB, or other dbs implementing Couch doc replication
B) follows CouchDB semantics (e.g. requires latest revision id to update a doc, map/reduce views, etc.)
C) Implements Couch URL/API (e.g. raw db url will give stats, has per db changes feed and all_docs view)
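
To make point B concrete, here is a minimal sketch of the revision check. It uses a hypothetical in-memory `RevStore` class, not CouchDB or PouchDB themselves: the real calls are asynchronous, and real revision ids carry a content hash rather than the simplified `N-sketch` suffix assumed here.

```javascript
// Hypothetical in-memory store illustrating Couch-style revision checks.
// Not the PouchDB API: real calls are asynchronous and revs carry a content hash.
class RevStore {
  constructor() {
    this.docs = new Map(); // _id -> stored doc (including _rev)
  }

  put(doc) {
    const current = this.docs.get(doc._id);
    // An update must name the revision it is based on.
    if (current && current._rev !== doc._rev) {
      const err = new Error('Document update conflict');
      err.status = 409;
      throw err;
    }
    // Bump the generation number at the front of the rev string.
    const generation = current ? parseInt(current._rev, 10) + 1 : 1;
    const stored = { ...doc, _rev: `${generation}-sketch` };
    this.docs.set(doc._id, stored);
    return { ok: true, id: doc._id, rev: stored._rev };
  }
}

const store = new RevStore();
const first = store.put({ _id: 'note', text: 'hello' });
const second = store.put({ _id: 'note', _rev: first.rev, text: 'hello again' });

// Writing against the old revision is rejected.
let conflictStatus = null;
try {
  store.put({ _id: 'note', _rev: first.rev, text: 'stale write' });
} catch (err) {
  conflictStatus = err.status;
}
```

Any Orbit-backed store that wants to look like Couch to its clients would need to enforce something equivalent to this check.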

I think the request here is that to replicate between many devices, you really need a web hosted CouchDB target(s) to sync the data with (an online server/VPS, because serving HTTP P2P style via JavaScript with lots of "offline first devices" is umm... hard...). Instead of the public IP hosted server, if Pouch (or other DocDB) put data into an Orbit-DB, that app goes P2P in more device-direct way.

I see this happening one of two ways (possibly both).

  1. Pouch uses IndexedDB to store data via the browser; if Orbit-DB were a drop-in replacement for IndexedDB, that would be one approach. The changes feed would need some work, but I'm sure there's a way.

  2. (I think this is easier) Implement the Couch replication protocol via a local HTTP proxy. Pouch, and anything else that speaks CouchRep, then syncs with that target.

Couch has its RESTful API, map/reduce views, changes feed, etc. The Couch API works via RESTful HTTP by design and I haven't seen mention of anything like map/reduce views/indexing in Orbit. So Couch et al. have their place. Storing a repository of Couch compatible JSON documents and implementing the documented HTTP sync protocol seems like a lot of bang for the buck.

I think every Couch based mobile/web app out there could move to Orbit-DB today if it spoke CouchRep. Orbit-DB would then be an intermediary between a bunch of Couch HTTP servers.

@fiatjaf

fiatjaf commented Aug 10, 2017

PouchDB supports any Level backend.

@adamski

adamski commented Oct 23, 2017

It would be awesome to have a CouchDB compatible "front-end" to an IPFS-based database. Up for helping with this if I can. As @MikeFair mentioned already, just being able to replicate to/from OrbitDB and a Couch instance would be a big deal, e.g. for use with Couchbase Lite on mobile apps.

@haadcode
Member

haadcode commented Oct 24, 2017

Thank you everyone for the input on the topic and keeping it alive. @MikeFair I really like your proposal here and totally buy it :) Having had time to think about it, and progress made since 2016, things look a lot clearer now as to how we can make this happen.

I'd like to provide an alternative approach how we could do it:

From what I can tell, the CouchRep protocol is very much tied to the client-server/HTTP paradigm and as such would probably require quite a bit of work and crutches to get it working that way. It would require us to manage direct peer-to-peer connections and handle the replication between two peers. As OrbitDB benefits from libp2p's Pubsub mechanism, we should let libp2p (and IPFS) handle all the p2p networking and abstraction.

However, there's another way we can hook into CouchDB/PouchDB that makes the most sense to me: the changes feed. In fact, we're currently working with a project who are integrating OrbitDB with MongoDB in this way.

From what I can tell, the change feed has all the information we need: the id, the sequence number, the document and the deleted flag. On the OrbitDB side these would translate into operations that have an id, a seq and a payload (the document). So what we would need to do is listen to the change feed and, on every change, create an operation for that change in OrbitDB's log (the feed database in OrbitDB). This log of changes will then be propagated to the network of peers (with the aforementioned libp2p functionality), and on the receiving end OrbitDB will handle the ordering etc. for the log.

Upon receiving an update to the log (i.e. a change from the change feed), we "replay" the log into the local PouchDB database. That means if we receive a new version of the document, we do pouchdb.put(entry.doc), or if we receive a change operation that has the deleted flag set to true, we do pouchdb.del({ _id: entry.id }) in the orbitdb-pouchdb adapter. We would probably have to track the "latest changes seen" so that we don't replay the full log (i.e. the changes feed) on every change.

In short: transform the PouchDB changes feed into an OrbitDB feed and replay that feed back into PouchDB. So essentially, it would look something like this:

pouchdb->changes feed->orbitdb.pouch-feed->orbitdb-log(peerA)->libp2p/ipfs->orbitdb-log(peerB)->orbitdb.pouch-feed->pouchdb

I believe doing ^ will give us the most direct integration point to pouch: we keep the benefits of p2p networks and content-addressing and don't have to deal with locations or HTTP. In effect, this would make any couch/pouch database a peer-to-peer database. In this scenario, from an app's perspective, you'd still need to make sure there's a public instance of CouchDB running somewhere so that the app's data gets replicated (if the use case requires 100% availability). But the app wouldn't need to care about location addresses: OrbitDB+IPFS will handle this automatically, and the only thing the app/server would need to say is "open database X", after which it joins the p2p swarm and starts listening for and replicating the changes.
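
The flow above could be sketched roughly as follows. This is only an illustration under stated assumptions: `changeToOp`, `replay` and the plain array standing in for the replicated log are hypothetical names, not OrbitDB or PouchDB APIs, and everything is synchronous and in-memory. Using `seq` as the "latest change seen" checkpoint also assumes a single writer; with several peers writing, sequence numbers from different databases would not be comparable.

```javascript
// Sketch of the pipeline: pouch change -> oplog entry -> replay on another peer.
// All names here are hypothetical stand-ins, not the OrbitDB or PouchDB APIs.

// Turn one changes-feed item into a log operation with id, seq and payload.
function changeToOp(change) {
  return {
    id: change.id,
    seq: change.seq,
    deleted: change.deleted === true,
    doc: change.deleted ? null : change.doc,
  };
}

// Replay log entries newer than `lastSeen` into a local document map.
// Returns the new checkpoint so the full log is not replayed every time.
function replay(log, localDocs, lastSeen) {
  let checkpoint = lastSeen;
  for (const op of log) {
    if (op.seq <= lastSeen) continue; // already applied
    if (op.deleted) {
      localDocs.delete(op.id);
    } else {
      localDocs.set(op.id, op.doc);
    }
    checkpoint = Math.max(checkpoint, op.seq);
  }
  return checkpoint;
}

// Peer A: changes as they might appear on a changes feed that includes docs.
const changes = [
  { id: 'a', seq: 1, doc: { _id: 'a', title: 'first' } },
  { id: 'b', seq: 2, doc: { _id: 'b', title: 'second' } },
  { id: 'a', seq: 3, deleted: true },
];
const log = changes.map(changeToOp); // stands in for the replicated OrbitDB log

// Peer B: replay the log, then replay again from the checkpoint (a no-op).
const peerB = new Map();
const checkpoint = replay(log, peerB, 0);
const checkpointAgain = replay(log, peerB, checkpoint);
```

One detail a real adapter would have to handle: PouchDB's delete call is `remove()`, and like `put()` it needs the document's current `_rev`, so the replay step would have to look that up before deleting or updating.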

As to where all this goes in the code, I'm not sure yet as I haven't looked into it in more detail. For reference, this is how OrbitDB handles feed updates (ie. you can add and delete entries): https://github.com/orbitdb/orbit-db-feedstore/blob/master/src/FeedIndex.js#L12 (the replay part), https://github.com/orbitdb/orbit-db-feedstore/blob/master/src/FeedStore.js#L13 (add/update doc) and https://github.com/orbitdb/orbit-db-eventstore/blob/master/src/EventStore.js#L14 (delete doc). Or actually, we could use the OrbitDB DocumentStore directly to achieve the same effect: https://github.com/orbitdb/orbit-db-docstore/blob/master/src/DocumentIndex.js#L12 (replay) https://github.com/orbitdb/orbit-db-docstore/blob/master/src/DocumentStore.js#L26 (put/update and remove).

Any thoughts?

I personally don't have time to work on this but would love to help if anyone wants to start working on it! My intuition is that it shouldn't be too much code nor too complicated, but not knowing the PouchDB side of things exactly, there may be tricky bits there. From OrbitDB's perspective it should be pretty straightforward.

@adamski

adamski commented Oct 24, 2017

from an app's perspective, you'd still need to make sure there's a public instance of CouchDB running somewhere in order to make sure app's data gets replicated (if the use case requires 100% availability)

Not sure I understand this - surely the data & changes feed stored in IPFS would be enough? If we can implement this form of replication (changes feed) from OrbitDB to PouchDB (or another CouchDB type client) then the CouchDB instances are free to do push/pull replication as normal, right?

From what I can tell, the CouchRep protocol is very much tied to the client-server/http paradigm and as such would probably require quite a bit of work and crutches to get it working that way. It would require us to manage direct peer-to-peer connections and handle the replication between two peers.

I think this would only be true if the replication was triggered from OrbitDB, e.g. if it handled the _replicate endpoint. As I understand it, normally either side of a CouchDB instance can usually handle this, whether for push or pull, but if OrbitDB only supported being the 'client' in this case, i.e. the _changes API, then it would not need to handle the connections etc, only publish and listen to changes.

@haadcode
Member

haadcode commented Oct 25, 2017

Had a few extra hours today and wrote a test for this: https://github.com/haadcode/pouchdb-adapter-orbitdb.

Basic replication works, but I didn't implement support for getting all revisions of a document yet. It's hard to find documentation on the adapters or on where to hook into PouchDB, so I'm not sure yet if this is the right way to do it. For a usage example, see https://github.com/haadcode/pouchdb-adapter-orbitdb/blob/master/pouchdb-orbitdb.js.

Let's see where that takes us.

@adamski

adamski commented Oct 25, 2017

Great stuff. Look forward to seeing where this goes!

@MikeFair

MikeFair commented Jan 6, 2018

ping! :)

@haadcode perhaps I'm misreading this, but it seems like this code is taking data out of PouchDB (via the changes feed) and putting into/onto the OrbitDB channels...
Which makes this code a kind of one-way Pouch->Orbit broadcaster example at the moment right?

The goal I was seeing is something more like a PouchDB server that stored document and index data in Orbit rather than only locally (it likely still caches active working documents locally for speed); and then perhaps multiple PouchDB instances all sharing the same Orbit data/channels. This way end applications talk to Pouch with all its REST semantics, and Pouch talks to Orbit.

Applications subscribe to the Pouch changes feed, and the data funnels across Orbit to be published by the Pouch servers. The Pouch servers also implement the Couch eventual consistency model so the applications can be "offline first" and then sync up next time they connect to the P2P network.

@garbados
Contributor

garbados commented Jan 16, 2018

I've been chewing on this for a second independently of this thread and writing a custom adapter seems like the best approach. I had success mapping PouchDB's changes feed to an append-only oplog and back again. I'll chew on the adapter you wrote up, @haadcode , and see where I can get.

@MikeFair

MikeFair commented Jan 16, 2018

@garbados
Contributor

garbados commented Jan 16, 2018

I see your point, but it's really quite difficult both ways. OrbitDB and PouchDB don't map nicely to one another, so making sure the underlying data is always up to date is also difficult. I deleted my initial comment after I thought about it a little more; sorry for the confusion.

Because it doesn't map so neatly, it may make sense to include OrbitDB as a plugin that adds additional methods and features to PouchDB, like toMultihash() and join(). That saves the trouble of implementing a full adapter, and scopes the work to ensuring the two stores stay in sync. For example, the plugin could attach listeners to a changes feed, playing those changes onto an oplog as they happen.

@MikeFair

MikeFair commented Jan 17, 2018

@garbados
Contributor

garbados commented Jan 17, 2018

OK, I drafted an OrbitDB plugin for PouchDB that exposes a .load(orbit, [address]) method that attaches event listeners which map changes in either dataset to the other. I’ll post source soon but it relies

@garbados
Contributor

garbados commented Jan 17, 2018

Oops. Hit comment too soon.

Anyway I’ll post source soon.

@garbados
Contributor

garbados commented Jan 18, 2018

Source on that plugin: https://github.com/garbados/pouchdb-orbit

The plugin adds two methods:

  1. #load(orbit, [address]), which calls load on an OrbitDB store and attaches listeners to the store and the PouchDB instance in order to keep them in sync. Currently that synchronization happens asynchronously, so there is a period after a given write where that write hasn't propagated to the local OrbitDB store yet. This method also populates the .key and .address attributes.
  2. #sync(address), which attempts to merge updates from a given address. This doesn't work right now.

Those are the essential methods and properties I see OrbitDB adding to PouchDB currently, with more making sense once things like mutable access controls are added. Otherwise, the user relies on PouchDB for indexing and queries.

The hard part I'm finding is ensuring the integrity of the mapping between OrbitDB and PouchDB in a way that doesn't create a race condition between the mapper and read/write operations. The current implementation does create a race condition; I suspect a solution might reveal itself were I better versed in OrbitDB.

NOTE: I'll have to rename that "sync" method since PouchDB already has a sync method.

@MikeFair

MikeFair commented Jan 18, 2018

@jchris

jchris commented Jan 19, 2018

I was thinking about this, and it seems you'd want PouchDB to push changes into Orbit in blocks of ~100 and then pull the changes feed 100 at a time. Whether you'd want to store document bodies as their own Orbit objects or within the blocks that also store the changes feed, is something I don't know enough about Orbit to answer.
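
The batching part of this idea can be sketched in a few lines. `toBlocks` is a hypothetical helper, and the block size of 100 is just the figure suggested above, not a tested value; how each block would actually be written to Orbit is left open, as in the comment.

```javascript
// Split a list of changes into fixed-size blocks (the last one may be shorter).
function toBlocks(changes, blockSize = 100) {
  const blocks = [];
  for (let i = 0; i < changes.length; i += blockSize) {
    blocks.push(changes.slice(i, i + blockSize));
  }
  return blocks;
}

// 250 placeholder changes -> blocks of 100, 100 and 50.
const pending = Array.from({ length: 250 }, (_, i) => ({ id: `doc-${i}`, seq: i + 1 }));
const blocks = toBlocks(pending);
const blockSizes = blocks.map((block) => block.length);
```

Each block could then become one entry in the log, with the reader pulling one block at a time.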

@garbados
Contributor

garbados commented Feb 10, 2018

I polished up the pouchdb-orbit lib a little, with test coverage and all. It's marked as alpha but it allows you to replicate via OrbitDB, similar to how peerpad uses read keys. Any review on the code would be much appreciated; thanks everyone.


The chief benefit for me of using OrbitDB via a PouchDB plugin is: PouchDB has robust indexing and query interfaces, including the mango query language and an ecosystem of plugins. Using an OrbitDB plugin allows you to replicate database state over public P2P infrastructure, while retaining those advanced query interfaces.

@ptoner
Contributor

ptoner commented Jul 8, 2022

I know this is a really old thread but I also wanted to use Pouchdb with orbit and didn't see this thread so I built a prototype a few months ago.

I want to use orbit as the system of record and, as it synchronizes, have it put the data into pouch so it can easily be searched. Otherwise orbit needs to load all the data into memory every single time the page loads, and using a database of any meaningful size becomes unbearably slow.

This also short-circuits the load mechanism: I'm tracking which oplog entries have been handled and storing that in pouch so they don't have to be re-run. So loading in the browser is pretty quick.
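
A rough sketch of that bookkeeping, under assumptions: `applyUnhandled` and the entry shape (`hash`, `op`, `key`, `value`) are hypothetical, and the `Set` and `Map` stand in for records that the prototype would persist in pouch. It is not taken from the linked code.

```javascript
// Sketch: apply only oplog entries that have not been handled before.
// `handled` stands in for a record persisted in PouchDB; `hash` is a
// hypothetical unique identifier for each entry.
function applyUnhandled(entries, handled, index) {
  let applied = 0;
  for (const entry of entries) {
    if (handled.has(entry.hash)) continue; // processed on an earlier load
    if (entry.op === 'PUT') index.set(entry.key, entry.value);
    if (entry.op === 'DEL') index.delete(entry.key);
    handled.add(entry.hash);
    applied += 1;
  }
  return applied;
}

const handled = new Set();
const index = new Map();
const oplog = [
  { hash: 'h1', op: 'PUT', key: 'a', value: { n: 1 } },
  { hash: 'h2', op: 'PUT', key: 'b', value: { n: 2 } },
];

const firstLoad = applyUnhandled(oplog, handled, index); // applies both entries
oplog.push({ hash: 'h3', op: 'DEL', key: 'a' });
const secondLoad = applyUnhandled(oplog, handled, index); // applies only h3
```

On a later page load only the entries that arrived since the last run do any work, which is where the speed-up would come from.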

This is not production ready and I actually stopped working on this because something else got my attention before I really polished it. I'm not exactly sure how this approach compares to the plugin listed above but I think it's a little different and I'm excited to look into it at some point.

Here is the code if anyone is interested.
https://gitlab.com/american-space-software/orbit-db-pouch

@neonfuz

neonfuz commented Jul 10, 2022

To anyone who was interested in this, maybe orbit would be a good backend for RxDB? RxDB started out with PouchDB as its backend but made it pluggable, and while PouchDB was impressive for its time, the code hasn't aged well and hasn't been updated to use modern JS patterns like ES modules.

Here's a list of the current storage engines implemented for RxDB:
https://rxdb.info/rx-storage.html

@ptoner
Contributor

ptoner commented Jul 18, 2022

interesting. I'll definitely check it out.

There was actually a decent sized update for pouch recently. Though I'm not sure your specific criticism was addressed.

https://github.com/pouchdb/pouchdb/releases/tag/7.3.0

10 participants