Orbit-DB as PouchDB backend #4
Comments
|
it would be super useful to do this, and to make it a level backend. |
|
@lgleim Thanks for the proposal. I'm unfamiliar with PouchDB and its ecosystem, so I was wondering: in your opinion, what are the killer features that make PouchDB great to work with and your choice of frontend DB? Is it because it's a document store, or something else? When starting orbit-db, I had the idea of making it LevelUP-compatible so it could be used as a backend for leveldb. The same idea can probably be applied to PouchDB and others. I didn't keep to the idea, so as of today there's no easy and simple integration to level/pouch, but it wouldn't be hard to do. That said, I'm not sure orbit-db (the interface) is the right component to be the backend, as it does what level/pouch do and provides a data/query API to the database. However, the data stores under orbit-db (orbit-db-kvstore, orbit-db-eventstore or counterstore) could probably be integrated into level/pouch as backends. I don't have time right now to take this forward but will definitely get back to this at some point. Meanwhile, @lgleim, if you have the motivation to work on this, I'd be happy to help and work with you to provide what is needed to integrate the data stores. |
|
I'll take a crack at pitching in on this one. PouchDB is part of the CouchDB family: it runs entirely in JavaScript in the browser and speaks the Couch replication protocol for documents. I think the request here is that to replicate between many devices, you really need one or more web-hosted CouchDB targets to sync the data with (an online server/VPS, because serving HTTP P2P-style via JavaScript with lots of "offline first" devices is umm... hard...). Instead of the public-IP-hosted server, if Pouch (or another doc DB) put data into an Orbit-DB, that app goes P2P in a more device-direct way. I see this happening one of two ways (possibly both).
Couch has its RESTful API, map/reduce views, changes feed, etc. The Couch API works via RESTful HTTP by design, and I haven't seen mention of anything like map/reduce views/indexing in Orbit. So Couch et al. have their place. Storing a repository of Couch-compatible JSON documents and implementing the documented HTTP sync protocol seems like a lot of bang for the buck. I think every Couch-based mobile/web app out there could move to Orbit-DB today if it spoke CouchRep. Orbit-DB would then be an intermediary between a bunch of Couch HTTP servers. |
|
PouchDB supports any Level backend. |
|
It would be awesome to have a CouchDB compatible "front-end" to an IPFS-based database. Up for helping with this if I can. As @MikeFair mentioned already, just being able to replicate to/from OrbitDB and a Couch instance would be a big deal, e.g. for use with Couchbase Lite on mobile apps. |
|
Thank you everyone for the input on the topic and for keeping it alive. @MikeFair I really like your proposal here and totally buy it :) Having had time to think about it, and given the progress made since 2016, things look a lot clearer now as to how we can make this happen. I'd like to propose an alternative approach to how we could do it. From what I can tell, the CouchRep protocol is very much tied to the client-server/HTTP paradigm and as such would probably require quite a bit of work and crutches to get it working that way. It would require us to manage direct peer-to-peer connections and handle the replication between two peers. As OrbitDB benefits from libp2p's pubsub mechanism, we should let libp2p (and IPFS) handle all the p2p networking and abstraction.
However, there's another way we can hook into CouchDB/PouchDB that makes the most sense to me: the changes feed. In fact, we're currently working with a project that is integrating OrbitDB with MongoDB in this way. From what I can tell, the changes feed has all the information we need: the id, the sequence number, the document and the deleted flag. On the OrbitDB side these translate into operations that have an id, a seq and a payload (the document). So what we would need to do is listen to the changes feed and, on every change, create an operation for that change in OrbitDB's log (a feed database in OrbitDB). This log of changes will then be propagated to the network of peers (with the aforementioned libp2p functionality), and on the receiving end OrbitDB will handle the ordering etc. for the log. Upon receiving an update to the log (i.e. a change from the changes feed), we "replay" the log into the local PouchDB database. That means that if we receive a new version of a document, we do pouchdb.put(entry.doc), and if we receive a change operation that has the deleted flag set to true, we do pouchdb.del({ _id: entry.id }) in the orbitdb-pouchdb adapter.
We would probably have to track the "latest changes seen" so that we don't replay the full log (i.e. the changes feed) on every change. In short: transform the PouchDB changes feed into an OrbitDB feed and replay that feed back into PouchDB. So essentially, it would look something like this: I believe doing ^ will let us get the most direct integration point to Pouch; we keep the benefits of p2p networks and content-addressing and don't have to deal with locations or HTTP. In effect, this would make any Couch/Pouch database a peer-to-peer database. In this scenario, from an app's perspective, you'd still need to make sure there's a public instance of CouchDB running somewhere in order to make sure the app's data gets replicated (if the use case requires 100% availability), but the app wouldn't need to care about location addresses; OrbitDB+IPFS will handle this automatically, and the only thing the app/server would need to say is "open database X" and it'll join the p2p swarm and start listening and replicating the changes.
As to where all this goes in the code, I'm not sure yet as I haven't looked into it in more detail. For reference, this is how OrbitDB handles feed updates (i.e. you can add and delete entries): https://github.com/orbitdb/orbit-db-feedstore/blob/master/src/FeedIndex.js#L12 (the replay part), https://github.com/orbitdb/orbit-db-feedstore/blob/master/src/FeedStore.js#L13 (add/update doc) and https://github.com/orbitdb/orbit-db-eventstore/blob/master/src/EventStore.js#L14 (delete doc). Or actually, we could use the OrbitDB DocumentStore directly to achieve the same effect: https://github.com/orbitdb/orbit-db-docstore/blob/master/src/DocumentIndex.js#L12 (replay), https://github.com/orbitdb/orbit-db-docstore/blob/master/src/DocumentStore.js#L26 (put/update and remove). Any thoughts? I personally don't have time to work on this but would love to help if anyone wants to start working on it!
My intuition is that it shouldn't be too much code nor too complicated, but not knowing exactly the PouchDB side of things, there may be tricky bits there. From OrbitDB's perspective it should be pretty straightforward. |
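The changes-feed-to-oplog flow described in the comment above can be modelled with plain JavaScript. This is an illustrative sketch, not the actual adapter: the `change` objects mimic the shape of PouchDB's changes feed, and a `Map` stands in for the local PouchDB database where the real adapter would call `pouchdb.put` / `pouchdb.remove`.

```javascript
// Turn one entry from a changes feed into an oplog operation: id, sequence
// number, deleted flag and the document payload, as described above.
function changeToOp(change) {
  return {
    id: change.id,                            // document _id
    seq: change.seq,                          // sequence number from the feed
    deleted: Boolean(change.deleted),
    doc: change.deleted ? null : change.doc,  // full document payload
  };
}

// Replay a (causally ordered) oplog into a local store. `store` is a plain
// Map here; in a real adapter these branches would put/remove on PouchDB.
function replay(oplog, store) {
  for (const op of oplog) {
    if (op.deleted) {
      store.delete(op.id);
    } else {
      store.set(op.id, op.doc);
    }
  }
  return store;
}

// Example: two puts and a delete flowing through the oplog.
const feed = [
  { id: 'a', seq: 1, doc: { _id: 'a', n: 1 } },
  { id: 'b', seq: 2, doc: { _id: 'b', n: 2 } },
  { id: 'a', seq: 3, deleted: true },
];
const oplog = feed.map(changeToOp);
const store = replay(oplog, new Map());
// store now holds only document 'b'
```

The real version would additionally track the "latest changes seen" marker mentioned above, so the whole log is not replayed on every change.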
Not sure I understand this - surely the data & changes feed stored in IPFS would be enough? If we can implement this form of replication (changes feed) from OrbitDB to PouchDB (or another CouchDB-type client), then the CouchDB instances are free to do push/pull replication as normal, right?
I think this would only be true if the replication was triggered from OrbitDB, e.g. if it handled the |
|
Had a few extra hours today and wrote a test for this: https://github.com/haadcode/pouchdb-adapter-orbitdb. Basic replication works, but I didn't implement support for getting all revisions of a document yet. It's hard to find documentation on the adapters or where to hook into PouchDB, so I'm not sure yet if this is the right way to do it. For a usage example, see https://github.com/haadcode/pouchdb-adapter-orbitdb/blob/master/pouchdb-orbitdb.js. Let's see where that takes us. |
|
Great stuff. Look forward to seeing where this goes! |
|
ping! :) @haadcode perhaps I'm misreading this, but it seems like this code is taking data out of PouchDB (via the changes feed) and putting it into/onto the OrbitDB channels... The goal I was seeing is something more like a PouchDB server that stores document and index data in Orbit rather than only locally (it likely still caches active working documents locally for speed), and then perhaps multiple PouchDB instances all sharing the same Orbit data/channels. This way end applications talk to Pouch with all its REST semantics, and Pouch talks to Orbit. Applications subscribe to the Pouch changes feed, and the data funnels across Orbit to be published by the Pouch servers. The Pouch servers also implement the Couch eventual consistency model, so the applications can be "offline first" and then sync up the next time they connect to the P2P network. |
|
I've been chewing on this for a bit independently of this thread, and writing a custom adapter seems like the best approach. I had success mapping PouchDB's changes feed to an append-only oplog and back again. I'll chew on the adapter you wrote up, @haadcode, and see where I can get. |
|
If you used Pouch as a "backend" for Orbit, how do replications coming into the PouchDB from the Couch world "push up" into OrbitDB changes? Perhaps this is a case of making a "CouchDB" database type that implements a few extra rules that Couch has (like _rev tracking) for a document?
I see Pouch as more of a "broker" between Couch and Orbit that gives the developer a P2P mesh-like CouchDB replication topology. For example, instead of identifying the replication endpoint as an HTTP server on the Internet, it's an OrbitDB address. The local JS app would primarily interact with the Pouch API as a PouchDB app, but if the JS app also identified an Orbit DB/channel ID to participate on, replication between active peers would go over Orbit. Instead of having Couch's usual "hub and spoke" topology where all the Pouch nodes are syncing to a central Couch server on the Internet, the local Pouch node would replicate with their local Orbit DB...
…On Tue, Jan 16, 2018 at 10:48 AM, Diana Thayer ***@***.***> wrote:
Using OrbitDB as a PouchDB backend complicates much of the benefit gotten from using PouchDB. For example, the adapter must implement _rev checking for updating and deleting documents, one of PouchDB's core features. Re-implementing it defeats much of the purpose of using PouchDB in the first place. Other things, like replication, wouldn't make sense to re-implement or overwrite. I for one want the best of both worlds: PouchDB's ability to replicate with CouchDB, and Orbit's ability to replicate over IPFS.
It may make more sense to use PouchDB as an OrbitDB backend, such that there is a custom store type that one might use like this: const db = orbit.pouch(...). I have a few prototypes that do this already.
|
|
I see your point, but it's really quite difficult both ways. OrbitDB and PouchDB don't map nicely to one another, so making sure the underlying data is always up to date is also difficult. I deleted my initial comment after I thought about it a little more; sorry for the confusion. Because it doesn't map so neatly, it may make sense to include OrbitDB as a plugin that adds additional methods and features to PouchDB, like toMultihash() and join(). That saves the trouble of implementing a full adapter, and scopes the work to ensuring the two stores stay in sync. For example, the plugin could attach listeners to a changes feed, playing those changes onto an oplog as they happen. |
|
Here's a thought: each party participating on the OrbitDB creates an object in the DB for their peer ID. They replicate their _changes feed and update their current SEQUENCE number on this data value:
$OrbitDB[$peerID][$couchDatabaseName]['_changes'].append($newStuff)
$OrbitDB[$peerID][$couchDatabaseName]['sequence'] = $newSeqNumber
When other OrbitDB participants see the changes come in, they commit them to their local Pouch repo. They also keep a marker for the last SEQUENCE number they saw from that peer:
$OrbitDB[$peerID][$remotePeerID][$couchDatabaseName]['sequence'] = $latestSequenceReceived
When nodes restart, they can use that information to figure out if their repo is out of sync with any of the other peers.
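The bookkeeping sketched above can be modelled in plain JavaScript. This is an illustrative sketch only: nested plain objects stand in for the shared OrbitDB database, and all names here are made up for the example, not any real API.

```javascript
const shared = {};   // stands in for the shared OrbitDB database

// A peer publishes its latest changes and its current sequence number,
// mirroring the $OrbitDB[$peerID][$couchDatabaseName] structure above.
function publishChanges(peerId, dbName, changes, seq) {
  const slot = ((shared[peerId] ||= {})[dbName] ||= { _changes: [], sequence: 0 });
  slot._changes.push(...changes);
  slot.sequence = seq;
}

// Each peer remembers the last sequence number it has applied per remote peer.
const applied = {};  // { [remotePeerId]: { [dbName]: lastSeqSeen } }

// On restart, compare local markers against the published sequence numbers
// to find peers whose changes we have not fully applied yet.
function outOfSyncPeers(dbName) {
  const stale = [];
  for (const [peerId, dbs] of Object.entries(shared)) {
    const seen = applied[peerId]?.[dbName] ?? 0;
    if ((dbs[dbName]?.sequence ?? 0) > seen) stale.push(peerId);
  }
  return stale;
}

// Example: peer A has published up to seq 5; we've only applied up to seq 3.
publishChanges('peerA', 'todos', [{ seq: 4 }, { seq: 5 }], 5);
applied.peerA = { todos: 3 };
const stale = outOfSyncPeers('todos'); // ['peerA']
```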
|
|
OK, I drafted an OrbitDB plugin for PouchDB that exposes a |
|
Oops. Hit comment too soon. Anyway I’ll post source soon. |
|
Source on that plugin: https://github.com/garbados/pouchdb-orbit The plugin adds two methods:
1. #load(orbit, [address]), which calls load on an OrbitDB store and attaches listeners to the store and the PouchDB instance in order to keep them in sync. Currently that synchronization happens asynchronously, so there is a period after a given write where that write hasn't propagated to the local OrbitDB store yet. This method also populates the .key and .address attributes.
2. #sync(address), which attempts to merge updates from a given address. This doesn't work right now.
Those are the essential methods and properties I see OrbitDB adding to PouchDB currently, with more making sense once things like mutable access controls are added. Otherwise, the user relies on PouchDB for indexing and queries. The hard part I'm finding is ensuring the integrity of the mapping between OrbitDB and PouchDB in a way that doesn't create a race condition between the mapper and read/write operations. The current implementation does create a race condition; I suspect a solution might reveal itself were I better versed in OrbitDB. NOTE: I'll have to rename that "sync" method since PouchDB already has a sync method. |
|
It's best to think of it more like "replication via OrbitDB" than "syncing with OrbitDB". The main reason for the central CouchDB is because an app can't p2p replicate with all the peers. OrbitDB gives them a way to do that. Think of OrbitDB as the HTTP channel to sync through between two Pouch nodes.
The goal/point isn't to replicate the Pouch objects in OrbitDB such that someone could look in OrbitDB to see the DB; it's to use the OrbitDB comm channel for active peer nodes to broadcast their activities to other active peers. So it is absolutely a race condition in the same sense that Couch replication is a race condition, and why Couch has the conflicts APIs for dealing with it. Conflicts come into the system via replication (two peers successfully updating the same doc_id). By using OrbitDB to broadcast updates to other peers, and having them commit those entries, conflicts can be reduced and/or detected sooner.
In practice, I find that when you think about using conflict-avoiding write strategies/data structures, they work for the 99.99% case. We then have conflict detection and resolution practices to handle the other 0.01%. Is that in line with what you're thinking here?
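As a rough illustration of the _rev-based conflict handling discussed here: CouchDB revisions have the form "N-hash" (N is the edit generation), and CouchDB resolves conflicts deterministically by picking the higher generation, breaking ties by comparing the revision strings. The sketch below applies that idea to updates arriving over a broadcast channel; the `basedOn` field is a hypothetical name for "the _rev this write was made against", not part of either library.

```javascript
// Classify an incoming write against what we hold locally: a write based on
// our current _rev is a linear update; anything else is a concurrent edit.
function classifyIncoming(localDoc, incoming) {
  if (!localDoc) return 'apply';                          // document is new to us
  if (incoming.basedOn === localDoc._rev) return 'apply'; // linear update
  return 'conflict';                                      // concurrent edit
}

// Deterministic winner so every peer resolves the same conflict identically,
// in the spirit of CouchDB's rule: higher generation wins, ties broken by
// comparing the rev strings.
function winningRev(revA, revB) {
  const genA = parseInt(revA, 10);  // parses the leading "N" of "N-hash"
  const genB = parseInt(revB, 10);
  if (genA !== genB) return genA > genB ? revA : revB;
  return revA > revB ? revA : revB; // same generation: higher string wins
}

const local  = { _id: 'x', _rev: '2-abc' };
const linear = { _id: 'x', _rev: '3-def', basedOn: '2-abc' }; // descends from local
const fork   = { _id: 'x', _rev: '2-zzz', basedOn: '1-aaa' }; // edited in parallel
```

This covers the "detected sooner" half of the comment; real CouchDB keeps full revision trees, which this sketch does not attempt.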
|
|
I was thinking about this, and it seems you'd want PouchDB to push changes into Orbit in blocks of ~100 and then pull the changes feed 100 at a time. Whether you'd want to store document bodies as their own Orbit objects or within the blocks that also store the changes feed is something I don't know enough about Orbit to answer. |
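The batching idea above can be sketched in a few lines. The block size of ~100 comes from the comment; the shape of the change objects is illustrative.

```javascript
// Chunk a changes feed into blocks of up to `size` entries before pushing
// them into Orbit, instead of writing one entry per change.
function toBlocks(changes, size = 100) {
  const blocks = [];
  for (let i = 0; i < changes.length; i += size) {
    blocks.push(changes.slice(i, i + size));
  }
  return blocks;
}

// Example: 250 changes become three blocks of 100, 100 and 50 entries.
const changeFeed = Array.from({ length: 250 }, (_, i) => ({ seq: i + 1 }));
const blocks = toBlocks(changeFeed);
```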
|
I polished up the pouchdb-orbit lib a little, with test coverage and all. It's marked as alpha, but it allows you to replicate via OrbitDB, similar to how peerpad uses read keys. Any review on the code would be much appreciated; thanks everyone. The chief benefit for me of using OrbitDB via a PouchDB plugin is that PouchDB has robust indexing and query interfaces, including the Mango query language and an ecosystem of plugins. Using an OrbitDB plugin allows you to replicate database state over public P2P infrastructure, while retaining those advanced query interfaces. |
|
I know this is a really old thread, but I also wanted to use PouchDB with Orbit and didn't see this thread, so I built a prototype a few months ago. I want to use Orbit as the system of record, and as it synchronizes I just want it to put the data into Pouch so it can easily be searched, because otherwise Orbit needs to load all the data into memory every single time the page loads, and using a database of any meaningful size becomes unbearably slow. This also short-circuits the load mechanism: I track which of the oplog entries get handled and store that in Pouch so they don't have to be re-run, so loading in the browser is pretty quick. This is not production ready, and I actually stopped working on it because something else got my attention before I really polished it. I'm not exactly sure how this approach compares to the plugin listed above, but I think it's a little different and I'm excited to look into it at some point. Here is the code if anyone is interested. |
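The checkpointed replay described in this comment might look roughly like the sketch below. The names and data shapes are illustrative, not taken from the linked prototype; a `Map` stands in for the local PouchDB database, and the checkpoint object stands in for the marker the comment stores in Pouch.

```javascript
// Replay only the oplog entries past the stored checkpoint, so a page
// reload applies the tail of the log rather than the whole history.
function replayFrom(oplog, store, checkpoint) {
  let appliedNow = 0;
  for (let i = checkpoint.applied; i < oplog.length; i++) {
    const op = oplog[i];
    if (op.deleted) store.delete(op.id);
    else store.set(op.id, op.doc);
    appliedNow++;
  }
  checkpoint.applied = oplog.length;  // the prototype persists this in Pouch
  return appliedNow;
}

const log = [
  { id: 'a', doc: { n: 1 } },
  { id: 'b', doc: { n: 2 } },
];
const store = new Map();
const checkpoint = { applied: 0 };
const first = replayFrom(log, store, checkpoint);   // applies 2 entries
log.push({ id: 'a', deleted: true });
const second = replayFrom(log, store, checkpoint);  // applies only the new delete
```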
|
To anyone who was interested in this, maybe Orbit would be a good backend for RxDB? RxDB started out with PouchDB as its backend but made it pluggable, and while PouchDB was impressive for its time, the code hasn't aged well, not having been updated to use modern JS patterns like ES modules. Here's a list of the current storage engines implemented for RxDB: |
|
Interesting, I'll definitely check it out. There was actually a decent-sized update for Pouch recently, though I'm not sure your specific criticism was addressed. |
lgleim commented May 12, 2016 (edited)
Hi,
I'm a big fan of the PouchDB project for the development of offline-first apps but somewhat dislike its fundamental dependence on a server backend.
I was really psyched to find out about your project and its ties to IPFS, but am wondering whether you ever considered the possibility of implementing Orbit-DB as a PouchDB backend.
This would spare you large parts of the development effort going into the JS database frontend and open up Orbit-DB to a much larger audience.
Looking forward to some feedback!