The definitive replication and authorization guide #153

Open
markopolojarvi opened this Issue Oct 12, 2018 · 7 comments

markopolojarvi commented Oct 12, 2018

I think we need to hash out and clarify the replication and authorization processes a bit. I have been struggling with this for many days now and based on issues here I'm not alone, so I'm hoping we can use this issue to clear things up.

After reading the documentation, reading the tests, going over the code and issues, I still can't get this right, so I think I'm not far off if I say that the whole replication process is not intuitive to implement. There are so many details to grok so let's go over it one step at a time. The biggest issue seems to be that it's not clear how the hyperdbs need to be set up for replication to work.

Here are two scenarios that I want to figure out, but it seems I can't.

Scenario 1: I have hyperdb 1 that I want to read & write with hyperdb 2

My logic:

  1. Create 2nd hyperdb with 1st hyperdb's local.key
  2. Authorize 2nd hyperdb to write to 1st hyperdb
  3. Create replicate() streams from both hyperdbs
  4. Create a socket connection between the machines and do stream_1st.pipe(socket) and socket.pipe(stream_2nd)

That's the structure I got from the docs etc. but it doesn't seem to work. There's the issue with "first hypercore must be the same" which I guess means I have to create 2nd hyperdb with hyperdb(storage, hyperdb_1st.local.key). With that, I can see the connection happening, but nothing gets replicated. What steps are missing here?

Scenario 2: I want to replicate another hyperdb without writing

My logic:

  1. Create own hyperdb with remote hyperdb's local.key?
  2. Create replicate({ live: true }) stream for my hyperdb
  3. Connect to remote hyperdb's socket
  4. Do socket.pipe(my_hyperdb_stream)

With this I don't get any errors, I see data going over the socket, and I can see the data in the remote hyperdb, but nothing shows up in my hyperdb. It's a bit like the replication doesn't start for some reason.

Collaborator

pfrazee commented Oct 12, 2018

@0fork I won't be able to help on this, but I do just want to mention that hyperdb, multiwriter, and the networking stack are actively being worked on, so we do know there are problems and we're working to improve these flows.

Author

markopolojarvi commented Oct 13, 2018

Here's some code that's supposed to create hyperdb1 and clone it to hyperdb2 via @hyperswarm/network. Running it shows that the sockets receive data, but no writes are passed from hyperdb1 to hyperdb2.

What am I missing here?

const hyperdb = require("hyperdb")
const network = require("@hyperswarm/network")
const cr = require("crypto")

// this is meant to "simulate" two separate servers so two networks
const net1 = network()
const net2 = network()

const $key = "194e841187f33843b246a796f8a6aceb0d8d5d22b36661e8500b4d693a31e7e5"
const id = cr.createHash("sha256").update($key).digest()
net1.discovery.holepunchable((err, yes) => {
  if (err || !yes) {
    console.log("no hole")
    process.exit()
  }
})

const db1 = hyperdb(`./_test1`, $key, { valueEncoding: "utf-8" })
let db2
let rep1
let rep2
db1.ready(() => {
  console.log("hyper1 created")
  rep1 = db1.replicate({ live: true })
  net1.join(id, {
    lookup: true,
    announce: true,
  })
  net1.on("connection", (socket) => {
    // this is supposed to "push", so piping from
    // rep1 to socket (rep2) to rep1
    rep1.pipe(socket).pipe(rep1).on("end", function() {
      console.log("socket1 pipe end")
    })
    socket.on("data", (data) => {
      console.log("socket1 got data", data)
    })
  })
  db2 = hyperdb(`./_test2`, $key, { valueEncoding: "utf-8" })
  db2.ready(() => {
    console.log("hyper2 created")
    rep2 = db2.replicate({ live: true })
    net2.join(id, {
      lookup: true,
      announce: true,
    })
    net2.on("connection", (socket) => {
      // this is supposed to replicate, so piping from
      // socket (rep1) to rep2 to socket (rep1)
      socket.pipe(rep2).pipe(socket).on("end", function() {
        console.log("socket2 pipe end")
      })
      socket.on("data", (data) => {
        console.log("socket2 got data", data)
      })
    })
    db2.watch("/test", (err, data) => {
      console.log("socket2 /test", data)
    })
  })
})

setInterval(function() {
  db1.put("/test", "test", () => {
    db1.list((err, list) => {
      console.log("1", list)
    })
    db2.list((err, list) => {
      console.log("2", list)
    })
  })
}, 3000)

Collaborator

pfrazee commented Oct 13, 2018

@0fork Your broader point about needing a guide is on point. I debugged your script and was only able to do so because I know about some gotchas.

I made a few changes but there were only two that mattered:

  1. I changed the use of @hyperswarm/network so that only one side announces and the other side looks up. That's because hyperswarm doesn't yet have connection deduplication built in, so you were getting more connections than you need. We're either going to build dedup into the code, or we'll put that pattern in the readme once we've got one written.
  2. You were providing a public key to both hyperdb instances, but that will only work if you already have the private key to match. Not supplying the public key to the first instance solved that -- if the archive already exists it'll load the key from disk, and if it doesn't already exist it'll mint a new keypair.
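
As a rough illustration of point 1, the announce/lookup split could be captured in a small helper so the two sides cannot drift apart. This is only a sketch: `joinOptions` and the role names "announcer" and "seeker" are hypothetical and not part of @hyperswarm/network; only the `lookup` and `announce` fields come from the snippets in this thread.

```javascript
// Hypothetical helper: returns the options object passed to net.join(id, ...)
// so that exactly one side announces and the other side looks up.
function joinOptions(role) {
  if (role === "announcer") return { lookup: false, announce: true }
  if (role === "seeker") return { lookup: true, announce: false }
  throw new Error("unknown role: " + role)
}

console.log(joinOptions("announcer"))
console.log(joinOptions("seeker"))
```

With this, the side that owns the database would call `net1.join(id, joinOptions("announcer"))` and the cloning side `net2.join(id, joinOptions("seeker"))`.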

Here's the fixed snippet:

const pump = require("pump")
const hyperdb = require("hyperdb")
const network = require("@hyperswarm/network")
const cr = require("crypto")

// this is meant to "simulate" two separate servers so two networks
const net1 = network()
const net2 = network()

const db1 = hyperdb(`./_test1`, { valueEncoding: "utf-8" })
let db2
db1.ready(() => {
  console.log("hyper1 created")

  console.log('swarming')
  const $key = db1.key
  const id = cr.createHash("sha256").update($key).digest()
  net1.discovery.holepunchable((err, yes) => {
    if (err || !yes) {
      console.log("no hole")
      process.exit()
    }
  })

  net1.join(id, {
    lookup: false,
    announce: true,
  })
  net1.on("connection", (socket) => {
    console.log('net1 got connection')
    // this is supposed to "push", so piping from
    // rep1 to socket (rep2) to rep1
    var rep = db1.replicate({ live: true })
    pump(rep, socket, rep, function() {
      console.log("socket1 pipe end")
    })
    socket.on("data", (data) => {
      console.log("socket1 got data", data)
    })
  })
  db2 = hyperdb(`./_test2`, $key, { valueEncoding: "utf-8" })
  db2.ready(() => {
    console.log("hyper2 created")
    net2.join(id, {
      lookup: true,
      announce: false,
    })
    net2.on("connection", (socket) => {
      console.log('net2 got connection')
      // this is supposed to replicate, so piping from
      // socket (rep1) to rep2 to socket (rep1)
      var rep = db2.replicate({ live: true })
      pump(rep, socket, rep, function() {
        console.log("socket2 pipe end")
      })
      socket.on("data", (data) => {
        console.log("socket2 got data", data)
      })
    })
    db2.watch("/test", (err, data) => {
      console.log("socket2 /test", data)
    })
  })
})

setInterval(function() {
  db1.put("/test", "test", () => {
    db1.list((err, list) => {
      console.log("1", list)
    })
    db2.list((err, list) => {
      console.log("2", list)
    })
  })
}, 3000)
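
One detail differs between the two snippets in this thread: the first hashes the key as a hex string, while the fixed one hashes `db1.key`. Assuming `db1.key` is a Buffer of raw bytes, the two forms produce different discovery topics. That does not matter when both peers live in one process and share `id`, but on separate machines both sides would need to derive the topic the same way. A minimal sketch using only Node's `crypto` module; `topicFor` is a hypothetical helper, not part of hyperdb or hyperswarm:

```javascript
const crypto = require("crypto")

// Hypothetical helper: always hash the raw key bytes, whether the key
// arrives as a Buffer or as a hex string.
function topicFor(key) {
  const bytes = Buffer.isBuffer(key) ? key : Buffer.from(key, "hex")
  return crypto.createHash("sha256").update(bytes).digest()
}

const hexKey = "194e841187f33843b246a796f8a6aceb0d8d5d22b36661e8500b4d693a31e7e5"
const rawKey = Buffer.from(hexKey, "hex")

// Hashing the hex text and hashing the decoded bytes give different topics.
const fromText = crypto.createHash("sha256").update(hexKey).digest()
const fromBytes = crypto.createHash("sha256").update(rawKey).digest()
console.log("same topic?", fromText.equals(fromBytes)) // false

// The helper normalizes both forms to the same 32-byte topic.
console.log("normalized?", topicFor(hexKey).equals(topicFor(rawKey))) // true
```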

Author

markopolojarvi commented Oct 14, 2018

@pfrazee aaah, thank you! I knew it was some small mistake in the configuration instead of a bug in the library code.

Authorization seems to work just by adding db1.authorize(Buffer.from(db2.local.key, "hex"), () => {}) in db2.ready(() => { ... }) so no problems there.
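
A small aside on the `Buffer.from(db2.local.key, "hex")` call: if `db2.local.key` is already a Buffer (an assumption here, based on how the key is used elsewhere in the thread), the `"hex"` argument is ignored and the call just copies the bytes. The encoding matters only when the key has been passed around as a hex string. The sketch below uses a stand-in key rather than a real hyperdb instance:

```javascript
// Stand-in for db2.local.key, assumed to be a 32-byte Buffer.
const localKey = Buffer.alloc(32, 7)

// With a Buffer as input, the "hex" argument is ignored and the bytes are copied.
const copy = Buffer.from(localKey, "hex")
console.log(copy.equals(localKey)) // true
console.log(copy === localKey) // false, it is a new Buffer

// The "hex" argument only matters when the key travels as a hex string.
const asText = localKey.toString("hex")
const decoded = Buffer.from(asText, "hex")
console.log(decoded.equals(localKey)) // true
```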

So to summarize: the biggest gotcha seems to be supplying the key pair correctly. The problem for me was that everything seemed to initialize without errors, so I assumed hyperdb had everything it needed. I'm still a bit unclear on why hyperdb1 writes worked when only the public key was supplied. Doesn't hypercore sign each chunk with the secret key so that it can be verified with the public key?
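
The sign-then-verify relationship asked about here can be illustrated with plain Ed25519 keys from Node's built-in `crypto` module. This is not hypercore's actual code path, only a generic sketch: a peer holding just the owner's public key can verify entries but cannot produce signatures that verify under that key. Anything it signs has to use a key pair of its own.

```javascript
const crypto = require("crypto")

// The feed owner holds both halves of the key pair.
const owner = crypto.generateKeyPairSync("ed25519")
// A peer that was only given the owner's public key must sign with its own pair.
const other = crypto.generateKeyPairSync("ed25519")

const entry = Buffer.from("/test = test")

const ownerSig = crypto.sign(null, entry, owner.privateKey)
const otherSig = crypto.sign(null, entry, other.privateKey)

// Verification under the owner's public key only accepts the owner's signature.
const ownerOk = crypto.verify(null, entry, owner.publicKey, ownerSig)
const otherOk = crypto.verify(null, entry, owner.publicKey, otherSig)
// The other peer's signature is still valid under its own public key.
const otherOwnOk = crypto.verify(null, entry, other.publicKey, otherSig)

console.log({ ownerOk, otherOk, otherOwnOk })
// { ownerOk: true, otherOk: false, otherOwnOk: true }
```

One possible reading, which nothing in this thread confirms, is that the local writes succeeded because they were signed with a different key pair than the one behind the supplied public key.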

Collaborator

pfrazee commented Oct 15, 2018

@0fork Yeah, I think the reason it doesn't fail if you supply a public key is to meet the use case of db2, where you're joining a pre-existing hyperdb as a second author. I agree that's a footgun though.

reconbot commented Oct 15, 2018

I just want to say this thread is an education. Thank you for doing this out in the open! 👏

lachenmayer commented Nov 8, 2018

Hey folks, I wrote a pretty detailed guide about authorization & replication in hyperdb. Hope it's useful!
