
[WIP] Introduce an atomic bulk append RPC function #264

Open · wants to merge 13 commits into master
Conversation

@Happy0 commented Aug 7, 2019

This pull request introduces a publishAll RPC function which accepts an array of messages to be appended to the log atomically. Either all messages are written successfully, or none are written.

Under the hood, flume's append function already uses LevelDB's batch operation, which writes the messages atomically. This pull request just exposes that functionality further up the stack.

In level-db: https://github.com/Level/levelup#batch

In flume: https://github.com/flumedb/flumelog-level/blob/37edf4a94ceaad88b3bb3bf436517c185449b059/index.js#L25
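The all-or-nothing semantics that `publishAll` relies on can be sketched in plain JavaScript. This is an in-memory stand-in, not the real flume/LevelDB API — `appendAll` and the validation check here are hypothetical, but LevelDB's `batch()` gives the same guarantee on disk:

```javascript
// Sketch: validate every message *before* touching the log, then commit in
// one step, so a failure during validation leaves the log untouched.
function appendAll (log, msgs) {
  for (const msg of msgs) {
    if (msg == null || typeof msg.content === 'undefined') {
      throw new Error('invalid message: nothing was written')
    }
  }
  log.push(...msgs) // stand-in for one atomic LevelDB batch write
  return log.length
}

const log = []
appendAll(log, [
  { content: { type: 'post', text: 'one' } },
  { content: { type: 'post', text: 'two' } }
])
console.log(log.length) // 2

try {
  appendAll(log, [{ content: { type: 'post' } }, null]) // second entry invalid
} catch (e) {
  console.log(log.length) // still 2: nothing from the failed batch was written
}
```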

I still have a lot of tests to write, but I thought I'd open this pull request early for discussion in case anyone has any comments / recommendations about my overall approach (or any reason we shouldn't have this feature.)

Motivation

Sometimes it is desirable to append many messages at once atomically. For example, if an application occasionally persists data to the log that exceeds the individual message size limit, it might split the message into parts (as an alternative to using the blob system.) In such cases, it can be tricky to recover from failure if only some of the writes succeed (due to a power failure or the application being closed non-gracefully.) This is something I'm doing in the scuttlebutt-akka-persistence plugin that I've been building as part of my work for Openlaw, and I'd like to be able to publish the parts to the log atomically, since recovering from having published only a subset of the parts would be tricky.

An additional use case is being able to publish multiple related messages at the same time atomically. For example (off the top of my head) - 'create gathering' and 'add invitees' messages.
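The part-splitting scheme described above could look something like the following sketch. The `id` and `part` fields mirror the fields mentioned later in this thread, but the message type, field names, and size limit here are purely illustrative:

```javascript
// Hypothetical sketch: split oversized content into ordered parts that can
// be reassembled later. Appending all parts atomically means a reader never
// sees a partial group.
const MAX_PART_SIZE = 16 // tiny limit for the example; real messages allow ~8kB

function splitIntoParts (id, text) {
  const parts = []
  for (let i = 0; i * MAX_PART_SIZE < text.length; i++) {
    parts.push({
      type: 'large-payload-part', // hypothetical message type
      id: id,                     // groups the parts back together
      part: i,                    // order within the group
      body: text.slice(i * MAX_PART_SIZE, (i + 1) * MAX_PART_SIZE)
    })
  }
  return parts
}

const parts = splitIntoParts('payload-1', 'a message body that is too large for one log entry')
console.log(parts.length) // 4
console.log(parts.map(p => p.body).join('')) // reassembles the original text
```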

Questions

  • This change calls timestamp() to generate a monotonic timestamp for each message in the batch. This is because the current indexes (such as indexes/feed) assume an increasing timestamp. Philosophically, these messages are really written at the same time. Is this approach okay? Or do we want to change the indexing?

  • What is the difference between sbot.add and sbot.publish? Do we need an equivalent for publishAll?

  • What is db.post used for?
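On the first question: the strictly increasing behaviour the indexes assume can be sketched as below. This is a stand-in for the `timestamp()` helper mentioned above, not its actual implementation:

```javascript
// Sketch of a monotonic timestamp generator: repeated calls within the same
// millisecond still return strictly increasing values, which is what the
// feed indexes assume.
let last = 0
function timestamp () {
  let t = Date.now()
  if (t <= last) t = last + 0.001 // bump by a sub-millisecond increment
  last = t
  return t
}

const a = timestamp()
const b = timestamp()
console.log(b > a) // true, even if both calls land in the same millisecond
```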

ssb-validate

The required changes in ssb-validate can be found here: ssbc/ssb-validate#17

TODO

  • Update api.md with the new function and documentation about it.
  • Add a lot more unit tests

@dominictarr (Contributor)

Okay, I agree atomic writes are something we should support.

Answers:

  • it's probably better to use an increasing timestamp, because some things use that to disambiguate the order of things.
  • sbot.add appends an already valid (i.e. signed) message, and sbot.publish takes a message content and creates a message (i.e. signs it for you).
  • db.post calls back whenever a message is appended. I think it's mainly used in tests.

A problem: if you call publishAll with multiple message contents, and the second message needs to refer to the first, you'll need to know the hash of the first message (it goes in the content of the second), but you don't know that until the first is signed. So you can't do gatherings/create and invite attendees in one go. (A better solution would be if gatherings supported creating the invite and editing within the same initial message.)

@dominictarr (Contributor)

An option, though, would be to create the messages one at a time, without publishing them, and then pass them all to add in one go.

@Happy0 (Author) commented Aug 8, 2019

Thanks @dominictarr :)

I see your point regarding one of the use cases I gave, where you have to link to a message within the bulk append. The use case I'm working towards doesn't require this (since the messages I'm posting in bulk have an 'id' field and a 'part' field that describe how they relate to each other.)

Do you think I should try and support this use case (where you create the messages one at a time without publishing, then passing them in one go to allow you to link to a message that hasn't been published yet)? I think it'd be quite tricky / dangerous ...

Or would you be happy to merge this pull request in its current form (but with a lot more tests)?

@christianbundy (Contributor)

I think an atomic publish would be great. Referencing them would be tricky, but I think that's a trade-off: you can publish lots of atomic messages, but they probably shouldn't refer to each other by their sigil hash (because you can't know what it will be up-front). I'm always down for more tests, but I don't dislike the direction of this change.

@dominictarr (Contributor)

I'm in favor of bulk append, but I do think it's a database feature, so support should be implemented in the database layer, not the server/application layer. It will be more potentially useful and also easier to test that way.

stale bot commented Feb 20, 2020

Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

@stale stale bot added the stale label Feb 20, 2020
@stale stale bot removed the stale label Feb 20, 2020
@christianbundy (Contributor)

Hey @Happy0 is this something you're still working on? I'm trying to reduce the number of open PRs and would love to merge if we can.
