feedFormats and encryptionFormats as plugins #347

Merged
merged 92 commits into from
Jul 22, 2022

Conversation

staltz
Member

@staltz staltz commented Jun 7, 2022

Context

We are adding buttwoo as a feed format, and it works differently than other formats, so supporting it in the current design would introduce a lot of complexity. At the same time, we need to improve box2 support before we move on with "ssb-tribes2" on ssb-db2.

Solution

This PR adds sbot.db.installFeedFormat(), sbot.db.installEncryptionFormat(), and encodings.

These are now orthogonal concerns, so any encryption format can work with any feed format, which can support different encodings.

🔵 Encoding

An "encoding" is a way of representing a feed message. This PR supports two encodings: 'js' (JavaScript objects) and 'bipf' (BIPF buffers).

🟡 Feed Format

A feed format defines how to create messages that follow a schema and a spec. Every feed format is a plugin-like object with:

  • name string
  • encodings array of supported encoding names
  • functions to create and convert "native messages"
    • newNativeMsg(opts)
    • toNativeMsg(msgVal, encoding)
    • fromNativeMsg(nativeMsg, encoding)
    • fromDecryptedNativeMsg(plaintextBuf, nativeMsg, encoding)
  • encryption-related function
    • toPlaintextBuffer(opts)
  • helper functions getX() and isX()
    • getFeedId(nativeMsg)
    • getMsgId(nativeMsg)
    • isNativeMsg(x)
    • isAuthor(author)
  • validation functions
    • validateSingle()
    • validateBatch()
    • validateOOOBatch()

"Native Msg": this concept is new and important. Every feed format has a "native message format", which is a shape for the message that is specific to the feed format. The feed format fully controls its own "nativeMsg" shape, and it can be anything.

  • For "classic", a nativeMsg is the traditional JavaScript object msgVal.
  • For "bendybutt-v1", a nativeMsg is a BFE-and-bencoded buffer with a shape determined by bendy-butt-spec.
  • For "buttwoo-v1", a nativeMsg is a BFE-and-bipf buffer with a shape determined by buttwoo-spec.
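
To make the plugin shape above more concrete, here is a minimal sketch of a feed-format-like object for a made-up format called 'toy-v1'. Only the property and function names come from the list above. The opts fields, the 'toy:' ID prefix, the JSON-in-a-buffer native msg, and the validateSingle(nativeMsg, prevNativeMsg, cb) signature are assumptions for illustration, and several listed functions (getMsgId, toPlaintextBuffer, the batch validators) are left out for brevity.

```javascript
// Hypothetical 'toy-v1' feed format: illustrative only, not a real SSB format.
// Property and function names follow the list above; everything else
// (opts fields, ID strings, callback style) is an assumption.
const toyFeedFormat = {
  name: 'toy-v1',
  encodings: ['js'],

  // Assumed opts shape: { author, sequence, content }
  newNativeMsg(opts) {
    const msgVal = {
      author: opts.author,
      sequence: opts.sequence,
      content: opts.content,
    }
    return Buffer.from(JSON.stringify(msgVal), 'utf8')
  },

  toNativeMsg(msgVal, encoding = 'js') {
    if (encoding !== 'js') throw new Error('unsupported encoding: ' + encoding)
    return Buffer.from(JSON.stringify(msgVal), 'utf8')
  },

  fromNativeMsg(nativeMsg, encoding = 'js') {
    if (encoding !== 'js') throw new Error('unsupported encoding: ' + encoding)
    return JSON.parse(nativeMsg.toString('utf8'))
  },

  getFeedId(nativeMsg) {
    return this.fromNativeMsg(nativeMsg).author
  },

  isNativeMsg(x) {
    return Buffer.isBuffer(x)
  },

  isAuthor(author) {
    return typeof author === 'string' && author.startsWith('toy:')
  },

  // Assumed signature: (nativeMsg, prevNativeMsg, cb)
  validateSingle(nativeMsg, prevNativeMsg, cb) {
    const msgVal = this.fromNativeMsg(nativeMsg)
    const expected = prevNativeMsg
      ? this.fromNativeMsg(prevNativeMsg).sequence + 1
      : 1
    if (msgVal.sequence !== expected) return cb(new Error('bad sequence'))
    cb()
  },
}

// The native msg is whatever the format says it is (here: a JSON buffer).
const toyNativeMsg = toyFeedFormat.newNativeMsg({
  author: 'toy:alice',
  sequence: 1,
  content: { type: 'post', text: 'hello' },
})
const toyRoundTripped = toyFeedFormat.fromNativeMsg(toyNativeMsg, 'js')
```

The point of the sketch is that the database only ever talks to the format through these functions, so the native msg can stay an opaque value everywhere else.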

⚫ Encryption Format

An encryption format specifies how to encrypt and decrypt JavaScript buffers intended for some specific recipients. Every encryption format is a plugin-like object with:

  • name string, also used for the suffix on the ciphertext
  • setup(config, cb) OPTIONAL
  • encrypt(plaintextBuf, opts) => ciphertextBuf
  • decrypt(ciphertextBuf, opts) => plaintextBuf
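
As a rough illustration of that shape, the sketch below defines a made-up 'toyxor' encryption format. It is not real cryptography and not how box1 or box2 work. The opts.key field and the way the name is appended as a suffix to a base64 ciphertext are assumptions for this example only.

```javascript
// Hypothetical 'toyxor' encryption format: NOT real cryptography.
// It only illustrates the plugin shape listed above. The opts.key field
// is an assumption made for this sketch.
const toyEncryptionFormat = {
  name: 'toyxor',

  // Optional in the list above; here it has nothing to prepare.
  setup(config, cb) {
    cb()
  },

  encrypt(plaintextBuf, opts) {
    return xorWithKey(plaintextBuf, opts.key)
  },

  decrypt(ciphertextBuf, opts) {
    return xorWithKey(ciphertextBuf, opts.key)
  },
}

// XOR every byte with a repeating key (symmetric, so it also decrypts).
function xorWithKey(buf, key) {
  const out = Buffer.alloc(buf.length)
  for (let i = 0; i < buf.length; i++) {
    out[i] = buf[i] ^ key[i % key.length]
  }
  return out
}

const toyKey = Buffer.from('not-a-real-key', 'utf8')
const toyPlaintext = Buffer.from('hello group', 'utf8')
const toyCiphertext = toyEncryptionFormat.encrypt(toyPlaintext, { key: toyKey })
const toyDecrypted = toyEncryptionFormat.decrypt(toyCiphertext, { key: toyKey })

// Assumption: one way the name could serve as the ciphertext suffix.
const toyBoxed = toyCiphertext.toString('base64') + '.' + toyEncryptionFormat.name
```

Because encrypt and decrypt only see buffers, the same object can in principle be paired with any feed format that can produce a plaintext buffer.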

TODO

  • private.js special treatments of box1 and box2
  • Implement publish in terms of create()
  • isPrivate(encryptionFormat) for querying canDecrypt
  • Move post Obv to compat
  • 🔧 feedFormat.getSequence() (needed for ebt)
  • 🕵️ Is buttwoo performant with ssb-ebt?
  • 🕵️ WeakMap optimizations in feed formats, from/to native msg
  • 🕵️ Figure out box2 key management/addition
  • 🕵️ jitdb operator for querying encrypted-box1 and encrypted-box2
  • 🔧 Restore reindex-encrypted.js tests
  • 🔧 ssb-db2 should wait for all encryption formats to load
  • 🔧 Fix benchmarks
  • 🔤 Better name for encryptionFormat.getRecipients() => getEncryptionKeys()
  • 🔤 canDecrypt.index naming?
  • ⚖️ ssb-keyring LGPL-3.0 release
  • 🚀 Publish ssb-feed-format
  • 🚀 Publish ssb-encryption-format
  • 🚀 Publish ssb-classic
  • 🚀 Breaking change ssb-bendy-butt
  • 🚀 Publish ssb-box1
  • 🔧 ssb-uri-spec buttwoo URI needs parent
  • 🔧 ssb-uri2 buttwoo URI needs parent
  • 🚀 Breaking change ssb-buttwoo
  • 📖 Document new APIs such as:
    • create()
    • onMsgAdded()
    • installFeedFormat()
    • installEncryptionFormat()
  • 📖 Document breaking changes such as:
    • encrypted.index/canDecrypt.index
    • setPost hack for partial replication in the browser
    • isPrivate() => isDecrypted() / isEncrypted()
  • 🔧 ssb-keyring changes to support derived-keys-only workflow
  • 🚀 Publish ssb-box2
  • MERGE AND PUBLISH VERSION 5.0.0
  • ssb-ebt Replace post with onMsgAdded
  • ssb-conn Replace post with onMsgAdded
  • ssb-meta-feeds Update ssb-bendy-butt to new feed format API
  • ssb-subset-ql Update with new db2 5.0.0

@staltz
Member Author

staltz commented Jun 7, 2022

@arj03 All tests pass except the ones related to box2 simply because of key management that I haven't worked on yet. But box2 is supported and you can see that from test/create.js.

To use this in ssb-ebt for buttwoo, you'll have to do something like:

db => network

  • sbot.getAtSequenceNativeMsg(arr, (err, nativeMsg) => ...) (no need to convert, just pass the nativeMsg directly to the network peer)

network => db

  • sbot.db.add(nativeMsg, {encoding: 'bipf', feedFormat: 'buttwoo'}, cb)
  • OR
  • sbot.db.add(nativeMsg, {encoding: 'bipf'}, cb) (might be just slightly slower than the above)
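
The two directions above can be sketched end to end with in-memory stand-ins. The stub below is not ssb-db2. Only the two call shapes are taken from this comment, and treating arr as [feedId, sequence] is an assumption.

```javascript
// Stand-in for a real sbot: an in-memory stub so the sketch runs on its own.
// Only the two call shapes are taken from the comment above; the meaning of
// `arr` ([feedId, sequence]) is an assumption.
function makeStubSbot(storedNativeMsgs) {
  const added = []
  return {
    added,
    getAtSequenceNativeMsg(arr, cb) {
      const [feedId, sequence] = arr
      const nativeMsg = storedNativeMsgs[feedId + ':' + sequence]
      if (!nativeMsg) return cb(new Error('not found'))
      cb(null, nativeMsg)
    },
    db: {
      add(nativeMsg, opts, cb) {
        added.push({ nativeMsg, opts })
        cb(null)
      },
    },
  }
}

const sender = makeStubSbot({ 'feedA:1': Buffer.from('native-msg-bytes') })
const receiver = makeStubSbot({})

// db => network: read the native msg and hand it over unconverted.
// network => db: add it, naming the encoding and feed format.
sender.getAtSequenceNativeMsg(['feedA', 1], (err, nativeMsg) => {
  if (err) throw err
  receiver.db.add(
    nativeMsg,
    { encoding: 'bipf', feedFormat: 'buttwoo' },
    (err2) => {
      if (err2) throw err2
    }
  )
})
```

The same buffer travels from one database to the other without any conversion step, which is the property the comment is pointing at.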

If you need some other data, let me know.

@staltz
Member Author

staltz commented Jun 8, 2022

@arj03 Check out an internal refactor I figured out: 4a8eb83

@staltz
Member Author

staltz commented Jun 8, 2022

Also, 547df25 may look interesting, it creates the files

  • decrypted.index (former canDecrypt.index)
  • encrypted-box1.index (new)
  • encrypted-box2.index (former encrypted.index)

If you add a new encryption format named foo, it will create the file encrypted-foo.index.
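
The naming pattern can be written down as a one-line helper. The function name below is hypothetical and is not taken from the commit. It only restates the rule described here.

```javascript
// Hypothetical helper: derives the index file name from an encryption
// format name, following the naming pattern described above.
function encryptedIndexFilename(encryptionFormatName) {
  return 'encrypted-' + encryptionFormatName + '.index'
}

const indexFiles = ['box1', 'box2', 'foo'].map(encryptedIndexFilename)
```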

@mixmix
Member

mixmix commented Jun 9, 2022

thoughts

  • what is getRecipients(opts)?
  • what are recipients? e.g. do you mean [%groupId.cloaked, @mixFeedId]
  • are we using msgId or msgKey? It low-key annoys me that ssb flip-flops between these. Let's fix it!

I think this looks good but I find it quite hard to get a feel for it without visualising these in use? Maybe a conversation with that new pipeline diagram and pointing at the different parts and then naming the functions used would help contextualise.

Also I assume some of the function signatures are not really pinned, because I assume things like isAuthor(author) might be like isAuthor(nativeMsg, author), otherwise I've misunderstood

@mixmix
Member

mixmix commented Jun 9, 2022

Nice work btw

🏇 🦎 🍈

@staltz
Member Author

staltz commented Jun 9, 2022

Hey @mixmix thanks for taking a look at this:

are we using msgId or msgKey? It low-key annoys me that ssb flip-flops between these. Let's fix it!

In this PR I'm using msgId, and I can try to be consistent with that elsewhere as well. msgKey is ambiguous because in so many places we use the word "keys" for "cryptographic key pair". In fact I think I bumped into one box2 related module that had msgKey as a variable name meaning "read key of a box2 boxed msg". So I'm trying to avoid that ambiguity.

The exception, which I would preserve, is a KVT, because it's not so easy to change that object shape.

what is getRecipients(opts)?

opts are the inputs given to sbot.db.create and include content, recps (alternative to content.recps), keys, feedFormat, encoding, etc. getRecipients takes an array of strings like sigil IDs and SSB URIs and returns an array of buffers.

I'm open to renaming this function such that we always reserve the word "recipient" for sigils and SSB URIs. But I don't know what other word to use. Suggestions welcome. I surely want to avoid the generic word "key".

@arj03
Member

arj03 commented Jun 9, 2022

This is quite a beast, but overall a really nice and needed refactor :-) I have gone through a first pass of the code now.

Some overall comments:

compat/publish.js

This is really nice!

db.js → core.js

Much better name, and really like how this is now just the core stuff.

addGroupKey

We should probably discuss how we define these group names/keys sooner rather than later.

@staltz
Member Author

staltz commented Jun 9, 2022

db.js → core.js

@arj03 My idea with the rename is that you could use ssb-db2 without anything extra, i.e. .use(require('ssb-db2/core')) such that you add your own feed formats and plugins, etc. .use(require('ssb-db2/db')) wouldn't be as nice. I considered 'bare.js' as a name for a while, but went with core which is more common for us. If you do .use(require('ssb-db2')) then you're using reasonable defaults and built-in stuff, although opinionated.

@arj03
Member

arj03 commented Jun 9, 2022

Yeah that is really nice. At some point we can move out migrate as well, that will make package.json a lot leaner :)

@staltz
Member Author

staltz commented Jun 9, 2022

well, it's an index of things you can decrypt. They are specifically not decrypted on disc, so I kind of liked that name better than decrypted ;)

I'm a tiny bit conflicted, and could move back to canDecrypt. But I'm going to try a bit more on this one: the index tells the records that we have successfully decrypted before, and the word is symmetric with "encrypted" in encrypted-box1.index.

But I'll give this some more time, let's see.

@staltz
Member Author

staltz commented Jun 9, 2022

Beautiful 🥳 f68c43b

@mixmix
Member

mixmix commented Jun 14, 2022

@staltz re "getRecipients" do you mean it to be "getEncryptionKeys"?

You didn't say what the buffers were

@ssbc ssbc deleted a comment from github-actions bot Jun 18, 2022
@staltz staltz marked this pull request as ready for review July 22, 2022 08:07
@staltz
Member Author

staltz commented Jul 22, 2022

I'm gonna fix the benchmarks before this is ready for review.

@staltz staltz marked this pull request as draft July 22, 2022 08:10
@arj03
Member

arj03 commented Jul 22, 2022

Some review notes:

  • the opts that add, addOOO, addOOOBatch, addBatch, addTransaction can now take need to be documented
  • Can we make add() and friends default to encoding = js like create does?

Do you remember why we pin ssb-friends to 5.1.0?

@arj03
Member

arj03 commented Jul 22, 2022

Looking at create and the encryptionFormat option. If you don't specify encryptionFormat, the function will just try each format to find one that works. Maybe we should default to box instead?

@arj03
Member

arj03 commented Jul 22, 2022

Overall I think this looks really good. I love how core.js is now really streamlined. It's hard to review such a big PR, not really a fault, just mentioning it to say that we might find bugs when we start upgrading other modules to work with this. I'm happy that we have a relatively good test-suite (and benchmarks :)). So in order to move things along and really test, I'll say it's fine to merge this in once the minor things I mentioned are fixed together with the benchmarks as you mentioned. It is after all a major version change.

@staltz
Member Author

staltz commented Jul 22, 2022

Thanks @arj03 for taking a look, I'm working on those.

Two notes:

  • I disabled the benchmark that uses box2 groups simply because we still don't have the removeGroupKeys API in ssb-box2 :(
  • In index.js, I put both ssb-box and ssb-box2. Do you think that ssb-box2 should not be included (and the user has to install the plugin themselves)? Previously, ssb-db2 didn't support box2 out of the box, you had to install ssb-db2-box2

@arj03
Member

arj03 commented Jul 22, 2022

I think it's good to include both box1 and box2 in index.js, just that the default would be box1 if you don't specify it in create. It is still the case that most clients can't decode box2.

@staltz
Member Author

staltz commented Jul 22, 2022

the opts that add, addOOO, addOOOBatch, addBatch, addTransaction can now take need to be documented

Yes, done

Can we make add() and friends default to encoding = js like create does?

Yes, done

Do you remember why we pin ssb-friends to 5.1.0?

Oh, that's just a devDep and it's only used for the benchmarks (and actually those specific benchmarks are skipped at the moment). It's pinned so that the benchmarks are always using the same ssb-friends and we would just be testing changes in ssb-db2 (not, for instance, performance improvements in ssb-friends).

@staltz
Member Author

staltz commented Jul 22, 2022

Looking at create and the encryptionFormat option. If you don't specify encryptionFormat, the function will just try each format to find one that works. Maybe we should default to box instead?

This actually made a lot of sense, and I made the change. But then some tests broke. Specifically, if you use ssb.db.publish() with some private group as a recp, you don't have any way of specifying which encryptionFormat to use, so it'll use box and that's going to fail.

I'm inclined to use ssb.db.create() in those tests, but I wanted to let you know that if we do the change you suggested, you won't be able to use box2 with ssb.db.publish().

@ssbc ssbc deleted a comment from github-actions bot Jul 22, 2022
@arj03
Member

arj03 commented Jul 22, 2022

I think that is for the best actually as publish is legacy anyway

@staltz
Member Author

staltz commented Jul 22, 2022

Yeah, I'm a bit conflicted.

On one hand, removing the "try encryption formats until one of them works" behavior is inconsistent with the default setting for encoding and feed format.

On the other hand, now with create() you have to always specify encryptionFormat: 'box2' even if it's implicit by the recp being a group ID.

And on the third (??) hand, since encryption is important and you don't want to screw it up, it's important to explicitly mention box2 rather than leaving it implicit.

@staltz staltz marked this pull request as ready for review July 22, 2022 15:39
@staltz
Member Author

staltz commented Jul 22, 2022

Okay but I implemented it anyway. Ready for Final FINAL review.
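
As a rough illustration of the behavior discussed in this thread, the sketch below fills in defaults for create()-style opts. It is a hypothetical helper, not ssb-db2's actual code. The 'js' encoding default and the 'box' fallback come from the review comments, while 'classic' as the default feed format is an assumption.

```javascript
// Hypothetical helper, not ssb-db2's actual code: fills in defaults for
// create()-style opts as discussed in this thread.
function withCreateDefaults(opts) {
  return {
    ...opts,
    encoding: opts.encoding || 'js', // default mentioned in the review
    feedFormat: opts.feedFormat || 'classic', // assumed default value
    // No more "try each format until one works": fall back to box
    // unless the caller names another format explicitly.
    encryptionFormat: opts.encryptionFormat || 'box',
  }
}

const implicitOpts = withCreateDefaults({
  content: { type: 'post', text: 'hi' },
  recps: ['<some feed ID>'],
})
const explicitOpts = withCreateDefaults({
  content: { type: 'post', text: 'hi' },
  recps: ['<some group ID>'],
  encryptionFormat: 'box2',
})
```

With this kind of fallback, a group recipient is not enough to select box2. The caller has to name it, which is the trade-off weighed in the comments above.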

@arj03
Member

arj03 commented Jul 22, 2022

I think it will be for the best this way :) Nice to see the last 2 commits actually removed a bunch of lines

@arj03
Member

arj03 commented Jul 22, 2022

Can you create an issue for the benchmark that was commented out?

@github-actions

Benchmark results

Part | Duration
Create 5000 new messages | 464.53ms
Validate 5000 messages | 582.99ms
Native to db format 5000 messages | 142.79ms
Db to native format 5000 messages | 107.26ms
Add 1000 elements | 524.56ms
Add 1000 box1 msgs | 1213.88ms
Unbox 1000 box1 msgs first run | 211.74ms
Unbox 1000 box1 msgs second run | 145.35ms
Add 1000 box1 msgs | 1205.58ms
Query 1000 msgs first run | 56.40ms
Query 1000 msgs second run | 17.34ms
Add 1000 box2 msgs | 1634.10ms
Unbox 1000 box2 msgs first run | 250.68ms
Unbox 1000 box2 msgs second run | 177.72ms
Migrate (+db1) | 14861.15ms
Migrate (alone) | 5139.81ms
Migrate (+db1 +db2) | 11518.36ms
Migrate (+db2) | 7670.72ms
Migrate continuation (+db2) | 1300.02ms
Memory usage without indexes | 775.54 MB = 37.50 MB + etc
Initial indexing | 877.53ms
Initial indexing maxcpu=86 | 4695.34ms
Initial indexing compat | 997.14ms
Two indexes updating concurrently | 1389.13ms
Key one initial | 61.60ms
Key two | 1.94ms
Key one again | 1.94ms
Reboot and key one again | 64.26ms
Latest root posts | 1018.32ms
Latest posts | 17.41ms
Votes one initial | 736.68ms
Votes again | 1.52ms
HasRoot | 482.18ms
HasRoot again | 0.55ms
Author one posts | 589.18ms
Author two posts | 23.76ms
Dedicated author one posts | 653.76ms
Dedicated author one posts again | 0.39ms
DeleteFeed | 3647.04ms
Maximum memory usage | 1001.49 MB = 64.49 MB + etc
Indexes folder size | 10.01mb

Member

@arj03 arj03 left a comment

Great work on this. Super excited about the new ability to more easily add and structure feed formats 👍 ⭐ 🥇

@staltz staltz merged commit d17f90b into master Jul 22, 2022
@staltz staltz deleted the formats-split branch July 22, 2022 19:47
@staltz
Member Author

staltz commented Jul 22, 2022

MERGEDDDDDDD

@github-actions

Benchmark results

Part | Duration
Create 5000 new messages | 455.87ms
Validate 5000 messages | 549.00ms
Native to db format 5000 messages | 148.89ms
Db to native format 5000 messages | 117.29ms
Add 1000 elements | 514.47ms
Add 1000 box1 msgs | 1193.43ms
Unbox 1000 box1 msgs first run | 215.57ms
Unbox 1000 box1 msgs second run | 149.08ms
Add 1000 box1 msgs | 1143.64ms
Query 1000 msgs first run | 47.76ms
Query 1000 msgs second run | 31.01ms
Add 1000 box2 msgs | 1572.71ms
Unbox 1000 box2 msgs first run | 270.02ms
Unbox 1000 box2 msgs second run | 189.57ms
Migrate (+db1) | 14391.50ms
Migrate (alone) | 4862.54ms
Migrate (+db1 +db2) | 10710.75ms
Migrate (+db2) | 7694.13ms
Migrate continuation (+db2) | 1220.48ms
Memory usage without indexes | 750.46 MB = 36.20 MB + etc
Initial indexing | 836.29ms
Initial indexing maxcpu=86 | 4573.27ms
Initial indexing compat | 957.05ms
Two indexes updating concurrently | 1276.12ms
Key one initial | 66.04ms
Key two | 1.68ms
Key one again | 1.35ms
Reboot and key one again | 67.01ms
Latest root posts | 1016.88ms
Latest posts | 9.68ms
Votes one initial | 705.68ms
Votes again | 0.67ms
HasRoot | 544.10ms
HasRoot again | 0.57ms
Author one posts | 641.06ms
Author two posts | 36.11ms
Dedicated author one posts | 661.68ms
Dedicated author one posts again | 1.16ms
DeleteFeed | 3698.71ms
Maximum memory usage | 1019.21 MB = 55.20 MB + etc
Indexes folder size | 10.01mb
