Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

indexed-v1 feed format #71

Merged
merged 4 commits into from
Sep 16, 2022
Merged

indexed-v1 feed format #71

merged 4 commits into from
Sep 16, 2022

Conversation

staltz
Copy link
Member

@staltz staltz commented Sep 7, 2022

Context

I'm about to work on autosharding in ssb-meta-feeds and I'm looking at all the other modules that use ssb-meta-feeds and what kind of changes they will need, in ssb-replication-scheduler, ssb-ebt, ssb-index-feed-writer.

Problem

We could implement index feeds better in such a way that requires less APIs from ssb-meta-feeds and less "ebt format" methods.

Solution

This PR is a huge draft. It depends on changes in a couple other modules (ssb-bfe-spec, ssb-bfe, ssb-uri2, ssb-keys, ssb-index-feed-writer, ssb-meta-feeds) as well as updating the specs (the "index feeds" spec). The state of this PR now is just "I want feedback whether this is a good direction" and then if it looks good I'll work on all those other methods.

The tests don't pass here in CI because they would require all those other modules to be changed, BUT all ssb-ebt tests are passing for me locally because I just hacked my node_modules with changes.

Overview

💎 "EBT formats" are now an extension of "feed formats". Pass a feed format to ebtFormatFrom, and it'll massage it a bit and spit out an "EBT format". This vastly simplifies the implementation of formats/classic, formats/bendy-butt, and formats/buttwoo.

💎 Removed methods prepareForIsFeed, convertMsg, isReady.

💎 "indexed" EBT format will be based on a new feed format which is a bit of the classic format in disguise. The "nativeMsg" in this feed format is [msgVal, payload]. The main differences between this new design and the old index feeds is just the msgVal.author and msgId, which are both SSB URIs, e.g. ssb:feed/indexed-v1/... and ssb:message/indexed-v1/....

Changes needed in...

@arj03
Copy link
Member

arj03 commented Sep 7, 2022

This looks great. I really like how this simplifies a lot of things. Both in the formats folder but also in the decoupling of this module from both metafeeds and ssb-subset-ql (you should be able to remove that one from deps) concerns. These simplifications (as they most often are) should also make index messages lighter overall. I can help test some of that including classic and buttwoo once we can there.

From the description, it could be a bit easier to understand if you in this:

"indexed" EBT format will be based on a new feed format which is a bit of the classic format in disguise. The "nativeMsg" in this feed format is [msgVal, payload].

Added: msgVal is the index message and payload is the indexed message, or something like that.

@arj03
Copy link
Member

arj03 commented Sep 8, 2022

One more note is that I'm assuming the validate in index feeds format to be a lot simpler (than what we had in ebt here before) because the URI for index feeds to be index / main. So you have the main feed in there. Doesn't really validate exactly that you are matching the ql on the metafeed, but maybe that is okay?

@staltz
Copy link
Member Author

staltz commented Sep 8, 2022

Thanks for taking a look! Yeah, I also like it that there are less APIs and less code.

What are your thoughts on putting this new feed format in ssb-index-feed-writer? If someone wants to replicate index feeds without writing their own index feeds, they can just install the plugin and not start() it.

Added: msgVal is the index message and payload is the indexed message, or something like that.

Yeah, I would appreciate a better naming. Maybe [indexMsg, indexedMsg]. I just went with [msgVal, payload] because that's what we had before in the code.

URI for index feeds to be index / main

Actually the URI is ssb:feed/indexed-v1/____ without the main feed ID. I figured that it was not necessary, because to look up the indexed msg we don't need the main feed ID, we can just use indexMsg.value.content.indexed.key, i.e. we can just use ssb.db.get we don't need getAtSequence([author, sequence], cb).

@arj03
Copy link
Member

arj03 commented Sep 8, 2022

What are your thoughts on putting this new feed format in ssb-index-feed-writer? If someone wants to replicate index feeds without writing their own index feeds, they can just install the plugin and not start() it.

Yeah that should be fine I think. Probably better as it's one less module.

URI for index feeds to be index / main

Actually the URI is ssb:feed/indexed-v1/____ without the main feed ID. I figured that it was not necessary, because to look up the indexed msg we don't need the main feed ID, we can just use indexMsg.value.content.indexed.key, i.e. we can just use ssb.db.get we don't need getAtSequence([author, sequence], cb).

Hmm, but what if someone cheats and indexes something else, like some other feed? Before we used the ql to determine if you indexed the correct thing. This doesn't have to be malicious, could also just be a coding mistake ;-) The only place were you need to do this checking is in validate. I just wanted to avoid moving the same metafeed stuff we had here into validation. So this was why I said maybe it's okay to relax some of these requirements?

because to look up the indexed msg ... we can just use indexMsg.value.content.indexed.key

yes this is a nice improvement

@staltz
Copy link
Member Author

staltz commented Sep 8, 2022

Interesting, this is an aspect of index feeds that I haven't realized before.

Before, msgs in the index feed wouldn't actually guarantee that the QL "type" matched correctly, they only guaranteed that the QL "author" matched correctly, because we did author+sequence. This already meant that you're not sure what you're going to get, but you're sure it's at least from the correct author. I.e. it could be an about index but you end up getting votes.

Now, msgs in the index feed don't guarantee that even the author is correct.

This has two interesting conclusions:

  1. We could drop the indexMsg.value.content.indexed.sequence since it's not used at all. Then the indexMsg might get even smaller, lower overhead.
  2. We already had low guarantees that we would get the correct indexedMsg, and now we would drop that to zero guarantee. This sounds to me like we just need to trust the index feed, which means we trust the root metafeed, which means we need to trust the person using the main feed. Which to me sounds okay.

On the topic of "guaranteeing" that the indexed message is correct, it's kind of hard/impossible to know beforehand that you're going to get the correct thing. You would basically have to have the whole indexed message, check its contents, generate the msgId and compare with indexMsg.value.content.indexed.key.

Idea: what if we validate this when the message is replicated but before it's put in the database with addTransaction?

@staltz
Copy link
Member Author

staltz commented Sep 8, 2022

Idea: since we don't have guarantees, what if we use index feeds to "bookmark" other people's content? Is that what "claims" were?

We could use it in a number of interesting ways for partial replication:

  • Bookmark other people's content and inform your friends that these bookmarks are worth replicating. Essentially "seeding" them, or "pinning" them for posterity
  • If seeding/pinning is employed, we could delete everything that is older than some YYYY-MM-DD except those pinned messages. This would allow storage to shrink over time but still preserving some golden content
  • This could be useful for onboarding, where a new person replicating with you would get your content plus some select highlights from your friends

@arj03
Copy link
Member

arj03 commented Sep 8, 2022

We could drop the indexMsg.value.content.indexed.sequence since it's not used at all. Then the indexMsg might get even smaller, lower overhead.

👍

which means we need to trust the person using the main feed

Yeah, that was my thinking as well

what if we validate this when the message is replicated but before it's put in the database with addTransaction

Yeah, we need to validate a lot of these things anyway, like the key so it's not a it's a big overhead. So then the index message would only be the key.

Idea: since we don't have guarantees, what if we use index feeds to "bookmark" other people's content? Is that what "claims" were?

Exactly. The idea was that we know that certain people probably will never get index feeds, so we help the network by writing those indexes. And then you could use them if lets say they were written by someone you follow directly.

in a number of interesting ways

Yep, currated content :-)

@arj03
Copy link
Member

arj03 commented Sep 8, 2022

The claim idea also had to extra (optional) steps that someone else validated indexes and could post "yep this is legit". But in hinsight, maybe these things can quickly before very complicated so lets instead lean more towards our already existing hops trust metric.

@staltz
Copy link
Member Author

staltz commented Sep 8, 2022

Okay, so I'll move this EBT formats project forward and make the other changes in the other modules, it'll take some time though because I'm trying to work on fixing Force reindex in Manyverse.

Claims

Yeah we're on the same page. As you can guess I'm not going to work on this now, but thought it was convenient to share the idea now that the topic is fresh in our minds.

@staltz
Copy link
Member Author

staltz commented Sep 14, 2022

This PR is now passing all tests :)

@staltz staltz marked this pull request as ready for review September 16, 2022 10:09
@staltz
Copy link
Member Author

staltz commented Sep 16, 2022

@arj03 This PR is fully ready now!

@staltz staltz changed the title preliminary work on indexed-v1 feed format indexed-v1 feed format Sep 16, 2022
@staltz
Copy link
Member Author

staltz commented Sep 16, 2022

@arj03 Can has final review?

Copy link
Member

@arj03 arj03 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great. Nice refactor and cleanup!

@staltz staltz merged commit 02931c0 into master Sep 16, 2022
@staltz staltz deleted the formats2 branch September 16, 2022 12:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants