Add NIP-77 for expressing trust #1208

Open: lez wants to merge 1 commit into master

Conversation

lez

lez commented Apr 30, 2024

This seems to be the lowest common denominator of what people think about web-of-trust or trusting people.

From here, we can branch out, use this common format in apps, clients, relays, algorithms, DVMs, et cetera.

Two specific branches already exist:

@mikedilger
Contributor

IMHO trust is contextual and not transitive. In the context of "is this person worthy of listening to, or are they a spammer/asshole that nobody likes" then there is some transitivity.

But my prediction is that people won't bother to create this data, and so this will wither and die even if implemented. Yet I'm not against people giving it a try.

@lez
Author

lez commented May 7, 2024

In hierarchies, trust is transitive (e.g. the owner trusts the CEO, the CEO trusts the CTO, the CTO trusts the dev manager, and so on). This is one way to create organizations in a decentralized and scalable way. Really, there are many ways one can regard trust, and the aim of this NIP is to create a common representation, a common language, for how we formalize them.

@wds4

wds4 commented May 8, 2024

In the real world, sometimes trust is transitive and sometimes it's not, which is why NIP-77 provides a mechanism (actually two mechanisms) to indicate whether it is transitive or not. One is with the optional "transitive" tag. The other mechanism would be to specify transitivity as part of the context.

One of the things I really like about NIP-77 is that the way context is specified is left open-ended. Context could be a human-readable string like "to curate content in Wikifreedia", or it could be an event id or naddr pointing to context as defined using whatever protocol or ontology you want to use. This will allow different devs to experiment with different ways to represent context. I have my own way of representing context, which I'm working on now, but I invite other devs to play with different ideas.

@wds4

wds4 commented May 8, 2024

Mike does have a point: people aren't going to issue trust attestations unless they are sufficiently motivated to do so. The question from a dev perspective is: what's the lowest-hanging fruit, i.e. the easiest thing to build using NIP-77 that people will want to use? Wikifreedia is one possibility I'm thinking about. Katherine Maher's involvement with Wikipedia illustrates the need. To be honest, for MOST articles, censorship is not going to be at the forefront of most users' minds. But that's OK, because for SOME articles, it will be, and some is enough.

@arthurfranca
Contributor

In #1220 I explore a simple way (a T tag added to a parameterized replaceable event whose d tag is set to the reviewed pubkey) to discover good wiki article versions (written by well-rated pubkeys).

Transitivity is possible in the case of wikis because if I, or someone I follow, say pubkey A wrote a good article on "NIPs", pubkey A is also expected to have some good knowledge of the "Nostr" topic. So pubkey A should be able to rate people on both topics.

So if pubkey A positively rates pubkey B on "Nostr"...

...then 1) pubkey B may have written a good article on "Nostr",
or 2) pubkey B may have positively rated pubkey C on "Nostr", and so on.

It all starts with me or someone I follow clicking a thumbs up or down on a wiki article.

@vcavallo

vcavallo commented Jun 8, 2024

> IMHO trust is contextual and not transitive. In the context of "is this person worthy of listening to, or are they a spammer/asshole that nobody likes" then there is some transitivity.
>
> But my prediction is that people won't bother to create this data, and so this will wither and die even if implemented. Yet I'm not against people giving it a try.

It's contextual AND transitive. And both of those (context and transitivity) should be subjective.

@vcavallo

vcavallo commented Jun 8, 2024

@lez wondering what you make of this summary (a similar system built on Urbit):

https://gist.github.com/vcavallo/e008ed60968e9b5c08a9650c712f63bd

@vcavallo

vcavallo commented Jun 8, 2024

Re: users' laziness in issuing trust. This should likely be an (opt-in) feature that is transparent to the user. Example: on a long-form writing platform, I "like" a piece that comes with some tags from the author. This action can be construed as signaling X amount of trust for the author along the contexts represented by the tags.
Developers could provide a "hard mode" that would allow a more proficient user to manually assign trust and/or use different contexts than those in the author's tags, or even to assign "negative" trust. (Imagine a post on how great McDonald's is, tagged with #recreation. An advanced user could give this author negative trust for #cuisine.)

But I've digressed a bit. Application: developers should provide sane defaults that intuit from users' natural behavior, disclaim these features, and provide overrides for those who wish to opt out or take more control.

@staab
Member

staab commented Jun 8, 2024

Getting normal users to create these is a different issue from defining them for people who do want to create them. I agree that there is plenty of signal in like/follow/reply etc that can be used to approximate "trust". This more formal proposal shouldn't replace that, just complement it. And I like the design, looking forward to seeing implementations.

@wds4

wds4 commented Jun 8, 2024

This more formal proposal shouldn't replace that, just complement it.

I agree with this 100%. I used to favor explicit, contextual trust attestations over proxy indicators such as follows and likes, but I've since realized that we need to use both at the same time. What we need is a single composite contextual trust score that synthesizes highly abundant but low quality data (e.g. follows) with scarce but high quality data, e.g. attestations that follow the NIP-77 format. To do this right, we need a way to give less "weight" to a low quality data point (a follow) than a high quality data point. And the confidence score is an essential part of NIP-77, because it is a factor telling us how much weight to give to an individual data point. (With a second, independent factor being the most relevant trust score of the issuer.)

I propose weighted averages to be the cornerstone of how we synthesize data from multiple sources into a composite contextual trust score. What we do is we "interpret" low quality data, like follows, as if it were NIP-77 format, and we assign it an extremely low confidence because as we all know, follow != trust.

On the topic of lazy users: no one is going to issue any NIP-77 attestations at all unless there's a reason, which means we need to go ahead and build the weighted average method of calculating trust scores. Follows and mutes will yield a trust score that's kinda sorta a little bit useful, meaning that it's better than nothing, but if we throw in a small handful of NIP-77 attestations, maybe even just one, we will be able to catapult the composite score into the extremely useful range. Which means that users will be motivated to issue NIP-77 attestations, even if no one else is doing them yet. I'm in the process of building this at brainstorm.ninja (what I call the "Influence Score" is the composite trust score I'm talking about here) but if anyone else beats me to it, I won't mind!
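
A minimal sketch of this weighted-average idea in TypeScript (the record shape and the issuer-score weighting are my assumptions for illustration, not text from the NIP):

    // Composite contextual trust score as a weighted average of attestations.
    // weight = confidence (how sure the issuer is) x issuerScore (how much my
    // WoT trusts the issuer in this context).
    interface Attestation {
      score: number;       // e.g. 0-100, as discussed in this thread
      confidence: number;  // 0-100
      issuerScore: number; // the most relevant trust score of the issuer
    }

    function compositeScore(attestations: Attestation[]): number | undefined {
      let weightedSum = 0;
      let totalWeight = 0;
      for (const a of attestations) {
        const weight = (a.confidence / 100) * a.issuerScore;
        weightedSum += weight * a.score;
        totalWeight += weight;
      }
      return totalWeight > 0 ? weightedSum / totalWeight : undefined; // no data, no score
    }

    // A follow "interpreted" as if it were NIP-77 data gets very low confidence:
    const interpretedFollow: Attestation = { score: 100, confidence: 5, issuerScore: 1 };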

@vitorpamplona
Collaborator

I think we need to formalize specific situations so that processing them can be semantically meaningful.

To me, trust is always binary IN a context: "I trust my brother with my life, but I don't trust him to manage my money".

We can ask that question in the app and write two events with:

    ... 
    ["c", "life"],
    ["transitive", "false"],
    ["score", "100"],
    ["confidence", "100"],
    ... 
    ["c", "money"],
    ["transitive", "false"],
    ["score", "0"],
    ["confidence", "100"],

How useful are those events for anyone processing them? Probably very useful. But consumers need to know what question was asked. Otherwise "money" and "life" alone could mean so many different things that the events become basically useless.
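
For concreteness, a sketch of how the second ("money") example might look as a complete event, assuming the kind-30077 addressable format mentioned later in this thread; the d-tag scheme and the p tag are assumptions for illustration:

    // Hedged sketch only; the NIP does not prescribe this exact layout.
    const trustEvent = {
      kind: 30077,
      tags: [
        ["d", "<ratee pubkey>:money"], // assumed: d derived from ratee + context
        ["p", "<ratee pubkey>"],       // assumed: ratee referenced with a p tag
        ["c", "money"],
        ["transitive", "false"],
        ["score", "0"],
        ["confidence", "100"],
      ],
      content: "",
    };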

@wds4

wds4 commented Jun 8, 2024

The more well-defined the context, the more useful (probably). But I like that NIP-77 specifies context as a string, like "money" or "life", with no additional formal structure (for now), which will allow devs to experiment with different ways to represent context.

My own impulse would be to represent context using two fields: an action and a category. The action might be "to give me advice on" and the category might be "money." Or action: "to rate and review", category "movies." Or action: "to serve as product manager", category: "social media websites." Or action: "all actions," category: "all categories." But is everyone going to agree with this semantic structure? I suspect other people will have other ideas.

In NIP-54, wiki categories are simple strings, selected by the author. Perhaps a similarly simple approach is the best place to start. If we want to add more fields we can do that later, but I suspect it's going to be very difficult to get everyone to agree on any particular semantic structure without playing around with it first.

My long term view is that my WoT will manage the semantic structure for me, but that's waaaay beyond this NIP.

@vitorpamplona
Collaborator

I like the action + context.

But I think we need to be very prescriptive in the beginning to get traction. Let's make sure it can be flexible, but also get one or two useful use cases out of the gate to bootstrap this idea.

@staab
Member

staab commented Jun 9, 2024

Informal use, refined over time through experimentation, is the way to go IMO. Describing trust is as complex as language itself, whose goal is to communicate meaning. Meaning doesn't fit into rigid hierarchies, attempts at the semantic web notwithstanding. Unspecified c tag meaning also leaves room for using any existing ontology, though, if that's preferred. A lot of this is relevant to what I was trying to accomplish with NIP-32.

@vcavallo

vcavallo commented Jun 10, 2024

For https://github.com/lez/nips/blob/nip76/76.md - do you have a proposal for how to achieve the depth/transitive filtering? I recommend modeling the graph as a maximum flow problem (or "capacity constrained flow").

This maybe raises a few tangential questions, though: is negative trust allowed? Is 0 "I don't know" or "explicitly anti-trust"? Signaling positive trust is one thing, but negative trust / dis-recommending is another.


Edit: Duh, sorry. You're proposing using Grapevine. I'm going to leave my above comment because I still hold out hope for a max-flow algorithm being useful somewhere in WoT.
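
For reference, a compact sketch of the max-flow idea (Edmonds-Karp over a trust graph whose edge capacities are trust scores; everything here is illustrative, not anything the NIP prescribes):

    // Max flow from my pubkey to a target caps how much "trust" can pass
    // through any one intermediary, which is the appeal of this model.
    type Capacity = Map<string, Map<string, number>>;

    function maxFlow(cap: Capacity, source: string, sink: string): number {
      // Build a mutable residual graph, with reverse edges initialized to 0.
      const residual: Capacity = new Map();
      const edge = (u: string, v: string, c: number) => {
        if (!residual.has(u)) residual.set(u, new Map());
        residual.get(u)!.set(v, (residual.get(u)!.get(v) ?? 0) + c);
      };
      for (const [u, edges] of cap) {
        for (const [v, c] of edges) {
          edge(u, v, c);
          edge(v, u, 0);
        }
      }
      let flow = 0;
      while (true) {
        // BFS for an augmenting path from source to sink.
        const parent = new Map<string, string>([[source, source]]);
        const queue = [source];
        while (queue.length > 0 && !parent.has(sink)) {
          const u = queue.shift()!;
          for (const [v, c] of residual.get(u) ?? []) {
            if (c > 0 && !parent.has(v)) {
              parent.set(v, u);
              queue.push(v);
            }
          }
        }
        if (!parent.has(sink)) return flow; // no augmenting path left
        // Find the bottleneck along the path, then push flow through it.
        let bottleneck = Infinity;
        for (let v = sink; v !== source; v = parent.get(v)!) {
          bottleneck = Math.min(bottleneck, residual.get(parent.get(v)!)!.get(v)!);
        }
        for (let v = sink; v !== source; v = parent.get(v)!) {
          const u = parent.get(v)!;
          residual.get(u)!.set(v, residual.get(u)!.get(v)! - bottleneck);
          residual.get(v)!.set(u, residual.get(v)!.get(u)! + bottleneck);
        }
        flow += bottleneck;
      }
    }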

@wds4

wds4 commented Jun 10, 2024

My view is that in the long term, we are going to have a myriad of algos for a variety of applications. Just because I propose doing things one way doesn't mean I see no value, or think there is no place, for doing them a different way.

In what I'm building: I don't (currently) use negative trust, because my goal is that the Grapevine tells me how much attention to pay to a profile (or a piece of content) in some given context. This is the function of the Influence Score. A score of zero means I pay zero attention to that user in that context, and that's as low as it gets. You can see an implementation of the Influence Score at brainstorm.ninja where I calculate it based on follows and mutes and use it to stratify wiki content. My next step will be to calculate contextual influence scores based on follows + mutes + contextual attestations. I realized recently that merging highly abundant but low quality trust data (follows and mutes) with scarce but high quality trust data (NIP-77 formatted) is the way to go, so hopefully I will be able to demonstrate what I'm talking about.

Of course, there's a difference between completely ignoring a user because I have zero information on that profile, versus ignoring because I have information that the user is a bot or scammer or whatever. From that perspective, yes there is a role for a "negative" trust score or trust rating. So the question is not whether to allow negative scores, but what tools to build and most importantly, in what order to build them. My own roadmap, which I revise constantly, involves introducing the Concept Graph in stages; at some point, your Grapevine will have the ability to use the Concept Graph to invent new trust attestations (including those with negative scores) and new algorithms to process trust attestations into composite scores.

ADDENDUM: the Influence Score at brainstorm.ninja is my way of implementing transitive trust. The "depth" is managed by the attenuation factor, a number between 0 and 1 that decreases the confidence of each attestation, and therefore the amount of attention we pay to users, with each additional hop away; a lower AF effectively means less "depth". This is a setting that will eventually be adjustable by the user, although for now I'm trying to keep things as simple as possible, so the AF is kept under the hood and fixed at the default value of 0.8.
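
In sketch form (illustrative, not the actual brainstorm.ninja code):

    // Each hop away from the observer multiplies confidence by the
    // attenuation factor AF (between 0 and 1); the stated default is 0.8.
    function attenuatedConfidence(confidence: number, hops: number, af = 0.8): number {
      return confidence * Math.pow(af, hops);
    }
    // e.g. a confidence-100 attestation three hops out: 100 * 0.8^3 = 51.2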

@wds4

wds4 commented Jun 10, 2024

On the question of whether 0 means "explicitly anti-trust" versus "I don't know": in my view, your question underscores the fact that trust requires two numerical fields, not just one: one for the score and one for the confidence. Otherwise it is ambiguous whether 0 means anti-trust or no information. As users, we are already accustomed to two fields: when you go to Amazon, for example, you see an average score (the rating) and the number of raters (an imperfect proxy for confidence, with more raters = more confidence; flawed because more raters might simply mean more money paid to ratings farms!).

@vcavallo

Thanks, makes sense @wds4!

And I agree that eventually including negative trust will be important. A few months back there was a debate about "mute lists", and most of the proposed solutions were too centralized and would unnecessarily narrow the notes a user might see, without their control. NIP-77-backed "mute lists" are the way to go, in my opinion:
Various users would have "lists" of "these people suck (+ context)" by virtue of their NIP-77 negatives. I could choose to trust these users on that context, effectively delegating my mute list to them. (That context may even just be something like "is a spammer" or "posts too much porn".)

In this way, "community curators" could naturally arise without needlessly tying users' hands or blinding their eyes without their consent. If I don't like how a given "curator's" mute list is working, I simply stop trusting them on that context, and the bots and spammers (and a few high-quality accounts the curator muted but I would prefer to see!) reappear. Sovereignty-preserving without loss of functionality. This is the way, naturally.

With some imagination and entrepreneurship, "curators" could charge for their work...

@wds4

wds4 commented Jun 10, 2024

I definitely envision that one day curators will charge for their work. Perhaps we will all become curators. Imagine that instead of leaving a review for a coffee shop for free at Yelp, you create an attestation and charge a few sats, which I am happy to pay because it's only a few sats and I know it's worth it because my WoT tells me you're trustworthy for that context (no need to be a coffee aficionado; I mostly just need to know you're a real person and not a bot!). Or perhaps you gain a reputation for having good judgement and an ability to discern users who are real people but have a propensity to veer into troll-like behavior, trigger flame wars, etc. No need to be outright blocked, but maybe such a user deserves to be weighted a little bit less. Or the opposite: profiles you may want to weight a little bit more. It would take work + good judgement to make these calls, and if you're good, perhaps you could turn it into a side gig!

@wds4

wds4 commented Jun 10, 2024

On the topic of mute lists: when I calculate influence scores, I do what I call "interpretation," which means I take some piece of data, like a follow or a mute, and I treat it AS IF it were a NIP-77-formatted attestation. A follow is "interpreted" as a score of 100, and a mute is interpreted as a score of 0. And I give them a confidence of only 5% (a parameter that eventually the user will be able to adjust, if desired) because proxy trust data is low quality and ought to be given less weight than an explicit trust attestation.

If you think that some profile is worthy of higher-than-average attention in some context (for example, a wiki on Economics written by Lyn Alden is worth 10x more attention than some randomly selected non-bot author) and you are highly confident of that fact, then you can issue an attestation in the context of Economics, rating: 1000 (10-fold higher than 100), confidence: 80%. Or something like that. So the average user will have an Influence Score in the context of Economics of 1 (or a little less) based on follows, but Lyn will be one of only a handful of users with a score way higher than 1, because she's been flagged by one or a few users as having special talent, worthy of your attention.
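
In sketch form, the "interpretation" step described above (the record shape is assumed for illustration):

    // Treat follows and mutes AS IF they were NIP-77 attestations, at low confidence.
    interface InterpretedAttestation {
      context: string;
      score: number;      // follow -> 100, mute -> 0
      confidence: number; // proxy data gets only 5 (eventually user-adjustable)
    }

    function interpret(kind: "follow" | "mute", context = "*"): InterpretedAttestation {
      return {
        context,
        score: kind === "follow" ? 100 : 0,
        confidence: 5,
      };
    }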

@vcavallo

vcavallo commented Jun 10, 2024

Bit of a tangent, but incidentally: I believe one of the key ways to convince the uninitiated of the virtues of this new world is to show how it doesn't deprive anyone of the (vampiric) features they are accustomed to and (think they) want. Example: the NY Times sets up a Nostr account with NIP-77 involved. They put their editors to work attesting trust and building up an enormous trust graph. They charge $[NY Times subscription amount] for "subscribers" to have access to this trust graph. The users/subscribers install some client (maybe one created by the NYT) that by default assigns 100% trust and confidence in the NYT graph.

Bam, they have the same managed worldview as yesterday, but operating on a new substrate. (In fact, the nefarious activities of a propagandistic publisher are enhanced in this world. Not only can they determine which words you read - and therefore your opinions - but they can also determine which Wiki articles you see, which social network users you see, which Yelp reviews are "right", etc.) If freedom tech is working properly, it both supercharges freedom but also supercharges manipulation - BUT the manipulation is always opt-in and exitable.

Imagine being one of these "default client" users and then one day turning off the big filter and finding all sorts of new restaurants you'd never heard of, new comedians who actually make you laugh, high-quality news articles that make you question everything you had read previously. That's the world I want to live in... And for those who panic and turn the big filter back on, so be it. At least they know what they're doing and I have the tools to avoid them.

@wds4

wds4 commented Jun 10, 2024

Regarding the NYT: you are 100% on target. This is exactly how we are going to attract the masses. Because WoT is the only way to prevent the manipulation that you describe. Yes, the NYT will pay for 100k bots to upvote it, but if those bots are not verified by my social graph, their influence score will be zero, which means their impact will be nullified. They may as well not exist. And sure, the NYT can instruct their editors and their community to upvote the NYT. But the Influence Score requires a weight for each attestation, and the weight is determined by the contextual influence score OF MY CHOOSING, and if I choose a score that proscriptively eliminates people who work for the NYT (sets that particular score to zero), then their little manipulative game is once again nullified.

@wds4

wds4 commented Jun 10, 2024

btw, one of the reasons I keep bringing up the Influence Score in this thread is that it follows a similar format to a NIP-77 trust attestation, in the sense that it is composed of three required fields: a context, which is a string; a confidence, which is an integer between 0 and 100; and a score, which is also an integer. The only difference is that I have a slightly different interpretation of the score: I interpret a score of 0 as worthy of zero attention, and a score of 100 as worthy of the same amount of attention as you would give any other randomly selected "real person" profile. Which opens the door for scores ABOVE 100, the interpretation being that you think this profile is worthy of MORE attention than some randomly selected real-person profile.
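
As a type, the three required fields just described (names assumed for illustration):

    interface InfluenceScore {
      context: string;    // e.g. "Economics"
      confidence: number; // integer, 0 to 100
      score: number;      // 0 = ignore; 100 = a typical real person; >100 allowed
    }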

@franzaps
Contributor

Just read the NIP again. A few questions:

> The special value * represents general trust in the person.

How about, just like score, leaving it undefined (meaning not sure) means general trust?

> A Trust event is transitive by default (e.g. Alice trusts Bob, Bob trusts Charlie, therefore Alice trusts Charlie). Non-transitive Trust events MUST contain the ["transitive", "false"] tag.

Why is this? That's not at all how trust works in real life (friends of friends of friends means nothing to me), so it should be non-transitive by default. That said, probably nothing should be written about transitive trust; it should be left to reads/computation of NIP-77 events.

> A Trust event can be revoked by its author. Revoked Trust events MUST contain the ["revoked", "true"] tag.

This is a PRE, so why not simply remove it instead of marking it as revoked?

@wds4

wds4 commented Jun 12, 2024

> This is a PRE, so why not simply remove it instead of marking it as revoked?

Are you referring to NIP-09 for event deletion? I know PREs used to be described by NIP-33 but got moved to NIP-01. Is there a separate protocol for removing a PRE, as opposed to replacing it?

@franzaps
Contributor

> > This is a PRE, so why not simply remove it instead of marking it as revoked?
>
> Are you referring to NIP-09 for event deletion? I know PREs used to be described by NIP-33 but got moved to NIP-01. Is there a separate protocol for removing a PRE, as opposed to replacing it?

What I mean is updating the 30077 event with no c, score or confidence tags, effectively replacing whatever was attested earlier.
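
In sketch form, that blank replacement might look like this (illustrative only; the d value must simply match the original attestation's):

    // Republishing the same addressable 30077 event with the c/score/confidence
    // tags removed replaces, and thereby effectively revokes, the old attestation.
    const blankedTrustEvent = {
      kind: 30077,
      tags: [
        ["d", "<same d tag as the original attestation>"],
      ],
      content: "",
    };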

@wds4

wds4 commented Jun 12, 2024

> What I mean is updating the 30077 event with no c, score or confidence tags, effectively replacing whatever was attested earlier.

That makes sense. In practice I imagine I will write code to interpret a 30077 event with no c/score/confidence tags as a deleted event that supersedes and revokes any older events with the same pubkey + d combination. That would make the revoked tag effectively optional, which raises the question of whether it makes sense to employ the revoked tag in the first place, given that it would be superfluous. If anyone uses it, then we'd all have to write extra code to check whether the revoked tag is present and true, so perhaps it makes more sense not to use it.

@wds4

wds4 commented Jun 12, 2024

> Just read the NIP again. A few questions:
>
> > The special value * represents general trust in the person.
>
> How about, just like score, leaving it undefined (meaning not sure) means general trust?

I imagine I will probably write code to interpret '*' and a blank or absent context the same way: they each refer to generic trust, the "superset" of all contexts. Perhaps both of these could be suggested in the NIP but not required, and we see how various devs decide to use them.

To me, the heart and soul of this NIP is simply that we should have 3 fields: context, score, and confidence. I'm OK with being open to the idea that different devs may find reasons to use / interpret / format these 3 fields in ways we might not necessarily foresee yet.

@franzaps
Contributor

> The user would be signing, the app developer would be providing a default to sign.

Yeah, I think this is going to get messy: different apps suggesting very different numbers. Also not a big fan of making users sign stuff they did not come up with. Hope I'm wrong. Not saying we should remove any fields.

@franzaps
Contributor

franzaps commented Jun 13, 2024

Tangential topic, so maybe to be discussed elsewhere, but: how does it change things if user-subjective ratios/floats are used rather than "globally-inflatable" integers?

If I assign 1 to A, 1 to B and 5 to C; or I assign 10 to A, 10 to B and 50 to C; or I assign 0.1 to A, 0.1 to B and 0.5 to C.

These feel like they should be equivalent (barring any errors I made in arithmetic; you get the point). I don't care if someone else uses 0.0-1.0 or 1-1000 or 1-1,000,000. It's the distribution of weight that matters (and the distribution of weights used by those in my transitive hops out, too).

Another reason why it will get messy: what if you are reading events signed by different apps using different scales, or created at different times, and trying to make transitive calculations?

@vcavallo

> > The user would be signing, the app developer would be providing a default to sign.
>
> Yeah, I think this is going to get messy. Different apps suggesting very different numbers. Also not a big fan of making users sign stuff they did not come up with. Hope I'm wrong. Not saying we should remove any fields.

I'm not the author of this NIP, but my two cents: app developers should make clear what is being signed, along with explanations of how these values are used and why the defaults are suggested. Ideally in a "progressive authority" manner, where additional manual controls can be opted into as the user begins to understand the system they're being helpfully onboarded onto.

"here's what we're doing by default and here's why. here's how you can choose for yourself [along with some suggestions]. here's how you can take full control (or "i'm lazy, you pick")"

@vcavallo

> Tangential topic, so maybe to be discussed elsewhere, but: how does it change things if user-subjective ratios/floats are used rather than "globally-inflatable" integers?
>
> If I assign 1 to A, 1 to B and 5 to C; or I assign 10 to A, 10 to B and 50 to C; or I assign 0.1 to A, 0.1 to B and 0.5 to C.
>
> These feel like they should be equivalent (barring any errors I made in arithmetic; you get the point). I don't care if someone else uses 0.0-1.0 or 1-1000 or 1-1,000,000. It's the distribution of weight that matters (and the distribution of weights used by those in my transitive hops out, too).
>
> Another reason why it will get messy: what if you are reading events signed by different apps using different scales, or created at different times, and trying to make transitive calculations?

This is why a given user's weight distribution is the most consistent signal.
It doesn't matter what numbers they used; all the algorithm needs to care about is the distribution of weights across their whole set.

I'm coming to this concept from an ecosystem where each user is running their own server, so a lot of these problems are made easier. It's more difficult when this data is all "public" or stored outside the user's context.
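
A sketch of that scale-free reading (illustrative; nothing here is prescribed by the NIP):

    // Only the distribution matters, so normalize each author's scores
    // before comparing across apps that use different ranges.
    function normalize(scores: Map<string, number>): Map<string, number> {
      let total = 0;
      for (const s of scores.values()) total += s;
      const out = new Map<string, number>();
      for (const [who, s] of scores) out.set(who, total > 0 ? s / total : 0);
      return out;
    }
    // {A:1, B:1, C:5}, {A:10, B:10, C:50} and {A:0.1, B:0.1, C:0.5}
    // all normalize to {A: 1/7, B: 1/7, C: 5/7}.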

@wds4

wds4 commented Jun 13, 2024

I suspect some users/devs will want to use numbers for the score for the sake of being as precise as possible; however, other users/devs may shy away from numbers and prefer something like emojis (thumbs, stars, rockets, etc) or simple strings (follow, mute, block, etc). I can imagine lots of users being comfortable with thumbs up, thumbs down, 1-5 stars, etc. In the case of something like an emoji for the score, I think it is the job of the consumer to "interpret" the number. In brainstorm.ninja, I interpret a mute as a score of 0 and a follow as a score of 100. My plan is to interpret a NIP-77 event with either "follow" or a thumbs-up emoji in the score field and "Economics" as the context as a score of 100 in the context of Economics.

Having said that, it is worth clarifying that I would be VERY CAUTIOUS about an app where the user clicks a thumbs-up emoji and the developer translates this into a score of 80 or 100 or whatever in the event that is signed by the user. If the user wants to be ambiguous, then respect that. If I want to interpret your ambiguous emoji as a score of 80 or whatever, that's my prerogative.

@franzaps
Contributor

@vcavallo agree about apps progressively offering finer-grained controls, but in general, and related to @wds4's latest comment, I wonder if having relative controls in the UI (increase trust, decrease trust) rather than absolute ones (input a number) is a better way to go. Doesn't that map human behavior better? We tend to assign more trust, or less, or break it completely, based on people's actions over time.

@wds4

wds4 commented Jun 13, 2024

Perhaps the NIP should make a SUGGESTION that a score of 100 should be defined as the "reference" score.

@franzaps
Contributor

> Perhaps the NIP should make a SUGGESTION that a score of 100 should be defined as the "reference" score.

What do you mean by "reference" here?

Is 100 the maximum rating? What happens when I trust Alice 100 in Economics, but then Lyn Alden comes along, whom I trust way more than Alice? How does that reflect in the scores? Do I need to lower the score for Alice, and potentially many others?

@vcavallo

> > Perhaps the NIP should make a SUGGESTION that a score of 100 should be defined as the "reference" score.
>
> What do you mean by "reference" here?
>
> Is 100 the maximum rating? What happens when I trust Alice 100 in Economics, but then Lyn Alden comes along, whom I trust way more than Alice? How does that reflect in the scores? Do I need to lower the score for Alice, and potentially many others?

...This is why relative weights are better. "Under the hood, I gave 100 to Alice, not knowing Lyn was out there. Now, I want to trust Lyn 10x more than Alice (whether or not I know I actually used the integer 100 earlier)." I should just be able to adjust my relative weights without having to constantly juggle numbers in my head or in UIs.

@franzaps
Contributor

> ...This is why relative weights are better. "Under the hood, I gave 100 to Alice, not knowing Lyn was out there. Now, I want to trust Lyn 10x more than Alice (whether or not I know I actually used the integer 100 earlier)." I should just be able to adjust my relative weights without having to constantly juggle numbers in my head or in UIs.

Yes, good point. The downside is that for relative weights to work we'd need to grab all of the user's events to normalize. And what happens when an app decides to assign trust in the 100000-1000000 range? Interesting problem.

@wds4

wds4 commented Jun 13, 2024

> > Perhaps the NIP should make a SUGGESTION that a score of 100 should be defined as the "reference" score.
>
> What do you mean by "reference" here?
>
> Is 100 the maximum rating? What happens when I trust Alice 100 in Economics, but then Lyn Alden comes along, whom I trust way more than Alice? How does that reflect in the scores? Do I need to lower the score for Alice, and potentially many others?

If I want to put an integer in the score tag, and I want to say "I trust Alice twice as much as the average user", then the score would be 200 and the reference would be "the average user". If I want to say I trust Alice twice as much as Bob, then Bob is the reference. Or I may want to attest that I trust Alice twice as much as me, in which case I am the reference in my own attestation.

According to this usage, there is no upper limit on the score. So if I attest that I trust Alice some crazy number like one zillion times more than the average user, it is up to the consumer what to do with that number. One option would be to create a step function, where any score above N is given a weight of zero (i.e. it is ignored). Or, if you want to be fancy, something like an S-curve centered around some adjustable cutoff parameter; this is in fact what I did in a really old implementation. But it's really important to keep this NIP as simple as possible, which is why I'd suggest that the NIP make the SUGGESTION that a score of 100 shall be interpreted as the normalized score of a reference user, where the user and dev get to decide what the reference is. We could make it 1 (and allow decimals) or 1000 or whatever, but we just ought to suggest something so everyone gets on the same page.
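
In sketch form, the two cutoff options just mentioned (parameter values are illustrative):

    // Hard step: ignore any score above n.
    function stepWeight(score: number, n: number): number {
      return score > n ? 0 : 1;
    }

    // Smooth alternative: a logistic S-curve that falls from ~1 (well below
    // the midpoint) to ~0 (well above it); midpoint and steepness are adjustable.
    function sCurveWeight(score: number, midpoint: number, steepness = 0.01): number {
      return 1 / (1 + Math.exp(steepness * (score - midpoint)));
    }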

@wds4

wds4 commented Jun 13, 2024

> @vcavallo agree about apps progressively offering finer-grained controls, but in general, and related to @wds4's latest comment, I wonder if having relative controls in the UI (increase trust, decrease trust) rather than absolute ones (input a number) is a better way to go. Doesn't that map human behavior better? We tend to assign more trust, or less, or break it completely, based on people's actions over time.

There may be instances where I might want to attest "I trust Alice more than Bob" without putting a number to it. Indeed, I think there will be instances where ordering a series of options by preference (I like option A better than D, D better than C, etc.) will be particularly useful. I was thinking about that for Nostrapedia: I may want to arrange all existing articles for a given topic in order of preference without the need to assign numbers to any of them.

So I'm thinking that the ability to order a set of items by preference from best to worst could be a quite useful thing, but perhaps it is outside the scope of this NIP.

ADDENDUM: I seem to recall running across an algo that processes order-of-preference data (as discussed above) in a very elegant manner, one that allows us to assign distinct weights to different authors. I particularly recall that this method is a good one if you expect new options to keep rolling in on a continuous basis, as is the case with NIP-54 wikis. But yeah, outside the scope of this NIP, I think.

@wds4

wds4 commented Jun 13, 2024

> Agree with @wds4
>
> Besides deletions, c could also be updated/refined, correct? But d should be a stable identifier.

This is an interesting question. Do we allow the c tag and the d tag to be unmatched? I think the original intent is that the d tag is uniquely defined by the context (a string) and the ratee's pubkey. This would eliminate a certain class of ambiguities, where I rate Alice twice in the same context and it's unclear whether I'm intending to ignore the older rating or not.

Perhaps the practice should be that if you want to edit the context field, you delete the old attestation and simply create a new one with the new context.
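
One possible d-tag scheme matching that intent (this exact format is an assumption, not something the NIP specifies):

    // The d tag is derived deterministically from context + ratee pubkey, so a
    // second rating of the same person in the same context replaces the first.
    function dTag(context: string, rateePubkey: string): string {
      return `${context}:${rateePubkey}`;
    }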

@franzaps
Contributor

> There may be instances where I might want to attest "I trust Alice more than Bob" without putting a number to it. Indeed, I think there will be instances where ordering a series of options by preference (I like option A better than D, D better than C, etc.) will be particularly useful. I was thinking about that for Nostrapedia: I may want to arrange all existing articles for a given topic in order of preference without the need to assign numbers to any of them.

I really like the idea of order/relativity in a trust context, much more than absolute numbers. I also want to keep this NIP simple. Maybe we can brainstorm and get it right.

If we're going to keep the absolute scale, how about recommending that clients issue values for these attestations in +1 and -1 increments (maybe overridable by the user)? In the Alice/Lyn example, where you started to trust Lyn more: let's say every week you +1 Lyn and -1 Alice. Over time the order will change.

But that looks very similar to social reactions (likes, zaps), which makes me think this could have more weight during reads.

@wds4

wds4 commented Jun 13, 2024

> Agree with @wds4 with the exception of:
>
> > How about this: none of the 4 fields are required, with the caveat that if context, score, AND confidence are ALL THREE absent, then we interpret this as a "deleted" event.
>
> This debate is happening elsewhere: #1263 #1293 and the consensus seems to be leaning towards just using a-tag deletions. This is no less effective than blanking an event, because delete events are delivered to the same places the new version of the event would be. Clients should be implementing deletes, because relays holding a replicated event may not receive the delete. This also prevents the destruction of information; there is value in being able to see what was on an event before it was deleted.

I haven't digested those two threads completely yet, but I'm inclined to think we should be as consistent as possible with general nostr practice, which would mean using a-tag deletions as @staab suggests.

@vcavallo

> > ...This is why relative weights are better. "Under the hood, I gave 100 to Alice, not knowing Lyn was out there. Now, I want to trust Lyn 10x more than Alice (whether or not I know I actually used the integer 100 earlier)." I should just be able to adjust my relative weights without having to constantly juggle numbers in my head or in UIs.
>
> Yes, good point. The downside is that for relative weights to work we'd need to grab all of the user's events to normalize. And what happens when an app decides to assign trust in the 100000-1000000 range? Interesting problem.

This is why this problem is far more tractable when each user has an always-on personal server that is constantly crunching changes for you personally and serving up the relevant (public) details for outside services to consume via API.

Purely-functional VMs are very nicely suited to this type of problem.

I'm a strong believer in Nostr, but having worked in various decentralized ecosystems it has become exceedingly clear that there is some data and compute that is better provided by a personal VM than by a "public" relay doing the work for many people.

@wds4

wds4 commented Jun 13, 2024

> > > ...This is why relative weights are better. "Under the hood, I gave 100 to Alice, not knowing Lyn was out there. Now, I want to trust Lyn 10x more than Alice (whether or not I know I actually used the integer 100 earlier)." I should just be able to adjust my relative weights without having to constantly juggle numbers in my head or in UIs.
> >
> > Yes, good point. The downside is that for relative weights to work we'd need to grab all of the user's events to normalize. And what happens when an app decides to assign trust in the 100000-1000000 range? Interesting problem.
>
> This is why this problem is far more tractable when each user has an always-on personal server that is constantly crunching changes for you personally and serving up the relevant (public) details for outside services to consume via API.
>
> Purely-functional VMs are very nicely suited to this type of problem.
>
> I'm a strong believer in Nostr, but having worked in various decentralized ecosystems it has become exceedingly clear that there is some data and compute that is better provided by a personal VM than by a "public" relay doing the work for many people.

I agree with you 💯pct about the importance of some really good options to keep track of personalized data and compute in the setting of WoT. Personal relay, VM; I can envision multiple solutions being available to the consumer.

But I do not think that relative weights require me to access all of a user's events to normalize. For one thing, Alice might not want to grant me access to all of her events, and we must not devise any solution that presumes she will. And in general, what do relative weights mean? Relative to something. To what? There is no single answer to that question, which is why I think we should avoid trying to force an answer for the sake of this NIP.

If we're going to suggest using a number for the score, then we must give at least a suggestion of what the number means. And the only suggestion that is tractable in my mind is that scores are always relative. So we ought to say that. But we ought to avoid asserting relative to what.

My suggestion is that this NIP should require the score field to be a string or a number; integers are suggested and preferred, but strings (follow, super follow, mute, block) and emojis (thumbs, rocket, smiley, stars) are also acceptable. And if it's a number, we suggest normalizing it so that a score of 100 means equal to some appropriately chosen reference, with the reference defined at the discretion of the dev and/or user.

@vcavallo

vcavallo commented Jun 13, 2024

I don't want to beat a dead horse here, and I mostly agree with everything you said, but humor me for a moment...

What if all scores were 0.0-1.0? We'll figure out what 0.0 "means" later, but 1.0 would mean "I trust this person maximally, given the context". If it's "fix my car", it means "I'd hand them the tools and walk away and 100% expect it to be done correctly". If it's "recommend a restaurant", it means "any restaurant they recommend, I'm certain I'm going to love".

In this context, 0.5 might mean "if they recommend restaurants, I'm going to like about half of their recommendations".

0.0-1.0 allows infinite degrees of granularity between points, which allows for doing order/relativity with infinite control. And it also means there is a known upper and lower bound, which feels right. Once I trust someone maximally, what would it even mean to trust them more than that?

For a real-world representation of this, "power of attorney relative to X contract" is an explicit 1.0 trust in that attorney's representation of you on that contract. Everyone else on earth has 0.0 for you on that task.

Furthermore, since this is basically a percentage, it allows for automations like "I've visited 10 of this person's posts on X topic and signaled trust on 3 of them, so assign 0.3. If I go back and signal trust on 2 more, increase to 0.5." These mean, respectively, "I agree with about a third of what this person says about X that I've seen" and "I agree with half of what this person has said about X that I've seen".
That feels very useful.
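
That automation is a one-liner; a sketch (names assumed):

    // Trust as the observed fraction of a person's posts (in a context)
    // that I have signaled trust on.
    function observedTrust(visited: number, signaled: number): number {
      return visited > 0 ? signaled / visited : 0;
    }
    // observedTrust(10, 3) === 0.3; after two more signals, observedTrust(10, 5) === 0.5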

@franzaps
Contributor

> 0.0-1.0 allows infinite degrees of granularity between points, which allows for doing order/relativity with infinite control

I was thinking this too; decimal points allow for easy ordering.

Again, the problem with bounds (specifically an upper bound) is potentially having to update many events. If Alice, Bob and Charlie had 1.0 in Economics, but over time I shifted my thinking and prefer to give Lyn and Preston more weight, now I need to go to Alice, Bob and Charlie's events and lower their scores.

@franzaps
Contributor

franzaps commented Jun 13, 2024

Using a highly recommended reference/baseline (of 1.0, 100, whatever) and highly recommended increase/decrease steps (-1/+1, or -0.1/+0.1), we can make this NIP work.

By the way @wds4, I forgot to comment on non-numeric input for scores: that's going to make it very difficult to interpret. If I put a 📣 on Lyn's score, what does that mean? For that kind of input I have all the other nostr events.

Another question: should a follow and a mute have recommended values in this scheme? Maybe the defaults, so a NIP-77 event would be a sort of refinement over the basic follow/mute we presently have? (Sorry, it was probably mentioned somewhere I can't find.)

@vcavallo

> > 0.0-1.0 allows infinite degrees of granularity between points, which allows for doing order/relativity with infinite control
>
> I was thinking this too; decimal points allow for easy ordering.
>
> Again, the problem with bounds (specifically an upper bound) is potentially having to update many events. If Alice, Bob and Charlie had 1.0 in Economics, but over time I shifted my thinking and prefer to give Lyn and Preston more weight, now I need to go to Alice, Bob and Charlie's events and lower their scores.

I think this is the domain of quality-of-life apps (and this is a good thing). With these primitives, app developers can create "balancer" apps where you can readjust your weights with some nice UX. You visit your "Economics" context and drag-and-drop users to fit your current worldview, or drag some sliders, maybe opting to check a "distribute the difference in this slider between x, y and z other users" box. There is a whole world of new UX research to be done into how to model this properly given human psychology. And app developers should compete on this.

Fundamentally, I see that problem as a UI/UX one that doesn't need to become relevant in the choice of whether or not to use bounds.

@vcavallo

In my opinion, a "follow" should mean absolutely nothing for NIP-77. I might follow accounts I truly detest and anti-trust to the max. But I do want to see what they're saying.

This is where negative trust becomes interesting... imagine a filter like "show me all the posts from people who I think have the worst taste in restaurants", or "show me what all the people I view as unhinged conspiracy theorists are saying". That's something you can't do on other networks, or without this (negative) trust primitive.

@franzaps
Contributor

> You visit your "Economics" context and drag-and-drop users to fit your current worldview, or drag some sliders, maybe opting to check a "distribute the difference in this slider between x, y and z other users" box.

I just can't see that working.

> In my opinion, a "follow" should mean absolutely nothing for NIP-77. I might follow accounts I truly detest and anti-trust to the max. But I do want to see what they're saying.

You're right, this happens, but it is not the norm. A follow should have a default positive weight, and if you detest someone you'll go out of your way to down-trust them.

> This is where negative trust becomes interesting... imagine a filter like "show me all the posts from people who I think have the worst taste in restaurants", or "show me what all the people I view as unhinged conspiracy theorists are saying". That's something you can't do on other networks, or without this (negative) trust primitive.

That's cool. We were talking earlier about showing recommendations of people to follow from outside your 2-hop web-of-follows. But way more interesting things can be done with these primitives.

@vcavallo

> > You visit your "Economics" context and drag-and-drop users to fit your current worldview, or drag some sliders, maybe opting to check a "distribute the difference in this slider between x, y and z other users" box.
>
> I just can't see that working.

Ah, that's too bad. I can, and it's really exciting.

As a silly example: a dumbed-down, user-friendly app has some predefined zones with labels like "seems like an expert", "makes good points", "sometimes useful", "total dingus", and Nostr user avatar bubbles that you drag around into the different zones.

Another dumb one: a gamified UI that has "head-to-head battles" to help you sort people: "Who knows more about economics?? [Lyn Alden] vs [vinney cavallo]" with a tournament-style knockout-round mechanic. At the end you have a sorted hierarchy of your economics people, and you had fun while doing it.

@franzaps
Contributor

> Another dumb one: a gamified UI that has "head-to-head battles" to help you sort people: "Who knows more about economics?? [Lyn Alden] vs [vinney cavallo]" with a tournament-style knockout-round mechanic. At the end you have a sorted hierarchy of your economics people, and you had fun while doing it.

NGL, that sounds cool! But now we need devs (which we have a shortage of in nostr) to be incentivized to code and maintain such games. And I don't see people reminding themselves to "play my web-of-trust balancing game on Thursday", much less paying for it. I sound like a total party pooper, sorry man 😄

If we are going to gamify anything, I see it more in-client (that is, in the context of where people already assign trust) with up/down icons representing +1/-1 on the trust score. Maybe by holding the button you can apply a +5/-5 (to change relative score order faster) and get some cool animation with it.

@guaka

guaka commented Jun 20, 2024

Context: We built Trustroots. It's a real-life social network of 100k+ members, a bit like Couchsurfing(tm) before 2010. We're moving it onto nostr: https://github.com/Trustroots/nostroots

Many of our users have existing trust relations; I want these to become available for users to take to other real-life applications (I imagine sharing rides, food, parties...) through NIP-77.

Currently we have boolean values for these:

  • "Met in person"
  • "I hosted them"
  • "They hosted me"
  • "Apart from your personal experience, would you recommend others to stay with them?" (can be left empty)

And an optional free text field:

  • "Would you like to describe something about your experience with them?"

(see also Trustroots/nostroots#20 (comment) for my thoughts on how I would do it if it were reimplemented now)

Currently our users can "share [their] experience" once, which means they fill in this form, and then the receiving user has 2 weeks or so to also leave their experience. (So it's blinded, like on Airbnb, which doesn't seem very feasible to implement on nostr today.)

It can make sense to instead allow this to develop over time, which can also add additional safety. E.g. soon after users meet, they can send a "met in person" event, which isn't even a trust statement. If they have been hosting or hosted for one night, they can add "hosted them" or "hosted me", which again isn't a trust statement. Is there another nostr event type that makes more sense for this?

After the hosting period ends, it's good if guest and host add trust statements, which can be as simple as "happy to meet them again", "would host them again", "would be their guest again". Good if they can refer to the prior "met in person", "hosted them" and "stayed with them" events. Ideally somehow all grouped together.

> 100 means full trust and 0 means that the person is not trusted in the specified context at all.

If it's prescribed like this, it wouldn't really work well for our existing data; e.g. 0 for "met in person" doesn't mean they're not trusted.

It's great that there's "confidence" as well as "score".

> The special value * represents general trust in the person.

I think this is too vague for us to use in any way.

> transitive

Transitivity is very important for our use case, and for the real-life apps that I imagine can be kickstarted with the trust we have in Trustroots now. Of course it's up to the app, or even more to the users, to decide how much transitivity they want.

(Only learned about this NIP yesterday.)

@staab
Member

staab commented Jun 20, 2024

It sounds like those booleans are additional granularity on the confidence axis. Confidence as a number is great when you're talking about subjective assessment, but if there are concrete things that you can point to that back that confidence, that's more useful, but harder to systematize. Maybe a client could help translate those things into a confidence score, and track them some other way (maybe using badges?)
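
A sketch of that translation for the Trustroots booleans (the weights are invented for illustration):

    interface Experience {
      metInPerson: boolean;
      hostedThem: boolean;
      theyHostedMe: boolean;
      wouldRecommend?: boolean; // can be left empty
    }

    // Fold the concrete booleans into the confidence axis; capped at 100.
    function confidenceFrom(e: Experience): number {
      let c = 0;
      if (e.metInPerson) c += 30;
      if (e.hostedThem) c += 25;
      if (e.theyHostedMe) c += 25;
      if (e.wouldRecommend !== undefined) c += 20;
      return Math.min(c, 100);
    }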

@wds4

wds4 commented Jun 20, 2024

I've been thinking a lot lately about the importance of drawing a sharp distinction between the issuance of an attestation and the interpretation of that attestation for the purpose of calculating trust scores.

I agree with @staab that those booleans seem like they fit the confidence axis. But perhaps the interpretation of those boolean values into a confidence should be done by the consumer, not the author of the attestation. In other words: the role of the attestation author is to provide information. The role of the attestation consumer is to decide what (if anything) to do with it.

So a question for @guaka: have you thought about how you're going to synthesize the various trust relations at Trustroots into one or more trust scores? If you're not sure, I might suggest taking a look at my method of calculating Influence Scores, currently implemented at brainstorm.ninja for stratification of wiki data. In a nutshell: you "interpret" data from a variety of sources and formats into something resembling the NIP-77 format, and then you calculate weighted averages for each context of interest. Ultimately, the composite contextual trust score is also put into NIP-77 format. It takes a bit of number crunching, but I think it's worth it.
