Filtering events that are not replies (presence & absence filters) #523

Open
alexgleason opened this issue May 12, 2023 · 59 comments

@alexgleason
Member

I want to fetch only events that are not replies, i.e., events that do not contain any "e" tags. Using the filter { "#e": [] }, the empty array is ignored, and I receive events from relays that contain "e" tags.

Passing an empty array to a tag filter is ambiguous, and I bet different relay software handles it differently. So I think it should be specified, and that an empty array should return events which specifically do not contain that tag.

I'm trying to adapt the following design to Nostr, where top-level posts are displayed in a separate tab from replies.

image

@alexgleason alexgleason changed the title Requesting events that are not replies (empty array filter is ambiguous) Filtering events that are not replies (empty array filter is ambiguous) May 12, 2023
@fiatjaf
Member

fiatjaf commented May 12, 2023

@scsibug @hoytech @cameri @brugeman what do you think?

@cameri
Member

cameri commented May 12, 2023

Nostream and nostr-rs-relay (IIRC) match no events when passing an empty array for a generic tag filter... so I don't see that behaviour as useful, so what @alexgleason proposes makes sense to me.

@scsibug
Collaborator

scsibug commented May 13, 2023

I will do some testing tomorrow to see what the query and performance looks like.

Agree with @cameri, this is a useless query to do today (it returns zero events); however, I think that is the logical behavior (a tag match requires at least one member of the provided list to match, so a zero-element list implies no match).

My first thought is we should leave it as is, and searching for the absence of a tag is not a path I want to go down. But I will test and think on it a bit more. I definitely understand the value of the proposal.

@alexgleason
Member Author

I think you're right. My OP was based on the idea relays were ignoring the empty "#e" filter and returning all events, but that does not seem to be the case. I agree it's logical to not return any events.

I'll filter the events client-side for now.
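
A minimal Python sketch of this client-side workaround, assuming events arrive as dicts in the NIP-01 JSON shape and that any "e" tag marks a reply. The helper names `is_reply` and `top_level_only` are made up for illustration.

```python
def is_reply(event: dict) -> bool:
    """Treat any event carrying at least one "e" tag as a reply (assumption)."""
    return any(tag and tag[0] == "e" for tag in event.get("tags", []))


def top_level_only(events: list) -> list:
    """Drop replies after the relay has already sent them."""
    return [ev for ev in events if not is_reply(ev)]


events = [
    {"id": "a", "kind": 1, "tags": []},
    {"id": "b", "kind": 1, "tags": [["e", "a"], ["p", "pk1"]]},
    {"id": "c", "kind": 1, "tags": [["t", "bitcoin"]]},
]
roots = top_level_only(events)
print([ev["id"] for ev in roots])
```

The cost of this approach is that replies are still downloaded before being discarded.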

@mikedilger
Contributor

Filter sets can either be (1) missing, (2) empty or (3) have contents. Those can have three different meanings.

But I am wary of assigning special meaning to the empty array. It would also break gossip's current code quite badly.

(1) Missing from the filter means "do not filter on this field".
(2) Having contents means "filter by returning only events where the value matches one of these values"
(3) Specifying an empty set currently takes on the latter meaning, and works just like "WHERE tagvalue IN ()" giving zero events.
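
The three cases can be sketched as follows. This illustrates the semantics described above and is not code from gossip or any relay; `matches_tag_filter` is a hypothetical name.

```python
def matches_tag_filter(event: dict, filt: dict, letter: str) -> bool:
    """Evaluate one "#<letter>" condition of a filter against an event."""
    key = "#" + letter
    if key not in filt:
        return True  # (1) missing: do not filter on this tag
    wanted = set(filt[key])
    values = {
        tag[1]
        for tag in event.get("tags", [])
        if len(tag) >= 2 and tag[0] == letter
    }
    # (2) non-empty list: at least one value must match
    # (3) empty list: the intersection is always empty, like "IN ()"
    return bool(values & wanted)


reply = {"tags": [["e", "abc"]]}
root = {"tags": []}
print(matches_tag_filter(reply, {}, "e"))
print(matches_tag_filter(reply, {"#e": ["abc"]}, "e"))
print(matches_tag_filter(root, {"#e": []}, "e"))
```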

@cameri
Member

cameri commented May 13, 2023

> Filter sets can either be (1) missing, (2) empty or (3) have contents. Those can have three different meanings.
>
> But I am wary of assigning special meaning to the empty array. It would also break gossip's current code quite badly.
>
> (1) Missing from the filter means "do not filter on this field".
> (2) Having contents means "filter by returning only events where the value matches one of these values"
> (3) Specifying an empty set currently takes on the latter meaning, and works just like "WHERE tagvalue IN ()" giving zero events.

We could use null instead and it will be more meaningful.

"#e":null

@hoytech
Contributor

hoytech commented May 15, 2023

I think NIP-01 is pretty clear on what should happen:

> At least one of the array's values must match the relevant field in an event for the condition itself to be considered a match.

But yeah, an empty array is useless and a pretty common source of misunderstanding. I almost want to make it throw an error here "your query is broken".

Filtering for events that have 0 e tags might need a new index.

@fiatjaf
Member

fiatjaf commented May 15, 2023

Does it damage anyone if we do the null thing? All relay implementations will return nothing when seeing such a query right now, I imagine. Maybe make this a separate NIP?

@scsibug
Collaborator

scsibug commented May 15, 2023

I think it is a very different kind of query than we do in the other cases, so, like @hoytech mentioned, it could necessitate a new index. I would need a subquery to deal with it, which isn't great for performance, but a proper benchmark is still on my short-term todo list.

I think it is simpler to not have different behavior for null, and would prefer to just throw an error. I would like to see more use cases if possible to justify this search-for-no-tags option.
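
To make the subquery concern concrete, here is a rough sketch against an in-memory SQLite database. The two-table schema (`events`, `tags`) is an assumption for illustration, not the schema of nostr-rs-relay or any other relay. It shows one way "events without any e tag" might be expressed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (id TEXT PRIMARY KEY, kind INTEGER, created_at INTEGER);
CREATE TABLE tags (event_id TEXT, name TEXT, value TEXT);
CREATE INDEX tags_event_name ON tags (event_id, name);
""")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [("a", 1, 100), ("b", 1, 200), ("c", 1, 300)],
)
conn.executemany(
    "INSERT INTO tags VALUES (?, ?, ?)",
    [("b", "e", "a"), ("b", "p", "pk1"), ("c", "t", "bitcoin")],
)

# The absence check needs a correlated subquery per candidate event.
rows = conn.execute("""
SELECT ev.id FROM events ev
WHERE ev.kind = 1
  AND NOT EXISTS (
      SELECT 1 FROM tags t WHERE t.event_id = ev.id AND t.name = 'e'
  )
ORDER BY ev.created_at DESC
""").fetchall()
root_ids = [row[0] for row in rows]
print(root_ids)
```

Whether a query of this shape performs acceptably at relay scale is exactly the open benchmark question.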

@alexgleason
Member Author

> I would like to see more use cases

The media tab wants only events which have any media tag, but it doesn't care what its value is.

image

Let's pretend for a moment we implemented media attachments the right way on Nostr, by using an "m" tag. I want to filter out events which do not contain an "m" tag.

I have no idea what that filter looks like. #m: ["*"]? This is getting more complicated.

The more I think about it, I realize my own ask in this issue is flawed. But maybe there's some way to extend filters in a way that's more flexible and makes sense.

@arthurfranca
Contributor

This could be achieved for kind 1 events if clients agree to using a depth tag.

In my opinion, an interesting feed would include not only root events (like this issue says, with no e tag) but also direct descendant replies (with only one e tag). In other words, depth 0 and depth 1.

@alexgleason
Member Author

I'm going to close this. Thank you all for your feedback and ideas. There's maybe still a problem to be solved here, but it's not the one that's stated.

@offbyn

offbyn commented Jun 18, 2023

For a global feed it would make a lot of sense to have some way to query events without an e tag. On some relays I need to fetch over 400 events before getting a non-reply event. This seems like a huge waste of bandwidth.

Regardless of what the exact syntax would be (empty array, null, array with 1 null item, new tag), I strongly think this "Filtering events that are not replies" should somehow be possible.

@arthurfranca
Contributor

> I need to fetch over 400 events before getting a non-reply event. This seems like huge waste of network traffic.

Over 400! This issue should be reopened =0? @alexgleason

In this brief discussion I mentioned this possible solution:

What about pushing for this addition: ["REQ", <subscription_id>, { "kinds": [1], "count": { "#a": 0, "#e": 0 } }]

I suspect relay databases won't have a hard time storing and indexing one-letter tag occurrence counts.
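
A sketch of how such a "count" condition might be evaluated, assuming it means "exactly this many tags with that name". The "count" key is only the proposal above, not part of NIP-01, and the function names are hypothetical.

```python
from collections import Counter


def tag_counts(event: dict) -> Counter:
    """Count how many times each tag name occurs in an event."""
    return Counter(tag[0] for tag in event.get("tags", []) if tag)


def matches_count(event: dict, filt: dict) -> bool:
    """Check only the proposed "count" part of a filter (kinds etc. ignored)."""
    counts = tag_counts(event)
    return all(
        counts[key.lstrip("#")] == expected
        for key, expected in filt.get("count", {}).items()
    )


filt = {"kinds": [1], "count": {"#a": 0, "#e": 0}}
root = {"kind": 1, "tags": [["t", "nostr"]]}
reply = {"kind": 1, "tags": [["e", "abc"], ["p", "pk1"]]}
print(matches_count(root, filt), matches_count(reply, filt))
```

A relay could store the per-event counts from `tag_counts` at insert time so that the condition becomes an indexed lookup rather than a scan.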

@fiatjaf
Member

fiatjaf commented Jun 18, 2023

Not a bad idea.

@fiatjaf fiatjaf reopened this Jun 18, 2023
@arthurfranca
Contributor

arthurfranca commented Jun 18, 2023

Two different versions supporting OR query:

  1) ["REQ", <subscription_id>, { "count": { "#t": [1, 2, 3] }, "#t": ["bitcoin"] }] // example: notes with bitcoin topic but ignore notes with too many hashtags
  2) ["REQ", <subscription_id>, { "#t": [0, "bitcoin"] }] // example: no topic or bitcoin topic

1) is an improvement but doesn't support, for example, 0 #t occurrences OR "bitcoin", because "count.#t" + "#t" are an AND clause (although it would be possible with 2 separate filters)
2) is better but would most likely break things, so I shouldn't have mentioned it (it also wouldn't work for #t count AND #t value)

@staab
Member

staab commented Jun 18, 2023

Could also do something like #t: "<3". Wasn't some "runes" spec proposed?

@arthurfranca
Contributor

@staab The runes thing was considered too complex. NIP-26 used a simpler version so no >= nor <= for example.
If not using integers, the above 2) option wouldn't be possible.

a) It would lead us to this: ["REQ", <subscription_id>, { "count": { "#t": ">-1&<4", "#e": "=1" } }]. (>-1 so as to include 0)
b) Or even this: ["REQ", <subscription_id>, { "count": ["#t>-1&#t<4&#e=1", "#o<2"] }], allowing different keys at the same array index.

a) Is easier. #t count AND #e count must match
b) Is more complex. (#t count AND #e count match) OR (#o count match)

@arthurfranca
Contributor

So the question is, which one is best: 1), 2), a) or b)?

@staab
Member

staab commented Jun 19, 2023

I agree the runes were too complex, but a simpler version might work here. Of all of these options #1 is probably best, but count is a weird key to use, since search seems similar but does something completely different. I kind of want to go with #t: "<3". It's clear, backwards compatible (other than maybe causing relays to reject the filter or crash), and extensible.

@arthurfranca
Contributor

> I kind of want to go with #t: "<3"

Without the count key? But then you can't have #t: "<3" and #t: ["bitcoin", "cat"] together in the same filter.

@staab
Member

staab commented Jun 19, 2023

[{"#t": ["bitcoin"]}, {"#t": "<3"}]?

@arthurfranca
Contributor

Wait, I think NIP-01 says like ["REQ", <subscription_id>, {}, {}, ...] instead of ["REQ", <subscription_id>, [ {}, {}, ... ]] (not an array of {} filters?) and each {} is a separate filter each with its own "limit", "since" etc.

Sorry, I'm confused. What would your example look like, as in ["REQ", <subscription_id>, ??], so that the relay returns only events that simultaneously have a "bitcoin" t tag and fewer than 3 t tags?

@staab
Member

staab commented Jun 19, 2023

Sorry yeah, that's what I mean. And you're right, that's an OR, not an AND, it would be a firehose. So my example won't work.
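
The OR-versus-AND point can be illustrated with a small sketch: conditions inside one filter must all hold, while separate filters in the same REQ are alternatives. The string form "<3" is only the hypothetical syntax floated in this thread, and all function names are invented here.

```python
def t_values(event: dict) -> list:
    return [t[1] for t in event.get("tags", []) if len(t) >= 2 and t[0] == "t"]


def matches_filter(event: dict, filt: dict) -> bool:
    cond = filt.get("#t")
    if cond is None:
        return True
    if isinstance(cond, str) and cond.startswith("<"):
        # Hypothetical "<N" form: fewer than N "t" tags.
        return len(t_values(event)) < int(cond[1:])
    return any(value in cond for value in t_values(event))


def matches_subscription(event: dict, filters: list) -> bool:
    # An event is delivered if ANY filter of the subscription matches.
    return any(matches_filter(event, f) for f in filters)


filters = [{"#t": ["bitcoin"]}, {"#t": "<3"}]
untagged = {"tags": []}
spammy = {"tags": [["t", x] for x in ["bitcoin", "a", "b", "c", "d"]]}
print(matches_subscription(untagged, filters))
print(matches_subscription(spammy, filters))
```

Both events get through: the untagged note via the second filter and the hashtag-heavy note via the first, which is the firehose effect described above.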

@arthurfranca
Contributor

Ok I removed the options that I think had problems (I can explain why if needed).
So which of the options below would you pick, considering that even the worst relay database you can imagine should support it (it would have to be able to store and index one-letter tag occurrence counts and fulfill the query)? And how do you feel about adding it to NIP-01?
@alexgleason @fiatjaf @staab @scsibug @cameri @hoytech @mikedilger

Options to query by one-letter tag occurrence count:
a) ["REQ", <sub_id>, { "count": { "#t": [1, 2, 3], "#e": [1] } }] // (array#t of ORs) AND (array#e of ORs)
b) ["REQ", <sub_id>, { "count": { "#t": ">-1&<4", "#e": "=1" } }] // (range/equal#t) AND (range/equal#e)
c) ["REQ", <sub_id>, { "count": ["#t>-1&#t<4&#e=1", "#o<2"] }] // (range/equal#t AND range/equal#e) OR (range/equal#o)
d) Other, tell us
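
As a rough illustration of what option b) would ask of a relay, here is a sketch that evaluates one range expression such as ">-1&<4" against a tag count. The grammar (only `<`, `>`, `=` joined by `&`) is inferred from the examples above and is an assumption; `count_satisfies` is a hypothetical name.

```python
import operator
import re

OPS = {"<": operator.lt, ">": operator.gt, "=": operator.eq}


def count_satisfies(count: int, expr: str) -> bool:
    """Return True if every '&'-joined clause of expr holds for count."""
    for clause in expr.split("&"):
        match = re.fullmatch(r"([<>=])(-?\d+)", clause)
        if match is None:
            raise ValueError(f"unsupported clause: {clause!r}")
        op, bound = OPS[match.group(1)], int(match.group(2))
        if not op(count, bound):
            return False
    return True


print(count_satisfies(0, ">-1&<4"))
print(count_satisfies(4, ">-1&<4"))
print(count_satisfies(1, "=1"))
```

Option a) needs no parser at all, since its check reduces to `count in [1, 2, 3]`.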

@alexgleason
Member Author

> And how do you feel about adding it to NIP-01?

Nah I want to check if it's present in supported_nips before attempting the query. I like your idea, though.
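
A small sketch of that check, assuming the relay information document has already been fetched and parsed (NIP-11 documents carry a `supported_nips` list of integers). The NIP number below is a placeholder, since no number had been assigned to this idea, and `supports_nip` is a hypothetical helper.

```python
import json

HYPOTHETICAL_NIP = 9999  # placeholder only, not a real NIP number


def supports_nip(relay_info: dict, nip: int) -> bool:
    """True if the relay advertises the given NIP in supported_nips."""
    return nip in relay_info.get("supported_nips", [])


# Example document; a real one would come from the relay over HTTP.
relay_info = json.loads('{"name": "example relay", "supported_nips": [1, 11, 50]}')

if supports_nip(relay_info, HYPOTHETICAL_NIP):
    print("send the extended filter")
else:
    print("fall back to client-side filtering")
```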

@arthurfranca
Contributor

@alexgleason a) b) or c) idea? 🙃️

@fiatjaf
Member

fiatjaf commented Jul 23, 2023

These ideas are going way too far in treating relays like databases. These things you're coming up with are basically mongodb queries. Highly centralizing.

@fiatjaf
Member

fiatjaf commented Jul 23, 2023

By the way, the q tag people are using for quoted events already solves this issue. If everybody starts using that then every kind1 with an e tag will already be a reply, so there is no need for any of this.

@fiatjaf fiatjaf closed this as completed Jul 23, 2023
@arthurfranca
Contributor

> I need to fetch over 400 events before getting a non-reply event. This seems like huge waste of network traffic.

@fiatjaf this issue is about fetching only root events.

@kennethstarkrl

> These ideas are going way too far in treating relays like databases. These things you're coming up with are basically mongodb queries. Highly centralizing.

How is querying relays centralizing? Clients can remove/hide anything anyways. Relays are just databases are they not?

> By the way, the q tag people are using for quoted events already solves this issue. If everybody starts using that then every kind1 with an e tag will already be a reply, so there is no need for any of this.

The problem is not everyone does and not everyone will.

@mikedilger
Contributor

mikedilger commented Jul 24, 2023

I didn't like any of the proposals in this discussion except the ones from way back. I don't like the term "count" (confusing) or "length". Honestly I think that's just too complex for relays and does too much. This issue was about getting events that are not replies, and I think a simple solution would be good enough for now without locking us out of some more advanced approach like a runes-based approach later on.

Here are the simple things you can put in your "#e" query:

  1. Don't have one. This means events are not filtered by 'e' tag.
  2. Have one and list specific values. This means only events with one of those specific values will be returned.
  3. Have one and specify an empty array. Currently this means that nothing comes back (no event has an e tag that is inside of the empty set)
  4. Have one and specify it as null.

I'm in favor of (4) meaning "please give me only events that do not have 'e' tags".

That's simple. It doesn't introduce a bunch of stuff that is hard to reason about and hard to code into relays. And it solves the problem this issue was opened to solve.
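
The four cases might look like this in code. This is a sketch of the proposed semantics only (case 4 is not specified anywhere yet), and `matches_e_filter` is a hypothetical name.

```python
import json


def matches_e_filter(event: dict, filt: dict) -> bool:
    e_values = {
        tag[1] for tag in event.get("tags", []) if len(tag) >= 2 and tag[0] == "e"
    }
    if "#e" not in filt:
        return True                     # 1. no "#e" key: not filtered by e tag
    wanted = filt["#e"]
    if wanted is None:
        return not e_values             # 4. null: only events without e tags (proposed)
    return bool(e_values & set(wanted))  # 2. listed values; 3. [] matches nothing


root = {"tags": [["t", "nostr"]]}
reply = {"tags": [["e", "abc"]]}
null_filter = json.loads('{"#e": null}')  # JSON null becomes None
print(matches_e_filter(root, null_filter), matches_e_filter(reply, null_filter))
```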

@fiatjaf fiatjaf reopened this Jul 24, 2023
@alexgleason
Member Author

Related: I would like to get all events within a time range which contain ANY hashtag. I would use this to calculate trending hashtags.

It's not about "count", it's about the "presence" or "absence" of a specific tag.

  • I want to get all events with the ABSENCE of ANY "e" tag (top-level posts only)
  • I want to get all events with the PRESENCE of ANY "e" tag (replies only)
  • I want to get all events with the PRESENCE of ANY "t" tag (posts that contain a hashtag)
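
For the trending-hashtag use case, the client-side half could be sketched like this, assuming the events with "t" tags have already been obtained somehow. Lowercasing hashtags and counting each one once per event are choices made for this example, and `trending_hashtags` is a hypothetical name.

```python
from collections import Counter


def trending_hashtags(events: list, since: int, until: int, top: int = 3) -> list:
    """Rank hashtags by the number of events in [since, until] that use them."""
    counter = Counter()
    for ev in events:
        if since <= ev["created_at"] <= until:
            # A set, so a hashtag repeated inside one event counts once.
            counter.update(
                {t[1].lower() for t in ev.get("tags", []) if len(t) >= 2 and t[0] == "t"}
            )
    return counter.most_common(top)


events = [
    {"created_at": 100, "tags": [["t", "Bitcoin"], ["t", "nostr"]]},
    {"created_at": 150, "tags": [["t", "bitcoin"], ["t", "bitcoin"]]},
    {"created_at": 500, "tags": [["t", "cats"]]},
]
print(trending_hashtags(events, since=0, until=200, top=2))
```

Without a presence filter, the expensive part is the step this sketch skips: downloading every event in the time range just to find the ones that carry a "t" tag.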

@alexgleason
Member Author

A simple way to do this might be something like:

  • PRESENCE filter - { "*e": true }
  • ABSENCE filter - { "!e": true }

@kennethstarkrl

Considering that events do have an empty array, I think the easiest option would be for a filter with { "#e": [ ] } to return only events with an empty tag array.

@alexgleason
Member Author

It seems like it's very hard for databases to do what I want (presence or absence of ANY tag), because it would require a boolean index of every possible tag on every event.

You could have a partial index for only presence, or only absence, but even then you'd have to have it for every possible tag.

The only way it seems doable is to do a full table scan of all events. Maybe some database genius here knows differently.

@kennethstarkrl

> It seems like it's very hard for databases to do what I want (presence or absence of ANY tag), because it would require a boolean index of every possible tag on every event.

Agree. That's the only reason I suggested expanding it to an exclude filter for all filterable items looking ahead to other potential benefits of an exclude filter, like the hashtag trending feature you're thinking about. It doesn't really seem like there's a good way to do something like that without it.

If maybe that's too complex I'd be fine with the easiest route for now, but an exclude filter would be a great future add on.

@alexgleason
Member Author

#683 proposes a presence filter, with a syntax we haven't seen yet.

@alexgleason
Member Author

One of primal.net's 20-or-so databases has an "is_reply" (0 or 1) column, which is pretty interesting.

image
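
A sketch of the general idea behind such a column, using SQLite: the flag is computed once when the event is stored, so the root-only feed becomes a plain indexed lookup. Only the column name `is_reply` comes from the comment above; the rest of the schema is invented for illustration and is not primal.net's actual database layout.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (
    id TEXT PRIMARY KEY,
    kind INTEGER NOT NULL,
    created_at INTEGER NOT NULL,
    is_reply INTEGER NOT NULL  -- 0 or 1, computed once at insert time
);
CREATE INDEX events_root_feed ON events (is_reply, kind, created_at);
""")


def store(event: dict) -> None:
    is_reply = int(any(tag and tag[0] == "e" for tag in event["tags"]))
    conn.execute(
        "INSERT INTO events VALUES (?, ?, ?, ?)",
        (event["id"], event["kind"], event["created_at"], is_reply),
    )


for ev in [
    {"id": "a", "kind": 1, "created_at": 100, "tags": []},
    {"id": "b", "kind": 1, "created_at": 200, "tags": [["e", "a"]]},
    {"id": "c", "kind": 1, "created_at": 300, "tags": [["t", "nostr"]]},
]:
    store(ev)

root_ids = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM events WHERE is_reply = 0 AND kind = 1 "
        "ORDER BY created_at DESC"
    )
]
print(root_ids)
```

The trade-off is that this only answers the one question it was built for (presence of an "e" tag), not presence or absence of arbitrary tags.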

@alexgleason alexgleason changed the title Filtering events that are not replies (empty array filter is ambiguous) Filtering events that are not replies (presence & absence filters) Aug 10, 2023
@alexgleason
Member Author

After some more research I think it is very possible for databases to achieve presence and absence filters. It's harder for some databases than others, and only particular tags would be able to support this. I think it should not be expected to be a standard feature of Nostr, and only something that particular relays implement. But I do think a way to represent the intent is needed. So I think there should be a NIP for this.

How about this syntax:

  • events with one or more "e" tags: {"presence": ["e"]}
  • events without any "e" tags: {"absence": ["e"]}
  • events with an "e" tag and without any "p" tags: {"presence": ["e"], "absence": ["p"]}

Other notes:

  • {"presence": ["e"], "absence": ["e"]} (presence and absence of same tag) always yields an empty array []
  • ???
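
A sketch of how the proposed keys could be evaluated against a single event, assuming both lists are ANDed as in the examples above. The function names are hypothetical and this is not taken from any relay.

```python
def tag_names(event: dict) -> set:
    return {tag[0] for tag in event.get("tags", []) if tag}


def matches_presence_absence(event: dict, filt: dict) -> bool:
    names = tag_names(event)
    has_all_required = all(n in names for n in filt.get("presence", []))
    has_no_forbidden = not any(n in names for n in filt.get("absence", []))
    return has_all_required and has_no_forbidden


root = {"tags": [["t", "nostr"]]}
reply = {"tags": [["e", "abc"], ["p", "pk1"]]}
quote_like = {"tags": [["e", "abc"]]}

print(matches_presence_absence(reply, {"presence": ["e"]}))
print(matches_presence_absence(root, {"absence": ["e"]}))
print(matches_presence_absence(quote_like, {"presence": ["e"], "absence": ["p"]}))
```

A filter that lists the same tag under both keys can never match, which is consistent with the "always yields an empty array" note.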

I will open an MR for a NIP at some point. There are bigger tofu to fry at the moment.

@arthurfranca
Contributor

> I think it should not be expected to be a standard feature of Nostr, and only something that particular relays implement.

@alexgleason Why do you think that?

#772 adds a without_tag:e NIP-50 search extension. The problem is that no relay is going to implement it other than the OP's own relay.

I could add ["REQ", <sub_id>, { "kinds": [1], ..., "nip17": { "isRoot": true } }] (and "isReply") to NIP-17 but not all relays would implement it.

For a client to ask just for root events (tag absence) or just for reply events (tag presence), it needs confidence that ALL/MOST relays implement the filter, especially considering that most of the time the client shouldn't choose the relays it prefers but should instead pick strictly what is inside NIP-65 events or other relay hints. That's why it should be a NIP-01 addition, or else no client is going to use it.

@alexgleason
Member Author

I'm planning to use the syntax like { "presence": ["p"] } and { "kinds": [0], "absence": ["e"] } in my relay.

I would approve a NIP for presence and absence filters.

@mattn
Member

mattn commented Oct 24, 2023

I propose adding another new filter, #:

{"#": ["g"]}

If # is given in a filter, REQ returns only events that contain all of the listed tag names.

motivation: the current specification cannot find only the events that have a g tag.

events that have geohash

{"#": ["g"]}

events that have e and p both

{"#": ["e", "p"]}

@fiatjaf
Member

fiatjaf commented Oct 24, 2023

That complicates the queries on the relay side.

How about using a different kind for events that are always expected to have g, if, for example, you want to make a map or something that relies on g tags being present?

That's the purpose of kinds.

@alexgleason
Member Author

The fact multiple devs have independently decided they need presence filters indicates a pain-point in the protocol. The workarounds are not great, or impossible, to do solely on the client.

@fabianfabian

I prefer a different kind for replies, so for example kind 1 for roots and kind 11 for replies, but this would break everything so maybe a flag day 6 months from now would help.

@fiatjaf
Member

fiatjaf commented Oct 24, 2023

@alexgleason the protocol has multiple pain-points that come from the fact that Nostr isn't a centralized MongoDB.

Our goal should be to work around them in a way that doesn't introduce code bloat, performance issues or complexity that results in centralization.

Also there are many more clients that work perfectly well and didn't need this.

@fiatjaf
Member

fiatjaf commented Oct 24, 2023

@fabianfabian in retrospect I also think it would have been better to use a different kind for replies, but I wouldn't want to change that at this point.

However we could try to use different kinds for different use cases from now on. I'm interested in learning what is the concrete use case of @mattn and @alexgleason for wanting these features so we can come up with a solution together that can be standardized -- I'm pretty sure it can be done with either a new kind or a new normal tag, or both, without having to change the relay query language.

@jb55
Contributor

jb55 commented Oct 25, 2023 via email
