Filtering events that are not replies (presence & absence filters) #523

alexgleason · 2023-05-12T18:44:54Z

I want to fetch only events that are not replies, ie they do not contain any "e" tags. Using the filter { "#e": [] }, the empty array is ignored, and I receive events from relays that contain "e" tags.

Passing an empty array to a tag filter is ambiguous, and I bet different relay software handles it differently. So I think it should be specified, and that an empty array should return events which specifically do not contain that tag.

I'm trying to adapt the following design to Nostr, where top-level posts are displayed in a separate tab from replies.

fiatjaf · 2023-05-12T20:02:47Z

@scsibug @hoytech @cameri @brugeman what do you think?

cameri · 2023-05-12T23:29:32Z

Nostream and nostr-rs-relay (IIRC) match no events when passing an empty array for a generic tag filter... so I don't see that behaviour as useful, so what @alexgleason proposes makes sense to me.

scsibug · 2023-05-13T00:53:15Z

I will do some testing tomorrow to see what the query and performance looks like.

Agree with @cameri, this is a useless query to do today (returns zero events); however I think that is the logical behavior (tag match requires at least one member of the provided list match, so a zero element list implies no match).

My first thought is we should leave it as is, and searching for the absence of a tag is not a path I want to go down. But I will test and think on it a bit more. I definitely understand the value of the proposal.

alexgleason · 2023-05-13T01:46:25Z

I think you're right. My OP was based on the idea relays were ignoring the empty "#e" filter and returning all events, but that does not seem to be the case. I agree it's logical to not return any events.

I'll filter the events client-side for now.
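The client-side workaround mentioned here can be sketched roughly as follows. This is only an illustration: the helper names `is_reply` and `top_level_only` are made up for the example, and it assumes, as this issue does, that any event carrying an "e" tag counts as a reply.

```python
from typing import Any

Event = dict[str, Any]


def is_reply(event: Event) -> bool:
    """Treat an event as a reply if it carries at least one "e" tag (assumption)."""
    return any(tag and tag[0] == "e" for tag in event.get("tags", []))


def top_level_only(events: list[Event]) -> list[Event]:
    """Drop replies after the relay has already sent them."""
    return [ev for ev in events if not is_reply(ev)]


# Hypothetical sample events, trimmed to the fields used here.
events = [
    {"id": "a1", "kind": 1, "tags": [], "content": "root post"},
    {"id": "b2", "kind": 1, "tags": [["e", "a1"], ["p", "pk"]], "content": "a reply"},
    {"id": "c3", "kind": 1, "tags": [["t", "bitcoin"]], "content": "root with hashtag"},
]
roots = top_level_only(events)
```

The cost of this approach is that every reply is still downloaded before being discarded.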

mikedilger · 2023-05-13T04:03:40Z

Filter sets can either be (1) missing, (2) empty or (3) have contents. Those can have three different meanings.

But I am wary of assigning special meaning to the empty array. It would also break gossip's current code quite badly.

(1) Missing from the filter means "do not filter on this field".
(2) Having contents means "filter by returning only events where the value matches one of these values"
(3) Specifying an empty set currently takes on the latter meaning, and works just like "WHERE tagvalue IN ()" giving zero events.
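The three cases above can be sketched as a small matcher. This is an illustration of the semantics as described in this comment, not code from gossip or any relay; `matches_tag_filter` is a hypothetical name.

```python
from typing import Any


def matches_tag_filter(event: dict[str, Any], flt: dict[str, Any], tag_name: str) -> bool:
    """Evaluate one "#<tag_name>" condition of a filter against an event."""
    key = f"#{tag_name}"
    if key not in flt:
        # Missing: do not filter on this tag at all.
        return True
    wanted = set(flt[key])
    present = {
        tag[1]
        for tag in event.get("tags", [])
        if len(tag) >= 2 and tag[0] == tag_name
    }
    # With contents: at least one value must match.
    # Empty list: nothing can match, like "WHERE tagvalue IN ()".
    return not wanted.isdisjoint(present)


reply = {"tags": [["e", "abc"]]}
root = {"tags": []}
```

Under these rules `{"#e": []}` rejects both `reply` and `root`, which is why the empty array cannot currently be used to ask for events without "e" tags.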

cameri · 2023-05-13T14:22:09Z

> Filter sets can either be (1) missing, (2) empty or (3) have contents. Those can have three different meanings.
>
> But I am wary of assigning special meaning to the empty array. It would also break gossip's current code quite badly.
>
> (1) Missing from the filter means "do not filter on this field".
>
> (2) Having contents means "filter by returning only events where the value matches one of these values"
>
> (3) Specifying an empty set currently takes on the latter meaning, and works just like "WHERE tagvalue IN ()" giving zero events.

We could use null instead and it will be more meaningful.

"#e":null

hoytech · 2023-05-15T15:49:02Z

I think NIP-01 is pretty clear on what should happen:

> At least one of the array's values must match the relevant field in an event for the condition itself to be considered a match.

But yeah, an empty array is useless and a pretty common source of misunderstanding. I almost want to make it throw an error here "your query is broken".

Filtering for events that have 0 e tags might need a new index.

fiatjaf · 2023-05-15T18:47:47Z

Does it damage anyone if we do the null thing? All relay implementations will return nothing when seeing such a query right now, I imagine. Maybe make this a separate NIP?

scsibug · 2023-05-15T20:44:34Z

I think it is a very different kind of query than we do in the other cases - so like @hoytech mentioned, it could necessitate a new index. I would need a subquery to deal with it, which isn't great for performance, but a proper benchmark is still on my short term todo list

I think it is simpler to not have different behavior for null, and would prefer to just throw an error. I would like to see more use cases if possible to justify this search-for-no-tags option.
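To make the subquery concern concrete, here is a minimal sketch using an in-memory SQLite database. The two-table schema (`event`, `tag`), the index, and the function name `events_without_tag` are assumptions made for illustration; they are not the actual schema of nostr-rs-relay or any other relay.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE event (id TEXT PRIMARY KEY, kind INTEGER NOT NULL);
    CREATE TABLE tag (
        event_id TEXT NOT NULL REFERENCES event(id),
        name TEXT NOT NULL,
        value TEXT NOT NULL
    );
    -- Lets the NOT EXISTS probe below use an index lookup per event.
    CREATE INDEX tag_event_name ON tag (event_id, name);
    """
)
conn.executemany("INSERT INTO event VALUES (?, ?)", [("a1", 1), ("b2", 1), ("c3", 1)])
conn.executemany(
    "INSERT INTO tag VALUES (?, ?, ?)",
    [("b2", "e", "a1"), ("c3", "t", "bitcoin")],
)


def events_without_tag(db: sqlite3.Connection, kind: int, tag_name: str) -> list[str]:
    """Return ids of events of the given kind that have no tag named tag_name."""
    rows = db.execute(
        """
        SELECT e.id FROM event AS e
        WHERE e.kind = ?
          AND NOT EXISTS (
              SELECT 1 FROM tag AS t
              WHERE t.event_id = e.id AND t.name = ?
          )
        ORDER BY e.id
        """,
        (kind, tag_name),
    )
    return [row[0] for row in rows]
```

Unlike a normal tag filter, this query cannot start from the tag index: it walks candidate events and probes the tag table for each one, which is the performance worry raised here.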

alexgleason · 2023-05-15T20:55:16Z

> I would like to see more use cases

The media tab wants only events which have any media tag, but it doesn't care what its value is.

Let's pretend for a moment we implemented media attachments the right way on Nostr, by using an "m" tag. I want to filter out events which do not contain an "m" tag.

I have no idea what that filter looks like. #m: ["*"]? This is getting more complicated.

The more I think about it, I realize my own ask in this issue is flawed. But maybe there's some way to extend filters in a way that's more flexible and makes sense.

arthurfranca · 2023-05-15T20:59:07Z

This could be achieved for kind 1 events if clients agree to using a depth tag.

In my opinion, an interesting feed would include not only root events (like this issue says, with no e tag) but also direct descendant replies (with only one e tag). In other words, depth 0 and depth 1.

alexgleason · 2023-05-17T03:17:35Z

I'm going to close this. Thank you all for your feedback and ideas. There's maybe still a problem to be solved here, but it's not the one that's stated.

offbyn · 2023-06-18T07:09:18Z

For a global feed it would make a lot of sense to have some way to query events without an e tag. On some relays I need to fetch over 400 events before getting a non-reply event. This seems like a huge waste of bandwidth.

Regardless of what the exact syntax would be (empty array, null, array with 1 null item, new tag), I strongly think this "Filtering events that are not replies" should somehow be possible.

arthurfranca · 2023-06-18T10:53:17Z

> I need to fetch over 400 events before getting a non-reply event. This seems like huge waste of network traffic.

Over 400! This issue should be reopened =0? @alexgleason

On this brief discussion I mentioned this possible solution:

What about pushing for this addition: ["REQ", <subscription_id>, { "kinds": [1], "count": { "#a": 0, "#e": 0 } }]

I suspect relay databases won't have a hard time storing and indexing one-letter tag occurrence counts.

fiatjaf · 2023-06-18T15:25:35Z

Not a bad idea.

arthurfranca · 2023-06-18T17:55:45Z

Two different versions supporting OR query:

1) ["REQ", <subscription_id>, { "count": { "#t": [1, 2, 3] }, "#t": ["bitcoin"] }] // example: notes with bitcoin topic but ignore notes with too many hashtags
~~2) ["REQ", <subscription_id>, { "#t": [0, "bitcoin"] }] // example: no topic or bitcoin topic~~

1) is an improvement but doesn't support, for example, 0 #t occurrences OR "bitcoin", because "count.#t" and "#t" form an AND clause (although it would be possible with 2 separate filters)
2) ~~is better but would most likely break things so I shouldn't have mentioned~~ <- wouldn't work for #t count AND #t value

staab · 2023-06-18T19:38:23Z

Could also do something like #t: "<3". Wasn't some "runes" spec proposed?

arthurfranca · 2023-06-19T15:15:24Z

@staab The runes thing was considered too complex. NIP-26 used a simpler version so no >= nor <= for example.
If not using integers, the above 2) option wouldn't be possible.

a) It would lead us to this: ["REQ", <subscription_id>, { "count": { "#t": ">-1&<4", "#e": "=1" } }]. (>-1 so as to include 0)
b) Or even this: ["REQ", <subscription_id>, { "count": ["#t>-1&#t<4&#e=1", "#o<2"] }], allowing different keys at the same array index.

a) Is easier. #t count AND #e count must match
b) Is more complex. (#t count AND #e count match) OR (#o count match)

arthurfranca · 2023-06-19T15:18:27Z

So the question is, which one is best: 1), 2), a) or b)?

staab · 2023-06-19T15:35:19Z

I agree the runes were too complex, but a simpler version might work here. Of all of these options #1 is probably best, but count is a weird key to use, since search seems similar but does something completely different. I kind of want to go with #t: "<3". It's clear, backwards compatible (other than maybe causing relays to reject the filter or crash), and extensible.

arthurfranca · 2023-06-19T17:17:30Z

> I kind of want to go with #t: "<3"

Without the count key? But then you can't have #t: "<3" and #t: ["bitcoin", "cat"] together in the same filter.

staab · 2023-06-19T17:55:33Z

[{"#t": ["bitcoin"]}, {"#t": "<3"}]?

arthurfranca · 2023-06-19T18:29:48Z

Wait, I think NIP-01 says like ["REQ", <subscription_id>, {}, {}, ...] instead of ["REQ", <subscription_id>, [ {}, {}, ... ]] (not an array of {} filters?) and each {} is a separate filter each with its own "limit", "since" etc.

Sorry, I'm confused. What would your example look like, as in ["REQ", <subscription_id>, ??], so that the relay returns only events that simultaneously have a "bitcoin" t tag and fewer than 3 t tags?

staab · 2023-06-19T19:13:59Z

Sorry yeah, that's what I mean. And you're right, that's an OR, not an AND, it would be a firehose. So my example won't work.
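The OR-versus-AND point can be sketched as follows. This is a simplified illustration that handles only `kinds` and `#<tag>` conditions; `matches_filter` and `matches_req` are hypothetical names, and the second filter uses `kinds` as a stand-in because the `"<3"` syntax is only a proposal.

```python
from typing import Any

Event = dict[str, Any]
Filter = dict[str, Any]


def matches_filter(event: Event, flt: Filter) -> bool:
    """All conditions inside one filter must hold (AND)."""
    if "kinds" in flt and event["kind"] not in flt["kinds"]:
        return False
    for key, wanted in flt.items():
        if key.startswith("#"):
            values = {t[1] for t in event["tags"] if len(t) >= 2 and t[0] == key[1:]}
            if values.isdisjoint(wanted):
                return False
    return True


def matches_req(event: Event, filters: list[Filter]) -> bool:
    """An event is returned if any filter of the REQ matches (OR)."""
    return any(matches_filter(event, flt) for flt in filters)


unrelated = {"kind": 1, "tags": [["t", "cats"]]}

# Two separate filters: the broad second filter lets everything through.
as_two_filters = matches_req(unrelated, [{"#t": ["bitcoin"]}, {"kinds": [1]}])
# One combined filter: both conditions must hold.
as_one_filter = matches_req(unrelated, [{"#t": ["bitcoin"], "kinds": [1]}])
```

Here `as_two_filters` is true while `as_one_filter` is false, which is why splitting a count condition into its own filter would turn the subscription into a firehose.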

arthurfranca · 2023-06-20T14:14:45Z

Ok I removed the options that I think had problems (I can explain why if needed).
So which option below would you pick, considering that the worst relay database you can imagine should support it (it would have to be able to store and index one-letter tag occurrence counts and fulfill the query)? And how do you feel about adding it to NIP-01?
@alexgleason @fiatjaf @staab @scsibug @cameri @hoytech @mikedilger

Options to query by one-letter tag occurrence count:
a) ["REQ", <sub_id>, { "count": { "#t": [1, 2, 3], "#e": [1] } }] // (array#t of ORs) AND (array#e of ORs)
b) ["REQ", <sub_id>, { "count": { "#t": ">-1&<4", "#e": "=1" } }] // (range/equal#t) AND (range/equal#e)
c) ["REQ", <sub_id>, { "count": ["#t>-1&#t<4&#e=1", "#o<2"] }] // (range/equal#t AND range/equal#e) OR (range/equal#o)
d) Other, tell us
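As a rough illustration of how option a) might be evaluated, here is a sketch of the matching logic. The "count" key is only a proposal from this thread and is not part of any NIP; `matches_count` is a hypothetical name.

```python
from collections import Counter
from typing import Any


def matches_count(event: dict[str, Any], count_filter: dict[str, list[int]]) -> bool:
    """Option a): each listed tag's occurrence count must be one of the allowed values."""
    counts = Counter(tag[0] for tag in event.get("tags", []) if tag)
    # Counter returns 0 for tag names that never occur, so 0 can be requested.
    return all(counts[key[1:]] in allowed for key, allowed in count_filter.items())


root = {"kind": 1, "tags": [["t", "bitcoin"]]}
reply = {"kind": 1, "tags": [["e", "abc"], ["t", "bitcoin"]]}
many_hashtags = {"kind": 1, "tags": [["t", str(i)] for i in range(10)]}
```

With this reading, `{"#e": [0]}` selects root events only, and `{"#t": [1, 2, 3]}` skips notes with too many hashtags. A relay would still need to store or compute per-tag counts to answer it efficiently.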

alexgleason · 2023-06-20T14:16:56Z

> And how do you feel about adding it to NIP-01?

Nah I want to check if it's present in supported_nips before attempting the query. I like your idea, though.

arthurfranca · 2023-06-20T14:19:53Z

@alexgleason a) b) or c) idea? 🙃️

fiatjaf · 2023-07-23T13:27:57Z

These ideas are going way too far in treating relays like databases. These things you're coming up with are basically mongodb queries. Highly centralizing.

fiatjaf · 2023-07-23T13:31:50Z

By the way, the q tag people are using for quoted events already solves this issue. If everybody starts using that then every kind1 with an e tag will already be a reply, so there is no need for any of this.

arthurfranca · 2023-07-23T21:56:49Z

> I need to fetch over 400 events before getting a non-reply event. This seems like huge waste of network traffic.

@fiatjaf this issue is about fetching only root events.

kennethstarkrl · 2023-07-23T23:22:31Z

> These ideas are going way too far in treating relays like databases. These things you're coming up with are basically mongodb queries. Highly centralizing.

How is querying relays centralizing? Clients can remove/hide anything anyways. Relays are just databases are they not?

> By the way, the q tag people are using for quoted events already solves this issue. If everybody starts using that then every kind1 with an e tag will already be a reply, so there is no need for any of this.

The problem is not everyone does and not everyone will.

mikedilger · 2023-07-24T00:08:08Z

I didn't like any of the proposals in this discussion except the ones from way back. I don't like the term "count" (confusing) or "length". Honestly I think that's just too complex for relays and does too much. This issue was about getting events that are not replies, and I think a simple solution would be good enough for now without locking us out of some more advanced approach like a runes-based approach later on.

Here are the simple things you can put in your "#e" query:

1. Don't have one. This means events are not filtered by 'e' tag.
2. Have one and list specific values. This means only events with one of those specific values will be returned.
3. Have one and specify an empty array. Currently this means that nothing comes back (no event has an e tag that is inside of the empty set)
4. Have one and specify it as null.

I'm in favor of (4) meaning "please give me only events that do not have 'e' tags".

That's simple. It doesn't introduce a bunch of stuff that is hard to reason about and hard to code into relays. And it solves the problem this issue was opened to solve.

alexgleason · 2023-07-24T19:14:42Z

Related: I would like to get all events within a time range which contain ANY hashtag. I would use this to calculate trending hashtags.

It's not about "count", it's about the "presence" or "absence" of a specific tag.

I want to get all events with the ABSENCE of ANY "e" tag (top-level posts only)
I want to get all events with the PRESENCE of ANY "e" tag (replies only)
I want to get all events with the PRESENCE of ANY "t" tag (posts that contain a hashtag)

alexgleason · 2023-07-24T19:20:07Z

A simple way to do this might be something like:

PRESENCE filter - { "*e": true }
ABSENCE filter - { "!e": true }

kennethstarkrl · 2023-07-24T22:50:36Z

Considering that events do have an empty array, I think the easiest option would be for a filter with { "#e": [ ] } to return only events with an empty tag array.

alexgleason · 2023-07-25T17:26:31Z

It seems like it's very hard for databases to do what I want (presence or absence of ANY tag), because it would require a boolean index of every possible tag on every event.

You could have a partial index for only presence, or only absence, but even then you'd have to have it for every possible tag.

The only way it seems doable is to do a full table scan of all events. Maybe some database genius here knows differently.

kennethstarkrl · 2023-07-25T19:12:54Z

> It seems like it's very hard for databases to do what I want (presence or absence of ANY tag), because it would require a boolean index of every possible tag on every event.

Agree. That's the only reason I suggested expanding it to an exclude filter for all filterable items looking ahead to other potential benefits of an exclude filter, like the hashtag trending feature you're thinking about. It doesn't really seem like there's a good way to do something like that without it.

If maybe that's too complex I'd be fine with the easiest route for now, but an exclude filter would be a great future add on.

alexgleason · 2023-07-26T12:38:06Z

#683 proposes a presence filter, with a syntax we haven't seen yet.

alexgleason · 2023-08-07T18:29:17Z

One of primal.net's 20-or-so databases has an "is_reply" (0 or 1) column, which is pretty interesting.

alexgleason · 2023-08-10T17:08:11Z

After some more research I think it is very possible for databases to achieve presence and absence filters. It's harder for some databases than others, and only particular tags would be able to support this. I think it should not be expected to be a standard feature of Nostr, and only something that particular relays implement. But I do think a way to represent the intent is needed. So I think there should be a NIP for this.

How about this syntax:

events with one or more "e" tags: {"presence": ["e"]}
events without any "e" tags: {"absence": ["e"]}
events with an "e" tag and without any "p" tags: {"presence": ["e"], "absence": ["p"]}

Other notes:

{"presence": ["e"], "absence": ["e"]} (presence and absence of same tag) always yields an empty array []
???
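A minimal sketch of how the proposed "presence" and "absence" keys could be evaluated against a single event. The syntax is the one proposed in this comment and is not a standard filter field; the function name is hypothetical.

```python
from typing import Any


def matches_presence_absence(event: dict[str, Any], flt: dict[str, Any]) -> bool:
    """Check the proposed "presence" and "absence" keys against one event."""
    names = {tag[0] for tag in event.get("tags", []) if tag}
    required = set(flt.get("presence", []))
    forbidden = set(flt.get("absence", []))
    # Every required tag name must occur, and no forbidden tag name may occur.
    return required <= names and names.isdisjoint(forbidden)


root = {"tags": [["t", "bitcoin"]]}
reply = {"tags": [["e", "abc"], ["p", "pk"]]}
quote_without_p = {"tags": [["e", "abc"]]}
events = [root, reply, quote_without_p]

contradictory = [
    ev for ev in events
    if matches_presence_absence(ev, {"presence": ["e"], "absence": ["e"]})
]
```

As noted above, asking for both the presence and the absence of the same tag can never match, so `contradictory` ends up empty.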

I will open an MR for a NIP at some point. There are bigger tofu to fry at the moment.

arthurfranca · 2023-09-11T22:09:20Z

> I think it should not be expected to be a standard feature of Nostr, and only something that particular relays implement.

@alexgleason Why do you think that?

#772 adds without_tag:e NIP-50 search extension. Problem is no relay is going to implement it other than the OP's own relay.

I could add ["REQ", <sub_id>, { "kinds": [1], ..., "nip17": { "isRoot": true } }] (and "isReply") to NIP-17 but not all relays would implement it.

For a client to ask just for root events (tag absence) or just for reply events (tag presence), it needs confidence that ALL/MOST relays implement the filter, especially considering that most of the time the client shouldn't choose the relays it prefers but instead pick strictly what is inside NIP-65 events or other relay hints. That's why it should be a NIP-01 addition, or else no client is going to use it.

alexgleason · 2023-09-11T22:19:04Z

I'm planning to use the syntax like { "presence": ["p"] } and { "kinds": [0], "absence": ["e"] } in my relay.

I would approve a NIP for presence and absence filters.

mattn · 2023-10-24T14:59:48Z

I propose adding another new filter, #

{"#": ["g"]}

If # is given in a filter, REQ returns only events that contain all of the listed tag names.

motivation: the current specification cannot select only the events that have a g tag.

events that have geohash

{"#": ["g"]}

events that have e and p both

{"#": ["e", "p"]}

fiatjaf · 2023-10-24T18:14:01Z

That complicates the queries on the relay side.

How about using a different kind for events that are always expected to have g, if, for example, you want to make a map or something that relies on g tags being present?

That's the purpose of kinds.

alexgleason · 2023-10-24T19:29:26Z

The fact multiple devs have independently decided they need presence filters indicates a pain-point in the protocol. The workarounds are not great, or impossible, to do solely on the client.

fabianfabian · 2023-10-24T19:45:42Z

I prefer a different kind for replies, so for example kind 1 for roots and kind 11 for replies, but this would break everything so maybe a flag day 6 months from now would help.

fiatjaf · 2023-10-24T21:12:28Z

@alexgleason the protocol has multiple pain-points that come from the fact that Nostr isn't a centralized MongoDB.

Our goal should be to work around them in a way that doesn't introduce code bloat, performance issues or complexity that results in centralization.

Also there are many more clients that work perfectly well and didn't need this.

fiatjaf · 2023-10-24T21:14:48Z

@fabianfabian in retrospect I also think it would have been better to use a different kind for replies, but I wouldn't want to change that at this point.

However we could try to use different kinds for different use cases from now on. I'm interested in learning what is the concrete use case of @mattn and @alexgleason for wanting these features so we can come up with a solution together that can be standardized -- I'm pretty sure it can be done with either a new kind or a new normal tag, or both, without having to change the relay query language.

jb55 · 2023-10-25T01:26:25Z

On Tue, Oct 24, 2023 at 02:15:02PM -0700, fiatjaf_ wrote:

> However we could try to use different kinds for different use cases from now on. I'm interested in learning what is the concrete use case of @mattn and @alexgleason for wanting these features so we can come up with a solution together that can be standardized -- I'm pretty sure it can be done with either a new kind or a new normal tag, or both, without having to change the relay query language.

one use case I ran into the other day was returning all kind1 events with hashtags so that damus could build trending hashtag stats locally. Right now we're relying on a fixed set of hashtags or simply everything which is not ideal. It's a pretty niche usecase though.
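For illustration only, local trending-hashtag stats could be computed along these lines once the events are available. This is not Damus code; the function name, the lowercasing of hashtags, and counting each hashtag once per event are all assumptions made for the sketch.

```python
from collections import Counter
from typing import Any


def trending_hashtags(events: list[dict[str, Any]], top_n: int = 3) -> list[tuple[str, int]]:
    """Count how many events mention each hashtag ("t" tag) and return the top ones."""
    counts: Counter[str] = Counter()
    for ev in events:
        # Count each hashtag at most once per event, case-insensitively (assumption).
        seen = {
            tag[1].lower()
            for tag in ev.get("tags", [])
            if len(tag) >= 2 and tag[0] == "t"
        }
        counts.update(seen)
    return counts.most_common(top_n)


# Hypothetical sample events.
sample = [
    {"kind": 1, "tags": [["t", "bitcoin"], ["t", "Bitcoin"], ["t", "nostr"]]},
    {"kind": 1, "tags": [["t", "nostr"], ["t", "bitcoin"]]},
    {"kind": 1, "tags": [["t", "bitcoin"], ["e", "abc"]]},
]
top = trending_hashtags(sample, top_n=2)
```

Without a presence filter, the client has to download events that carry no "t" tag at all just to discard them before this step.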

alexgleason changed the title ~~Requesting events that are not replies (empty array filter is ambiguous)~~ Filtering events that are not replies (empty array filter is ambiguous) May 12, 2023

alexgleason closed this as completed May 17, 2023

fiatjaf reopened this Jun 18, 2023

fiatjaf closed this as completed Jul 23, 2023

fiatjaf reopened this Jul 24, 2023

alexgleason mentioned this issue Jul 26, 2023

Querying events by tags presence #683

Open

alexgleason changed the title ~~Filtering events that are not replies (empty array filter is ambiguous)~~ Filtering events that are not replies (presence & absence filters) Aug 10, 2023

alexgleason mentioned this issue Oct 24, 2023

find tag by name #835

Closed

alexgleason mentioned this issue Mar 2, 2024

Create a generic reply kind #1096

Open

mikedilger mentioned this issue Mar 3, 2024

Empty filter array handling mikedilger/chorus#15

Open

arthurfranca mentioned this issue May 19, 2024

NIP-22 - Comment #1233

Open
