Querying events by tags presence #683

Open · wants to merge 4 commits into base: master
Conversation

@fernandolguevara (Contributor) commented Jul 26, 2023

An attempt to stop using different zoom levels for location-specific use cases.

View it

@alexgleason (Member)

This is #523

It's a presence filter specifically.

@fernandolguevara changed the title from "Querying events by tags structure" to "Querying events by tags presence" on Jul 26, 2023
@alexgleason (Member)

I think this would solve many problems and we should have it. But there are challenges in relays actually implementing it: #523 (comment)

We would have to define which specific tags get the index. We can't do all of them without a full table scan.

@arthurfranca (Contributor) commented Jul 26, 2023

> [...] there are challenges in relays actually implementing it [...]

@alexgleason Relays already have to deal with indexing arrays of values, like the array of e-tag values.
An array of the tag names present in an event would be similar, so no problem.
If the relay DB can't do a "not contains" type of query, it can instead index an array of absent tag names (A-Za-z minus the present ones), as sketched below.
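A minimal sketch of that complement trick, with illustrative names and types (nothing here is from the PR):

```typescript
// Index both the present and the absent single-letter tag names of an
// event, so a DB that only supports "contains" queries can answer
// absence filters as plain "contains" lookups against the absent list.
type NostrEvent = { tags: string[][] };

const SINGLE_LETTER =
  'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'.split('');

function tagNameIndexes(event: NostrEvent) {
  const present = [...new Set(event.tags.map(([name]) => name))].filter(
    (name) => name.length === 1,
  );
  const absent = SINGLE_LETTER.filter((name) => !present.includes(name));
  return { present, absent };
}

// "events with no g tag" then becomes: absent CONTAINS "g"
```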

@alexgleason (Member)

You would basically have to double the amount of space used by tag indexes, maybe more, since you can't use a partial index if you want both presence and absence filters. It's probably worth doing at least for "e" tags, since we have a strong and very common use case in #523. Other tags would need strong arguments in favor, I think.

Still, I think that shouldn't necessarily block this NIP from proceeding. We can nail down the API.

@arthurfranca (Contributor)

If this NIP-100 gets merged but implementing it isn't a requirement, in practice it may be as if it never existed, because no client could rely on it: not all relays would implement it. When fetching the feed, clients would probably keep requesting all notes, root or not, because many relays are potentially involved.

@fiatjaf any chance you could mark this NIP, if merged, and others like NIP-12 and NIP-20, as "Required" in the README.md NIP list? That would mean treating them as if they were part of NIP-01: the minimum NIPs that must be supported for a relay/client to be considered nostr-compatible. Also, as an exception, put them near the top just below NIP-01, despite their numbers.

@fiatjaf (Member) commented Jul 27, 2023

Me stating a thing is required doesn't cause everybody to immediately implement it.

Also, as @alexgleason said in the other issue, it's very costly for relays to implement this. I think we absolutely do not need it.

@fiatjaf (Member) commented Jul 27, 2023

I think that if your client depends on this, you're implementing something wrong. If you're following a person and want to read what they write, you want all their kind 1s. Regardless of whether you'll display everything in the UI or in different views according to the tags, you should still download everything, store it locally, and display it when appropriate.

@arthurfranca (Contributor)

> Also, as @alexgleason said in the other issue, it's very costly for relays to implement this

I disagree, or else I wouldn't be supporting this, but I don't know what DB you guys are considering.

> [...] you should still download everything, store it locally, and display it when appropriate.

Although some clients are doing this, I don't consider it efficient. But if doing everything client-side is the recommended way, I think it's OK to close this PR and the other issue.

@vitorpamplona (Collaborator)

I like this.

> I think that if your client depends on this, you're implementing something wrong.

Not exactly. There are use cases where this is a real need. If you want Global without replies, for instance, it doesn't make sense to download everything and then filter replies out.

If you are doing a map of Nostr posts with a GeoHash, it doesn't make sense to download everything and then discard everything that doesn't include a g tag (which is almost everything right now; a huge waste).
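To make the waste concrete, here is a sketch (illustrative types only) of what a map client is forced to do today without a presence filter:

```typescript
// The client downloads every kind-1 event and discards the ones lacking
// a "g" tag locally; almost all of the downloaded bytes are thrown away.
type NostrEvent = { kind: number; tags: string[][] };

function keepGeoTagged(events: NostrEvent[]): NostrEvent[] {
  return events.filter((e) => e.tags.some(([name]) => name === 'g'));
}
```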

@fiatjaf (Member) commented Jul 27, 2023

> If you want Global without replies, for instance, it doesn't make sense to download everything and then filter replies out.

I can understand this, but does anyone really want this? Sounds like some skewed preferences here. "Global without replies". Global is not a thing, and replies are not different from normal notes, technically. Should they have a different kind?

> If you are doing a map of Nostr posts with a GeoHash, it doesn't make sense to download everything and then discard everything that doesn't include a g tag (which is almost everything right now; a huge waste).

This I don't think is a valid use case (I mean, whatever, it is valid, but what I'm saying is that it doesn't fit Nostr; not all things fit Nostr if we want Nostr to remain simple). Either you are already fetching posts from people you want, storing them locally somehow, and displaying those that have g tags on a map, or you should be using the map for a more restricted set of events that aren't kind 1, like ads for a local marketplace or whatnot; in that case you expect them all to have g tags.


Maybe we should be making more kinds for different types of events and relying less on tags for indexing. Since tags are so flexible, it's easy to think they should be used for everything, but if we start doing that and relying on it, this will not end well.

@fiatjaf (Member) commented Jul 27, 2023

> I don't consider it efficient

What is not efficient? Storing events that you want locally? You think it's more efficient to load them from relays over and over, multiple times every day?

@vitorpamplona (Collaborator)

Can we come back to this, please? Has any relay tried to implement it?

@mikedilger, since you have just coded a relay, what do you think about this filter?

@arthurfranca (Contributor) commented Feb 29, 2024

> What is not efficient? Storing events that you want locally? You think it's more efficient to load them from relays over and over, multiple times every day?

"Feed" events, for example. This event set gets stale so often that when the user re-opens the app they aren't interested anymore on the previously received events. That's why i believe these events should live in memory instead of in a persistent local db.


My unreleased client's "feed" is made of root events (no e tags) and top-level replies (with one e tag), so this PR wouldn't help unless I could filter by the number of e tags.

Edit: removed ugly syntax examples. It would be great to have this, but I know it won't happen =]~

@alexgleason (Member)

It seems like the solution to everything not in NIP-01 is DVMs.

@alexgleason (Member)

I regret posting that.

@vitorpamplona (Collaborator)

> It seems like the solution to everything not in NIP-01 is DVMs.

Maybe we should indeed align our expectations about what the core protocol should solve and what a Layer 2 design (DVMs) is expected to do. If we want to keep relay development simple, we should "outsource" everything to Layer 2.

Or maybe we just create a new network of relays working on the same events but with more interesting filtering options. Clients can then choose which network they want/need to integrate with.

@alexgleason (Member)

I don't think filters are getting any more changes at this point.

I need more than just presence/absence tags, anyway. I need joins.

@vitorpamplona (Collaborator)

@Semisol had some interest in building a new type of relay with a new filtering language. I am not sure if he ended up doing anything. But we could just do a relay with regular read-only SQL as an entry point.

@vitorpamplona (Collaborator) commented Feb 29, 2024

Or maybe this idea of filters and subscriptions themselves should be turned into replaceable events. I can imagine a client signing an event containing a Nostr filter (or an SQL query) instead of using the REQ call. The relay would simply attach whatever comes in that event as a subscription to that connection. The d tag then becomes the subscription id.

@fernandolguevara (Contributor, Author) commented Mar 1, 2024

@alexgleason

> I don't think filters are getting any more changes at this point.

> I need more than just presence/absence tags, anyway. I need joins.

Can you elaborate on joins? What's your use case?

@mikedilger (Contributor) commented Mar 3, 2024

OK, I just went back and read #523 and this issue again, and I have a few more things to say:

First, I have not encountered the need to do presence or absence (or tag count!) queries. But I don't think it is unreasonable.

Second, this idea that relays will need to do complex indexing is wrong. Relays should not index for these kinds of queries at all. Neither should they do hard scans of every event. Relays should (1) require that such filters also contain other fields that already narrow the event set down to something reasonable, or else reject the filter as a scraper; (2) load all the events, ignoring the new presence/absence filter specifications; and (3) post-filter all matching events with these new fields. Sure, you loaded more events than you needed and then stripped them back... but that is far less resource-consuming than sending them over the network and having the client strip them back. Basically it just pushes that filter operation to the relay to save on network bandwidth.
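A rough sketch of those three steps; the filter keys has/hasNot and the loadFromIndexes helper are hypothetical stand-ins, not the PR's actual syntax:

```typescript
type Filter = {
  authors?: string[];
  kinds?: number[];
  has?: string[];    // hypothetical presence field
  hasNot?: string[]; // hypothetical absence field
};
type NostrEvent = { tags: string[][] };

function handleReq(
  filter: Filter,
  loadFromIndexes: (f: Filter) => NostrEvent[], // existing NIP-01 query path
): NostrEvent[] {
  // (1) Require other fields that narrow the event set, or reject as a scraper.
  if (!filter.authors?.length && !filter.kinds?.length) {
    throw new Error('filter too broad: refusing presence/absence scan');
  }
  // (2) Load events using only the fields the existing indexes understand.
  const candidates = loadFromIndexes({ authors: filter.authors, kinds: filter.kinds });
  // (3) Post-filter on tag presence/absence before sending over the wire.
  return candidates.filter((e) => {
    const names = new Set(e.tags.map(([n]) => n));
    return (
      (filter.has ?? []).every((n) => names.has(n)) &&
      (filter.hasNot ?? []).every((n) => !names.has(n))
    );
  });
}
```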

That being said, if a crafty relay developer wants to index these to boast about hyper-fast performance, that's fine, but we don't need to design for that case. And seriously, if someone sends a "give me all events that don't have a geo tag" filter, were you really going to send them 99.9% of all the events in your database? I don't think so.

I don't like modifying the "#e" to be a non-array (e.g. having a 'null' option). I prefer this PR's method of adding a new field.

Clients SHOULD check NIP-11 before using the new field, as sketched below. But also, the rule for relays ought to be "if you see a filter field you do not recognize, that is an error". I don't know if that was codified elsewhere, but I think it should be.
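A sketch of that NIP-11 check; using 100 as the advertised NIP number is an assumption based on this PR's number:

```typescript
// Fetch the relay's NIP-11 info document over HTTP(S) and look for the
// NIP in its supported_nips list before sending the new filter field.
async function supportsTagPresenceFilter(relayUrl: string): Promise<boolean> {
  const httpUrl = relayUrl.replace(/^ws/, 'http'); // wss:// -> https://
  const res = await fetch(httpUrl, {
    headers: { Accept: 'application/nostr+json' },
  });
  const info = await res.json();
  return Array.isArray(info.supported_nips) && info.supported_nips.includes(100);
}
```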

I don't follow the need to count the number of "e" tags, especially if we are moving to "q" tags.

I think this PR is pretty close as is. I'll add it to my relay if there is the momentum to do it (not too easy for me as I have meticulous memory layouts and detailed parsing to update).

EDIT: I don't think this NIP should become required or part of the core of nostr. It will be okay if most relays don't implement it. Clients will have to deal with errors from relays that don't accept the new field. BUT we probably do have to push through one small required change, which is to make those errors machine-readable (a new prefix) and to specify that relays must reject filters with fields they do not recognize (I didn't check the current NIPs; maybe that is already there).

@arthurfranca (Contributor)

> it just pushes that filter operation to the relay to save on network bandwidth

Interesting take. A caveat is that it may mess with the "limit" filter: if a client asks for limit: 1, the relay fetches 1 record from the DB, filters that one out after running the presence/absence/tag-count checks in memory, and returns 0 records, when there could have been a matching item in the DB. Could be a good trade-off.
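One way a relay could compensate, sketched with a hypothetical paging helper: keep paging the DB until enough post-filtered matches survive, trading extra reads for a correct "limit".

```typescript
type NostrEvent = { tags: string[][] };

function fetchUntilLimit(
  limit: number,
  fetchPage: (offset: number, count: number) => NostrEvent[],
  matches: (e: NostrEvent) => boolean, // in-memory presence/absence check
): NostrEvent[] {
  const out: NostrEvent[] = [];
  for (let offset = 0; out.length < limit; offset += limit) {
    const page = fetchPage(offset, limit);
    if (page.length === 0) break; // DB exhausted
    out.push(...page.filter(matches));
  }
  return out.slice(0, limit);
}
```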

> the rule for relays ought to be "if you see a filter field you do not recognize, that is an error"

That's the part I disagree with. If incompatible relays simply ignore the unknown filter field and apply just the ones they understand, the client can still apply the extra filter client-side. The client would still have the option of checking NIP-11 to skip incompatible relays if it prefers not to re-filter client-side.

@mikedilger (Contributor)

> > it just pushes that filter operation to the relay to save on network bandwidth
>
> Interesting take. A caveat is that it may mess with the "limit" filter: if a client asks for limit: 1, the relay fetches 1 record from the DB, filters that one out after running the presence/absence/tag-count checks in memory, and returns 0 records, when there could have been a matching item in the DB. Could be a good trade-off.

Oh right.

> > the rule for relays ought to be "if you see a filter field you do not recognize, that is an error"
>
> That's the part I disagree with. If incompatible relays simply ignore the unknown filter field and apply just the ones they understand, the client can still apply the extra filter client-side. The client would still have the option of checking NIP-11 to skip incompatible relays if it prefers not to re-filter client-side.

The problem I'm worried about is a client specifying a new filter field, which the relay doesn't understand, in order to prune the search to something reasonable, and the relay skipping that new filter and dumping a massive number of events on the client.

@alexgleason (Member)

Instead of adding new filter properties, we can include an extension to NIP-50's search property, like:

{ "search": "has:#e" }

This fixes everything IMO.

@staab (Member) commented Mar 6, 2024

> Instead of adding new filter properties, we can include an extension to NIP-50's search property

Special syntax for searches is super annoying because it mixes data with code. What if someone wants to search for a note that includes "has:#e"? Why not just add a new filter property, e.g. has: ["e", "a"]?
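For illustration (this key is hypothetical; no NIP defines it), a request for notes carrying both e and a tags would then look like:

{ "kinds": [1], "has": ["e", "a"] }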

@alexgleason (Member)

It's already part of NIP-50: https://github.com/nostr-protocol/nips/blob/master/50.md#extensions

To search for a note with "has:#e" in the text, you'd do this:

{ "search": "\"has:#e\"" }

The functionality in question (filter by tag presence/absence) IS a search functionality. It makes most sense for search relays to implement it.

Also, the key:value syntax is common among search engines and is even used by Postgres and SQLite FTS. You have to quote strings for them not to be treated as search tokens, as sketched below.
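A sketch of that quoting step, with a hypothetical helper; the doubled-quote rule follows SQLite FTS5 string syntax:

```typescript
// Wrap raw user input in double quotes, doubling any embedded quotes, so
// a literal query like has:#e is treated as text rather than an operator.
function quoteSearchTerm(input: string): string {
  return `"${input.replace(/"/g, '""')}"`;
}

// quoteSearchTerm('has:#e') === '"has:#e"'  =>  { "search": "\"has:#e\"" }
```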

@alexgleason (Member)

> Why not just add a new filter property, e.g. has: ["e", "a"]?

Because after a lot of discussion and many months, I realized it's not going to happen. And it probably shouldn't happen.

@staab (Member) commented Mar 6, 2024

> It's already part of NIP-50

I did not realize that. Lame.

> Also, the key:value syntax is common among search engines

Yes, and I've spent way too much time dealing with user inputs that include special characters. As far as I'm aware, with Postgres at least, you have to do the escaping in your application code, which is painful and error-prone. There's no reason we need to repeat the mistakes of the past.

@alexgleason (Member)

Check also #1105 to see another application of NIP-50 extensions. It makes sense to do advanced filtering there.

@staab (Member) commented Mar 6, 2024

I'm not saying these aren't useful, but cramming them into a plain-text field is a mistake. Instead of new keys, you could add an extensions key with the same syntax. But since it's already in NIP-50, it's probably a done deal and there's no point arguing.

@alexgleason (Member)

I believe the intended way is to pass the user's search input directly into postgres/fts5. I'm not sure how viable that is.

@staab (Member) commented Mar 6, 2024

FTS5 is new to me; I was using Postgres's built-in tsvector/tsquery stuff, which didn't play nice with raw user input.

@alexgleason (Member)

@staab Doesn't seem that bad in the grand scheme of things. And any parsing errors etc. have basically no consequence. https://chat.openai.com/share/9e5f4a6f-b0b9-4644-ae14-2995ac71ee38


@staab (Member) commented Mar 7, 2024

It's not the worst thing, and since it's already happened, I've already lost the argument. I just wish nostr developers (and developers in general) would stop making everyone write parsers.

@vitorpamplona (Collaborator)

> would stop making everyone write parsers.

Agree. Custom parsers suck.
