Querying events by tags presence #683

Open · wants to merge 4 commits into base: master
Conversation

@fernandolguevara (Contributor) commented Jul 26, 2023

An attempt to stop using different zoom levels for location-specific use cases.

View it

@alexgleason (Member)

This is #523

It's a presence filter specifically.

@fernandolguevara changed the title from "Querying events by tags structure" to "Querying events by tags presence" on Jul 26, 2023
@alexgleason (Member)

I think this would solve many problems and we should have it. But there are challenges in relays actually implementing it: #523 (comment)

We would have to define which specific tags get the index. We can't do all of them without a full table scan.

@arthurfranca (Contributor) commented Jul 26, 2023

> [...] there are challenges in relays actually implementing it [...]

@alexgleason Relays already have to deal with indexing arrays of values, like the array of e-tag values.
An array of the tag names present in an event would be similar, so no problem.
If the relay DB can't do a "not contains" type of query, it can instead index an array of absent tag names (A-Za-z minus the present ones), as sketched below.
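A minimal sketch of that complement trick, with illustrative names and types (nothing here is from the PR):

```typescript
// Index both the present and the absent single-letter tag names of an
// event, so a DB that only supports "contains" queries can answer
// absence filters as plain "contains" lookups against the absent list.
type NostrEvent = { tags: string[][] };

const SINGLE_LETTER =
  'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'.split('');

function tagNameIndexes(event: NostrEvent) {
  const present = [...new Set(event.tags.map(([name]) => name))].filter(
    (name) => name.length === 1,
  );
  const absent = SINGLE_LETTER.filter((name) => !present.includes(name));
  return { present, absent };
}

// "events with no g tag" then becomes: absent CONTAINS "g"
```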

@alexgleason (Member)

You would basically have to double the amount of space used by tag indexes, maybe more, since you can't use a partial index if you want both presence and absence filters. It's probably worth doing at least for "e" tags, since we have a strong and very common use case in #523. Other tags would need strong arguments in favor, I think.

Still, I think that shouldn't necessarily block this NIP from proceeding. We can nail down the API.

@arthurfranca (Contributor)

If this NIP-100 gets merged but implementing it isn't a requirement, in practice it may be as if it never existed, because no client could rely on it: not all relays would implement it. When fetching the feed, clients would probably keep requesting all notes, root or not, because many relays are potentially involved.

@fiatjaf any chance you could mark this NIP, if merged, and others like NIP-12 and NIP-20, as "Required" in the README.md NIP list? That would mean treating them as if they were part of NIP-01: the minimum NIPs that must be supported for a relay/client to be considered nostr-compatible. Also, as an exception, put them near the top just below NIP-01, despite their numbers.

@fiatjaf (Member) commented Jul 27, 2023

Me stating a thing is required doesn't cause everybody to immediately implement it.

Also, as @alexgleason said in the other issue, it's very costly for relays to implement this. I think we absolutely do not need it.

@fiatjaf (Member) commented Jul 27, 2023

I think that if your client depends on this, you're implementing something wrong. If you're following a person and want to read what they write, you want all their kind 1s. Regardless of whether you'll display everything in the UI or in different views according to the tags, you should still download everything, store it locally, and display it when appropriate.

@arthurfranca (Contributor)

> Also, as @alexgleason said in the other issue, it's very costly for relays to implement this

I disagree, or else I wouldn't be supporting this, but I don't know what DB you guys are considering.

> [...] you should still download everything, store it locally, and display it when appropriate.

Although some clients are doing this, I don't consider it efficient. But if doing everything client-side is the recommended way, I think it's OK to close this PR and the other issue.

@vitorpamplona (Collaborator)

I like this.

> I think that if your client depends on this, you're implementing something wrong.

Not exactly. There are use cases where this is a real need. If you want Global without replies, for instance, it doesn't make sense to download everything and then filter replies out.

If you are doing a map of Nostr posts with a GeoHash, it doesn't make sense to download everything and then discard everything that doesn't include a g tag (which is almost everything right now; a huge waste).
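To make the waste concrete, here is a sketch (illustrative types only) of what a map client is forced to do today without a presence filter:

```typescript
// The client downloads every kind-1 event and discards the ones lacking
// a "g" tag locally; almost all of the downloaded bytes are thrown away.
type NostrEvent = { kind: number; tags: string[][] };

function keepGeoTagged(events: NostrEvent[]): NostrEvent[] {
  return events.filter((e) => e.tags.some(([name]) => name === 'g'));
}
```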

@fiatjaf (Member) commented Jul 27, 2023

> If you want Global without replies, for instance, it doesn't make sense to download everything and then filter replies out.

I can understand this, but does anyone really want this? Sounds like some skewed preferences here. "Global without replies". Global is not a thing, and replies are not different from normal notes, technically. Should they have a different kind?

> If you are doing a map of Nostr posts with a GeoHash, it doesn't make sense to download everything and then discard everything that doesn't include a g tag (which is almost everything right now; a huge waste).

This I don't think is a valid use case (I mean, whatever, it is valid, but what I'm saying is that it doesn't fit Nostr; not all things fit Nostr if we want Nostr to remain simple). Either you are already fetching posts from people you want, storing them locally somehow, and displaying those that have g tags on a map, or you should be using the map for a more restricted set of events that aren't kind 1, like ads for a local marketplace or whatnot; in that case you expect them all to have g tags.


Maybe we should be making more kinds for different types of events and relying less on tags for indexing. Since tags are so flexible, it's easy to think they should be used for everything, but if we start doing that and relying on it, this will not end well.

@fiatjaf (Member) commented Jul 27, 2023

> I don't consider it efficient

What is not efficient? Storing events that you want locally? You think it's more efficient to load them from relays over and over, multiple times every day?

@vitorpamplona (Collaborator)

Can we come back to this, please? Has any relay tried to implement it?

@mikedilger, since you have just coded a relay, what do you think about this filter?

@arthurfranca (Contributor) commented Feb 29, 2024

> What is not efficient? Storing events that you want locally? You think it's more efficient to load them from relays over and over, multiple times every day?

"Feed" events, for example. This event set gets stale so often that when the user re-opens the app they aren't interested anymore on the previously received events. That's why i believe these events should live in memory instead of in a persistent local db.


My unreleased client's "feed" is made of root events (no e tags) and top-level replies (with one e tag), so this PR wouldn't help unless I could filter by the number of e tags.

Edit: removed ugly syntax examples. It would be great to have this, but I know it won't happen =]~

@alexgleason (Member)

It seems like the solution to everything not in NIP-01 is DVMs.

@alexgleason (Member)

I regret posting that.

@vitorpamplona (Collaborator)

> It seems like the solution to everything not in NIP-01 is DVMs.

Maybe we should indeed align our expectations about what the core protocol should solve and what a Layer 2 design (DVMs) is expected to do. If we want to keep relay development simple, we should "outsource" everything to Layer 2.

Or maybe we just create a new network of relays working on the same events but with more interesting filtering options. Clients can then choose which network they want/need to integrate with.

@alexgleason (Member)

I don't think filters are getting any more changes at this point.

I need more than just presence/absence tags, anyway. I need joins.

@vitorpamplona (Collaborator)

@Semisol had some interest in building a new type of relay with a new filtering language. I am not sure if he ended up doing anything. But we could just do a relay with regular read-only SQL as an entry point.

@vitorpamplona (Collaborator) commented Feb 29, 2024

Or maybe this idea of filters and subscriptions themselves should be turned into replaceable events. I can imagine a client signing an event containing a Nostr filter (or an SQL query) instead of using the REQ call. The relay would simply attach whatever comes in that event as a subscription to that connection. The d tag then becomes the subscription id.

@fernandolguevara (Contributor, Author) commented Mar 1, 2024

@alexgleason

> I don't think filters are getting any more changes at this point.

> I need more than just presence/absence tags, anyway. I need joins.

Can you elaborate on joins? What's your use case?

@mikedilger (Contributor) commented Mar 3, 2024

OK, I just went back and read #523 and this issue again, and I have a few more things to say:

First, I have not encountered the need to do presence or absence (or tag count!) queries. But I don't think it is unreasonable.

Second, this idea that relays will need to do complex indexing is wrong. Relays should not index for these kinds of queries at all. Neither should they do hard scans of every event. Relays should (1) require that such filters also contain other fields that already narrow the event set down to something reasonable, or else reject the filter as a scraper; (2) load all the events, ignoring the new presence/absence filter specifications; and (3) post-filter all matching events with these new fields. Sure, you loaded more events than you needed and then stripped them back... but that is far less resource-consuming than sending them over the network and having the client strip them back. Basically it just pushes that filter operation to the relay to save on network bandwidth.
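A rough sketch of those three steps; the filter keys has/hasNot and the loadFromIndexes helper are hypothetical stand-ins, not the PR's actual syntax:

```typescript
type Filter = {
  authors?: string[];
  kinds?: number[];
  has?: string[];    // hypothetical presence field
  hasNot?: string[]; // hypothetical absence field
};
type NostrEvent = { tags: string[][] };

function handleReq(
  filter: Filter,
  loadFromIndexes: (f: Filter) => NostrEvent[], // existing NIP-01 query path
): NostrEvent[] {
  // (1) Require other fields that narrow the event set, or reject as a scraper.
  if (!filter.authors?.length && !filter.kinds?.length) {
    throw new Error('filter too broad: refusing presence/absence scan');
  }
  // (2) Load events using only the fields the existing indexes understand.
  const candidates = loadFromIndexes({ authors: filter.authors, kinds: filter.kinds });
  // (3) Post-filter on tag presence/absence before sending over the wire.
  return candidates.filter((e) => {
    const names = new Set(e.tags.map(([n]) => n));
    return (
      (filter.has ?? []).every((n) => names.has(n)) &&
      (filter.hasNot ?? []).every((n) => !names.has(n))
    );
  });
}
```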

That being said, if a crafty relay developer wants to index these to boast about hyper-fast performance, that's fine, but we don't need to design for that case. And seriously, if someone sends a "give me all events that don't have a geo tag" filter, were you really going to send them 99.9% of all the events in your database? I don't think so.

I don't like modifying the "#e" to be a non-array (e.g. having a 'null' option). I prefer this PR's method of adding a new field.

Clients SHOULD check NIP-11 before using the new field, as sketched below. But also, the rule for relays ought to be "if you see a filter field you do not recognize, that is an error". I don't know if that was codified elsewhere, but I think it should be.
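A sketch of that NIP-11 check; using 100 as the advertised NIP number is an assumption based on this PR's number:

```typescript
// Fetch the relay's NIP-11 info document over HTTP(S) and look for the
// NIP in its supported_nips list before sending the new filter field.
async function supportsTagPresenceFilter(relayUrl: string): Promise<boolean> {
  const httpUrl = relayUrl.replace(/^ws/, 'http'); // wss:// -> https://
  const res = await fetch(httpUrl, {
    headers: { Accept: 'application/nostr+json' },
  });
  const info = await res.json();
  return Array.isArray(info.supported_nips) && info.supported_nips.includes(100);
}
```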

I don't follow the need to count the number of "e" tags, especially if we are moving to "q" tags.

I think this PR is pretty close as is. I'll add it to my relay if there is the momentum to do it (not too easy for me as I have meticulous memory layouts and detailed parsing to update).

EDIT: I don't think this NIP should become required or part of the core of nostr. It will be okay if most relays don't implement it. Clients will have to deal with errors from relays that don't accept the new field. BUT we probably do have to push through one small required change, which is to make those errors machine-readable (a new prefix) and to specify that relays must reject filters with fields they do not recognize (I didn't check the current NIPs; maybe that is already there).

@arthurfranca (Contributor)

> it just pushes that filter operation to the relay to save on network bandwidth

Interesting take. A caveat is that it may mess with the "limit" filter: if a client asks for limit: 1, the relay fetches 1 record from the DB, filters that one out after running the presence/absence/tag-count checks in memory, and returns 0 records, when there could have been a matching item in the DB. Could be a good trade-off.
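One way a relay could compensate, sketched with a hypothetical paging helper: keep paging the DB until enough post-filtered matches survive, trading extra reads for a correct "limit".

```typescript
type NostrEvent = { tags: string[][] };

function fetchUntilLimit(
  limit: number,
  fetchPage: (offset: number, count: number) => NostrEvent[],
  matches: (e: NostrEvent) => boolean, // in-memory presence/absence check
): NostrEvent[] {
  const out: NostrEvent[] = [];
  for (let offset = 0; out.length < limit; offset += limit) {
    const page = fetchPage(offset, limit);
    if (page.length === 0) break; // DB exhausted
    out.push(...page.filter(matches));
  }
  return out.slice(0, limit);
}
```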

> the rule for relays ought to be "if you see a filter field you do not recognize, that is an error"

That's the part I disagree with. If incompatible relays simply ignore the unknown filter field and apply just the ones they understand, the client can still apply the extra filter client-side. The client would still have the option of checking NIP-11 to skip incompatible relays if it prefers not to re-filter client-side.

@mikedilger (Contributor)

> > it just pushes that filter operation to the relay to save on network bandwidth
>
> Interesting take. A caveat is that it may mess with the "limit" filter: if a client asks for limit: 1, the relay fetches 1 record from the DB, filters that one out after running the presence/absence/tag-count checks in memory, and returns 0 records, when there could have been a matching item in the DB. Could be a good trade-off.

Oh right.

> > the rule for relays ought to be "if you see a filter field you do not recognize, that is an error"
>
> That's the part I disagree with. If incompatible relays simply ignore the unknown filter field and apply just the ones they understand, the client can still apply the extra filter client-side. The client would still have the option of checking NIP-11 to skip incompatible relays if it prefers not to re-filter client-side.

The problem I'm worried about is a client specifying a new filter field, which the relay doesn't understand, in order to prune the search to something reasonable, and the relay skipping that new filter and dumping a massive number of events on the client.

@alexgleason (Member)

Instead of adding new filter properties, we can include an extension to NIP-50's search property, like:

{ "search": "has:#e" }

This fixes everything IMO.

@staab (Member) commented Mar 6, 2024

> Instead of adding new filter properties, we can include an extension to NIP-50's search property

Special syntax for searches is super annoying because it mixes data with code. What if someone wants to search for a note that includes "has:#e"? Why not just add a new filter property, e.g. has: ["e", "a"]?
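For illustration (this key is hypothetical; no NIP defines it), a request for notes carrying both e and a tags would then look like:

{ "kinds": [1], "has": ["e", "a"] }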

@alexgleason (Member)

It's already part of NIP-50: https://github.com/nostr-protocol/nips/blob/master/50.md#extensions

To search for a note with "has:#e" in the text, you'd do this:

{ "search": "\"has:#e\"" }

The functionality in question (filter by tag presence/absence) IS a search functionality. It makes most sense for search relays to implement it.

Also, the key:value syntax is common among search engines and is even used by Postgres and SQLite FTS. You have to quote strings for them not to be treated as search tokens, as sketched below.
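A sketch of that quoting step, with a hypothetical helper; the doubled-quote rule follows SQLite FTS5 string syntax:

```typescript
// Wrap raw user input in double quotes, doubling any embedded quotes, so
// a literal query like has:#e is treated as text rather than an operator.
function quoteSearchTerm(input: string): string {
  return `"${input.replace(/"/g, '""')}"`;
}

// quoteSearchTerm('has:#e') === '"has:#e"'  =>  { "search": "\"has:#e\"" }
```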

@alexgleason (Member)

> Why not just add a new filter property, e.g. has: ["e", "a"]?

Because after a lot of discussion and many months, I realized it's not going to happen. And it probably shouldn't happen.

@staab (Member) commented Mar 6, 2024

> It's already part of NIP-50

I did not realize that. Lame.

> Also, the key:value syntax is common among search engines

Yes, and I've spent way too much time dealing with user inputs that include special characters. As far as I'm aware, with Postgres at least, you have to do the escaping in your application code, which is painful and error-prone. There's no reason we need to repeat the mistakes of the past.

@alexgleason (Member)

Check also #1105 to see another application of NIP-50 extensions. It makes sense to do advanced filtering there.

@staab (Member) commented Mar 6, 2024

I'm not saying these aren't useful, but cramming them into a plain-text field is a mistake. Instead of new keys, you could add an extensions key with the same syntax. But since it's already in NIP-50, it's probably a done deal and there's no point arguing.

@alexgleason (Member)

I believe the intended way is to pass the user's search input directly into postgres/fts5. I'm not sure how viable that is.

@staab (Member) commented Mar 6, 2024

FTS5 is new to me; I was using Postgres's built-in tsvector/tsquery stuff, which didn't play nice with raw user input.

@alexgleason (Member)

@staab Doesn't seem that bad in the grand scheme of things. And any parsing errors etc. have basically no consequence. https://chat.openai.com/share/9e5f4a6f-b0b9-4644-ae14-2995ac71ee38


@staab (Member) commented Mar 7, 2024

It's not the worst thing, and since it's already happened, I've already lost the argument. I just wish nostr developers (and developers in general) would stop making everyone write parsers.

@vitorpamplona (Collaborator)

> would stop making everyone write parsers.

Agree. Custom parsers suck.
