
NIP-114: ids_only filter #1027

Open · wants to merge 1 commit into master
Conversation

@mmalmi (Author) commented Feb 6, 2024

When a filter has ids_only: true, the relay must return only the ids of the events that match the filter.

This allows a client to request the full event only when it is not already stored locally or received from another relay. It can save a lot of bandwidth, especially when connecting to multiple relays or subscribing to a lot of data.
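The message flow proposed here can be sketched as follows. The `ids_only` flag and the `HAVE` response verb are from this proposal; the concrete filter fields, subscription id, and event ids are illustrative only:

```python
import json

# A NIP-01 REQ whose filter carries the proposed ids_only flag, asking
# the relay to answer with event ids instead of full events.
req = ["REQ", "sub1", {"kinds": [1], "limit": 500, "ids_only": True}]

# Proposed relay response: a HAVE message carrying a matching event id
# (illustrative value; real ids are 32-byte hex strings).
have = ["HAVE", "sub1", "21f1b7a4..."]

# The client then fetches full events only for ids it does not already
# hold locally or from another relay.
known_ids = {"aaaa..."}
missing = [have[2]] if have[2] not in known_ids else []

# Round-trip through JSON to mimic the wire format.
decoded = json.loads(json.dumps(req))
assert decoded[2]["ids_only"] is True
```

This is where the bandwidth saving comes from: a `HAVE` is a few dozen bytes, while a full event (content plus signature) is typically much larger.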

@fiatjaf (Member) commented Feb 6, 2024

It's probably better to use a new verb for this instead of REQ, maybe just HAVE as in the response.

And instead of using GET why not REQ with {ids:[]}?

@mmalmi (Author) commented Feb 9, 2024

Unlike REQ, GET doesn't create a subscription. In this case, we don't want subscriptions, because they would need to be separately closed and might exceed the relay's subscription limit.

Maybe the format should be ["GET", filters] instead of ["GET", eventId] so it could also be used in other requests where a subscription is not wanted.

Also, HAVE messages could return an array of matching ids instead of just one id. How many ids the relay wants to put in one HAVE response depends on its DB and buffering strategy.

If we want a new verb instead of using REQ + filter.ids_only, I think it should be something other than a reuse of HAVE because it needs to be handled differently. Maybe SUB?
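The variants floated above can be sketched side by side. All of these message shapes are hypothetical, per the discussion; the ids are placeholders:

```python
# Original sketch: fetch one event by id; no subscription is created.
get_by_id = ["GET", "21f1b7a4..."]

# Generalized sketch: GET takes filters, so it can serve any one-shot,
# subscription-free query, not just id lookups.
get_by_filter = ["GET", {"ids": ["aaaa...", "bbbb..."]}]

# A HAVE batching several matching ids; how many go into one message
# is left to the relay's DB and buffering strategy.
have_batch = ["HAVE", ["aaaa...", "bbbb...", "cccc..."]]
```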

@monlovesmango (Member) commented

> Unlike REQ, GET doesn't create a subscription. In this case, we don't want subscriptions

Saying it's not a subscription is just a superficial statement from the perspective of the client. From the relay's side it takes the same amount of resources, and I would think relay operators would definitely want to count this as another REQ subscription.

Besides, using EOSE you can effectively get the same behavior you are describing, imo. If a relay doesn't support EOSE by now, it probably won't support GET anytime soon either.

I like the ids_only in the filter, but not the GET, which is duplicative of REQ.

I lean towards not having HAVE return an array, but it might be good to use in conjunction with EOSE?

I don't like SUB as the new verb for the ids_only filter (since REQ is referred to as a sub a lot). Maybe GOT?

@fiatjaf (Member) commented Feb 10, 2024

One honest question: why not use negentropy?

@vitorpamplona (Collaborator) commented

> One honest question: why not use negentropy?

Too complicated to code. There are no libs, and these alternative sync options resolve most of the issue with 20-minute implementations from scratch.

If we want to force Negentropy to be a mandatory part of Nostr, we lose most of the simplicity benefits of Nostr.

@fiatjaf
Copy link
Member

fiatjaf commented Feb 10, 2024

I don't think we should force negentropy to be mandatory, but this shouldn't be mandatory either. And if negentropy achieves the same thing as this proposal more efficiently, it is better to have only nip01 + negentropy instead of nip01 + ids_only + negentropy as things to code.

But I agree it is complicated. I was implementing it and stopped to give my brain some time to breathe.

@vitorpamplona (Collaborator) commented

> but this shouldn't be mandatory either

Maybe not this, but some type of easy SYNC should be mandatory in the long run.

Many apps are now observing the benefits of a local cache with persistent databases. All those databases need to sync from time to time to make sure the user is not missing anything. Downloading everything, all the time, is just not a scalable way to sync.

A simple (30-minute implementation) sync solution improves Nostr's decentralization while reducing the protocol's bandwidth use, and both clients and relays waste fewer resources in doing so.

I sync (re-download everything, from all ~3000 relays) ~2GB worth of events every other day for my user. I would prefer not doing that to relays, but I have to. The current state of sync is just dumb.

@fiatjaf (Member) commented Feb 10, 2024

> I sync (re-download everything, from all ~3000 relays) ~2GB worth of events every other day for my user. I would prefer not doing that to relays, but I have to. The current state of sync is just dumb.

I'd guess most people aren't doing that, and it isn't a requirement for making Nostr work, but I agree that when you need to do it, it is dumb. If we're thinking about a good long-term solution, though, then I think negentropy looks better than this proposal. Also, even though it is hard to code, it isn't that hard, and it seems worth it for those strict use cases that really need it.

I would be interested in something better than negentropy if there is anything. I've played with https://github.com/sipa/minisketch and was shocked at how well it worked -- but it has a lot of drawbacks in comparison with negentropy that make it impractical. It is also completely impossible for me to understand or implement -- although it is in C, so it would be possible to use it almost everywhere.

I guess we could be fine with a negentropy implementation in C and wrappers.

@vitorpamplona (Collaborator) commented Feb 10, 2024

I tried to implement last year's Negentropy (it has since changed) from scratch and gave up. I am not sure a negentropy lib that "automatically" works with all database types will ever exist. Just mapping the database structure and the relay messages to the needs of the library is a time-consuming process, and it requires a good clue of what the library is doing in order to assemble the right sets. I am not sure it's going to be as simple as adding it, calling a function, and boom, synced.

However, if you are looking for the most efficient way and are willing to pay the price in protocol simplicity, negentropy wins. I just don't think Nostr is the place for that sacrifice. At least not in the mandatory part of Nostr.

After testing a bunch of stuff, my recommendation was to do #826

Nostr.wine implemented it in 30 minutes, and I did the same on Amethyst in about 1 hour. It's not the most efficient way to solve sync, but it is pretty darn good for the simplicity of the proposal.

This one is even simpler.

@mmalmi (Author) commented Feb 13, 2024

Negentropy or similar is good for syncing existing datasets, but this NIP is also useful if you have open subscriptions to many relays that might each send the same new events over and over. That might not be a huge amount of data for a light client, but especially in relay-to-relay communication it can save bandwidth.

@mmalmi (Author) commented Feb 13, 2024

> > Unlike REQ, GET doesn't create a subscription. In this case, we don't want subscriptions
>
> Saying it's not a subscription is just a superficial statement from the perspective of the client. From the relay's side it takes the same amount of resources, and I would think relay operators would definitely want to count this as another REQ subscription.

It's not the same. Subscriptions persistently occupy memory on the relay and need to be checked against every new event that is received. If GET is only for event ids, the lookup is usually very fast.

> Besides, using EOSE you can effectively get the same behavior you are describing, imo. If a relay doesn't support EOSE by now, it probably won't support GET anytime soon either.

Client-side logic is a lot simpler, and data syncs faster, if there's a GET that doesn't create a subscription. You can just ask right away for the not-haves. Otherwise, you have to buffer them and send them as a batch after some arbitrary timeout or debounce, which slows down the sync. Then you need to buffer again until the relay responds with EOSE, and, as a fallback for not receiving EOSE, use a timeout.
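The two client flows contrasted above can be sketched like this. The message shapes and the `fetch-1` subscription id are hypothetical, per the discussion:

```python
import json

def fetch_with_get(missing_id):
    # With a subscription-free GET, the client asks for a missing event
    # the moment it learns the id; there is nothing to close afterwards.
    return [json.dumps(["GET", missing_id])]

def fetch_with_req(missing_ids):
    # Without GET: ids are buffered, sent as one REQ after a debounce,
    # and the subscription must be closed once EOSE arrives (or a
    # fallback timeout fires if the relay never sends EOSE).
    sub_id = "fetch-1"  # hypothetical subscription id
    return [
        json.dumps(["REQ", sub_id, {"ids": missing_ids}]),
        json.dumps(["CLOSE", sub_id]),  # sent after EOSE / timeout
    ]
```

The REQ path adds two sources of latency (the debounce before sending and the wait for EOSE before closing), which is the slowdown described above.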

@monlovesmango (Member) commented Feb 13, 2024

I see, yeah, I guess being able to make a single GET is more nimble and lightweight.

> It's not the same. Subscriptions persistently occupy memory on the relay and need to be checked against every new event that is received. If GET is only for event ids, the lookup is usually very fast.

Even if the lookup is very fast, the lookup for multiple GETs is still slower than a single REQ with just an id array in the filter. Just because a request is closed right away shouldn't mean it is exempt from having a subscription ID. Also, without a subscription ID, relays would have to rely solely on rate limiting to curb excessive GETs, and there would be no way to communicate relay errors to the client for any specific GET.

I do agree that subscriptions persisting on the relay causes overhead, and it would be nice for a relay to know that "once I am done with this lookup I can close this subscription right away".

> You can just ask right away for the not-haves. Otherwise, you have to buffer them and send them as a batch after some arbitrary timeout or debounce, which slows down the sync.

Agree.

> Then you need to buffer again until the relay responds with EOSE, and, as a fallback for not receiving EOSE, use a timeout.

Disagree: you can handle the events however you want the moment the EVENT is received from the relay, just as if multiple GETs were sent, if that's what you want to do. You would only need to wait for EOSE or a timeout to close out the subscription.

Overall, it seems to me the main goal of GET is to be able to make a REQ (for a specific event ID) which closes automatically upon completion. I think this is useful. However, I don't agree that just because it only queries a single ID it should be exempt from relay max-subscription limits and lack a subscription ID. Clients still need to be limited in how many queries they can send to the relay at once, and relays still need to be able to communicate errors.

All this to say, I think GET should be more of a REQ + CLOSE message (and it might be less ambiguous to name it something like REQCLOSE). This way it will:

  • give you the behavior you want for requesting a single ID with little overhead
  • have the added benefit of allowing the auto-close behavior for any filter (not just a single ID), which I think actually adds a lot of convenience to general querying (most REQs probably don't need to stay open)
  • preserve the subscription ID model (which, imo, GET shouldn't be exempted from just because it has little overhead)
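A minimal sketch of the REQCLOSE idea described above: it behaves like REQ and keeps a subscription id (so subscription limits and per-request error reporting still work), but the relay drops the subscription as soon as stored events have been served. The verb and message shapes are hypothetical:

```python
def handle_reqclose(msg, stored_events):
    # msg: ["REQCLOSE", sub_id, filter]; returns the relay's replies.
    # Only an ids filter is handled here, to keep the sketch short.
    _, sub_id, flt = msg
    replies = []
    for ev in stored_events:
        if "ids" not in flt or ev["id"] in flt["ids"]:
            replies.append(["EVENT", sub_id, ev])
    # EOSE doubles as the auto-close: the relay forgets sub_id here
    # instead of keeping the subscription open for live events.
    replies.append(["EOSE", sub_id])
    return replies
```

The client gets the same "fire and forget" behavior as GET, while the relay still has a sub_id to attach a CLOSED or error notice to if it rejects the request.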

@vitorpamplona (Collaborator) commented Feb 13, 2024

This proposal works for sporadic syncs involving thousands of events. Each HAVE is about 100 chars, including the WebSocket protocol overhead bytes, which leads to a 1MB transfer for a 10,000-event sync. Given that users on average are connected to 10 relays, that's 10MB of data transfer for a single sync with all relays. Looks good.

However, I was hoping to sync Amethyst's local database (~50,000 to 100,000 events) every time the user brings the app back to the foreground. That's a 5MB to 10MB download 10-20 times a day for each relay, for a daily sync cost of 500MB - 2GB across all relays.
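The arithmetic behind these estimates, using the ~100 bytes per HAVE figure quoted in the comment:

```python
BYTES_PER_HAVE = 100   # incl. WebSocket framing, per the comment above
RELAYS = 10

# Sporadic sync: 10,000 events -> ~1 MB per relay, ~10 MB across relays.
sporadic_per_relay = BYTES_PER_HAVE * 10_000
sporadic_total = sporadic_per_relay * RELAYS

# Foreground sync of a 50k-100k event cache, 10-20 times a day:
per_sync_low = BYTES_PER_HAVE * 50_000    # 5 MB per relay per sync
per_sync_high = BYTES_PER_HAVE * 100_000  # 10 MB per relay per sync
daily_low = per_sync_low * 10 * RELAYS    # 500 MB/day across relays
daily_high = per_sync_high * 20 * RELAYS  # 2 GB/day across relays
```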

So, I would still prefer #826

Side point: Amethyst does a TON (dozens of times a second) of GETs (REQ + CLOSE). If a subscription is indeed heavier to support, then we can start using GET directly, independent of this ids_only proposal.
