Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add locally_discovered_nodes API #2601

Closed
ramfox opened this issue Aug 8, 2024 · 14 comments
Closed

feat: add locally_discovered_nodes API #2601

ramfox opened this issue Aug 8, 2024 · 14 comments
Assignees
Milestone

Comments

@ramfox
Copy link
Contributor

ramfox commented Aug 8, 2024

Add a method on the Endpoint that gives you a list of the nodes that have been discovered using mdns/swarm-discovery so far. If local discovery is not configured for that Endpoint it returns an empty list.

@ramfox ramfox added this to the v0.23.0 milestone Aug 8, 2024
@andrewdavidmackenzie
Copy link

I'd consider returning a Result

  • Ok<Vec<Node>> (Vec could be empty, which is valid, assuming it will filter out "self")
  • Err(e) if discovery is not configured, or some other error

The entry in the list should at least have the information necessary to connect to it (NodeId?)

@flub
Copy link
Contributor

flub commented Aug 9, 2024

What is the usecase for this?

Mostly I'm wondering why mDNS nodes are different here than other nodes and whether this is trying to expose a very specialised sub-view of the NodeState we have. And thus whether this might be better as a more generic view into the NodeState.

The property of mDNS discovered nodes in NodeState terms is probably:

  • We have a UDP path for the node which we know the node itself currently thinks it is reachable on. This is currently the PathState::last_alive condition but sadly that's a very confusing name.

  • We know we are on the same network as this path.

I don't think we have this 2nd bit of information in the PathState currently. Maybe that's what should be added to enable this.

You can already approximate it by guessing based on the network, but we can both be on 192.168.0.0/24 without being on the same network.


We already have Endpoint::connection_infos which gives you a view of the NodeState. Maybe this is sufficient and the returned ConnectionInfo could have a new fields or methods to figure out if you think this node is on the same network as you are?

@matheus23
Copy link
Contributor

What is the usecase for this?

From discord:

I'd like to have a method in my app that finds compatible instances of the app (peers) in the local network and lists them in the UI, for selection to connect to.


We already have Endpoint::connection_infos which gives you a view of the NodeState. Maybe this is sufficient and the returned ConnectionInfo could have a new fields or methods to figure out if you think this node is on the same network as you are?

AFAIU connection_infos only lists you nodes that you've connected to or tried to connect to at least once. But what our users are asking for is a way to "discover" node IDs that they likely want to connect to.

@flub
Copy link
Contributor

flub commented Aug 9, 2024

What is the usecase for this?

From discord:
[...]

Ah, thanks. I did browse discord quickly but did not find this. Anyway, good to include it in the issue.

AFAIU connection_infos only lists you nodes that you've connected to or tried to connect to at least once. But what our users are asking for is a way to "discover" node IDs that they likely want to connect to.

Endpoint::connection_infos gives you all nodes in the NodeMap, that is all nodes we have any kind of information about - be that via discovery, via an explicit Endpoint::add_node_addr(), whatever.

I'm going to update the docs to make this clearer.

But maybe also: should we improve the function name itself? I'm tempted to say that we should.

@andrewdavidmackenzie
Copy link

andrewdavidmackenzie commented Aug 9, 2024 via email

@matheus23
Copy link
Contributor

Endpoint::connection_infos gives you all nodes in the NodeMap, that is all nodes we have any kind of information about - be that via discovery, via an explicit Endpoint::add_node_addr(), whatever.

I'm going to update the docs to make this clearer.

But maybe also: should we improve the function name itself? I'm tempted to say that we should.

Maybe we should add nodes that we discover via LocalSwarmDiscovery using Endpoint::add_node_addr?
I don't think we do that today. We basically keep our own node_addrs: HashMap<PublicKey, Peer> internally to the LocalSwarmDiscovery at this point.

@ramfox
Copy link
Contributor Author

ramfox commented Aug 14, 2024

Apologies if this is rude at all, somehow github crashed on me TWICE while writing this (and I should have learned my lesson the first time) so I'm very annoyed, lol. But thank you all for the feedback.

  1. There isn't a way to access the Endpoint internally in any Discovery service, we only get access to the Endpoint when it's passed in using a method in the Discovery trait. That's why my first "quick-but-gross" version implemented just adds a method to Discovery to access the locally discovered node addrs.

  2. I agree that we probably still want to add nodes to the node map as we discover them, but I don't think that calling connection_infos (or whatever it gets renamed to) actually solves the use case in this situation. They want a list of nodes that they can expect to be online right now. Since the NodeMap is persisted, they can't trust that everything with Source::Mdns is actually a node that was discovered during the current "session". It seems to me users would still want a locally_discovered_nodes method (but maybe named better lmao), that returns the current crop of nodes that are on the local network.

  3. However, this method (locally_discovered_nodes) is NOT a natural fit for the Discovery trait, as can be seen in my PR. Even without the dumb name, it's goofy. The Discovery trait expects something very simple: I ask you for info on a NodeId and you find it for me. However, this is (rightfully) NOT what users expect from something we tell them is mdns-like. So, in my latest iteration I've pulled apart LocalSwarmDiscovery from Discovery and instead added an mdns field in magicsock

  4. Because this version of LocalSwarmDiscovery is created inside magicsock, it does have access to the NodeMap and passively adds NodeAddrs as they come. Yay! But it also allows us a way (not yet fully implemented in the PR), to get the "current" nodes using a locally_discovered_nodes API on the Endpoint.

  5. What sucks about this is that we can no longer use LocalSwarmDiscovery as a Discovery service, since there is no way to "add" it back to the list of Discovery services, once we are in the magicsock. We can change this by expecting a ConcurrentDiscovery in the magicsock, rather than a Box<dyn Discovery>, but I wanted to float this idea before making that change.

here is the PR I referenced above: #2612

@andrewdavidmackenzie
Copy link

As per your comment re: method naming. If the notion of "session" exists, then I would use that in the method name to clarify how (since when) the list has been constructed.

I thought about "online" (as in "now") but you never know do you (a node could have gone offline since discovery)? Depending on how strict you are you might not want to use that in a name "recent" maybe better.

@ramfox
Copy link
Contributor Author

ramfox commented Aug 15, 2024

Hi again folks!

Update:

The new plan is to add a subscribe method to the Discovery trait that returns a stream of DiscoveryItems. The magicsock will subscribe to this and update the NodeMap with every new address discovered.

Then, you can fetch your connection_infos (soon to be node_infos), and filter for which nodes were discovered locally! This also side-steps the naming bikeshed, since there would no longer be a need for a locally_discovered_nodes method.

I was also incorrect about something I mentioned before: I thought we persisted the ENTIRE node state when we persist peers between sessions. This is false, we only persist the actual addresses (right now). So any Source information will be source information we got during the most recent session.

@andrewdavidmackenzie
Copy link

andrewdavidmackenzie commented Aug 15, 2024 via email

@matheus23
Copy link
Contributor

What distinguishes "locally discovered nodes" from others, that we will use to filter on?

You can go through the ConnectionInfos, grab the addrs, and find out if there's an Ipv4 address that is private.

At least I think that's how it could work. LMK if that does anything.

@dignifiedquire
Copy link
Contributor

there will be a source indicating which service discovered it, that you then can use

@ramfox ramfox modified the milestones: v0.23.0, v0.24.0 Aug 21, 2024
@rklaehn
Copy link
Contributor

rklaehn commented Aug 26, 2024

there will be a source indicating which service discovered it, that you then can use

An addr can be discovered by multiple sources, so this needs to be a set.

@ramfox ramfox moved this to 🏗 In progress in iroh Aug 30, 2024
@ramfox ramfox modified the milestones: v0.24.0, v0.25.0 Sep 2, 2024
@dignifiedquire dignifiedquire modified the milestones: v0.25.0, v0.26.0 Sep 16, 2024
@ramfox
Copy link
Contributor Author

ramfox commented Oct 10, 2024

Hello!

With v0.26.0, you can now view the Source of each remote you know about, and filter for Source's that come from the local_swarm_discovery service. Take a look at the docs and the example code.

@ramfox ramfox closed this as completed Oct 10, 2024
@github-project-automation github-project-automation bot moved this from 🏗 In progress to ✅ Done in iroh Oct 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

No branches or pull requests

6 participants