-
Notifications
You must be signed in to change notification settings - Fork 160
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Runtime panic (invalid memory address or nil pointer dereference) when setting up registry for hidden paths in local AS #4364
Comments
Would you be able to provide a minimal test case? |
Ah. NM. Matthias figured how to produce it. |
It would seem that this is the first test that hits this case: i.e. trying to resolve a service address within the same AS. At least, with my very limited understanding of this code, I think that it would crash every time we do this and so, we'd have hit this bug sooner. But then, I'm probably over-simplifying. Here's what I believe I found: BuildFullAddress ends-up with a path that is a partial path, which has nil metadata. As a result p.Metadata().Interfaces crashed. Based on the comment right above that code ( SVC addresses in the local AS get resolved via topology lookup) I infer that not having an interface in this case is normal. What is overlooked is that Metadata is also nil, rather than empty. This is simple to fix. I am trying adding a Interfaces() method to the Path interface, so there's no need to use Metadata() and be exposed to it being nil. With that, I get no crash, but things still don't work. If I interpret the error messages correctly, there is no local Hidden Path registry, so the hidden path can't get registered. I am doing this in a very simple test environment and I don't believe a hidden path registry is being setup so I don't expect this to work. Markus, based on the rather contrived configuration you gave (which Matthias and I duplicated in our minimal test environment) it is hard to figure what result you expect. If you do expect some kind of successful end-to-end behavior, could you help me set things up correctly so I can verify that my fix addresses the real problem and not an irrelevant symptom? |
Hem...taking a closer look. We do create a hiddenpaths registry; as directed by the "registries" entry in the hidden path group config.
|
Hi @jiceatscion. I actually don't believe that the configuration is that contrived: For example, this could be used to hide a path from any other AS but allow local end hosts to use it. So the local control service should be able to register segments there and other local services should be able to retrieve them. Specifically, I'm experimenting with an additional service in the AS that retrieves and further distributes the hidden segments. I would expect this service to be able to communicate with the hidden-path registry. |
Thanks for the clarification. I agree that communicating with the local
registry shouldn't itself be a problem. Since then I have continued down
the path of resolving the connection issues until I reached the point where
the hidden paths get successfully registered. I can tell whether it worked
or not based on the logs, so I don't really need some end-2-end proof of
the pudding just yet.
Resolving the nil reference was just the tip of the iceberg. It seems that
resolving global addresses to local (even internal to the same server)
connections wasn't an existing use case until now because it doesn't work.
I had to fix a couple of issues where a nil path was used where an empty
path was expected, and even after that, I ran into deadline exceeded errors
that I haven't been able to root-cause yet.
I'll update the issue when I have something more tangible to report.
Best,
J-C
…On Fri, Aug 4, 2023 at 12:11 PM Markus Legner ***@***.***> wrote:
Markus, based on the rather contrived configuration you gave (which
Matthias and I duplicated in our minimal test environment) it is hard to
figure what result you expect. If you do expect some kind of successful
end-to-end behavior, could you help me set things up correctly so I can
verify that my fix addresses the real problem and not an irrelevant symptom?
Hi @jiceatscion <https://github.com/jiceatscion>.
Thank you very much for all your debugging efforts.
I actually don't believe that the configuration is *that* contrived: For
example, this could be used to hide a path from any other AS but allow
local end hosts to use it. So the local control service should be able to
register segments there and other local services should be able to retrieve
them.
Specifically, I'm experimenting with an additional service in the AS that
retrieves and further distributes the hidden segments. I would expect this
service to be able to communicate with the hidden-path registry.
—
Reply to this email directly, view it on GitHub
<#4364 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/BBLEYOAQGUWHSPO2PPECHILXTTDDRANCNFSM6AAAAAA2ET4HWY>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
A quick update, mostly notes to myself... So, the hidden paths registry and lookup services use the same port as the control service. That is a dynamic port, which means that the records in the topology file that the discovery service uses can't possibly contain the right port number. In the case of the Control service, the port number that's in the topology file is that of the anycast resolution port. If we do the same for the hiddenpaths service, as I attempted to do, the clients must resolve the address exactly like they would for the control service, and not by querying the discovery service. Illustration. In trying to reproduce the issue, I added the following to the topology file:
As a result here is what the resolution dance looks like in the logs:
The response is wrong as the port for the hiddenpaths service should be the same as that of the control service and discovery service i.e. 32768. Instead it gives us what I put in the topo file, not knowing what I was doing: 31010. So we have some choices:
My own take:
I do not know enough to make a judgement call just yet, so your opinions are welcome. Also, wouldn't it be nice if there was a debug log somewhere that tells us when a port receives an rpc that has no server at that port? |
#4376 Is my attempt at doing number 1 in the least disruptive manner... I think. |
roosd@ suggested a 4th approach that is less disruptive than my proposals: Make the control server QUIC/SCION port static by explicitly configuring it (via the cs toml config - quic.address). The port needs to be different from that of the public address (because the SVC redirector binds to the public address' port). If we do that, we can set the QUIC/SCION address of the hiddenpath service in the corresponding entry of the topo file for the discovery service to use, while the control server's public address remains as it is. The public address continues to be reachable via TCP or over scion via the SVC redirection as that works with either static or dynamic ports. There remains a few things to be resolved:
Regarding the config schema flaw... Server has three relevant addresses:
Constraints:
Solution: The last constraint is wrong: as soon as SVC resolution is supported the SCION service address cannot be the same as the public address and there's nothing we can do about that. Therefore it is wrong to assume they're the same. Therefore let's drop that constraint. To that end:
|
FWIW, I don't think the last constraint holds. The discovery service only exists in the SCION world, not the TCP world, and it advertises the hidden segment services. Thus, while it is true that it needs to announce the SCION service address (because we multiplex hidden segment API and SCION control plane API on the same address), it is not true that it must be the public address.
I don't think this is a work around. It is the intended use of that field. The discovery service should announce In any case. I think part of the problem is that we use the topology for configuration and discovery, and that we assume every node needs to have the same view of the topology. In an ideal world, I would do away with the topology file altogether (or at least use it for discovery only and not for configuration.). Another problem is the asymmetry between the TCP/IP world and the QUIC/SCION world. I have an alternative proposal: Instead of allocating a dynamic port, for the SCION Service address, we use the public address. To still support SVC resolution, we add an additional address to the The topology would look something like this:
As long as we still have a dispatcher this is a backwards compatible change. Side-note: The port is currently not really important for SVC resolution. In fact, judging from the code, it doesn't even look like it ever appears on the wire. |
An additional benefit of the alternative proposal is that it allows for full flexibility. If the control service serves both the SCION control plane API, discovery API and hidden segment API on the same port it can be expressed by using the same port for everything. If it uses different listeners based on API, that can be expressed by using different ports. Any client wanting to talk to the different APIs does not need to be aware of these internal details. |
I think we are generally in agreement. My reason for preferring to keep SVC Resolver addr and Public addr the same was my assumption that this was an entrenched thing that I'd better not try to change. (Comparing that with simply updating the field used by the Discovery service.) Since you are much for familiar with the code, I trust your judgement that SVC resolution and public addresses can easily be different. Granted that, I agree that your option is preferable. It makes more intuitive sense.
I think both proposals do that anyway. Which is what I was aiming for. I'll give a try to your approach. If I don't hit a roadblock, I'll be happy to adopt it. |
I'm pretty confident it will work. On the client side, we simply take the address that we receive during SVC resolution: https://github.com/scionproto/scion/blob/4096d879b05a0a7e287bfb79590941facdf40bf4/private/app/appnet/addr.go#L222C12-L226 Lines 191 to 194 in 4096d87
Now, the SVC resolution packet gets routed to the control service in the following way:
Thus, at no point during the SVC resolution process, the UDP port of the listener is taken into account. (But again, this only holds as long as we still have the dispatcher. Otherwise, UDP port matters, but only from the router's point of view) |
Btw... would the service_resolution address configured for services other than the control service be ever used for anything? In general, only one service will initialize the resolver and the others don't need to care, do they? If that is the case we might not even need to change the schema, only add a dedicated service in the topo file just to configure the resolver. WDYT? |
The discovery and the control service are conceptually two separate things. Thus, you need to be able to configure them separately from the viewpoint of the router. That they are served on the same listener is an implementation detail of the monolith |
Update Summary: Making the resolver address != from the public address (and making the SCION address == public address) works. |
Another thing I tried succesfully (after fixing a bug in the dispatcher), is to let the resolver get a dynamic port number by default. This has the advantage of not requiring any change in the configuration schema since there's no need to specify a separate address:port for the resolver. |
That works for now, and can be our preferred option. We will need to revisit during dispatcher removal though. |
Yeah, but then, I doubt that having kept the resolver port static will make the removal critically simpler. We can design a new scheme at that point. To summarize the options that I have at this point. I can do either or both of:
I can also allow the configuration of a separate resolver address for each service, but I am not sure I understand what this would be for. It would also be worth reflecting a bit on the interactions between the server config files and the topology file but if we go for number 1 we can make it a separate conversation. I'll assume we're happy with just number 1. Anyone thinks it's wrong, speak up! |
Ooops, I added some confusion... In my previous comment I swapped both options. What I wanted to suggest was to do just option 2 which if done by itself is a very small change. I'm ok with either, though. One more thing I'd like your (oncilla@, matzf@, lukedirtwalker@) opinions about before I propose a fix: it seems we have more defaults, fall-backs, and other flexibility features than is really good for us. It makes it easy to misconfigure things without realizing it. I'd like to remove one of those. Now that we are able to move the resolver port out of the way (be it dynamic, explicit, or both), is there any point in defaulting the SCION port of a service to a dynamic port? Would it not make more sense to make it identical to the public address? If so, is it even useful to allow it to be configured to something else? |
…otely. Fixes #4364 There were a number of issues: * Because the normal case is that the registry is remote, the registerer makes no attempt at using the intraAS network to reach the registry (and the registry isn't listening on TCP). However, quic/scion path resolution to an AS-local host was broken: the resulting paths are expected to be complete paths with Metadata, but intraAS paths were meta-data-less paths, resulting in nill-ptr dereference. * By default the control service port (shared with the hidden_segment registry) was a dynamic port. The discovery service cannot only knows of statically configured port (configured in the topo file). So the only way to make the hidden_segments service discoverable is to give the control service a static port. While possible, this would result in a confusing and conter-intuitive configuration: * the control_service quic address would be configured by the cs yaml file and be different from what appears in the topo file for the CS but identical to the hidden_segment service address. * the CS address in the topo file was acually the TCP address and could never be mirrored by the quic address (because that quic address was occupied by the service resolver). * the address of the hidden_segments registry in the topo file would be different from that of the CS, eventhough they share the same port. * This was all invisible to the integration test which was skillfully predicting the dynamic port and configuring the discovery service for it. To correct these infelicities: * Give the Svc resolver a dynamic port * By default, give to the CS and the hidden_segment registry the quic address that has the same IP and port as the control service's TCP address. * Produce complete Path objects to represent intraAS paths. In passing: * A handful of AssertEqual() had arguments in the wrong order. * In one case we were constructing a full SVC address with a nil path instead of an empty path.
Thank you so much for the great work on this, @jiceatscion! |
When trying to set up a registry for hidden segments locally (see config below), the control service has a runtime panic.
Tested version: SCIONLab package 4.6.2 on Ubuntu 22.04
Stack trace:
The problematic line seems to be
so I assume that for the concrete path
Metadata()
returnsnil
.HP config:
I haven't tested it with the up-to-date version in
master
yet, but the relevant code seems to be unchanged.The text was updated successfully, but these errors were encountered: