Runtime panic (invalid memory address or nil pointer dereference) when setting up registry for hidden paths in local AS #4364

Closed
mlegner opened this issue Jul 10, 2023 · 21 comments · Fixed by #4376
Labels
bug Something isn't working

Comments

@mlegner
Contributor

mlegner commented Jul 10, 2023

When trying to set up a registry for hidden segments locally (see config below), the control service has a runtime panic.

Tested version: SCIONLab package 4.6.2 on Ubuntu 22.04

Stack trace:

github.com/scionproto/scion/go/lib/log.HandlePanic
  /builds/PRV-PERRIG/scionlab/scion-builder/scion/go/lib/log/log.go:132
runtime.gopanic
  /usr/local/go/src/runtime/panic.go:1038
runtime.panicmem
  /usr/local/go/src/runtime/panic.go:221
runtime.sigpanic
  /usr/local/go/src/runtime/signal_unix.go:735
github.com/scionproto/scion/go/lib/infra/messenger.AddressRewriter.buildFullAddress
  /builds/PRV-PERRIG/scionlab/scion-builder/scion/go/lib/infra/messenger/addr.go:178
github.com/scionproto/scion/go/lib/infra/messenger.AddressRewriter.redirectToQUIC
  /builds/PRV-PERRIG/scionlab/scion-builder/scion/go/lib/infra/messenger/addr.go:108
github.com/scionproto/scion/go/lib/infra/messenger.AddressRewriter.RedirectToQUIC
  /builds/PRV-PERRIG/scionlab/scion-builder/scion/go/lib/infra/messenger/addr.go:76
github.com/scionproto/scion/go/cs/onehop.(*AddressRewriter).RedirectToQUIC
  /builds/PRV-PERRIG/scionlab/scion-builder/scion/go/cs/onehop/addr.go:67
github.com/scionproto/scion/go/pkg/grpc.(*QUICDialer).Dial
  /builds/PRV-PERRIG/scionlab/scion-builder/scion/go/pkg/grpc/dialer.go:159
github.com/scionproto/scion/go/pkg/hiddenpath/grpc.(*Discoverer).Discover
  /builds/PRV-PERRIG/scionlab/scion-builder/scion/go/pkg/hiddenpath/grpc/discovery.go:37
github.com/scionproto/scion/go/pkg/hiddenpath.resolve
  /builds/PRV-PERRIG/scionlab/scion-builder/scion/go/pkg/hiddenpath/discovery.go:96
github.com/scionproto/scion/go/pkg/hiddenpath.RegistrationResolver.Resolve
  /builds/PRV-PERRIG/scionlab/scion-builder/scion/go/pkg/hiddenpath/discovery.go:55
github.com/scionproto/scion/go/pkg/hiddenpath.(*BeaconWriter).Write.func1
  /builds/PRV-PERRIG/scionlab/scion-builder/scion/go/pkg/hiddenpath/beaconwriter.go:108
github.com/scionproto/scion/go/pkg/hiddenpath.(*remoteWriter).run
  /builds/PRV-PERRIG/scionlab/scion-builder/scion/go/pkg/hiddenpath/beaconwriter.go:155
github.com/scionproto/scion/go/pkg/hiddenpath.(*BeaconWriter).Write.func3
  /builds/PRV-PERRIG/scionlab/scion-builder/scion/go/pkg/hiddenpath/beaconwriter.go:123

The problematic line seems to be

	if len(p.Metadata().Interfaces) == 0 { //when local AS

so I assume that for the concrete path Metadata() returns nil.
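To illustrate the suspected failure mode, here is a minimal, self-contained sketch. The `Path`, `PathMetadata`, and `partialPath` types are simplified stand-ins invented for this example and are not the real SCION types; only the shape of the expression `p.Metadata().Interfaces` is taken from the line quoted above.

```go
package main

import "fmt"

// PathMetadata and Path are hypothetical stand-ins for illustration only.
type PathMetadata struct {
	Interfaces []uint64
}

type Path interface {
	Metadata() *PathMetadata
}

// partialPath models a path that carries no metadata at all.
type partialPath struct{}

func (partialPath) Metadata() *PathMetadata { return nil }

// unsafeIsLocal mirrors the quoted check and reports whether it panicked.
func unsafeIsLocal(p Path) (local bool, panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	return len(p.Metadata().Interfaces) == 0, false
}

// safeIsLocal treats missing metadata the same way as an empty interface list.
func safeIsLocal(p Path) bool {
	md := p.Metadata()
	return md == nil || len(md.Interfaces) == 0
}

func main() {
	_, panicked := unsafeIsLocal(partialPath{})
	fmt.Println("unchecked access panicked:", panicked)
	fmt.Println("nil-safe check says local:", safeIsLocal(partialPath{}))
}
```

The guard in `safeIsLocal` is one possible way to avoid the dereference; it is not necessarily the fix that was eventually merged.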

HP config:

---
groups:
  ffaa:1:1099-000a:
    owner: 17-ffaa:1:1099
    readers:
    - 17-ffaa:1:1099
    registries:
    - 17-ffaa:1:1099
    writers:
    - 17-ffaa:1:1099

registration_policy_per_interface:
  1:
    - "ffaa:1:1099-000a"

I haven't tested it with the up-to-date version in master yet, but the relevant code seems to be unchanged.

@mlegner mlegner added the bug Something isn't working label Jul 10, 2023
@jiceatscion
Contributor

Would you be able to provide a minimal test case?

@jiceatscion
Contributor

Ah, never mind. Matthias figured out how to reproduce it.

@jiceatscion
Contributor

It would seem that this is the first test that hits this case, i.e. trying to resolve a service address within the same AS. At least, with my very limited understanding of this code, I think that it would crash every time we do this, so otherwise we'd have hit this bug sooner. But then, I'm probably over-simplifying.

Here's what I believe I found:

BuildFullAddress ends up with a partial path, which has nil metadata. As a result, p.Metadata().Interfaces crashes. Based on the comment right above that code (SVC addresses in the local AS get resolved via topology lookup), I infer that not having an interface in this case is normal. What is overlooked is that Metadata is also nil, rather than empty.

This is simple to fix. I am trying to add an Interfaces() method to the Path interface, so there's no need to use Metadata() and be exposed to it being nil.
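A rough sketch of that idea follows. All types here are hypothetical simplifications written for this example; the real Path interface and its implementations may differ. The point is only that callers ask the path for its interfaces directly, so a path without metadata can answer with an empty list instead of exposing a nil pointer.

```go
package main

import "fmt"

// Hypothetical, simplified types for illustration only.
type PathInterface struct {
	ID uint64
}

type PathMetadata struct {
	Interfaces []PathInterface
}

type Path interface {
	Metadata() *PathMetadata
	// Interfaces never requires the caller to touch Metadata().
	Interfaces() []PathInterface
}

// fullPath carries metadata, as a path to a remote AS would.
type fullPath struct {
	md *PathMetadata
}

func (p fullPath) Metadata() *PathMetadata { return p.md }

func (p fullPath) Interfaces() []PathInterface {
	if p.md == nil {
		return nil
	}
	return p.md.Interfaces
}

// partialPath has no metadata, as in the AS-local case.
type partialPath struct{}

func (partialPath) Metadata() *PathMetadata     { return nil }
func (partialPath) Interfaces() []PathInterface { return nil }

// isLocalAS is the check from the issue, rewritten against Interfaces().
func isLocalAS(p Path) bool {
	return len(p.Interfaces()) == 0
}

func main() {
	remote := fullPath{md: &PathMetadata{Interfaces: []PathInterface{{ID: 1}}}}
	fmt.Println(isLocalAS(partialPath{}), isLocalAS(remote))
}
```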

With that, I get no crash, but things still don't work. If I interpret the error messages correctly, there is no local Hidden Path registry, so the hidden path can't get registered. I am doing this in a very simple test environment and I don't believe a hidden path registry is being set up, so I don't expect this to work.

Markus, based on the rather contrived configuration you gave (which Matthias and I duplicated in our minimal test environment), it is hard to figure out what result you expect. If you do expect some kind of successful end-to-end behavior, could you help me set things up correctly so I can verify that my fix addresses the real problem and not an irrelevant symptom?

@jiceatscion
Contributor

jiceatscion commented Aug 2, 2023

Hmm... taking a closer look. We do create a hiddenpaths registry, as directed by the "registries" entry in the hidden path group config.
The actual errors aren't quite saying the registry isn't there. Rather, it's the process of looking it up that fails:

2023-08-02 14:33:30.001605+0000	DEBUG	            appnet/addr.go:185	Sending SVC resolution request	{
  "debug_id": "d2399253", "isd_as": "1-ff00:0:112", "svc": "DS", "svcResFraction": 1.337
}
2023-08-02 14:33:30.002533+0000	DEBUG	            appnet/addr.go:190	SVC resolution failed	{
  "debug_id": "d2399253", "err": {
    "msg": "unable to write", "cause": {
      "msg": "serialize SCION packet", "cause": {
        "msg": "no path set", "stacktrace": [
          "github.com/scionproto/scion/pkg/snet.(*Packet).Serialize pkg/snet/packet.go:527",
	  "github.com/scionproto/scion/pkg/snet.(*SCIONPacketConn).WriteTo pkg/snet/packet_conn.go:128",
	  "github.com/scionproto/scion/private/svc.roundTripper.RoundTrip private/svc/resolver.go:159",
	  "github.com/scionproto/scion/private/svc.(*Resolver).LookupSVC private/svc/resolver.go:116",
	  "github.com/scionproto/scion/private/app/appnet.AddressRewriter.resolveSVC private/app/appnet/addr.go:188",
	  "github.com/scionproto/scion/private/app/appnet.AddressRewriter.RedirectToQUIC private/app/appnet/addr.go:102",
	  "github.com/scionproto/scion/control/onehop.(*AddressRewriter).RedirectToQUIC control/onehop/addr.go:67",
	  "github.com/scionproto/scion/pkg/grpc.(*QUICDialer).Dial pkg/grpc/dialer.go:146",
	  "github.com/scionproto/scion/pkg/experimental/hiddenpath/grpc.(*Discoverer).Discover pkg/experimental/hiddenpath/grpc/discovery.go:37",
	  "github.com/scionproto/scion/pkg/experimental/hiddenpath.resolve pkg/experimental/hiddenpath/discovery.go:96",
	  "github.com/scionproto/scion/pkg/experimental/hiddenpath.RegistrationResolver.Resolve pkg/experimental/hiddenpath/discovery.go:55",
	  "github.com/scionproto/scion/pkg/experimental/hiddenpath.(*BeaconWriter).Write.func1 pkg/experimental/hiddenpath/beaconwriter.go:108",
	  "github.com/scionproto/scion/pkg/experimental/hiddenpath.(*remoteWriter).run pkg/experimental/hiddenpath/beaconwriter.go:155",
	  "github.com/scionproto/scion/pkg/experimental/hiddenpath.(*BeaconWriter).Write.func3 pkg/experimental/hiddenpath/beaconwriter.go:123",
	  "runtime.goexit GOROOT/src/runtime/asm_amd64.s:1594"
	]
      }
    }
  }
}
2023-08-02 14:33:30.002680+0000	ERROR	hiddenpath/beaconwriter.go:157	Unable to choose server	{
  "debug_id": "d2399253", "hp_group": "ff00:0:112-a", "err": {
    "msg": "discovering hidden path server", "cause": {
      "msg": "dialing", "cause": {
        "msg": "resolving SVC address", "cause": {
	  "msg": "unable to write", "cause": {
	    "msg": "serialize SCION packet", "cause": {
	      "msg": "no path set", "stacktrace": [
	        "github.com/scionproto/scion/pkg/snet.(*Packet).Serialize pkg/snet/packet.go:527",
		"github.com/scionproto/scion/pkg/snet.(*SCIONPacketConn).WriteTo pkg/snet/packet_conn.go:128",
		"github.com/scionproto/scion/private/svc.roundTripper.RoundTrip private/svc/resolver.go:159",
		"github.com/scionproto/scion/private/svc.(*Resolver).LookupSVC private/svc/resolver.go:116",
		"github.com/scionproto/scion/private/app/appnet.AddressRewriter.resolveSVC private/app/appnet/addr.go:188",
		"github.com/scionproto/scion/private/app/appnet.AddressRewriter.RedirectToQUIC private/app/appnet/addr.go:102",
		"github.com/scionproto/scion/control/onehop.(*AddressRewriter).RedirectToQUIC control/onehop/addr.go:67",
		"github.com/scionproto/scion/pkg/grpc.(*QUICDialer).Dial pkg/grpc/dialer.go:146",
		"github.com/scionproto/scion/pkg/experimental/hiddenpath/grpc.(*Discoverer).Discover pkg/experimental/hiddenpath/grpc/discovery.go:37",
		"github.com/scionproto/scion/pkg/experimental/hiddenpath.resolve pkg/experimental/hiddenpath/discovery.go:96",
		"github.com/scionproto/scion/pkg/experimental/hiddenpath.RegistrationResolver.Resolve pkg/experimental/hiddenpath/discovery.go:55",
		"github.com/scionproto/scion/pkg/experimental/hiddenpath.(*BeaconWriter).Write.func1 pkg/experimental/hiddenpath/beaconwriter.go:108",
		"github.com/scionproto/scion/pkg/experimental/hiddenpath.(*remoteWriter).run pkg/experimental/hiddenpath/beaconwriter.go:155",
		"github.com/scionproto/scion/pkg/experimental/hiddenpath.(*BeaconWriter).Write.func3 pkg/experimental/hiddenpath/beaconwriter.go:123",
		"runtime.goexit GOROOT/src/runtime/asm_amd64.s:1594"
	      ]
	    }
	  }
	}
      }
    }
  }
}

2023-08-02 14:33:30.002827+0000	ERROR	       beaconing/writer.go:110	Unable to register	{
  "debug_id": "d2399253", "seg_type": "down", "err": {
    "msg": "no beacons registered", "stacktrace": [
      "github.com/scionproto/scion/pkg/experimental/hiddenpath.(*BeaconWriter).Write pkg/experimental/hiddenpath/beaconwriter.go:131",
      "github.com/scionproto/scion/control/beaconing.(*WriteScheduler).run control/beaconing/writer.go:124",
      "github.com/scionproto/scion/control/beaconing.(*WriteScheduler).Run control/beaconing/writer.go:109",
      "github.com/scionproto/scion/private/periodic.(*Runner).onTick private/periodic/periodic.go:213",
      "github.com/scionproto/scion/private/periodic.(*Runner).runLoop private/periodic/periodic.go:195",
      "github.com/scionproto/scion/private/periodic.StartWithMetrics.func1 private/periodic/periodic.go:145",
      "runtime.goexit GOROOT/src/runtime/asm_amd64.s:1594"
    ],
    "candidates": 1
  }
}

@mlegner
Contributor Author

mlegner commented Aug 4, 2023

> Markus, based on the rather contrived configuration you gave (which Matthias and I duplicated in our minimal test environment) it is hard to figure what result you expect. If you do expect some kind of successful end-to-end behavior, could you help me set things up correctly so I can verify that my fix addresses the real problem and not an irrelevant symptom?

Hi @jiceatscion.
Thank you very much for all your debugging efforts.

I actually don't believe that the configuration is that contrived: For example, this could be used to hide a path from any other AS but allow local end hosts to use it. So the local control service should be able to register segments there and other local services should be able to retrieve them.

Specifically, I'm experimenting with an additional service in the AS that retrieves and further distributes the hidden segments. I would expect this service to be able to communicate with the hidden-path registry.

@jiceatscion
Contributor

jiceatscion commented Aug 4, 2023 via email

@jiceatscion
Contributor

jiceatscion commented Aug 9, 2023

A quick update, mostly notes to myself...
Many thanks to Jordi and Francois for the copious hand-holding.

So, the hidden paths registry and lookup services use the same port as the control service. That is a dynamic port, which means that the records in the topology file that the discovery service uses can't possibly contain the right port number.

In the case of the Control service, the port number that's in the topology file is that of the anycast resolution port. If we do the same for the hiddenpaths service, as I attempted to do, the clients must resolve the address exactly like they would for the control service, and not by querying the discovery service.

Illustration. In trying to reproduce the issue, I added the following to the topology file:

  "hidden_segment_lookup_service": {
    "cs1-ff00_0_112-1": {
      "addr": "[fd00:f00d:cafe::7f00:a]:31010"
    }
  },
  "hidden_segment_registration_service": {
    "cs1-ff00_0_112-1": {
      "addr": "[fd00:f00d:cafe::7f00:a]:31010"
    }
  },

As a result here is what the resolution dance looks like in the logs:

Resolve the discovery service:
2023-08-09 09:22:18.301326+0000	DEBUG	            appnet/addr.go:190	Sending SVC resolution request	{"debug_id": "6be212d3", "isd_as": "1-ff00:0:112", "svc": "DS", "svcResFraction": 1.337}

Worked; the port is the control server port, 32768:
2023-08-09 09:22:18.302435+0000	DEBUG	            appnet/addr.go:199	SVC resolution successful	{"debug_id": "6be212d3", "reply": {"Transports":{"QUIC":"[fd00:f00d:cafe::7f00:a]:32768"},"ReturnPath":{}}}

Query the discovery service for the hidden path service:
2023-08-09 09:22:18.302613+0000	DEBUG	        grpc/interceptor.go:41	Outgoing RPC	{"debug_id": "fb792fde", "trace_id": "29b9105df6cefdab", "method": "/proto.discovery.v1.DiscoveryService/HiddenSegmentServices", "target": "1-ff00:0:112,[fd00:f00d:cafe::7f00:a]:32768"}

Seems to work:
2023-08-09 09:22:18.305890+0000	DEBUG	  discovery/toposervice.go:115	Replied with hidden segment services	{"debug_id": "e7f7d314", "trace_id": "29b9105df6cefdab", "lookups": 1, "registration": 1}

But the response is wrong (note port 31010 in the target):
2023-08-09 09:22:18.306665+0000	DEBUG	        grpc/interceptor.go:41	Outgoing RPC	{"debug_id": "e124c2a7", "trace_id": "29b9105df6cefdab", "method": "/proto.hidden_segment.v1.HiddenSegmentRegistrationService/HiddenSegmentRegistration", "target": "1-ff00:0:112,[fd00:f00d:cafe::7f00:a]:31010"}

The response is wrong because the port for the hiddenpaths service should be the same as that of the control service and discovery service, i.e. 32768. Instead it gives us what I put in the topo file, not knowing what I was doing: 31010.
If I modify the topo file to list the hidden paths lookup and registry services at port 32768, the registration just works.
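The mismatch can be checked mechanically. The following is a small sketch using only the Go standard library; the helper names `portOf` and `samePort` are made up for this example, and the two addresses are the ones that appear in the logs above.

```go
package main

import (
	"fmt"
	"net"
)

// portOf extracts the port from a "host:port" or "[ipv6]:port" string.
func portOf(addr string) (string, error) {
	_, port, err := net.SplitHostPort(addr)
	return port, err
}

// samePort reports whether two address strings use the same port.
func samePort(a, b string) (bool, error) {
	pa, err := portOf(a)
	if err != nil {
		return false, err
	}
	pb, err := portOf(b)
	if err != nil {
		return false, err
	}
	return pa == pb, nil
}

func main() {
	resolved := "[fd00:f00d:cafe::7f00:a]:32768"   // from the SVC resolution reply
	advertised := "[fd00:f00d:cafe::7f00:a]:31010" // from the topology file
	ok, err := samePort(resolved, advertised)
	fmt.Println("ports match:", ok, "error:", err)
}
```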

So we have some choices:

  1. Change the dialer used by the hidden paths registerer so that it resolves the service the same way as is done for the control service.
  2. Give the hiddenpaths registry its own server with a static port.
  3. Make the discovery service more capable: add a way for servers to register dynamic ports.

My own take:

  • 1 can be done right away but may not be the most desirable, as it adds an optional service that the border routers would need to know about (Jordi, Francois, did I get this right?)... unless clients outright assume that the control service and the hidden paths service are at the same port.
  • 2 is conceptually simple but ruins whatever benefits are derived from having all the essential services in one server.
  • 3 would be the high-road but may not be worth it if the hidden paths service is the only thing to ever use that capability.

I do not know enough to make a judgement call just yet, so your opinions are welcome.

Also, wouldn't it be nice if there was a debug log somewhere that tells us when a port receives an rpc that has no server at that port?

jiceatscion added a commit to jiceatscion/scion that referenced this issue Aug 10, 2023
@jiceatscion
Contributor

#4376 is my attempt at doing number 1 in the least disruptive manner... I think.

@jiceatscion
Contributor

jiceatscion commented Aug 16, 2023

roosd@ suggested a 4th approach that is less disruptive than my proposals:

Make the control server QUIC/SCION port static by explicitly configuring it (via the cs toml config - quic.address). The port needs to be different from that of the public address (because the SVC redirector binds to the public address' port).

If we do that, we can set the QUIC/SCION address of the hiddenpath service in the corresponding entry of the topo file for the discovery service to use, while the control server's public address remains as it is. The public address continues to be reachable via TCP or over scion via the SVC redirection as that works with either static or dynamic ports.

There remain a few things to be resolved:

  • AS-local communications should not go over SCION (even though that is possible).
  • The topology schema is slightly flawed, although we can make-do with it.

Regarding the config schema flaw...

Server has three relevant addresses:

  • Public address (TCP and SCION interpretation)
  • SCION service address
  • SCION SVC resolver address

Constraints:

  • SCION service address can be dynamic but must be static to be Discoverable.
  • SCION service address can't be the same as SVC resolver (they're in the same address space).
  • Config of public address is used verbatim by direct TCP intra-AS addressing
  • Config of public address is used by clients to access SVC resolver so must be equal to SVC resolver SCION address.
  • Config of public address is returned by Discovery service so must be equal to SCION service address.
    => All three addresses must be equal, yet service address must be different from SVC resolver address. Impossible; one constraint must be dropped.
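The contradiction can be made concrete with a tiny checker. This is a sketch only: the `addrs` type and `violations` function are invented for this example, addresses are compared as plain strings, and only the three equality and inequality constraints are modeled (the "static to be discoverable" and "used verbatim by TCP" points are left out).

```go
package main

import "fmt"

// addrs holds the three addresses discussed above (hypothetical type).
type addrs struct {
	Public, Service, Resolver string
}

// violations lists which of the modeled constraints an assignment breaks.
func violations(a addrs) []string {
	var v []string
	if a.Service == a.Resolver {
		v = append(v, "service address must differ from SVC resolver address")
	}
	if a.Public != a.Resolver {
		v = append(v, "public address must equal SVC resolver address")
	}
	if a.Public != a.Service {
		v = append(v, "public address must equal service address")
	}
	return v
}

func main() {
	candidates := []addrs{
		{Public: "h:30252", Service: "h:30252", Resolver: "h:30252"},
		{Public: "h:30252", Service: "h:32768", Resolver: "h:30252"},
		{Public: "h:30252", Service: "h:30252", Resolver: "h:40252"},
	}
	for _, c := range candidates {
		fmt.Printf("%+v -> %v\n", c, violations(c))
	}
}
```

Whatever values are chosen, at least one of the three checks fails, which is the reason one constraint has to be dropped.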

Solution:

The last constraint is wrong: as soon as SVC resolution is supported the SCION service address cannot be the same as the public address and there's nothing we can do about that. Therefore it is wrong to assume they're the same. Therefore let's drop that constraint. To that end:

  • Make the discovery obtain the service address without relying on the "addr" service config field of the topo file.
  • As long as discovery is configured statically, discoverable services info must come from the topo file, the only one that has info on the whole AS.
  • Therefore Discovery needs to get the service address from a separate field in topo.
    • Work-around for hidden path service: reuse the "addr" field of hidden path service for Discovery. That's possible because:
      • Clients do not try to use direct TCP addressing (though they should).
      • There is no SVC resolution for the hidden path service.
      • The actual service port is the same as the control service which has its own separate configuration section.
    • More general solution: add a field "discoverable_addr" which supplies the Discovery response. When missing, the "addr" field is used for backward compatibility (for what it's worth).
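The fallback described in the last bullet could look roughly like this. Note that "discoverable_addr" is only a proposal in this comment, not an existing topology field; the `serviceEntry` type and its methods are likewise assumptions made for this sketch.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// serviceEntry models one service entry of the topo file.
// DiscoverableAddr corresponds to the proposed (hypothetical) field.
type serviceEntry struct {
	Addr             string `json:"addr"`
	DiscoverableAddr string `json:"discoverable_addr,omitempty"`
}

// advertised returns the address that discovery would hand out:
// the dedicated field if present, otherwise the legacy "addr".
func (e serviceEntry) advertised() string {
	if e.DiscoverableAddr != "" {
		return e.DiscoverableAddr
	}
	return e.Addr
}

// parseEntry decodes a single JSON service entry.
func parseEntry(raw string) (serviceEntry, error) {
	var e serviceEntry
	err := json.Unmarshal([]byte(raw), &e)
	return e, err
}

func main() {
	inputs := []string{
		`{"addr": "[fd00:f00d:cafe::7f00:a]:31010"}`,
		`{"addr": "[fd00:f00d:cafe::7f00:a]:31010", "discoverable_addr": "[fd00:f00d:cafe::7f00:a]:32768"}`,
	}
	for _, raw := range inputs {
		e, err := parseEntry(raw)
		if err != nil {
			fmt.Println("parse error:", err)
			continue
		}
		fmt.Println(e.advertised())
	}
}
```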

@oncilla
Contributor

oncilla commented Aug 16, 2023

> Constraints:
>
>   • SCION service address can be dynamic but must be static to be Discoverable.
>   • SCION service address can't be the same as SVC resolver (they're in the same address space).
>   • Config of public address is used verbatim by direct TCP intra-AS addressing
>   • Config of public address is used by clients to access SVC resolver so must be equal to SVC resolver SCION address.
>   • Config of public address is returned by Discovery service so must be equal to SCION service address.
>     => All three addresses must be equal, yet service address must be different from SVC resolver address. Impossible; one constraint must be dropped.

FWIW, I don't think the last constraint holds. The discovery service only exists in the SCION world, not the TCP world, and it advertises the hidden segment services. Thus, while it is true that it needs to announce the SCION service address (because we multiplex hidden segment API and SCION control plane API on the same address), it is not true that it must be the public address.

> Work-around for hidden path service: reuse the "addr" field of hidden path service for Discovery. That's possible because:
>   • Clients do not try to use direct TCP addressing (though they should).
>   • There is no SVC resolution for the hidden path service.
>   • The actual service port is the same as the control service which has its own separate configuration section.

I don't think this is a workaround. It is the intended use of that field. The discovery service should announce the hidden segment services with the addresses that are configured in the topology file.

In any case, I think part of the problem is that we use the topology for configuration and discovery, and that we assume every node needs to have the same view of the topology. In an ideal world, I would do away with the topology file altogether (or at least use it for discovery only and not for configuration).

Another problem is the asymmetry between the TCP/IP world and the QUIC/SCION world.
The public address is used for the API in the TCP world. In the QUIC world it is not.

I have an alternative proposal:

Instead of allocating a dynamic port for the SCION service address, we use the public address.
Thus, no matter whether you are in the SCION world or not, you will always use the same port to talk to the service API.

To still support SVC resolution, we add an additional address to the control_service and discovery_service elements where the SVC resolution is available. This is used by the control service to initialize the listener.

The topology would look something like this:

{
  "control_service": {
    "cs1-ff00_0_110-1": {
      "addr": "172.20.0.85:30252",
      "svc_resolution": "172.20.0.85:40252"
    }
  },
  "discovery_service": {
    "cs1-ff00_0_110-1": {
      "addr": "172.20.0.85:30252",
      "svc_resolution": "172.20.0.85:40252"
    }
  }
}

As long as we still have a dispatcher this is a backwards compatible change.
In the TCP world nothing changes, in the SCION world, the dispatcher is in charge of delivering SVC packets to the right place. The routers simply forward based on the IP in the topology file, the port is ignored. (We need to have a port if we want to move to a dispatcher-less world.)

Side-note: The port is currently not really important for SVC resolution. In fact, judging from the code, it doesn't even look like it ever appears on the wire.
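As a sketch of how a consumer might read the proposed layout, the snippet below parses the fragment above and rejects entries whose API address and SVC resolution address coincide. The "svc_resolution" field exists only in this proposal, and the `entry` type and `validate` function are names invented for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net"
)

// exampleTopology reproduces the fragment proposed above.
const exampleTopology = `{
  "control_service": {
    "cs1-ff00_0_110-1": {
      "addr": "172.20.0.85:30252",
      "svc_resolution": "172.20.0.85:40252"
    }
  },
  "discovery_service": {
    "cs1-ff00_0_110-1": {
      "addr": "172.20.0.85:30252",
      "svc_resolution": "172.20.0.85:40252"
    }
  }
}`

// entry models a service entry with the proposed (hypothetical) field.
type entry struct {
	Addr          string `json:"addr"`
	SVCResolution string `json:"svc_resolution"`
}

// validate parses a topology fragment and rejects entries whose API
// address and SVC resolution address are identical.
func validate(raw string) error {
	var topo map[string]map[string]entry
	if err := json.Unmarshal([]byte(raw), &topo); err != nil {
		return err
	}
	for svc, instances := range topo {
		for name, e := range instances {
			ah, ap, err := net.SplitHostPort(e.Addr)
			if err != nil {
				return fmt.Errorf("%s/%s: addr: %w", svc, name, err)
			}
			rh, rp, err := net.SplitHostPort(e.SVCResolution)
			if err != nil {
				return fmt.Errorf("%s/%s: svc_resolution: %w", svc, name, err)
			}
			if ah == rh && ap == rp {
				return fmt.Errorf("%s/%s: addr and svc_resolution collide on %s", svc, name, e.Addr)
			}
		}
	}
	return nil
}

func main() {
	fmt.Println("validation error:", validate(exampleTopology))
}
```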

@oncilla
Contributor

oncilla commented Aug 16, 2023

An additional benefit of the alternative proposal is that it allows for full flexibility.

If the control service serves the SCION control plane API, the discovery API, and the hidden segment API on the same port, that can be expressed by using the same port for everything. If it uses different listeners based on API, that can be expressed by using different ports.

Any client wanting to talk to the different APIs does not need to be aware of these internal details.
If they want to talk to the SCION control plane API, they just talk to one of the entries in control_services.
If they want to talk to the hidden segment API, they just talk to one of the entries in hidden_segment_*

@jiceatscion
Contributor

jiceatscion commented Aug 17, 2023

I think we are generally in agreement. My reason for preferring to keep SVC Resolver addr and Public addr the same was my assumption that this was an entrenched thing that I'd better not try to change. (Comparing that with simply updating the field used by the Discovery service.) Since you are much more familiar with the code, I trust your judgement that SVC resolution and public addresses can easily be different. Granted that, I agree that your option is preferable. It makes more intuitive sense.

> An additional benefit of the alternative is that it allows full flexibility...

I think both proposals do that anyway. Which is what I was aiming for.

I'll give a try to your approach. If I don't hit a roadblock, I'll be happy to adopt it.

@oncilla
Contributor

oncilla commented Aug 17, 2023

I'm pretty confident it will work.

On the client side, we simply take the address that we receive during SVC resolution: https://github.com/scionproto/scion/blob/4096d879b05a0a7e287bfb79590941facdf40bf4/private/app/appnet/addr.go#L222C12-L226
The IP address in the SCION header of the SVC resolution reply is not taken into consideration. Nor does the SVC resolution handler on the server side ever set its own port in the UDP payload:

scion/private/svc/svc.go

Lines 191 to 194 in 4096d87

Payload: snet.UDPPayload{
	DstPort: udp.SrcPort,
	SrcPort: udp.DstPort,
	Payload: h.Message,

Now, the SVC resolution packet gets routed to the control service in the following way:

Thus, at no point during the SVC resolution process, the UDP port of the listener is taken into account.

(But again, this only holds as long as we still have the dispatcher. Otherwise, UDP port matters, but only from the router's point of view)
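The port handling in the quoted excerpt can be restated as a standalone sketch. The `UDPPayload` type below is a local stand-in with the three fields seen in the excerpt, and `buildReply` is a name invented for this example; it is not the actual handler code.

```go
package main

import "fmt"

// UDPPayload is a simplified local stand-in for illustration only.
type UDPPayload struct {
	SrcPort, DstPort uint16
	Payload          []byte
}

// buildReply mirrors the excerpt: the reply ports are the request ports
// swapped. The listener's own bound port is never consulted.
func buildReply(req UDPPayload, msg []byte) UDPPayload {
	return UDPPayload{
		DstPort: req.SrcPort,
		SrcPort: req.DstPort,
		Payload: msg,
	}
}

func main() {
	// Port numbers here are arbitrary example values.
	req := UDPPayload{SrcPort: 40001, DstPort: 30252}
	reply := buildReply(req, []byte("svc reply"))
	fmt.Printf("reply src=%d dst=%d\n", reply.SrcPort, reply.DstPort)
}
```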

@jiceatscion
Contributor

jiceatscion commented Aug 17, 2023

Btw... would the service_resolution address configured for services other than the control service ever be used for anything? In general, only one service will initialize the resolver and the others don't need to care, do they? If that is the case, we might not even need to change the schema, only add a dedicated service in the topo file just to configure the resolver. WDYT?

@oncilla
Contributor

oncilla commented Aug 17, 2023

The discovery and the control service are conceptually two separate things. Thus, you need to be able to configure them separately from the viewpoint of the router.

That they are served on the same listener is an implementation detail of the monolith.

@jiceatscion
Contributor

Update Summary:

Making the resolver address different from the public address (and making the SCION address equal to the public address) works.
Configuring the resolver as if it were of a new service works too (at least in the limited integration test deployment).

jiceatscion added a commit to jiceatscion/scion that referenced this issue Aug 18, 2023
@jiceatscion
Contributor

Another thing I tried successfully (after fixing a bug in the dispatcher) is to let the resolver get a dynamic port number by default. This has the advantage of not requiring any change in the configuration schema, since there's no need to specify a separate address:port for the resolver.
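For readers unfamiliar with dynamic ports, here is a generic illustration using a plain UDP socket from the Go standard library. It is not the SCION dispatcher mechanism that the resolver actually uses; `listenResolver` and `dynamicLoopbackPort` are names invented for this sketch. Binding to port 0 asks the operating system to pick a free port.

```go
package main

import (
	"fmt"
	"net"
)

// listenResolver binds a UDP socket on the given IP. A port of 0 lets the
// operating system choose a free port; any other value is used as-is.
func listenResolver(ip net.IP, port int) (*net.UDPConn, error) {
	return net.ListenUDP("udp", &net.UDPAddr{IP: ip, Port: port})
}

// dynamicLoopbackPort binds to loopback with port 0 and returns the port
// that was actually assigned.
func dynamicLoopbackPort() (int, error) {
	conn, err := listenResolver(net.IPv4(127, 0, 0, 1), 0)
	if err != nil {
		return 0, err
	}
	defer conn.Close()
	return conn.LocalAddr().(*net.UDPAddr).Port, nil
}

func main() {
	port, err := dynamicLoopbackPort()
	if err != nil {
		fmt.Println("listen failed:", err)
		return
	}
	fmt.Println("resolver would listen on dynamically assigned port", port)
}
```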

@oncilla
Contributor

oncilla commented Aug 21, 2023

> Another thing I tried successfully (after fixing a bug in the dispatcher), is to let the resolver get a dynamic port number by default. This has the advantage of not requiring any change in the configuration schema since there's no need to specify a separate address:port for the resolver.

That works for now, and can be our preferred option. We will need to revisit during dispatcher removal though.

@jiceatscion
Contributor

> > Another thing I tried successfully (after fixing a bug in the dispatcher), is to let the resolver get a dynamic port number by default. This has the advantage of not requiring any change in the configuration schema since there's no need to specify a separate address:port for the resolver.
>
> That works for now, and can be our preferred option. We will need to revisit during dispatcher removal though.

Yeah, but then, I doubt that having kept the resolver port static will make the removal critically simpler. We can design a new scheme at that point.

To summarize the options that I have at this point. I can do either or both of:

  1. Allow the explicit configuration of an address for the resolver (as a separate service).
  2. Give a dynamic port to the resolver service if not otherwise configured.

I can also allow the configuration of a separate resolver address for each service, but I am not sure I understand what this would be for. It would also be worth reflecting a bit on the interactions between the server config files and the topology file but if we go for number 1 we can make it a separate conversation.

I'll assume we're happy with just number 1. Anyone thinks it's wrong, speak up!

@jiceatscion
Contributor

jiceatscion commented Aug 21, 2023

Ooops, I added some confusion... In my previous comment I swapped both options. What I wanted to suggest was to do just option 2 which if done by itself is a very small change. I'm ok with either, though.

One more thing I'd like your (oncilla@, matzf@, lukedirtwalker@) opinions about before I propose a fix: it seems we have more defaults, fall-backs, and other flexibility features than is really good for us. It makes it easy to misconfigure things without realizing it. I'd like to remove one of those. Now that we are able to move the resolver port out of the way (be it dynamic, explicit, or both), is there any point in defaulting the SCION port of a service to a dynamic port? Would it not make more sense to make it identical to the public address? If so, is it even useful to allow it to be configured to something else?

matzf pushed a commit that referenced this issue Aug 29, 2023
…otely.

Fixes #4364

There were a number of issues:

* Because the normal case is that the registry is remote, the registerer makes no attempt at using the intraAS network to reach the registry (and the registry isn't listening on TCP). However, quic/scion path resolution to an AS-local host was broken: the resulting paths are expected to be complete paths with Metadata, but intraAS paths were metadata-less paths, resulting in a nil-pointer dereference.
* By default the control service port (shared with the hidden_segment registry) was a dynamic port. The discovery service only knows of statically configured ports (configured in the topo file). So the only way to make the hidden_segments service discoverable is to give the control service a static port. While possible, this would result in a confusing and counter-intuitive configuration:
  * the control_service quic address would be configured by the cs yaml file and be different from what appears in the topo file for the CS but identical to the hidden_segment service address.
  * the CS address in the topo file was actually the TCP address and could never be mirrored by the quic address (because that quic address was occupied by the service resolver).
  * the address of the hidden_segments registry in the topo file would be different from that of the CS, even though they share the same port.
* This was all invisible to the integration test which was skillfully predicting the dynamic port and configuring the discovery service for it.

To correct these infelicities:
* Give the SVC resolver a dynamic port
* By default, give the CS and the hidden_segment registry the quic address that has the same IP and port as the control service's TCP address.
* Produce complete Path objects to represent intraAS paths.

In passing:
* A handful of AssertEqual() had arguments in the wrong order.
* In one case we were constructing a full SVC address with a nil path instead of an empty path.
@mlegner
Contributor Author

mlegner commented Aug 29, 2023

Thank you so much for the great work on this, @jiceatscion!
