Modularize Discovery Protocols as Pods #198

kate-goldenring · 2021-01-11T21:55:46Z

Is your feature request related to a way you would like Akri extended? Please describe.
Currently, protocol discovery is all implemented in the Akri Agent, which, as explained in the protocol extension proposal, makes the Agent bigger than needed (if someone is not using all the protocols), creates a larger attack surface, and requires discovery protocols to be written in Rust.
Akri should be able to be deployed with only the desired discovery protocols, and discovery protocols should be able to be added without changing Akri's core components (the Agent, Controller, and Configuration CRD). This would make extending Akri easier.

Describe the solution you'd like
This could be done by implementing idea 3 of the protocol extension proposal, wherein protocols are implemented as pods and exposed over a gRPC interface. The DiscoveryHandler's two methods are_shared and discover could be exposed over this interface.
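To make the shape of that interface concrete, here is a minimal in-process sketch of the two operations. Everything beyond the names discover and are_shared is an assumption for illustration: DiscoveryResult, StaticHandler, summarize, and the String error type are hypothetical and are not Akri's actual definitions. A gRPC (or REST) service would expose the same two calls across a pod boundary.

```rust
use std::collections::HashMap;

/// Hypothetical result type; Akri's real type may differ.
#[derive(Debug, Clone, PartialEq)]
pub struct DiscoveryResult {
    /// Free-form properties describing one discovered device.
    pub properties: HashMap<String, String>,
}

/// The two operations named in this issue. Signatures are assumptions.
pub trait DiscoveryHandler {
    /// Returns every device currently visible to this handler.
    fn discover(&self) -> Result<Vec<DiscoveryResult>, String>;
    /// Says whether the devices can be seen by more than one node.
    fn are_shared(&self) -> Result<bool, String>;
}

/// Toy handler that "discovers" a fixed list of device paths.
pub struct StaticHandler {
    pub device_paths: Vec<String>,
}

impl DiscoveryHandler for StaticHandler {
    fn discover(&self) -> Result<Vec<DiscoveryResult>, String> {
        Ok(self
            .device_paths
            .iter()
            .map(|path| {
                let mut properties = HashMap::new();
                properties.insert("path".to_string(), path.clone());
                DiscoveryResult { properties }
            })
            .collect())
    }

    fn are_shared(&self) -> Result<bool, String> {
        // Local device nodes are only visible to the node they are attached to.
        Ok(false)
    }
}

/// What the caller needs, independent of transport: device count and sharing.
pub fn summarize(handler: &dyn DiscoveryHandler) -> Result<(usize, bool), String> {
    Ok((handler.discover()?.len(), handler.are_shared()?))
}
```

Because the caller only depends on the trait, the same call site could be backed by a compiled-in handler or by a client stub talking to a handler pod.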

Additionally, the Configuration CRD would need to be modified so protocols are generic rather than explicitly defined in the CRD. Potentially as a map, as mentioned in this thread.
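As a rough illustration of the "protocols as a map" idea, the sketch below models the protocol section of a Configuration as a name plus an opaque string map, so the Agent can route a Configuration to a handler without knowing the protocol's fields. The type GenericProtocol, the function route, and the endpoint strings are all hypothetical, not the actual CRD schema.

```rust
use std::collections::HashMap;

/// Hypothetical generic protocol section of a Configuration.
/// The Agent never interprets `details`; only the handler does.
#[derive(Debug, Clone, PartialEq)]
pub struct GenericProtocol {
    pub name: String,
    pub details: HashMap<String, String>,
}

/// Looks up the endpoint of the handler registered under the protocol's name.
/// `endpoints` is an assumed registry of handler name -> address.
pub fn route<'a>(
    protocol: &GenericProtocol,
    endpoints: &'a HashMap<String, String>,
) -> Option<&'a str> {
    endpoints.get(&protocol.name).map(|endpoint| endpoint.as_str())
}
```

With this shape, adding a new protocol means registering a new handler endpoint; the Agent and the CRD definition stay unchanged.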

Also, the Agent should still be able to be compiled with DiscoveryHandlers in case users want the smaller footprint of one Agent pod rather than Agent pod + 1 pod for each DiscoveryHandler.

Describe alternatives you've considered
Other ideas listed in the protocol extension proposal, such as exposing the Agent as a library.

Additional context
The Zeroconf protocol #163 increases the Agent's size considerably and would benefit from running in its own Pod.


jiayihu · 2021-01-11T22:34:38Z

I'll try to add some thoughts I have in mind. I personally don't know gRPC; I've only used old-style RPC in the past, so I don't have anything against it. I'd just like to point out some benefits of REST over gRPC:

- I think it's easier for people to work with REST/HTTP than with gRPC. I'm talking about protocol knowledge, libraries, debugging tools, etc.
- The agent doesn't even need to actually run as a pod inside the cluster. As long as the agent and handler are able to communicate over HTTP in some way, they can talk to each other. Not sure if gRPC can also do this; maybe yes, since I think it has been designed at the L7 level for that purpose.
  - This allows testing the handler locally during development. I think it's also easier to test HTTP than gRPC; there are many tools for working with HTTP, like curl, Postman, etc.
  - This allows the discovery handler to be some external service, which could be helpful for use cases where devices are managed by a service outside the cluster.
  - A random thought: do we even need a broker-per-instance? Suppose I have my own external broker which handles discovery and communication with the devices. I think that's pretty common. In that case the cluster is only interested in knowing about those devices, via the agent <---> externalBroker communication, right? Then provisioning a device would mean allowing a connection from a cluster pod to the external broker, without an in-cluster broker.
- RESTful communication is also supported over non-TCP protocols like CoAP, which uses UDP. This gives implementations even more freedom.

Not strictly about REST:

What do you think about reversing the relationship? Instead of having the agent poll the discovery handler, we could have discovery handlers notify the agent by pushing the discovered devices. This could potentially mean much less work for the agent.

kate-goldenring · 2021-01-11T22:59:17Z

I'm in the opposite boat, more familiar with gRPC than REST; however, I can see how REST may be better, especially since discovery handlers would only need to support two endpoints: one for discover that returns all devices and the other for are_shared to specify whether the devices can be accessed/seen by more than one node.

For testing gRPC, there is the gRPCurl tool, which @DazWilkin blogged about using to inspect Akri's Device Plugins, which are gRPC over UDS.

For your point about brokers not always being necessary, I agree. Brokers are currently optional, and Akri can just discover (which could mean reaching out to an external broker that exposes the REST/gRPC interface) and expose those devices to the cluster as Kubernetes resources, but not deploy a broker to the device. Is this what you were thinking?

As for push vs pull, I was wondering the same. Pulling/polling makes sense with the current flow of applying a Configuration kicking off action in Akri. In a pull scenario, the Agent would see a Configuration and then reach out to some endpoint specified in the Configuration to discover devices. If it was a push scenario, what would the Agent do in response to seeing a Configuration? Would it be ready to receive information about devices from an endpoint specified in a Configuration?

jiayihu · 2021-01-11T23:34:20Z

> Akri can just discover (which could mean reaching out to an external broker that exposes the REST/gRPC interface) and expose those devices to the cluster as Kubernetes resources, but not deploy a broker to the device

Yes exactly

> If it was a push scenario, what would the Agent do in response to seeing a Configuration? Would it be ready to receive information about devices from an endpoint specified in a Configuration?

Good question indeed. Does it need to do something though? It could maybe have only one endpoint/interface for receiving discovered devices, but a new configuration would mean that more device types are accepted. Otherwise it could reject unknown configurations.
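A minimal sketch of that idea, assuming a single receiving interface on the agent: applying a Configuration widens the set of accepted pushes, and pushes for anything else are rejected. AgentInbox, PushOutcome, and the method names are hypothetical and only model the decision logic, not any transport.

```rust
use std::collections::{HashMap, HashSet};

/// Result of a handler pushing one device to the agent.
#[derive(Debug, PartialEq)]
pub enum PushOutcome {
    Accepted,
    RejectedUnknownConfiguration,
}

/// Hypothetical agent-side state behind a single "receive devices" endpoint.
#[derive(Default)]
pub struct AgentInbox {
    known_configurations: HashSet<String>,
    devices: HashMap<String, Vec<String>>,
}

impl AgentInbox {
    /// Called when the agent sees a new Configuration: more pushes are accepted.
    pub fn apply_configuration(&mut self, name: &str) {
        self.known_configurations.insert(name.to_string());
    }

    /// Called when a discovery handler pushes a device for a Configuration.
    pub fn push_device(&mut self, configuration: &str, device_id: &str) -> PushOutcome {
        if !self.known_configurations.contains(configuration) {
            return PushOutcome::RejectedUnknownConfiguration;
        }
        self.devices
            .entry(configuration.to_string())
            .or_default()
            .push(device_id.to_string());
        PushOutcome::Accepted
    }

    /// Number of devices accepted so far for a Configuration.
    pub fn device_count(&self, configuration: &str) -> usize {
        self.devices.get(configuration).map_or(0, |devices| devices.len())
    }
}
```

In this model the agent does nothing active when a Configuration appears; it only changes which pushes it is willing to accept.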

I don't know if this is the right issue but I have an issue on my mind with CoRE. CoRE via CoAP basically works this way:

1. You send GET /.well-known/core to port 5683, via multicast or directly to a known device IP.
2. The device responds with a list of REST resources it can offer, such as /temperature and /humidity with types oic.r.temperature and oic.r.humidity as defined by IANA. (Actually, I don't know if oic.r.humidity exists.)
How should I register this information?

I can register the device and it has a generic name like coap-core-614135. From its metadata, I can see the applied properties like oic.r.temperature: /sensors/temp. Now if the cluster needs a temperature sensor, it can search through the akrii list and look for an instance with oic.r.temperature. Then it can look up the associated broker and be allowed to reach its service.
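The lookup described here can be sketched as a filter over instance metadata. This is only a model of the flow: the Instance struct and find_by_resource_type are hypothetical, and the property layout (resource type mapped to path) is taken from the example in this comment rather than from Akri's Instance CRD.

```rust
use std::collections::HashMap;

/// Hypothetical view of a registered instance: a generic name plus
/// properties mapping a resource type to the path that serves it.
#[derive(Debug, Clone)]
pub struct Instance {
    pub name: String,
    pub properties: HashMap<String, String>,
}

/// Returns (instance name, resource path) for every instance that
/// advertises the requested resource type.
pub fn find_by_resource_type<'a>(
    instances: &'a [Instance],
    resource_type: &str,
) -> Vec<(&'a str, &'a str)> {
    instances
        .iter()
        .filter_map(|instance| {
            let path = instance.properties.get(resource_type)?;
            Some((instance.name.as_str(), path.as_str()))
        })
        .collect()
}
```

A consumer that needs a temperature reading would call this with oic.r.temperature and then resolve the broker for each returned instance name.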

This flow is reasonable but I wonder if it could be improved or what use cases cannot be solved. What if I want a broker which can aggregate all temperature measurements from the devices to give a more accurate result?

By allowing the discovery handler to push new devices instead of being polled, maybe we can support a large number of device types. ~~Currently a Configuration means a discovery protocol AFAIK, whereas it could mean a device type instead.~~ It would be impractical to have a discovery handler for each resource type and ask the agent to poll each one. The reverse seems more practical. I can have a discovery handler per protocol and push new devices along with new resource types. The agent can decide what to accept.

I hope it makes sense and can add another perspective to this issue.

kate-goldenring · 2021-01-12T17:16:22Z

> ~~Currently a Configuration means a discovery protocol AFAIK, whereas it could mean a device type instead.~~

Maybe you crossed out this line because you came to the same conclusion, but while a discovery protocol is specified in a Configuration, a Configuration is specific to a device type, which is why you also specify filters in a Configuration. For example, for a Configuration to discover local video devices, you specify the udev protocol and the filter (udev rule) KERNEL=="video[0-9]*".

That's a good point that, currently, we do discovery for each Configuration/device type, so the more device types/Configurations, the more polling the Agent has to do.

> This flow is reasonable but I wonder if it could be improved or what use cases cannot be solved. What if I want a broker which can aggregate all temperature measurements from the devices to give a more accurate result?

Are you wondering whether we could have one broker for multiple instances? Currently, we only have one "broker deployment strategy": one broker per instance. This is limiting, and we'd love to support others, such as the one you mention, which we describe as "Instance Pooling" in our broker deployment strategies proposal.

kate-goldenring · 2021-01-12T17:45:45Z

If we use the method of discovery handler pods pushing discovery results to the agent, how do we prevent a denial of service attack or the Agent being overloaded by too many results?
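One possible way to bound the load, offered only as a sketch and not as something proposed in this thread: put a fixed-capacity queue between the receiving endpoint and the rest of the Agent, and drop (or reject) pushes when it is full. The function push_burst and its parameters are hypothetical; the queue is std's bounded sync_channel.

```rust
use std::sync::mpsc::{sync_channel, TrySendError};

/// Simulates a handler pushing `burst` results into an agent-side queue that
/// holds at most `capacity` unprocessed results. Nothing drains the queue
/// here, which models an agent that is too busy to keep up.
/// Returns (accepted, dropped).
pub fn push_burst(capacity: usize, burst: usize) -> (usize, usize) {
    // `_rx` is kept alive so the channel stays connected for the whole burst.
    let (tx, _rx) = sync_channel::<String>(capacity);
    let mut accepted = 0;
    let mut dropped = 0;
    for n in 0..burst {
        match tx.try_send(format!("device-{}", n)) {
            Ok(()) => accepted += 1,
            // Queue is full: shed load instead of blocking or growing memory.
            Err(TrySendError::Full(_)) => dropped += 1,
            // Receiver is gone: stop pushing.
            Err(TrySendError::Disconnected(_)) => break,
        }
    }
    (accepted, dropped)
}
```

This caps memory use per handler but does not address authentication, so it would only cover the accidental-overload half of the question.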

DazWilkin · 2021-01-12T18:17:07Z

I'm familiar with both gRPC and REST. REST is conventionally JSON and HTTP but need not be; Constrained RESTful Environments (CoRE) is REST over UDP. gRPC is conventionally protobuf messages over HTTP/2 but need not be.

It is true that there are more developer tools available with REST but this is partly because REST is more common. Whether REST-like or RPC-like is chosen, Akri should consider providing SDKs for the popular languages (via OpenAPI or Protos) so that Akri discovery protocol developers can use a known-good SDK rather than be exposed to what should be an implementation detail.

Request-response, polling, streaming, bidirectional, etc. are implementation decisions and don't affect, for example, where handlers and agents may run (both permit running handlers remotely).

gRPC is probably better if:

- Consistency with an in-process Rust mechanism is desired; because gRPC is RPC it is closer to the current Rust SDK
- We consider protocol implementations to be Akri microservices; peers rather than clients?
- Bidirectional|Streaming is desired; gRPC clients and servers can stream events to one another
- gRPC provides mechanisms that expose services as REST/HTTP for non-gRPC uses
- With apologies for being circular, gRPC is probably better if RPC better matches the interaction pattern

REST is probably better if:

- It is considered important to ease the debugging of handler client-server (!) flows; if both must be servers, is this complex?
- Protocol implementations are clients of Akri; if not, would we need to add e.g. WebSockets for callbacks?
- OpenAPI describes sufficient typing to offset any benefits of protobufs
- With apologies for being circular, REST is probably better if the Akri discovery host is more of a set of resources with state

Kubernetes' components frequently use gRPC service-service but the API Server (clients-service) is almost exclusively REST.

jiayihu · 2021-01-12T19:42:40Z

I think k8s is indeed a good example of gRPC vs REST. For internal communication between parts, gRPC is more immediate because it's closer to the SDK/language itself. For higher-level communication between heterogeneous components, REST has more advantages. So I agree gRPC or REST can be decided later, after choosing the agent architecture.

> Maybe you crossed out this line because you came to the same conclusion, but while a discovery protocol is specified in a Configuration, a Configuration is specific to a device type, which is why you also specify filters in a Configuration.

I crossed it out because I couldn't think of anything better 😄 I still feel that a Configuration is more a discovery protocol than a device type, and you can filter what is discovered. For instance, to support CoRE I need to add a new Configuration. But CoRE is not a device type; it's a discovery protocol. Actually, I don't even think there is a concept of device types in CoRE: both CoRE and CoAP talk about resource types, because a device might provide different sensor data, like both temperature and humidity. It doesn't matter what the device type is. By talking about resource types instead of device types, the latter stop mattering. A camera could provide both video and audio resources. Or environment temperature could be provided by different device types under the same resource type.

> If we use the method of discovery handler pods pushing discovery results to the agent, how do we prevent a denial of service attack or the Agent being overloaded by too many results?

If you mean a malicious DoS, I believe it's out of the scope of Akri. If you allow external services to communicate with your cluster in any way, it's up to you to make that safe. Overloading could instead be solved by horizontal scaling, maybe? In the case of many results, even the pull approach would have severe issues. It might be even harder to scale horizontally, because you would have to decide how to distribute the polling operations between the agents.

kate-goldenring · 2021-01-21T22:49:12Z

I put together a spec on HackMD, pulling a lot of this discussion into a design that I am working on implementing now. Would love to hear any comments people have on HackMD.

DazWilkin · 2021-01-22T02:59:38Z

Good job!

I tried to comment from my phone but I was challenged 😄 After switching to a laptop...

I've added some preliminary feedback.

I need to better educate myself on the message flows between an Agent and a remote DiscoveryHandler, but I think that, since we already have the in-process calling mechanism defined (by the Rust SDK), and if we agree that this mechanism is correct, then the RPC implementation should closely (exactly?) model it. gRPC's tagline, after all, should be something like "develop as if you were just using an infinitely large computer"!

kate-goldenring · 2021-01-22T16:31:56Z

Thanks for the feedback @DazWilkin and @jiayihu, it has led me to pivot a bit. It makes sense for the response to an Agent's call on a DH to be a streamed response, truly like ListAndWatch in Device Plugin -- I was previously comparing polling to ListAndWatch which probably was confusing. This allows the DH to determine the rate at which to update the agent instead of vice versa, since it is likely to be protocol specific.

Specifically, in the proto file, what was:

```proto
service Discovery {
  rpc Discover (DiscoverRequest) returns (DiscoverResponse);
}
```

Would have a streamed response:

```proto
service Discovery {
  rpc Discover (DiscoverRequest) returns (stream DiscoverResponse);
}
```
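To show what the streamed variant changes for the caller, here is a small model that uses a std channel as a stand-in for the gRPC response stream. It is not generated gRPC code: the DiscoverResponse fields, the discover and watch functions, and the snapshots parameter are assumptions for illustration. The point is that the handler side picks the update interval, and the Agent side simply consumes updates until the stream ends.

```rust
use std::sync::mpsc::{channel, Receiver};
use std::thread;
use std::time::Duration;

/// Hypothetical message: the full device list as of one update.
#[derive(Debug, Clone, PartialEq)]
pub struct DiscoverResponse {
    pub devices: Vec<String>,
}

/// Handler side: stands in for a streamed Discover call.
/// The handler, not the Agent, chooses `interval` between updates.
pub fn discover(snapshots: Vec<Vec<String>>, interval: Duration) -> Receiver<DiscoverResponse> {
    let (tx, rx) = channel();
    thread::spawn(move || {
        for devices in snapshots {
            // Stop streaming if the Agent has dropped its end.
            if tx.send(DiscoverResponse { devices }).is_err() {
                break;
            }
            thread::sleep(interval);
        }
        // Dropping `tx` here closes the stream.
    });
    rx
}

/// Agent side: reacts to each update as it arrives and records the
/// number of devices seen in each one.
pub fn watch(stream: Receiver<DiscoverResponse>) -> Vec<usize> {
    stream.iter().map(|update| update.devices.len()).collect()
}
```

The Agent makes one call per Configuration and then waits on the stream, which matches the ListAndWatch comparison above.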

kate-goldenring linked a pull request Jan 11, 2021 that will close this issue

[Extensibility] Zeroconf #163

Closed

8 tasks

kate-goldenring added the enhancement New feature or request label Jan 11, 2021

kate-goldenring self-assigned this Jan 11, 2021

jiayihu mentioned this issue Jan 12, 2021

Allow conditional compilation of agent protocols #196

Merged

8 tasks

jiayihu mentioned this issue Jan 21, 2021

[Extensibility] CoAP CoRE #203

Closed

8 tasks

kate-goldenring mentioned this issue Jan 25, 2021

Update key dependencies: kube-rs, tonic, and tokio #223

Closed

kate-goldenring mentioned this issue Feb 23, 2021

Enable Discovery Handlers as Pods #252

Merged

8 tasks

kate-goldenring closed this as completed in #252 Mar 10, 2021
