Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Extensibility] ZeroConf #132

Closed
DazWilkin opened this issue Nov 17, 2020 · 28 comments
Closed

[Extensibility] ZeroConf #132

DazWilkin opened this issue Nov 17, 2020 · 28 comments
Assignees
Labels
enhancement New feature or request stale

Comments

@DazWilkin
Copy link
Contributor

**Is your feature request related to a problem? Please describe.**g

I've begun prototyping (!) a ZeroConf protocol implementation for Akri

Is your feature request related to a way you would like Akri extended? Please describe.

ZeroConf is probably (!) a useful protocol implementation particularly due to its reliable naming of (transient) devices.

Describe the solution you'd like

Akri protocol implementation with Broker samples.

@DazWilkin DazWilkin changed the title [Extensibility] ZeroConfjd [Extensibility] ZeroConf Nov 17, 2020
@kate-goldenring
Copy link
Contributor

@DazWilkin, can you put in a PR with a proposal for what you envision? Maybe something like the one for OPC UA that provides background on zero conf, what filters/settings will be a part of the Configuration, and any security considerations?

@DazWilkin
Copy link
Contributor Author

Yes, I can draft something.

To set expectations: I'm mostly unfamiliar with ZeroConf and am writing more of a proof-of-concept using it to learn more about ZeroConf, Akri and Rust. I don't have any production uses of ZeroConf.

It would be useful to read a proposal written by folks that are familiar with ZeroConf and who use it in production.

I'll draft a proposal and include programming notes, concerns, omissions, feature requests etc.

https://github.com/DazWilkin/akri/blob/proposal-zeroconf/docs/proposals/zeroconf.md

@DazWilkin
Copy link
Contributor Author

@kate-goldenring @bfjelds
Should I raise this as a separate issue? Or would you prefer I continue to keep ZeroConf-related questions here?

The zeroconf crate has a bunch of Linux dependencies, see #104 (comment) and it requires that the agent be started with several daemons.

Dockerfile.agent was:

ENTRYPOINT ["./agent"]

To:

ENTRYPOINT ["bash","-c","service dbus start && service avahi-daemon start && ./agent"]

This uncommon approach yields a shell running the 2 services and the agent.

Problems:

@kate-goldenring
Copy link
Contributor

Hey Daz,

I think we can keep this issue as our thread for Zeroconf questions.

Maybe on Slack we can make a shout out to see who has experience with Zeroconf. Like you said, it would be great to get their input on a proposal. Zeroconf is also new to me.

As to Zeroconf bulking up the Agent, hopefully this is only a temporary issue that will be mitigated once we have implemented new extensibility models like options 3 and 4 in the proposal. Instead of the agent having to contain support for all protocols, the discovery handler portion could be its own pod or the agent could be exposed as a library, only bringing in support for requested protocols. These implementations are on the roadmap for us. When implemented, we are thinking we will port all existing protocols over to the new model. So while this is messy, do you think a better extensibility model removes the mess?

@DazWilkin
Copy link
Contributor Author

@kate-goldenring yes, that would be excellent.

The current issue (mega-Agent) is that everyone must necessarily run every protocol (and all their encumbrances and vulnerabilities). By moving to a per-protocol Agent, not only does the user get more focused Agents but these may be optimized for Akri's and the protocol's dependencies and maintained (patching) on a cycle specific to the protocol and its deps.

@DazWilkin
Copy link
Contributor Author

Ok, I have a working prototype of ZeroConf.

A single mDNS service (18628d4f173f.local) is running

Confirmed by avahi-browse:

avahi-browse --all
+ docker0 IPv4 18628d4f173f                                  Web Site             local

After deploying the agent, its (!) logs report finding the service:

[zeroconf:new] Entered
[zeroconf:discover] Entered
[zeroconf:discovery] Started browsing
[zeroconf:discovery:λ] Service Discovered: ServiceDiscovery { name: "18628d4f173f", kind: "_http._tcp", domain: "local", host_name: "18628d4f173f.local", address: "172.17.0.3", port: 8080, txt: Some(AvahiTxtRecord(UnsafeCell)) }
[zeroconf:discovery] Stopped browsing
[zeroconf:discovery] Iterating over services
[zeroconf:discovery] Service: ServiceDiscovery { name: "18628d4f173f", kind: "_http._tcp", domain: "local", host_name: "18628d4f173f.local", address: "172.17.0.3", port: 8080, txt: Some(AvahiTxtRecord(UnsafeCell)) }

A single Akri instance is deployed:

kubectl get instances
NAME              CONFIG     SHARED   NODES    AGE
zeroconf-e7f45d   zeroconf   true     [akri]   2m

And:

for INSTANCE in $(kubectl get instances --output=jsonpath="{.items[].metadata.name}")
do
  POD="pod/akri-${INSTANCE}-pod"
  kubectl logs ${POD}
done

Yields:

[zeroconf:main] Entered
[zeroconf:new] Entered
[zeroconf:new]
  Kind: _http._tcp
  Name: 18628d4f173f
  Host: 18628d4f173f.local
  Addr: 172.17.0.3
  Port: 8080
[zeroconf:main] Service: kind: _http._tcp
name: 18628d4f173f
host: 18628d4f173f.local
addr: 172.17.0.3
port: 8080
[zeroconf:main:loop] Sleep
[zeroconf:main:loop] check_device(Service { kind: "_http._tcp", name: "18628d4f173f", host: "18628d4f173f.local", addr: "172.17.0.3", port: 8080 })
[zeroconf:read_device] Entered: Service { kind: "_http._tcp", name: "18628d4f173f", host: "18628d4f173f.local", addr: "172.17.0.3", port: 8080 }
[zeroconf:main:loop] Sleep
[zeroconf:main:loop] check_device(Service { kind: "_http._tcp", name: "18628d4f173f", host: "18628d4f173f.local", addr: "172.17.0.3", port: 8080 })
[zeroconf:read_device] Entered: Service { kind: "_http._tcp", name: "18628d4f173f", host: "18628d4f173f.local", addr: "172.17.0.3", port: 8080 }
[zeroconf:main:loop] Sleep

And, we can examine the Instance's environment:

kubectl exec --stdin --tty ${POD} -- env | grep ^AKRI

Yields:

AKRI_ZEROCONF_DEVICE_KIND=_http._tcp
AKRI_ZEROCONF=zeroconf
AKRI_ZEROCONF_DEVICE_HOST=18628d4f173f.local
AKRI_ZEROCONF_DEVICE_NAME=18628d4f173f
AKRI_ZEROCONF_DEVICE_PORT=8080
AKRI_ZEROCONF_DEVICE_ADDR=172.17.0.3

@DazWilkin
Copy link
Contributor Author

@DazWilkin DazWilkin reopened this Nov 23, 2020
@DazWilkin
Copy link
Contributor Author

Thanks to @kate-goldenring's suggestion, I've written a ZeroConf filter parser (using Pest for ZeroConf.

I've encountered some issues with astro-dnssd e.g. AstroHQ/astro-dnssd#17 and am currently continuing to use zeroconf because it works well on Linux and is easier for me to test.

I continue to be challenged to understand how I'll be able to easily test x-platform.

I wonder whether it would be better if I were to focus on building this protocol as a distinct (standalone repo) project as a proof-of-concept and not submit a PR here instead of continuing to bug you both? I'm happy to proceed either way but want to continue to be respectful of your time.

@kate-goldenring
Copy link
Contributor

Hey Daz! Great to hear that the filter parsing is working out. That will be a really nice way for people to customize discovery. I think it would be great to see your Zeroconf implementation added into main. I think it is fine to go with the zeroconf crate and if the astro-dnssd folks respond to the issue thread, we can think about adding in more x-platform support in the future. And someone else can pick that up later too if that is something they are really wanting support for.

How does this sound for a plan of action:

  1. Submit PR for Zeroconf proposal. Some of the adjustments I'd make to your current zeroconf proposal would be to readdress the outstanding questions and feature request sections. Per our conversation on slack, some notes could be added on how the K8s service doesn't have to be the same protocol as the zeroconf connection and maybe give an example broker scenario from our discussion over slack. These might go in a new/different section. Also, if you have any ideas about the security considerations -- do creds need to be passed? Those notes would be great to add. A note on x-platform support would also be great. I can make comments on it in the PR, too. We can try to get this merged in by EOW.
  2. Put in a PR for your zeroconf discovery handler with a zeroconf-configuration.md file that explains how to customize a zeroconf helm template and filter for specific devices using your parser. I can help demystify helm templating if you need any help. I think we can add in zeroconf support without a custom broker created yet. Do you think a sample broker is necessary right away? @bfjelds what do you think?
  3. Someone creates a custom broker. If you're up for it, great! If you are feeling zeroconfed out, we can pass the broker work off to someone else.

@DazWilkin
Copy link
Contributor Author

OK, the protocol is working, so I'll submit that PR today and kick-off that review... I've really tried (again) to keep the documents current with the code but, it's tough (and more so because of the manual Agent updates, not criticizing, just is).

BTW Do you have a process for automating the generation of Markdown files to include code snippets, image references etc.? It seems (!?) as though many READMEs should be templated so that, underlying code changes are just reflected in documents when rebuilt; corollary: it's tough to keep docs sync'd with code manually.

I'll review the proposal and your feedback on Slack and here and incorporate your recommendations. I'll submit that PR today too.

I've done some Helm templating previously. I think the current approach should suffice but I'm always open to feedback.

I've a sample broker in the implementation and am using avahi-publish to create a test service (easier).

@DazWilkin
Copy link
Contributor Author

Sorry, please add a branch to deislabs/akri perhaps zeroconf-extensibility and I'll submit the PR there.

@DazWilkin DazWilkin mentioned this issue Dec 7, 2020
8 tasks
@kate-goldenring
Copy link
Contributor

Sorry, please add a branch to deislabs/akri perhaps zeroconf-extensibility and I'll submit the PR there.

Can you submit the PR into main instead of a branch? I dont see the reason for having it live in a branch.

@kate-goldenring
Copy link
Contributor

kate-goldenring commented Dec 7, 2020

For the Markdown templating, are you thinking of having a template for all protocols: <protocol>-configuration.md? or wondering if there is a way to point to code snippets from within the Markdown so that if the code updates the Markdown automatically updates to display the new code? Both would be cool. I am not sure how to do the latter, though.

@DazWilkin
Copy link
Contributor Author

We decided to add the HTTP protocol to a branch (http-extensibility), so for ZeroConf, thinking similar for consistency!?

@DazWilkin
Copy link
Contributor Author

For Markdown templating, yes, the latter.

I think it would be really useful to be able to write (assuming this comment is a Markdown doc in the project):

E.g.

You may revise the Kubernetes configuration to include a filter:

${{ https://github.com/DazWilkin/akri/blob/.../samples/brokers/zeroconf/kubernetes/zeroconf.yaml#L6-8 }}

And have the pre-processor replace that with:

  protocol:
    zeroconf:
      filter: 'kind="_http._tcp"'

Then, if the underlying configuration changes, forgetful me doesn't have to remember to come back and change the doc reference.

Also (and as common), updating docker container image references:

E.g.

Run the following command to start a device:

docker run ghcr.io/deislabs/akri/http-device:${{ git rev-parse HEAD }}

And have the pre-processor replace that with:

docker run ghcr.io/deislabs/akri/http-device:1234567890abcdef1234567890abcdef12345

@kate-goldenring
Copy link
Contributor

We decided to add the HTTP protocol to a branch (http-extensibility), so for ZeroConf, thinking similar for consistency!?

The HTTP protocol seemed to be intended as a replacement for Nessie, or an example of how to extend Akri that was more applicable. In my head, if people want to learn how to extend Akri, they walk through extensibility.md and create the HTTP implementation. And this walk through will hopefully be easier when we get a new (more modularized) extensibility model. So, HTTP is ultimately a markdown file (extensibility.md) and a branch that exists for "checking your work".

I didn't think of HTTP as being intended to be a full-fledged protocol implementation that Akri supported since it's occurrence stemmed from the issue of Nessie not making sense or being clear enough. Maybe it should be the example and also a supported protocol? Or could be built out into being one.

Zeroconf I saw as being a protocol implementation intended to be merged into main from the start.

@DazWilkin
Copy link
Contributor Author

Your call... just let me know and I'll submit the PR.

@kate-goldenring
Copy link
Contributor

When you feel like it is in a good state, go ahead and put in a PR into main with your zeroconf implementation. Thanks for all the work!

@DazWilkin
Copy link
Contributor Author

It's working.

I added the refinement that we discussed in the proposal discussion so that services are filtered by any term.

I'll sync w/ main and then if it still builds, I'll submit the PR today

Thank you for all your support!

@DazWilkin DazWilkin mentioned this issue Dec 8, 2020
8 tasks
@DazWilkin
Copy link
Contributor Author

PR: #163

@bfjelds bfjelds added the enhancement New feature or request label Dec 14, 2020
@DazWilkin
Copy link
Contributor Author

This is in a working state and I have not received any response to the astro-dnssd isssue.

I'm at an impasse pending progress deconstructing the Agent monolith and slightly concerned at having to live with bit rot on this pending PR. Thoughts?

I'm considering a new, personal project in January but don't want to lose touch with Akri, are there other things that you think could keep me busy without me building up too much 'debt'?

@bfjelds
Copy link
Collaborator

bfjelds commented Dec 19, 2020

Generally, I don't have an issue with the PR hanging out waiting for the monolith-breaker to get designed. But I get your concern!

As for other contributions, a few things you've proposed jump to mind:

  • capacity&allocate cleanup
  • add podspec and servicespec validation to create

And the big fish would be designing a gRPC interface that would allow your zeroconf to exist outside of the agent container (number 3 here: https://github.com/deislabs/akri/blob/main/docs/proposals/simple-protocol-extension.md)

@DazWilkin
Copy link
Contributor Author

Thanks for taking the time to respond.

Interesting that you propose the ... Agent gRPC protocol. I think this would be very interesting.

I thought about genericising (word?) handlers recently after chatting with @kate-goldenring and tried to conceptualize how it would work.

The Agent gRPC client and server interface is straightforward but it seems it would need to marshall DiscoveryHandlerConfig as pb.Any to be generic and we'd lose strong-typing at the Agent and in the Configuration spec.

Or it needs to introspect.

Unless you all have thought through this already, perhaps I could prototype and see where it breaks?

@bfjelds
Copy link
Collaborator

bfjelds commented Dec 20, 2020

Agreed. My thoughts for the CRD protocol configs are that they would become something generically structured as a map like:

"<protocol name>": 
     "<key>": "<value>"

The agent would send the config blob to all gRPC handlers registered for the <protocol name>.

I would love to see the prototype or your take on how to do this!

@DazWilkin
Copy link
Contributor Author

OK, I was thinking about the gRPC protocol this morning.

I plan to submit a PR for the Webhook (hopefully) this week and then to start working on the gRPC protocol implementation.

I assume this remains a green-field and there's no reason why I should not explore (will use the map proposal).

@kate-goldenring
Copy link
Contributor

kate-goldenring commented Jan 11, 2021

OK, I was thinking about the gRPC protocol this morning.

I plan to submit a PR for the Webhook (hopefully) this week and then to start working on the gRPC protocol implementation.

I assume this remains a green-field and there's no reason why I should not explore (will use the map proposal).

Hey Daz! I started working on (the design for) gRPC protocol implementation today. I should have created an issue earlier to track the progress. Will do that now! I plan to have the implementation done by the end of the month. Maybe I can put up a design doc for it on HackMD and we can brainstorm there?

@DazWilkin
Copy link
Contributor Author

@kate-goldenring thanks for the update... I was just reading the thread on Slack too.

I'll provide input but won't start on this.

@github-actions
Copy link
Contributor

Issue has been automatically marked as stale due to inactivity for 45 days. Update the issue to remove label, otherwise it will be automatically closed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request stale
Projects
Status: Done
Development

No branches or pull requests

3 participants