Proposal: Add Kuma service mesh SD mechanism #7919

Closed
austince opened this issue Sep 9, 2020 · 23 comments · Fixed by #8844

Comments

@austince
Contributor

austince commented Sep 9, 2020

Proposal: Add Kuma SD

Use case. Why is this important?

Kuma is an Envoy-based service mesh and CNCF Sandbox project originally created at Kong. Kuma has implemented the file_sd-based SD mechanism for quite a while, as a standalone adapter binary, kuma-prometheus-sd. This has generally worked well from a technical perspective, but it has proven quite difficult for users to adopt, especially when integrating into an existing Prometheus cluster. Even with detailed instructions, this method of integrating Kuma service discovery sets a high barrier to entry for new users. We think native integration is by far the best way for users to really get the benefit of Prometheus when using Kuma.

Given this, and since the moratorium on new SD mechanisms has been lifted, we'd love to commit resources to building and maintaining this integration. We would like to use this issue to look for a member who would consider sponsoring this effort, and we are happy to bring in more of the community to talk about the use case if needed.

/cc @subnetmarco
/cc @jakubdyszkiewicz

@subnetmarco

👋 Marco from Kong.

As a maintainer myself, I know that it's painful to maintain a third-party integration, because there is always a risk that the original contributors stop investing in it. This is why I would like to let you know that we are fully committed to building and improving the native Kuma integration in Prometheus with our own resources, since it is very strategic for the Kuma project itself.

@roidelapluie
Member

roidelapluie commented Sep 9, 2020

Thanks for your interest in Prometheus.

In general, Prometheus should not go through a service mesh, since it should connect directly to the scraped services. We do not retry, and we need to contact every target without load balancing (even if the client-side TLS is interesting).

@jakubdyszkiewicz

Hey @roidelapluie

Kuma has functionality to access instances directly, without load balancing, while still securing connections via TLS:
https://kuma.io/docs/0.7.1/documentation/dps-and-data-model/#direct-access-to-services

@roidelapluie
Member

How does that compare to Envoy's own APIs (#6484)? How many Envoy-based service meshes are out there?

@austince
Contributor Author

Here's a non-exhaustive list:

It looks like #6484 was actually filed by a Kuma contributor. Either he (@yskopets) or @jakubdyszkiewicz should be able to comment on how it compares to the Envoy APIs.

@jakubdyszkiewicz

A generic gRPC/HTTP SD mechanism should also work.

I see there was some discussion about http_sd here: https://groups.google.com/g/prometheus-developers/c/3B3jBsErK5M/discussion. Is any work on this in progress, @roidelapluie?

@austince
Contributor Author

Hey @roidelapluie, sorry to ping you, just wondering if you can point us to any places to start looking if we decide to go the gRPC/HTTP SD route.

@roidelapluie
Member

So far there is no consensus for an agnostic "invented" HTTP/gRPC SD.

Concerning Kuma, it looks like you would need to find a sponsor within the Prometheus team for this service discovery.
I am not personally using Kuma, and I am not sure it would fit as a service discovery without putting too much business logic inside the discoverer, so I will pass.

I also note that, even if that's not the case for Kuma, most service discoveries work or are used inside Kubernetes, which we support natively. I would suspect that a lot of users run Kuma inside Kubernetes and could use the Kubernetes discovery (or Kuma could produce CRDs consumed by the Prometheus Operator?).

Last, did you try the DNS SD? What would be missing to have it work properly with Kuma DNS discovery?

Thanks.

@austince
Contributor Author

Thanks for the quick reply, and I totally understand your position. One question: what's the best way to look for sponsorship? The mailing list? We've tried IRC but didn't succeed in getting much conversation going.

Kubernetes might be an option, though Kuma strongly targets being a universal mesh, so we'll need a solution for both either way.

I'm not sure about DNS. My initial guess is that it might not work in a hybrid setup, but I'm unsure here and will again defer to @jakubdyszkiewicz.

@roidelapluie
Member

Yes, you can try the -developers mailing list.

@subnetmarco

Just to recap, today Prometheus can leverage Kubernetes or file_sd as the service discovery mechanism.

We are suggesting introducing either an http_sd or a grpc_sd discovery that points to a third-party server (in our case, the kuma-cp control plane), which can dynamically provide the most up-to-date list of targets to Prometheus itself.

@austince maybe we can rename the title of this issue to: "Proposal: Add http_sd or grpc_sd service discovery mechanism".

@austince
Contributor Author

austince commented Sep 17, 2020

@subnetmarco, that proposal sounds good to me, though I think we should just close this issue if that's the route we're taking and move the conversation back to #6484, where grpc_sd is proposed. There's a conversation on the dev list we could try to revive as well. It looks like the mailing list conversation ended on the point that the generic file_sd already exists. I think we'll need to show that integrating an existing Prometheus installation with file_sd is not as simple as Brian's suggested cronjob curl/sidecar, which is why another, more accessible generic solution might be valid.

@roidelapluie
Member

As you are proposing in the last comment, let's close it in favour of #6484.

@austince
Contributor Author

Looping back from the mailing list for completeness, here's the reasoning for why the DNS SD mechanism is insufficient:

Since the DNS SD only supports basic A, AAAA, and SRV records and a small set of meta labels, the amount of information that can be transparently stored and parsed is limited. Of the three available meta labels, the only field that would make sense to use for holding information is __meta_dns_name. This is limiting when many pieces of info are necessary, as they'd have to be encoded and parsed with some decently complex relabeling steps.

In terms of a service mesh like Kuma, there are quite a lot of labels that are valuable to an end user, for example: the mesh name, the dataplane name, the service name, user-defined tags about the service, etc. These would get quite complex to encode and parse in a single label (or even spread across the three labels) supported by the DNS SD mechanism.
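
A small sketch of the problem, assuming a hypothetical `<dataplane>.<service>.<mesh>.mesh` naming scheme packed into `__meta_dns_name`. The regex plays the role of the relabeling rule a user would have to write; neither the scheme nor the names come from Kuma.

```go
package main

import (
	"fmt"
	"regexp"
)

// nameRe plays the role of a relabeling regex applied to __meta_dns_name.
// The "<dataplane>.<service>.<mesh>.mesh" layout is a hypothetical scheme.
var nameRe = regexp.MustCompile(`^([^.]+)\.([^.]+)\.([^.]+)\.mesh$`)

// parseName splits an encoded DNS name back into its three fields.
func parseName(name string) (dataplane, service, mesh string, ok bool) {
	m := nameRe.FindStringSubmatch(name)
	if m == nil {
		return "", "", "", false
	}
	return m[1], m[2], m[3], true
}

func main() {
	for _, name := range []string{
		"web-01.backend.default.mesh",
		"web.01.backend.default.mesh", // a dot inside a field breaks the scheme
	} {
		dp, svc, mesh, ok := parseName(name)
		fmt.Printf("%q -> dataplane=%q service=%q mesh=%q ok=%v\n", name, dp, svc, mesh, ok)
	}
}
```

Three fields are already fragile here, and user-defined tags with arbitrary keys and values have no natural place in such a scheme at all.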

@roidelapluie roidelapluie reopened this Feb 19, 2021
@roidelapluie
Member

I am willing to investigate a Kuma SD.

@austince
Contributor Author

Hey @roidelapluie, thanks for re-opening this. For some reason, my messages back to you on the mailing list are being deleted. I'm happy to do the work to set up an HTTP xDS client and then use that as a base for a Kuma SD. And no worries about "free" work, we're all invested 🙂 . Thanks for all your time on this issue.

@roidelapluie
Member

It seems that Kuma is not exposing xDS over HTTP?

@austince
Contributor Author

Yes, it's internal-only right now, so only gRPC has been implemented, but we're already moving forward with an HTTP variant and an API touch-up to get ready to expose it publicly. I hope to have this done within the next ~2 weeks.

@roidelapluie
Member

roidelapluie commented Feb 19, 2021

[[ HTTP would be especially valuable if we can also keep the JSON structs in this project, which would avoid adding the Kuma dependencies. Not sure how it would play out. Is that realistic? At least until Go 1.17. ]]

@austince
Contributor Author

Hmm, that miiiight be possible -- I'll keep you updated. I think at the very least we could hardcode these structs, since they are small and will not change once the API is finalized.

@austince
Contributor Author

austince commented Mar 23, 2021

Hi there, writing to give you an update on the status of this. I've implemented a base xDS HTTP discovery mechanism using refresh, as well as the kuma_sd using that base, in austince/prometheus/discovery-xds. I was able to implement it without the Kuma/gRPC dependency and just include the compiled protobuf, stripped of the gRPC service. A Docker image built from this branch is available at docker.io/austince/prometheus:kuma-xds-sd as well. The only thing left to do is the documentation, which I'll do before opening a PR.

The Kuma community has been focused on a recent, large 1.1 release and has not been able to devote time to getting the HTTP API into the main branch, but it is implemented in kumahq/kuma/feat/mads/v1.

I have waited to open a PR for the SD, as we should make sure it works against a stable kuma release before adding it into prometheus, but wanted to share the progress with you and make sure you like the direction the SD is headed in.

@austince
Contributor Author

I was not able to just use a simple JSON struct, unfortunately, as protobuf-encoded JSON is a special form that must be parsed with a protobuf lib. gogo/protobuf is still used here (though the official golang/protobuf is used by vendored packages), which makes consuming external protos difficult. I've raised a discussion on the dev mailing list to see what the community thinks and whether it would be OK with shifting back to the official lib: https://groups.google.com/u/1/g/prometheus-developers/c/uFWRyqZaQis

@austince
Contributor Author

@roidelapluie this is somewhat related to the generic http_sd proposal, though xDS provides the ability to send updates only when required, which makes for a more efficient implementation.

@prometheus prometheus locked as resolved and limited conversation to collaborators Jan 19, 2022

4 participants