
Issues reaching metrics-api pod from web pod when accessing the dashboard #7339

Closed

pankajmt opened this issue Nov 23, 2021 · 8 comments

@pankajmt
Bug Report

What is the issue?

When accessing the dashboard, we see errors connecting to the metrics-api.

(screenshot: dashboard showing errors reaching the metrics-api)

How can it be reproduced?

A 2.11.1 install in the standard namespace. Note: we run patched versions of the proxy-init and controller images so that workload pods can run under a PSP that forbids running as root, but that should not affect this finding.

Logs, error output, etc


The web pod logs show:
time="2021-11-23T02:27:12Z" level=error msg="HTTP error, status Code [403] (unexpected API response)"

The metrics-api logs show:
[ 32346.561203s] INFO ThreadId(01) inbound:server{port=8085}:rescue{client.addr=10.213.65.7:44320}: linkerd_app_core::errors::respond: Request failed error=unauthorized connection on server metrics-api

kubectl get pods -o wide

NAME                                   READY   STATUS    RESTARTS   AGE   IP              NODE                                                 NOMINATED NODE   READINESS GATES
grafana-79f7549cc8-znrqt               2/2     Running   0          9h    10.213.65.137   gke-dev-apps-1-dev-apps-1-np-default-35e60981-xs7d   <none>           <none>
linkerd-oauth2-proxy-b8567ddd6-mm6zb   2/2     Running   0          8h    10.213.65.10    gke-dev-apps-1-dev-apps-1-np-default-c3af4f1f-199g   <none>           <none>
metrics-api-57788b88d5-l7lbl           2/2     Running   0          9h    10.213.64.139   gke-dev-apps-1-dev-apps-1-np-default-35e60981-q82d   <none>           <none>
prometheus-54fdfbbbc8-4wg2q            2/2     Running   0          9h    10.213.64.140   gke-dev-apps-1-dev-apps-1-np-default-35e60981-q82d   <none>           <none>
tap-5f7b7d5d6d-778vg                   2/2     Running   0          9h    10.213.65.146   gke-dev-apps-1-dev-apps-1-np-default-35e60981-xs7d   <none>           <none>
tap-5f7b7d5d6d-84n7l                   2/2     Running   0          9h    10.213.64.134   gke-dev-apps-1-dev-apps-1-np-default-35e60981-q82d   <none>           <none>
tap-5f7b7d5d6d-c6lp8                   2/2     Running   0          8h    10.213.65.9     gke-dev-apps-1-dev-apps-1-np-default-c3af4f1f-199g   <none>           <none>
tap-injector-7d676db75-7ddb8           2/2     Running   0          9h    10.213.65.138   gke-dev-apps-1-dev-apps-1-np-default-35e60981-xs7d   <none>           <none>
web-566bc79fd8-rlj2s                   2/2     Running   0          8h    10.213.65.7     gke-dev-apps-1-dev-apps-1-np-default-c3af4f1f-199g   <none>           <none>

linkerd check output

linkerd check
Linkerd core checks
===================

kubernetes-api
--------------
√ can initialize the client
√ can query the Kubernetes API

kubernetes-version
------------------
√ is running the minimum Kubernetes API version
√ is running the minimum kubectl version

linkerd-existence
-----------------
√ 'linkerd-config' config map exists
√ heartbeat ServiceAccount exist
√ control plane replica sets are ready
√ no unschedulable pods
√ control plane pods are ready
√ cluster networks contains all node podCIDRs

linkerd-config
--------------
√ control plane Namespace exists
√ control plane ClusterRoles exist
√ control plane ClusterRoleBindings exist
√ control plane ServiceAccounts exist
√ control plane CustomResourceDefinitions exist
√ control plane MutatingWebhookConfigurations exist
√ control plane ValidatingWebhookConfigurations exist

linkerd-identity
----------------
√ certificate config is valid
√ trust anchors are using supported crypto algorithm
√ trust anchors are within their validity period
√ trust anchors are valid for at least 60 days
√ issuer cert is using supported crypto algorithm
√ issuer cert is within its validity period
√ issuer cert is valid for at least 60 days
√ issuer cert is issued by the trust anchor

linkerd-webhooks-and-apisvc-tls
-------------------------------
√ proxy-injector webhook has valid cert
√ proxy-injector cert is valid for at least 60 days
√ sp-validator webhook has valid cert
√ sp-validator cert is valid for at least 60 days
√ policy-validator webhook has valid cert
√ policy-validator cert is valid for at least 60 days

linkerd-version
---------------
√ can determine the latest version
√ cli is up-to-date

control-plane-version
---------------------
√ can retrieve the control plane version
√ control plane is up-to-date
√ control plane and cli versions match

linkerd-control-plane-proxy
---------------------------
√ control plane proxies are healthy
‼ control plane proxies are up-to-date
    some proxies are not running the current version:
	* linkerd-destination-859c6f4958-4rwg6 (stable-2.11.1..patched.1)
	* linkerd-destination-859c6f4958-72k7f (stable-2.11.1..patched.1)
	* linkerd-destination-859c6f4958-cn5rl (stable-2.11.1..patched.1)
	* linkerd-identity-68885f4fd9-bp575 (stable-2.11.1..patched.1)
	* linkerd-identity-68885f4fd9-cc8hl (stable-2.11.1..patched.1)
	* linkerd-identity-68885f4fd9-ptnpt (stable-2.11.1..patched.1)
	* linkerd-proxy-injector-5bb56577b6-fckf2 (stable-2.11.1..patched.1)
	* linkerd-proxy-injector-5bb56577b6-h5pwh (stable-2.11.1..patched.1)
	* linkerd-proxy-injector-5bb56577b6-v4p4l (stable-2.11.1..patched.1)
    see https://linkerd.io/2.11/checks/#l5d-cp-proxy-version for hints
‼ control plane proxies and cli versions match
    linkerd-destination-859c6f4958-4rwg6 running stable-2.11.1..patched.1 but cli running stable-2.11.1
    see https://linkerd.io/2.11/checks/#l5d-cp-proxy-cli-version for hints

linkerd-ha-checks
-----------------
√ pod injection disabled on kube-system

Status check results are √

Linkerd extensions checks
=========================

linkerd-viz
-----------
√ linkerd-viz Namespace exists
√ linkerd-viz ClusterRoles exist
√ linkerd-viz ClusterRoleBindings exist
√ tap API server has valid cert
√ tap API server cert is valid for at least 60 days
√ tap API service is running
√ linkerd-viz pods are injected
√ viz extension pods are running
√ viz extension proxies are healthy
‼ viz extension proxies are up-to-date
    some proxies are not running the current version:
	* grafana-79f7549cc8-znrqt (stable-2.11.1..patched.1)
	* linkerd-oauth2-proxy-b8567ddd6-mm6zb (stable-2.11.1..patched.1)
	* metrics-api-57788b88d5-l7lbl (stable-2.11.1..patched.1)
	* prometheus-54fdfbbbc8-4wg2q (stable-2.11.1..patched.1)
	* tap-5f7b7d5d6d-778vg (stable-2.11.1..patched.1)
	* tap-5f7b7d5d6d-84n7l (stable-2.11.1..patched.1)
	* tap-5f7b7d5d6d-c6lp8 (stable-2.11.1..patched.1)
	* tap-injector-7d676db75-7ddb8 (stable-2.11.1..patched.1)
	* web-566bc79fd8-rlj2s (stable-2.11.1..patched.1)
    see https://linkerd.io/2.11/checks/#l5d-viz-proxy-cp-version for hints
‼ viz extension proxies and cli versions match
    grafana-79f7549cc8-znrqt running stable-2.11.1..patched.1 but cli running stable-2.11.1
    see https://linkerd.io/2.11/checks/#l5d-viz-proxy-cli-version for hints
√ prometheus is installed and configured correctly
√ can initialize the client
√ viz extension self-check

Status check results are √

Environment

  • Kubernetes Version:
    kubectl version
    Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.2", GitCommit:"8b5a19147530eaac9476b0ab82980b4088bbc1b2", GitTreeState:"clean", BuildDate:"2021-09-15T21:31:32Z", GoVersion:"go1.16.8", Compiler:"gc", Platform:"darwin/amd64"}
    Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.5-gke.1302", GitCommit:"639f3a74abf258418493e9b75f2f98a08da29733", GitTreeState:"clean", BuildDate:"2021-10-21T21:35:48Z", GoVersion:"go1.16.7b7", Compiler:"gc", Platform:"linux/amd64"}
  • Cluster Environment: GKE
  • Host OS:
    Container OS
  • Linkerd version:
    2.11.1 patched

Additional context

After adding the following logging to a locally built instance of the Linkerd proxy (git diff):
diff --git a/linkerd/app/inbound/src/policy.rs b/linkerd/app/inbound/src/policy.rs
index 315020a3..088cccc4 100644
--- a/linkerd/app/inbound/src/policy.rs
+++ b/linkerd/app/inbound/src/policy.rs
@@ -20,6 +20,8 @@ pub use linkerd_server_policy::{Authentication, Authorization, Protocol, ServerP
 use thiserror::Error;
 use tokio::sync::watch;
 
+pub use tracing::{debug, error, info, warn};
+
 #[derive(Clone, Debug, Error)]
 #[error("unauthorized connection on unknown port {0}")]
 pub struct DeniedUnknownPort(pub u16);
@@ -118,19 +120,25 @@ impl AllowPolicy {
         client_addr: Remote<ClientAddr>,
         tls: &tls::ConditionalServerTls,
     ) -> Result<Permit, DeniedUnauthorized> {
+        info!("server");
         let server = self.server.borrow();
+        info!("authz");
         for authz in server.authorizations.iter() {
             if authz.networks.iter().any(|n| n.contains(&client_addr.ip())) {
+                info!("match");
                 match authz.authentication {
                     Authentication::Unauthenticated => {
+                        info!("Unauthenticated");
                         return Ok(Permit::new(self.dst, &*server, authz));
                     }
 
                     Authentication::TlsUnauthenticated => {
+                        info!("TlsUnauthenticated");
                         if let tls::ConditionalServerTls::Some(tls::ServerTls::Established {
                             ..
                         }) = tls
                         {
+                            info!("Ok(Permit::new(self.dst, &*server, authz));");
                             return Ok(Permit::new(self.dst, &*server, authz));
                         }
                     }
@@ -139,11 +147,14 @@ impl AllowPolicy {
                         ref identities,
                         ref suffixes,
                     } => {
+                        info!("TlsAuthenticated");
                         if let tls::ConditionalServerTls::Some(tls::ServerTls::Established {
                             client_id: Some(tls::server::ClientId(ref id)),
                             ..
                         }) = tls
                         {
+                            info!("identities = {:?}", identities);
+                            info!("id = {}", id.as_str());
                             if identities.contains(id.as_str())
                                 || suffixes.iter().any(|s| s.contains(id.as_str()))
                             {

The following log lines seem to suggest that the identity match never happens because of the `if let` statement: the connection arrives with tls=None(NoClientHello), so the TlsAuthenticated branch falls through and the request is denied.

[ 32346.561102s]  INFO ThreadId(01) inbound:server{port=8085}: linkerd_app_inbound::policy: server
[ 32346.561163s]  INFO ThreadId(01) inbound:server{port=8085}: linkerd_app_inbound::policy: authz
[ 32346.561168s]  INFO ThreadId(01) inbound:server{port=8085}: linkerd_app_inbound::policy: match
[ 32346.561172s]  INFO ThreadId(01) inbound:server{port=8085}: linkerd_app_inbound::policy: TlsAuthenticated
[ 32346.561177s]  INFO ThreadId(01) inbound:server{port=8085}: linkerd_app_inbound::policy::authorize::http: Request denied server=metrics-api tls=None(NoClientHello) client=10.213.65.7:44320
[ 32346.561203s]  INFO ThreadId(01) inbound:server{port=8085}:rescue{client.addr=10.213.65.7:44320}: linkerd_app_core::errors::respond: Request failed error=unauthorized connection on server metrics-api
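The denial in the logs can be condensed into a minimal, self-contained sketch (hypothetical simplified types, not the actual proxy code): a `TlsAuthenticated` authorization can only match when the connection carries an established mTLS session with a client identity, so a connection seen as `tls=None(NoClientHello)` is always denied regardless of the allow-list.

```rust
// Hypothetical simplification of the inbound authorization check.
#[derive(Debug)]
enum ServerTls {
    None,                                      // no ClientHello / plaintext
    Established { client_id: Option<String> }, // mTLS handshake completed
}

enum Authentication {
    Unauthenticated,
    TlsUnauthenticated,
    TlsAuthenticated { identities: Vec<String> },
}

fn is_authorized(authn: &Authentication, tls: &ServerTls) -> bool {
    match authn {
        // Any client on an allowed network.
        Authentication::Unauthenticated => true,
        // Any mTLS connection; identity is not checked.
        Authentication::TlsUnauthenticated => {
            matches!(tls, ServerTls::Established { .. })
        }
        // mTLS connection whose client identity is in the allow-list.
        Authentication::TlsAuthenticated { identities } => match tls {
            ServerTls::Established { client_id: Some(id) } => {
                identities.iter().any(|i| i == id)
            }
            // tls=None(NoClientHello) always lands here, as in the logs above.
            _ => false,
        },
    }
}

fn main() {
    let authn = Authentication::TlsAuthenticated {
        identities: vec!["web.linkerd-viz.serviceaccount.identity.linkerd.cluster.local".into()],
    };
    // The situation from the logs: plaintext connection, request denied.
    println!("{}", is_authorized(&authn, &ServerTls::None)); // prints "false"
}
```

This is only an illustration of why the question then becomes "why is the client not initializing mTLS?", which the next comment raises.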
@pankajmt commented Nov 23, 2021

Server Manifest

apiVersion: policy.linkerd.io/v1beta1
kind: Server
metadata:
  annotations:
    linkerd.io/created-by: linkerd/helm stable-2.11.1
    meta.helm.sh/release-name: linkerd-viz
    meta.helm.sh/release-namespace: linkerd-viz
  labels:
    app.kubernetes.io/managed-by: Helm
    component: metrics-api
    linkerd.io/extension: viz
  name: metrics-api
  namespace: linkerd-viz
spec:
  podSelector:
    matchLabels:
      component: metrics-api
      linkerd.io/extension: viz
  port: http
  proxyProtocol: HTTP/1

Server Authorization Manifest

apiVersion: policy.linkerd.io/v1beta1
kind: ServerAuthorization
metadata:
  annotations:
    linkerd.io/created-by: linkerd/helm stable-2.11.1
    meta.helm.sh/release-name: linkerd-viz
    meta.helm.sh/release-namespace: linkerd-viz
  labels:
    app.kubernetes.io/managed-by: Helm
    component: metrics-api
    linkerd.io/extension: viz
  name: metrics-api
  namespace: linkerd-viz
spec:
  client:
    meshTLS:
      serviceAccounts:
      - name: web
      - name: prometheus
  server:
    name: metrics-api

On deleting the Server manifest, the dashboard is happy.
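For reference, the deletion workaround can be applied with kubectl (assuming linkerd-viz is installed in its default linkerd-viz namespace):

```shell
# Inspect the policy resources the viz chart created, then delete the
# Server for metrics-api; with no Server resource selecting the pod,
# the cluster-wide default inbound policy applies again.
kubectl get servers,serverauthorizations -n linkerd-viz
kubectl delete server metrics-api -n linkerd-viz
```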

@olix0r (Member) commented Nov 23, 2021

Do we have any idea why the client (web?) is not initializing mTLS? Is the destination controller not returning an identity for the pod? I'd try turning up proxy logs on the client pod.

@pankajmt (Author)

The client does get an identity. Attached are debug logs for the linkerd-proxy of the client (web): linkerd-proxy.log

@ElvinEfendi (Contributor) commented Dec 11, 2021

I ran into the same issue with stable-2.11.1 and deleting the metrics-api Server resource as suggested by @pankajmt resolved the issue.

EDIT: I actually had to delete all Server resources in the linkerd-viz namespace.

@tabnul commented Jan 7, 2022

Same issue with stable-2.11.1, a Helm chart deployment using the default namespaces.
We removed the above-mentioned Server resource from the Helm chart as a workaround.

@gnunu commented Mar 8, 2022

Same issue. Another workaround:

apiVersion: policy.linkerd.io/v1beta1
kind: Server
metadata:
  namespace: linkerd-viz
  name: metrics-api
  labels:
    linkerd.io/extension: viz
    component: metrics-api
  annotations:
    linkerd.io/created-by: linkerd/helm stable-2.11.1
spec:
  podSelector:
    matchLabels:
      linkerd.io/extension: viz
      component: metrics-api
# CHANGE
  port: tap
  proxyProtocol: TLS
---
apiVersion: policy.linkerd.io/v1beta1
kind: ServerAuthorization
metadata:
  namespace: linkerd-viz
  name: metrics-api
  labels:
    linkerd.io/extension: viz
    component: metrics-api
  annotations:
    linkerd.io/created-by: linkerd/helm stable-2.11.1
spec:
  server:
    name: metrics-api
  client:
# CHANGE, also done for other servers
    networks:
      - cidr: 0.0.0.0/0
    meshTLS:
      serviceAccounts:
      - name: web
      - name: prometheus

@adleong added this to the stable-2.12.0 milestone Mar 8, 2022
@jtnz commented Apr 28, 2022

We were experiencing the same issue and updated to 2.11.2 and it (so far) appears to be fixed!!! 🎉

@kleimkuhler (Contributor)

Thanks for the update on this! If this isn't something that is fixed by 2.11.2, we can reopen.

github-actions bot locked as resolved and limited conversation to collaborators Jun 3, 2022