No stats on grafana Linkerd Services dashboard for our namespace #1451

Closed
willtrking opened this issue Aug 14, 2018 · 9 comments

Comments

@willtrking

Hey there,

We're testing out linkerd2 and aren't seeing any stats for the services in our Kubernetes namespace (networktest) on the Linkerd Services grafana dashboard. However, we are seeing stats on the Linkerd Services dashboard for the services created by the linkerd2 installation in the linkerd namespace.

Traffic is being routed OK, and we are seeing stats in other Grafana dashboards as expected.

See the screenshots below.

linkerd namespace: [screenshot: linkerd ok]

networktest namespace: [screenshot: ours not ok]

@klingerf
Member

@willtrking Interesting -- thanks for reporting this. I tested that dashboard using the emojivoto service from the Getting Started guide, and it's successfully populating with stats. Are you sure that the services in your networktest namespace are receiving traffic? If they're not receiving any traffic then no stats will be reported, and I'd expect the dashboard to look like it does.

You can use the linkerd stat command to query stats for that namespace. What does this command output for you:

linkerd -n networktest stat svc --from pods

@stale

stale bot commented Nov 22, 2018

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Nov 22, 2018
@ctaggart

I ran into this yesterday while trying linkerd2 for the first time. I set up an AKS cluster and deployed linkerd2 to it along with the emojivoto service. I clicked around the port-forwarded web app a bunch, but no stats showed up. All I got was:

cameron@Azure:~$ linkerd -n emojivoto stat deploy
NAME       MESHED   SUCCESS   RPS   LATENCY_P50   LATENCY_P95   LATENCY_P99   TLS
emoji         1/1         -     -             -             -             -     -
vote-bot      1/1         -     -             -             -             -     -
voting        1/1         -     -             -             -             -     -
web           1/1         -     -             -             -             -     -

I'm going to try again today with linkerd2 edge on AKS.
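
Editor's sketch: a quick way to spot this state from a script is to list the rows whose SUCCESS column is a dash. The helper name `rows_without_stats` is hypothetical and not part of the linkerd CLI; it only assumes the column layout shown in the table above, where SUCCESS is the third column.

```shell
# Hypothetical helper (not a linkerd command): reads `linkerd stat` table
# output on stdin and prints the NAME of every row whose SUCCESS column
# is "-", meaning no stats were reported for that resource.
rows_without_stats() {
  # NR > 1 skips the header row; $3 is the SUCCESS column.
  awk 'NR > 1 && $3 == "-" { print $1 }'
}

# Sample input: the table quoted above.
rows_without_stats <<'EOF'
NAME       MESHED   SUCCESS   RPS   LATENCY_P50   LATENCY_P95   LATENCY_P99   TLS
emoji         1/1         -     -             -             -             -     -
vote-bot      1/1         -     -             -             -             -     -
voting        1/1         -     -             -             -             -     -
web           1/1         -     -             -             -             -     -
EOF
```

On this sample it prints all four names. Against a live cluster it could be used as `linkerd -n emojivoto stat deploy | rows_without_stats`.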

@klingerf
Member

@ctaggart Thanks for the additional info. I would definitely expect to see stats in that output. In fact the emojivoto app includes a traffic generator (vote-bot) that sends traffic to each of the components automatically -- it shouldn't require additional clicking around in the web frontend.

When I run the app in my cluster (docker-for-desktop), I get:

$ linkerd stat deploy -n emojivoto
NAME       MESHED   SUCCESS      RPS   LATENCY_P50   LATENCY_P95   LATENCY_P99   TLS
emoji         1/1   100.00%   1.6rps           1ms          10ms          18ms    0%
vote-bot      1/1         -        -             -             -             -     -
voting        1/1    86.05%   0.7rps           4ms          25ms          29ms    0%
web           1/1    94.32%   1.5rps          17ms          37ms          39ms    0%

In terms of debugging, I think it would be worth looking at the raw prometheus stats exported from the linkerd-proxy container in the web pod. Can you port-forward to the metrics port of that pod:

kubectl -n emojivoto port-forward $(kubectl -n emojivoto get po -oname | grep web) 4191

And then grab the response_total stats:

curl -s localhost:4191/metrics | grep '^response_total'

Using linkerd2 edge-18.11.2, I get:

response_total{authority="web-svc.emojivoto",direction="inbound",tls="disabled",status_code="200",classification="success"} 362
response_total{authority="web-svc.emojivoto",direction="inbound",tls="disabled",status_code="500",classification="failure"} 44
response_total{authority="emoji-svc.emojivoto:8080",direction="outbound",dst_control_plane_ns="linkerd",dst_deployment="emoji",dst_namespace="emojivoto",dst_pod="emoji-d55fb89f7-vq8hz",dst_pod_template_hash="811964593",dst_service="emoji-svc",tls="no_identity",no_tls_reason="not_provided_by_service_discovery",status_code="200",classification="success",grpc_status="0"} 406
response_total{authority="voting-svc.emojivoto:8080",direction="outbound",dst_control_plane_ns="linkerd",dst_deployment="voting",dst_namespace="emojivoto",dst_pod="voting-7779989797-cxdnd",dst_pod_template_hash="3335545353",dst_service="voting-svc",tls="no_identity",no_tls_reason="not_provided_by_service_discovery",status_code="200",classification="success",grpc_status="0"} 159
response_total{authority="voting-svc.emojivoto:8080",direction="outbound",dst_control_plane_ns="linkerd",dst_deployment="voting",dst_namespace="emojivoto",dst_pod="voting-7779989797-cxdnd",dst_pod_template_hash="3335545353",dst_service="voting-svc",tls="no_identity",no_tls_reason="not_provided_by_service_discovery",status_code="200",classification="failure",grpc_status="2"} 44
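
Editor's sketch: the raw counters above can be turned into a rough success percentage for a sanity check. The helper name `inbound_success_rate` is hypothetical. It sums the cumulative inbound `response_total` counters, so the result will not match the SUCCESS column of `linkerd stat` exactly, since that column is presumably computed over a recent time window rather than over the proxy's whole lifetime.

```shell
# Hypothetical helper (not part of the linkerd CLI): reads Prometheus text
# output on stdin and prints the share of inbound responses that the proxy
# classified as success, based on the cumulative response_total counters.
inbound_success_rate() {
  awk '
    /^response_total[{]/ && /direction="inbound"/ {
      total += $NF                                   # counter value is the last field
      if ($0 ~ /classification="success"/) ok += $NF
    }
    END {
      if (total == 0) { print "no inbound responses recorded"; exit 1 }
      printf "%.2f%%\n", 100 * ok / total
    }
  '
}

# Sample input: the two inbound counters quoted above (362 success, 44 failure).
inbound_success_rate <<'EOF'
response_total{authority="web-svc.emojivoto",direction="inbound",tls="disabled",status_code="200",classification="success"} 362
response_total{authority="web-svc.emojivoto",direction="inbound",tls="disabled",status_code="500",classification="failure"} 44
EOF
# prints 89.16%
```

With the port-forward from earlier still running, the same function could be fed directly: `curl -s localhost:4191/metrics | inbound_success_rate`. If it reports no inbound responses, the proxy itself has not recorded any traffic.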

@ctaggart

ctaggart commented Nov 27, 2018

I set up another AKS cluster today with linkerd edge and I am able to get some stats. I used the az command line app this time, which defaulted to an older version of k8s (1.9.11 instead of 1.11.4). Security defaulted to RBAC this time. I was using linkerd edge-18.11.2 instead of 2.0.0.

cameron@Azure:~$ Forwarding from 127.0.0.1:4191 -> 4191
Forwarding from [::1]:4191 -> 4191

cameron@Azure:~$ curl -s localhost:4191/metrics | grep '^response_total'
Handling connection for 4191
response_total{authority="web-svc.emojivoto",direction="inbound",tls="disabled",status_code="200",classification="success"} 6223
response_total{authority="web-svc.emojivoto",direction="inbound",tls="disabled",status_code="500",classification="failure"} 695
response_total{authority="voting-svc.emojivoto:8080",direction="outbound",dst_control_plane_ns="linkerd",dst_deployment="voting",dst_namespace="emojivoto",dst_pod="voting-6c8c66d7f-r4b9h",dst_pod_template_hash="274722839",dst_service="voting-svc",tls="no_identity",no_tls_reason="not_provided_by_service_discovery",status_code="200",classification="success",grpc_status="0"} 2778
response_total{authority="voting-svc.emojivoto:8080",direction="outbound",dst_control_plane_ns="linkerd",dst_deployment="voting",dst_namespace="emojivoto",dst_pod="voting-6c8c66d7f-r4b9h",dst_pod_template_hash="274722839",dst_service="voting-svc",tls="no_identity",no_tls_reason="not_provided_by_service_discovery",status_code="200",classification="failure",grpc_status="2"} 696
response_total{authority="emoji-svc.emojivoto:8080",direction="outbound",dst_control_plane_ns="linkerd",dst_deployment="emoji",dst_namespace="emojivoto",dst_pod="emoji-5f67d95b6c-s8k5b",dst_pod_template_hash="1923851627",dst_service="emoji-svc",tls="no_identity",no_tls_reason="not_provided_by_service_discovery",status_code="200",classification="success",grpc_status="0"} 7425
response_total{direction="inbound",tls="disabled",status_code="200",classification="success"} 12
cameron@Azure:~$ linkerd stat deploy -n emojivoto
NAME       MESHED   SUCCESS      RPS   LATENCY_P50   LATENCY_P95   LATENCY_P99   TLS
emoji         1/1   100.00%   2.0rps           1ms           2ms           4ms    0%
vote-bot      1/1         -        -             -             -             -     -
voting        1/1    71.19%   1.0rps           1ms           1ms           1ms    0%
web           1/1    88.03%   1.9rps           2ms          15ms          19ms    0%

I was hoping to also see the stats for the paths and methods, but they are not showing up:
image

Initially without any requests made, I was getting an HTTP 500 in the dashboard with this URL:
http://localhost:8001/api/v1/namespaces/linkerd/services/web:http/proxy/api/tps-reports?resource_type=namespace&all_namespaces=true&window=1m

@klingerf
Member

@ctaggart Path and method stats should be there. Here's what it looks like in my setup:

image

That table's populated with live traffic data after the page loads, so it may take some time before rows appear, and the data in the table is just a sampling of all requests. The data is supplied via websockets to the frontend, so if you're still not seeing any rows after leaving the page open for a minute or so, please check to make sure your browser is properly handling the websocket connection.

#1737 tracks adding full-fidelity, historical path stats that are stored in Prometheus, rather than stats that are sampled from live traffic as it's currently set up. That feature is almost ready for primetime -- stay tuned for the next few releases.

@ctaggart

ctaggart commented Nov 28, 2018

> The data is supplied via websockets to the frontend, so if you're still not seeing any rows after leaving the page open for a minute or so, please check to make sure your browser is properly handling the websocket connection.

I tried in both Firefox and Chrome. Chrome console shows this websockets request:
image

Just in case it was an issue with kubectl proxy, I tried port-forward too.
kubectl -n linkerd port-forward svc/web 8084:8084
http://localhost:8084/namespaces/emojivoto/deployments/voting

PS C:\Users\taggac> kubectl version
Client Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.3", GitCommit:"2bba0127d85d5a46ab4b778548be28623b32d0b0", GitTreeState:"clean", BuildDate:"2018-05-21T09:17:39Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"windows/amd64"}
Server Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.11", GitCommit:"1bfeeb6f212135a22dc787b73e1980e5bccef13d", GitTreeState:"clean", BuildDate:"2018-09-28T21:35:22Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}

Happy to give you access to my test cluster in Azure to look at.

@klingerf
Member

@ctaggart Ok, thanks for investigating. It would be worth trying to see if you can tap that deployment from the command line. Here's what I see in my env:

$ linkerd -n emojivoto tap deploy/voting
req id=0:6483 proxy=in  src=10.1.26.54:35716 dst=10.1.26.52:8080 tls=disabled :method=POST :authority=voting-svc.emojivoto:8080 :path=/emojivoto.v1.VotingService/VotePointUp2
rsp id=0:6483 proxy=in  src=10.1.26.54:35716 dst=10.1.26.52:8080 tls=disabled :status=200 latency=1066µs
end id=0:6483 proxy=in  src=10.1.26.54:35716 dst=10.1.26.52:8080 tls=disabled grpc-status=OK duration=174µs response-length=5B
req id=0:6484 proxy=in  src=10.1.26.54:35716 dst=10.1.26.52:8080 tls=disabled :method=POST :authority=voting-svc.emojivoto:8080 :path=/emojivoto.v1.VotingService/VoteMrsClaus
rsp id=0:6484 proxy=in  src=10.1.26.54:35716 dst=10.1.26.52:8080 tls=disabled :status=200 latency=2040µs
end id=0:6484 proxy=in  src=10.1.26.54:35716 dst=10.1.26.52:8080 tls=disabled grpc-status=OK duration=76µs response-length=5B

That's the same data that populates the table in the web UI. So if it's working from the CLI, then that suggests that the issue in your environment is stemming from the web component.
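
Editor's sketch: to compare the CLI view with the table in the web UI, the tap output can be tallied per path. The helper name `count_paths` is hypothetical; it assumes only the line format shown above, where request lines start with `req` and carry a `:path=` field.

```shell
# Hypothetical helper (not a linkerd command): reads tap output on stdin
# and prints how many "req" lines were seen for each :path value.
count_paths() {
  awk '
    $1 == "req" {
      for (i = 2; i <= NF; i++)
        # ":path=" is 6 characters long, so the value starts at position 7.
        if (index($i, ":path=") == 1) counts[substr($i, 7)]++
    }
    END { for (p in counts) print counts[p], p }
  ' | sort -k2
}

# Sample input: lines quoted from the tap output above.
count_paths <<'EOF'
req id=0:6483 proxy=in  src=10.1.26.54:35716 dst=10.1.26.52:8080 tls=disabled :method=POST :authority=voting-svc.emojivoto:8080 :path=/emojivoto.v1.VotingService/VotePointUp2
rsp id=0:6483 proxy=in  src=10.1.26.54:35716 dst=10.1.26.52:8080 tls=disabled :status=200 latency=1066µs
req id=0:6484 proxy=in  src=10.1.26.54:35716 dst=10.1.26.52:8080 tls=disabled :method=POST :authority=voting-svc.emojivoto:8080 :path=/emojivoto.v1.VotingService/VoteMrsClaus
EOF
```

On this sample each of the two paths is counted once. Since tap streams live traffic until interrupted, a bounded run is needed in practice, for example by capturing the output to a file for a while and then running `count_paths < tap.log` (the file name is an assumption).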

I can probably find some time tomorrow to poke at your test cluster if it's still not working. Feel free to ping me in the Linkerd slack -- I'm "kl".

@stale

stale bot commented Feb 26, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Feb 26, 2019
@stale stale bot closed this as completed Mar 12, 2019
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jul 18, 2021