Metrics: Reduce pubsub cardinality #13378

Merged
merged 2 commits into from Oct 1, 2021

@SimonRichardson SimonRichardson commented Sep 30, 2021

When the cardinality of the pubsub metrics gets too large, the sheer volume
of metric data causes juju to crash. Instead, we swap the topic name from a
UUID to a name that we can track easily.
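
A minimal sketch of the idea, not the code in this PR: the topic is normalized before it is used as a metric label, so the label set stays small and fixed. The function name `normalizeTopic`, the `uuidRegex` pattern, and the sample lease topic are assumptions for illustration. The dots are escaped here so that they match literally.

```go
package main

import (
	"fmt"
	"regexp"
)

// Both patterns are illustrative assumptions, anchored and with escaped dots.
var (
	leaseRequestRegex = regexp.MustCompile(`^lease\.request\.[0-9a-f]+\.[0-9]+$`)
	uuidRegex         = regexp.MustCompile(`^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$`)
)

// normalizeTopic maps unbounded topic names onto a small set of label values.
// Topics that match neither pattern are returned unchanged.
func normalizeTopic(topic string) string {
	switch {
	case leaseRequestRegex.MatchString(topic):
		return "lease.request"
	case uuidRegex.MatchString(topic):
		return "lease.request.callback"
	default:
		return topic
	}
}

func main() {
	topics := []string{
		"00152368-0aa6-4178-84c6-bb6ea3334f41", // UUID taken from the old output
		"lease.request.1a2b3c.7",               // hypothetical lease request topic
		"apiserver.details",
	}
	for _, t := range topics {
		fmt.Printf("%s -> %s\n", t, normalizeTopic(t))
	}
}
```

With this shape, every callback UUID contributes to a single `lease.request.callback` series instead of creating a new one per request.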

QA steps

$ juju bootstrap lxd test
$ juju add-user prometheus
$ juju grant prometheus read controller
$ juju change-user-password prometheus
<PASSWORD>
$ juju show-controller test --format=json | jq '.test.details."api-endpoints"[0]' | xargs -I % echo "https://%/introspection/metrics" | xargs curl -k -s -u user-prometheus:<PASSWORD>

The output should include lease.request.callback instead of a UUID.

The new:

# HELP juju_pubsub_published Number of published message per topic
# TYPE juju_pubsub_published gauge
juju_pubsub_published{topic="apiserver.agent-connect"} 3
juju_pubsub_published{topic="apiserver.details"} 3
juju_pubsub_published{topic="apiserver.details-request"} 4
juju_pubsub_published{topic="lease.request"} 6
juju_pubsub_published{topic="lease.request.callback"} 3

The old:

# HELP juju_pubsub_published Number of published message per topic
# TYPE juju_pubsub_published gauge
juju_pubsub_published{topic="00152368-0aa6-4178-84c6-bb6ea3334f41"} 1
juju_pubsub_published{topic="01a150d8-2ba7-44f8-852a-b137b8aa11a2"} 1
juju_pubsub_published{topic="02066c99-0617-4777-88fd-e94aa3642df7"} 1
juju_pubsub_published{topic="039b5842-8f72-4110-8b03-3095c780a292"} 1
juju_pubsub_published{topic="09fd31d7-4fa5-4f2b-88d0-7541b5e06c92"} 1
juju_pubsub_published{topic="09ff6fea-8ea5-4605-8690-08e6b225a42f"} 1
juju_pubsub_published{topic="16a3d039-7a80-4a95-8421-5460afe4d693"} 1
juju_pubsub_published{topic="1a21d9e8-6447-4d06-8ab3-857e1c01716b"} 1
juju_pubsub_published{topic="1a7575a0-2d57-4286-85a0-df122cbb3a11"} 1
juju_pubsub_published{topic="1a8e5f67-7f36-48fb-8446-a1eec64a6668"} 1
juju_pubsub_published{topic="1b1b58b0-a626-4d68-843e-21b00413b4f4"} 1
juju_pubsub_published{topic="1d890ebf-4012-4899-879f-1ecb57036b83"} 1
juju_pubsub_published{topic="1dd3cd39-eb3b-4ee0-83a8-bce31fc6b7ff"} 1
juju_pubsub_published{topic="1ddfcceb-1ce2-4a3c-89eb-ee5b2fedea25"} 1
juju_pubsub_published{topic="2293dd08-1199-48f0-8776-33d5ae5f8b8d"} 1
juju_pubsub_published{topic="23eb26d9-9ea0-48af-85f5-1d27a12aea89"} 1
juju_pubsub_published{topic="25aba6fa-c5f3-4701-8e62-5793c4b2a3f9"} 1
juju_pubsub_published{topic="2665f713-045d-4e30-8dc5-5656e8442afd"} 1
juju_pubsub_published{topic="269c0d1d-9d63-47e1-8578-f3601adf571e"} 1
juju_pubsub_published{topic="29993503-cf25-4c75-8d08-c1f3f46ffba8"} 1
juju_pubsub_published{topic="2a3f232f-a754-4acd-8faf-4834a24c6ccd"} 1
juju_pubsub_published{topic="2bd6fbbd-4cbe-46d0-896c-68479d235b4a"} 1
juju_pubsub_published{topic="2e3cf0c1-ec12-4875-8054-848b58391aa7"} 1
juju_pubsub_published{topic="2fd1d52a-7913-4d0e-8c21-8a5b86c49b5a"} 1
juju_pubsub_published{topic="31775e0d-e97e-48e3-8211-39bef0f04b60"} 1
juju_pubsub_published{topic="3f0fa993-ca8c-4575-8e43-30a88da02d74"} 1
juju_pubsub_published{topic="3fc3aa8f-c19e-4ed3-8ea6-fb9de3e049b2"} 1
juju_pubsub_published{topic="429c6374-f302-4e01-8298-5cf6a0614cb5"} 1
juju_pubsub_published{topic="46094219-ae36-42a8-8e22-a4d11491c27d"} 1
juju_pubsub_published{topic="47b9e977-7973-4a7b-8574-6ad2b3f76261"} 1
juju_pubsub_published{topic="49ce2dc1-56c3-4a11-8b9a-b4d8b6f1c2a5"} 1
juju_pubsub_published{topic="4a7c77dc-e639-4d24-8078-fb6580b03a54"} 1
juju_pubsub_published{topic="4c0d9918-3216-4182-81d0-ea517c4d6382"} 1
juju_pubsub_published{topic="50339662-2a07-44bf-8a29-e89e8ec6dba3"} 1
juju_pubsub_published{topic="52436973-250b-4116-845a-4498ee3f10a8"} 1
juju_pubsub_published{topic="524bf687-1186-4f7f-83ed-1f790bb104f1"} 1
juju_pubsub_published{topic="52742779-516c-4971-8bc1-507ec7f4b462"} 1
juju_pubsub_published{topic="5380758f-a335-472e-8fc7-06cf4747be29"} 1
juju_pubsub_published{topic="548f4682-bbbe-43d1-893d-4716a72ef3d1"} 1
juju_pubsub_published{topic="56376ab2-d776-4116-8e92-bf2ecabde97c"} 1
juju_pubsub_published{topic="57e5dea6-f92f-4a3f-885a-6344c324708a"} 1
juju_pubsub_published{topic="596a6ef4-85e9-4e6d-804d-4376edb11443"} 1
juju_pubsub_published{topic="5ae49748-76a1-4913-887b-b5f7196995c6"} 1
juju_pubsub_published{topic="5ae50c4b-72f6-4905-8703-8e179706d46e"} 1
juju_pubsub_published{topic="5fbc1a2c-2065-4d6c-8f89-84ef04037a88"} 1
juju_pubsub_published{topic="635867c2-980c-441a-8584-9333287d31b0"} 1
juju_pubsub_published{topic="6487a6f6-958a-4439-82ae-1cd9ffa10f44"} 1
juju_pubsub_published{topic="6522470e-23ec-4f6a-8c0e-fa124c9ec3ae"} 1
juju_pubsub_published{topic="68a9e9fa-2dc3-4818-864a-81a969d8ff5a"} 1
juju_pubsub_published{topic="69d7c860-2341-4f8f-85c5-b5b68b4dbe12"} 1
juju_pubsub_published{topic="6c8c4bd4-4267-4b76-8078-5a1d43429c30"} 1
juju_pubsub_published{topic="6fd87627-ed2c-4db5-8298-d8aa2d3d976b"} 1
juju_pubsub_published{topic="7017ef98-1c5f-4b59-88a9-90fa1311c991"} 1
juju_pubsub_published{topic="7239f465-4541-492a-83eb-0a43b0fe0204"} 1
juju_pubsub_published{topic="7242fcbf-f8e4-4caa-89df-2614cd2b89b2"} 1
juju_pubsub_published{topic="73511272-aec0-4e8e-8d87-46df0f4f77af"} 1
juju_pubsub_published{topic="743fad59-d058-4403-8e36-ec49a3362d32"} 1
juju_pubsub_published{topic="75440a9c-6d9e-41e3-8b09-2a462e18bd46"} 1
juju_pubsub_published{topic="76ec8704-ea11-47c6-89bd-b56be0516c5d"} 1
juju_pubsub_published{topic="848b40f4-3fdc-4193-8dc7-20f22b829d17"} 1
juju_pubsub_published{topic="88397710-4c1f-4c2a-8ad1-473127bedaae"} 1
juju_pubsub_published{topic="8993fa2b-5192-41df-8660-0666bb2438bc"} 1
juju_pubsub_published{topic="8aae973e-d56b-4137-8c8b-1f30ab07a541"} 1
juju_pubsub_published{topic="8c4f26d2-348d-4d3c-85e0-a0965043b93a"} 1
juju_pubsub_published{topic="9385a3a2-931f-4e42-8dea-4d33df28270d"} 1
juju_pubsub_published{topic="94562ec5-5c01-474c-807e-5d2a74a15447"} 1
juju_pubsub_published{topic="9465dd3c-70a9-4e11-8fba-1c770b738c73"} 1
juju_pubsub_published{topic="976acbc1-2d4d-4f05-8e52-e148c1d0d851"} 1
juju_pubsub_published{topic="98fd47ed-e294-40bc-8351-13411f9f9c53"} 1
juju_pubsub_published{topic="99ec1adc-b2e8-44eb-8ef9-730a2897a27f"} 1
juju_pubsub_published{topic="9a6ca9e4-6ddf-481c-8f48-e3a6a7d3cedb"} 1
juju_pubsub_published{topic="9cf1f064-6ef9-4729-8c4b-3f7cfe23fd7a"} 1
juju_pubsub_published{topic="9f23c49b-aad3-4b37-8b0c-8348f05f1fde"} 1
juju_pubsub_published{topic="9f3e155a-5a53-453f-83a7-5a8e53963384"} 1
juju_pubsub_published{topic="a050eb5f-2667-412d-8640-b31aaa380d23"} 1
juju_pubsub_published{topic="a1aa7878-1f0c-45c8-8c2c-49bcbb7b46a2"} 1
juju_pubsub_published{topic="a40e2ad1-c9e7-41a5-8179-c6d7846a62b5"} 1
juju_pubsub_published{topic="a8020b4a-3aed-4eb1-8e1e-d7b1ec6a5fa2"} 1
juju_pubsub_published{topic="a8c505e7-af9d-41ef-806f-cda05aa57da3"} 1
juju_pubsub_published{topic="aa8ab583-53fd-4063-8cfe-6723aac99fa5"} 1
juju_pubsub_published{topic="apiserver.agent-connect"} 3
juju_pubsub_published{topic="apiserver.details"} 3
juju_pubsub_published{topic="apiserver.details-request"} 4
juju_pubsub_published{topic="b769f9d7-eaf1-4558-860e-766840057289"} 1
juju_pubsub_published{topic="b7fed74e-12ed-4139-887e-e0b83b7181d0"} 1
juju_pubsub_published{topic="b9b1f8d6-409b-42c2-85c5-cd8adb845f0a"} 1
juju_pubsub_published{topic="bb4fedc6-3cb2-4e1c-8666-7825cd098965"} 1
juju_pubsub_published{topic="bc2b6b04-4dc6-49d4-818c-282091dba044"} 1
juju_pubsub_published{topic="be19d71c-7689-4bb5-8668-af8513cc4201"} 1
juju_pubsub_published{topic="bee4c2ce-db70-4e1d-89be-eefcbde88d02"} 1
juju_pubsub_published{topic="bfaf52e4-2c74-416f-8a0d-c5b070bb7cfe"} 1
juju_pubsub_published{topic="c1df2fec-42d2-4f74-8118-733771f7c0ab"} 1
juju_pubsub_published{topic="c3241eb2-be6f-48d8-8013-8e9049efc11a"} 1
juju_pubsub_published{topic="c36c72a3-ffc5-4f23-806d-4458e92cab8a"} 1
juju_pubsub_published{topic="c3ffe8c9-38ae-45f2-87a4-360a72c46789"} 1
juju_pubsub_published{topic="c7cbff36-2ada-4ae4-8e61-2b03efc58aad"} 1
juju_pubsub_published{topic="c8549b25-f51e-4a95-8c7b-42bca58c8b83"} 1
juju_pubsub_published{topic="cbb7f189-8b63-4e4e-8ea7-1fed7196f886"} 1
juju_pubsub_published{topic="cf1a419d-536f-40cb-8577-ffb6ad0a227c"} 1
juju_pubsub_published{topic="d031288a-509f-4b6f-811f-01dbab23f583"} 1
juju_pubsub_published{topic="d75d1122-276a-47d0-883e-b839bc3a71b0"} 1
juju_pubsub_published{topic="d7efa3b6-b0a2-4c65-8ab5-fbc6c5fa8f76"} 1
juju_pubsub_published{topic="dbbaf850-7319-4500-88c1-da92c6c5d87e"} 1
juju_pubsub_published{topic="dc065e48-a252-4738-8dd0-ec926c71fa7b"} 1
juju_pubsub_published{topic="de954b88-53e2-48c1-86a3-66f2023a4111"} 1
juju_pubsub_published{topic="e5f3f687-cbd5-42b8-8777-9307eb08cdc2"} 1
juju_pubsub_published{topic="e8915c8b-67c4-46ef-8741-500db3762141"} 1
juju_pubsub_published{topic="e8f35d24-386b-435a-8135-50bf6d9bcba3"} 1
juju_pubsub_published{topic="e9fbf895-f714-4444-8be7-a286aee0f035"} 1
juju_pubsub_published{topic="ebe18595-e9d6-4755-86b7-69c1e16ffe57"} 1
juju_pubsub_published{topic="ec7c0b62-d8e1-4be2-86e3-f7d5b3a5daa7"} 1
juju_pubsub_published{topic="edc2d6fc-8af2-4c03-830d-5e1eb512a632"} 1
juju_pubsub_published{topic="f2dad952-7111-47d5-8ea8-07ea607260d7"} 1
juju_pubsub_published{topic="f3cc2cbc-7f97-4f55-858b-10ed69241443"} 1
juju_pubsub_published{topic="fc10c975-6479-438a-82f7-40a51ef2e05b"} 1
juju_pubsub_published{topic="fe4b3dd8-6d9f-4ce9-8793-c85a265851ac"} 1
juju_pubsub_published{topic="fe8c9ae3-5495-4349-811b-fefdac233478"} 1
juju_pubsub_published{topic="lease.request"} 117

Fixes:
https://bugs.launchpad.net/juju/+bug/1945644

@jameinel
Tom confirmed that this looks like what they are seeing on PS5:
ubuntu@juju-3128a8-controller-0:~$ # wget http://localhost:19090/metrics
ubuntu@juju-3128a8-controller-0:~$ ls -lh metrics
-rw-rw-r-- 1 ubuntu ubuntu 61M Sep 30 16:42 metrics
ubuntu@juju-3128a8-controller-0:~$ wc -l metrics
906131 metrics

@jameinel jameinel left a comment

If it is problematic to change the labels for the responses, then this is ok. But if we can clean that up while we're here, we might as well.

@@ -75,51 +81,56 @@ func (m *PubsubMetrics) Unsubscribed() {
m.subscriptions.Dec()
}

var leaseRequestRegex = regexp.MustCompile("lease.request.[0-9a-f]+.[0-9]+")
var (
leaseRequestRegex = regexp.MustCompile("lease.request.[0-9a-f]+.[0-9]+")

Why the prefix in the one but not the other? I guess we are doing responses as UUIDs, but that makes me a bit sad that we aren't using any sort of identifier in the response topic for anything more than just an opaque UUID.

Is there any reason why we can't change the response to:
lease.request.callback.$UUID and then only filter those instead of arbitrary UUIDs?
I don't think we care what the keys are, and prefixing them rather than just UUIDs is cleaner anyway.

Member Author

Agreed, I've namespaced the response topic. These of course will all go away once the raft over API lands.

The lease callback topics were valid UUIDs. The problem is that they
weren't namespaced, so it was hard to know where they came from. This
change namespaces them so that we can identify their origin.
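
As a rough illustration of the namespacing described above (a sketch under assumptions, not the code that was merged): once callback topics carry a fixed prefix, the metric label can be derived with a prefix check instead of matching arbitrary UUIDs. The names `callbackPrefix`, `callbackTopic`, and `metricLabel` are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// callbackPrefix is the namespace suggested in the review: lease.request.callback.$UUID
const callbackPrefix = "lease.request.callback."

// callbackTopic builds a namespaced response topic from a unique id such as a UUID.
func callbackTopic(id string) string {
	return callbackPrefix + id
}

// metricLabel collapses every namespaced callback topic into one label value
// and leaves all other topics untouched.
func metricLabel(topic string) string {
	if strings.HasPrefix(topic, callbackPrefix) {
		return strings.TrimSuffix(callbackPrefix, ".")
	}
	return topic
}

func main() {
	topic := callbackTopic("00152368-0aa6-4178-84c6-bb6ea3334f41")
	fmt.Println(topic)
	fmt.Println(metricLabel(topic))
	fmt.Println(metricLabel("apiserver.details"))
}
```

The prefix check only touches topics that are known to be lease callbacks, so an unrelated topic that happens to look like a UUID is not silently relabelled.
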
@SimonRichardson

$$merge$$

@jujubot jujubot merged commit c896378 into juju:2.9 Oct 1, 2021
@SimonRichardson SimonRichardson deleted the reduce-cardinality branch October 1, 2021 11:32
@wallyworld wallyworld mentioned this pull request Oct 5, 2021
jujubot added a commit that referenced this pull request Oct 5, 2021
#13385

Merge 2.9

#13363 Use a mock for upgradedatabase test clock
#13347 Update oracle api and fix bootstrap
#13343 Bootstrap test refactor
#13372 Pass HTTP Client through
#13364 Make upgrade smoke test workflow more robust
#13374 For older machine agents, there needs to be a symlink for the series based tools
#13371 Stick the secret revision in the URL path not as a query parameter
#13375 CLI: NO_COLOR support
#13365 Pin kubeflow test to a fixed sha;
#13376 Rename secret status pending to staged
#13380 Add secret grant/revoke hook command CLI
#13350 Implement elasticContainerRegistry;
#13378 Metrics: Reduce pubsub cardinality
#13377 State: Logs already exist
#13381 Update Pebble to add replan, wait-change, and one-shot commands
#13383 LXD network retrieval efficiency

```
# Conflicts:
# caas/kubernetes/provider/bootstrap_test.go
# go.mod
# go.sum
#
```
## QA steps

See PRs