[RFE] Get Flatcar added to supported operating systems for Google Ops Agent #560
Comments
Hello @cpswan, please pardon the long silence. We have taken a closer look at GCP's ops-agent (https://github.com/GoogleCloudPlatform/ops-agent) and its potential relevance for Flatcar, and would like to share our findings below. We'd also like to ask if you could elaborate on the scenario you're using Flatcar in, and what benefits ops-agent would bring over other monitoring tools.

Ops-agent ships basic host metrics support (CPU, network, disk, and memory usage stats), but it also includes support for gathering metrics from a significant number of on-host applications such as JVM apps, Apache, memcached, etc. Supporting these at the host level feels irrelevant for Flatcar since we're a container-focused OS; application workloads are expected to ship in containers instead of running in the host OS directly. Flatcar also does not ship a JVM, and the SDK does not include a Java runtime, so even a native build of ops-agent for Flatcar would be challenging. Support for these apps pulls in a large number of dependencies and seems to be baked in, i.e. it does not seem possible to ship a stripped-down version of ops-agent without said support unless ops-agent itself undergoes a significant code change.

Furthermore, ops-agent appears to have a number of runtime dependencies on metrics gateways (fluentd, collectd, opentelemetry) which are not readily available on Flatcar and would need to be added, e.g. as container images. Alternatively, significant code changes to ops-agent would be required to make these runtime requirements optional.

Thirdly, to make the installer script work, we would need GCP to either produce or ingest the code changes called out above, and then produce a build. This is on the Google side of things, which we don't have control over.

Could you please elaborate on your scenario so we can better help find a path forward?
Thanks @t-lo Basic host metrics are what we use the agent for. We have a custom dashboard in GCP Monitoring that shows us:
This first group seems to get its data from the agent, as the Flatcar hosts we have running don't show up. We also have these on the dashboard, and the Flatcar hosts are already included, as the metrics come from the VM rather than the OS.
Digging around in the Metric explorer I can see that the first group come from We can probably substitute
We're not at all interested in metrics from on-host applications, because we run everything in containers. If you can suggest another way of getting CPU load from Flatcar into GCP Monitoring we won't need the agent.
Closing this as I've solved the problem with a containerised Python script to send CPU load average as a GCP custom metric.

Dockerfile

FROM python:3.10.3-slim
WORKDIR /usr/src/app
RUN pip3 install --no-cache-dir google-cloud-monitoring
COPY docker_send_data.py .
CMD [ "python3", "./docker_send_data.py" ]

docker_send_data.py

#!/usr/bin/env python3
from google.cloud import monitoring_v3
import time
import os
import requests

# Look up instance and project details from the GCE metadata server.
metadata_server = "http://metadata/computeMetadata/v1/"
metadata_flavor = {'Metadata-Flavor' : 'Google'}
#gce_id = requests.get(metadata_server + 'instance/id', headers = metadata_flavor).text
gce_name = requests.get(metadata_server + 'instance/hostname', headers = metadata_flavor).text
gce_project = requests.get(metadata_server + 'project/project-id', headers = metadata_flavor).text
# The first two dot-separated hostname fields are used as the instance and zone labels.
split_gce_name = gce_name.split(".", 2)

client = monitoring_v3.MetricServiceClient()
project_id = gce_project
project_name = f"projects/{project_id}"

# The metric and resource labels are set once and reused for every point.
series = monitoring_v3.TimeSeries()
series.metric.type = "custom.googleapis.com/at_swarm_node_load"
series.resource.type = "gce_instance"
series.resource.labels["instance_id"] = split_gce_name[0]
series.resource.labels["zone"] = split_gce_name[1]

# Send the 5-minute load average once a minute.
while True:
    load1, load5, load15 = os.getloadavg()
    now = time.time()
    seconds = int(now)
    nanos = int((now - seconds) * 10 ** 9)
    interval = monitoring_v3.TimeInterval(
        {"end_time": {"seconds": seconds, "nanos": nanos}}
    )
    point = monitoring_v3.Point({"interval": interval, "value": {"double_value": load5}})
    series.points = [point]
    client.create_time_series(request={"name": project_name, "time_series": [series]})
    time.sleep(60)
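The hostname parsing and timestamp handling in the script above can be pulled out into small pure functions, which makes them easy to check without a metadata server or GCP credentials. This is only a sketch: the function names `split_zonal_hostname` and `to_seconds_nanos` are hypothetical, and it assumes the metadata server returns a zonal DNS hostname of the form `NAME.ZONE.c.PROJECT.internal`. With a global DNS hostname (`NAME.c.PROJECT.internal`) there is no zone field, so the sketch raises an error rather than mislabelling the series.

```python
from typing import Tuple


def split_zonal_hostname(hostname: str) -> Tuple[str, str]:
    """Return (instance_name, zone) from an assumed zonal DNS hostname.

    Assumption: the hostname looks like NAME.ZONE.c.PROJECT.internal.
    """
    parts = hostname.split(".")
    # In the zonal format the third field is the literal "c".
    if len(parts) < 5 or parts[2] != "c" or parts[-1] != "internal":
        raise ValueError(f"not a zonal GCE hostname: {hostname!r}")
    return parts[0], parts[1]


def to_seconds_nanos(timestamp: float) -> Tuple[int, int]:
    """Split a float UNIX timestamp into whole seconds and nanoseconds."""
    seconds = int(timestamp)
    nanos = int((timestamp - seconds) * 10 ** 9)
    return seconds, nanos
```

For example, `split_zonal_hostname("node1.europe-west2-a.c.my-project.internal")` yields the instance name and zone that the script uses as resource labels (the hostname here is made up for illustration).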
Amazing work @cpswan, I just read your blog about this: https://blog.atsign.dev/google-cloud-custom-metrics-cl16j2q2b05gujonv0h9x2dg5 Just out of curiosity, would you be interested in adding your approach (and your script) to our GCP platform docs as a work-around for our lack of ops agent?
Sure @t-lo, I've done a docs PR here before so I'll dive in for another...
Awesome, thank you!
Current situation
The install script for the Ops Agent:
curl -sSO https://dl.google.com/cloudagents/add-google-cloud-ops-agent-repo.sh && sudo bash add-google-cloud-ops-agent-repo.sh --also-install
fails with the message "Unidentifiable or unsupported platform. See https://cloud.google.com/stackdriver/docs/solutions/ops-agent/#supported_operating_systems for a list of supported platforms."
Flatcar is not listed at https://cloud.google.com/stackdriver/docs/solutions/ops-agent/#supported_operating_systems
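To illustrate why an installer might reject Flatcar, here is a minimal sketch of platform detection based on `/etc/os-release`. It assumes the installer identifies the OS from the `ID` field and compares it against an allow-list; whether the real script works exactly this way is an assumption, and `ASSUMED_SUPPORTED_IDS` is a made-up illustrative set, not the official list. Flatcar reports `ID=flatcar`, which would not match such a list.

```python
from typing import Dict, Set

# Hypothetical allow-list for illustration only; not the official list.
ASSUMED_SUPPORTED_IDS: Set[str] = {"centos", "rhel", "rocky", "debian", "ubuntu", "sles"}


def parse_os_release(text: str) -> Dict[str, str]:
    """Parse os-release style KEY=value lines into a dict."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        fields[key] = value.strip().strip('"').strip("'")
    return fields


def is_supported(fields: Dict[str, str],
                 supported: Set[str] = ASSUMED_SUPPORTED_IDS) -> bool:
    """Return True if the os-release ID is in the (assumed) allow-list."""
    return fields.get("ID", "").lower() in supported
```

On a real host the input would come from reading `/etc/os-release`; passing the text in as a string keeps the sketch testable anywhere.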
Impact
GCE VMs running Flatcar can't be added to monitoring dashboards.
Ideal future situation
Flatcar will become a supported OS for the Ops Agent, and future versions of the script will install it.
Implementation options
The Ops Agent could be baked into future GCE images