feat: add curated telegraf 1.38.2 build for Azure Linux 4.0 (#20399) #17798
WithEnoughCoffee wants to merge 1 commit into
Pull request overview
This PR re-introduces Telegraf (missing from Azure Linux 4.0, shipped in 3.0) as a new local component (base/comps/telegraf/). Because Telegraf's default build links ~400 plugins and the full transitive dependency tree, the spec uses a curated ("Balanced") plugin set (~104 Go build tags via GO_BUILDTAGS) to shrink the binary and its CVE/dependency surface, while still vendoring the full tree per Fedora Go packaging guidelines (%gometa, Go Vendor Tools, %gobuild). It adds systemd/sysusers/logrotate integration and a %check that validates the license expression and runs the binary. It resolves #20399.
Changes:
- Adds a hand-maintained local `telegraf.spec` with a curated `%global buildtags` plugin policy, Go Vendor Tools license macros, sysusers, systemd unit, logrotate, and default-config generation.
- Adds the component definition (`telegraf.comp.toml`, manual release), `go-vendor-tools.toml`, `telegraf.sysusers`, a reproducible vendor-tarball generator script, and the rendered specs/lock/sources.
- The vendor tarball source URI is currently a `127.0.0.1` placeholder pending lookaside upload (noted as a known follow-up).
Reviewed changes
Copilot reviewed 9 out of 10 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `base/comps/telegraf/telegraf.comp.toml` | Local-spec component def, manual release, two source-files (upstream + vendor); vendor URI is a placeholder. |
| `base/comps/telegraf/telegraf.spec` | Curated Go build spec (buildtags, license macros, sysusers, systemd, `%check`). |
| `base/comps/telegraf/go-vendor-tools.toml` | askalono detector + manual SPDX entries for files the detector can't classify. |
| `base/comps/telegraf/telegraf.sysusers` | Declarative telegraf system user. |
| `base/comps/telegraf/generate_source_tarball.sh` | Reproducible `go mod vendor` tarball generator; comment references a stale macro name. |
| `specs/t/telegraf/*` | Rendered spec/sysusers/go-vendor-tools/sources (body matches base sources). |
| `locks/telegraf.lock` | Generated input-fingerprint lock. |
Key findings: the helper script's comment cross-references a non-existent %{plugin_tags} macro (spec uses %{buildtags}), and the vendor source URI is an unresolved 127.0.0.1 placeholder that blocks CI fetch/build until replaced. Because this introduces a brand-new forked local spec (a long-term maintenance commitment) for a vendored Go package with a large curated plugin policy, license-expression tracking, and an unresolved source URI, it warrants human review.
issue(blocking): While having a list of required follow-ups is good, make sure to remove it before we merge.
tobiasb-ms left a comment
question(blocking): I left a couple specific comments about this, but what are the practical differences between this and how we packaged it for AZL3? I know we've changed to use a fedora-blessed way of packaging, and that seems righteous. And of course we bumped the version. But I think there are more semantic differences here and we need to know what they are and why we're making them before taking this change.
issue(blocking): This isn't in
rpm-layout, build (blocking): Can you please confirm this builds on Koji? I also saw SUSE was the reference for defaults. We should lean towards Fedora/CentOS/Red Hat. Their home directory is /etc/telegraf. `dnf repoquery -l telegraf | grep -v build-id`
Good question. Here's the full inventory of what changed from AZL3 (
A. Semantic / behavioral changes (the ones to sign off on)
B. Mechanical / compliance changes (no behavioral impact)
Net: the only changes that affect runtime behavior are the curated plugin set (#1–2), the no-user-deletion policy (#3), tighter config perms (#4), and the sandboxing drop-in (#5). Everything else is version/toolchain/compliance hygiene. Happy to expand the plugin set or relax any of these if they conflict with a known consumer.
Agreed, good callout. I will keep that in mind.
Home directory: you're right. Switched the telegraf user's home from the SUSE-style `/var/lib/telegraf` to `/etc/telegraf` to match upstream InfluxData and Fedora/RHEL (their `useradd -d /etc/telegraf`). The rest of the layout already follows that convention (`/etc/telegraf` config, `/var/log/telegraf` logs, `/usr/bin/telegraf`).
One nuance worth documenting (and I've left a comment in the sysusers file): `/etc/telegraf` is root-owned and, under our hardening (`ProtectSystem=full`), read-only at runtime, so it's not a writable home. Some plugin SDKs (e.g. the Azure SDK credential cache) write under `$HOME`, so the service drop-in sets `Environment=HOME=/var/lib/telegraf` (writable; `/var` stays writable under `ProtectSystem=full`).
Net result for the supported path (running as the systemd service): config is read from `/etc/telegraf`, SDK caches land in `/var/lib/telegraf`, and everything works. The only caveat: if telegraf is run outside systemd (manual `sudo -u telegraf` debugging), that override isn't applied, `$HOME` falls back to `/etc/telegraf`, and writes under `$HOME` would fail with EACCES. This is identical to upstream's design (they ship the same root-owned `/etc/telegraf` home); the workaround is to set `HOME=/var/lib/telegraf` for manual runs. Flagged in a sysusers comment so it doesn't surprise a future maintainer.
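For manual runs outside systemd, the workaround above could be wrapped in a small helper. This is only a sketch: `pick_home` is a hypothetical function name, not something shipped in the package, and it checks writability as whichever user invokes it.

```shell
#!/usr/bin/env bash
# pick_home CANDIDATE FALLBACK  (hypothetical helper, not part of the package)
# Print CANDIDATE if it is an existing directory the current user can write
# to; otherwise print FALLBACK. Both paths are supplied by the caller.
pick_home() {
    if [ -d "$1" ] && [ -w "$1" ]; then
        printf '%s\n' "$1"
    else
        printf '%s\n' "$2"
    fi
}

# Illustrative use for a manual debugging run. Run it as the telegraf user so
# the writability check reflects that user's permissions:
#   HOME=$(pick_home /etc/telegraf /var/lib/telegraf) \
#       telegraf --config /etc/telegraf/telegraf.conf --test
```

With a root-owned `/etc/telegraf`, the helper would print the fallback for the telegraf user, which is the same value the service drop-in sets.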
fixed
We spoke privately and I addressed these comments.
🔒❌ Lock files are out of date
FIX: run this and commit the result: `azldev component update -p telegraf`
Or download the fix patch and apply it: `gh run download 28628005570 -R microsoft/azurelinux -n locks-patch`, then `git apply locks.patch`
Changed components (1)
tobiasb-ms left a comment
Changes look good. Once you rebase down to one commit I'll approve.
…inux 4.0
Restore telegraf (absent from AzL 4.0) as a curated metrics agent, built via upstream's 'custom' build tag with a curated plugin set (full first-party Azure plugins + github input) to shrink the linked dependency/CVE surface while retaining the full vendor tree. Follows the Fedora Go guidelines (Go Vendor Tools, %gobuild). Ships the upstream systemd unit unmodified, sysusers with home /etc/telegraf, logrotate, generated default config, and a %check (license check + binary smoke test).
The vendor tarball is produced by generate_source_tarball.sh, a reproducible out-of-band tool (not invoked during rpmbuild): inputs are hard-coded, the source is fetched from the pinned URI (optional path override), verified against a hard-coded SHA512, and emitted as a deterministic tarball.
Summary
Telegraf shipped in Azure Linux 3.0 but is missing from 4.0. This restores it as a general-purpose, plugin-driven agent for collecting, processing, aggregating, and writing metrics. Resolves #20399.
Why a curated build
Built upstream-default, telegraf links ~400 plugins and the full transitive dependency tree — a large CVE surface, vendor footprint, and binary for a distro we maintain. Instead we compile a curated ("Balanced", 108 build tags) set: 63 inputs, 15 outputs, 7 processors, 4 aggregators, 12 parsers, 7 serializers. The rest are absent from the binary at build time.
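The per-category breakdown can be checked mechanically against the tag list. A minimal sketch, assuming the tags are space-separated `category.name` entries; the `buildtags` value here is a short illustrative excerpt rather than the real 108-tag macro, and `count_tags` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Illustrative excerpt only; the real %global buildtags macro in
# telegraf.spec is much longer.
buildtags='inputs.cpu inputs.disk inputs.mem outputs.azure_monitor processors.rename parsers.json'

# count_tags CATEGORY TAGS  (hypothetical helper)
# Print how many whitespace-separated tags start with "CATEGORY.".
count_tags() {
    local category=$1 tags=$2 n=0 tag
    for tag in $tags; do
        case $tag in
            "$category".*) n=$((n + 1)) ;;
        esac
    done
    printf '%s\n' "$n"
}

for category in inputs outputs processors aggregators parsers serializers; do
    printf '%-12s %s\n' "$category" "$(count_tags "$category" "$buildtags")"
done

# The stated breakdown should add up to the stated total of build tags.
echo $((63 + 15 + 7 + 4 + 12 + 7))
```

Run against the full macro value, the six counts should match the numbers quoted above.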
- `azure_monitor` (in/out), `azure_storage_queue`, `eventhub_consumer`, `azure_data_explorer`, and the `github` input are all included, since AzL is an Azure/Microsoft + GitHub product and these should work by default.
- `go list -deps ./cmd/telegraf` links 1,877 packages vs 3,386 for the full build (~45% dropped), across 344 distinct third-party modules vs 592 (248 fewer). Note: this reduces the linked/runtime-reachable surface only. The full vendor tree is still shipped (Fedora requires it), so the source-level CVE-scan footprint is unchanged.
- The plugin set lives in one spec macro (`%global buildtags`). Adding (or removing) a plugin is a one-line change: append its tag to the macro, e.g.:
```diff
 inputs.cpu inputs.disk inputs.diskio inputs.mem inputs.net inputs.netstat \
+ inputs.redis \
```
  No `%build`/`%install`/`%files` changes are needed, so curation stays easy to audit and evolve as requirements change.
Packaging (Fedora Go guidelines)
Uses the `go2rpm --profile vendor` scaffold as the baseline (Go Vendor Tools, vendored deps, `%gobuild` with `GO_BUILDTAGS`/`GO_LDFLAGS`), so it can be upstreamed to Fedora and matches the vendored-Go pattern AzL already uses (`rootlesskit`, `git-lfs`). Divergences are marked `# AzL:`. The full vendor tree is retained (Fedora requires it); curation only affects what is compiled. The cumulative SPDX `License` tag is computed with `go_vendor_license` and enforced by `%go_vendor_license_check`; `bundled(golang(...))` provides are auto-generated.
systemd unit
The upstream systemd unit is shipped unmodified (runs as `User=telegraf`). We intentionally add no sandboxing drop-in: telegraf is a whole-system monitoring agent, and the curated inputs include hardware collectors that shell out via `sudo -n` (smart, smartctl, ipmi_sensor) or need `CAP_NET_RAW` (ping); `NoNewPrivileges`/`Protect*` would break them. This matches upstream InfluxData and AzL 3.0. (An earlier revision shipped an openSUSE-derived `50-hardening.conf`; it was dropped after review because it diverged from upstream and conflicted with the curated hardware inputs. See the PR discussion for the full rationale.)
Contents
- `telegraf.spec`: `%gometa`, curated `%global buildtags`, Go Vendor Tools license macros, sysusers (no `userdel` on uninstall), upstream systemd unit (unmodified), logrotate, generated default config, state dir, `%check` (license check + binary smoke test).
- `go-vendor-tools.toml`: askalono detector + manual license entries.
- `telegraf.comp.toml`: upstream source plus the full vendor tarball.
- `telegraf.sysusers`, `telegraf.default`, `generate_source_tarball.sh`, `locks/telegraf.lock`.
Verification
Full mock build passes every phase including `%check`. Confirmed in mock:
- `Telegraf 1.38.2` (branch stamped `azurelinux`); functional collection works (`cpu` input loads and emits).
- Curated plugins present (`azure_monitor` in/out, `azure_data_explorer`, `github`, `eventhub_consumer`, docker, prometheus, snmp, …); non-curated absent (cloudwatch, sqlserver, nats, clickhouse).
- `telegraf.conf` is `0644 root:root` (world-readable, as on Fedora); state dir `/var/lib/telegraf` is `0770 root:telegraf` (matching upstream InfluxData `post-install.sh`).
- `telegraf` user with home `/etc/telegraf` (matching upstream InfluxData `useradd -r -M -d /etc/telegraf`; config is read via the unit's explicit `-config` flag, independent of `$HOME`); the unit installs; on erase the user is intentionally retained.
- `systemd-analyze verify` accepts the unit; debuginfo is split into its own subpackage.
Why 1.38.2 (not 1.39.0)
telegraf 1.39.0's `go.mod` requires Go 1.26; AzL 4.0 currently ships Go 1.25.8, and 1.38.2 is the latest release that builds on it. We can bump to 1.39.0 once AzL golang reaches ≥ 1.26 (which also drops the logzio azure-monitor dependency).
Known follow-up
The vendor tarball (produced by `generate_source_tarball.sh`, SHA512 `1108fe48086a7051c5cb89935c6de1c675c3ea8212a979d147ad0c03aef327c6234fa9eee292e4f9594ba9ec2cb757fc9eff46630aea43551bca3d948b30b27f`) must be uploaded to the lookaside store before CI source checks and package builds can fetch it; the `comp.toml` source URI already points at its final published path.
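For reviewers who want to reproduce or check the tarball locally, the two properties claimed for it (a pinned digest and deterministic output) can be sketched as below. This assumes GNU tar 1.28 or newer and coreutils `sha512sum`. The function names are hypothetical, and the flags are one common way to get reproducible archives, not necessarily what `generate_source_tarball.sh` does.

```shell
#!/usr/bin/env bash
# verify_sha512 FILE EXPECTED_HEX  (hypothetical helper)
# Succeed only if FILE's SHA512 digest equals EXPECTED_HEX.
verify_sha512() {
    local actual
    actual=$(sha512sum "$1" | cut -d' ' -f1)
    [ "$actual" = "$2" ]
}

# make_deterministic_tarball SRC_DIR OUT  (hypothetical helper)
# Pin entry order, timestamps, and ownership so that identical input trees
# produce byte-identical archives; gzip -n omits the name and timestamp
# from the gzip header.
make_deterministic_tarball() {
    tar --sort=name --mtime='@0' --owner=0 --group=0 --numeric-owner \
        -C "$1" -cf - . | gzip -n > "$2"
}

# Illustrative use (the path is a placeholder; substitute the expected digest):
#   verify_sha512 path/to/vendor-tarball "$expected_sha512" || echo "digest mismatch"
```

Building the same tree twice and comparing the two outputs with `cmp` is a quick way to confirm determinism before uploading.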