feat: native OCI (Oracle Cloud Infrastructure) compute service discovery #18707

@amaanx86

Summary

Add native service discovery for Oracle Cloud Infrastructure (OCI) compute instances, following the same pattern as existing cloud SD integrations (Hetzner, Vultr, IONOS, etc.).

Prior art

#10226 was opened in 2022 by an Oracle employee and closed without a PR landing. The closure note recommended HTTP SD as the preferred path. I have been running oci-prometheus-sd-proxy - an HTTP SD implementation for OCI - in production and it works well. However, native SD is still the better long-term experience for users: no extra service to deploy, no token management overhead, and consistent __meta_oci_* labels alongside other cloud providers in the same Prometheus config.

Why now / why me

  • I run OCI compute at scale and can validate the implementation against a real fleet, not mocked fixtures.
  • Oracle offers an always-free tier with 4 Arm cores + 24 GB RAM - I am happy to wire up a dedicated OCI account for CI-level integration tests so the maintainers do not need OCI access themselves.
  • I will take full ownership of the implementation, tests, docs, and ongoing maintenance. If OCI APIs change, I will send follow-up PRs.
  • OCI is the only major hyperscaler without native Prometheus SD: AWS (EC2), GCP (GCE), and Azure are all covered, as is OpenStack.

Binary size concern

The previous attempt reportedly added ~12 MB to the binary. I plan to address this upfront:

  • Audit which OCI SDK packages are actually needed (Compute + VirtualNetwork + Identity only - no broad SDK import).
  • Vendor only those packages and their transitive deps, similar to how other cloud SDs scope their imports.
  • Measure the size delta before opening a PR and document it explicitly.

If the delta remains too large, I will propose a build-tag approach (--tags=oci) consistent with how some integrations are conditionally compiled, and raise that trade-off with maintainers before the PR is ready.

Proposed labels

Consistent with existing cloud SD labels, the integration would expose:

__meta_oci_instance_id
__meta_oci_instance_name
__meta_oci_instance_state
__meta_oci_instance_shape
__meta_oci_availability_domain
__meta_oci_fault_domain
__meta_oci_region
__meta_oci_tenancy_id
__meta_oci_compartment_id
__meta_oci_private_ip
__meta_oci_public_ip
__meta_oci_tag_<key>             (freeform tags)
__meta_oci_defined_tag_<ns>_<key> (defined tags)
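A minimal sketch of how an instance could be turned into the label set above. The `instance` struct, `labelsFor`, and `sanitize` are hypothetical names used only for illustration; they are not OCI SDK or Prometheus types. Tag values are simplified to strings, and the sanitizer is a local stand-in for whatever label-name sanitization the real implementation would use.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
)

// metaPrefix is the label namespace proposed in this issue.
const metaPrefix = "__meta_oci_"

// instance is a hypothetical, trimmed-down view of an OCI compute instance.
// It is not a type from the OCI SDK.
type instance struct {
	ID, Name, State, Shape           string
	AvailabilityDomain, FaultDomain  string
	Region, TenancyID, CompartmentID string
	PrivateIP, PublicIP              string
	FreeformTags                     map[string]string
	DefinedTags                      map[string]map[string]string // namespace -> key -> value (assumed string values)
}

var invalidLabelChars = regexp.MustCompile(`[^a-zA-Z0-9_]`)

// sanitize replaces every character that is not a letter, digit, or
// underscore with an underscore, so tag keys can be used in label names.
func sanitize(s string) string {
	return invalidLabelChars.ReplaceAllString(s, "_")
}

// labelsFor builds the proposed __meta_oci_* labels for one instance.
func labelsFor(in instance) map[string]string {
	l := map[string]string{
		metaPrefix + "instance_id":         in.ID,
		metaPrefix + "instance_name":       in.Name,
		metaPrefix + "instance_state":      in.State,
		metaPrefix + "instance_shape":      in.Shape,
		metaPrefix + "availability_domain": in.AvailabilityDomain,
		metaPrefix + "fault_domain":        in.FaultDomain,
		metaPrefix + "region":              in.Region,
		metaPrefix + "tenancy_id":          in.TenancyID,
		metaPrefix + "compartment_id":      in.CompartmentID,
		metaPrefix + "private_ip":          in.PrivateIP,
		metaPrefix + "public_ip":           in.PublicIP,
	}
	for k, v := range in.FreeformTags {
		l[metaPrefix+"tag_"+sanitize(k)] = v
	}
	for ns, tags := range in.DefinedTags {
		for k, v := range tags {
			l[metaPrefix+"defined_tag_"+sanitize(ns)+"_"+sanitize(k)] = v
		}
	}
	return l
}

func main() {
	l := labelsFor(instance{
		ID:           "ocid1.instance.oc1..example",
		Name:         "web-1",
		PrivateIP:    "10.0.0.5",
		FreeformTags: map[string]string{"team-name": "obs"},
		DefinedTags:  map[string]map[string]string{"Ops": {"cost.center": "42"}},
	})
	// Print labels in a stable order.
	names := make([]string, 0, len(l))
	for name := range l {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		fmt.Printf("%s=%q\n", name, l[name])
	}
}
```

Sanitizing both the namespace and the key keeps defined tags such as `Ops` / `cost.center` valid as label names, at the cost of possible collisions between keys that differ only in punctuation.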

Auth support

  • API key (user OCID + RSA key) - works anywhere
  • Instance principal - works when Prometheus runs on OCI compute (no credentials needed, uses IMDS)
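The two auth modes above imply different required settings. A minimal sketch of config validation, assuming a hypothetical `sdConfig` struct with made-up field and mode names (none of this is an existing Prometheus or OCI SDK API): API key auth needs the tenancy OCID, user OCID, key fingerprint, private key, and region, while instance principal auth needs no credentials.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// authMode names are assumptions for illustration only.
type authMode string

const (
	authAPIKey            authMode = "api_key"
	authInstancePrincipal authMode = "instance_principal"
)

// sdConfig is a hypothetical shape for the discovery config.
type sdConfig struct {
	Auth           authMode
	TenancyID      string
	UserID         string
	Fingerprint    string
	PrivateKeyFile string
	Region         string
}

// validate checks that the fields match the selected auth mode.
func (c sdConfig) validate() error {
	switch c.Auth {
	case authInstancePrincipal:
		// No credentials are expected; reject key material to catch mixed configs.
		if c.UserID != "" || c.Fingerprint != "" || c.PrivateKeyFile != "" {
			return errors.New("instance_principal auth must not set API key fields")
		}
		return nil
	case authAPIKey:
		var missing []string
		for _, f := range []struct{ name, value string }{
			{"tenancy_id", c.TenancyID},
			{"user_id", c.UserID},
			{"fingerprint", c.Fingerprint},
			{"private_key_file", c.PrivateKeyFile},
			{"region", c.Region},
		} {
			if f.value == "" {
				missing = append(missing, f.name)
			}
		}
		if len(missing) > 0 {
			return fmt.Errorf("api_key auth is missing: %s", strings.Join(missing, ", "))
		}
		return nil
	default:
		return fmt.Errorf("unknown auth mode %q", c.Auth)
	}
}

func main() {
	configs := []sdConfig{
		{Auth: authInstancePrincipal},
		{Auth: authAPIKey, TenancyID: "ocid1.tenancy.oc1..example"},
	}
	for _, c := range configs {
		fmt.Printf("%s: %v\n", c.Auth, c.validate())
	}
}
```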

Maintenance commitment

I am committing to:

  • Responding to issues and PRs touching discovery/oci within a reasonable time
  • Keeping the OCI SDK dependency up to date
  • Providing a real OCI environment for integration test validation

Happy to start a draft PR once there is a signal that this is welcome. If binary size is a blocker, I will open a focused discussion on that before writing the full implementation.

/cc @roidelapluie @bboreham
