feat: add gRPC ToolQuery service for observability tools integration#3230
Conversation
- Introduced ToolQuery protobuf definitions for metrics, logs, and traces queries. - Implemented ToolQueryService to handle gRPC requests for external observability tools. - Added new dependencies for DataDog, ElasticSearch, and Prometheus integrations. - Updated command-line arguments and documentation for PostgreSQL-backed features.
|
Review the following changes in direct dependencies. Learn more about Socket for GitHub.
|
|
Warning Review the following alerts detected in dependencies. According to your organization's Security Policy, it is recommended to resolve "Warn" alerts. Learn more about Socket for GitHub.
|
…-workbench-tool-grpc-interface
- Included the workflow file path in PR triggers. - Modified the version tagging format to follow semver conventions for PRs and commits. - Ensured tags are consistent for `latest` releases.
Greptile SummaryThis PR adds a new gRPC ToolQuery service for querying external observability tools (Prometheus, Loki, Tempo, Datadog, Elasticsearch) and makes the database-backed CloudQuery service optional via configuration. Key changes:
Minor observation:
Confidence Score: 5/5
|
| Filename | Overview |
|---|---|
| go/cloud-query/internal/tools/provider_datadog.go | New Datadog provider implementation with proper context handling for authentication and correct time range configuration for all query types |
| go/cloud-query/internal/tools/provider_loki.go | Loki provider correctly uses UnixNano() for timestamp conversion to nanosecond-precision Unix timestamps |
| go/cloud-query/internal/tools/provider.go | Provider factory correctly propagates Elasticsearch initialization errors to callers |
| go/cloud-query/api/proto/toolquery.proto | Clean protobuf definitions for ToolQuery gRPC service with minor field numbering gap in LokiConnection (field 2 skipped) |
| go/cloud-query/cmd/main.go | Added conditional service registration based on database-enabled flag, ToolQuery service always registered |
| go/cloud-query/internal/service/toolquery.go | ToolQuery gRPC service with proper input validation and error mapping |
| charts/console/templates/cloud-query/deployment.yaml | Deployment template conditionally includes database container and environment variables based on database.enabled flag |
Last reviewed commit: 6d0bd4b
- Changed version tags for PRs from `0.0.0-pr.<number>` to `0.0.0+pr-<number>`. - Adjusted SHA-based tags in the same format for consistency.
- Deleted `helpers.go` to remove redundant utilities for time parsing and URL manipulation. - Enhanced loki provider by using UnixNano for logs interval. - Improved provider initializations with error handling. - Adjusted Datadog client creation to pass context alongside API client. - Simplified step duration parsing in Prometheus provider. - Updated proto for adjusted field numbers in connections. - Removed unnecessary time range parsing in Tempo provider.
- Introduced `GetPluralEnvBool` for boolean environment variable management. - Updated command-line argument `database-enabled` to utilize the new environment handling function. - Simplified Kubernetes deployment YAML by improving indentation consistency.
- Introduced `validateInput` and `validateTimeRange` functions to ensure inputs are not nil and have valid values. - Enhanced validation for connection, query, and time range parameters in `Metrics`, `Logs`, and `Traces` methods. - Improved handling of tags in `provider_datadog.go` by deriving labels from series scope when missing.
…uest size handling - Removed redundant limit determination logic in Elasticsearch provider. - Simplified query string query with more concise construction. - Adjusted request size setup to follow query construction.
- Added `prometheus.go` for Prometheus HTTP client creation with authentication support. - Introduced `elastic.go` for transforming Elasticsearch source data into log entries. - Updated `provider_elastic.go` to integrate `ElasticSource` for improved data handling. - Refactored `provider_prometheus.go` to utilize new Prometheus HTTP client and removed `resty` dependency.
… integration - Moved `TempoTraceResponse` struct to `datasource` package for improved modularity. - Updated `TempoClient` to use `datasource.TempoTraceResponse`. - Refactored `provider_tempo.go` to leverage new `TraceResponse` handling with reduced complexity.
- Removed `OTLPBatch` struct in favor of using `OTLPResourceSpans` directly. - Streamlined `TempoTraceResponse` construction by appending `Batches` to `ResourceSpans`. - Simplified data structure by removing redundant batch processing logic.
- Changed `site` to optional and `appKey` to required in `DatadogConnection` definitions. - Updated API and client connection details. - Enhanced README and reference documentation with compatibility matrix and detailed client and endpoint information for ToolQuery integrations. - Refined command-line arguments and environment variables documentation for Cloud-Query configuration.
| repeated LogEntry logs = 1; | ||
| } | ||
|
|
||
| message TraceSpan { |
There was a problem hiding this comment.
i don't feel like this is intuitively the right datastructure, what does tempo use (or opentelemetry)?
This pull request introduces significant enhancements to the observability and configuration flexibility of the CloudQuery service, most notably by adding a new ToolQuery gRPC API for querying external observability tools (metrics, logs, traces), and by making the database-backed CloudQuery service optional via a new configuration flag. It also updates Helm chart templates and workflows to improve deployment and versioning. Below are the most important changes:
1. Observability: New ToolQuery gRPC API
ToolQuerygRPC service for querying external observability tools (Prometheus, Loki, Tempo, Datadog, Elastic) with support for metrics, logs, and traces. This includes new proto definitions (toolquery.proto,timestamp.proto) and detailed API documentation with usage examples.2. CloudQuery Service Configuration
--database-enabledflag (and corresponding Helm value) to optionally disable the PostgreSQL-backed CloudQuery service. When disabled, only the ToolQuery service is registered and database health checks are skipped. Documentation updated to explain this behavior.3. Helm Chart Improvements
cloud-querydeployment template to conditionally include the database container and relevant environment variables based on the.Values.cloudQuery.database.enabledflag. The secret template is also now conditional on this flag.4. CI/CD and Versioning Enhancements
0.0.0+pr-<number>,0.0.0+sha-<sha>,${{ steps.chart_version.outputs.version }}+latest) for PR and release builds.Test Plan
Locally + using
https://console.plrl-dev-aws.onplural.sh.Checklist
Plural Flow: console