Skip to content

Conversation

@amirbenun
Copy link
Contributor

Adds GCP Infrastructure Manager Terraform modules for deploying Elastic Agent on GCP, providing two deployment options: elastic-agent with a service account for VM-based deployments, and credentials-json for credential-based authentication. The implementation includes modular Terraform configurations with compute instance, service account, and startup validation components, along with comprehensive deploy scripts that handle environment setup, terraform initialization, and deployment automation with proper error handling.

### Summary of your changes
Replaces deprecated GCP Deployment Manager with modern Infrastructure
Manager (Terraform) for deploying Elastic Agent CSPM integration.
Provides identical resources with improved tooling and user experience.

#### New Directory: deploy/infrastructure-manager/gcp-elastic-agent/
Files Added:
main.tf - Main infrastructure configuration (compute instance, network,
service account, IAM bindings)
variables.tf - Input variable definitions
outputs.tf - Deployment outputs
service_account.tf - Standalone service account deployment for agentless
mode
terraform.tfvars.example - Example configuration for main deployment
service_account.tfvars.example - Example configuration for SA-only
deployment
README.md - Comprehensive deployment guide

#### Resources Created
Identical to Deployment Manager implementation:
Compute instance (Ubuntu, n2-standard-4, 32GB disk) with Elastic Agent
pre-installed
Service account with roles/cloudasset.viewer and roles/browser
VPC network with auto-created subnets
IAM bindings (project or organization scope)
Optional SSH firewall rule

#### Compatibility
The new deployment script `infrastructure-manager/deploy.sh` is
compatible with kibana deployment command of the form:
```bash
gcloud config set project elastic-security-test && \
FLEET_URL=https://a6f784d2fb4d48bea7724fbe41ef17d3.fleet.us-central1.gcp.qa.elastic.cloud:443 \
ENROLLMENT_TOKEN=<REDUCTED> \
STACK_VERSION=9.2.3 \
./deploy.sh
```

### Related Issues
- Resolves: elastic#3132

(cherry picked from commit fdf76cc)
### Summary of your changes
Adds a new method for deploying GCP service account credentials using
GCP Infrastructure Manager (Terraform-based) as an alternative to the
existing Deployment Manager approach in deploy/deployment-manager/. The
key improvement is that service account keys are now stored securely in
Secret Manager rather than being exposed in deployment outputs. The
script creates a service account with cloudasset.viewer and browser
roles, stores the JSON key in Secret Manager, and retrieves it locally
to KEY_FILE.json for use in the Elastic Agent GCP integration. Supports
both project-level and organization-level deployments via the ORG_ID
environment variable.

### Screenshot/Data
<!--
If this PR adds a new feature, please add an example screenshot or data
(findings json for example).
-->

### Related Issues
<!--
- Related: https://github.com/elastic/security-team/issues/
- Fixes: https://github.com/elastic/security-team/issues/
-->

### Checklist
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added the necessary README/documentation (if appropriate)

#### Introducing a new rule?

- [ ] Generate rule metadata using [this
script](https://github.com/elastic/cloudbeat/tree/main/security-policies/dev#generate-rules-metadata)
- [ ] Add relevant unit tests
- [ ] Generate relevant rule templates using [this
script](https://github.com/elastic/cloudbeat/tree/main/security-policies/dev#generate-rule-templates),
and open a PR in
[elastic/packages/cloud_security_posture](https://github.com/elastic/integrations/tree/main/packages/cloud_security_posture)

(cherry picked from commit e20e115)
…y script (elastic#3865)

Refactors the error handling in the GCP Elastic Agent Infrastructure
Manager deployment script to use a more idiomatic shell pattern. The
change replaces the explicit exit code capture and conditional check
with a direct if-not pattern for the gcloud command, making the script
more readable and following shell scripting best practices.

(cherry picked from commit 59c61ae)
@amirbenun amirbenun requested a review from a team as a code owner January 22, 2026 08:02
Copilot AI review requested due to automatic review settings January 22, 2026 08:02
@amirbenun amirbenun changed the title feat: add GCP Infrastructure Manager Terraform modules feat: add GCP Infrastructure Manager Terraform modules [9.2] Jan 22, 2026
@amirbenun amirbenun linked an issue Jan 22, 2026 that may be closed by this pull request
2 tasks
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds GCP Infrastructure Manager Terraform modules for deploying Elastic Agent on Google Cloud Platform. It provides two deployment approaches: a VM-based deployment with integrated service account authentication (gcp-elastic-agent), and a standalone service account credential generator (gcp-credentials-json). The implementation includes comprehensive automation scripts, modular Terraform configurations, and startup validation capabilities.

Changes:

  • Added complete Terraform module for VM-based Elastic Agent deployment with service account, compute instance, and startup validation submodules
  • Implemented credentials-json module for generating service account keys stored in Secret Manager
  • Included automated deployment scripts with prerequisite setup, error handling, and user guidance

Reviewed changes

Copilot reviewed 23 out of 23 changed files in this pull request and generated 21 comments.

Show a summary per file
File Description
deploy/infrastructure-manager/gcp-elastic-agent/main.tf Root module orchestrating service account, compute instance, and startup validation
deploy/infrastructure-manager/gcp-elastic-agent/variables.tf Input variable definitions including fleet URL, tokens, and configuration options
deploy/infrastructure-manager/gcp-elastic-agent/outputs.tf Module outputs exposing instance details and validation status
deploy/infrastructure-manager/gcp-elastic-agent/deploy.sh Deployment automation script with GCP Infrastructure Manager integration
deploy/infrastructure-manager/gcp-elastic-agent/deploy_service_account.sh Wrapper script delegating to credentials-json deployment
deploy/infrastructure-manager/gcp-elastic-agent/setup.sh Prerequisites setup enabling APIs and configuring service accounts
deploy/infrastructure-manager/gcp-elastic-agent/README.md Comprehensive documentation for deployment options and troubleshooting
deploy/infrastructure-manager/gcp-elastic-agent/modules/compute_instance/main.tf Compute instance with VPC network and Elastic Agent installation startup script
deploy/infrastructure-manager/gcp-elastic-agent/modules/compute_instance/variables.tf Compute instance module inputs for configuration
deploy/infrastructure-manager/gcp-elastic-agent/modules/compute_instance/outputs.tf Instance metadata outputs
deploy/infrastructure-manager/gcp-elastic-agent/modules/service_account/main.tf Service account creation with project/org-level IAM bindings
deploy/infrastructure-manager/gcp-elastic-agent/modules/service_account/variables.tf Service account module configuration inputs
deploy/infrastructure-manager/gcp-elastic-agent/modules/service_account/outputs.tf Service account email output
deploy/infrastructure-manager/gcp-elastic-agent/modules/startup_validation/main.tf Local-exec provisioner validating startup completion via guest attributes
deploy/infrastructure-manager/gcp-elastic-agent/modules/startup_validation/variables.tf Validation configuration inputs
deploy/infrastructure-manager/gcp-elastic-agent/modules/startup_validation/outputs.tf Validation status output
deploy/infrastructure-manager/gcp-elastic-agent/modules/startup_validation/validate_startup.sh Bash script polling guest attributes to verify installation success
deploy/infrastructure-manager/gcp-credentials-json/main.tf Service account key generation with Secret Manager storage
deploy/infrastructure-manager/gcp-credentials-json/variables.tf Credentials module input variables with scope validation
deploy/infrastructure-manager/gcp-credentials-json/outputs.tf Service account and secret outputs
deploy/infrastructure-manager/gcp-credentials-json/deploy.sh Deployment script retrieving and saving credentials locally
deploy/infrastructure-manager/gcp-credentials-json/setup.sh Prerequisites configuration for credentials deployment
deploy/infrastructure-manager/gcp-credentials-json/README.md Documentation for credentials-based authentication setup

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

INSTANCE_NAME="$3"
TIMEOUT="$4"

MAX_ATTEMPTS=$((TIMEOUT / 10))
Copy link

Copilot AI Jan 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The validation script computes MAX_ATTEMPTS by dividing TIMEOUT by 10, but there's an edge case where if TIMEOUT is less than 10 seconds, MAX_ATTEMPTS would be 0, causing the while loop to never execute. Consider adding a minimum value check or ensuring at least 1 attempt is made.

Suggested change
MAX_ATTEMPTS=$((TIMEOUT / 10))
MAX_ATTEMPTS=$((TIMEOUT / 10))
if [ "$MAX_ATTEMPTS" -lt 1 ]; then
MAX_ATTEMPTS=1
fi

Copilot uses AI. Check for mistakes.
response=$(curl -s -H "Authorization: Bearer $TOKEN" \
"https://compute.googleapis.com/compute/v1/projects/${PROJECT_ID}/zones/${ZONE}/instances/${INSTANCE_NAME}/getGuestAttributes?queryPath=elastic-agent/$key" \
2>/dev/null || echo '{}')
echo "$response" | sed -n 's/.*"value":[[:space:]]*"\([^"]*\)".*/\1/p' | head -1
Copy link

Copilot AI Jan 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The sed command to extract the JSON value from the API response is fragile and could fail if the JSON format changes or if there are multiple 'value' fields in the response. Consider using a proper JSON parser like 'jq' which is commonly available in Cloud Shell and would make the parsing more reliable and maintainable.

Suggested change
echo "$response" | sed -n 's/.*"value":[[:space:]]*"\([^"]*\)".*/\1/p' | head -1
if command -v jq >/dev/null 2>&1; then
echo "$response" | jq -r '.queryValue.value // empty'
else
echo "$response" | sed -n 's/.*"value":[[:space:]]*"\([^"]*\)".*/\1/p' | head -1
fi

Copilot uses AI. Check for mistakes.
ARTIFACT_URL="${var.elastic_artifact_server}/$ElasticAgentArtifact.tar.gz"
log "Downloading Elastic Agent from $ARTIFACT_URL"
if ! curl -f -L -O --connect-timeout 30 --max-time 300 "$ARTIFACT_URL"; then
Copy link

Copilot AI Jan 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The download command uses --connect-timeout 30 and --max-time 300, but if the download is slow (e.g., on a poor network connection), the max-time could be exceeded even though the download is progressing. This could cause false failures. Consider either increasing the timeout for larger artifacts or implementing a more sophisticated progress-based timeout that allows for slow but steady downloads.

Suggested change
if ! curl -f -L -O --connect-timeout 30 --max-time 300 "$ARTIFACT_URL"; then
# Use a connection timeout but no overall max-time to avoid failing on slow but progressing downloads
if ! curl -f -L -O --connect-timeout 30 "$ARTIFACT_URL"; then

Copilot uses AI. Check for mistakes.
Comment on lines +73 to +77
# Store the service account key in Secret Manager
resource "google_secret_manager_secret_version" "sa_key" {
secret = google_secret_manager_secret.sa_key.id
secret_data = google_service_account_key.elastic_agent_key.private_key
}
Copy link

Copilot AI Jan 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The script creates a service account key and stores it in Secret Manager, but there's no validation that the key was successfully created before attempting to store it. If the key creation fails silently, the Secret Manager resource will be created with invalid data. Consider adding explicit validation or error handling between key creation and storage steps.

Copilot uses AI. Check for mistakes.
Comment on lines +40 to +42
INPUT_VALUES="${INPUT_VALUES},fleet_url=${FLEET_URL}"
INPUT_VALUES="${INPUT_VALUES},enrollment_token=${ENROLLMENT_TOKEN}"
INPUT_VALUES="${INPUT_VALUES},elastic_agent_version=${STACK_VERSION}"
Copy link

Copilot AI Jan 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The script depends on environment variables like FLEET_URL, ENROLLMENT_TOKEN, and STACK_VERSION being set (lines 40-42), but there's no validation to ensure these required variables are non-empty before attempting deployment. If any of these are unset or empty, the deployment will fail with unclear error messages from gcloud. Consider adding validation checks at the beginning of the script to provide clear error messages for missing required variables.

Copilot uses AI. Check for mistakes.
Comment on lines +5 to +9
# Token from environment variable (sensitive)
TOKEN="$GCP_ACCESS_TOKEN"

# Arguments passed from Terraform
PROJECT_ID="$1"
Copy link

Copilot AI Jan 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ACCESS_TOKEN is passed as an environment variable but used directly in the curl command with 'set +x' disabled. However, the token could still appear in error messages or in the process table. Consider using a more secure method such as reading from a file descriptor or using a credentials helper. Additionally, the comment says "(sensitive)" but the token will be visible in the terraform state and could be logged if the script fails.

Copilot uses AI. Check for mistakes.
Comment on lines +20 to +24
local response
response=$(curl -s -H "Authorization: Bearer $TOKEN" \
"https://compute.googleapis.com/compute/v1/projects/${PROJECT_ID}/zones/${ZONE}/instances/${INSTANCE_NAME}/getGuestAttributes?queryPath=elastic-agent/$key" \
2>/dev/null || echo '{}')
echo "$response" | sed -n 's/.*"value":[[:space:]]*"\([^"]*\)".*/\1/p' | head -1
Copy link

Copilot AI Jan 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error handling doesn't distinguish between different failure modes when calling 'curl'. A network error, authentication failure, or API error would all result in the same '{}' response. Consider checking the HTTP status code or curl exit code to provide more specific error messages and potentially retry on transient failures.

Suggested change
local response
response=$(curl -s -H "Authorization: Bearer $TOKEN" \
"https://compute.googleapis.com/compute/v1/projects/${PROJECT_ID}/zones/${ZONE}/instances/${INSTANCE_NAME}/getGuestAttributes?queryPath=elastic-agent/$key" \
2>/dev/null || echo '{}')
echo "$response" | sed -n 's/.*"value":[[:space:]]*"\([^"]*\)".*/\1/p' | head -1
local response http_code body
# Call the GCE guest attributes API and capture both body and HTTP status code.
# Using -w '%{http_code}' appends the status code to the end of the output.
if ! response=$(curl -s -H "Authorization: Bearer $TOKEN" \
-w '%{http_code}' \
"https://compute.googleapis.com/compute/v1/projects/${PROJECT_ID}/zones/${ZONE}/instances/${INSTANCE_NAME}/getGuestAttributes?queryPath=elastic-agent/$key" \
2>/dev/null); then
echo "Warning: Failed to call GCE guest attributes API for key '$key' (curl error)." >&2
# Return an empty string so callers treat this as an unknown value.
echo ""
return
fi
# Separate HTTP status code from the response body.
http_code="${response: -3}"
body="${response::-3}"
case "$http_code" in
200)
# Successful response; extract the guest attribute value.
echo "$body" | sed -n 's/.*"value":[[:space:]]*"\([^"]*\)".*/\1/p' | head -1
;;
401|403)
echo "Warning: Authentication/authorization failure when fetching guest attribute '$key' (HTTP $http_code)." >&2
echo ""
;;
5??)
echo "Warning: Server error from GCE guest attributes API when fetching '$key' (HTTP $http_code)." >&2
echo ""
;;
*)
echo "Warning: Unexpected HTTP status $http_code from GCE guest attributes API when fetching '$key'." >&2
echo ""
;;
esac

Copilot uses AI. Check for mistakes.
Comment on lines +50 to +51
# Remove trailing slash if present
ELASTIC_ARTIFACT_SERVER="${ELASTIC_ARTIFACT_SERVER%/}"
Copy link

Copilot AI Jan 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ELASTIC_ARTIFACT_SERVER is stripped of trailing slashes if present, but if the user provides a URL with multiple trailing slashes (e.g., "https://example.com///"), only one will be removed. Consider using a more robust approach like parameter expansion pattern matching (${ELASTIC_ARTIFACT_SERVER%%/}) to remove all trailing slashes, or using bash string manipulation to handle this edge case properly.

Suggested change
# Remove trailing slash if present
ELASTIC_ARTIFACT_SERVER="${ELASTIC_ARTIFACT_SERVER%/}"
# Remove all trailing slashes if present
ELASTIC_ARTIFACT_SERVER="${ELASTIC_ARTIFACT_SERVER%%/}"

Copilot uses AI. Check for mistakes.
cd "$ElasticAgentArtifact"
# Install Elastic Agent
log "Installing Elastic Agent with command: ${local.install_command}"
Copy link

Copilot AI Jan 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The startup script enables shell xtrace with set -x and then executes the Elastic Agent installation command including the sensitive --enrollment-token=${var.enrollment_token} argument. With xtrace enabled, the full command line (including the enrollment token) will be written to the instance’s console output and system logs, which are collected by GCE/Cloud Logging and accessible to anyone with log viewing permissions, effectively leaking the enrollment token. Remove or disable set -x around commands that include var.enrollment_token, or otherwise ensure that the enrollment token is never printed to stdout/stderr or system logs during startup.

Suggested change
log "Installing Elastic Agent with command: ${local.install_command}"
log "Installing Elastic Agent with command: ${local.install_command}"
# Disable xtrace to avoid leaking sensitive enrollment token in logs
set +x

Copilot uses AI. Check for mistakes.
Comment on lines +83 to +113
# Download Elastic Agent
ElasticAgentArtifact=elastic-agent-${var.elastic_agent_version}-linux-x86_64
ARTIFACT_URL="${var.elastic_artifact_server}/$ElasticAgentArtifact.tar.gz"
log "Downloading Elastic Agent from $ARTIFACT_URL"
if ! curl -f -L -O --connect-timeout 30 --max-time 300 "$ARTIFACT_URL"; then
report_failure "Failed to download Elastic Agent from $ARTIFACT_URL"
fi
log "Download successful"
# Verify download
if [ ! -f "$ElasticAgentArtifact.tar.gz" ]; then
report_failure "Downloaded file not found: $ElasticAgentArtifact.tar.gz"
fi
# Extract archive
log "Extracting $ElasticAgentArtifact.tar.gz"
if ! tar xzvf "$ElasticAgentArtifact.tar.gz"; then
report_failure "Failed to extract $ElasticAgentArtifact.tar.gz"
fi
# Verify extraction
if [ ! -d "$ElasticAgentArtifact" ]; then
report_failure "Extracted directory not found: $ElasticAgentArtifact"
fi
cd "$ElasticAgentArtifact"
# Install Elastic Agent
log "Installing Elastic Agent with command: ${local.install_command}"
if ! ${local.install_command} --url=${var.fleet_url} --enrollment-token=${var.enrollment_token}; then
Copy link

Copilot AI Jan 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The startup script downloads and executes the Elastic Agent binary from var.elastic_artifact_server using curl and tar without any integrity verification (no checksum or signature check) before running ./elastic-agent install with root privileges. If the artifact server, DNS, or network path is compromised—or if elastic_artifact_server is pointed at a malicious host—an attacker can supply a tampered agent archive and gain code execution on the VM via the startup script. Add a cryptographic integrity check (e.g., vendor-published checksum or signature verification) for the downloaded tarball before extracting and executing it, and consider restricting elastic_artifact_server to trusted HTTPS endpoints only.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Replace GCP deployment-manager scripts with infrastructure-manager

1 participant