Add AWS IAM auth for RDS MySQL/MariaDB and ElastiCache Redis/Valkey#31075
Add AWS IAM auth for RDS MySQL/MariaDB and ElastiCache Redis/Valkey#31075titanous wants to merge 12 commits intofleetdm:mainfrom
Conversation
|
Important Review skippedAuto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the 📝 WalkthroughWalkthroughAdds IAM-based authentication for MySQL (RDS) and Redis (ElastiCache). Introduces optional STS assume role and external ID config for both. Implements token generation, caching, and connection flows, updates configs and docs, adds tests for endpoint parsing, updates dependencies, and provides test tooling with CloudFormation templates and CLI utilities. Changes
Sequence Diagram(s)sequenceDiagram
participant Fleet
participant Config
participant AWS as AWS SDK
participant STS as STS
participant RDS as RDS MySQL
Fleet->>Config: Load MySQL config (addr, user, sts_assume_role_arn, external_id)
alt RDS endpoint and no password
Fleet->>AWS: Load config (region)
opt sts_assume_role_arn provided
AWS->>STS: AssumeRole(external_id?)
STS-->>AWS: Temp credentials
end
Fleet->>AWS: Build RDS IAM auth token
Fleet->>RDS: Connect using token (TLS)
else otherwise
Fleet->>RDS: Connect using standard DSN
end
sequenceDiagram
participant Fleet
participant Config
participant AWS as AWS SDK
participant STS as STS
participant EC as ElastiCache Redis
Fleet->>Config: Load Redis config (addr, username, sts_assume_role_arn, external_id)
alt ElastiCache endpoint and no password
Fleet->>AWS: Load config (region)
opt sts_assume_role_arn provided
AWS->>STS: AssumeRole(external_id?)
STS-->>AWS: Temp credentials
end
Fleet->>AWS: Presign elasticache connect URL (token)
Fleet->>EC: TLS connect using IAM token as password
else otherwise
Fleet->>EC: Connect with configured password or no auth
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Assessment against linked issues
Assessment against linked issues: Out-of-scope changes
✨ Finishing Touches🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
CodeRabbit Configuration File (
|
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #31075 +/- ##
==========================================
- Coverage 63.99% 63.93% -0.06%
==========================================
Files 1911 1914 +3
Lines 188185 188508 +323
Branches 5594 5594
==========================================
+ Hits 120428 120528 +100
- Misses 58302 58511 +209
- Partials 9455 9469 +14
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
Thanks @titanous! Just a heads up that we prioritized these changes and the changes in your separate PR (#31134). We peeled off a user story for these changes. This story is now going through drafting. All changes to Fleet are brought through drafting as user stories. Once the story goes through drafting, we'll have more info you for you on proposed changes/tweaks (if any). Thanks for your patience. In the meantime, setting this PR to draft. You can follow along on drafting progress in the user story: #31321 |
|
@noahtalerman thanks, note that this PR (unlike the separate one) covers two issues (linked at the top) that have already been through at least some of your process. |
|
|
||
| For the address of the Redis server that Fleet should connect to, include the hostname and port. | ||
|
|
||
| If an AWS ElastiCache endpoint is specified and `redis_password` isn't specified, Identity and Access Management (IAM) authentication is automatically used. |
There was a problem hiding this comment.
@titanous just checking, is this a breaking change for AWS users that don't set redis_password today?
Put differently, will this force these users to add a role in AWS or make another change when they upgrade Fleet?
Same question for MySQL: https://github.com/fleetdm/fleet/pull/31075/files#diff-66314062a452d1931a2ffbfa8b76a0f1e3e2e81c5290c451313194bbc59f902aR26
There was a problem hiding this comment.
Ah, interesting. I think it may be possible that there are Elasticache clusters without a password enabled. I don't think it's possible to configure RDS with no password but I'm not 100% certain.
There was a problem hiding this comment.
I think we can mitigate this on the Redis side by checking if redis_username is populated with no redis_password, because that config shouldn't exist in the wild and IAM auth requires a username.
There was a problem hiding this comment.
I think it may be possible that there are Elasticache clusters without a password enabled
There are! My understanding is that Fleet is doing this for managed-cloud offerings. @rfairburn @BCTBB please correct me if I'm wrong.
Assuming this is true, I don't think we want to force a password (breaking change).
Instead, I think we want to update this behavior: IAM is only used if redis_sts_assume_role_arn is specified.
Meaning, If an AWS ElastiCache endpoint is specified and redis_password isn't specified, Fleet functions as it does today. Identity and Access Management (IAM) authentication isn't automatically used.
What do y'all think?
There was a problem hiding this comment.
We can't key off of redis_sts_assume_role_arn, because role assumption is not required (eg. instance/container/pod roles that are ambiently in the environment do not require additional role assumption). I think just adding a check to make sure there is a username specified should handle this case.
There was a problem hiding this comment.
Why not just be specific about it and have a "use IAM authentication" config var that will be set to true if it is intended to be used? Assumptions of intention can lead to confusion.
There was a problem hiding this comment.
Yeah, makes sense. I was trying to make the UX nice with a happy path but not a big deal to just have a flag.
There was a problem hiding this comment.
How does it work today for all other IAM Fleet server config options? ( firehose, kinesis, lambda, etc.)
It's IAM (if IAM is set), and if IAM isn't set fall back to basic auth. Right? We don't have a use IAM auth. I think we want to be consistent for now. I updated the docs to clarify this.
For example:
There was a problem hiding this comment.
@noahtalerman currently those will use the environmental credentials (aka IAM auth) if nothing is configured
✅ Actions performedFull review triggered. |
There was a problem hiding this comment.
Actionable comments posted: 16
🔭 Outside diff range comments (1)
server/aws_common/iam_auth.go (1)
83-101: Propagate ctx into ConfigureAssumeRoleProvider (fix required)I verified server/aws_common/aws_common.go: ConfigureAssumeRoleProvider calls aws_config.LoadDefaultConfig(context.Background(), ...), so caller cancellation/deadlines are lost. Thread a context.Context through ConfigureAssumeRoleProvider and update all callers to pass ctx.
Files to change
- server/aws_common/aws_common.go — change signature and use ctx when calling LoadDefaultConfig.
- Update callers to pass ctx:
- server/aws_common/iam_auth.go
- server/mail/ses.go
- server/logging/firehose.go
- server/logging/lambda.go
- server/logging/kinesis.go
- server/datastore/s3/s3.go
Suggested minimal changes (illustrative)
- aws_common.go: add ctx param and use it
-func ConfigureAssumeRoleProvider( - conf aws.Config, - opts []func(*aws_config.LoadOptions) error, - stsAssumeRoleARN, - stsExternalID string, -) (aws.Config, error) { +func ConfigureAssumeRoleProvider( + ctx context.Context, + conf aws.Config, + opts []func(*aws_config.LoadOptions) error, + stsAssumeRoleARN, + stsExternalID string, +) (aws.Config, error) { @@ - conf, err := aws_config.LoadDefaultConfig(context.Background(), opts...) + conf, err := aws_config.LoadDefaultConfig(ctx, opts...)
- Call sites (example in iam_auth.go)
- cfg, err = ConfigureAssumeRoleProvider(cfg, opts, assumeRoleArn, stsExternalID) + cfg, err = ConfigureAssumeRoleProvider(ctx, cfg, opts, assumeRoleArn, stsExternalID)Reasoning and reference
- Propagating context preserves cancellation/timeouts from the caller. See Go context guidelines: https://pkg.go.dev/context
Would you like me to open a small follow-up issue/PR to thread ctx through these call sites?
🧹 Nitpick comments (23)
tools/mysql-tests/rds/README.md (3)
15-16: Clarify region environment variable for AWS CLIAWS CLI honors AWS_DEFAULT_REGION in addition to AWS_REGION. To avoid confusion across shells and tooling, consider mentioning both.
Apply this diff to expand the example:
-# Set AWS credentials and region -export AWS_PROFILE=your-profile # or use AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY -export AWS_REGION=us-east-2 +# Set AWS credentials and region +export AWS_PROFILE=your-profile # or use AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY +# The CLI and SDK honor either of these. Set one or both for clarity. +export AWS_REGION=us-east-2 +export AWS_DEFAULT_REGION=us-east-2
21-26: Add cost and quota disclaimerA short note helps folks avoid surprises when provisioning multiple RDS instances and an EC2 host.
You could append:
This will: - Create all RDS instances with IAM authentication enabled - Create an EC2 instance with necessary IAM permissions - Create database users for IAM authentication - Output connection information + +Note: This creates paid AWS resources. Be mindful of account quotas and costs. Clean up when finished.
27-33: Mention cleanup prerequisites (profile and region)Make it explicit the same AWS_PROFILE/region must be set to delete the right stack.
A small addition keeps users safe:
### Clean Up ```bash +# Ensure the same AWS_PROFILE and region used for deployment are set ./cleanup-test-env.sh</blockquote></details> <details> <summary>server/config/config.go (1)</summary><blockquote> `1051-1053`: **Edge case: external ID without an assume role ARN** If users set sts_external_id without sts_assume_role_arn, the external ID is unused. Consider a gentle validation or log to inform users. If you want, I can add a small validation step that logs a warning when sts_external_id is set but sts_assume_role_arn is empty. Also applies to: 1086-1088 </blockquote></details> <details> <summary>tools/redis-tests/elasticache/README.md (4)</summary><blockquote> `5-9`: **Clarify prerequisites (permissions, tooling, and cost notice).** Small additions improve setup reliability and avoid surprises. Apply this diff to expand the prerequisites: ```diff ## Prerequisites -- AWS CLI configured with appropriate credentials (via `AWS_PROFILE` or `AWS_ACCESS_KEY_ID`/`AWS_SECRET_ACCESS_KEY`) +- AWS CLI v2 installed and configured with appropriate credentials (via `AWS_PROFILE` or `AWS_ACCESS_KEY_ID`/`AWS_SECRET_ACCESS_KEY`) +- An AWS region set (e.g., `AWS_REGION` or `AWS_DEFAULT_REGION`) +- Sufficient IAM permissions to create and delete resources used by the stack (CloudFormation, EC2/VPC, ElastiCache, IAM, and CloudWatch Logs) +- Note: Deploying this stack will incur AWS charges. Remember to run cleanup when you are done.
11-20: Add a quick “verify environment” step before deployment.Helps catch misconfigured credentials or regions early.
Apply this diff to add a short preflight:
### 1. Deploy Test Environment ```bash # Set AWS credentials and region export AWS_PROFILE=your-profile # or use AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY export AWS_REGION=us-east-2 + +# Verify identity and region +aws sts get-caller-identity +echo "Region: ${AWS_REGION:-$AWS_DEFAULT_REGION}" # Deploy the test environment ./deploy-test-env.sh--- `22-28`: **Document how to find and use the outputs.** Readers will need endpoints/hostnames to run the test tools; point them to where deploy script surfaces these. Apply this diff to clarify outputs: ```diff This creates: - VPC with subnets and security groups - EC2 instance for running tests - ElastiCache Serverless instance with IAM auth - ElastiCache Standalone replication group with IAM auth - IAM users and roles for authentication + +On success, the deploy script prints the Redis endpoints and instance details. It also writes the outputs to a local file for convenience (the script logs the path). Use those values with the test tools in this directory.
29-35: Set expectations for teardown timing and completeness.Make it clear cleanup can take several minutes.
Apply this diff to expand cleanup notes:
### 2. Clean Up ```bash ./cleanup-test-env.sh-This will destroy the AWS resources.
+This will destroy the AWS resources. Teardown can take 10–20 minutes depending on service state.</blockquote></details> <details> <summary>server/datastore/mysql/common_mysql/aws_iam_auth_test.go (1)</summary><blockquote> `9-121`: **Parallelize subtests and add a couple of edge cases.** Running subtests in parallel speeds things up. Adding us-gov-east-1 and a China endpoint with port improves resiliency. Apply this diff to parallelize and add cases: ```diff func TestExtractRDSRegion(t *testing.T) { + t.Parallel() tests := []struct { @@ for _, tt := range tests { t.Run(tt.name, func(t *testing.T) { + t.Parallel() got, err := extractRDSRegion(tt.endpoint) @@ }) } }Add these additional cases to the
testsslice:@@ { name: "GovCloud region", endpoint: "database.abc123.us-gov-west-1.rds.amazonaws.com", want: "us-gov-west-1", wantErr: false, }, + { + name: "GovCloud us-gov-east-1", + endpoint: "db123.abc123.us-gov-east-1.rds.amazonaws.com", + want: "us-gov-east-1", + wantErr: false, + }, @@ { name: "Endpoint with port should fail", endpoint: "mydb.abc123.us-east-1.rds.amazonaws.com:3306", wantErr: true, }, + { + name: "China endpoint with port should fail", + endpoint: "mydb.abc123.cn-north-1.rds.amazonaws.com.cn:3306", + wantErr: true, + },server/datastore/redis/redis_test.go (1)
115-250: Parallelize subtests and cover China and GovCloud partitions.This widens confidence that parseElastiCacheEndpoint handles all AWS partitions.
Apply this diff to parallelize:
func TestParseElastiCacheEndpoint(t *testing.T) { + t.Parallel() @@ for _, tt := range tests { t.Run(tt.name, func(t *testing.T) { + t.Parallel() gotRegion, gotCacheName, err := parseElastiCacheEndpoint(tt.endpoint)Add these cases to the
testsslice:@@ // Error cases + { + name: "China partition serverless", + endpoint: "prod-cache-xyz789.serverless.cnn1.cache.amazonaws.com.cn", + wantRegion: "cn-north-1", + wantCacheName: "prod-cache", + wantErr: false, + }, + { + name: "China partition standalone", + endpoint: "master.cn-cache.abc123.cnnw1.cache.amazonaws.com.cn:6379", + wantRegion: "cn-northwest-1", + wantCacheName: "cn-cache", + wantErr: false, + }, + { + name: "GovCloud partition", + endpoint: "gov-cache.abc123.us-gov-west-1.cache.amazonaws.com", + wantRegion: "us-gov-west-1", + wantCacheName: "gov-cache", + wantErr: false, + },tools/redis-tests/elasticache/iam_auth.go (1)
12-17: Let -addr be portless (default 6379) and reflect that in the help text.This makes the tool friendlier with AWS-provided endpoints that often omit ports.
Apply this diff to adjust the flag usage:
-var ( - addrFlag = flag.String("addr", "", "ElastiCache endpoint address, including port") +var ( + addrFlag = flag.String("addr", "", "ElastiCache endpoint address (optionally with :port, defaults to 6379)") userFlag = flag.String("user", "", "Username for IAM authentication") assumeRoleFlag = flag.String("assume-role", "", "STS assume role ARN (optional)") externalIDFlag = flag.String("external-id", "", "STS external ID (optional)") )tools/mysql-tests/rds/iam_auth.go (1)
54-57: Consider logging the final normalized address.Helps when users pass host:port and the tool overrides the provided -port.
Apply this diff to improve visibility:
- log.Printf("Connecting to RDS at %s:%s with IAM auth for user %s", *endpointFlag, *portFlag, *userFlag) + log.Printf("Connecting to RDS at %s with IAM auth for user %s", mysqlConfig.Address, *userFlag)tools/redis-tests/elasticache/deploy-test-env.sh (1)
5-7: Remove or use unused variables (ShellCheck SC2034)VPC_CIDR and SUBNET_CIDR are declared but never used. Either pass them as CloudFormation parameters or remove them to avoid confusion.
Apply one of these diffs.
Option A: remove them
-VPC_CIDR="10.0.0.0/16" -SUBNET_CIDR="10.0.1.0/24"Option B: plumb them into the stack (only if cf-template.yaml supports these parameters)
aws cloudformation create-stack \ --stack-name "$STACK_NAME" \ --template-body file://cf-template.yaml \ - --parameters ParameterKey=KeyName,ParameterValue="$KEY_NAME" ParameterKey=InstanceType,ParameterValue="$INSTANCE_TYPE" \ + --parameters \ + ParameterKey=KeyName,ParameterValue="$KEY_NAME" \ + ParameterKey=InstanceType,ParameterValue="$INSTANCE_TYPE" \ + ParameterKey=VPCCIDR,ParameterValue="$VPC_CIDR" \ + ParameterKey=SubnetCIDR,ParameterValue="$SUBNET_CIDR" \ --capabilities CAPABILITY_NAMED_IAMserver/datastore/redis/redis.go (1)
334-345: Consider logging when IAM auth is used (optional)A debug log can help operators confirm IAM auth is active. Example: “Using Redis IAM auth for (region )”. This is optional since the code already behaves correctly.
server/datastore/redis/gen_aws_region_map.go (2)
7-8: Fix run instructions in the commentThe filename is gen_aws_region_map.go, but the comment says gen_region_map.go.
-// To run: go run gen_region_map.go +// To run: go run gen_aws_region_map.go
143-147: Avoid external gofmt invocation; format using go/format insteadCalling the gofmt binary may fail if it’s not available in PATH. Use go/format to format the buffer before writing it.
Example change:
- // Run gofmt on the generated file - cmd := exec.Command("gofmt", "-w", "aws_region_map_generated.go") - if err := cmd.Run(); err != nil { - log.Printf("Warning: Failed to run gofmt: %v", err) - } + // Optionally format with go/format for robustness + // buf := new(bytes.Buffer) + // if err := tmpl.Execute(buf, entries); err != nil { + // log.Fatalf("Failed to execute template: %v", err) + // } + // formatted, err := format.Source(buf.Bytes()) + // if err != nil { + // log.Printf("Warning: Failed to format generated code: %v", err) + // formatted = buf.Bytes() + // } + // if _, err := outputFile.Write(formatted); err != nil { + // log.Fatalf("Failed to write output file: %v", err) + // } + // returnIf you prefer to keep the current approach, make the gofmt step best-effort and non-fatal (as you already do).
server/datastore/mysql/common_mysql/aws_iam_auth.go (1)
25-50: Avoid context.Background() for AWS config loadingUsing context.Background() prevents cancellation/deadline propagation during STS calls and credential resolution. Prefer accepting a context and passing it through.
-func newAWSIAMAuthTokenGenerator(dbEndpoint, dbUsername, dbPort, region, assumeRoleArn, stsExternalID string) (*awsIAMAuthTokenGenerator, error) { - // Load AWS configuration - cfg, err := aws_common.LoadAWSConfig(context.Background(), region, assumeRoleArn, stsExternalID) +func newAWSIAMAuthTokenGenerator(ctx context.Context, dbEndpoint, dbUsername, dbPort, region, assumeRoleArn, stsExternalID string) (*awsIAMAuthTokenGenerator, error) { + // Load AWS configuration + cfg, err := aws_common.LoadAWSConfig(ctx, region, assumeRoleArn, stsExternalID)Follow-ups: update call sites to pass a context (e.g., from Connect(ctx) or initialization path).
server/datastore/redis/aws_iam_auth.go (2)
110-149: Handle ports and China regions; add tests for endpoint parsingParsing removes the port (good) and locates region via “cache.amazonaws”. Consider explicitly allowing .com.cn suffix (parsing continues to work, but an early validation can improve error messages). Also, rely on a lowercased hostname to avoid case edge cases.
func parseElastiCacheEndpoint(endpoint string) (region, cacheName string, err error) { // Remove port if present hostname := endpoint if idx := strings.LastIndex(hostname, ":"); idx != -1 { hostname = hostname[:idx] } + hostname = strings.ToLower(hostname) // Extract region from the ElastiCache endpoint // Format: name.region.cache.amazonaws.com or name.serverless.region.cache.amazonaws.com parts := strings.Split(hostname, ".") if len(parts) < 4 { return "", "", fmt.Errorf("invalid ElastiCache endpoint format for IAM auth: %s", endpoint) } // Find the region code - it's always before "cache.amazonaws.com" var regionCode string for i := 0; i < len(parts)-2; i++ { if parts[i+1] == "cache" && parts[i+2] == "amazonaws" { regionCode = parts[i] break } }I can add table-driven tests for:
- serverless endpoint
- cluster node names with numeric suffix
- endpoints with ports
- China regions (.com.cn)
Would you like me to open a follow-up PR with tests?
151-204: Name extraction heuristics could false-positive on hyphenated IDsFor replication group IDs that legitimately end with -NNN not representing node suffix, this could chop too much. If this becomes an issue, consider checking the next label for exact replication group match only for cluster-mode patterns; else, do not strip.
Add unit tests demonstrating:
- replication-group-id-202 read as full ID when not followed by same ID in next label
- cluster mode replication-group-id-001.replication-group-id... collapses to replication-group-id
tools/mysql-tests/rds/deploy-test-env.sh (1)
81-91: Be careful storing plaintext DB credentialstest-env-info.txt stores DB_PASSWORD. If this is only for ephemeral testing, that’s okay, but please add a prominent warning and .gitignore protection for this file.
Add test-env-info.txt to .gitignore and print a reminder that it contains secrets.
tools/mysql-tests/rds/cf-template.yaml (2)
406-425: Pinning AMI via SSM is good; consider installing mysql-client instead of mariadb105 on AL2023Not critical, but mysql client may more closely match some user environments. mariadb105 is fine for tests.
426-465: Fix YAML linting issues (trailing spaces, EOF newline)There are trailing spaces at several lines and no newline at EOF per YAMLlint. Please clean these up to reduce noise in diffs.
You can auto-fix with a formatter (e.g., yamlfmt) or an editor setting to trim trailing whitespace and ensure a final newline.
server/aws_common/iam_auth.go (1)
49-51: Optional: avoid predictable jitter without global rand seedingmath/rand without seeding yields a deterministic sequence per process. It’s not a security issue here, but if you want more distributed refresh across pods, consider seeding once or using time-based jitter.
- jitter := time.Duration(rand.Int63n(int64(maxJitter))) //nolint:gosec // jitter doesn't need cryptographic randomness + // Not security-sensitive; seed once in init() or derive from time for better distribution. + jitter := time.Duration(time.Now().UnixNano()%int64(maxJitter)) //nolint:gosec
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
go.sumis excluded by!**/*.sum
📒 Files selected for processing (25)
changes/aws-elasticache-iam-auth(1 hunks)changes/aws-rds-iam-auth(1 hunks)cmd/fleet/serve.go(1 hunks)docs/Configuration/fleet-server-configuration.md(14 hunks)go.mod(4 hunks)server/aws_common/iam_auth.go(1 hunks)server/config/config.go(6 hunks)server/datastore/mysql/common_mysql/aws_iam_auth.go(1 hunks)server/datastore/mysql/common_mysql/aws_iam_auth_test.go(1 hunks)server/datastore/mysql/common_mysql/common.go(5 hunks)server/datastore/redis/aws_iam_auth.go(1 hunks)server/datastore/redis/aws_region_map_generated.go(1 hunks)server/datastore/redis/gen_aws_region_map.go(1 hunks)server/datastore/redis/redis.go(3 hunks)server/datastore/redis/redis_test.go(1 hunks)tools/mysql-tests/rds/README.md(1 hunks)tools/mysql-tests/rds/cf-template.yaml(1 hunks)tools/mysql-tests/rds/cleanup-test-env.sh(1 hunks)tools/mysql-tests/rds/deploy-test-env.sh(1 hunks)tools/mysql-tests/rds/iam_auth.go(1 hunks)tools/redis-tests/elasticache/README.md(1 hunks)tools/redis-tests/elasticache/cf-template.yaml(1 hunks)tools/redis-tests/elasticache/cleanup-test-env.sh(1 hunks)tools/redis-tests/elasticache/deploy-test-env.sh(1 hunks)tools/redis-tests/elasticache/iam_auth.go(1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-08-08T07:40:05.301Z
Learnt from: getvictor
PR: fleetdm/fleet#31726
File: server/datastore/mysql/labels_test.go:2031-2031
Timestamp: 2025-08-08T07:40:05.301Z
Learning: Fleet repo targets Go 1.24.5 (root go.mod), which supports testing.T.Context(). Do not flag usage of t.Context() or suggest replacing it with context.Background() in tests (e.g., server/datastore/mysql/labels_test.go Line 2031 and similar).
Applied to files:
tools/mysql-tests/rds/iam_auth.go
📚 Learning: 2025-08-08T07:40:05.301Z
Learnt from: getvictor
PR: fleetdm/fleet#31726
File: server/datastore/mysql/labels_test.go:2031-2031
Timestamp: 2025-08-08T07:40:05.301Z
Learning: In fleetdm/fleet repository tests (server/datastore/mysql/labels_test.go and similar), using testing.T.Context() is valid because the project targets a recent Go version where testing.T.Context() exists. Do not suggest replacing t.Context() with context.Background() in this codebase.
Applied to files:
tools/mysql-tests/rds/iam_auth.go
🧬 Code Graph Analysis (5)
server/datastore/mysql/common_mysql/aws_iam_auth.go (1)
server/aws_common/iam_auth.go (3)
IAMAuthTokenManager(30-37)LoadAWSConfig(84-100)NewIAMAuthTokenManager(40-44)
tools/redis-tests/elasticache/iam_auth.go (1)
server/datastore/redis/redis.go (2)
NewPool(96-116)PoolConfig(63-92)
tools/mysql-tests/rds/iam_auth.go (2)
server/config/config.go (1)
MysqlConfig(37-55)server/datastore/mysql/common_mysql/common.go (2)
DBOptions(20-30)NewDB(32-117)
server/datastore/redis/aws_iam_auth.go (1)
server/aws_common/iam_auth.go (3)
IAMAuthTokenManager(30-37)LoadAWSConfig(84-100)NewIAMAuthTokenManager(40-44)
server/aws_common/iam_auth.go (1)
server/aws_common/aws_common.go (1)
ConfigureAssumeRoleProvider(19-40)
🪛 Shellcheck (0.10.0)
tools/redis-tests/elasticache/deploy-test-env.sh
[warning] 5-5: VPC_CIDR appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 6-6: SUBNET_CIDR appears unused. Verify use (or export if used externally).
(SC2034)
tools/mysql-tests/rds/deploy-test-env.sh
[warning] 100-100: engine appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 106-106: Quote 'EOF' to make here document expansions happen on the server side rather than on the client.
(SC2087)
🪛 Checkov (3.2.334)
tools/mysql-tests/rds/cf-template.yaml
[HIGH] 288-307: Ensure all data stored in Aurora is securely encrypted at rest
(CKV_AWS_96)
[HIGH] 321-336: Ensure all data stored in Aurora is securely encrypted at rest
(CKV_AWS_96)
🪛 YAMLlint (1.37.1)
tools/mysql-tests/rds/cf-template.yaml
[error] 213-213: trailing spaces
(trailing-spaces)
[error] 214-214: trailing spaces
(trailing-spaces)
[error] 217-217: trailing spaces
(trailing-spaces)
[error] 220-220: trailing spaces
(trailing-spaces)
[error] 223-223: trailing spaces
(trailing-spaces)
[error] 260-260: trailing spaces
(trailing-spaces)
[error] 262-262: trailing spaces
(trailing-spaces)
[error] 265-265: trailing spaces
(trailing-spaces)
[error] 268-268: trailing spaces
(trailing-spaces)
[error] 271-271: trailing spaces
(trailing-spaces)
[error] 375-375: trailing spaces
(trailing-spaces)
[error] 465-465: no new line character at the end of file
(new-line-at-end-of-file)
🔇 Additional comments (29)
go.mod (4)
24-24: Adding aws-sdk-go-v2 RDS auth feature is appropriateThis is the right module for generating IAM auth tokens for RDS/MySQL. It aligns with the rest of the v2 SDK usage in the repo.
65-65: Upgrade to go-sql-driver/mysql v1.9.1 looks goodThis version is compatible and stable. No known breaking changes with common DSN options used by Fleet.
122-122: rdsmysql/v2 addition fits the RDS IAM flowThis is a good choice when pairing with go-sql-driver/mysql and AWS SDK v2. Please ensure it’s not duplicating token generation paths with feature/rds/auth (it usually wraps SDK calls under the hood).
74-74: go-tpm is imported in securehw_linux.go — direct dependency is justifiedI ran the repo search and found direct imports of github.com/google/go-tpm in ee/orbit/pkg/securehw/securehw_linux.go, so keeping the module in go.mod is appropriate.
- ee/orbit/pkg/securehw/securehw_linux.go
- lines 18–20: imports "github.com/google/go-tpm/tpm2", "github.com/google/go-tpm/tpm2/transport", "github.com/google/go-tpm/tpm2/transport/linuxtpm"
- go.mod: line 74 includes github.com/google/go-tpm v0.9.5 (go.sum also contains matching entries)
No change required here.
server/config/config.go (6)
38-55: MySQL config gains STS assume role settings — looks goodThe new fields and yaml tags are consistent with existing AWS config patterns.
64-65: Redis config gains STS assume role settings — looks goodThis mirrors the MySQL settings and other AWS service configs.
1051-1053: Expose mysql.sts_assume_role_arn and mysql.sts_external_id — good additionNaming is consistent and descriptions are clear.
1086-1088: Expose redis.sts_assume_role_arn and redis.sts_external_id — good additionThis matches the config style of other AWS integrations.
1482-1499: Wire STS fields into loaded MySQL config — goodThe loader picks up both fields for primary and read replica configs.
1529-1531: Wire STS fields into loaded Redis config — goodThis enables IAM auth flows to access assume-role details.
tools/redis-tests/elasticache/cf-template.yaml (1)
1-302: CloudFormation template looks well-structured for testing IAM authentication.This template provides a comprehensive test environment for ElastiCache IAM authentication with both serverless and standalone configurations. The IAM roles and policies are properly configured to test different authentication scenarios including direct access and role assumption.
A few observations:
- The security group setup correctly allows Redis traffic between EC2 and ElastiCache
- Both user types (fleet-test-user and test-assume-user) are configured for IAM authentication
- The assume role configuration includes proper trust relationships and conditions
docs/Configuration/fleet-server-configuration.md (5)
26-26: Clear documentation for automatic IAM usage.The documentation clearly explains when IAM authentication is automatically used for MySQL connections, which helps users understand the behavior without needing to configure additional flags.
201-223: Well-documented MySQL STS configuration options.The new MySQL STS configuration options are clearly documented with appropriate examples and descriptions. The pairing guidance for external ID usage is helpful for security best practices.
242-242: Important note about automatic IAM usage for Redis.Based on the past review comments, there was discussion about whether this automatic behavior could be breaking for existing users. The current approach seems reasonable - if an ElastiCache endpoint is specified without a password, IAM auth is used automatically.
However, I noticed from the past comments that there was consideration for making this behavior more explicit. You might want to verify this doesn't break existing deployments where ElastiCache clusters exist without passwords but users don't intend to use IAM auth.
Can you confirm whether there are any managed cloud offerings or existing deployments that might be affected by this automatic IAM usage when redis_password is not specified?
300-323: Comprehensive Redis STS configuration documentation.The Redis STS configuration options are well-documented with clear examples and consistent formatting that matches the MySQL options. This provides users with the flexibility to configure role assumption for ElastiCache authentication.
1403-1428: Consistent STS configuration across all AWS services.The STS assume role and external ID configuration has been consistently added across all AWS services (Firehose, Kinesis, Lambda, SES, S3). The documentation follows the same pattern and provides clear examples for each service.
Also applies to: 1551-1576, 1699-1724, 2038-2061, 2146-2169
changes/aws-rds-iam-auth (1)
1-1: Clear and concise changelog entry.The changelog entry accurately describes the new RDS IAM authentication support and mentions the key configuration option for role assumption.
cmd/fleet/serve.go (1)
269-270: Redis pool configuration updated with STS fields.The addition of StsAssumeRoleArn and StsExternalID fields to the Redis pool configuration properly wires the new IAM authentication options from the config into the Redis connection setup.
changes/aws-elasticache-iam-auth (1)
1-1: Clear changelog entry for ElastiCache IAM support.The changelog entry accurately describes the new ElastiCache IAM authentication feature and mentions the STS role assumption capability.
server/datastore/mysql/common_mysql/aws_iam_auth_test.go (1)
9-121: LGTM: solid coverage across RDS, Aurora, proxy, China, and GovCloud.Nice job covering common and tricky endpoint shapes, plus strict error assertions.
server/datastore/redis/redis_test.go (1)
115-250: LGTM: practical endpoint coverage with serverless, standalone, cluster, and hyphenated names.The table tests map short region codes to full regions well and cover port/no-port forms.
server/datastore/redis/aws_region_map_generated.go (1)
5-40: Generated region map looks goodThe mapping covers current regions and follows the intended zone-prefix-to-region approach. Keeping this unexported inside the package is appropriate.
server/datastore/mysql/common_mysql/common.go (2)
136-152: Good TLS and cleartext handling for IAM/RDSSetting tls=rdsmysql for direct RDS and tls=true for RDS Proxy, plus allowCleartextPasswords=true, matches AWS guidance for IAM auth with go-sql-driver/mysql.
142-149: No change needed — isRDSProxyEndpoint accepts host:portI verified the detector: isRDSProxyEndpoint checks substrings in the full address, so passing conf.Address (which may include :port) still works. The suggested change is unnecessary.
- Location: server/datastore/mysql/common_mysql/aws_iam_auth.go (lines ~137–141)
- Function:
// isRDSProxyEndpoint checks if the given address is an RDS Proxy endpoint func isRDSProxyEndpoint(address string) bool { // RDS Proxy endpoints have the format: proxy-name.proxy-xxxxxxxxx.region.rds.amazonaws.com return strings.Contains(address, ".proxy-") && strings.Contains(address, ".rds.amazonaws.") }Action: ignore the proposed diff; no code change required.
Likely an incorrect or invalid review comment.
server/datastore/redis/redis.go (1)
278-296: Solid IAM detection and TLS enforcement for ElastiCacheAuto-enabling TLS for IAM auth and generating the token at dial time is correct. Appending redis.DialPassword(token) with redis.DialUsername set earlier ensures AUTH happens as expected.
server/datastore/mysql/common_mysql/aws_iam_auth.go (2)
58-66: LGTM on token generationUsing rds/auth.BuildAuthToken with host:port, region, username, and credentials is correct. Error wrapping is clear.
137-141: Proxy endpoint detection looks goodThe pattern check for .proxy- and .rds.amazonaws. is a pragmatic and safe detection.
server/datastore/redis/aws_iam_auth.go (1)
62-92: Presigning flow for ElastiCache IAM auth looks correctYou’re building a GET with Action=connect and User=..., setting X-Amz-Expires=900, and presigning with service “elasticache”. Trimming the https:// prefix matches AWS guidance.
server/aws_common/iam_auth.go (1)
46-81: Token caching with jitter looks solidThe double-checked lock with RWMutex is correct, and the 10m+jitter refresh window is appropriate for 15m tokens.
| // extractRDSRegion extracts the AWS region from an RDS endpoint | ||
| func extractRDSRegion(endpoint string) (string, error) { | ||
| // RDS endpoint formats: | ||
| // - instance-name.abcdefg.region.rds.amazonaws.com (regular RDS) | ||
| // - cluster-name.cluster-abcdefg.region.rds.amazonaws.com (Aurora cluster endpoint) | ||
| // - cluster-name.cluster-ro-abcdefg.region.rds.amazonaws.com (Aurora read-only endpoint) | ||
| // - proxy-name.proxy-abcdefg.region.rds.amazonaws.com (RDS Proxy endpoint) | ||
| // - instance-name.abcdefg.region.rds.amazonaws.com.cn (China regions) | ||
|
|
||
| parts := strings.Split(endpoint, ".") | ||
| if len(parts) < 5 { | ||
| return "", fmt.Errorf("invalid RDS endpoint format for IAM auth: %s", endpoint) | ||
| } | ||
|
|
||
| // Check if it's an RDS endpoint | ||
| if !strings.Contains(endpoint, ".rds.amazonaws.com") { | ||
| return "", fmt.Errorf("endpoint does not appear to be an RDS endpoint: %s", endpoint) | ||
| } | ||
|
|
||
| // Find the region - it's always before "rds.amazonaws.com" | ||
| for i := 0; i < len(parts)-3; i++ { | ||
| if parts[i+1] == "rds" && parts[i+2] == "amazonaws" && (parts[i+3] == "com" || parts[i+3] == "com.cn") { | ||
| return parts[i], nil | ||
| } | ||
| } | ||
|
|
||
| return "", fmt.Errorf("could not extract region from RDS endpoint: %s", endpoint) | ||
| } |
There was a problem hiding this comment.
💡 Verification agent
🧩 Analysis chain
Harden region parsing; strip port if present and validate suffixes
If a host:port is passed, region parsing will fail. Also, you only validate against .com/.com.cn later but not before stripping the port.
func extractRDSRegion(endpoint string) (string, error) {
+ // Remove port if present
+ if idx := strings.LastIndex(endpoint, ":"); idx != -1 {
+ endpoint = endpoint[:idx]
+ }
// RDS endpoint formats:
// - instance-name.abcdefg.region.rds.amazonaws.com (regular RDS)
// - cluster-name.cluster-abcdefg.region.rds.amazonaws.com (Aurora cluster endpoint)
// - cluster-name.cluster-ro-abcdefg.region.rds.amazonaws.com (Aurora read-only endpoint)
// - proxy-name.proxy-abcdefg.region.rds.amazonaws.com (RDS Proxy endpoint)
// - instance-name.abcdefg.region.rds.amazonaws.com.cn (China regions)
parts := strings.Split(endpoint, ".")
if len(parts) < 5 {
return "", fmt.Errorf("invalid RDS endpoint format for IAM auth: %s", endpoint)
}
// Check if it's an RDS endpoint
- if !strings.Contains(endpoint, ".rds.amazonaws.com") {
+ if !(strings.Contains(endpoint, ".rds.amazonaws.com") || strings.Contains(endpoint, ".rds.amazonaws.com.cn")) {
return "", fmt.Errorf("endpoint does not appear to be an RDS endpoint: %s", endpoint)
}
// Find the region - it's always before "rds.amazonaws.com"
for i := 0; i < len(parts)-3; i++ {
- if parts[i+1] == "rds" && parts[i+2] == "amazonaws" && (parts[i+3] == "com" || parts[i+3] == "com.cn") {
+ if parts[i+1] == "rds" && parts[i+2] == "amazonaws" && (parts[i+3] == "com" || parts[i+3] == "com.cn") {
return parts[i], nil
}
}To confirm coverage, consider adding tests with host:port and .com.cn variants.
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Find the file containing extractRDSRegion
rg -n "func extractRDSRegion" || true
# Print the file if it exists
FILE=$(rg -n --files-with-match "func extractRDSRegion" || true)
if [ -n "$FILE" ]; then
echo "---- $FILE ----"
sed -n '1,240p' "$FILE"
else
echo "extractRDSRegion not found in repository."
fiLength of output: 797
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="server/datastore/mysql/common_mysql/aws_iam_auth.go"
if [ -f "$FILE" ]; then
echo "---- $FILE (lines 1-240) ----"
sed -n '1,240p' "$FILE"
else
echo "file not found: $FILE"
exit 1
fiLength of output: 4984
Harden region parsing; strip port and accept .com.cn
If a host:port is passed the current parsing can fail (the port ends up in the last label). Also the function only checks for ".rds.amazonaws.com" up front, rejecting China endpoints even though the loop accepts "com.cn".
- Files to change
- server/datastore/mysql/common_mysql/aws_iam_auth.go — extractRDSRegion
Suggested patch (keep it small and safe):
func extractRDSRegion(endpoint string) (string, error) {
+ // Remove port if present
+ if idx := strings.LastIndex(endpoint, ":"); idx != -1 {
+ endpoint = endpoint[:idx]
+ }
// RDS endpoint formats:
// - instance-name.abcdefg.region.rds.amazonaws.com (regular RDS)
// - cluster-name.cluster-abcdefg.region.rds.amazonaws.com (Aurora cluster endpoint)
// - cluster-name.cluster-ro-abcdefg.region.rds.amazonaws.com (Aurora read-only endpoint)
// - proxy-name.proxy-abcdefg.region.rds.amazonaws.com (RDS Proxy endpoint)
// - instance-name.abcdefg.region.rds.amazonaws.com.cn (China regions)
parts := strings.Split(endpoint, ".")
if len(parts) < 5 {
return "", fmt.Errorf("invalid RDS endpoint format for IAM auth: %s", endpoint)
}
// Check if it's an RDS endpoint
- if !strings.Contains(endpoint, ".rds.amazonaws.com") {
+ if !(strings.Contains(endpoint, ".rds.amazonaws.com") || strings.Contains(endpoint, ".rds.amazonaws.com.cn")) {
return "", fmt.Errorf("endpoint does not appear to be an RDS endpoint: %s", endpoint)
}Please add unit tests covering:
- host:port endpoints (e.g. example.region.rds.amazonaws.com:3306)
- China regions (.rds.amazonaws.com.cn)
- RDS Proxy / cluster endpoint shapes
Thanks — this will make IAM token generation robust for ports and China regions.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| // extractRDSRegion extracts the AWS region from an RDS endpoint | |
| func extractRDSRegion(endpoint string) (string, error) { | |
| // RDS endpoint formats: | |
| // - instance-name.abcdefg.region.rds.amazonaws.com (regular RDS) | |
| // - cluster-name.cluster-abcdefg.region.rds.amazonaws.com (Aurora cluster endpoint) | |
| // - cluster-name.cluster-ro-abcdefg.region.rds.amazonaws.com (Aurora read-only endpoint) | |
| // - proxy-name.proxy-abcdefg.region.rds.amazonaws.com (RDS Proxy endpoint) | |
| // - instance-name.abcdefg.region.rds.amazonaws.com.cn (China regions) | |
| parts := strings.Split(endpoint, ".") | |
| if len(parts) < 5 { | |
| return "", fmt.Errorf("invalid RDS endpoint format for IAM auth: %s", endpoint) | |
| } | |
| // Check if it's an RDS endpoint | |
| if !strings.Contains(endpoint, ".rds.amazonaws.com") { | |
| return "", fmt.Errorf("endpoint does not appear to be an RDS endpoint: %s", endpoint) | |
| } | |
| // Find the region - it's always before "rds.amazonaws.com" | |
| for i := 0; i < len(parts)-3; i++ { | |
| if parts[i+1] == "rds" && parts[i+2] == "amazonaws" && (parts[i+3] == "com" || parts[i+3] == "com.cn") { | |
| return parts[i], nil | |
| } | |
| } | |
| return "", fmt.Errorf("could not extract region from RDS endpoint: %s", endpoint) | |
| } | |
| // extractRDSRegion extracts the AWS region from an RDS endpoint | |
| func extractRDSRegion(endpoint string) (string, error) { | |
| // Remove port if present | |
| if idx := strings.LastIndex(endpoint, ":"); idx != -1 { | |
| endpoint = endpoint[:idx] | |
| } | |
| // RDS endpoint formats: | |
| // - instance-name.abcdefg.region.rds.amazonaws.com (regular RDS) | |
| // - cluster-name.cluster-abcdefg.region.rds.amazonaws.com (Aurora cluster endpoint) | |
| // - cluster-name.cluster-ro-abcdefg.region.rds.amazonaws.com (Aurora read-only endpoint) | |
| // - proxy-name.proxy-abcdefg.region.rds.amazonaws.com (RDS Proxy endpoint) | |
| // - instance-name.abcdefg.region.rds.amazonaws.com.cn (China regions) | |
| parts := strings.Split(endpoint, ".") | |
| if len(parts) < 5 { | |
| return "", fmt.Errorf("invalid RDS endpoint format for IAM auth: %s", endpoint) | |
| } | |
| // Check if it's an RDS endpoint | |
| if !(strings.Contains(endpoint, ".rds.amazonaws.com") || strings.Contains(endpoint, ".rds.amazonaws.com.cn")) { | |
| return "", fmt.Errorf("endpoint does not appear to be an RDS endpoint: %s", endpoint) | |
| } | |
| // Find the region - it's always before "rds.amazonaws.com" | |
| for i := 0; i < len(parts)-3; i++ { | |
| if parts[i+1] == "rds" && parts[i+2] == "amazonaws" && (parts[i+3] == "com" || parts[i+3] == "com.cn") { | |
| return parts[i], nil | |
| } | |
| } | |
| return "", fmt.Errorf("could not extract region from RDS endpoint: %s", endpoint) | |
| } |
🤖 Prompt for AI Agents
In server/datastore/mysql/common_mysql/aws_iam_auth.go around lines 73-100, the
extractRDSRegion function fails for host:port inputs and currently rejects China
endpoints up front; update it to first strip any ":port" (use net.SplitHostPort
or if that fails, trim at last ':' when there are no IPv6 brackets) so parsing
works with host:port, broaden the upfront endpoint check to accept both
".rds.amazonaws.com" and ".rds.amazonaws.com.cn" (or check labels instead of a
single substring), then split the cleaned host by '.' and locate the region
label immediately before the "rds","amazonaws","com" or
"rds","amazonaws","com","cn" sequence as the loop already does; add unit tests
for host:port (example.region.rds.amazonaws.com:3306), China endpoints
(example.region.rds.amazonaws.com.cn), and proxy/cluster shapes to verify the
region extraction.
| var db *sqlx.DB | ||
| if useIAMAuth { | ||
| connector := &awsIAMAuthConnector{ | ||
| driverName: driverName, | ||
| baseDSN: generateMysqlConnectionString(*conf), | ||
| tokenGen: iamTokenGen, | ||
| logger: opts.Logger, | ||
| } | ||
| db = sqlx.NewDb(sql.OpenDB(connector), driverName) | ||
| } else { |
There was a problem hiding this comment.
💡 Verification agent
🧩 Analysis chain
IAM connector likely bypasses sqlmw/APM/OTEL driver instrumentation
Switching to sqlx.NewDb(sql.OpenDB(connector), driverName) means the registered wrapper drivers (mysql-mw for sqlmw, apm/mysql, or the OpenTelemetry driver) may not wrap connections created by awsIAMAuthConnector. Passing driverName to sqlx.NewDb only affects sqlx’s dialect selection, not runtime instrumentation.
Risk:
- Loss of DB query traces/metrics/logging on the IAM path.
- sqlmw interceptors won’t run.
Paths to fix:
- Ensure awsIAMAuthConnector internally builds a driver.Connector using the wrapped driver (e.g., the otel/apm/sqlmw registered one), not the bare mysql driver.
- If you use otelsql, prefer using its connector-aware APIs (e.g., OpenDB/NewConnector) to wrap your connector.
- Alternatively, refactor awsIAMAuthConnector to delegate to the wrapped driver name via OpenConnector on the registered driver.
I can help adapt awsIAMAuthConnector once I see its implementation.
Run this quick scan to assess whether wrapped drivers are still used on the IAM path:
🏁 Script executed:
#!/bin/bash
# Look for otel/apm/sqlmw registration and usage patterns
rg -n "otelsql|apm/mysql|sqlmw\.Driver|Register\(" -A 2
# Inspect the IAM connector implementation
fd aws_iam_auth\.go | xargs -I {} sh -c 'echo "--- {} ---"; sed -n "1,200p" "{}"'Length of output: 18296
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "----- server/datastore/mysql/mysql.go -----"
sed -n '1,420p' server/datastore/mysql/mysql.go || true
echo
echo "----- server/datastore/mysql/common_mysql/common.go -----"
sed -n '1,240p' server/datastore/mysql/common_mysql/common.go || trueLength of output: 20131
🌐 Web query:
XSAM/otelsql NewConnector otelsql.NewConnector
otelsql NewConnector driver.Connector
otelsql OpenDB NewConnector
ngrok sqlmw connector OpenConnector NewConnector
sqlmw NewConnector
github apm/mysql connector OpenConnector
database/sql sql.OpenDB driver.Connector behavior
sqlx.NewDb driverName effect instrumentation
💡 Result:
otelsql.NewConnector (from XSAM/otelsql) creates a new OpenTelemetry-instrumented database connector that implements the Go driver.Connector interface, allowing its use with otelsql.OpenDB just as standard connectors are used with database/sql.OpenDB[3][9][1]. This enables tracing and observability for any Go SQL driver that is instrumented using otelsql. The patterns for connector support, wrapping, and general driver.Connector usage are as follows:
Instrumentation Flow for otelsql and Connectors
- otelsql.NewConnector returns a
driver.Connector-compatible object that wraps an underlying DSN and driver with OpenTelemetry instrumentation. This is similar to how standard Go SQL driver connectors behave, but traces DB operations[3][9]. - otelsql.OpenDB(connector) returns an instrumented
*sql.DBobject using the provided OpenTelemetry-wrapped connector[9][3]. - otelsql.Open(driverName, dsn, ...opts) is a convenience for direct OpenTelemetry-instrumented DB access. This mirrors the behavior of
sql.Openbut with otelsql tracing, and does not require manual connector creation[7][9].
How driver.Connector, sql.OpenDB, and Wrappers Interact
- driver.Connector: The Go interface for advanced connection pooling and configuration reuse. It allows DSN parsing only once, and enables (but does not require) context support for dialing[1].
- sql.OpenDB(connector): This Go function establishes an
*sql.DBinstance from adriver.Connector, rather than from a driver name and DSN string[1]. - All instrumented database wrappers (e.g., otelsql, sqlmw, APM/otel, etc.) that wish to fully support OpenTelemetry, interception, or other middleware need to implement, or wrap, this interface.
Middleware Example: ngrok/sqlmw
- ngrok/sqlmw provides middleware/wrapping for Go SQL drivers, but does not expose a dedicated
NewConnectorhelper. Instead, you wrap the base driver and register it under a new name (e.g. "postgres-mw") withsql.Register. Instrumentation/interception is handled at the driver level, not via a connector creation helper[4][6]. - With sqlmw, you use
sql.Open(notOpenDB(connector)) because the driver is already wrapped and registered under a custom name[4].
APM/Mysql Connector Pattern
- Many APM SQL wrappers (e.g., OpenTelemetry, Elastic APM) also provide either a custom
OpenConnectorimplementation or expose their own connector-creation helper. Usage pattern:- Create a connector via e.g.
apmmysql.NewConnector(...) - Pass this connector to standard
sql.OpenDB(connector)for maximum composition and flexibility.
- Create a connector via e.g.
Key behaviors and recommendations
| Approach | Connector Support | Usage | OpenTelemetry Instrumented? | Notes |
|---|---|---|---|---|
| otelsql.OpenDB | Yes, via Connector | db := otelsql.OpenDB(connector) |
Yes | Connector is OTel-instrumented and composable. |
| otelsql.Open | Yes, via DSN | db := otelsql.Open(driver, dsn, ...) |
Yes | Easiest, but less composable than OpenDB. |
| sqlmw.Driver | No Connector API | Register wrapped driver; sql.Open |
Interceptor/middleware | Used for function-level interception. |
| apmmysql.OpenDB/Connector | Yes | db := sql.OpenDB(connector) |
Yes | For monitoring/metrics, follows same pattern. |
| database/sql | Yes | db := sql.OpenDB(connector) |
No | Standard library without telemetry. |
| sqlx.NewDb | Yes, but pass driver name and *sql.DB | sqlx.NewDb(db, driverName) |
Follows underlying *sql.DB | Use after wrapping or instrumenting. |
Special Notes on sqlx.NewDb
sqlx.NewDb(db *sql.DB, driverName string)does not itself perform instrumentation—instead, it adapts an already-opened, possibly instrumented,*sql.DBfor use with sqlx extensions.- The driverName is only for sqlx's internal metadata/tracing; the actual DB behavior (instrumented or not) depends on how you created the
*sql.DB(i.e., viaotelsql.OpenDB,otelsql.Open,sqlmw, etc.)[1][9].
Summary: To instrument SQL drivers with tracing and metrics using otelsql or similar libraries, use their connector construction (e.g., otelsql.NewConnector) or driver wrapping helper, create a connector or registered driver, and use either OpenDB(connector) or Open(driver, dsn) as appropriate. Avoid using raw driver connections if you want consistent instrumentation and metrics across all operations[3][9][7][4].
Citations:
- 1: https://golang.bg/pkg/database/sql/driver/
- 2: https://www.juniper.net/us/en/threatlabs/ips-signatures/detail.HTTP:SQL:INJ:DLNK-WIFI-MNGR-CMW.html
- 3: https://pkg.go.dev/github.com/XSAM/otelsql
- 4: https://github.com/ngrok/sqlmw
- 5: https://opentelemetry.io/docs/collector/building/connector/
- 6: https://groups.google.com/g/golang-nuts/c/Vkvf-Rrl0aQ
- 7: https://opentelemetry.io/blog/2024/getting-started-with-otelsql/
- 8: https://seatunnel.apache.org/docs/connector-v2/sink/Jdbc/
- 9: https://pkg.go.dev/github.com/uptrace/opentelemetry-go-extra/otelsql
- 10: https://github.com/denisenkom/go-mssqldb/blob/master/newconnector_example_test.go
Fix IAM connector: it currently bypasses otel/apm/sqlmw instrumentation
I verified the code paths. The IAM connector builds and returns the raw mysql connector/driver, so registered wrappers (otelsql/apm/mysql/sqlmw) are skipped on the IAM path — traces and sqlmw interceptors will be lost.
Files to fix
- server/datastore/mysql/common_mysql/common.go — NewDB: when useIAMAuth is true you construct awsIAMAuthConnector and call sqlx.NewDb(sql.OpenDB(connector), driverName).
- server/datastore/mysql/common_mysql/aws_iam_auth.go — awsIAMAuthConnector.Connect uses mysql.NewConnector(cfg) and Driver() returns mysql.MySQLDriver{}.
Why this matters (very short)
- sql.OpenDB(connector) uses the connector.Driver() / connector.Connect() implementations you provide. awsIAMAuthConnector currently constructs a connector using the raw mysql driver, so instrumentation registered under driverName (otelsql/apm/mysql/mysql-mw) is not used.
Suggested fixes (pick one)
- Preferred: build the base mysql connector (mysql.NewConnector(cfg)) and wrap that connector with the instrumentation's connector API before returning Connect. For example (pseudo):
- base := mysql.NewConnector(cfg)
- if driverName == otelTracedDriverName { conn := otelsql.NewConnector(base); return conn.Connect(ctx) }
- else if driverName == "apm/mysql" { conn := apmmysql.NewConnector(base); ... }
- fallback: return base.Connect(ctx)
See otelsql docs: https://pkg.go.dev/github.com/XSAM/otelsql and database/sql OpenDB: https://pkg.go.dev/database/sql#OpenDB
- Alternative: when creating the awsIAMAuthConnector in NewDB, pass an actual instrumented driver/connector (or a factory) instead of just driverName, and delegate Connect()/Driver() to that injected connector so the wrapper is preserved.
- If using sqlmw (which doesn't provide a connector helper), ensure the wrapped driver is registered and create a connector that delegates to that wrapped driver (or move token injection into a Dialer that the wrapped driver will use).
Minimal pseudo-diff (problem area)
Original (current):
connector := &awsIAMAuthConnector{ driverName: driverName, baseDSN: generateMysqlConnectionString(*conf), ... }
db = sqlx.NewDb(sql.OpenDB(connector), driverName)
Suggested approach (pseudo):
cfg := mysql.ParseDSN(...)
base := mysql.NewConnector(cfg)
var conn driver.Connector = base
if driverName == otelTracedDriverName {
conn = otelsql.NewConnector(base) // wrap the base connector
}
db = sqlx.NewDb(sql.OpenDB(conn), driverName)
I can prepare a concrete patch for otelsql / apm / sqlmw cases if you want — the fix is small but needs to use the wrapper's connector APIs or inject the wrapped connector at creation time.
There was a problem hiding this comment.
Interesting... how does sqlx.Open() handle this? Surely it's not checking the driver name against all possible instrumentation wrappers.
There was a problem hiding this comment.
🧩 Analysis chain
@sgress454 Great question! You're absolutely right - sqlx.Open() doesn't check against all possible instrumentation wrappers. Here's how it actually works:
How sqlx.Open() handles instrumentation
sqlx.Open(driverName, dsn) internally calls sql.Open(driverName, dsn), which:
- Looks up the driver by name in Go's global driver registry (
sql.drivers) - Uses whatever driver was registered under that name
The instrumentation libraries register their wrapped drivers at startup:
// sqlmw approach
sql.Register("mysql-mw", sqlmw.Driver(mysql.MySQLDriver{}, interceptor))
// otel approach
sql.Register("mysql-otel", otelsql.Wrap(mysql.MySQLDriver{}))So when you call sqlx.Open("mysql-mw", dsn), it automatically gets the wrapped driver.
The problem with sql.OpenDB(connector)
sql.OpenDB(connector) bypasses the driver registry entirely. It directly uses:
connector.Driver()to get the driver instanceconnector.Connect(ctx)to create connections
The current IAM connector returns the raw MySQL driver:
func (c *awsIAMAuthConnector) Driver() driver.Driver {
return mysql.MySQLDriver{} // Raw driver, no instrumentation!
}The solution is to make the IAM connector delegate to the already-registered wrapped driver, or wrap the base connector with the instrumentation's connector API before using sql.OpenDB().
🏁 Script executed:
#!/bin/bash
# Let's look at how the instrumentation is set up in NewDB
rg -n -A 10 -B 5 "Register.*sqlmw|apm/mysql|otelDriverName" server/datastore/mysql/common_mysql/common.go
Length of output: 1201
Perfect! Now I can see the exact issue. Looking at the code:
The Problem
Non-IAM path (works correctly):
// Line 70: sqlmw registers wrapped driver
sql.Register(driverName, sqlmw.Driver(mysql.MySQLDriver{}, opts.Interceptor))
// Lines 86-91: Uses the registered wrapped driver
db, err = sqlx.Open(driverName, dsn) // ✅ Gets instrumented driver from registryIAM path (bypasses instrumentation):
// Lines 77-85: Connector bypasses registry entirely
connector := &awsIAMAuthConnector{
driverName: driverName, // Just stored, not used for driver lookup
// ...
}
db = sqlx.NewDb(sql.OpenDB(connector), driverName) // ❌ Uses connector.Driver()The Fix
The IAM connector should delegate to the registered driver instead of returning the raw MySQL driver. Here's the approach:
The solution is to modify the IAM connector to look up and use the registered instrumented driver, just like sqlx.Open() does internally.
🏁 Script executed:
#!/bin/bash
# Let's look at the current IAM connector implementation to see how it gets the driver
rg -n -A 20 "func.*Driver\(\)|Connect\(" server/datastore/mysql/common_mysql/aws_iam_auth.go
Length of output: 1170
Perfect! Now I can see exactly how to fix this. The issue is clear:
Current IAM Connector (Problem)
// Line 134: Returns raw MySQL driver, bypassing instrumentation
func (c *awsIAMAuthConnector) Driver() driver.Driver {
return mysql.MySQLDriver{} // ❌ Raw driver, no instrumentation
}
// Line 124-129: Creates raw MySQL connector
connector, err := mysql.NewConnector(cfg) // ❌ Raw connector
return connector.Connect(ctx)The Fix
The IAM connector should use the registered instrumented driver instead of the raw MySQL driver. Here's the solution:
Option 1: Pass the registered driver to the IAM connector
// In NewDB(), get the registered driver first
var registeredDriver driver.Driver
if opts.Interceptor != nil {
registeredDriver = sqlmw.Driver(mysql.MySQLDriver{}, opts.Interceptor)
} else {
// For otel/apm, we'd need to look them up from the registry
// This is more complex since Go doesn't expose sql.Drivers()
}
if useIAMAuth {
connector := &awsIAMAuthConnector{
driverName: driverName,
baseDSN: generateMysqlConnectionString(*conf),
tokenGen: iamTokenGen,
logger: opts.Logger,
instrumentedDriver: registeredDriver, // Pass the wrapped driver
}
db = sqlx.NewDb(sql.OpenDB(connector), driverName)
}Option 2 (Simpler): Let the IAM connector create instrumented connectors
Modify the IAM connector to handle instrumentation internally:
func (c *awsIAMAuthConnector) Connect(ctx context.Context) (driver.Conn, error) {
token, err := c.tokenGen.getAuthToken(ctx)
if err != nil {
return nil, fmt.Errorf("failed to generate IAM auth token: %w", err)
}
cfg, err := mysql.ParseDSN(c.baseDSN)
if err != nil {
return nil, fmt.Errorf("failed to parse DSN: %w", err)
}
cfg.Passwd = token
// Create base connector
baseConnector, err := mysql.NewConnector(cfg)
if err != nil {
return nil, fmt.Errorf("failed to create connector: %w", err)
}
// Wrap with instrumentation based on driverName
var finalConnector driver.Connector = baseConnector
switch c.driverName {
case "mysql-mw":
// For sqlmw, we need the interceptor - would need to pass it in
// This is tricky since sqlmw doesn't provide a connector wrapper
default:
// Handle otel/apm cases
if strings.Contains(c.driverName, "otel") {
// finalConnector = otelsql.NewConnector(baseConnector)
}
}
return finalConnector.Connect(ctx)
}The cleanest approach is Option 1 where we pass the already-registered instrumented driver to the IAM connector and have it delegate appropriately. This maintains the same instrumentation path for both IAM and non-IAM connections.
Would you like me to create a concrete patch showing exactly how to implement this?
🧬 Code Graph Analysis Results
- server/datastore/mysql/common_mysql/common.go:61-74
if opts.TracingConfig != nil && opts.TracingConfig.TracingEnabled {
if opts.TracingConfig.TracingType == "opentelemetry" {
driverName = otelDriverName
} else {
driverName = "apm/mysql"
}
}
if opts.Interceptor != nil {
driverName = "mysql-mw"
sql.Register(driverName, sqlmw.Driver(mysql.MySQLDriver{}, opts.Interceptor))
}
- server/datastore/mysql/common_mysql/common.go:77-85
if useIAMAuth {
connector := &awsIAMAuthConnector{
driverName: driverName,
baseDSN: generateMysqlConnectionString(*conf),
tokenGen: iamTokenGen,
logger: opts.Logger,
}
db = sqlx.NewDb(sql.OpenDB(connector), driverName)
} else {
- server/datastore/mysql/common_mysql/common.go:86-91
dsn := generateMysqlConnectionString(*conf)
db, err = sqlx.Open(driverName, dsn)
if err != nil {
return nil, err
}
}
- server/config/config.go:37-55
MysqlConfig struct {
Protocol string `yaml:"protocol"`
Address string `yaml:"address"`
Username string `yaml:"username"`
Password string `yaml:"password"`
PasswordPath string `yaml:"password_path"`
Database string `yaml:"database"`
TLSCert string `yaml:"tls_cert"`
TLSKey string `yaml:"tls_key"`
TLSCA string `yaml:"tls_ca"`
TLSServerName string `yaml:"tls_server_name"`
TLSConfig string `yaml:"tls_config"` // tls=customValue in DSN
MaxOpenConns int `yaml:"max_open_conns"`
MaxIdleConns int `yaml:"max_idle_conns"`
ConnMaxLifetime int `yaml:"conn_max_lifetime"`
SQLMode string `yaml:"sql_mode"`
StsAssumeRoleArn string `yaml:"sts_assume_role_arn"`
StsExternalID string `yaml:"sts_external_id"`
}
- server/config/config.go:252-260
LoggingConfig struct {
Debug bool
JSON bool
DisableBanner bool `yaml:"disable_banner"`
ErrorRetentionPeriod time.Duration `yaml:"error_retention_period"`
TracingEnabled bool `yaml:"tracing_enabled"`
// TracingType can either be opentelemetry or elasticapm for whichever type of tracing wanted
TracingType string `yaml:"tracing_type"`
}
🧠 Learnings used
Learnt from: sgress454
PR: fleetdm/fleet#31075
File: server/datastore/mysql/common_mysql/aws_iam_auth.go:111-130
Timestamp: 2025-08-20T21:24:59.223Z
Learning: In Fleet's MySQL IAM authentication implementation, the TLS and AllowCleartextPasswords configuration is handled in the generateMysqlConnectionString function in server/datastore/mysql/common_mysql/common.go, not in the awsIAMAuthConnector.Connect method. The connector receives a base DSN that already includes the necessary IAM authentication parameters.
Learnt from: sgress454
PR: fleetdm/fleet#31075
File: server/datastore/mysql/common_mysql/aws_iam_auth.go:111-130
Timestamp: 2025-08-20T21:24:59.224Z
Learning: In Fleet's MySQL IAM authentication implementation, the TLS and AllowCleartextPasswords configuration is handled in the generateMysqlConnectionString function in server/datastore/mysql/common_mysql/common.go. When no password is configured and the endpoint is an RDS endpoint, the function automatically sets allowCleartextPasswords=true and configures appropriate TLS settings. The awsIAMAuthConnector receives a properly configured base DSN and only needs to inject the IAM token as the password.
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
|
|
||
| Amazon Resource Name (ARN) of the AWS Security Token Service (STS) role to use for Redis authentication. | ||
|
|
||
| If set, Fleet uses AWS Identity and Access Management (IAM) authentication instead of basic authentication set by `redis_username` and `redis_password` (optional). |
There was a problem hiding this comment.
@noahtalerman this is subtly wrong, as mentioned in the other thread this config option can't be the flag we use because there are scenarios where there is no role to assume (the default is environmental credentials (eg. from the EC2 metadata service), assuming a role (this flag) uses the role/user those credentials are associated with to call sts:AssumeRole, and then uses that second set of chained credentials)
|
@titanous thanks for the awesome work on this. In the interest of moving it forward, we're going to merge this to a feature branch on the Fleet repo, and continue from there. That'll get it prioritized to be put into the next release. It'll still have all your commits in there and we'll be sure to give you a shoutout at the demo! |
|
@sgress454 Sounds great, and feel free to tag me when you make a draft and/or final PR if you're interested in feedback/review. |
|
Also FWIW, we've been running this patch since I submitted it and everything has been working with no issues. |
|
Ok -- moved to https://github.com/fleetdm/fleet/tree/feat-1817-add-iam-auth-to-mysql-and-redis.
That's great to hear!
Will do. I don't foresee many functional changes. |
for #1817 # Details This PR gives Fleet servers the ability to connect to RDS MySQL and Elasticache Redis via AWS [Identity and Access Management (IAM)](https://aws.amazon.com/iam/). It is based almost entirely on the work of @titanous, branched from his [original pull request](#31075). The main differences between his branch and this are: 1. Removal of auto-detection of AWS region (and cache name for Elasticache) in favor of specifying these values in configuration. The auto-detection is admittedly handy but parsing AWS host URLs is not considered a best practice. 2. Relying on the existence of these new configs to determine whether or not to connect via IAM. This sidesteps a thorny issue of whether to try an IAM-based Elasticache connection when a password is not supplied, since this is technically a valid setup. # Checklist for submitter If some of the following don't apply, delete the relevant line. - [X] Changes file added for user-visible changes in `changes/`, `orbit/changes/` or `ee/fleetd-chrome/changes`. See [Changes files](https://github.com/fleetdm/fleet/blob/main/docs/Contributing/guides/committing-changes.md#changes-files) for more information. ## Testing - [X] Added/updated automated tests - [X] QA'd all new/changed functionality manually - besides using @titanous's excellent test tool, I verified the following end-to-end: - [X] regular (non RDS) MySQL connection - [X] RDS MySQL connection using username/password - [X] RDS MySQL connection using IAM (no role) - [X] RDS MySQL connection using IAM (assuming role) - [X] regular (non Elasticache) Redis connection - [X] Elasticache Redis connection using username/password - [X] Elasticache Redis connection using NO password (without IAM) - [X] Elasticache Redis connection using IAM (no role) - [X] Elasticache Redis connection using IAM (assuming role) --------- Co-authored-by: Jonathan Rudenberg <jonathan@titanous.com> Co-authored-by: Noah Talerman <47070608+noahtalerman@users.noreply.github.com> Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Adds support for reading server `private_key` from AWS Secrets Manager. Combined with #31075, this should allow removing all common sensitive secrets from the environment/config (if I missed any let me know). This works with localstack for local development (set `AWS_ENDPOINT_URL=$LOCALSTACK_URL`, `AWS_ACCESS_KEY_ID=test`, and `AWS_SECRET_ACCESS_KEY=test`). I did not include config options for `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` because they are a bad practice vs role credentials and defeat the purpose of this feature which is to remove secrets from the environment/config. # Checklist for submitter If some of the following don't apply, delete the relevant line. <!-- Note that API documentation changes are now addressed by the product design team. --> - [x] Changes file added for user-visible changes in `changes/`, `orbit/changes/` or `ee/fleetd-chrome/changes`. See [Changes files](https://github.com/fleetdm/fleet/blob/main/docs/Contributing/guides/committing-changes.md#changes-files) for more information. - [x] Added/updated automated tests - [x] Manual QA for all new/changed functionality --------- Co-authored-by: Scott Gress <scott@fleetdm.com>
This adds automatic support for AWS IAM auth for RDS MySQL/MariaDB and ElastiCache Redis/Valkey. If no password is configured with an RDS/ElastiCache endpoint, IAM auth is automatically used. An IAM role to assume is optionally configurable.
CloudFormation stacks and test binaries are provided to manually test common scenarios (including RDS Proxy and assuming roles).
Closes #1817
Closes #20527
Checklist for submitter
If some of the following don't apply, delete the relevant line.
changes/,orbit/changes/oree/fleetd-chrome/changes.See Changes files for more information.
Summary by CodeRabbit