Add AWS IAM auth for RDS MySQL/MariaDB and ElastiCache Redis/Valkey by titanous · Pull Request #31075 · fleetdm/fleet

titanous · 2025-07-19T22:26:45Z

This adds automatic support for AWS IAM auth for RDS MySQL/MariaDB and ElastiCache Redis/Valkey. If no password is configured with an RDS/ElastiCache endpoint, IAM auth is automatically used. An IAM role to assume is optionally configurable.

CloudFormation stacks and test binaries are provided to manually test common scenarios (including RDS Proxy and assuming roles).

Closes #1817
Closes #20527

Checklist for submitter

If some of the following don't apply, delete the relevant line.

Changes file added for user-visible changes in changes/, orbit/changes/ or ee/fleetd-chrome/changes.
See Changes files for more information.
Added/updated automated tests
Manual QA for all new/changed functionality

Summary by CodeRabbit

New Features
- Added IAM authentication support for AWS ElastiCache (Redis), including optional STS AssumeRole and External ID.
- Added IAM authentication support for AWS RDS (MySQL/MariaDB), with optional STS AssumeRole and External ID.
Documentation
- Updated configuration docs with per-service STS role/external ID options (MySQL, Redis, Firehose, Kinesis, Lambda, SES, S3, Packaging) and IAM auth behavior notes.
Tests
- Introduced test tools and CloudFormation templates for validating RDS and ElastiCache IAM authentication (deploy/cleanup scripts and CLI testers).

coderabbitai · 2025-07-19T22:26:52Z

Important

Review skipped

Auto reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

📝 Walkthrough

Walkthrough

Adds IAM-based authentication for MySQL (RDS) and Redis (ElastiCache). Introduces optional STS assume role and external ID config for both. Implements token generation, caching, and connection flows, updates configs and docs, adds tests for endpoint parsing, updates dependencies, and provides test tooling with CloudFormation templates and CLI utilities.

Changes

Cohort / File(s)	Summary
AWS IAM common utilities `server/aws_common/iam_auth.go`	Adds IAM token manager with caching and jitter; adds AWS config loader with optional STS assume role and external ID.
Config schema and wiring `server/config/config.go`, `cmd/fleet/serve.go`, `docs/Configuration/fleet-server-configuration.md`	Adds sts_assume_role_arn and sts_external_id to MySQL/Redis config; wires Redis pool fields; documents new options and IAM auto-auth conditions.
MySQL IAM auth (RDS) `server/datastore/mysql/common_mysql/common.go`, `server/datastore/mysql/common_mysql/aws_iam_auth.go`, `server/datastore/mysql/common_mysql/aws_iam_auth_test.go`	Detects RDS endpoints without passwords to use IAM; builds auth tokens; integrates connector-based flow; tests region extraction from endpoints.
Redis IAM auth (ElastiCache) `server/datastore/redis/redis.go`, `server/datastore/redis/aws_iam_auth.go`, `server/datastore/redis/aws_region_map_generated.go`, `server/datastore/redis/gen_aws_region_map.go`, `server/datastore/redis/redis_test.go`	Detects ElastiCache endpoints without passwords to use IAM; generates presigned tokens; forces TLS; adds region map generator and tests for endpoint parsing.
Dependencies `go.mod`	Adds/upgrades AWS RDS auth and MySQL driver deps; adds rdsmysql/v2; adjusts go-tpm direct dependency.
Changelog entries `changes/aws-elasticache-iam-auth`, `changes/aws-rds-iam-auth`	Notes IAM auth support and new STS assume role option for Redis and MySQL.
MySQL IAM test tooling `tools/mysql-tests/rds/*`	Adds CFN template, deploy/cleanup scripts, and CLI tool to validate RDS IAM auth.
Redis IAM test tooling `tools/redis-tests/elasticache/*`	Adds CFN template, deploy/cleanup scripts, and CLI tool to validate ElastiCache IAM auth.

Sequence Diagram(s)

sequenceDiagram
  participant Fleet
  participant Config
  participant AWS as AWS SDK
  participant STS as STS
  participant RDS as RDS MySQL

  Fleet->>Config: Load MySQL config (addr, user, sts_assume_role_arn, external_id)
  alt RDS endpoint and no password
    Fleet->>AWS: Load config (region)
    opt sts_assume_role_arn provided
      AWS->>STS: AssumeRole(external_id?)
      STS-->>AWS: Temp credentials
    end
    Fleet->>AWS: Build RDS IAM auth token
    Fleet->>RDS: Connect using token (TLS)
  else otherwise
    Fleet->>RDS: Connect using standard DSN
  end

sequenceDiagram
  participant Fleet
  participant Config
  participant AWS as AWS SDK
  participant STS as STS
  participant EC as ElastiCache Redis

  Fleet->>Config: Load Redis config (addr, username, sts_assume_role_arn, external_id)
  alt ElastiCache endpoint and no password
    Fleet->>AWS: Load config (region)
    opt sts_assume_role_arn provided
      AWS->>STS: AssumeRole(external_id?)
      STS-->>AWS: Temp credentials
    end
    Fleet->>AWS: Presign elasticache connect URL (token)
    Fleet->>EC: TLS connect using IAM token as password
  else otherwise
    Fleet->>EC: Connect with configured password or no auth
  end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Assessment against linked issues

Objective	Addressed	Explanation
Implement IAM auth for MySQL when RDS endpoint and no password (#1817)	✅
Support sts_assume_role_arn and sts_external_id for MySQL (#1817)	✅
Implement IAM auth for Redis when ElastiCache endpoint and no password (#1817, #20527)	✅
Support sts_assume_role_arn and sts_external_id for Redis (#1817, #20527)	✅

Assessment against linked issues: Out-of-scope changes

Code Change	Explanation
Add STS config options for Firehose, Kinesis, Lambda, SES, S3, packaging in docs (docs/Configuration/fleet-server-configuration.md; entire file)	These services are not part of the stated objectives focused on MySQL and Redis IAM authentication.
Add ElastiCache and RDS CloudFormation templates and deploy/cleanup scripts (tools/redis-tests/elasticache/cf-template.yaml, deploy-test-env.sh, cleanup-test-env.sh; tools/mysql-tests/rds/cf-template.yaml, deploy-test-env.sh, cleanup-test-env.sh; entire files)	Test environment provisioning is not specified in the objectives; it’s auxiliary tooling outside the core product changes.
Add Redis and MySQL IAM test CLI utilities (tools/redis-tests/elasticache/iam_auth.go, tools/mysql-tests/rds/iam_auth.go; entire files)	Standalone test tools are not part of the product requirements; they are supportive but not required by the issue.

✨ Finishing Touches

🧪 Generate unit tests

Create PR with unit tests
Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Status, Documentation and Community

Visit our Status Page to check the current availability of CodeRabbit.
Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

codecov · 2025-07-19T22:39:08Z

Codecov Report

Attention: Patch coverage is 35.27697% with 222 lines in your changes missing coverage. Please review.

Project coverage is 63.93%. Comparing base (5363ce1) to head (d31f4e1).
Report is 3 commits behind head on main.

Files with missing lines	Patch %	Lines
server/datastore/redis/aws_iam_auth.go	52.54%	53 Missing and 3 partials ⚠️
...erver/datastore/mysql/common_mysql/aws_iam_auth.go	23.61%	54 Missing and 1 partial ⚠️
server/aws_common/iam_auth.go	0.00%	48 Missing ⚠️
server/datastore/mysql/common_mysql/common.go	23.52%	32 Missing and 7 partials ⚠️
server/datastore/redis/redis.go	24.13%	19 Missing and 3 partials ⚠️
cmd/fleet/serve.go	0.00%	2 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main   #31075      +/-   ##
==========================================
- Coverage   63.99%   63.93%   -0.06%     
==========================================
  Files        1911     1914       +3     
  Lines      188185   188508     +323     
  Branches     5594     5594              
==========================================
+ Hits       120428   120528     +100     
- Misses      58302    58511     +209     
- Partials     9455     9469      +14

Flag	Coverage Δ
backend	`64.80% <35.27%> (-0.07%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

noahtalerman · 2025-07-28T16:48:58Z

Thanks @titanous! Just a heads up that we prioritized these changes and the changes in your separate PR (#31134).

We peeled off a user story for these changes. This story is now going through drafting. All changes to Fleet are brought through drafting as user stories.

Once the story goes through drafting, we'll have more info you for you on proposed changes/tweaks (if any). Thanks for your patience.

In the meantime, setting this PR to draft. You can follow along on drafting progress in the user story: #31321

titanous · 2025-07-28T16:56:40Z

@noahtalerman thanks, note that this PR (unlike the separate one) covers two issues (linked at the top) that have already been through at least some of your process.

noahtalerman · 2025-08-12T15:27:53Z


 For the address of the Redis server that Fleet should connect to, include the hostname and port.

+If an AWS ElastiCache endpoint is specified and `redis_password` isn't specified, Identity and Access Management (IAM) authentication is automatically used.


@titanous just checking, is this a breaking change for AWS users that don't set redis_password today?

Put differently, will this force these users to add a role in AWS or make another change when they upgrade Fleet?

Same question for MySQL: https://github.com/fleetdm/fleet/pull/31075/files#diff-66314062a452d1931a2ffbfa8b76a0f1e3e2e81c5290c451313194bbc59f902aR26

Ah, interesting. I think it may be possible that there are Elasticache clusters without a password enabled. I don't think it's possible to configure RDS with no password but I'm not 100% certain.

I think we can mitigate this on the Redis side by checking if redis_username is populated with no redis_password, because that config shouldn't exist in the wild and IAM auth requires a username.

I think it may be possible that there are Elasticache clusters without a password enabled

There are! My understanding is that Fleet is doing this for managed-cloud offerings. @rfairburn @BCTBB please correct me if I'm wrong.

Assuming this is true, I don't think we want to force a password (breaking change).

Instead, I think we want to update this behavior: IAM is only used if redis_sts_assume_role_arn is specified.

Meaning, If an AWS ElastiCache endpoint is specified and redis_password isn't specified, Fleet functions as it does today. Identity and Access Management (IAM) authentication isn't automatically used.

What do y'all think?

We can't key off of redis_sts_assume_role_arn, because role assumption is not required (eg. instance/container/pod roles that are ambiently in the environment do not require additional role assumption). I think just adding a check to make sure there is a username specified should handle this case.

Why not just be specific about it and have a "use IAM authentication" config var that will be set to true if it is intended to be used? Assumptions of intention can lead to confusion.

Yeah, makes sense. I was trying to make the UX nice with a happy path but not a big deal to just have a flag.

How does it work today for all other IAM Fleet server config options? ( firehose, kinesis, lambda, etc.)

It's IAM (if IAM is set), and if IAM isn't set fall back to basic auth. Right? We don't have a use IAM auth. I think we want to be consistent for now. I updated the docs to clarify this.

For example:

@noahtalerman currently those will use the environmental credentials (aka IAM auth) if nothing is configured

coderabbitai · 2025-08-13T18:01:17Z

✅ Actions performed

Full review triggered.

coderabbitai

Actionable comments posted: 16

🔭 Outside diff range comments (1)

server/aws_common/iam_auth.go (1)
83-101: Propagate ctx into ConfigureAssumeRoleProvider (fix required)

I verified server/aws_common/aws_common.go: ConfigureAssumeRoleProvider calls aws_config.LoadDefaultConfig(context.Background(), ...), so caller cancellation/deadlines are lost. Thread a context.Context through ConfigureAssumeRoleProvider and update all callers to pass ctx.

Files to change

server/aws_common/aws_common.go — change signature and use ctx when calling LoadDefaultConfig.

Update callers to pass ctx:

server/aws_common/iam_auth.go

server/mail/ses.go

server/logging/firehose.go

server/logging/lambda.go

server/logging/kinesis.go

server/datastore/s3/s3.go

Suggested minimal changes (illustrative)

aws_common.go: add ctx param and use it
-func ConfigureAssumeRoleProvider(
-    conf aws.Config,
-    opts []func(*aws_config.LoadOptions) error,
-    stsAssumeRoleARN,
-    stsExternalID string,
-) (aws.Config, error) {
+func ConfigureAssumeRoleProvider(
+    ctx context.Context,
+    conf aws.Config,
+    opts []func(*aws_config.LoadOptions) error,
+    stsAssumeRoleARN,
+    stsExternalID string,
+) (aws.Config, error) {
@@
-    conf, err := aws_config.LoadDefaultConfig(context.Background(), opts...)
+    conf, err := aws_config.LoadDefaultConfig(ctx, opts...)
Call sites (example in iam_auth.go)
-    cfg, err = ConfigureAssumeRoleProvider(cfg, opts, assumeRoleArn, stsExternalID)
+    cfg, err = ConfigureAssumeRoleProvider(ctx, cfg, opts, assumeRoleArn, stsExternalID)
Reasoning and reference

Propagating context preserves cancellation/timeouts from the caller. See Go context guidelines: https://pkg.go.dev/context

Would you like me to open a small follow-up issue/PR to thread ctx through these call sites?

🧹 Nitpick comments (23)

tools/mysql-tests/rds/README.md (3)
15-16: Clarify region environment variable for AWS CLI

AWS CLI honors AWS_DEFAULT_REGION in addition to AWS_REGION. To avoid confusion across shells and tooling, consider mentioning both.

Apply this diff to expand the example:
-# Set AWS credentials and region
-export AWS_PROFILE=your-profile  # or use AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
-export AWS_REGION=us-east-2
+# Set AWS credentials and region
+export AWS_PROFILE=your-profile  # or use AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
+# The CLI and SDK honor either of these. Set one or both for clarity.
+export AWS_REGION=us-east-2
+export AWS_DEFAULT_REGION=us-east-2
21-26: Add cost and quota disclaimer

A short note helps folks avoid surprises when provisioning multiple RDS instances and an EC2 host.

You could append:
 This will:
 - Create all RDS instances with IAM authentication enabled
 - Create an EC2 instance with necessary IAM permissions
 - Create database users for IAM authentication
 - Output connection information
+
+Note: This creates paid AWS resources. Be mindful of account quotas and costs. Clean up when finished.
27-33: Mention cleanup prerequisites (profile and region)

Make it explicit the same AWS_PROFILE/region must be set to delete the right stack.

A small addition keeps users safe:
 ### Clean Up
 
 ```bash
+# Ensure the same AWS_PROFILE and region used for deployment are set
 ./cleanup-test-env.sh
</blockquote></details>
<details>
<summary>server/config/config.go (1)</summary><blockquote>

`1051-1053`: **Edge case: external ID without an assume role ARN**

If users set sts_external_id without sts_assume_role_arn, the external ID is unused. Consider a gentle validation or log to inform users.

If you want, I can add a small validation step that logs a warning when sts_external_id is set but sts_assume_role_arn is empty.



Also applies to: 1086-1088

</blockquote></details>
<details>
<summary>tools/redis-tests/elasticache/README.md (4)</summary><blockquote>

`5-9`: **Clarify prerequisites (permissions, tooling, and cost notice).**

Small additions improve setup reliability and avoid surprises.

Apply this diff to expand the prerequisites:

```diff
 ## Prerequisites

-- AWS CLI configured with appropriate credentials (via `AWS_PROFILE` or `AWS_ACCESS_KEY_ID`/`AWS_SECRET_ACCESS_KEY`)
+- AWS CLI v2 installed and configured with appropriate credentials (via `AWS_PROFILE` or `AWS_ACCESS_KEY_ID`/`AWS_SECRET_ACCESS_KEY`)
+- An AWS region set (e.g., `AWS_REGION` or `AWS_DEFAULT_REGION`)
+- Sufficient IAM permissions to create and delete resources used by the stack (CloudFormation, EC2/VPC, ElastiCache, IAM, and CloudWatch Logs)
+- Note: Deploying this stack will incur AWS charges. Remember to run cleanup when you are done.
11-20: Add a quick “verify environment” step before deployment.

Helps catch misconfigured credentials or regions early.

Apply this diff to add a short preflight:
 ### 1. Deploy Test Environment

 ```bash
 # Set AWS credentials and region
 export AWS_PROFILE=your-profile  # or use AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
 export AWS_REGION=us-east-2
+
+# Verify identity and region
+aws sts get-caller-identity
+echo "Region: ${AWS_REGION:-$AWS_DEFAULT_REGION}"
 
 # Deploy the test environment
 ./deploy-test-env.sh
---

`22-28`: **Document how to find and use the outputs.**

Readers will need endpoints/hostnames to run the test tools; point them to where deploy script surfaces these.

Apply this diff to clarify outputs:

```diff
 This creates:
 - VPC with subnets and security groups
 - EC2 instance for running tests
 - ElastiCache Serverless instance with IAM auth
 - ElastiCache Standalone replication group with IAM auth
 - IAM users and roles for authentication
+
+On success, the deploy script prints the Redis endpoints and instance details. It also writes the outputs to a local file for convenience (the script logs the path). Use those values with the test tools in this directory.
29-35: Set expectations for teardown timing and completeness.

Make it clear cleanup can take several minutes.

Apply this diff to expand cleanup notes:
 ### 2. Clean Up

 ```bash
 ./cleanup-test-env.sh
-This will destroy the AWS resources.
+This will destroy the AWS resources. Teardown can take 10–20 minutes depending on service state.
</blockquote></details>
<details>
<summary>server/datastore/mysql/common_mysql/aws_iam_auth_test.go (1)</summary><blockquote>

`9-121`: **Parallelize subtests and add a couple of edge cases.**

Running subtests in parallel speeds things up. Adding us-gov-east-1 and a China endpoint with port improves resiliency.

Apply this diff to parallelize and add cases:

```diff
 func TestExtractRDSRegion(t *testing.T) {
+	t.Parallel()
 	tests := []struct {
@@
 	for _, tt := range tests {
 		t.Run(tt.name, func(t *testing.T) {
+			t.Parallel()
 			got, err := extractRDSRegion(tt.endpoint)
@@
 		})
 	}
 }
Add these additional cases to the tests slice:
@@
 		{
 			name:     "GovCloud region",
 			endpoint: "database.abc123.us-gov-west-1.rds.amazonaws.com",
 			want:     "us-gov-west-1",
 			wantErr:  false,
 		},
+		{
+			name:     "GovCloud us-gov-east-1",
+			endpoint: "db123.abc123.us-gov-east-1.rds.amazonaws.com",
+			want:     "us-gov-east-1",
+			wantErr:  false,
+		},
@@
 		{
 			name:     "Endpoint with port should fail",
 			endpoint: "mydb.abc123.us-east-1.rds.amazonaws.com:3306",
 			wantErr:  true,
 		},
+		{
+			name:     "China endpoint with port should fail",
+			endpoint: "mydb.abc123.cn-north-1.rds.amazonaws.com.cn:3306",
+			wantErr:  true,
+		},
server/datastore/redis/redis_test.go (1)
115-250: Parallelize subtests and cover China and GovCloud partitions.

This widens confidence that parseElastiCacheEndpoint handles all AWS partitions.

Apply this diff to parallelize:
 func TestParseElastiCacheEndpoint(t *testing.T) {
+	t.Parallel()
@@
 	for _, tt := range tests {
 		t.Run(tt.name, func(t *testing.T) {
+			t.Parallel()
 			gotRegion, gotCacheName, err := parseElastiCacheEndpoint(tt.endpoint)
Add these cases to the tests slice:
@@
 		// Error cases
+		{
+			name:          "China partition serverless",
+			endpoint:      "prod-cache-xyz789.serverless.cnn1.cache.amazonaws.com.cn",
+			wantRegion:    "cn-north-1",
+			wantCacheName: "prod-cache",
+			wantErr:       false,
+		},
+		{
+			name:          "China partition standalone",
+			endpoint:      "master.cn-cache.abc123.cnnw1.cache.amazonaws.com.cn:6379",
+			wantRegion:    "cn-northwest-1",
+			wantCacheName: "cn-cache",
+			wantErr:       false,
+		},
+		{
+			name:          "GovCloud partition",
+			endpoint:      "gov-cache.abc123.us-gov-west-1.cache.amazonaws.com",
+			wantRegion:    "us-gov-west-1",
+			wantCacheName: "gov-cache",
+			wantErr:       false,
+		},
tools/redis-tests/elasticache/iam_auth.go (1)
12-17: Let -addr be portless (default 6379) and reflect that in the help text.

This makes the tool friendlier with AWS-provided endpoints that often omit ports.

Apply this diff to adjust the flag usage:
-var (
-	addrFlag       = flag.String("addr", "", "ElastiCache endpoint address, including port")
+var (
+	addrFlag       = flag.String("addr", "", "ElastiCache endpoint address (optionally with :port, defaults to 6379)")
 	userFlag       = flag.String("user", "", "Username for IAM authentication")
 	assumeRoleFlag = flag.String("assume-role", "", "STS assume role ARN (optional)")
 	externalIDFlag = flag.String("external-id", "", "STS external ID (optional)")
 )
tools/mysql-tests/rds/iam_auth.go (1)
54-57: Consider logging the final normalized address.

Helps when users pass host:port and the tool overrides the provided -port.

Apply this diff to improve visibility:
-	log.Printf("Connecting to RDS at %s:%s with IAM auth for user %s", *endpointFlag, *portFlag, *userFlag)
+	log.Printf("Connecting to RDS at %s with IAM auth for user %s", mysqlConfig.Address, *userFlag)
tools/redis-tests/elasticache/deploy-test-env.sh (1)
5-7: Remove or use unused variables (ShellCheck SC2034)

VPC_CIDR and SUBNET_CIDR are declared but never used. Either pass them as CloudFormation parameters or remove them to avoid confusion.

Apply one of these diffs.

Option A: remove them
-VPC_CIDR="10.0.0.0/16"
-SUBNET_CIDR="10.0.1.0/24"
Option B: plumb them into the stack (only if cf-template.yaml supports these parameters)
 aws cloudformation create-stack \
   --stack-name "$STACK_NAME" \
   --template-body file://cf-template.yaml \
-  --parameters ParameterKey=KeyName,ParameterValue="$KEY_NAME" ParameterKey=InstanceType,ParameterValue="$INSTANCE_TYPE" \
+  --parameters \
+    ParameterKey=KeyName,ParameterValue="$KEY_NAME" \
+    ParameterKey=InstanceType,ParameterValue="$INSTANCE_TYPE" \
+    ParameterKey=VPCCIDR,ParameterValue="$VPC_CIDR" \
+    ParameterKey=SubnetCIDR,ParameterValue="$SUBNET_CIDR" \
   --capabilities CAPABILITY_NAMED_IAM
server/datastore/redis/redis.go (1)

334-345: Consider logging when IAM auth is used (optional)

A debug log can help operators confirm IAM auth is active. Example: “Using Redis IAM auth for (region )”. This is optional since the code already behaves correctly.
server/datastore/redis/gen_aws_region_map.go (2)
7-8: Fix run instructions in the comment

The filename is gen_aws_region_map.go, but the comment says gen_region_map.go.
-// To run: go run gen_region_map.go
+// To run: go run gen_aws_region_map.go
143-147: Avoid external gofmt invocation; format using go/format instead

Calling the gofmt binary may fail if it’s not available in PATH. Use go/format to format the buffer before writing it.

Example change:
- // Run gofmt on the generated file
- cmd := exec.Command("gofmt", "-w", "aws_region_map_generated.go")
- if err := cmd.Run(); err != nil {
-   log.Printf("Warning: Failed to run gofmt: %v", err)
- }
+ // Optionally format with go/format for robustness
+ // buf := new(bytes.Buffer)
+ // if err := tmpl.Execute(buf, entries); err != nil {
+ //   log.Fatalf("Failed to execute template: %v", err)
+ // }
+ // formatted, err := format.Source(buf.Bytes())
+ // if err != nil {
+ //   log.Printf("Warning: Failed to format generated code: %v", err)
+ //   formatted = buf.Bytes()
+ // }
+ // if _, err := outputFile.Write(formatted); err != nil {
+ //   log.Fatalf("Failed to write output file: %v", err)
+ // }
+ // return
If you prefer to keep the current approach, make the gofmt step best-effort and non-fatal (as you already do).
server/datastore/mysql/common_mysql/aws_iam_auth.go (1)
25-50: Avoid context.Background() for AWS config loading

Using context.Background() prevents cancellation/deadline propagation during STS calls and credential resolution. Prefer accepting a context and passing it through.
-func newAWSIAMAuthTokenGenerator(dbEndpoint, dbUsername, dbPort, region, assumeRoleArn, stsExternalID string) (*awsIAMAuthTokenGenerator, error) {
-  // Load AWS configuration
-  cfg, err := aws_common.LoadAWSConfig(context.Background(), region, assumeRoleArn, stsExternalID)
+func newAWSIAMAuthTokenGenerator(ctx context.Context, dbEndpoint, dbUsername, dbPort, region, assumeRoleArn, stsExternalID string) (*awsIAMAuthTokenGenerator, error) {
+  // Load AWS configuration
+  cfg, err := aws_common.LoadAWSConfig(ctx, region, assumeRoleArn, stsExternalID)
Follow-ups: update call sites to pass a context (e.g., from Connect(ctx) or initialization path).
server/datastore/redis/aws_iam_auth.go (2)
110-149: Handle ports and China regions; add tests for endpoint parsing

Parsing removes the port (good) and locates region via “cache.amazonaws”. Consider explicitly allowing .com.cn suffix (parsing continues to work, but an early validation can improve error messages). Also, rely on a lowercased hostname to avoid case edge cases.
 func parseElastiCacheEndpoint(endpoint string) (region, cacheName string, err error) {
   // Remove port if present
   hostname := endpoint
   if idx := strings.LastIndex(hostname, ":"); idx != -1 {
     hostname = hostname[:idx]
   }
+  hostname = strings.ToLower(hostname)

   // Extract region from the ElastiCache endpoint
   // Format: name.region.cache.amazonaws.com or name.serverless.region.cache.amazonaws.com
   parts := strings.Split(hostname, ".")
   if len(parts) < 4 {
     return "", "", fmt.Errorf("invalid ElastiCache endpoint format for IAM auth: %s", endpoint)
   }

   // Find the region code - it's always before "cache.amazonaws.com"
   var regionCode string
   for i := 0; i < len(parts)-2; i++ {
     if parts[i+1] == "cache" && parts[i+2] == "amazonaws" {
       regionCode = parts[i]
       break
     }
   }
I can add table-driven tests for:

serverless endpoint

cluster node names with numeric suffix

endpoints with ports

China regions (.com.cn)
Would you like me to open a follow-up PR with tests?

151-204: Name extraction heuristics could false-positive on hyphenated IDs

For replication group IDs that legitimately end with -NNN not representing node suffix, this could chop too much. If this becomes an issue, consider checking the next label for exact replication group match only for cluster-mode patterns; else, do not strip.

Add unit tests demonstrating:

replication-group-id-202 read as full ID when not followed by same ID in next label

cluster mode replication-group-id-001.replication-group-id... collapses to replication-group-id
tools/mysql-tests/rds/deploy-test-env.sh (1)

81-91: Be careful storing plaintext DB credentials

test-env-info.txt stores DB_PASSWORD. If this is only for ephemeral testing, that’s okay, but please add a prominent warning and .gitignore protection for this file.

Add test-env-info.txt to .gitignore and print a reminder that it contains secrets.

tools/mysql-tests/rds/cf-template.yaml (2)

406-425: Pinning AMI via SSM is good; consider installing mysql-client instead of mariadb105 on AL2023

Not critical, but mysql client may more closely match some user environments. mariadb105 is fine for tests.

426-465: Fix YAML linting issues (trailing spaces, EOF newline)

There are trailing spaces at several lines and no newline at EOF per YAMLlint. Please clean these up to reduce noise in diffs.

You can auto-fix with a formatter (e.g., yamlfmt) or an editor setting to trim trailing whitespace and ensure a final newline.
server/aws_common/iam_auth.go (1)
49-51: Optional: avoid predictable jitter without global rand seeding

math/rand without seeding yields a deterministic sequence per process. It’s not a security issue here, but if you want more distributed refresh across pods, consider seeding once or using time-based jitter.
-	jitter := time.Duration(rand.Int63n(int64(maxJitter))) //nolint:gosec // jitter doesn't need cryptographic randomness
+	// Not security-sensitive; seed once in init() or derive from time for better distribution.
+	jitter := time.Duration(time.Now().UnixNano()%int64(maxJitter)) //nolint:gosec

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5363ce1 and 6e8b900.

⛔ Files ignored due to path filters (1)

go.sum is excluded by !**/*.sum

📒 Files selected for processing (25)

changes/aws-elasticache-iam-auth (1 hunks)
changes/aws-rds-iam-auth (1 hunks)
cmd/fleet/serve.go (1 hunks)
docs/Configuration/fleet-server-configuration.md (14 hunks)
go.mod (4 hunks)
server/aws_common/iam_auth.go (1 hunks)
server/config/config.go (6 hunks)
server/datastore/mysql/common_mysql/aws_iam_auth.go (1 hunks)
server/datastore/mysql/common_mysql/aws_iam_auth_test.go (1 hunks)
server/datastore/mysql/common_mysql/common.go (5 hunks)
server/datastore/redis/aws_iam_auth.go (1 hunks)
server/datastore/redis/aws_region_map_generated.go (1 hunks)
server/datastore/redis/gen_aws_region_map.go (1 hunks)
server/datastore/redis/redis.go (3 hunks)
server/datastore/redis/redis_test.go (1 hunks)
tools/mysql-tests/rds/README.md (1 hunks)
tools/mysql-tests/rds/cf-template.yaml (1 hunks)
tools/mysql-tests/rds/cleanup-test-env.sh (1 hunks)
tools/mysql-tests/rds/deploy-test-env.sh (1 hunks)
tools/mysql-tests/rds/iam_auth.go (1 hunks)
tools/redis-tests/elasticache/README.md (1 hunks)
tools/redis-tests/elasticache/cf-template.yaml (1 hunks)
tools/redis-tests/elasticache/cleanup-test-env.sh (1 hunks)
tools/redis-tests/elasticache/deploy-test-env.sh (1 hunks)
tools/redis-tests/elasticache/iam_auth.go (1 hunks)

🧰 Additional context used

🧠 Learnings (2)

📚 Learning: 2025-08-08T07:40:05.301Z

Learnt from: getvictor
PR: fleetdm/fleet#31726
File: server/datastore/mysql/labels_test.go:2031-2031
Timestamp: 2025-08-08T07:40:05.301Z
Learning: Fleet repo targets Go 1.24.5 (root go.mod), which supports testing.T.Context(). Do not flag usage of t.Context() or suggest replacing it with context.Background() in tests (e.g., server/datastore/mysql/labels_test.go Line 2031 and similar).

Applied to files:

tools/mysql-tests/rds/iam_auth.go

📚 Learning: 2025-08-08T07:40:05.301Z

Learnt from: getvictor
PR: fleetdm/fleet#31726
File: server/datastore/mysql/labels_test.go:2031-2031
Timestamp: 2025-08-08T07:40:05.301Z
Learning: In fleetdm/fleet repository tests (server/datastore/mysql/labels_test.go and similar), using testing.T.Context() is valid because the project targets a recent Go version where testing.T.Context() exists. Do not suggest replacing t.Context() with context.Background() in this codebase.

Applied to files:

tools/mysql-tests/rds/iam_auth.go

🧬 Code Graph Analysis (5)

server/datastore/mysql/common_mysql/aws_iam_auth.go (1)

server/aws_common/iam_auth.go (3)

IAMAuthTokenManager (30-37)

LoadAWSConfig (84-100)

NewIAMAuthTokenManager (40-44)

tools/redis-tests/elasticache/iam_auth.go (1)

server/datastore/redis/redis.go (2)

NewPool (96-116)

PoolConfig (63-92)

tools/mysql-tests/rds/iam_auth.go (2)

server/config/config.go (1)

MysqlConfig (37-55)

server/datastore/mysql/common_mysql/common.go (2)

DBOptions (20-30)

NewDB (32-117)

server/datastore/redis/aws_iam_auth.go (1)

server/aws_common/iam_auth.go (3)

IAMAuthTokenManager (30-37)

LoadAWSConfig (84-100)

NewIAMAuthTokenManager (40-44)

server/aws_common/iam_auth.go (1)

server/aws_common/aws_common.go (1)

ConfigureAssumeRoleProvider (19-40)

🪛 Shellcheck (0.10.0)

tools/redis-tests/elasticache/deploy-test-env.sh

[warning] 5-5: VPC_CIDR appears unused. Verify use (or export if used externally).

(SC2034)

[warning] 6-6: SUBNET_CIDR appears unused. Verify use (or export if used externally).

(SC2034)

tools/mysql-tests/rds/deploy-test-env.sh

[warning] 100-100: engine appears unused. Verify use (or export if used externally).

(SC2034)

[warning] 106-106: Quote 'EOF' to make here document expansions happen on the server side rather than on the client.

(SC2087)

🪛 Checkov (3.2.334)

tools/mysql-tests/rds/cf-template.yaml

[HIGH] 288-307: Ensure all data stored in Aurora is securely encrypted at rest

(CKV_AWS_96)

[HIGH] 321-336: Ensure all data stored in Aurora is securely encrypted at rest

(CKV_AWS_96)

🪛 YAMLlint (1.37.1)

tools/mysql-tests/rds/cf-template.yaml

[error] 213-213: trailing spaces

(trailing-spaces)

[error] 214-214: trailing spaces

(trailing-spaces)

[error] 217-217: trailing spaces

(trailing-spaces)

[error] 220-220: trailing spaces

(trailing-spaces)

[error] 223-223: trailing spaces

(trailing-spaces)

[error] 260-260: trailing spaces

(trailing-spaces)

[error] 262-262: trailing spaces

(trailing-spaces)

[error] 265-265: trailing spaces

(trailing-spaces)

[error] 268-268: trailing spaces

(trailing-spaces)

[error] 271-271: trailing spaces

(trailing-spaces)

[error] 375-375: trailing spaces

(trailing-spaces)

[error] 465-465: no new line character at the end of file

(new-line-at-end-of-file)

🔇 Additional comments (29)

go.mod (4)

24-24: Adding aws-sdk-go-v2 RDS auth feature is appropriate

This is the right module for generating IAM auth tokens for RDS/MySQL. It aligns with the rest of the v2 SDK usage in the repo.

65-65: Upgrade to go-sql-driver/mysql v1.9.1 looks good

This version is compatible and stable. No known breaking changes with common DSN options used by Fleet.

122-122: rdsmysql/v2 addition fits the RDS IAM flow

This is a good choice when pairing with go-sql-driver/mysql and AWS SDK v2. Please ensure it’s not duplicating token generation paths with feature/rds/auth (it usually wraps SDK calls under the hood).

74-74: go-tpm is imported in securehw_linux.go — direct dependency is justified

I ran the repo search and found direct imports of github.com/google/go-tpm in ee/orbit/pkg/securehw/securehw_linux.go, so keeping the module in go.mod is appropriate.

ee/orbit/pkg/securehw/securehw_linux.go

lines 18–20: imports "github.com/google/go-tpm/tpm2", "github.com/google/go-tpm/tpm2/transport", "github.com/google/go-tpm/tpm2/transport/linuxtpm"

go.mod: line 74 includes github.com/google/go-tpm v0.9.5 (go.sum also contains matching entries)

No change required here.

server/config/config.go (6)

38-55: MySQL config gains STS assume role settings — looks good

The new fields and yaml tags are consistent with existing AWS config patterns.

64-65: Redis config gains STS assume role settings — looks good

This mirrors the MySQL settings and other AWS service configs.

1051-1053: Expose mysql.sts_assume_role_arn and mysql.sts_external_id — good addition

Naming is consistent and descriptions are clear.

1086-1088: Expose redis.sts_assume_role_arn and redis.sts_external_id — good addition

This matches the config style of other AWS integrations.

1482-1499: Wire STS fields into loaded MySQL config — good

The loader picks up both fields for primary and read replica configs.

1529-1531: Wire STS fields into loaded Redis config — good

This enables IAM auth flows to access assume-role details.

tools/redis-tests/elasticache/cf-template.yaml (1)

1-302: CloudFormation template looks well-structured for testing IAM authentication.

This template provides a comprehensive test environment for ElastiCache IAM authentication with both serverless and standalone configurations. The IAM roles and policies are properly configured to test different authentication scenarios including direct access and role assumption.

A few observations:

The security group setup correctly allows Redis traffic between EC2 and ElastiCache

Both user types (fleet-test-user and test-assume-user) are configured for IAM authentication

The assume role configuration includes proper trust relationships and conditions

docs/Configuration/fleet-server-configuration.md (5)

26-26: Clear documentation for automatic IAM usage.

The documentation clearly explains when IAM authentication is automatically used for MySQL connections, which helps users understand the behavior without needing to configure additional flags.

201-223: Well-documented MySQL STS configuration options.

The new MySQL STS configuration options are clearly documented with appropriate examples and descriptions. The pairing guidance for external ID usage is helpful for security best practices.

242-242: Important note about automatic IAM usage for Redis.

Based on the past review comments, there was discussion about whether this automatic behavior could be breaking for existing users. The current approach seems reasonable - if an ElastiCache endpoint is specified without a password, IAM auth is used automatically.

However, I noticed from the past comments that there was consideration for making this behavior more explicit. You might want to verify this doesn't break existing deployments where ElastiCache clusters exist without passwords but users don't intend to use IAM auth.

Can you confirm whether there are any managed cloud offerings or existing deployments that might be affected by this automatic IAM usage when redis_password is not specified?

300-323: Comprehensive Redis STS configuration documentation.

The Redis STS configuration options are well-documented with clear examples and consistent formatting that matches the MySQL options. This provides users with the flexibility to configure role assumption for ElastiCache authentication.

1403-1428: Consistent STS configuration across all AWS services.

The STS assume role and external ID configuration has been consistently added across all AWS services (Firehose, Kinesis, Lambda, SES, S3). The documentation follows the same pattern and provides clear examples for each service.

Also applies to: 1551-1576, 1699-1724, 2038-2061, 2146-2169

changes/aws-rds-iam-auth (1)

1-1: Clear and concise changelog entry.

The changelog entry accurately describes the new RDS IAM authentication support and mentions the key configuration option for role assumption.

cmd/fleet/serve.go (1)

269-270: Redis pool configuration updated with STS fields.

The addition of StsAssumeRoleArn and StsExternalID fields to the Redis pool configuration properly wires the new IAM authentication options from the config into the Redis connection setup.

changes/aws-elasticache-iam-auth (1)

1-1: Clear changelog entry for ElastiCache IAM support.

The changelog entry accurately describes the new ElastiCache IAM authentication feature and mentions the STS role assumption capability.

server/datastore/mysql/common_mysql/aws_iam_auth_test.go (1)

9-121: LGTM: solid coverage across RDS, Aurora, proxy, China, and GovCloud.

Nice job covering common and tricky endpoint shapes, plus strict error assertions.

server/datastore/redis/redis_test.go (1)

115-250: LGTM: practical endpoint coverage with serverless, standalone, cluster, and hyphenated names.

The table tests map short region codes to full regions well and cover port/no-port forms.

server/datastore/redis/aws_region_map_generated.go (1)

5-40: Generated region map looks good

The mapping covers current regions and follows the intended zone-prefix-to-region approach. Keeping this unexported inside the package is appropriate.
server/datastore/mysql/common_mysql/common.go (2)

136-152: Good TLS and cleartext handling for IAM/RDS

Setting tls=rdsmysql for direct RDS and tls=true for RDS Proxy, plus allowCleartextPasswords=true, matches AWS guidance for IAM auth with go-sql-driver/mysql.

142-149: No change needed — isRDSProxyEndpoint accepts host:port

I verified the detector: isRDSProxyEndpoint checks substrings in the full address, so passing conf.Address (which may include :port) still works. The suggested change is unnecessary.
Location: server/datastore/mysql/common_mysql/aws_iam_auth.go (lines ~137–141)
Function:
// isRDSProxyEndpoint checks if the given address is an RDS Proxy endpoint
func isRDSProxyEndpoint(address string) bool {
    // RDS Proxy endpoints have the format: proxy-name.proxy-xxxxxxxxx.region.rds.amazonaws.com
    return strings.Contains(address, ".proxy-") && strings.Contains(address, ".rds.amazonaws.")
}
Action: ignore the proposed diff; no code change required.

Likely an incorrect or invalid review comment.
server/datastore/redis/redis.go (1)

278-296: Solid IAM detection and TLS enforcement for ElastiCache

Auto-enabling TLS for IAM auth and generating the token at dial time is correct. Appending redis.DialPassword(token) with redis.DialUsername set earlier ensures AUTH happens as expected.

server/datastore/mysql/common_mysql/aws_iam_auth.go (2)

58-66: LGTM on token generation

Using rds/auth.BuildAuthToken with host:port, region, username, and credentials is correct. Error wrapping is clear.

137-141: Proxy endpoint detection looks good

The pattern check for .proxy- and .rds.amazonaws. is a pragmatic and safe detection.

server/datastore/redis/aws_iam_auth.go (1)

62-92: Presigning flow for ElastiCache IAM auth looks correct

You’re building a GET with Action=connect and User=..., setting X-Amz-Expires=900, and presigning with service “elasticache”. Trimming the https:// prefix matches AWS guidance.

server/aws_common/iam_auth.go (1)

46-81: Token caching with jitter looks solid

The double-checked lock with RWMutex is correct, and the 10m+jitter refresh window is appropriate for 15m tokens.

coderabbitai · 2025-08-13T18:14:34Z

+// extractRDSRegion extracts the AWS region from an RDS endpoint
+func extractRDSRegion(endpoint string) (string, error) {
+	// RDS endpoint formats:
+	// - instance-name.abcdefg.region.rds.amazonaws.com (regular RDS)
+	// - cluster-name.cluster-abcdefg.region.rds.amazonaws.com (Aurora cluster endpoint)
+	// - cluster-name.cluster-ro-abcdefg.region.rds.amazonaws.com (Aurora read-only endpoint)
+	// - proxy-name.proxy-abcdefg.region.rds.amazonaws.com (RDS Proxy endpoint)
+	// - instance-name.abcdefg.region.rds.amazonaws.com.cn (China regions)
+
+	parts := strings.Split(endpoint, ".")
+	if len(parts) < 5 {
+		return "", fmt.Errorf("invalid RDS endpoint format for IAM auth: %s", endpoint)
+	}
+
+	// Check if it's an RDS endpoint
+	if !strings.Contains(endpoint, ".rds.amazonaws.com") {
+		return "", fmt.Errorf("endpoint does not appear to be an RDS endpoint: %s", endpoint)
+	}
+
+	// Find the region - it's always before "rds.amazonaws.com"
+	for i := 0; i < len(parts)-3; i++ {
+		if parts[i+1] == "rds" && parts[i+2] == "amazonaws" && (parts[i+3] == "com" || parts[i+3] == "com.cn") {
+			return parts[i], nil
+		}
+	}
+
+	return "", fmt.Errorf("could not extract region from RDS endpoint: %s", endpoint)
+}


💡 Verification agent

🧩 Analysis chain

Harden region parsing; strip port if present and validate suffixes

If a host:port is passed, region parsing will fail. Also, you only validate against .com/.com.cn later but not before stripping the port.

func extractRDSRegion(endpoint string) (string, error) { + // Remove port if present + if idx := strings.LastIndex(endpoint, ":"); idx != -1 { + endpoint = endpoint[:idx] + } // RDS endpoint formats: // - instance-name.abcdefg.region.rds.amazonaws.com (regular RDS) // - cluster-name.cluster-abcdefg.region.rds.amazonaws.com (Aurora cluster endpoint) // - cluster-name.cluster-ro-abcdefg.region.rds.amazonaws.com (Aurora read-only endpoint) // - proxy-name.proxy-abcdefg.region.rds.amazonaws.com (RDS Proxy endpoint) // - instance-name.abcdefg.region.rds.amazonaws.com.cn (China regions) parts := strings.Split(endpoint, ".") if len(parts) < 5 { return "", fmt.Errorf("invalid RDS endpoint format for IAM auth: %s", endpoint) } // Check if it's an RDS endpoint - if !strings.Contains(endpoint, ".rds.amazonaws.com") { + if !(strings.Contains(endpoint, ".rds.amazonaws.com") || strings.Contains(endpoint, ".rds.amazonaws.com.cn")) { return "", fmt.Errorf("endpoint does not appear to be an RDS endpoint: %s", endpoint) } // Find the region - it's always before "rds.amazonaws.com" for i := 0; i < len(parts)-3; i++ { - if parts[i+1] == "rds" && parts[i+2] == "amazonaws" && (parts[i+3] == "com" || parts[i+3] == "com.cn") { + if parts[i+1] == "rds" && parts[i+2] == "amazonaws" && (parts[i+3] == "com" || parts[i+3] == "com.cn") { return parts[i], nil } }

To confirm coverage, consider adding tests with host:port and .com.cn variants.

🏁 Script executed:

#!/bin/bash set -euo pipefail # Find the file containing extractRDSRegion rg -n "func extractRDSRegion" || true # Print the file if it exists FILE=$(rg -n --files-with-match "func extractRDSRegion" || true) if [ -n "$FILE" ]; then echo "---- $FILE ----" sed -n '1,240p' "$FILE" else echo "extractRDSRegion not found in repository." fi

Length of output: 797

🏁 Script executed:

#!/bin/bash set -euo pipefail FILE="server/datastore/mysql/common_mysql/aws_iam_auth.go" if [ -f "$FILE" ]; then echo "---- $FILE (lines 1-240) ----" sed -n '1,240p' "$FILE" else echo "file not found: $FILE" exit 1 fi

Length of output: 4984

Harden region parsing; strip port and accept .com.cn

If a host:port is passed the current parsing can fail (the port ends up in the last label). Also the function only checks for ".rds.amazonaws.com" up front, rejecting China endpoints even though the loop accepts "com.cn".

Files to change

server/datastore/mysql/common_mysql/aws_iam_auth.go — extractRDSRegion

Suggested patch (keep it small and safe):

func extractRDSRegion(endpoint string) (string, error) { + // Remove port if present + if idx := strings.LastIndex(endpoint, ":"); idx != -1 { + endpoint = endpoint[:idx] + } // RDS endpoint formats: // - instance-name.abcdefg.region.rds.amazonaws.com (regular RDS) // - cluster-name.cluster-abcdefg.region.rds.amazonaws.com (Aurora cluster endpoint) // - cluster-name.cluster-ro-abcdefg.region.rds.amazonaws.com (Aurora read-only endpoint) // - proxy-name.proxy-abcdefg.region.rds.amazonaws.com (RDS Proxy endpoint) // - instance-name.abcdefg.region.rds.amazonaws.com.cn (China regions) parts := strings.Split(endpoint, ".") if len(parts) < 5 { return "", fmt.Errorf("invalid RDS endpoint format for IAM auth: %s", endpoint) } // Check if it's an RDS endpoint - if !strings.Contains(endpoint, ".rds.amazonaws.com") { + if !(strings.Contains(endpoint, ".rds.amazonaws.com") || strings.Contains(endpoint, ".rds.amazonaws.com.cn")) { return "", fmt.Errorf("endpoint does not appear to be an RDS endpoint: %s", endpoint) }

Please add unit tests covering:

host:port endpoints (e.g. example.region.rds.amazonaws.com:3306)

China regions (.rds.amazonaws.com.cn)

RDS Proxy / cluster endpoint shapes

Thanks — this will make IAM token generation robust for ports and China regions.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

// extractRDSRegion extracts the AWS region from an RDS endpoint

func extractRDSRegion(endpoint string) (string, error) {

// RDS endpoint formats:

// - instance-name.abcdefg.region.rds.amazonaws.com (regular RDS)

// - cluster-name.cluster-abcdefg.region.rds.amazonaws.com (Aurora cluster endpoint)

// - cluster-name.cluster-ro-abcdefg.region.rds.amazonaws.com (Aurora read-only endpoint)

// - proxy-name.proxy-abcdefg.region.rds.amazonaws.com (RDS Proxy endpoint)

// - instance-name.abcdefg.region.rds.amazonaws.com.cn (China regions)

parts := strings.Split(endpoint, ".")

if len(parts) < 5 {

return "", fmt.Errorf("invalid RDS endpoint format for IAM auth: %s", endpoint)

}

// Check if it's an RDS endpoint

if !strings.Contains(endpoint, ".rds.amazonaws.com") {

return "", fmt.Errorf("endpoint does not appear to be an RDS endpoint: %s", endpoint)

}

// Find the region - it's always before "rds.amazonaws.com"

for i := 0; i < len(parts)-3; i++ {

if parts[i+1] == "rds" && parts[i+2] == "amazonaws" && (parts[i+3] == "com" || parts[i+3] == "com.cn") {

return parts[i], nil

}

}

return "", fmt.Errorf("could not extract region from RDS endpoint: %s", endpoint)

}

// extractRDSRegion extracts the AWS region from an RDS endpoint

func extractRDSRegion(endpoint string) (string, error) {

// Remove port if present

if idx := strings.LastIndex(endpoint, ":"); idx != -1 {

endpoint = endpoint[:idx]

}

// RDS endpoint formats:

// - instance-name.abcdefg.region.rds.amazonaws.com (regular RDS)

// - cluster-name.cluster-abcdefg.region.rds.amazonaws.com (Aurora cluster endpoint)

// - cluster-name.cluster-ro-abcdefg.region.rds.amazonaws.com (Aurora read-only endpoint)

// - proxy-name.proxy-abcdefg.region.rds.amazonaws.com (RDS Proxy endpoint)

// - instance-name.abcdefg.region.rds.amazonaws.com.cn (China regions)

parts := strings.Split(endpoint, ".")

if len(parts) < 5 {

return "", fmt.Errorf("invalid RDS endpoint format for IAM auth: %s", endpoint)

}

// Check if it's an RDS endpoint

if !(strings.Contains(endpoint, ".rds.amazonaws.com") || strings.Contains(endpoint, ".rds.amazonaws.com.cn")) {

return "", fmt.Errorf("endpoint does not appear to be an RDS endpoint: %s", endpoint)

}

// Find the region - it's always before "rds.amazonaws.com"

for i := 0; i < len(parts)-3; i++ {

if parts[i+1] == "rds" && parts[i+2] == "amazonaws" && (parts[i+3] == "com" || parts[i+3] == "com.cn") {

return parts[i], nil

}

}

return "", fmt.Errorf("could not extract region from RDS endpoint: %s", endpoint)

}

🤖 Prompt for AI Agents

In server/datastore/mysql/common_mysql/aws_iam_auth.go around lines 73-100, the extractRDSRegion function fails for host:port inputs and currently rejects China endpoints up front; update it to first strip any ":port" (use net.SplitHostPort or if that fails, trim at last ':' when there are no IPv6 brackets) so parsing works with host:port, broaden the upfront endpoint check to accept both ".rds.amazonaws.com" and ".rds.amazonaws.com.cn" (or check labels instead of a single substring), then split the cleaned host by '.' and locate the region label immediately before the "rds","amazonaws","com" or "rds","amazonaws","com","cn" sequence as the loop already does; add unit tests for host:port (example.region.rds.amazonaws.com:3306), China endpoints (example.region.rds.amazonaws.com.cn), and proxy/cluster shapes to verify the region extraction.

coderabbitai · 2025-08-13T18:14:34Z

+	var db *sqlx.DB
+	if useIAMAuth {
+		connector := &awsIAMAuthConnector{
+			driverName: driverName,
+			baseDSN:    generateMysqlConnectionString(*conf),
+			tokenGen:   iamTokenGen,
+			logger:     opts.Logger,
+		}
+		db = sqlx.NewDb(sql.OpenDB(connector), driverName)
+	} else {


💡 Verification agent

🧩 Analysis chain

IAM connector likely bypasses sqlmw/APM/OTEL driver instrumentation

Switching to sqlx.NewDb(sql.OpenDB(connector), driverName) means the registered wrapper drivers (mysql-mw for sqlmw, apm/mysql, or the OpenTelemetry driver) may not wrap connections created by awsIAMAuthConnector. Passing driverName to sqlx.NewDb only affects sqlx’s dialect selection, not runtime instrumentation.

Risk:

Loss of DB query traces/metrics/logging on the IAM path.

sqlmw interceptors won’t run.

Paths to fix:

Ensure awsIAMAuthConnector internally builds a driver.Connector using the wrapped driver (e.g., the otel/apm/sqlmw registered one), not the bare mysql driver.

If you use otelsql, prefer using its connector-aware APIs (e.g., OpenDB/NewConnector) to wrap your connector.

Alternatively, refactor awsIAMAuthConnector to delegate to the wrapped driver name via OpenConnector on the registered driver.

I can help adapt awsIAMAuthConnector once I see its implementation.

Run this quick scan to assess whether wrapped drivers are still used on the IAM path:

🏁 Script executed:

#!/bin/bash # Look for otel/apm/sqlmw registration and usage patterns rg -n "otelsql|apm/mysql|sqlmw\.Driver|Register\(" -A 2 # Inspect the IAM connector implementation fd aws_iam_auth\.go | xargs -I {} sh -c 'echo "--- {} ---"; sed -n "1,200p" "{}"'

Length of output: 18296

🏁 Script executed:

#!/bin/bash set -euo pipefail echo "----- server/datastore/mysql/mysql.go -----" sed -n '1,420p' server/datastore/mysql/mysql.go || true echo echo "----- server/datastore/mysql/common_mysql/common.go -----" sed -n '1,240p' server/datastore/mysql/common_mysql/common.go || true

Length of output: 20131

🌐 Web query:

XSAM/otelsql NewConnector otelsql.NewConnector otelsql NewConnector driver.Connector otelsql OpenDB NewConnector ngrok sqlmw connector OpenConnector NewConnector sqlmw NewConnector github apm/mysql connector OpenConnector database/sql sql.OpenDB driver.Connector behavior sqlx.NewDb driverName effect instrumentation

💡 Result:

otelsql.NewConnector (from XSAM/otelsql) creates a new OpenTelemetry-instrumented database connector that implements the Go driver.Connector interface, allowing its use with otelsql.OpenDB just as standard connectors are used with database/sql.OpenDB[3][9][1]. This enables tracing and observability for any Go SQL driver that is instrumented using otelsql. The patterns for connector support, wrapping, and general driver.Connector usage are as follows:

Instrumentation Flow for otelsql and Connectors

otelsql.NewConnector returns a driver.Connector-compatible object that wraps an underlying DSN and driver with OpenTelemetry instrumentation. This is similar to how standard Go SQL driver connectors behave, but traces DB operations[3][9].

otelsql.OpenDB(connector) returns an instrumented *sql.DB object using the provided OpenTelemetry-wrapped connector[9][3].

otelsql.Open(driverName, dsn, ...opts) is a convenience for direct OpenTelemetry-instrumented DB access. This mirrors the behavior of sql.Open but with otelsql tracing, and does not require manual connector creation[7][9].

How driver.Connector, sql.OpenDB, and Wrappers Interact

driver.Connector: The Go interface for advanced connection pooling and configuration reuse. It allows DSN parsing only once, and enables (but does not require) context support for dialing[1].

sql.OpenDB(connector): This Go function establishes an *sql.DB instance from a driver.Connector, rather than from a driver name and DSN string[1].

All instrumented database wrappers (e.g., otelsql, sqlmw, APM/otel, etc.) that wish to fully support OpenTelemetry, interception, or other middleware need to implement, or wrap, this interface.

Middleware Example: ngrok/sqlmw

ngrok/sqlmw provides middleware/wrapping for Go SQL drivers, but does not expose a dedicated NewConnector helper. Instead, you wrap the base driver and register it under a new name (e.g. "postgres-mw") with sql.Register. Instrumentation/interception is handled at the driver level, not via a connector creation helper[4][6].

With sqlmw, you use sql.Open (not OpenDB(connector)) because the driver is already wrapped and registered under a custom name[4].

APM/Mysql Connector Pattern

Many APM SQL wrappers (e.g., OpenTelemetry, Elastic APM) also provide either a custom OpenConnector implementation or expose their own connector-creation helper. Usage pattern:

Create a connector via e.g. apmmysql.NewConnector(...)

Pass this connector to standard sql.OpenDB(connector) for maximum composition and flexibility.

Key behaviors and recommendations

Approach Connector Support Usage OpenTelemetry Instrumented? Notes

otelsql.OpenDB Yes, via Connector db := otelsql.OpenDB(connector) Yes Connector is OTel-instrumented and composable.

otelsql.Open Yes, via DSN db := otelsql.Open(driver, dsn, ...) Yes Easiest, but less composable than OpenDB.

sqlmw.Driver No Connector API Register wrapped driver; sql.Open Interceptor/middleware Used for function-level interception.

apmmysql.OpenDB/Connector Yes db := sql.OpenDB(connector) Yes For monitoring/metrics, follows same pattern.

database/sql Yes db := sql.OpenDB(connector) No Standard library without telemetry.

sqlx.NewDb Yes, but pass driver name and *sql.DB sqlx.NewDb(db, driverName) Follows underlying *sql.DB Use after wrapping or instrumenting.

Special Notes on sqlx.NewDb

sqlx.NewDb(db *sql.DB, driverName string) does not itself perform instrumentation—instead, it adapts an already-opened, possibly instrumented, *sql.DB for use with sqlx extensions.

The driverName is only for sqlx's internal metadata/tracing; the actual DB behavior (instrumented or not) depends on how you created the *sql.DB (i.e., via otelsql.OpenDB, otelsql.Open, sqlmw, etc.)[1][9].

Summary: To instrument SQL drivers with tracing and metrics using otelsql or similar libraries, use their connector construction (e.g., otelsql.NewConnector) or driver wrapping helper, create a connector or registered driver, and use either OpenDB(connector) or Open(driver, dsn) as appropriate. Avoid using raw driver connections if you want consistent instrumentation and metrics across all operations[3][9][7][4].

Citations:

1: https://golang.bg/pkg/database/sql/driver/

2: https://www.juniper.net/us/en/threatlabs/ips-signatures/detail.HTTP:SQL:INJ:DLNK-WIFI-MNGR-CMW.html

3: https://pkg.go.dev/github.com/XSAM/otelsql

4: https://github.com/ngrok/sqlmw

5: https://opentelemetry.io/docs/collector/building/connector/

6: https://groups.google.com/g/golang-nuts/c/Vkvf-Rrl0aQ

7: https://opentelemetry.io/blog/2024/getting-started-with-otelsql/

8: https://seatunnel.apache.org/docs/connector-v2/sink/Jdbc/

9: https://pkg.go.dev/github.com/uptrace/opentelemetry-go-extra/otelsql

10: https://github.com/denisenkom/go-mssqldb/blob/master/newconnector_example_test.go

Fix IAM connector: it currently bypasses otel/apm/sqlmw instrumentation

I verified the code paths. The IAM connector builds and returns the raw mysql connector/driver, so registered wrappers (otelsql/apm/mysql/sqlmw) are skipped on the IAM path — traces and sqlmw interceptors will be lost.

Files to fix

server/datastore/mysql/common_mysql/common.go — NewDB: when useIAMAuth is true you construct awsIAMAuthConnector and call sqlx.NewDb(sql.OpenDB(connector), driverName).

server/datastore/mysql/common_mysql/aws_iam_auth.go — awsIAMAuthConnector.Connect uses mysql.NewConnector(cfg) and Driver() returns mysql.MySQLDriver{}.

Why this matters (very short)

sql.OpenDB(connector) uses the connector.Driver() / connector.Connect() implementations you provide. awsIAMAuthConnector currently constructs a connector using the raw mysql driver, so instrumentation registered under driverName (otelsql/apm/mysql/mysql-mw) is not used.

Suggested fixes (pick one)

Preferred: build the base mysql connector (mysql.NewConnector(cfg)) and wrap that connector with the instrumentation's connector API before returning Connect. For example (pseudo):

base := mysql.NewConnector(cfg)

if driverName == otelTracedDriverName { conn := otelsql.NewConnector(base); return conn.Connect(ctx) }

else if driverName == "apm/mysql" { conn := apmmysql.NewConnector(base); ... }

fallback: return base.Connect(ctx)
See otelsql docs: https://pkg.go.dev/github.com/XSAM/otelsql and database/sql OpenDB: https://pkg.go.dev/database/sql#OpenDB

Alternative: when creating the awsIAMAuthConnector in NewDB, pass an actual instrumented driver/connector (or a factory) instead of just driverName, and delegate Connect()/Driver() to that injected connector so the wrapper is preserved.

If using sqlmw (which doesn't provide a connector helper), ensure the wrapped driver is registered and create a connector that delegates to that wrapped driver (or move token injection into a Dialer that the wrapped driver will use).

Minimal pseudo-diff (problem area)
Original (current):

connector := &awsIAMAuthConnector{ driverName: driverName, baseDSN: generateMysqlConnectionString(*conf), ... } db = sqlx.NewDb(sql.OpenDB(connector), driverName)

Suggested approach (pseudo):

cfg := mysql.ParseDSN(...) base := mysql.NewConnector(cfg) var conn driver.Connector = base if driverName == otelTracedDriverName { conn = otelsql.NewConnector(base) // wrap the base connector } db = sqlx.NewDb(sql.OpenDB(conn), driverName)

I can prepare a concrete patch for otelsql / apm / sqlmw cases if you want — the fix is small but needs to use the wrapper's connector APIs or inject the wrapped connector at creation time.

Interesting... how does sqlx.Open() handle this? Surely it's not checking the driver name against all possible instrumentation wrappers.

🧩 Analysis chain

@sgress454 Great question! You're absolutely right - sqlx.Open() doesn't check against all possible instrumentation wrappers. Here's how it actually works:

How sqlx.Open() handles instrumentation

sqlx.Open(driverName, dsn) internally calls sql.Open(driverName, dsn), which:

Looks up the driver by name in Go's global driver registry (sql.drivers)

Uses whatever driver was registered under that name

The instrumentation libraries register their wrapped drivers at startup:

// sqlmw approach sql.Register("mysql-mw", sqlmw.Driver(mysql.MySQLDriver{}, interceptor)) // otel approach sql.Register("mysql-otel", otelsql.Wrap(mysql.MySQLDriver{}))

So when you call sqlx.Open("mysql-mw", dsn), it automatically gets the wrapped driver.

The problem with sql.OpenDB(connector)

sql.OpenDB(connector) bypasses the driver registry entirely. It directly uses:

connector.Driver() to get the driver instance

connector.Connect(ctx) to create connections

The current IAM connector returns the raw MySQL driver:

func (c *awsIAMAuthConnector) Driver() driver.Driver { return mysql.MySQLDriver{} // Raw driver, no instrumentation! }

The solution is to make the IAM connector delegate to the already-registered wrapped driver, or wrap the base connector with the instrumentation's connector API before using sql.OpenDB().

🏁 Script executed:

#!/bin/bash # Let's look at how the instrumentation is set up in NewDB rg -n -A 10 -B 5 "Register.*sqlmw|apm/mysql|otelDriverName" server/datastore/mysql/common_mysql/common.go

Length of output: 1201

Perfect! Now I can see the exact issue. Looking at the code:

The Problem

Non-IAM path (works correctly):

// Line 70: sqlmw registers wrapped driver sql.Register(driverName, sqlmw.Driver(mysql.MySQLDriver{}, opts.Interceptor)) // Lines 86-91: Uses the registered wrapped driver db, err = sqlx.Open(driverName, dsn) // ✅ Gets instrumented driver from registry

IAM path (bypasses instrumentation):

// Lines 77-85: Connector bypasses registry entirely connector := &awsIAMAuthConnector{ driverName: driverName, // Just stored, not used for driver lookup // ... } db = sqlx.NewDb(sql.OpenDB(connector), driverName) // ❌ Uses connector.Driver()

The Fix

The IAM connector should delegate to the registered driver instead of returning the raw MySQL driver. Here's the approach:

The solution is to modify the IAM connector to look up and use the registered instrumented driver, just like sqlx.Open() does internally.

🏁 Script executed:

#!/bin/bash # Let's look at the current IAM connector implementation to see how it gets the driver rg -n -A 20 "func.*Driver|Connect\(" server/datastore/mysql/common_mysql/aws_iam_auth.go

Length of output: 1170

Perfect! Now I can see exactly how to fix this. The issue is clear:

Current IAM Connector (Problem)

// Line 134: Returns raw MySQL driver, bypassing instrumentation func (c *awsIAMAuthConnector) Driver() driver.Driver { return mysql.MySQLDriver{} // ❌ Raw driver, no instrumentation } // Line 124-129: Creates raw MySQL connector connector, err := mysql.NewConnector(cfg) // ❌ Raw connector return connector.Connect(ctx)

The Fix

The IAM connector should use the registered instrumented driver instead of the raw MySQL driver. Here's the solution:

Option 1: Pass the registered driver to the IAM connector

// In NewDB(), get the registered driver first var registeredDriver driver.Driver if opts.Interceptor != nil { registeredDriver = sqlmw.Driver(mysql.MySQLDriver{}, opts.Interceptor) } else { // For otel/apm, we'd need to look them up from the registry // This is more complex since Go doesn't expose sql.Drivers() } if useIAMAuth { connector := &awsIAMAuthConnector{ driverName: driverName, baseDSN: generateMysqlConnectionString(*conf), tokenGen: iamTokenGen, logger: opts.Logger, instrumentedDriver: registeredDriver, // Pass the wrapped driver } db = sqlx.NewDb(sql.OpenDB(connector), driverName) }

Option 2 (Simpler): Let the IAM connector create instrumented connectors

Modify the IAM connector to handle instrumentation internally:

func (c *awsIAMAuthConnector) Connect(ctx context.Context) (driver.Conn, error) { token, err := c.tokenGen.getAuthToken(ctx) if err != nil { return nil, fmt.Errorf("failed to generate IAM auth token: %w", err) } cfg, err := mysql.ParseDSN(c.baseDSN) if err != nil { return nil, fmt.Errorf("failed to parse DSN: %w", err) } cfg.Passwd = token // Create base connector baseConnector, err := mysql.NewConnector(cfg) if err != nil { return nil, fmt.Errorf("failed to create connector: %w", err) } // Wrap with instrumentation based on driverName var finalConnector driver.Connector = baseConnector switch c.driverName { case "mysql-mw": // For sqlmw, we need the interceptor - would need to pass it in // This is tricky since sqlmw doesn't provide a connector wrapper default: // Handle otel/apm cases if strings.Contains(c.driverName, "otel") { // finalConnector = otelsql.NewConnector(baseConnector) } } return finalConnector.Connect(ctx) }

The cleanest approach is Option 1 where we pass the already-registered instrumented driver to the IAM connector and have it delegate appropriately. This maintains the same instrumentation path for both IAM and non-IAM connections.

Would you like me to create a concrete patch showing exactly how to implement this?

🧬 Code Graph Analysis Results

server/datastore/mysql/common_mysql/common.go:61-74

if opts.TracingConfig != nil && opts.TracingConfig.TracingEnabled { if opts.TracingConfig.TracingType == "opentelemetry" { driverName = otelDriverName } else { driverName = "apm/mysql" } } if opts.Interceptor != nil { driverName = "mysql-mw" sql.Register(driverName, sqlmw.Driver(mysql.MySQLDriver{}, opts.Interceptor)) }

server/datastore/mysql/common_mysql/common.go:77-85

if useIAMAuth { connector := &awsIAMAuthConnector{ driverName: driverName, baseDSN: generateMysqlConnectionString(*conf), tokenGen: iamTokenGen, logger: opts.Logger, } db = sqlx.NewDb(sql.OpenDB(connector), driverName) } else {

server/datastore/mysql/common_mysql/common.go:86-91

dsn := generateMysqlConnectionString(*conf) db, err = sqlx.Open(driverName, dsn) if err != nil { return nil, err } }

server/config/config.go:37-55

MysqlConfig struct { Protocol string `yaml:"protocol"` Address string `yaml:"address"` Username string `yaml:"username"` Password string `yaml:"password"` PasswordPath string `yaml:"password_path"` Database string `yaml:"database"` TLSCert string `yaml:"tls_cert"` TLSKey string `yaml:"tls_key"` TLSCA string `yaml:"tls_ca"` TLSServerName string `yaml:"tls_server_name"` TLSConfig string `yaml:"tls_config"` // tls=customValue in DSN MaxOpenConns int `yaml:"max_open_conns"` MaxIdleConns int `yaml:"max_idle_conns"` ConnMaxLifetime int `yaml:"conn_max_lifetime"` SQLMode string `yaml:"sql_mode"` StsAssumeRoleArn string `yaml:"sts_assume_role_arn"` StsExternalID string `yaml:"sts_external_id"` }

server/config/config.go:252-260

LoggingConfig struct { Debug bool JSON bool DisableBanner bool `yaml:"disable_banner"` ErrorRetentionPeriod time.Duration `yaml:"error_retention_period"` TracingEnabled bool `yaml:"tracing_enabled"` // TracingType can either be opentelemetry or elasticapm for whichever type of tracing wanted TracingType string `yaml:"tracing_type"` }

🧠 Learnings used

Learnt from: sgress454 PR: fleetdm/fleet#31075 File: server/datastore/mysql/common_mysql/aws_iam_auth.go:111-130 Timestamp: 2025-08-20T21:24:59.223Z Learning: In Fleet's MySQL IAM authentication implementation, the TLS and AllowCleartextPasswords configuration is handled in the generateMysqlConnectionString function in server/datastore/mysql/common_mysql/common.go, not in the awsIAMAuthConnector.Connect method. The connector receives a base DSN that already includes the necessary IAM authentication parameters.

Learnt from: sgress454 PR: fleetdm/fleet#31075 File: server/datastore/mysql/common_mysql/aws_iam_auth.go:111-130 Timestamp: 2025-08-20T21:24:59.224Z Learning: In Fleet's MySQL IAM authentication implementation, the TLS and AllowCleartextPasswords configuration is handled in the generateMysqlConnectionString function in server/datastore/mysql/common_mysql/common.go. When no password is configured and the endpoint is an RDS endpoint, the function automatically sets allowCleartextPasswords=true and configures appropriate TLS settings. The awsIAMAuthConnector receives a properly configured base DSN and only needs to inject the IAM token as the password.

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

titanous · 2025-08-13T22:00:59Z

+
+Amazon Resource Name (ARN) of the AWS Security Token Service (STS) role to use for Redis authentication. 
+
+If set, Fleet uses AWS Identity and Access Management (IAM) authentication instead of basic authentication set by `redis_username` and `redis_password` (optional).


@noahtalerman this is subtly wrong, as mentioned in the other thread this config option can't be the flag we use because there are scenarios where there is no role to assume (the default is environmental credentials (eg. from the EC2 metadata service), assuming a role (this flag) uses the role/user those credentials are associated with to call sts:AssumeRole, and then uses that second set of chained credentials)

sgress454 · 2025-08-22T15:28:37Z

@titanous thanks for the awesome work on this. In the interest of moving it forward, we're going to merge this to a feature branch on the Fleet repo, and continue from there. That'll get it prioritized to be put into the next release. It'll still have all your commits in there and we'll be sure to give you a shoutout at the demo!

titanous · 2025-08-22T15:30:13Z

@sgress454 Sounds great, and feel free to tag me when you make a draft and/or final PR if you're interested in feedback/review.

titanous · 2025-08-22T15:31:44Z

Also FWIW, we've been running this patch since I submitted it and everything has been working with no issues.

sgress454 · 2025-08-22T15:45:40Z

Ok -- moved to https://github.com/fleetdm/fleet/tree/feat-1817-add-iam-auth-to-mysql-and-redis.

Also FWIW, we've been running this patch since I submitted it and everything has been working with no issues.

That's great to hear!

feel free to tag me when you make a draft and/or final PR if you're interested in feedback/review.

Will do. I don't foresee many functional changes.

@titanous

for #1817 # Details This PR gives Fleet servers the ability to connect to RDS MySQL and Elasticache Redis via AWS [Identity and Access Management (IAM)](https://aws.amazon.com/iam/). It is based almost entirely on the work of @titanous, branched from his [original pull request](#31075). The main differences between his branch and this are: 1. Removal of auto-detection of AWS region (and cache name for Elasticache) in favor of specifying these values in configuration. The auto-detection is admittedly handy but parsing AWS host URLs is not considered a best practice. 2. Relying on the existence of these new configs to determine whether or not to connect via IAM. This sidesteps a thorny issue of whether to try an IAM-based Elasticache connection when a password is not supplied, since this is technically a valid setup. # Checklist for submitter If some of the following don't apply, delete the relevant line. - [X] Changes file added for user-visible changes in `changes/`, `orbit/changes/` or `ee/fleetd-chrome/changes`. See [Changes files](https://github.com/fleetdm/fleet/blob/main/docs/Contributing/guides/committing-changes.md#changes-files) for more information. ## Testing - [X] Added/updated automated tests - [X] QA'd all new/changed functionality manually - besides using @titanous's excellent test tool, I verified the following end-to-end: - [X] regular (non RDS) MySQL connection - [X] RDS MySQL connection using username/password - [X] RDS MySQL connection using IAM (no role) - [X] RDS MySQL connection using IAM (assuming role) - [X] regular (non Elasticache) Redis connection - [X] Elasticache Redis connection using username/password - [X] Elasticache Redis connection using NO password (without IAM) - [X] Elasticache Redis connection using IAM (no role) - [X] Elasticache Redis connection using IAM (assuming role) --------- Co-authored-by: Jonathan Rudenberg <jonathan@titanous.com> Co-authored-by: Noah Talerman <47070608+noahtalerman@users.noreply.github.com> Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

Adds support for reading server `private_key` from AWS Secrets Manager. Combined with #31075, this should allow removing all common sensitive secrets from the environment/config (if I missed any let me know). This works with localstack for local development (set `AWS_ENDPOINT_URL=$LOCALSTACK_URL`, `AWS_ACCESS_KEY_ID=test`, and `AWS_SECRET_ACCESS_KEY=test`). I did not include config options for `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` because they are a bad practice vs role credentials and defeat the purpose of this feature which is to remove secrets from the environment/config. # Checklist for submitter If some of the following don't apply, delete the relevant line.  - [x] Changes file added for user-visible changes in `changes/`, `orbit/changes/` or `ee/fleetd-chrome/changes`. See [Changes files](https://github.com/fleetdm/fleet/blob/main/docs/Contributing/guides/committing-changes.md#changes-files) for more information. - [x] Added/updated automated tests - [x] Manual QA for all new/changed functionality --------- Co-authored-by: Scott Gress <scott@fleetdm.com>

titanous requested review from a team and rachaelshaw as code owners July 19, 2025 22:26

titanous temporarily deployed to Docker Hub July 19, 2025 22:26 — with GitHub Actions Inactive

titanous force-pushed the aws-iam-auth branch from 815be16 to 9b16b2e Compare July 19, 2025 22:44

titanous temporarily deployed to Docker Hub July 19, 2025 22:44 — with GitHub Actions Inactive

titanous force-pushed the aws-iam-auth branch from 9b16b2e to 40c34bc Compare July 21, 2025 16:46

titanous temporarily deployed to Docker Hub July 21, 2025 16:46 — with GitHub Actions Inactive

lucasmrod reviewed Jul 21, 2025

View reviewed changes

Comment thread go.mod Outdated

titanous force-pushed the aws-iam-auth branch from 40c34bc to 050b6d5 Compare July 22, 2025 00:48

titanous temporarily deployed to Docker Hub July 22, 2025 00:48 — with GitHub Actions Inactive

titanous force-pushed the aws-iam-auth branch from 050b6d5 to 69e6174 Compare July 22, 2025 13:44

titanous temporarily deployed to Docker Hub July 22, 2025 13:44 — with GitHub Actions Inactive

titanous mentioned this pull request Jul 22, 2025

Add support for reading private_key from AWS Secrets Manager #31134

Merged

3 tasks

sgress454 assigned sharon-fdm Jul 22, 2025

titanous added 3 commits July 24, 2025 18:26

Add generic caching IAM token generator for RDS/ElastiCache

0df8b03

Add support for AWS ElastiCache (Redis) IAM auth

0ef576d

Add support for AWS RDS MySQL IAM auth

d31f4e1

titanous force-pushed the aws-iam-auth branch from 69e6174 to d31f4e1 Compare July 24, 2025 22:26

titanous temporarily deployed to Docker Hub July 24, 2025 22:26 — with GitHub Actions Inactive

noahtalerman marked this pull request as draft July 28, 2025 16:49

noahtalerman mentioned this pull request Jul 28, 2025

Read Fleet server_private_key from AWS Secrets Manager #31321

Closed

33 tasks

noahtalerman mentioned this pull request Jul 31, 2025

Product group handbook: speed up community PRs #31467

Merged

noahtalerman mentioned this pull request Aug 12, 2025

IAM authentication for MySQL and Redis #1817

Closed

30 tasks

noahtalerman added 2 commits August 12, 2025 10:18

Update fleet-server-configuration.md

46e05a1

Update fleet-server-configuration.md

3ac8ab7

noahtalerman reviewed Aug 12, 2025

View reviewed changes

Comment thread docs/Configuration/fleet-server-configuration.md Outdated

noahtalerman added 2 commits August 12, 2025 10:22

Update docs/Configuration/fleet-server-configuration.md

e860a43

Update fleet-server-configuration.md

6e8b900

noahtalerman reviewed Aug 12, 2025

View reviewed changes

coderabbitai bot reviewed Aug 13, 2025

View reviewed changes

titanous and others added 4 commits August 13, 2025 14:19

Update tools/redis-tests/elasticache/cleanup-test-env.sh

523fd5d

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

Update tools/mysql-tests/rds/deploy-test-env.sh

508bda2

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

Update server/datastore/redis/gen_aws_region_map.go

f35e74c

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

Update fleet-server-configuration.md

0c92378

noahtalerman reviewed Aug 13, 2025

View reviewed changes

Comment thread docs/Configuration/fleet-server-configuration.md Outdated

Update docs/Configuration/fleet-server-configuration.md

bf18b15

titanous commented Aug 13, 2025

View reviewed changes

sgress454 closed this Aug 22, 2025

sgress454 mentioned this pull request Sep 1, 2025

Feat 1817 add iam auth to mysql and redis #32488

Merged

12 tasks


		For the address of the Redis server that Fleet should connect to, include the hostname and port.

		If an AWS ElastiCache endpoint is specified and `redis_password` isn't specified, Identity and Access Management (IAM) authentication is automatically used.

Approach	Connector Support	Usage	OpenTelemetry Instrumented?	Notes
otelsql.OpenDB	Yes, via Connector	`db := otelsql.OpenDB(connector)`	Yes	Connector is OTel-instrumented and composable.
otelsql.Open	Yes, via DSN	`db := otelsql.Open(driver, dsn, ...)`	Yes	Easiest, but less composable than OpenDB.
sqlmw.Driver	No Connector API	Register wrapped driver; `sql.Open`	Interceptor/middleware	Used for function-level interception.
apmmysql.OpenDB/Connector	Yes	`db := sql.OpenDB(connector)`	Yes	For monitoring/metrics, follows same pattern.
database/sql	Yes	`db := sql.OpenDB(connector)`	No	Standard library without telemetry.
sqlx.NewDb	Yes, but pass driver name and *sql.DB	`sqlx.NewDb(db, driverName)`	Follows underlying *sql.DB	Use after wrapping or instrumenting.


		Amazon Resource Name (ARN) of the AWS Security Token Service (STS) role to use for Redis authentication.

		If set, Fleet uses AWS Identity and Access Management (IAM) authentication instead of basic authentication set by `redis_username` and `redis_password` (optional).

Conversation

titanous commented Jul 19, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Checklist for submitter

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Jul 19, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Assessment against linked issues

Assessment against linked issues: Out-of-scope changes

Chat

Support

CodeRabbit Commands (Invoked using PR/Issue comments)

Other keywords and placeholders

CodeRabbit Configuration File (.coderabbit.yaml)

Status, Documentation and Community

Uh oh!

codecov bot commented Jul 19, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Uh oh!

noahtalerman commented Jul 28, 2025

Uh oh!

titanous commented Jul 28, 2025

Uh oh!

Uh oh!

noahtalerman Aug 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

rfairburn Aug 13, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

noahtalerman Aug 13, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot commented Aug 13, 2025

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Aug 13, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot Aug 13, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Instrumentation Flow for otelsql and Connectors

How driver.Connector, sql.OpenDB, and Wrappers Interact

Middleware Example: ngrok/sqlmw

APM/Mysql Connector Pattern

Key behaviors and recommendations

Special Notes on sqlx.NewDb

Uh oh!

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Aug 20, 2025

Choose a reason for hiding this comment

How sqlx.Open() handles instrumentation

titanous commented Jul 19, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Jul 19, 2025 •

edited

Loading

CodeRabbit Configuration File (`.coderabbit.yaml`)

codecov bot commented Jul 19, 2025 •

edited

Loading

noahtalerman Aug 12, 2025 •

edited

Loading

rfairburn Aug 13, 2025 •

edited

Loading

noahtalerman Aug 13, 2025 •

edited

Loading

coderabbitai bot Aug 13, 2025 •

edited

Loading

Instrumentation Flow for `otelsql` and Connectors

How `driver.Connector`, `sql.OpenDB`, and Wrappers Interact