Skip to content

Add AWS IAM auth for RDS MySQL/MariaDB and ElastiCache Redis/Valkey#31075

Closed
titanous wants to merge 12 commits intofleetdm:mainfrom
titanous:aws-iam-auth
Closed

Add AWS IAM auth for RDS MySQL/MariaDB and ElastiCache Redis/Valkey#31075
titanous wants to merge 12 commits intofleetdm:mainfrom
titanous:aws-iam-auth

Conversation

@titanous
Copy link
Copy Markdown
Contributor

@titanous titanous commented Jul 19, 2025

This adds automatic support for AWS IAM auth for RDS MySQL/MariaDB and ElastiCache Redis/Valkey. If no password is configured with an RDS/ElastiCache endpoint, IAM auth is automatically used. An IAM role to assume is optionally configurable.

CloudFormation stacks and test binaries are provided to manually test common scenarios (including RDS Proxy and assuming roles).

Closes #1817
Closes #20527

Checklist for submitter

If some of the following don't apply, delete the relevant line.

  • Changes file added for user-visible changes in changes/, orbit/changes/ or ee/fleetd-chrome/changes.
    See Changes files for more information.
  • Added/updated automated tests
  • Manual QA for all new/changed functionality

Summary by CodeRabbit

  • New Features
    • Added IAM authentication support for AWS ElastiCache (Redis), including optional STS AssumeRole and External ID.
    • Added IAM authentication support for AWS RDS (MySQL/MariaDB), with optional STS AssumeRole and External ID.
  • Documentation
    • Updated configuration docs with per-service STS role/external ID options (MySQL, Redis, Firehose, Kinesis, Lambda, SES, S3, Packaging) and IAM auth behavior notes.
  • Tests
    • Introduced test tools and CloudFormation templates for validating RDS and ElastiCache IAM authentication (deploy/cleanup scripts and CLI testers).

@titanous titanous requested review from a team and rachaelshaw as code owners July 19, 2025 22:26
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Jul 19, 2025

Important

Review skipped

Auto reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

📝 Walkthrough

Walkthrough

Adds IAM-based authentication for MySQL (RDS) and Redis (ElastiCache). Introduces optional STS assume role and external ID config for both. Implements token generation, caching, and connection flows, updates configs and docs, adds tests for endpoint parsing, updates dependencies, and provides test tooling with CloudFormation templates and CLI utilities.

Changes

Cohort / File(s) Summary
AWS IAM common utilities
server/aws_common/iam_auth.go
Adds IAM token manager with caching and jitter; adds AWS config loader with optional STS assume role and external ID.
Config schema and wiring
server/config/config.go, cmd/fleet/serve.go, docs/Configuration/fleet-server-configuration.md
Adds sts_assume_role_arn and sts_external_id to MySQL/Redis config; wires Redis pool fields; documents new options and IAM auto-auth conditions.
MySQL IAM auth (RDS)
server/datastore/mysql/common_mysql/common.go, server/datastore/mysql/common_mysql/aws_iam_auth.go, server/datastore/mysql/common_mysql/aws_iam_auth_test.go
Detects RDS endpoints without passwords to use IAM; builds auth tokens; integrates connector-based flow; tests region extraction from endpoints.
Redis IAM auth (ElastiCache)
server/datastore/redis/redis.go, server/datastore/redis/aws_iam_auth.go, server/datastore/redis/aws_region_map_generated.go, server/datastore/redis/gen_aws_region_map.go, server/datastore/redis/redis_test.go
Detects ElastiCache endpoints without passwords to use IAM; generates presigned tokens; forces TLS; adds region map generator and tests for endpoint parsing.
Dependencies
go.mod
Adds/upgrades AWS RDS auth and MySQL driver deps; adds rdsmysql/v2; adjusts go-tpm direct dependency.
Changelog entries
changes/aws-elasticache-iam-auth, changes/aws-rds-iam-auth
Notes IAM auth support and new STS assume role option for Redis and MySQL.
MySQL IAM test tooling
tools/mysql-tests/rds/*
Adds CFN template, deploy/cleanup scripts, and CLI tool to validate RDS IAM auth.
Redis IAM test tooling
tools/redis-tests/elasticache/*
Adds CFN template, deploy/cleanup scripts, and CLI tool to validate ElastiCache IAM auth.

Sequence Diagram(s)

sequenceDiagram
  participant Fleet
  participant Config
  participant AWS as AWS SDK
  participant STS as STS
  participant RDS as RDS MySQL

  Fleet->>Config: Load MySQL config (addr, user, sts_assume_role_arn, external_id)
  alt RDS endpoint and no password
    Fleet->>AWS: Load config (region)
    opt sts_assume_role_arn provided
      AWS->>STS: AssumeRole(external_id?)
      STS-->>AWS: Temp credentials
    end
    Fleet->>AWS: Build RDS IAM auth token
    Fleet->>RDS: Connect using token (TLS)
  else otherwise
    Fleet->>RDS: Connect using standard DSN
  end
Loading
sequenceDiagram
  participant Fleet
  participant Config
  participant AWS as AWS SDK
  participant STS as STS
  participant EC as ElastiCache Redis

  Fleet->>Config: Load Redis config (addr, username, sts_assume_role_arn, external_id)
  alt ElastiCache endpoint and no password
    Fleet->>AWS: Load config (region)
    opt sts_assume_role_arn provided
      AWS->>STS: AssumeRole(external_id?)
      STS-->>AWS: Temp credentials
    end
    Fleet->>AWS: Presign elasticache connect URL (token)
    Fleet->>EC: TLS connect using IAM token as password
  else otherwise
    Fleet->>EC: Connect with configured password or no auth
  end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Assessment against linked issues

Objective Addressed Explanation
Implement IAM auth for MySQL when RDS endpoint and no password (#1817)
Support sts_assume_role_arn and sts_external_id for MySQL (#1817)
Implement IAM auth for Redis when ElastiCache endpoint and no password (#1817, #20527)
Support sts_assume_role_arn and sts_external_id for Redis (#1817, #20527)

Assessment against linked issues: Out-of-scope changes

Code Change Explanation
Add STS config options for Firehose, Kinesis, Lambda, SES, S3, packaging in docs (docs/Configuration/fleet-server-configuration.md; entire file) These services are not part of the stated objectives focused on MySQL and Redis IAM authentication.
Add ElastiCache and RDS CloudFormation templates and deploy/cleanup scripts (tools/redis-tests/elasticache/cf-template.yaml, deploy-test-env.sh, cleanup-test-env.sh; tools/mysql-tests/rds/cf-template.yaml, deploy-test-env.sh, cleanup-test-env.sh; entire files) Test environment provisioning is not specified in the objectives; it’s auxiliary tooling outside the core product changes.
Add Redis and MySQL IAM test CLI utilities (tools/redis-tests/elasticache/iam_auth.go, tools/mysql-tests/rds/iam_auth.go; entire files) Standalone test tools are not part of the product requirements; they are supportive but not required by the issue.
✨ Finishing Touches
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Status, Documentation and Community

  • Visit our Status Page to check the current availability of CodeRabbit.
  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@codecov
Copy link
Copy Markdown

codecov bot commented Jul 19, 2025

Codecov Report

Attention: Patch coverage is 35.27697% with 222 lines in your changes missing coverage. Please review.

Project coverage is 63.93%. Comparing base (5363ce1) to head (d31f4e1).
Report is 3 commits behind head on main.

Files with missing lines Patch % Lines
server/datastore/redis/aws_iam_auth.go 52.54% 53 Missing and 3 partials ⚠️
...erver/datastore/mysql/common_mysql/aws_iam_auth.go 23.61% 54 Missing and 1 partial ⚠️
server/aws_common/iam_auth.go 0.00% 48 Missing ⚠️
server/datastore/mysql/common_mysql/common.go 23.52% 32 Missing and 7 partials ⚠️
server/datastore/redis/redis.go 24.13% 19 Missing and 3 partials ⚠️
cmd/fleet/serve.go 0.00% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #31075      +/-   ##
==========================================
- Coverage   63.99%   63.93%   -0.06%     
==========================================
  Files        1911     1914       +3     
  Lines      188185   188508     +323     
  Branches     5594     5594              
==========================================
+ Hits       120428   120528     +100     
- Misses      58302    58511     +209     
- Partials     9455     9469      +14     
Flag Coverage Δ
backend 64.80% <35.27%> (-0.07%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Comment thread go.mod Outdated
@noahtalerman
Copy link
Copy Markdown
Member

Thanks @titanous! Just a heads up that we prioritized these changes and the changes in your separate PR (#31134).

We peeled off a user story for these changes. This story is now going through drafting. All changes to Fleet are brought through drafting as user stories.

Once the story goes through drafting, we'll have more info you for you on proposed changes/tweaks (if any). Thanks for your patience.

In the meantime, setting this PR to draft. You can follow along on drafting progress in the user story: #31321

@noahtalerman noahtalerman marked this pull request as draft July 28, 2025 16:49
@titanous
Copy link
Copy Markdown
Contributor Author

@noahtalerman thanks, note that this PR (unlike the separate one) covers two issues (linked at the top) that have already been through at least some of your process.

Comment thread docs/Configuration/fleet-server-configuration.md Outdated

For the address of the Redis server that Fleet should connect to, include the hostname and port.

If an AWS ElastiCache endpoint is specified and `redis_password` isn't specified, Identity and Access Management (IAM) authentication is automatically used.
Copy link
Copy Markdown
Member

@noahtalerman noahtalerman Aug 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@titanous just checking, is this a breaking change for AWS users that don't set redis_password today?

Put differently, will this force these users to add a role in AWS or make another change when they upgrade Fleet?

Same question for MySQL: https://github.com/fleetdm/fleet/pull/31075/files#diff-66314062a452d1931a2ffbfa8b76a0f1e3e2e81c5290c451313194bbc59f902aR26

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, interesting. I think it may be possible that there are Elasticache clusters without a password enabled. I don't think it's possible to configure RDS with no password but I'm not 100% certain.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can mitigate this on the Redis side by checking if redis_username is populated with no redis_password, because that config shouldn't exist in the wild and IAM auth requires a username.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it may be possible that there are Elasticache clusters without a password enabled

There are! My understanding is that Fleet is doing this for managed-cloud offerings. @rfairburn @BCTBB please correct me if I'm wrong.

Assuming this is true, I don't think we want to force a password (breaking change).

Instead, I think we want to update this behavior: IAM is only used if redis_sts_assume_role_arn is specified.

Meaning, If an AWS ElastiCache endpoint is specified and redis_password isn't specified, Fleet functions as it does today. Identity and Access Management (IAM) authentication isn't automatically used.

What do y'all think?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't key off of redis_sts_assume_role_arn, because role assumption is not required (eg. instance/container/pod roles that are ambiently in the environment do not require additional role assumption). I think just adding a check to make sure there is a username specified should handle this case.

Copy link
Copy Markdown
Contributor

@rfairburn rfairburn Aug 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just be specific about it and have a "use IAM authentication" config var that will be set to true if it is intended to be used? Assumptions of intention can lead to confusion.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, makes sense. I was trying to make the UX nice with a happy path but not a big deal to just have a flag.

Copy link
Copy Markdown
Member

@noahtalerman noahtalerman Aug 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does it work today for all other IAM Fleet server config options? ( firehose, kinesis, lambda, etc.)

It's IAM (if IAM is set), and if IAM isn't set fall back to basic auth. Right? We don't have a use IAM auth. I think we want to be consistent for now. I updated the docs to clarify this.

For example:

Screenshot 2025-08-13 at 5 31 11 PM

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@noahtalerman currently those will use the environmental credentials (aka IAM auth) if nothing is configured

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Aug 13, 2025

✅ Actions performed

Full review triggered.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 16

🔭 Outside diff range comments (1)
server/aws_common/iam_auth.go (1)

83-101: Propagate ctx into ConfigureAssumeRoleProvider (fix required)

I verified server/aws_common/aws_common.go: ConfigureAssumeRoleProvider calls aws_config.LoadDefaultConfig(context.Background(), ...), so caller cancellation/deadlines are lost. Thread a context.Context through ConfigureAssumeRoleProvider and update all callers to pass ctx.

Files to change

  • server/aws_common/aws_common.go — change signature and use ctx when calling LoadDefaultConfig.
  • Update callers to pass ctx:
    • server/aws_common/iam_auth.go
    • server/mail/ses.go
    • server/logging/firehose.go
    • server/logging/lambda.go
    • server/logging/kinesis.go
    • server/datastore/s3/s3.go

Suggested minimal changes (illustrative)

  • aws_common.go: add ctx param and use it
-func ConfigureAssumeRoleProvider(
-    conf aws.Config,
-    opts []func(*aws_config.LoadOptions) error,
-    stsAssumeRoleARN,
-    stsExternalID string,
-) (aws.Config, error) {
+func ConfigureAssumeRoleProvider(
+    ctx context.Context,
+    conf aws.Config,
+    opts []func(*aws_config.LoadOptions) error,
+    stsAssumeRoleARN,
+    stsExternalID string,
+) (aws.Config, error) {
@@
-    conf, err := aws_config.LoadDefaultConfig(context.Background(), opts...)
+    conf, err := aws_config.LoadDefaultConfig(ctx, opts...)
  • Call sites (example in iam_auth.go)
-    cfg, err = ConfigureAssumeRoleProvider(cfg, opts, assumeRoleArn, stsExternalID)
+    cfg, err = ConfigureAssumeRoleProvider(ctx, cfg, opts, assumeRoleArn, stsExternalID)

Reasoning and reference

Would you like me to open a small follow-up issue/PR to thread ctx through these call sites?

🧹 Nitpick comments (23)
tools/mysql-tests/rds/README.md (3)

15-16: Clarify region environment variable for AWS CLI

AWS CLI honors AWS_DEFAULT_REGION in addition to AWS_REGION. To avoid confusion across shells and tooling, consider mentioning both.

Apply this diff to expand the example:

-# Set AWS credentials and region
-export AWS_PROFILE=your-profile  # or use AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
-export AWS_REGION=us-east-2
+# Set AWS credentials and region
+export AWS_PROFILE=your-profile  # or use AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
+# The CLI and SDK honor either of these. Set one or both for clarity.
+export AWS_REGION=us-east-2
+export AWS_DEFAULT_REGION=us-east-2

21-26: Add cost and quota disclaimer

A short note helps folks avoid surprises when provisioning multiple RDS instances and an EC2 host.

You could append:

 This will:
 - Create all RDS instances with IAM authentication enabled
 - Create an EC2 instance with necessary IAM permissions
 - Create database users for IAM authentication
 - Output connection information
+
+Note: This creates paid AWS resources. Be mindful of account quotas and costs. Clean up when finished.

27-33: Mention cleanup prerequisites (profile and region)

Make it explicit the same AWS_PROFILE/region must be set to delete the right stack.

A small addition keeps users safe:

 ### Clean Up
 
 ```bash
+# Ensure the same AWS_PROFILE and region used for deployment are set
 ./cleanup-test-env.sh

</blockquote></details>
<details>
<summary>server/config/config.go (1)</summary><blockquote>

`1051-1053`: **Edge case: external ID without an assume role ARN**

If users set sts_external_id without sts_assume_role_arn, the external ID is unused. Consider a gentle validation or log to inform users.

If you want, I can add a small validation step that logs a warning when sts_external_id is set but sts_assume_role_arn is empty.



Also applies to: 1086-1088

</blockquote></details>
<details>
<summary>tools/redis-tests/elasticache/README.md (4)</summary><blockquote>

`5-9`: **Clarify prerequisites (permissions, tooling, and cost notice).**

Small additions improve setup reliability and avoid surprises.

Apply this diff to expand the prerequisites:

```diff
 ## Prerequisites

-- AWS CLI configured with appropriate credentials (via `AWS_PROFILE` or `AWS_ACCESS_KEY_ID`/`AWS_SECRET_ACCESS_KEY`)
+- AWS CLI v2 installed and configured with appropriate credentials (via `AWS_PROFILE` or `AWS_ACCESS_KEY_ID`/`AWS_SECRET_ACCESS_KEY`)
+- An AWS region set (e.g., `AWS_REGION` or `AWS_DEFAULT_REGION`)
+- Sufficient IAM permissions to create and delete resources used by the stack (CloudFormation, EC2/VPC, ElastiCache, IAM, and CloudWatch Logs)
+- Note: Deploying this stack will incur AWS charges. Remember to run cleanup when you are done.

11-20: Add a quick “verify environment” step before deployment.

Helps catch misconfigured credentials or regions early.

Apply this diff to add a short preflight:

 ### 1. Deploy Test Environment

 ```bash
 # Set AWS credentials and region
 export AWS_PROFILE=your-profile  # or use AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
 export AWS_REGION=us-east-2
+
+# Verify identity and region
+aws sts get-caller-identity
+echo "Region: ${AWS_REGION:-$AWS_DEFAULT_REGION}"
 
 # Deploy the test environment
 ./deploy-test-env.sh

---

`22-28`: **Document how to find and use the outputs.**

Readers will need endpoints/hostnames to run the test tools; point them to where deploy script surfaces these.

Apply this diff to clarify outputs:

```diff
 This creates:
 - VPC with subnets and security groups
 - EC2 instance for running tests
 - ElastiCache Serverless instance with IAM auth
 - ElastiCache Standalone replication group with IAM auth
 - IAM users and roles for authentication
+
+On success, the deploy script prints the Redis endpoints and instance details. It also writes the outputs to a local file for convenience (the script logs the path). Use those values with the test tools in this directory.

29-35: Set expectations for teardown timing and completeness.

Make it clear cleanup can take several minutes.

Apply this diff to expand cleanup notes:

 ### 2. Clean Up

 ```bash
 ./cleanup-test-env.sh

-This will destroy the AWS resources.
+This will destroy the AWS resources. Teardown can take 10–20 minutes depending on service state.


</blockquote></details>
<details>
<summary>server/datastore/mysql/common_mysql/aws_iam_auth_test.go (1)</summary><blockquote>

`9-121`: **Parallelize subtests and add a couple of edge cases.**

Running subtests in parallel speeds things up. Adding us-gov-east-1 and a China endpoint with port improves resiliency.

Apply this diff to parallelize and add cases:

```diff
 func TestExtractRDSRegion(t *testing.T) {
+	t.Parallel()
 	tests := []struct {
@@
 	for _, tt := range tests {
 		t.Run(tt.name, func(t *testing.T) {
+			t.Parallel()
 			got, err := extractRDSRegion(tt.endpoint)
@@
 		})
 	}
 }

Add these additional cases to the tests slice:

@@
 		{
 			name:     "GovCloud region",
 			endpoint: "database.abc123.us-gov-west-1.rds.amazonaws.com",
 			want:     "us-gov-west-1",
 			wantErr:  false,
 		},
+		{
+			name:     "GovCloud us-gov-east-1",
+			endpoint: "db123.abc123.us-gov-east-1.rds.amazonaws.com",
+			want:     "us-gov-east-1",
+			wantErr:  false,
+		},
@@
 		{
 			name:     "Endpoint with port should fail",
 			endpoint: "mydb.abc123.us-east-1.rds.amazonaws.com:3306",
 			wantErr:  true,
 		},
+		{
+			name:     "China endpoint with port should fail",
+			endpoint: "mydb.abc123.cn-north-1.rds.amazonaws.com.cn:3306",
+			wantErr:  true,
+		},
server/datastore/redis/redis_test.go (1)

115-250: Parallelize subtests and cover China and GovCloud partitions.

This widens confidence that parseElastiCacheEndpoint handles all AWS partitions.

Apply this diff to parallelize:

 func TestParseElastiCacheEndpoint(t *testing.T) {
+	t.Parallel()
@@
 	for _, tt := range tests {
 		t.Run(tt.name, func(t *testing.T) {
+			t.Parallel()
 			gotRegion, gotCacheName, err := parseElastiCacheEndpoint(tt.endpoint)

Add these cases to the tests slice:

@@
 		// Error cases
+		{
+			name:          "China partition serverless",
+			endpoint:      "prod-cache-xyz789.serverless.cnn1.cache.amazonaws.com.cn",
+			wantRegion:    "cn-north-1",
+			wantCacheName: "prod-cache",
+			wantErr:       false,
+		},
+		{
+			name:          "China partition standalone",
+			endpoint:      "master.cn-cache.abc123.cnnw1.cache.amazonaws.com.cn:6379",
+			wantRegion:    "cn-northwest-1",
+			wantCacheName: "cn-cache",
+			wantErr:       false,
+		},
+		{
+			name:          "GovCloud partition",
+			endpoint:      "gov-cache.abc123.us-gov-west-1.cache.amazonaws.com",
+			wantRegion:    "us-gov-west-1",
+			wantCacheName: "gov-cache",
+			wantErr:       false,
+		},
tools/redis-tests/elasticache/iam_auth.go (1)

12-17: Let -addr be portless (default 6379) and reflect that in the help text.

This makes the tool friendlier with AWS-provided endpoints that often omit ports.

Apply this diff to adjust the flag usage:

-var (
-	addrFlag       = flag.String("addr", "", "ElastiCache endpoint address, including port")
+var (
+	addrFlag       = flag.String("addr", "", "ElastiCache endpoint address (optionally with :port, defaults to 6379)")
 	userFlag       = flag.String("user", "", "Username for IAM authentication")
 	assumeRoleFlag = flag.String("assume-role", "", "STS assume role ARN (optional)")
 	externalIDFlag = flag.String("external-id", "", "STS external ID (optional)")
 )
tools/mysql-tests/rds/iam_auth.go (1)

54-57: Consider logging the final normalized address.

Helps when users pass host:port and the tool overrides the provided -port.

Apply this diff to improve visibility:

-	log.Printf("Connecting to RDS at %s:%s with IAM auth for user %s", *endpointFlag, *portFlag, *userFlag)
+	log.Printf("Connecting to RDS at %s with IAM auth for user %s", mysqlConfig.Address, *userFlag)
tools/redis-tests/elasticache/deploy-test-env.sh (1)

5-7: Remove or use unused variables (ShellCheck SC2034)

VPC_CIDR and SUBNET_CIDR are declared but never used. Either pass them as CloudFormation parameters or remove them to avoid confusion.

Apply one of these diffs.

Option A: remove them

-VPC_CIDR="10.0.0.0/16"
-SUBNET_CIDR="10.0.1.0/24"

Option B: plumb them into the stack (only if cf-template.yaml supports these parameters)

 aws cloudformation create-stack \
   --stack-name "$STACK_NAME" \
   --template-body file://cf-template.yaml \
-  --parameters ParameterKey=KeyName,ParameterValue="$KEY_NAME" ParameterKey=InstanceType,ParameterValue="$INSTANCE_TYPE" \
+  --parameters \
+    ParameterKey=KeyName,ParameterValue="$KEY_NAME" \
+    ParameterKey=InstanceType,ParameterValue="$INSTANCE_TYPE" \
+    ParameterKey=VPCCIDR,ParameterValue="$VPC_CIDR" \
+    ParameterKey=SubnetCIDR,ParameterValue="$SUBNET_CIDR" \
   --capabilities CAPABILITY_NAMED_IAM
server/datastore/redis/redis.go (1)

334-345: Consider logging when IAM auth is used (optional)

A debug log can help operators confirm IAM auth is active. Example: “Using Redis IAM auth for (region )”. This is optional since the code already behaves correctly.

server/datastore/redis/gen_aws_region_map.go (2)

7-8: Fix run instructions in the comment

The filename is gen_aws_region_map.go, but the comment says gen_region_map.go.

-// To run: go run gen_region_map.go
+// To run: go run gen_aws_region_map.go

143-147: Avoid external gofmt invocation; format using go/format instead

Calling the gofmt binary may fail if it’s not available in PATH. Use go/format to format the buffer before writing it.

Example change:

- // Run gofmt on the generated file
- cmd := exec.Command("gofmt", "-w", "aws_region_map_generated.go")
- if err := cmd.Run(); err != nil {
-   log.Printf("Warning: Failed to run gofmt: %v", err)
- }
+ // Optionally format with go/format for robustness
+ // buf := new(bytes.Buffer)
+ // if err := tmpl.Execute(buf, entries); err != nil {
+ //   log.Fatalf("Failed to execute template: %v", err)
+ // }
+ // formatted, err := format.Source(buf.Bytes())
+ // if err != nil {
+ //   log.Printf("Warning: Failed to format generated code: %v", err)
+ //   formatted = buf.Bytes()
+ // }
+ // if _, err := outputFile.Write(formatted); err != nil {
+ //   log.Fatalf("Failed to write output file: %v", err)
+ // }
+ // return

If you prefer to keep the current approach, make the gofmt step best-effort and non-fatal (as you already do).

server/datastore/mysql/common_mysql/aws_iam_auth.go (1)

25-50: Avoid context.Background() for AWS config loading

Using context.Background() prevents cancellation/deadline propagation during STS calls and credential resolution. Prefer accepting a context and passing it through.

-func newAWSIAMAuthTokenGenerator(dbEndpoint, dbUsername, dbPort, region, assumeRoleArn, stsExternalID string) (*awsIAMAuthTokenGenerator, error) {
-  // Load AWS configuration
-  cfg, err := aws_common.LoadAWSConfig(context.Background(), region, assumeRoleArn, stsExternalID)
+func newAWSIAMAuthTokenGenerator(ctx context.Context, dbEndpoint, dbUsername, dbPort, region, assumeRoleArn, stsExternalID string) (*awsIAMAuthTokenGenerator, error) {
+  // Load AWS configuration
+  cfg, err := aws_common.LoadAWSConfig(ctx, region, assumeRoleArn, stsExternalID)

Follow-ups: update call sites to pass a context (e.g., from Connect(ctx) or initialization path).

server/datastore/redis/aws_iam_auth.go (2)

110-149: Handle ports and China regions; add tests for endpoint parsing

Parsing removes the port (good) and locates region via “cache.amazonaws”. Consider explicitly allowing .com.cn suffix (parsing continues to work, but an early validation can improve error messages). Also, rely on a lowercased hostname to avoid case edge cases.

 func parseElastiCacheEndpoint(endpoint string) (region, cacheName string, err error) {
   // Remove port if present
   hostname := endpoint
   if idx := strings.LastIndex(hostname, ":"); idx != -1 {
     hostname = hostname[:idx]
   }
+  hostname = strings.ToLower(hostname)

   // Extract region from the ElastiCache endpoint
   // Format: name.region.cache.amazonaws.com or name.serverless.region.cache.amazonaws.com
   parts := strings.Split(hostname, ".")
   if len(parts) < 4 {
     return "", "", fmt.Errorf("invalid ElastiCache endpoint format for IAM auth: %s", endpoint)
   }

   // Find the region code - it's always before "cache.amazonaws.com"
   var regionCode string
   for i := 0; i < len(parts)-2; i++ {
     if parts[i+1] == "cache" && parts[i+2] == "amazonaws" {
       regionCode = parts[i]
       break
     }
   }

I can add table-driven tests for:

  • serverless endpoint
  • cluster node names with numeric suffix
  • endpoints with ports
  • China regions (.com.cn)
    Would you like me to open a follow-up PR with tests?

151-204: Name extraction heuristics could false-positive on hyphenated IDs

For replication group IDs that legitimately end with -NNN not representing node suffix, this could chop too much. If this becomes an issue, consider checking the next label for exact replication group match only for cluster-mode patterns; else, do not strip.

Add unit tests demonstrating:

  • replication-group-id-202 read as full ID when not followed by same ID in next label
  • cluster mode replication-group-id-001.replication-group-id... collapses to replication-group-id
tools/mysql-tests/rds/deploy-test-env.sh (1)

81-91: Be careful storing plaintext DB credentials

test-env-info.txt stores DB_PASSWORD. If this is only for ephemeral testing, that’s okay, but please add a prominent warning and .gitignore protection for this file.

Add test-env-info.txt to .gitignore and print a reminder that it contains secrets.

tools/mysql-tests/rds/cf-template.yaml (2)

406-425: Pinning AMI via SSM is good; consider installing mysql-client instead of mariadb105 on AL2023

Not critical, but mysql client may more closely match some user environments. mariadb105 is fine for tests.


426-465: Fix YAML linting issues (trailing spaces, EOF newline)

There are trailing spaces at several lines and no newline at EOF per YAMLlint. Please clean these up to reduce noise in diffs.

You can auto-fix with a formatter (e.g., yamlfmt) or an editor setting to trim trailing whitespace and ensure a final newline.

server/aws_common/iam_auth.go (1)

49-51: Optional: avoid predictable jitter without global rand seeding

math/rand without seeding yields a deterministic sequence per process. It’s not a security issue here, but if you want more distributed refresh across pods, consider seeding once or using time-based jitter.

-	jitter := time.Duration(rand.Int63n(int64(maxJitter))) //nolint:gosec // jitter doesn't need cryptographic randomness
+	// Not security-sensitive; seed once in init() or derive from time for better distribution.
+	jitter := time.Duration(time.Now().UnixNano()%int64(maxJitter)) //nolint:gosec
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5363ce1 and 6e8b900.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (25)
  • changes/aws-elasticache-iam-auth (1 hunks)
  • changes/aws-rds-iam-auth (1 hunks)
  • cmd/fleet/serve.go (1 hunks)
  • docs/Configuration/fleet-server-configuration.md (14 hunks)
  • go.mod (4 hunks)
  • server/aws_common/iam_auth.go (1 hunks)
  • server/config/config.go (6 hunks)
  • server/datastore/mysql/common_mysql/aws_iam_auth.go (1 hunks)
  • server/datastore/mysql/common_mysql/aws_iam_auth_test.go (1 hunks)
  • server/datastore/mysql/common_mysql/common.go (5 hunks)
  • server/datastore/redis/aws_iam_auth.go (1 hunks)
  • server/datastore/redis/aws_region_map_generated.go (1 hunks)
  • server/datastore/redis/gen_aws_region_map.go (1 hunks)
  • server/datastore/redis/redis.go (3 hunks)
  • server/datastore/redis/redis_test.go (1 hunks)
  • tools/mysql-tests/rds/README.md (1 hunks)
  • tools/mysql-tests/rds/cf-template.yaml (1 hunks)
  • tools/mysql-tests/rds/cleanup-test-env.sh (1 hunks)
  • tools/mysql-tests/rds/deploy-test-env.sh (1 hunks)
  • tools/mysql-tests/rds/iam_auth.go (1 hunks)
  • tools/redis-tests/elasticache/README.md (1 hunks)
  • tools/redis-tests/elasticache/cf-template.yaml (1 hunks)
  • tools/redis-tests/elasticache/cleanup-test-env.sh (1 hunks)
  • tools/redis-tests/elasticache/deploy-test-env.sh (1 hunks)
  • tools/redis-tests/elasticache/iam_auth.go (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-08-08T07:40:05.301Z
Learnt from: getvictor
PR: fleetdm/fleet#31726
File: server/datastore/mysql/labels_test.go:2031-2031
Timestamp: 2025-08-08T07:40:05.301Z
Learning: Fleet repo targets Go 1.24.5 (root go.mod), which supports testing.T.Context(). Do not flag usage of t.Context() or suggest replacing it with context.Background() in tests (e.g., server/datastore/mysql/labels_test.go Line 2031 and similar).

Applied to files:

  • tools/mysql-tests/rds/iam_auth.go
📚 Learning: 2025-08-08T07:40:05.301Z
Learnt from: getvictor
PR: fleetdm/fleet#31726
File: server/datastore/mysql/labels_test.go:2031-2031
Timestamp: 2025-08-08T07:40:05.301Z
Learning: In fleetdm/fleet repository tests (server/datastore/mysql/labels_test.go and similar), using testing.T.Context() is valid because the project targets a recent Go version where testing.T.Context() exists. Do not suggest replacing t.Context() with context.Background() in this codebase.

Applied to files:

  • tools/mysql-tests/rds/iam_auth.go
🧬 Code Graph Analysis (5)
server/datastore/mysql/common_mysql/aws_iam_auth.go (1)
server/aws_common/iam_auth.go (3)
  • IAMAuthTokenManager (30-37)
  • LoadAWSConfig (84-100)
  • NewIAMAuthTokenManager (40-44)
tools/redis-tests/elasticache/iam_auth.go (1)
server/datastore/redis/redis.go (2)
  • NewPool (96-116)
  • PoolConfig (63-92)
tools/mysql-tests/rds/iam_auth.go (2)
server/config/config.go (1)
  • MysqlConfig (37-55)
server/datastore/mysql/common_mysql/common.go (2)
  • DBOptions (20-30)
  • NewDB (32-117)
server/datastore/redis/aws_iam_auth.go (1)
server/aws_common/iam_auth.go (3)
  • IAMAuthTokenManager (30-37)
  • LoadAWSConfig (84-100)
  • NewIAMAuthTokenManager (40-44)
server/aws_common/iam_auth.go (1)
server/aws_common/aws_common.go (1)
  • ConfigureAssumeRoleProvider (19-40)
🪛 Shellcheck (0.10.0)
tools/redis-tests/elasticache/deploy-test-env.sh

[warning] 5-5: VPC_CIDR appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 6-6: SUBNET_CIDR appears unused. Verify use (or export if used externally).

(SC2034)

tools/mysql-tests/rds/deploy-test-env.sh

[warning] 100-100: engine appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 106-106: Quote 'EOF' to make here document expansions happen on the server side rather than on the client.

(SC2087)

🪛 Checkov (3.2.334)
tools/mysql-tests/rds/cf-template.yaml

[HIGH] 288-307: Ensure all data stored in Aurora is securely encrypted at rest

(CKV_AWS_96)


[HIGH] 321-336: Ensure all data stored in Aurora is securely encrypted at rest

(CKV_AWS_96)

🪛 YAMLlint (1.37.1)
tools/mysql-tests/rds/cf-template.yaml

[error] 213-213: trailing spaces

(trailing-spaces)


[error] 214-214: trailing spaces

(trailing-spaces)


[error] 217-217: trailing spaces

(trailing-spaces)


[error] 220-220: trailing spaces

(trailing-spaces)


[error] 223-223: trailing spaces

(trailing-spaces)


[error] 260-260: trailing spaces

(trailing-spaces)


[error] 262-262: trailing spaces

(trailing-spaces)


[error] 265-265: trailing spaces

(trailing-spaces)


[error] 268-268: trailing spaces

(trailing-spaces)


[error] 271-271: trailing spaces

(trailing-spaces)


[error] 375-375: trailing spaces

(trailing-spaces)


[error] 465-465: no new line character at the end of file

(new-line-at-end-of-file)

🔇 Additional comments (29)
go.mod (4)

24-24: Adding aws-sdk-go-v2 RDS auth feature is appropriate

This is the right module for generating IAM auth tokens for RDS/MySQL. It aligns with the rest of the v2 SDK usage in the repo.


65-65: Upgrade to go-sql-driver/mysql v1.9.1 looks good

This version is compatible and stable. No known breaking changes with common DSN options used by Fleet.


122-122: rdsmysql/v2 addition fits the RDS IAM flow

This is a good choice when pairing with go-sql-driver/mysql and AWS SDK v2. Please ensure it’s not duplicating token generation paths with feature/rds/auth (it usually wraps SDK calls under the hood).


74-74: go-tpm is imported in securehw_linux.go — direct dependency is justified

I ran the repo search and found direct imports of github.com/google/go-tpm in ee/orbit/pkg/securehw/securehw_linux.go, so keeping the module in go.mod is appropriate.

  • ee/orbit/pkg/securehw/securehw_linux.go
    • lines 18–20: imports "github.com/google/go-tpm/tpm2", "github.com/google/go-tpm/tpm2/transport", "github.com/google/go-tpm/tpm2/transport/linuxtpm"
  • go.mod: line 74 includes github.com/google/go-tpm v0.9.5 (go.sum also contains matching entries)

No change required here.

server/config/config.go (6)

38-55: MySQL config gains STS assume role settings — looks good

The new fields and yaml tags are consistent with existing AWS config patterns.


64-65: Redis config gains STS assume role settings — looks good

This mirrors the MySQL settings and other AWS service configs.


1051-1053: Expose mysql.sts_assume_role_arn and mysql.sts_external_id — good addition

Naming is consistent and descriptions are clear.


1086-1088: Expose redis.sts_assume_role_arn and redis.sts_external_id — good addition

This matches the config style of other AWS integrations.


1482-1499: Wire STS fields into loaded MySQL config — good

The loader picks up both fields for primary and read replica configs.


1529-1531: Wire STS fields into loaded Redis config — good

This enables IAM auth flows to access assume-role details.

tools/redis-tests/elasticache/cf-template.yaml (1)

1-302: CloudFormation template looks well-structured for testing IAM authentication.

This template provides a comprehensive test environment for ElastiCache IAM authentication with both serverless and standalone configurations. The IAM roles and policies are properly configured to test different authentication scenarios including direct access and role assumption.

A few observations:

  • The security group setup correctly allows Redis traffic between EC2 and ElastiCache
  • Both user types (fleet-test-user and test-assume-user) are configured for IAM authentication
  • The assume role configuration includes proper trust relationships and conditions
docs/Configuration/fleet-server-configuration.md (5)

26-26: Clear documentation for automatic IAM usage.

The documentation clearly explains when IAM authentication is automatically used for MySQL connections, which helps users understand the behavior without needing to configure additional flags.


201-223: Well-documented MySQL STS configuration options.

The new MySQL STS configuration options are clearly documented with appropriate examples and descriptions. The pairing guidance for external ID usage is helpful for security best practices.


242-242: Important note about automatic IAM usage for Redis.

Based on the past review comments, there was discussion about whether this automatic behavior could be breaking for existing users. The current approach seems reasonable - if an ElastiCache endpoint is specified without a password, IAM auth is used automatically.

However, I noticed from the past comments that there was consideration for making this behavior more explicit. You might want to verify this doesn't break existing deployments where ElastiCache clusters exist without passwords but users don't intend to use IAM auth.

Can you confirm whether there are any managed cloud offerings or existing deployments that might be affected by this automatic IAM usage when redis_password is not specified?


300-323: Comprehensive Redis STS configuration documentation.

The Redis STS configuration options are well-documented with clear examples and consistent formatting that matches the MySQL options. This provides users with the flexibility to configure role assumption for ElastiCache authentication.


1403-1428: Consistent STS configuration across all AWS services.

The STS assume role and external ID configuration has been consistently added across all AWS services (Firehose, Kinesis, Lambda, SES, S3). The documentation follows the same pattern and provides clear examples for each service.

Also applies to: 1551-1576, 1699-1724, 2038-2061, 2146-2169

changes/aws-rds-iam-auth (1)

1-1: Clear and concise changelog entry.

The changelog entry accurately describes the new RDS IAM authentication support and mentions the key configuration option for role assumption.

cmd/fleet/serve.go (1)

269-270: Redis pool configuration updated with STS fields.

The addition of StsAssumeRoleArn and StsExternalID fields to the Redis pool configuration properly wires the new IAM authentication options from the config into the Redis connection setup.

changes/aws-elasticache-iam-auth (1)

1-1: Clear changelog entry for ElastiCache IAM support.

The changelog entry accurately describes the new ElastiCache IAM authentication feature and mentions the STS role assumption capability.

server/datastore/mysql/common_mysql/aws_iam_auth_test.go (1)

9-121: LGTM: solid coverage across RDS, Aurora, proxy, China, and GovCloud.

Nice job covering common and tricky endpoint shapes, plus strict error assertions.

server/datastore/redis/redis_test.go (1)

115-250: LGTM: practical endpoint coverage with serverless, standalone, cluster, and hyphenated names.

The table tests map short region codes to full regions well and cover port/no-port forms.

server/datastore/redis/aws_region_map_generated.go (1)

5-40: Generated region map looks good

The mapping covers current regions and follows the intended zone-prefix-to-region approach. Keeping this unexported inside the package is appropriate.

server/datastore/mysql/common_mysql/common.go (2)

136-152: Good TLS and cleartext handling for IAM/RDS

Setting tls=rdsmysql for direct RDS and tls=true for RDS Proxy, plus allowCleartextPasswords=true, matches AWS guidance for IAM auth with go-sql-driver/mysql.


142-149: No change needed — isRDSProxyEndpoint accepts host:port

I verified the detector: isRDSProxyEndpoint checks substrings in the full address, so passing conf.Address (which may include :port) still works. The suggested change is unnecessary.

  • Location: server/datastore/mysql/common_mysql/aws_iam_auth.go (lines ~137–141)
  • Function:
    // isRDSProxyEndpoint checks if the given address is an RDS Proxy endpoint
    func isRDSProxyEndpoint(address string) bool {
        // RDS Proxy endpoints have the format: proxy-name.proxy-xxxxxxxxx.region.rds.amazonaws.com
        return strings.Contains(address, ".proxy-") && strings.Contains(address, ".rds.amazonaws.")
    }

Action: ignore the proposed diff; no code change required.

Likely an incorrect or invalid review comment.

server/datastore/redis/redis.go (1)

278-296: Solid IAM detection and TLS enforcement for ElastiCache

Auto-enabling TLS for IAM auth and generating the token at dial time is correct. Appending redis.DialPassword(token) with redis.DialUsername set earlier ensures AUTH happens as expected.

server/datastore/mysql/common_mysql/aws_iam_auth.go (2)

58-66: LGTM on token generation

Using rds/auth.BuildAuthToken with host:port, region, username, and credentials is correct. Error wrapping is clear.


137-141: Proxy endpoint detection looks good

The pattern check for .proxy- and .rds.amazonaws. is a pragmatic and safe detection.

server/datastore/redis/aws_iam_auth.go (1)

62-92: Presigning flow for ElastiCache IAM auth looks correct

You’re building a GET with Action=connect and User=..., setting X-Amz-Expires=900, and presigning with service “elasticache”. Trimming the https:// prefix matches AWS guidance.

server/aws_common/iam_auth.go (1)

46-81: Token caching with jitter looks solid

The double-checked lock with RWMutex is correct, and the 10m+jitter refresh window is appropriate for 15m tokens.

Comment on lines +73 to +100
// extractRDSRegion extracts the AWS region from an RDS endpoint
func extractRDSRegion(endpoint string) (string, error) {
// RDS endpoint formats:
// - instance-name.abcdefg.region.rds.amazonaws.com (regular RDS)
// - cluster-name.cluster-abcdefg.region.rds.amazonaws.com (Aurora cluster endpoint)
// - cluster-name.cluster-ro-abcdefg.region.rds.amazonaws.com (Aurora read-only endpoint)
// - proxy-name.proxy-abcdefg.region.rds.amazonaws.com (RDS Proxy endpoint)
// - instance-name.abcdefg.region.rds.amazonaws.com.cn (China regions)

parts := strings.Split(endpoint, ".")
if len(parts) < 5 {
return "", fmt.Errorf("invalid RDS endpoint format for IAM auth: %s", endpoint)
}

// Check if it's an RDS endpoint
if !strings.Contains(endpoint, ".rds.amazonaws.com") {
return "", fmt.Errorf("endpoint does not appear to be an RDS endpoint: %s", endpoint)
}

// Find the region - it's always before "rds.amazonaws.com"
for i := 0; i < len(parts)-3; i++ {
if parts[i+1] == "rds" && parts[i+2] == "amazonaws" && (parts[i+3] == "com" || parts[i+3] == "com.cn") {
return parts[i], nil
}
}

return "", fmt.Errorf("could not extract region from RDS endpoint: %s", endpoint)
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

Harden region parsing; strip port if present and validate suffixes

If a host:port is passed, region parsing will fail. Also, you only validate against .com/.com.cn later but not before stripping the port.

 func extractRDSRegion(endpoint string) (string, error) {
+  // Remove port if present
+  if idx := strings.LastIndex(endpoint, ":"); idx != -1 {
+    endpoint = endpoint[:idx]
+  }
   // RDS endpoint formats:
   // - instance-name.abcdefg.region.rds.amazonaws.com (regular RDS)
   // - cluster-name.cluster-abcdefg.region.rds.amazonaws.com (Aurora cluster endpoint)
   // - cluster-name.cluster-ro-abcdefg.region.rds.amazonaws.com (Aurora read-only endpoint)
   // - proxy-name.proxy-abcdefg.region.rds.amazonaws.com (RDS Proxy endpoint)
   // - instance-name.abcdefg.region.rds.amazonaws.com.cn (China regions)

   parts := strings.Split(endpoint, ".")
   if len(parts) < 5 {
     return "", fmt.Errorf("invalid RDS endpoint format for IAM auth: %s", endpoint)
   }

   // Check if it's an RDS endpoint
-  if !strings.Contains(endpoint, ".rds.amazonaws.com") {
+  if !(strings.Contains(endpoint, ".rds.amazonaws.com") || strings.Contains(endpoint, ".rds.amazonaws.com.cn")) {
     return "", fmt.Errorf("endpoint does not appear to be an RDS endpoint: %s", endpoint)
   }

   // Find the region - it's always before "rds.amazonaws.com"
   for i := 0; i < len(parts)-3; i++ {
-    if parts[i+1] == "rds" && parts[i+2] == "amazonaws" && (parts[i+3] == "com" || parts[i+3] == "com.cn") {
+    if parts[i+1] == "rds" && parts[i+2] == "amazonaws" && (parts[i+3] == "com" || parts[i+3] == "com.cn") {
       return parts[i], nil
     }
   }

To confirm coverage, consider adding tests with host:port and .com.cn variants.


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Find the file containing extractRDSRegion
rg -n "func extractRDSRegion" || true

# Print the file if it exists
FILE=$(rg -n --files-with-match "func extractRDSRegion" || true)
if [ -n "$FILE" ]; then
  echo "---- $FILE ----"
  sed -n '1,240p' "$FILE"
else
  echo "extractRDSRegion not found in repository."
fi

Length of output: 797


🏁 Script executed:

#!/bin/bash
set -euo pipefail

FILE="server/datastore/mysql/common_mysql/aws_iam_auth.go"
if [ -f "$FILE" ]; then
  echo "---- $FILE (lines 1-240) ----"
  sed -n '1,240p' "$FILE"
else
  echo "file not found: $FILE"
  exit 1
fi

Length of output: 4984


Harden region parsing; strip port and accept .com.cn

If a host:port is passed the current parsing can fail (the port ends up in the last label). Also the function only checks for ".rds.amazonaws.com" up front, rejecting China endpoints even though the loop accepts "com.cn".

  • Files to change
    • server/datastore/mysql/common_mysql/aws_iam_auth.go — extractRDSRegion

Suggested patch (keep it small and safe):

 func extractRDSRegion(endpoint string) (string, error) {
+  // Remove port if present
+  if idx := strings.LastIndex(endpoint, ":"); idx != -1 {
+    endpoint = endpoint[:idx]
+  }
   // RDS endpoint formats:
   // - instance-name.abcdefg.region.rds.amazonaws.com (regular RDS)
   // - cluster-name.cluster-abcdefg.region.rds.amazonaws.com (Aurora cluster endpoint)
   // - cluster-name.cluster-ro-abcdefg.region.rds.amazonaws.com (Aurora read-only endpoint)
   // - proxy-name.proxy-abcdefg.region.rds.amazonaws.com (RDS Proxy endpoint)
   // - instance-name.abcdefg.region.rds.amazonaws.com.cn (China regions)

   parts := strings.Split(endpoint, ".")
   if len(parts) < 5 {
     return "", fmt.Errorf("invalid RDS endpoint format for IAM auth: %s", endpoint)
   }

   // Check if it's an RDS endpoint
-  if !strings.Contains(endpoint, ".rds.amazonaws.com") {
+  if !(strings.Contains(endpoint, ".rds.amazonaws.com") || strings.Contains(endpoint, ".rds.amazonaws.com.cn")) {
     return "", fmt.Errorf("endpoint does not appear to be an RDS endpoint: %s", endpoint)
   }

Please add unit tests covering:

  • host:port endpoints (e.g. example.region.rds.amazonaws.com:3306)
  • China regions (.rds.amazonaws.com.cn)
  • RDS Proxy / cluster endpoint shapes

Thanks — this will make IAM token generation robust for ports and China regions.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// extractRDSRegion extracts the AWS region from an RDS endpoint
func extractRDSRegion(endpoint string) (string, error) {
// RDS endpoint formats:
// - instance-name.abcdefg.region.rds.amazonaws.com (regular RDS)
// - cluster-name.cluster-abcdefg.region.rds.amazonaws.com (Aurora cluster endpoint)
// - cluster-name.cluster-ro-abcdefg.region.rds.amazonaws.com (Aurora read-only endpoint)
// - proxy-name.proxy-abcdefg.region.rds.amazonaws.com (RDS Proxy endpoint)
// - instance-name.abcdefg.region.rds.amazonaws.com.cn (China regions)
parts := strings.Split(endpoint, ".")
if len(parts) < 5 {
return "", fmt.Errorf("invalid RDS endpoint format for IAM auth: %s", endpoint)
}
// Check if it's an RDS endpoint
if !strings.Contains(endpoint, ".rds.amazonaws.com") {
return "", fmt.Errorf("endpoint does not appear to be an RDS endpoint: %s", endpoint)
}
// Find the region - it's always before "rds.amazonaws.com"
for i := 0; i < len(parts)-3; i++ {
if parts[i+1] == "rds" && parts[i+2] == "amazonaws" && (parts[i+3] == "com" || parts[i+3] == "com.cn") {
return parts[i], nil
}
}
return "", fmt.Errorf("could not extract region from RDS endpoint: %s", endpoint)
}
// extractRDSRegion extracts the AWS region from an RDS endpoint
func extractRDSRegion(endpoint string) (string, error) {
// Remove port if present
if idx := strings.LastIndex(endpoint, ":"); idx != -1 {
endpoint = endpoint[:idx]
}
// RDS endpoint formats:
// - instance-name.abcdefg.region.rds.amazonaws.com (regular RDS)
// - cluster-name.cluster-abcdefg.region.rds.amazonaws.com (Aurora cluster endpoint)
// - cluster-name.cluster-ro-abcdefg.region.rds.amazonaws.com (Aurora read-only endpoint)
// - proxy-name.proxy-abcdefg.region.rds.amazonaws.com (RDS Proxy endpoint)
// - instance-name.abcdefg.region.rds.amazonaws.com.cn (China regions)
parts := strings.Split(endpoint, ".")
if len(parts) < 5 {
return "", fmt.Errorf("invalid RDS endpoint format for IAM auth: %s", endpoint)
}
// Check if it's an RDS endpoint
if !(strings.Contains(endpoint, ".rds.amazonaws.com") || strings.Contains(endpoint, ".rds.amazonaws.com.cn")) {
return "", fmt.Errorf("endpoint does not appear to be an RDS endpoint: %s", endpoint)
}
// Find the region - it's always before "rds.amazonaws.com"
for i := 0; i < len(parts)-3; i++ {
if parts[i+1] == "rds" && parts[i+2] == "amazonaws" && (parts[i+3] == "com" || parts[i+3] == "com.cn") {
return parts[i], nil
}
}
return "", fmt.Errorf("could not extract region from RDS endpoint: %s", endpoint)
}
🤖 Prompt for AI Agents
In server/datastore/mysql/common_mysql/aws_iam_auth.go around lines 73-100, the
extractRDSRegion function fails for host:port inputs and currently rejects China
endpoints up front; update it to first strip any ":port" (use net.SplitHostPort
or if that fails, trim at last ':' when there are no IPv6 brackets) so parsing
works with host:port, broaden the upfront endpoint check to accept both
".rds.amazonaws.com" and ".rds.amazonaws.com.cn" (or check labels instead of a
single substring), then split the cleaned host by '.' and locate the region
label immediately before the "rds","amazonaws","com" or
"rds","amazonaws","com","cn" sequence as the loop already does; add unit tests
for host:port (example.region.rds.amazonaws.com:3306), China endpoints
(example.region.rds.amazonaws.com.cn), and proxy/cluster shapes to verify the
region extraction.

Comment thread server/datastore/mysql/common_mysql/aws_iam_auth.go
Comment on lines +76 to +85
var db *sqlx.DB
if useIAMAuth {
connector := &awsIAMAuthConnector{
driverName: driverName,
baseDSN: generateMysqlConnectionString(*conf),
tokenGen: iamTokenGen,
logger: opts.Logger,
}
db = sqlx.NewDb(sql.OpenDB(connector), driverName)
} else {
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot Aug 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

IAM connector likely bypasses sqlmw/APM/OTEL driver instrumentation

Switching to sqlx.NewDb(sql.OpenDB(connector), driverName) means the registered wrapper drivers (mysql-mw for sqlmw, apm/mysql, or the OpenTelemetry driver) may not wrap connections created by awsIAMAuthConnector. Passing driverName to sqlx.NewDb only affects sqlx’s dialect selection, not runtime instrumentation.

Risk:

  • Loss of DB query traces/metrics/logging on the IAM path.
  • sqlmw interceptors won’t run.

Paths to fix:

  • Ensure awsIAMAuthConnector internally builds a driver.Connector using the wrapped driver (e.g., the otel/apm/sqlmw registered one), not the bare mysql driver.
  • If you use otelsql, prefer using its connector-aware APIs (e.g., OpenDB/NewConnector) to wrap your connector.
  • Alternatively, refactor awsIAMAuthConnector to delegate to the wrapped driver name via OpenConnector on the registered driver.

I can help adapt awsIAMAuthConnector once I see its implementation.

Run this quick scan to assess whether wrapped drivers are still used on the IAM path:


🏁 Script executed:

#!/bin/bash
# Look for otel/apm/sqlmw registration and usage patterns
rg -n "otelsql|apm/mysql|sqlmw\.Driver|Register\(" -A 2
# Inspect the IAM connector implementation
fd aws_iam_auth\.go | xargs -I {} sh -c 'echo "--- {} ---"; sed -n "1,200p" "{}"'

Length of output: 18296


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "----- server/datastore/mysql/mysql.go -----"
sed -n '1,420p' server/datastore/mysql/mysql.go || true

echo
echo "----- server/datastore/mysql/common_mysql/common.go -----"
sed -n '1,240p' server/datastore/mysql/common_mysql/common.go || true

Length of output: 20131


🌐 Web query:

XSAM/otelsql NewConnector otelsql.NewConnector
otelsql NewConnector driver.Connector
otelsql OpenDB NewConnector
ngrok sqlmw connector OpenConnector NewConnector
sqlmw NewConnector
github apm/mysql connector OpenConnector
database/sql sql.OpenDB driver.Connector behavior
sqlx.NewDb driverName effect instrumentation

💡 Result:

otelsql.NewConnector (from XSAM/otelsql) creates a new OpenTelemetry-instrumented database connector that implements the Go driver.Connector interface, allowing its use with otelsql.OpenDB just as standard connectors are used with database/sql.OpenDB[3][9][1]. This enables tracing and observability for any Go SQL driver that is instrumented using otelsql. The patterns for connector support, wrapping, and general driver.Connector usage are as follows:


Instrumentation Flow for otelsql and Connectors

  • otelsql.NewConnector returns a driver.Connector-compatible object that wraps an underlying DSN and driver with OpenTelemetry instrumentation. This is similar to how standard Go SQL driver connectors behave, but traces DB operations[3][9].
  • otelsql.OpenDB(connector) returns an instrumented *sql.DB object using the provided OpenTelemetry-wrapped connector[9][3].
  • otelsql.Open(driverName, dsn, ...opts) is a convenience for direct OpenTelemetry-instrumented DB access. This mirrors the behavior of sql.Open but with otelsql tracing, and does not require manual connector creation[7][9].

How driver.Connector, sql.OpenDB, and Wrappers Interact

  • driver.Connector: The Go interface for advanced connection pooling and configuration reuse. It allows DSN parsing only once, and enables (but does not require) context support for dialing[1].
  • sql.OpenDB(connector): This Go function establishes an *sql.DB instance from a driver.Connector, rather than from a driver name and DSN string[1].
  • All instrumented database wrappers (e.g., otelsql, sqlmw, APM/otel, etc.) that wish to fully support OpenTelemetry, interception, or other middleware need to implement, or wrap, this interface.

Middleware Example: ngrok/sqlmw

  • ngrok/sqlmw provides middleware/wrapping for Go SQL drivers, but does not expose a dedicated NewConnector helper. Instead, you wrap the base driver and register it under a new name (e.g. "postgres-mw") with sql.Register. Instrumentation/interception is handled at the driver level, not via a connector creation helper[4][6].
  • With sqlmw, you use sql.Open (not OpenDB(connector)) because the driver is already wrapped and registered under a custom name[4].

APM/Mysql Connector Pattern

  • Many APM SQL wrappers (e.g., OpenTelemetry, Elastic APM) also provide either a custom OpenConnector implementation or expose their own connector-creation helper. Usage pattern:
    • Create a connector via e.g. apmmysql.NewConnector(...)
    • Pass this connector to standard sql.OpenDB(connector) for maximum composition and flexibility.

Key behaviors and recommendations

Approach Connector Support Usage OpenTelemetry Instrumented? Notes
otelsql.OpenDB Yes, via Connector db := otelsql.OpenDB(connector) Yes Connector is OTel-instrumented and composable.
otelsql.Open Yes, via DSN db := otelsql.Open(driver, dsn, ...) Yes Easiest, but less composable than OpenDB.
sqlmw.Driver No Connector API Register wrapped driver; sql.Open Interceptor/middleware Used for function-level interception.
apmmysql.OpenDB/Connector Yes db := sql.OpenDB(connector) Yes For monitoring/metrics, follows same pattern.
database/sql Yes db := sql.OpenDB(connector) No Standard library without telemetry.
sqlx.NewDb Yes, but pass driver name and *sql.DB sqlx.NewDb(db, driverName) Follows underlying *sql.DB Use after wrapping or instrumenting.

Special Notes on sqlx.NewDb

  • sqlx.NewDb(db *sql.DB, driverName string) does not itself perform instrumentation—instead, it adapts an already-opened, possibly instrumented, *sql.DB for use with sqlx extensions.
  • The driverName is only for sqlx's internal metadata/tracing; the actual DB behavior (instrumented or not) depends on how you created the *sql.DB (i.e., via otelsql.OpenDB, otelsql.Open, sqlmw, etc.)[1][9].

Summary: To instrument SQL drivers with tracing and metrics using otelsql or similar libraries, use their connector construction (e.g., otelsql.NewConnector) or driver wrapping helper, create a connector or registered driver, and use either OpenDB(connector) or Open(driver, dsn) as appropriate. Avoid using raw driver connections if you want consistent instrumentation and metrics across all operations[3][9][7][4].

Citations:


Fix IAM connector: it currently bypasses otel/apm/sqlmw instrumentation

I verified the code paths. The IAM connector builds and returns the raw mysql connector/driver, so registered wrappers (otelsql/apm/mysql/sqlmw) are skipped on the IAM path — traces and sqlmw interceptors will be lost.

Files to fix

  • server/datastore/mysql/common_mysql/common.go — NewDB: when useIAMAuth is true you construct awsIAMAuthConnector and call sqlx.NewDb(sql.OpenDB(connector), driverName).
  • server/datastore/mysql/common_mysql/aws_iam_auth.go — awsIAMAuthConnector.Connect uses mysql.NewConnector(cfg) and Driver() returns mysql.MySQLDriver{}.

Why this matters (very short)

  • sql.OpenDB(connector) uses the connector.Driver() / connector.Connect() implementations you provide. awsIAMAuthConnector currently constructs a connector using the raw mysql driver, so instrumentation registered under driverName (otelsql/apm/mysql/mysql-mw) is not used.

Suggested fixes (pick one)

  • Preferred: build the base mysql connector (mysql.NewConnector(cfg)) and wrap that connector with the instrumentation's connector API before returning Connect. For example (pseudo):
  • Alternative: when creating the awsIAMAuthConnector in NewDB, pass an actual instrumented driver/connector (or a factory) instead of just driverName, and delegate Connect()/Driver() to that injected connector so the wrapper is preserved.
  • If using sqlmw (which doesn't provide a connector helper), ensure the wrapped driver is registered and create a connector that delegates to that wrapped driver (or move token injection into a Dialer that the wrapped driver will use).

Minimal pseudo-diff (problem area)
Original (current):

connector := &awsIAMAuthConnector{ driverName: driverName, baseDSN: generateMysqlConnectionString(*conf), ... }
db = sqlx.NewDb(sql.OpenDB(connector), driverName)

Suggested approach (pseudo):

cfg := mysql.ParseDSN(...)
base := mysql.NewConnector(cfg)
var conn driver.Connector = base
if driverName == otelTracedDriverName {
  conn = otelsql.NewConnector(base) // wrap the base connector
}
db = sqlx.NewDb(sql.OpenDB(conn), driverName)

I can prepare a concrete patch for otelsql / apm / sqlmw cases if you want — the fix is small but needs to use the wrapper's connector APIs or inject the wrapped connector at creation time.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting... how does sqlx.Open() handle this? Surely it's not checking the driver name against all possible instrumentation wrappers.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧩 Analysis chain

@sgress454 Great question! You're absolutely right - sqlx.Open() doesn't check against all possible instrumentation wrappers. Here's how it actually works:

How sqlx.Open() handles instrumentation

sqlx.Open(driverName, dsn) internally calls sql.Open(driverName, dsn), which:

  1. Looks up the driver by name in Go's global driver registry (sql.drivers)
  2. Uses whatever driver was registered under that name

The instrumentation libraries register their wrapped drivers at startup:

// sqlmw approach
sql.Register("mysql-mw", sqlmw.Driver(mysql.MySQLDriver{}, interceptor))

// otel approach  
sql.Register("mysql-otel", otelsql.Wrap(mysql.MySQLDriver{}))

So when you call sqlx.Open("mysql-mw", dsn), it automatically gets the wrapped driver.

The problem with sql.OpenDB(connector)

sql.OpenDB(connector) bypasses the driver registry entirely. It directly uses:

  • connector.Driver() to get the driver instance
  • connector.Connect(ctx) to create connections

The current IAM connector returns the raw MySQL driver:

func (c *awsIAMAuthConnector) Driver() driver.Driver {
    return mysql.MySQLDriver{}  // Raw driver, no instrumentation!
}

The solution is to make the IAM connector delegate to the already-registered wrapped driver, or wrap the base connector with the instrumentation's connector API before using sql.OpenDB().


🏁 Script executed:

#!/bin/bash
# Let's look at how the instrumentation is set up in NewDB
rg -n -A 10 -B 5 "Register.*sqlmw|apm/mysql|otelDriverName" server/datastore/mysql/common_mysql/common.go

Length of output: 1201


Perfect! Now I can see the exact issue. Looking at the code:

The Problem

Non-IAM path (works correctly):

// Line 70: sqlmw registers wrapped driver
sql.Register(driverName, sqlmw.Driver(mysql.MySQLDriver{}, opts.Interceptor))

// Lines 86-91: Uses the registered wrapped driver
db, err = sqlx.Open(driverName, dsn)  // ✅ Gets instrumented driver from registry

IAM path (bypasses instrumentation):

// Lines 77-85: Connector bypasses registry entirely
connector := &awsIAMAuthConnector{
    driverName: driverName,  // Just stored, not used for driver lookup
    // ...
}
db = sqlx.NewDb(sql.OpenDB(connector), driverName)  // ❌ Uses connector.Driver()

The Fix

The IAM connector should delegate to the registered driver instead of returning the raw MySQL driver. Here's the approach:

The solution is to modify the IAM connector to look up and use the registered instrumented driver, just like sqlx.Open() does internally.


🏁 Script executed:

#!/bin/bash
# Let's look at the current IAM connector implementation to see how it gets the driver
rg -n -A 20 "func.*Driver\(\)|Connect\(" server/datastore/mysql/common_mysql/aws_iam_auth.go

Length of output: 1170


Perfect! Now I can see exactly how to fix this. The issue is clear:

Current IAM Connector (Problem)

// Line 134: Returns raw MySQL driver, bypassing instrumentation
func (c *awsIAMAuthConnector) Driver() driver.Driver {
    return mysql.MySQLDriver{}  // ❌ Raw driver, no instrumentation
}

// Line 124-129: Creates raw MySQL connector  
connector, err := mysql.NewConnector(cfg)  // ❌ Raw connector
return connector.Connect(ctx)

The Fix

The IAM connector should use the registered instrumented driver instead of the raw MySQL driver. Here's the solution:

Option 1: Pass the registered driver to the IAM connector

// In NewDB(), get the registered driver first
var registeredDriver driver.Driver
if opts.Interceptor != nil {
    registeredDriver = sqlmw.Driver(mysql.MySQLDriver{}, opts.Interceptor)
} else {
    // For otel/apm, we'd need to look them up from the registry
    // This is more complex since Go doesn't expose sql.Drivers()
}

if useIAMAuth {
    connector := &awsIAMAuthConnector{
        driverName:       driverName,
        baseDSN:         generateMysqlConnectionString(*conf),
        tokenGen:        iamTokenGen,
        logger:          opts.Logger,
        instrumentedDriver: registeredDriver,  // Pass the wrapped driver
    }
    db = sqlx.NewDb(sql.OpenDB(connector), driverName)
}

Option 2 (Simpler): Let the IAM connector create instrumented connectors

Modify the IAM connector to handle instrumentation internally:

func (c *awsIAMAuthConnector) Connect(ctx context.Context) (driver.Conn, error) {
    token, err := c.tokenGen.getAuthToken(ctx)
    if err != nil {
        return nil, fmt.Errorf("failed to generate IAM auth token: %w", err)
    }

    cfg, err := mysql.ParseDSN(c.baseDSN)
    if err != nil {
        return nil, fmt.Errorf("failed to parse DSN: %w", err)
    }
    cfg.Passwd = token

    // Create base connector
    baseConnector, err := mysql.NewConnector(cfg)
    if err != nil {
        return nil, fmt.Errorf("failed to create connector: %w", err)
    }

    // Wrap with instrumentation based on driverName
    var finalConnector driver.Connector = baseConnector
    switch c.driverName {
    case "mysql-mw":
        // For sqlmw, we need the interceptor - would need to pass it in
        // This is tricky since sqlmw doesn't provide a connector wrapper
    default:
        // Handle otel/apm cases
        if strings.Contains(c.driverName, "otel") {
            // finalConnector = otelsql.NewConnector(baseConnector)
        }
    }

    return finalConnector.Connect(ctx)
}

The cleanest approach is Option 1 where we pass the already-registered instrumented driver to the IAM connector and have it delegate appropriately. This maintains the same instrumentation path for both IAM and non-IAM connections.

Would you like me to create a concrete patch showing exactly how to implement this?


🧬 Code Graph Analysis Results
  • server/datastore/mysql/common_mysql/common.go:61-74
	if opts.TracingConfig != nil && opts.TracingConfig.TracingEnabled {
		if opts.TracingConfig.TracingType == "opentelemetry" {
			driverName = otelDriverName
		} else {
			driverName = "apm/mysql"
		}
	}
	if opts.Interceptor != nil {
		driverName = "mysql-mw"
		sql.Register(driverName, sqlmw.Driver(mysql.MySQLDriver{}, opts.Interceptor))
	}
  • server/datastore/mysql/common_mysql/common.go:77-85
	if useIAMAuth {
		connector := &awsIAMAuthConnector{
			driverName: driverName,
			baseDSN:    generateMysqlConnectionString(*conf),
			tokenGen:   iamTokenGen,
			logger:     opts.Logger,
		}
		db = sqlx.NewDb(sql.OpenDB(connector), driverName)
	} else {
  • server/datastore/mysql/common_mysql/common.go:86-91
		dsn := generateMysqlConnectionString(*conf)
		db, err = sqlx.Open(driverName, dsn)
		if err != nil {
			return nil, err
		}
	}
  • server/config/config.go:37-55
MysqlConfig struct {
	Protocol         string `yaml:"protocol"`
	Address          string `yaml:"address"`
	Username         string `yaml:"username"`
	Password         string `yaml:"password"`
	PasswordPath     string `yaml:"password_path"`
	Database         string `yaml:"database"`
	TLSCert          string `yaml:"tls_cert"`
	TLSKey           string `yaml:"tls_key"`
	TLSCA            string `yaml:"tls_ca"`
	TLSServerName    string `yaml:"tls_server_name"`
	TLSConfig        string `yaml:"tls_config"` // tls=customValue in DSN
	MaxOpenConns     int    `yaml:"max_open_conns"`
	MaxIdleConns     int    `yaml:"max_idle_conns"`
	ConnMaxLifetime  int    `yaml:"conn_max_lifetime"`
	SQLMode          string `yaml:"sql_mode"`
	StsAssumeRoleArn string `yaml:"sts_assume_role_arn"`
	StsExternalID    string `yaml:"sts_external_id"`
}
  • server/config/config.go:252-260
LoggingConfig struct {
	Debug                bool
	JSON                 bool
	DisableBanner        bool          `yaml:"disable_banner"`
	ErrorRetentionPeriod time.Duration `yaml:"error_retention_period"`
	TracingEnabled       bool          `yaml:"tracing_enabled"`
	// TracingType can either be opentelemetry or elasticapm for whichever type of tracing wanted
	TracingType string `yaml:"tracing_type"`
}

🧠 Learnings used
Learnt from: sgress454
PR: fleetdm/fleet#31075
File: server/datastore/mysql/common_mysql/aws_iam_auth.go:111-130
Timestamp: 2025-08-20T21:24:59.223Z
Learning: In Fleet's MySQL IAM authentication implementation, the TLS and AllowCleartextPasswords configuration is handled in the generateMysqlConnectionString function in server/datastore/mysql/common_mysql/common.go, not in the awsIAMAuthConnector.Connect method. The connector receives a base DSN that already includes the necessary IAM authentication parameters.

Learnt from: sgress454
PR: fleetdm/fleet#31075
File: server/datastore/mysql/common_mysql/aws_iam_auth.go:111-130
Timestamp: 2025-08-20T21:24:59.224Z
Learning: In Fleet's MySQL IAM authentication implementation, the TLS and AllowCleartextPasswords configuration is handled in the generateMysqlConnectionString function in server/datastore/mysql/common_mysql/common.go. When no password is configured and the endpoint is an RDS endpoint, the function automatically sets allowCleartextPasswords=true and configures appropriate TLS settings. The awsIAMAuthConnector receives a properly configured base DSN and only needs to inject the IAM token as the password.

Comment thread server/datastore/redis/gen_aws_region_map.go Outdated
Comment thread tools/mysql-tests/rds/cf-template.yaml
Comment thread tools/redis-tests/elasticache/cleanup-test-env.sh
Comment thread tools/redis-tests/elasticache/cleanup-test-env.sh
Comment thread tools/redis-tests/elasticache/cleanup-test-env.sh
Comment thread tools/redis-tests/elasticache/deploy-test-env.sh
Comment thread tools/redis-tests/elasticache/iam_auth.go
titanous and others added 4 commits August 13, 2025 14:19
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Comment thread docs/Configuration/fleet-server-configuration.md Outdated

Amazon Resource Name (ARN) of the AWS Security Token Service (STS) role to use for Redis authentication.

If set, Fleet uses AWS Identity and Access Management (IAM) authentication instead of basic authentication set by `redis_username` and `redis_password` (optional).
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@noahtalerman this is subtly wrong, as mentioned in the other thread this config option can't be the flag we use because there are scenarios where there is no role to assume (the default is environmental credentials (eg. from the EC2 metadata service), assuming a role (this flag) uses the role/user those credentials are associated with to call sts:AssumeRole, and then uses that second set of chained credentials)

@sgress454
Copy link
Copy Markdown
Contributor

@titanous thanks for the awesome work on this. In the interest of moving it forward, we're going to merge this to a feature branch on the Fleet repo, and continue from there. That'll get it prioritized to be put into the next release. It'll still have all your commits in there and we'll be sure to give you a shoutout at the demo!

@titanous
Copy link
Copy Markdown
Contributor Author

@sgress454 Sounds great, and feel free to tag me when you make a draft and/or final PR if you're interested in feedback/review.

@titanous
Copy link
Copy Markdown
Contributor Author

Also FWIW, we've been running this patch since I submitted it and everything has been working with no issues.

@sgress454
Copy link
Copy Markdown
Contributor

Ok -- moved to https://github.com/fleetdm/fleet/tree/feat-1817-add-iam-auth-to-mysql-and-redis.

Also FWIW, we've been running this patch since I submitted it and everything has been working with no issues.

That's great to hear!

feel free to tag me when you make a draft and/or final PR if you're interested in feedback/review.

Will do. I don't foresee many functional changes.

@sgress454 sgress454 closed this Aug 22, 2025
sgress454 added a commit that referenced this pull request Sep 4, 2025
for #1817 

# Details

This PR gives Fleet servers the ability to connect to RDS MySQL and
Elasticache Redis via AWS [Identity and Access Management
(IAM)](https://aws.amazon.com/iam/). It is based almost entirely on the
work of @titanous, branched from his [original pull
request](#31075). The main
differences between his branch and this are:

1. Removal of auto-detection of AWS region (and cache name for
Elasticache) in favor of specifying these values in configuration. The
auto-detection is admittedly handy but parsing AWS host URLs is not
considered a best practice.
2. Relying on the existence of these new configs to determine whether or
not to connect via IAM. This sidesteps a thorny issue of whether to try
an IAM-based Elasticache connection when a password is not supplied,
since this is technically a valid setup.

# Checklist for submitter

If some of the following don't apply, delete the relevant line.

- [X] Changes file added for user-visible changes in `changes/`,
`orbit/changes/` or `ee/fleetd-chrome/changes`.
See [Changes
files](https://github.com/fleetdm/fleet/blob/main/docs/Contributing/guides/committing-changes.md#changes-files)
for more information.

## Testing

- [X] Added/updated automated tests
- [X] QA'd all new/changed functionality manually - besides using
@titanous's excellent test tool, I verified the following end-to-end:
  - [X] regular (non RDS) MySQL connection
  - [X] RDS MySQL connection using username/password
  - [X] RDS MySQL connection using IAM (no role)
  - [X] RDS MySQL connection using IAM (assuming role)
  - [X] regular (non Elasticache) Redis connection
  - [X] Elasticache Redis connection using username/password
  - [X] Elasticache Redis connection using NO password (without IAM)
  - [X] Elasticache Redis connection using IAM (no role)
  - [X] Elasticache Redis connection using IAM (assuming role)

---------

Co-authored-by: Jonathan Rudenberg <jonathan@titanous.com>
Co-authored-by: Noah Talerman <47070608+noahtalerman@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
sgress454 added a commit that referenced this pull request Sep 9, 2025
Adds support for reading server `private_key` from AWS Secrets Manager.
Combined with #31075, this should allow removing all common sensitive
secrets from the environment/config (if I missed any let me know). This
works with localstack for local development (set
`AWS_ENDPOINT_URL=$LOCALSTACK_URL`, `AWS_ACCESS_KEY_ID=test`, and
`AWS_SECRET_ACCESS_KEY=test`).

I did not include config options for `AWS_ACCESS_KEY_ID` and
`AWS_SECRET_ACCESS_KEY` because they are a bad practice vs role
credentials and defeat the purpose of this feature which is to remove
secrets from the environment/config.

# Checklist for submitter

If some of the following don't apply, delete the relevant line.

<!-- Note that API documentation changes are now addressed by the
product design team. -->

- [x] Changes file added for user-visible changes in `changes/`,
`orbit/changes/` or `ee/fleetd-chrome/changes`.
See [Changes
files](https://github.com/fleetdm/fleet/blob/main/docs/Contributing/guides/committing-changes.md#changes-files)
for more information.
- [x] Added/updated automated tests
- [x] Manual QA for all new/changed functionality

---------

Co-authored-by: Scott Gress <scott@fleetdm.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Redis: AWS IAM authentication IAM authentication for MySQL and Redis

6 participants