Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add AWS_MSK_IAM SASL Mechanism #763

Closed

Conversation

cmaher
Copy link
Contributor

@cmaher cmaher commented Oct 21, 2021

Addresses #661

I've verified this against a live MSK cluster and confirmed the behavior relative to other implementations. Unfortunately, live connections can't be unit-tested, so I'm testing the signature generation, which is the heart of this mechanism. Also, as discussed in the original ticket, I've made this a submodule, with the intent that it won't cause conflicts with general user code. Please let me know if I need to change/add anything.

.circleci/config.yml Outdated Show resolved Hide resolved
Copy link
Contributor

@achille-roussel achille-roussel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exciting to see this shape up!

sasl/aws_msk_iam/msk_iam.go Outdated Show resolved Hide resolved
sasl/aws_msk_iam/msk_iam.go Outdated Show resolved Hide resolved
sasl/aws_msk_iam/msk_iam.go Outdated Show resolved Hide resolved
sasl/aws_msk_iam/msk_iam.go Outdated Show resolved Hide resolved
sasl/aws_msk_iam/go.mod Outdated Show resolved Hide resolved
sasl/aws_msk_iam/msk_iam.go Outdated Show resolved Hide resolved
sasl/aws_msk_iam/msk_iam.go Outdated Show resolved Hide resolved
sasl/aws_msk_iam/msk_iam.go Outdated Show resolved Hide resolved
@cmaher
Copy link
Contributor Author

cmaher commented Oct 22, 2021

Thanks for the quick response! I think I've addressed all the feedback here.

sasl/aws_msk_iam/msk_iam.go Outdated Show resolved Hide resolved
sasl/aws_msk_iam/msk_iam_test.go Outdated Show resolved Hide resolved
@cmaher
Copy link
Contributor Author

cmaher commented Oct 26, 2021

Just checking in to see if you want anything else done / what the next steps are

@dominicbarnes
Copy link
Contributor

@cmaher sorry about that, I had a couple other questions I wanted to raise but the last few days have been a little hectic for me.

// The sigv4.Signer to use when signing the request. Required.
Signer *sigv4.Signer
// The host of the kafka broker to connect to. Required.
Host string
Copy link
Contributor

@dominicbarnes dominicbarnes Oct 26, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any time I've gone to set up an MSK cluster, each broker (which requires more than 1 in all cases) has it's own distinct host name. So far, I have not found a unified host name like I see in other services with clusters (eg: Redshift, Elasticache) so should we be supporting multiple values here? (it looks like the host name is part of the signing process, so maybe that isn't as easy at it seems?)

Additionally, it seems like the MSK API exposes endpoints to retrieve the broker host names, depending on the specific authentication strategy being used. I think it would be more future-proof to allow using the MSK API to perform this look-up internally, but I think we can pass on this bit of "magic" for now and make it a later enhancement.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. I was using a single broker in my experiments, but the host needs to be dynamic. I think the best way to do this while maintaining the existing sasl interface is to set the host on the context. This approach raises some questions, however, which I'll point out in my commit

@@ -258,6 +258,8 @@ func (d *Dialer) connectTLS(ctx context.Context, conn net.Conn, config *tls.Conf
return
}

const ContextKeyBrokerAddr = "brokerAddr"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering where the proper place to put this is, assuming you want to use constants instead of literals

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this is actually unnecessary now that sasl.MetadataFromContext is being used. Would you mind removing this const and updating the tests that still reference it?

@@ -6,3 +6,5 @@ require (
github.com/aws/aws-sdk-go v1.41.3
github.com/segmentio/kafka-go v0.4.21
)

replace github.com/segmentio/kafka-go => ../..
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is needed for the tests to pass in this pull-request, but it should not be merged. How should this be handled? 2 separate pull-requests?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yeah, in that case I would think we can just remove this line before merging, once we've gotten tests to pass. I would rather avoid 2 PRs.

@cmaher cmaher force-pushed the cmaher/add-aws-msk-iam-sasl branch from c054939 to fb5d688 Compare November 5, 2021 20:15
Copy link
Contributor

@dominicbarnes dominicbarnes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we're getting closer!

@@ -258,6 +258,8 @@ func (d *Dialer) connectTLS(ctx context.Context, conn net.Conn, config *tls.Conf
return
}

const ContextKeyBrokerAddr = "brokerAddr"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this is actually unnecessary now that sasl.MetadataFromContext is being used. Would you mind removing this const and updating the tests that still reference it?

@@ -6,3 +6,5 @@ require (
github.com/aws/aws-sdk-go v1.41.3
github.com/segmentio/kafka-go v0.4.21
)

replace github.com/segmentio/kafka-go => ../..
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yeah, in that case I would think we can just remove this line before merging, once we've gotten tests to pass. I would rather avoid 2 PRs.

}
signUrl := url.URL{
Scheme: "kafka",
Host: saslMeta.Host,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Locally, my tests panic because saslMeta does not exist, which I think will be corrected by updating your test cases.

That being said, should we guard against this and return an explicit error message if we detect this? (rather than panicking)

}{
{
ctx: func() context.Context {
return context.WithValue(context.Background(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these need to be updated to use sasl.WithMetadata.

@dominicbarnes
Copy link
Contributor

Howdy again! Sorry about the delay, I've been trying to get this code to work with a manually-provisioned MSK cluster just to make sure I've at least seen it in action for myself (since as you pointed out, unit-testing this is hard/unfeasible). Thankfully, after some trial and error of my own, I was able to get it to work! That makes me more comfortable moving forward with merging even without some added test coverage beyond what we have here.

I'll follow up on the existing threads, as I think we have a few outstanding ones to finish addressing. That being said, this is looking to be really close.

@@ -40,6 +40,7 @@ jobs:
paths:
- /go/pkg/mod
- run: go test -race -cover ./...
- run: find ./sasl -name go.mod -execdir go test -race -cover ./... \;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this was my idea, but it looks like it was a bad one, sorry about that! It looks like the exit code from the -execdir is being lost, so CircleCI is treating these as passing even though they panic. :(

If we don't have any other ideas, feel free to revert back to what you had before.

@cmaher
Copy link
Contributor Author

cmaher commented Nov 23, 2021

Hey, thanks for getting back to this. I probably don't have time this week, but I'll be able to get to it within the next two weeks after the holiday.

@dominicbarnes
Copy link
Contributor

This PR has been addressed by #798 so I will close this. Thank you so much for your contribution!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants