
mTLS authentication for Amazon MSK/self-managed kafka event source #10260

Closed

mishabruml opened this issue Nov 22, 2021 · 24 comments

@mishabruml
Contributor

mishabruml commented Nov 22, 2021

Is there an existing issue for this?

  • I have searched existing issues; it hasn't been reported yet

Use case description

Support a further accessConfigurations type (CLIENT_CERTIFICATE_TLS_AUTH) to allow configuring mTLS-authenticated Lambda event sources.

https://docs.aws.amazon.com/lambda/latest/dg/with-kafka.html#services-smaa-topic-add
https://aws.amazon.com/about-aws/whats-new/2021/11/aws-lambda-supports-mtls-authentication-amazon-msk-event-source/
https://aws.amazon.com/blogs/compute/introducing-mutual-tls-authentication-for-amazon-msk-as-an-event-source/

I'm happy to have a go at a PR for this, if you're open to it?
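
Roughly, I imagine something like this in serverless.yml (the clientCertificateTlsAuth property name and the ARN are just illustrative placeholders, not a final design):

functions:
  processKafkaMessages:
    handler: handler.process
    events:
      - kafka:
          topic: my-topic
          bootstrapServers:
            - broker1.example.com:9092
          accessConfigurations:
            # hypothetical property mapping to the CLIENT_CERTIFICATE_TLS_AUTH
            # source access configuration type; the referenced secret would hold
            # the client certificate and private key used for mTLS
            clientCertificateTlsAuth: arn:aws:secretsmanager:eu-west-1:xxxxx:secret:kafka-client-cert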

@mishabruml
Contributor Author

Also, it isn't documented anywhere obvious, but while creating the trigger through the AWS console I found that if you add an encryption secret ("Required if your Kafka brokers use certificates signed by a private CA."), AWS adds a source access configuration entry with:

{
  "type": "SERVER_ROOT_CA_CERTIFICATE",
  "uRI": "arn:aws:secretsmanager:eu-west-1:xxxxx:secret:xxxx"
}
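
In serverless.yml terms I'd guess that would map to something along these lines (the serverRootCaCertificate property name is my assumption, mirroring the console's type name; values are placeholders):

- kafka:
    topic: my-topic
    bootstrapServers:
      - broker1.example.com:9092
    accessConfigurations:
      # assumed property mirroring the SERVER_ROOT_CA_CERTIFICATE type; points
      # at a Secrets Manager secret containing the private CA certificate
      serverRootCaCertificate: arn:aws:secretsmanager:eu-west-1:xxxxx:secret:kafka-root-ca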

@pgrzesik
Contributor

Hello @mishabruml - thanks a lot for proposing this enhancement. I believe it makes sense to add such settings - here's the corresponding CloudFormation doc, which suggests there shouldn't be any problem in implementing it in the Framework: https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-lambda-eventsourcemapping-sourceaccessconfiguration.html

Before moving forward, I will sync up with the team internally and if all sounds good, we'll be more than happy to accept a PR for this one.

@mishabruml
Contributor Author

@pgrzesik that's great news, I would love to work on this! Let me know 😄

Following up on my comment above: it seems that if a SERVER_ROOT_CA_CERTIFICATE is required, then the secret at the specified ARN must be of the format

"{\"certificate\":\"-----BEGIN CERTIFICATE----- xxxx -----END CERTIFICATE-----\"}"

similar to how the CLIENT_CERTIFICATE_TLS_AUTH secret must be of the format

"{\"certificate\":\"-----BEGIN CERTIFICATE-----xxxx -----END CERTIFICATE-----\",\"privateKey\":\"-----BEGIN PRIVATE KEY----- xxxx -----END PRIVATE KEY-----\"}"

@pgrzesik
Contributor

Thanks for sharing the details @mishabruml - we'd be more than happy to accept a PR if you're interested 🙌

@mishabruml
Contributor Author

Hey @pgrzesik, I've got a draft PR up - the changes were much simpler than I expected!

I stuck with the path of least resistance and the programming style already present. I was wondering whether it might make sense to validate the settings a little more? Or just let errors fall through to CloudFormation, I suppose - it depends on the design philosophy, I guess.

Another thing I noticed was the lack of integration tests for the self-managed kafka event source.

Is this something you'd like me to look at, or is it out of scope for this PR?

@pgrzesik
Contributor

pgrzesik commented Nov 26, 2021

Thanks a lot for that @mishabruml 🙇 What would you like to validate further? It's always worth considering failing as soon as possible to shorten the feedback cycle.

As for the integration tests for self-managed Kafka - we've decided against them because it's a bit challenging to manage a Kafka cluster just for them. We do have integration tests for MSK, though, and even that was quite time-consuming.

I also have another question - the PR addresses changes for the kafka event source, but we also have the msk event source for the MSK offering. Would you be interested in submitting a PR for that as well, ideally as a separate PR? The changes should be quite similar.

@mishabruml
Contributor Author

Hey @pgrzesik, so I was thinking of validation along the lines of:

If the event source is self-managed, then there must be at least one bootstrap endpoint, and there must be at least one access configuration of type VPC/SASL/mTLS

I was also thinking the strings provided for the source access configuration types could be validated as either ARNs or security group/subnet identifiers (regex?)

That's fair enough about the self-managed cluster integration tests. I deal with Kafka daily at work, and we write integration tests against clusters we spin up in Docker containers, so I feel confident I could make an attempt at writing a test suite if you think it would be valuable - but I also understand if you don't want to have to maintain this in the future.

Re: MSK I completely forgot about that! 😂Yes, happy to make those changes too 😎

@pgrzesik
Contributor

Hey @mishabruml 👋

If the event source is self-managed, then there must be at least one bootstrap endpoint, and there must be at least one access configuration of type VPC/SASL/mTLS

I believe we already have that at the moment - the kafka event is for self-managed clusters, and it requires accessConfigurations, bootstrapServers and topic to be defined. Is something missing here?

I was also thinking the strings provided for the source access configuration types could be validated as either ARNs or security group/subnet identifiers (regex?)

This is also handled at the schema validation level - is there something missing there?

That's fair enough about the self-managed cluster integration tests. I deal with Kafka daily at work, and we write integration tests against clusters we spin up in Docker containers, so I feel confident I could make an attempt at writing a test suite if you think it would be valuable - but I also understand if you don't want to have to maintain this in the future.

This is much appreciated, but I believe maintenance on our side might be a bit challenging in the long run; I also believe the setup might be a bit time-consuming, and we want to keep our integration tests (relatively) fast.

Re: MSK I completely forgot about that! 😂Yes, happy to make those changes too 😎

Awesome 🙌

Thanks a lot for your work on this @mishabruml 🙇

@mishabruml
Contributor Author

mishabruml commented Nov 30, 2021

I believe we already have that at the moment - the kafka event is for self-managed clusters, and it requires accessConfigurations, bootstrapServers and topic to be defined. Is something missing here?

OK, cool - I understand this better now; I can see that the validation is being applied by the AJV schema. The lack of testing around this made it a little hard to spot 😛 - I've put some in.

No problem about the integration tests. You're right, the setup is costly (time + effort) both in terms of initial design and build time in CI - and maintenance too.

I've pushed up some changes tonight, I think the PR for self-managed kafka is reaching completion 🙌

@pgrzesik
Contributor

Thanks a lot @mishabruml, I'll check that out today 👍

@fdesoye
Contributor

fdesoye commented Feb 24, 2022

Thanks @mishabruml for providing this feature!
I'm stumbling over a little detail though:
You implemented the check if (serverRootCaCertificate && !clientCertificateTlsAuth). However, from the AWS console I can provide a server root CA certificate without a client certificate, i.e. use, for example, SCRAM 256 together with a server root CA cert.
In that case, the access config in the AWS console looks like this:

[
  {
    "type": "VPC_SUBNET",
    "uRI": "subnet:subnet-123"
  },
  {
    "type": "VPC_SUBNET",
    "uRI": "subnet:subnet-456"
  },
  {
    "type": "VPC_SUBNET",
    "uRI": "subnet:subnet-789"
  },
  {
    "type": "VPC_SECURITY_GROUP",
    "uRI": "security_group:sg-123"
  },
  {
    "type": "SASL_SCRAM_256_AUTH",
    "uRI": "arn:aws:secretsmanager:eu-central-1:<account_id>:secret:kafka-credentials"
  },
  {
    "type": "SERVER_ROOT_CA_CERTIFICATE",
    "uRI": "arn:aws:secretsmanager:eu-central-1:<account_id>:secret:kafka_root_ca"
  }
]

According to https://docs.aws.amazon.com/lambda/latest/dg/with-kafka.html#smaa-authentication you can also pass a client certificate without a server root CA, in case the server uses a cert that is "signed by a certificate authority (CA) that's in the Lambda trust store".

So the client and server certs do not depend on each other, from what I can see.
You probably had a reason to implement this check, though, so I wonder where it came from.

I currently work around this with a post-install script that removes the check.
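
For reference, in serverless.yml my scenario would presumably look something like this - exactly the combination the current check rejects (property names follow the existing accessConfigurations style; the topic, bootstrap servers, subnets, security group and secret values are placeholders matching the config above):

events:
  - kafka:
      topic: my-topic
      bootstrapServers:
        - broker1.example.com:9092
      accessConfigurations:
        vpcSubnet:
          - subnet-123
          - subnet-456
          - subnet-789
        vpcSecurityGroup: sg-123
        # SASL/SCRAM credentials plus a private root CA certificate,
        # with no client certificate (i.e. no mTLS) involved
        saslScram256Auth: arn:aws:secretsmanager:eu-central-1:<account_id>:secret:kafka-credentials
        serverRootCaCertificate: arn:aws:secretsmanager:eu-central-1:<account_id>:secret:kafka_root_ca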

@mishabruml
Contributor Author

Hey @fdesoye, thanks for noticing this. That check is only due to my limited knowledge of mTLS and networking - I assumed that both client and server certificates were required. If @pgrzesik agrees that the validation could be relaxed, then I can make a PR, or feel free to open one yourself?

@fdesoye could you prove/explain your scenario further by providing CloudFormation template(s)?

@mishabruml
Contributor Author

Ahh, I see that SERVER_ROOT_CA_CERTIFICATE is not used for mTLS only - that makes sense. I was thinking that if it was for mTLS only, then of course it wouldn't make sense to provide one without the other. Could you follow up on the case of providing CLIENT_CERTIFICATE_TLS_AUTH without SERVER_ROOT_CA_CERTIFICATE? Thanks 😄

@pgrzesik
Contributor

pgrzesik commented Mar 6, 2022

Hello 👋 So it seems like we might need to relax the restriction that we currently have in place, if I'm understanding the above correctly, right?

@fdesoye
Contributor

fdesoye commented Mar 7, 2022

Indeed, that would be my conclusion, given it "works for me". I'd be happy to raise a PR for this. It would be my first one, so it might take a bit to find which unit tests have to be adjusted and what else I have to look at.

@mishabruml
Contributor Author

@fdesoye happy to help you out if you need 😌

@fdesoye
Contributor

fdesoye commented Mar 10, 2022

It's only a small change, so I will probably manage. I'll push something later this week (I'll mainly have to figure out the formalities of raising a proper MR).

Do we need integration tests for this?

@pgrzesik
Contributor

Thanks @fdesoye - unit tests will be just fine 👍

@fdesoye
Contributor

fdesoye commented Mar 10, 2022

Not sure if this belongs in this thread, but I ran into a slightly annoying issue: the Kafka trigger takes rather a long time to deploy (probably due to whatever AWS spins up for this under the hood).
It cost me quite some time to realise that the secrets I referenced were only created in our dev account, but not in the prod account. Do we have some kind of pre-deployment validation that could assert that the configured secrets are actually available? It would require the user to have secretsmanager:ListSecrets permissions, though.

@pgrzesik
Contributor

pgrzesik commented Mar 10, 2022

Hey @fdesoye - this is definitely a related topic, but it would deserve a separate discussion in my opinion. The proposal sounds interesting, but I'm not sure if adding such functionality to the Framework is the way to go - I'm not aware of any similar validations being implemented thus far. cc @mnapoli

@mnapoli
Contributor

mnapoli commented Mar 11, 2022

Agreed, that sounds like a whole topic on its own - I'm not sure it would make sense here either.

@fdesoye
Contributor

fdesoye commented Mar 11, 2022

My question is whether there is some functionality already available in the framework that could do this kind of validation prior to kicking off a deployment.
If yes, I am happy to open a new issue and probably work on this. If there is no such concept in place already, I think it might be a bit of overkill - honestly, the self-hosted Kafka trigger is a rather unusual one, and adding a framework feature just for this doesn't sound worth the effort to me.

@pgrzesik
Contributor

We don't really have a good way to handle it right now - it can also cause issues where, e.g., the deploymentRole is different/has a different set of permissions; in that case, the check might fail while the deployment would've succeeded. I would honestly advocate against implementing something custom like that at this point, especially since the Kafka use case isn't very popular.

@fdesoye
Contributor

fdesoye commented Mar 11, 2022

Agreed. Sounds like too much effort and risk for too little gain.
