-
Notifications
You must be signed in to change notification settings - Fork 77
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: support static tokens to be used for agent enrollment #2654
Conversation
647d485
to
a0c9c7f
Compare
8f99d7e
to
9e482fc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @olegsu, exciting to see this up already. I have a few notes & questions on an initial look:
- What is the behavior on checkin if the specified policy ID does not actually exist yet?
- Because the policies are initialized by Kibana asynchronously from Fleet Server startup, I think we absolutely need to handle this case. Probably the agent could receive some default / mostly empty policy that doesn't do much but I'm not sure what this would break on the Agent side, for instance if there are no inputs and no output.
- We will need to have e2e tests that include Agent and Fleet Server for this scenario because it will be important not to break it
- I think we will want to limit who can enroll agents using a static token.
- I'd suggest we do this by IP address / range so we can lock this down to only agents deployed within our infrastructure internally.
- This doesn't necessarily need to be implemented in the first PR but we should think about what the final config schema should be so we can add this config later without making a breaking change to the structure
- This definitely needs unit and e2e tests before we'll be able to merge the first PR.
Thank you for the quick feedback, @joshdover
This is a good point, I agree we need to validate the policy is been created already. I see two additional options this can be achieved:
On the other side, sending an empty policy to the agent will also force us to update it once it has been created, which I am not sure is something we want to handle at this point.
Right, we need to have some ability to do that, not sure how though.
Thank you, will try to do that. |
Thinking about this more, I think the enrollment should reject until the policy is created. This way we avoid having agents in the
Yeah that's fine, let's just make sure the config will allows to add it later. If we needed to have different allowed IPs per policy_token, that would be harder to add without a breaking change to the config schema you proposed. Maybe we should change the inputs:
- type: fleet-server
policy.id: "${FLEET_SERVER_POLICY_ID:fleet-server-policy}"
server:
auth: statis
static_policy_tokens:
enabled: true
allowed_ips: [10.0.0.0/24] # allowlist for all tokens
policy_tokens:
- policy_id: 901b70f0-fefa-11ed-aa5e-5974b0535e80
token: abcdefg
allowed_ips: [10.0.0.0/24] # allowlist for just this token A couple other notes:
|
Great work @olegsu! @joshdover Thanks for your comments.
Could you elaborate why do you think this limitation is required? If we still want to defend ourselves from a scenario where an external user gets a static enrollment token somehow, and only allow "internal" agents to enroll using a static token, I'd argue against basing this limitation on IP address or range. |
@eyalkraft To be clear - I wasn't proposing we implement IP range restrictions in this PR, it is just an example. My main point was that we need to make the config more flexible to add more options later if we need. The token itself may be good enough - my thinking here was just adding an additional layer of protection in the case that this token is compromised or we use a really bad default (0000) for some reason. Happy to have a deeper conversation about this - but I don't think we need to have this problem solved now in order to move forward with the current PR and unblock validation of this feature. |
Sounds good, will add that.
I will update the names, sure. tokens:
abcd:
policy: some-id
# ... more props ...
Do you still think its better to have an array? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with what @joshdover has stated, we should not accept enroll requests if the policy does not exist.
I know that we're adding this to support our cloud efforts, but it's also likely that this feature will be used in infrastructure as code deployments by customers so we'll need to add user facing documentation and guidelines as well.
internal/pkg/config/input.go
Outdated
Bulk ServerBulk `config:"bulk"` | ||
GC GC `config:"gc"` | ||
Instrumentation Instrumentation `config:"instrumentation"` | ||
StaticPolicyTokens StaticPolicyTokens `config:"static_policy_tokens"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that having enrollment tokens as a list as @joshdover suggested is the most straightforward.
And we should add a description in fleet-server.reference.yml
.
I don't think we need to be concerned with list/map performance implications on our initial implementation, it can be changed later if we need to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that having enrollment tokens as a list as @joshdover suggested is the most straightforward.
And we should add a description in fleet-server.reference.yml.
Thank you for the feedback, added the reference to fleet-server.reference.yml
I don't think we need to be concerned with list/map performance implications on our initial implementation, it can be changed later if we need to.
I dont think that it would be that easy to change in the future but I have updated it to hold a slice.
internal/pkg/api/handleEnroll.go
Outdated
if et.cfg.StaticPolicyTokens.Enabled { | ||
// Validate that an enrollment record exists for a key with this id. | ||
if policy, ok := et.cfg.StaticPolicyTokens.PolicyTokens[enrollmentAPIKey.Key]; ok { | ||
enrollAPI = &model.EnrollmentAPIKey{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We probably want a debug message at this point as well.
internal/pkg/api/handleEnroll.go
Outdated
func (et *EnrollerT) processRequest(zlog zerolog.Logger, w http.ResponseWriter, r *http.Request, rb *rollback.Rollback, enrollmentAPIKeyID, ver string) (*EnrollResponse, error) { | ||
func (et *EnrollerT) processRequest(zlog zerolog.Logger, w http.ResponseWriter, r *http.Request, rb *rollback.Rollback, enrollmentAPIKey *apikey.APIKey, ver string) (*EnrollResponse, error) { | ||
var enrollAPI *model.EnrollmentAPIKey | ||
if et.cfg.StaticPolicyTokens.Enabled { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should try to keep our authentication handling in handleEnroll
(currently line 76), we may need to move some existing logic out of processRequest
to do so
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is done to prevent the call to elastic to fetch the token since it does not really exists there.
Instead, here I think that it would be a good place to see if the policy exists and return an error in case id does not (suggested here).
7e2543f
to
b1065b4
Compare
This pull request is now in conflicts. Could you fix it @olegsu? 🙏
|
8683e7b
to
95b896f
Compare
845a569
to
5527a6f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding an e2e test!
We just need a changelog fragment for this as well.
Otherwise I mostly have nitpicks
@@ -245,3 +245,53 @@ func (suite *StandAloneSuite) TestClientAPI() { | |||
bCancel() | |||
cmd.Wait() | |||
} | |||
|
|||
func (suite *StandAloneSuite) TestStaticTokenAuthentication() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
internal/pkg/api/handleEnroll.go
Outdated
|
||
func (et *EnrollerT) fetchStaticTokenPolicy(ctx context.Context, zlog zerolog.Logger, enrollmentAPIKey *apikey.APIKey) (*model.EnrollmentAPIKey, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a comment to this func? returning nil, nil
is not very common across this codebase
internal/pkg/api/handleEnroll.go
Outdated
return nil, nil | ||
} | ||
|
||
zlog.Info().Msgf("Checking static enrollment token %s", enrollmentAPIKey.Key) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(nit) change this to a debug and put an info log if a static token is found
890f433
to
a40e0a2
Compare
ed3fea7
to
f43421d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes LGTM - my only concern right now is that adding the caching in the same PR seems unnecessary. I'd prefer to be able to split that change out separately so we can revert either change independent of one another, if necessary.
Thanks for the review, I will remove the caching |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, Good Job @olegsu
Merging this after a discussion we have with @jkakavas, @michel-laterman, and @eyalkraft
|
What is the problem this PR solves?
This PR adds support for static tokens for an agent enrollment.
The reason we need this capability is to support a flow for serverless security projects when an agent should be installed at the same time as the other project services (kibana, es, fleet).
To be able to install an agent at that point, we need a predictable token that both the agent and the fleet accept.
How does this PR solve the problem?
By accepting static tokens, well just the
key
part of the token, fleet server will authenticate a request that have thiskey
in the enrollment token it uses and will send back the API key to communicate with ES. For example, passing to the agent token likebase64(0123456789:abcdefg)
the fleet server will accept it if the configuration includes the followingToday flow looks something like that
And this PR will support the following flow
How to test this PR locally
elastic-package stack up -s "elasticsearch,kibana,package-registry" --version 8.8.0 -v -d
Steps 2,3 looks simple but they require more configuration and changes (especially if I want to debug the fleet process)
First building docker image for the server will not work on arm architecture as the
make build-docker
compiles it for amd ( I guess that I was missing something).Later to command to run the image looks like
Where the change in
fleet-server.yml
was to updateinputs[0].server
withThe command to run Elastic Agent looks like
docker run -it \ --network elastic-package-stack_default \ -v $(HOME)/.elastic-package/profiles/default/certs/ca-cert.pem:/etc/ssl/certs/elastic-package.pem \ -e FLEET_ENROLLMENT_TOKEN=MDEyMzQ6YWJjZGVmZw== \ -e FLEET_ENROLL=1 \ -e FLEET_URL=https://fleet-server:8220 \ -e KIBANA_HOST=https://kibana:5601 \ -e ELASTIC_CONTAINER=true \ docker.elastic.co/elastic-agent/elastic-agent-complete:8.8.0
Design Checklist
Checklist
./changelog/fragments
using the changelog toolRelated issues