[Fleet] Ability to configure agent.download.sourceURI in Kibana #100413

Closed
n0othing opened this issue May 20, 2021 · 53 comments
Labels: enhancement, Team:Fleet

Comments

@n0othing
Member

n0othing commented May 20, 2021

Describe the feature:

Today it's possible to run Fleet in an environment where:

  • Kibana has internet access
  • The Elastic Agent does not

However, when you trigger Agent upgrades [1], the Agent attempts to reach out to artifacts.elastic.co to pull down the latest version. You can manually edit the Agent's elastic-agent.yml and configure a custom HTTP server, but it'd be useful to set this in Kibana.

[1] https://www.elastic.co/guide/en/fleet/master/upgrade-elastic-agent.html

n0othing added the enhancement label on May 20, 2021
botelastic (bot) added the needs-team label on May 20, 2021
lukeelmers added the Team:Fleet and triage_needed labels on May 26, 2021
@elasticmachine
Contributor

Pinging @elastic/fleet (Team:Fleet)

@mostlyjason
Contributor

@n0othing is there any reason this needs to be set in the UI, or would a setting in kibana.yml work just as well? I'm guessing users who run in air-gapped environments are not afraid of YAML, and they may prefer it since it's easier to automate.

@scottdfedorov

@mostlyjason as the user on whose behalf @n0othing entered this, I can confirm that would be acceptable for us. We just don't want to have to enter it in every agent's elastic-agent.yml file.

mostlyjason changed the title from "[Fleet] Ability to configure agent.download.sourceURI in UI" to "[Fleet] Ability to configure agent.download.sourceURI in Kibana" on Nov 3, 2021
@joshdover
Contributor

However this is implemented, we should require HTTPS URLs from trusted CAs by default. If we want to allow regular HTTP URLs at all (a genuine question @ruflin @elastic/elastic-agent-control-plane), it should only be allowed if an admin-level setting or a config in the kibana.yml allows it. We may also want to allow trusting self-signed CAs.

For example, we could add these kibana.yml settings:

  • xpack.fleet.agents.download.allow_insecure_http - default false
  • xpack.fleet.agents.download.ca_fingerprint - sha256 fingerprint of self-signed CA for trusting custom certs

@ruflin
Member

ruflin commented Nov 15, 2021

I think we should ONLY allow https URLs with a trusted CA. I can see that a workaround might be needed for testing purposes, but the change could likely be made in Elastic Agent directly, or it could require Kibana to be run in dev / testing mode or similar. We should not add the option until we have a real use case for it.

For the feature itself, @n0othing I would like to better understand the request. I would expect the target download URL to be set once, on installation of the Elastic Agent, by the system administrator. Why would this need to change after the installation? Adding this to Fleet would increase the attack surface, as Fleet would then have the power to point all Elastic Agents to any remote repository to download artifacts.

@n0othing
Member Author

@ruflin I could see scenarios where a shop changes their naming scheme or needs to retire a given host. Being able to easily change the target URL (at scale) would be helpful in those situations.

@ruflin
Member

ruflin commented Nov 17, 2021

@n0othing for the scenario you mention, I would expect that both URLs would work for some time, and during this time a sysadmin would hopefully roll out changes to the Elastic Agent machines to adjust it. I'm not saying it wouldn't be convenient; I'm just not sure whether it would cause more issues for us than it solves.

An alternative idea to solve this problem: What if Elastic Agent could also pull down binaries through fleet-server, likely indirectly via Elasticsearch -> Kibana? It is something we had discussed in the past to solve the problem of no access to the internet. Each Elastic Agent still has a trusted connection to fleet-server.

@scottdfedorov

Hi all, apologies if this is late. I'm no longer with the company this applied to, but I can answer some questions about the original intent.

> An alternative idea to solve this problem: What if Elastic Agent could also pull down binaries through fleet-server, likely indirectly via Elasticsearch -> Kibana? It is something we had discussed in the past to solve the problem of no access to the internet. Each Elastic Agent still has a trusted connection to fleet-server.

This would 100% solve the issue. The issue that sparked this ticket was that we needed to upgrade agents that do not have access to the internet. I'm also closely watching your air-gapped env issue here, but the workaround was to set the download source to a locally accessible URL so that the agents could download the binaries needed for upgrade. However, having to set this manually on all the agents in a fleet was cumbersome. Sure, automation could help, but the idea was to be able to set this centrally so we wouldn't have to update all the agents.

If we can have the agents download the upgrade binaries from the fleet-server, that's the ideal for our specific use case. Then we wouldn't need to set up a server to host the binaries separately and worry about the SSL/TLS stuff (mentioned above).

@mostlyjason
Contributor

mostlyjason commented Dec 1, 2021

> Adding this to Fleet would increase the attack surface as now Fleet has the power to point all Elastic Agent to any remote repository to download artifacts.

@ruflin why trust fleet server to deliver binaries, but not deliver a download URL? Don't we require the same level of trust in Fleet Server in both cases? Are you trying to remove Kibana from the attack surface?

> What if Elastic Agent could also pull down binaries through fleet-server, indirectly likely Elasticsearch -> Kibana

Where would ES or Kibana get the new binaries in this case? Some users run the entire stack in an air gapped environment and would need an on-prem artifact repo at some point in the chain.

@ruflin
Member

ruflin commented Dec 2, 2021

My assumption was that the binaries would be uploaded to Kibana (stored in ES). An alternative we also discussed is to make it part of the package registry. In the ES / KB scenario we already have a trusted connection from which to pull the signed binaries. Instead of just changing the URL, an attacker would have to upload the relevant binaries.

I'm not an expert in this area and we should likely pull someone in with more expertise to look at the bigger picture.

@hendry-lim

Just to share with anyone who is waiting for this enhancement: we are using Ansible to automate Elastic Agent upgrades through the Fleet /agents/bulk_upgrade API. When source_uri is specified in the request, the Elastic Agent downloads the tarball/zip files from the specified URL and upgrades itself.
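
For readers who want to try the workaround above without Ansible, here is a minimal sketch of what such a request might look like. Only the /agents/bulk_upgrade endpoint and the source_uri field come from the comment; the /api/fleet path prefix, the agents and version field names, the kbn-xsrf header, and every host name, version, and credential below are assumptions for illustration. The request is built but not sent.

```python
import json
import urllib.request


def build_bulk_upgrade_request(kibana_url, agent_ids, version, source_uri, api_key):
    """Build (but do not send) a bulk upgrade request carrying a custom source_uri."""
    body = {
        "agents": agent_ids,       # assumed field name for the target agent IDs
        "version": version,        # assumed field name for the target version
        "source_uri": source_uri,  # field named in the comment above
    }
    return urllib.request.Request(
        # Assumed path prefix; check the Fleet API docs for your Kibana version.
        url=kibana_url.rstrip("/") + "/api/fleet/agents/bulk_upgrade",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "kbn-xsrf": "true",  # assumed: header Kibana expects on write requests
            "Authorization": "ApiKey " + api_key,
        },
    )


# Hypothetical values: none of these hosts, IDs, or keys are real.
req = build_bulk_upgrade_request(
    "https://kibana.internal.example:5601",
    ["agent-1", "agent-2"],
    "7.16.2",
    "https://artifacts.internal.example/downloads/",
    "PLACEHOLDER_API_KEY",
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Sending it would be a matter of passing `req` to `urllib.request.urlopen`, after confirming the payload shape against the API documentation for the Kibana version in use.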

@jugsofbeer

I don't mind which solution is chosen for an air-gapped environment, but can we choose one and get it merged into 7.17.0?

Please.
Please.
Please.

We want to use this but this is the blocker.

@mbudge

mbudge commented Jan 11, 2022

Hi,

We also need to upgrade Elastic Agent through Fleet in an air-gapped environment with no internet access.

To do this securely, maybe Fleet needs application whitelisting, so that only approved executable files can be used in the upgrade process after they are downloaded from a local web/file server or Fleet server. The verification could be done using a cryptographic hash algorithm like SHA-256, and the Elastic Package Repository could maintain a list of approved hash values.

We are fine deploying the new executables via a local web/file server (via a docker container), elastic package repo or fleet server. We would prefer not to have to deploy more SSL certificates to all of the servers for client/server verification.

If configured correctly, an attacker would have to compromise the hosting server and escalate privileges to root before they could add malware, then log in to the local Kibana instance to run the upgrade process.

Thanks
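
The hash-based approval idea in this comment can be sketched in a few lines. This is only an illustration of the general technique (compare a SHA-256 digest against a list of approved values), not how Elastic Agent verifies downloads; the function names and the placeholder bytes are hypothetical.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the hex-encoded SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def is_approved(artifact: bytes, approved_hashes: set) -> bool:
    """Accept the artifact only if its digest is on the approved list."""
    return sha256_hex(artifact) in approved_hashes


# Hypothetical stand-ins for a real agent package and a published hash list.
good_artifact = b"elastic-agent package bytes (placeholder)"
approved = {sha256_hex(good_artifact)}

print(is_approved(good_artifact, approved))      # True
print(is_approved(b"tampered bytes", approved))  # False
```

A hash list only helps if the list itself comes from a trusted source, which is why the comment suggests having the package repository maintain it.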

@mostlyjason
Contributor

Thanks, I love hearing from our users because it helps us prioritize this and deliver the best solution. @jugsofbeer I appreciate your enthusiasm! This is currently on our 8.x roadmap. I can't give a specific time frame for this feature right now.

@jugsofbeer

We have been waiting 6 months to use Fleet in v7 while all the necessary parts are fixed and finished for an ECE-based air-gapped environment with Logstash involved.

So to NOT have this last item in v7 is a real buzzkill.

When v8 comes out as GA, we wouldn't upgrade to it until v8.2 or v8.3, when all the bugs we need solved are solved, just like our v6 to v7 process.

I really want to impress on you how vital getting this one last item into v7.17 is.

@BokuKDVZ

Hi, I'd like to support jugsofbeer. We have been waiting to use Fleet and the Elastic Agent for quite some time. It's almost ready to use now, but this feature is the last one missing.

@joshdover
Contributor

joshdover commented Jan 19, 2022

As mentioned above in #100413 (comment), our upgrade API endpoints actually already support a source_uri field and pass it through to the Agent upgrade action that is sent out. It appears Elastic Agent supports limiting this ability via the capabilities.yml settings, but it's not limited by default.

For the security concerns discussed in #100413 (comment), I don't think we should accept this on the Kibana Fleet API by default and it should only be configurable by a user with access to kibana.yml. Otherwise, anyone with Fleet API access can push arbitrary binaries to Elastic Agents. At the very least I think it should be disabled by default and require that users add URL patterns to an allowlist of allowed download locations. This would be a breaking change to our unstable/unsupported API. The config would be something like:

xpack.fleet.agents:
  upgrade:
    sourceUri:
      allowlist:
        # Allow any source_uri that begins with this pattern
        - https://myinternaldomain.com/agents/versions/* 
        # Non-HTTPS URLs also require `allowInsecureHttp` to be set
        - http://myinsecuredomain.com/agents/versions/*
      allowInsecureHttp: true

We're about to open access to Fleet to non-superusers in #122347 which increases the impact of this being exposed today. I wonder if we should prioritize this sooner. @ruflin
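
To make the proposed allowlist semantics concrete, here is a minimal sketch of how such a check could behave. It assumes the rules exactly as stated in the config comments above (a trailing * means "begins with this pattern", and plain HTTP additionally requires allowInsecureHttp); the function name and the exact-match rule for entries without * are assumptions, and this is not an actual Kibana implementation.

```python
from urllib.parse import urlsplit


def is_source_uri_allowed(source_uri, allowlist, allow_insecure_http=False):
    """Return True if source_uri matches an allowlist entry under the proposed rules."""
    scheme = urlsplit(source_uri).scheme.lower()
    if scheme not in ("http", "https"):
        return False
    # Non-HTTPS URLs also require the insecure flag, as in the proposal.
    if scheme == "http" and not allow_insecure_http:
        return False
    for pattern in allowlist:
        if pattern.endswith("*"):
            # Trailing "*": allow any URI that begins with the pattern.
            if source_uri.startswith(pattern[:-1]):
                return True
        elif source_uri == pattern:
            # Assumption: entries without "*" must match exactly.
            return True
    return False


allowlist = [
    "https://myinternaldomain.com/agents/versions/*",
    "http://myinsecuredomain.com/agents/versions/*",
]

print(is_source_uri_allowed("https://myinternaldomain.com/agents/versions/8.4.0/", allowlist))
print(is_source_uri_allowed("http://myinsecuredomain.com/agents/versions/8.4.0/", allowlist))
print(is_source_uri_allowed("http://myinsecuredomain.com/agents/versions/8.4.0/", allowlist,
                            allow_insecure_http=True))
```

A real check would likely also normalize the URL first (for example, resolving `..` path segments), since a plain prefix comparison can be bypassed by a crafted path.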

@jugsofbeer

When you suggest limiting this type of thing to people who have access to kibana.yml, can you please consider it from the point of view of someone who uses Kibana via ECE on-premise?

We have access to user settings, but not the full kibana.yml.

@mostlyjason
Contributor

@jugsofbeer We are not planning to release this feature in 7.17 because we are already wrapping up development on that release. The only option is sometime in 8.x, at least for the easier UX described above.

Another workaround that might work in 7.x is to use the proxy setting on the agent: https://www.elastic.co/guide/en/fleet/7.16/fleet-agent-proxy-support.html#_set_the_proxy_for_downloading_artifacts. You'd have to set up an on-prem proxy server and mirror the artifacts repo as well. FYI, the docs are incorrect: it does respect environment variables (elastic/observability-docs#1470). I have not tried this with ECE, but it could be worth a shot.

@jugsofbeer

In an air-gapped environment, though, we wouldn't want thousands of machines to connect to the proxy and each independently go out to the internet to get the files.

We want to avoid that and just be able to tell the agents to grab the files from a non-internet URL, without using a proxy.

Sounds like the analysis / requirements on this topic are not finalised.

@joshdover
Contributor

After further offline discussion, it was raised that our binary code signature verification controls should be good enough to prevent any tampering of agent downloads, regardless of source. I think we should be safe to allow configuring a download location of any HTTP or HTTPS URL from Kibana.

I think the only work required on the Fleet UI team here is to:

@mukeshelastic

@joshdover thanks for providing clarity on this issue. I think we still need to publish the agent binary artifacts to EPR and then have operators configure EPR as the custom source URI for agent binary downloads.

@jlind23
Contributor

jlind23 commented May 17, 2022

@mukeshelastic why should we push the agent to EPR? I would rather have users deal with agent binaries on their own than inflate the EPR docker image size.
As the sourceURI value is configurable, they may choose to rely on another registry.

@user-987654321

We are a 20K+ device estate and this kills Fleet/Elastic Agent for our use case. It should be a basic function of the product. The fact that this has had zero movement in 1 year raises the concern that the feature will never arrive.

@jlind23
Contributor

jlind23 commented May 17, 2022

@user-987654321 what exactly kills your Fleet/Elastic Agent? The fact that there is no specific URI or the fact that all agents are upgrading at the same time?

@user-987654321

Hi @jlind23

It's an air-gapped env. Having to set this manually on all the agents is cumbersome. Automation might help, but we should be able to set this centrally so we wouldn't have to update all the agents, manually or through automation. AV vendors have the ability to push agent binary updates out centrally without internet access. I don't understand why this is missing from Fleet.

@markniemeijer

Hi, same for us. We also have multiple customers with 10K-plus devices and no way to connect to the internet. And even if those devices could connect to the internet, that is not what you want when upgrading an agent.
You want the binary as close to the source as possible, hence somewhere in the Fleet server (I know that the Fleet server is the Elastic Agent in Fleet mode). Agents are already connecting to the Fleet server, so that route is always available, and there is no need for 10K devices to go individually to the internet to download the same binary 10,000 times.

Basically, you want to start the update process from the GUI and have the tool gradually upgrade all devices without us jumping through hoops, like setting up a network location somewhere, or setting up a web server with the binary, or whatever.

Thanks.

@user-987654321

> @user-987654321 what exactly kills your Fleet/Elastic Agent? The fact that there is no specific URI or the fact that all agents are upgrading at the same time?

Upgrading all agents at the same time also needs fixing. Other providers have a randomisation interval feature to prevent the network from being drowned during the update period. You would need a feature like this.
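
A randomisation interval of the kind mentioned here can be sketched as follows. This is a generic illustration of spreading start times across a window, not a description of how Fleet schedules upgrades; the function name, agent IDs, and window length are hypothetical.

```python
import random


def staggered_delays(agent_ids, window_seconds, seed=None):
    """Assign each agent a random start delay between 0 and window_seconds."""
    rng = random.Random(seed)  # a seed makes the schedule reproducible
    return {agent_id: rng.random() * window_seconds for agent_id in agent_ids}


# Hypothetical fleet of three agents spread over a one-hour window.
delays = staggered_delays(["agent-1", "agent-2", "agent-3"], 3600, seed=42)
for agent_id, delay in sorted(delays.items(), key=lambda item: item[1]):
    print(f"{agent_id}: start after {delay:.0f}s")
```

With thousands of agents, the expected number of simultaneous downloads drops roughly in proportion to the window length divided by the time a single download takes.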

@mukeshelastic

@jlind23 If we don't publish agent binaries to an Elastic-owned air-gapped artifactory, then we rely on customers to have an existing artifactory and to have automation that copies the agent binaries to these artifactory instances.

@nimarezainia have we done any research with customers to know whether all air-gapped customers run some sort of air-gapped artifactory themselves?

@user-987654321 @markniemeijer do you folks have an air-gapped, self-managed artifactory that you can upload agent binaries to?

@user-987654321 #130259 is an attempt to provide rolling upgrades to avoid network bandwidth congestion when upgrading thousands of agents.

@dyltay

dyltay commented May 17, 2022

@mukeshelastic In our air-gapped environment, we have Red Hat Satellite available for storing agent binaries. My 2c...

@jugsofbeer

We run air-gapped ECE, and our large number of installed Beats agents do not have a proxy configured on the OS and never will, so I'd want a simple way to define where to get the packages from, using a locally accessible URL.

We have a local Docker registry for our ECE Docker images, so we would happily put the images there and supply the URLs in a config.

It would be nice if Fleet could tell an agent to grab the next version in advance and save it to a local cache for later upgrades. Stagger the downloads and stagger the upgrades.

@markniemeijer

Yes, that would also be a solution for us. We can then also host these binaries in Dockerized platforms as well as our Kubernetes-hosted platforms. As long as we can easily point the clients towards this "repo", we are fine. By easy I mean just configurable from the Fleet GUI, without the need for proxy URLs, specific kibana.yml settings or elastic-agent.yml settings. Just configurable within the whole managed Fleet solution.

Thanks!

@user-987654321

user-987654321 commented May 19, 2022

> We run air-gapped ECE, and our large number of installed Beats agents do not have a proxy configured on the OS and never will, so I'd want a simple way to define where to get the packages from, using a locally accessible URL.
>
> We have a local Docker registry for our ECE Docker images, so we would happily put the images there and supply the URLs in a config.
>
> It would be nice if Fleet could tell an agent to grab the next version in advance and save it to a local cache for later upgrades. Stagger the downloads and stagger the upgrades.

This gets my vote. We already use ECE air-gapped as well, so we have the private Docker registry up and running. It makes sense to re-use kit that is already in place.

@user-987654321

> Yes, that would also be a solution for us. We can then also host these binaries in Dockerized platforms as well as our Kubernetes-hosted platforms. As long as we can easily point the clients towards this "repo", we are fine. By easy I mean just configurable from the Fleet GUI, without the need for proxy URLs, specific kibana.yml settings or elastic-agent.yml settings. Just configurable within the whole managed Fleet solution.
>
> Thanks!

This 100% needs to be a setting in the Fleet GUI and not a setting in a YAML file.

@joshdover
Contributor

Thanks all for the feedback here, we’re evaluating options and are looking to schedule this feature soon. I want to address a few points:

  • We are in the final stages of implementing a rolling upgrade feature to space out agent upgrades to avoid network saturation. You can follow that work in [Fleet] Implement rolling upgrades for Agent upgrades #130259
  • We definitely plan to allow configuring the download source via the UI
  • Hosting and/or caching agent binaries through Fleet Server is an option we’re evaluating. It’s good to hear that some here agree this would be a viable option since agents already have access to this endpoint.
  • I don’t believe that hosting via a Docker registry will work for non-Docker agents and we do not yet have support for remote upgrades of Docker agents.
  • For those of you using our Docker container, I’m curious how you would want Fleet to be able to remotely upgrade those Agents. I believe this would require giving the Agent’s Docker container privileged access to the Docker socket (for plain Docker) or privileged access to the orchestration layer (eg. Kubernetes API). Is this something that would be acceptable to you from an operations perspective?

@jlind23
Contributor

jlind23 commented Jun 3, 2022

@AndersonQ Hopefully on Elastic Agent nothing has to be done. This is a topic we discussed today with @michalpristas and @ph - For the record, this is the Elastic Agent upgrade structure:

type ActionUpgrade struct {
	ActionID         string `yaml:"action_id"`
	ActionType       string `yaml:"type"`
	ActionStartTime  string `json:"start_time" yaml:"start_time,omitempty"` // TODO change to time.Time in unmarshal
	ActionExpiration string `json:"expiration" yaml:"expiration,omitempty"`
	Version          string `json:"version" yaml:"version,omitempty"`
	SourceURI        string `json:"source_uri,omitempty" yaml:"source_uri,omitempty"`
}

So this issue is only a fleet-ui issue then.

kpollich assigned criamico and unassigned AndersonQ on Jun 7, 2022
@bradenlpreston

@joe-desimone , @mark-dufresne , @ferullo - this seems promising. Could we potentially host the endpoint protection artifacts (models, default lists, etc.) in the same way?

@mark-dufresne

> @joe-desimone , @mark-dufresne , @ferullo - this seems promising. Could we potentially host the endpoint protection artifacts (models, default lists, etc.) in the same way?

@bradenlpreston Maybe, we haven't designed an offline solution for endpoint artifacts yet. Good to keep this in mind when we do that.

@jen-huang
Contributor

This has been implemented in Kibana with #133828 and will land in 8.4.

@hop-dev
Contributor

hop-dev commented Aug 3, 2022

@criamico not sure if this would be an enhancement or a bug: if I only have one agent binary download URL, would users expect the agent binary download command to be changed here?

[Screenshot 2022-08-03 at 15 34 09]

Here are my download URLs:
[Screenshot 2022-08-03 at 15 35 48]

@criamico
Contributor

criamico commented Aug 3, 2022

@hop-dev I think that this would be an enhancement; I don't remember seeing it in the design docs. As you suggested, it would make sense to have at least the user-configured default value there, rather than the Elastic default.

@nimarezainia what do you think? I can file an enhancement ticket for this.

@hop-dev
Contributor

hop-dev commented Aug 3, 2022

@criamico I found a few very minor UI issues as part of my testing #138008

@hop-dev
Contributor

hop-dev commented Aug 4, 2022

Another potential enhancement: I am not sure users should be able to edit or delete the Elastic artifacts host https://artifacts.elastic.co/downloads/ (other than to change it from the default). @criamico @kpollich, could we have it as "managed"? If they accidentally deleted it, it would be reasonably difficult for a user to restore it.

@criamico
Contributor

criamico commented Aug 4, 2022

> Another potential enhancement. I am not sure users should be able to edit or delete the elastic artifacts host https://artifacts.elastic.co/downloads/ @criamico @kpollich ? (other than to change it from the default) we could have it as "managed". If they accidentally deleted it it would be reasonably difficult for a user to restore it.

I was thinking the same. At the moment the user is able to completely get rid of it, but I don't know if it's something we should allow. It could be a useful enhancement to make it managed.
