Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Elastic Agent] Fix Docker container to allow state to properly be handled #24817

Merged
merged 12 commits into from
Mar 31, 2021

Conversation

blakerouse
Copy link
Contributor

@blakerouse blakerouse commented Mar 29, 2021

What does this PR do?

This fixes the Elastic Agent Docker image to allow a proper state to be stored for the image. Previously this was not possible for the state to be placed into a volume which prevent restarts of the container to maintain its previous state.

This changes the Elastic Agent to not version its directory when running under Docker. This is key to allowing the state to be storable and being that an Elastic Agent container cannot be self-upgraded it is acceptable that it is not versioned.

Because of how the Elastic Agent ships with the builds in the download directory the state could not mount over that directory of Elastic Agent would need to re-download those files. This changes the container sub-command to rsync the builds download directory into the state directory before actually running. This allows new downloads to stay in the state directory and allows the running Elastic Agent to use the binaries that shipped with the image.

By default the state of the Docker container goes to /usr/share/elastic-agent/state. This can changed with STATE_PATH environment. Elastic Agent logs now default to output stderr so they can be collected by the container runtime. In the case that LOGS_PATH variable is set then the Elastic Agent will log to $LOGS_PATH/elastic-agent.log instead of stderr.

Other fixes include using the $STATE_PATH/data/tmp instead of /tmp for the unix sockets as the container cannot write to /tmp.

Why is it important?

Critical to allow the state of the Elastic Agent to persist restarts of the Elastic Agent Docker image.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • [ ] I have made corresponding changes to the documentation
  • [ ] I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Author's Checklist

  • State is stored in /usr/share/elastic-agent/state.
  • State path can be changed with STATE_PATH.

How to test this PR locally

$ # Build the docker container in dev-mode
$ cd x-pack/elastic-agent
$ PLATFORMS="+all linux/amd64" mage dev:package
$ # Run the docker container
$ docker run -it -e KIBANA_FLEET_SETUP=1 -e ELASTICSEARCH_HOST=http://host.docker.internal:9200 -e KIBANA_HOST=http://host.docker.internal:5601 -e FLEET_ENROLL=1 -e FLEET_URL=http://host.docker.internal:8220 -e FLEET_INSECURE=1 docker.elastic.co/beats/elastic-agent:8.0.0

Related issues

@blakerouse blakerouse added the Team:Elastic-Agent Label for the Agent team label Mar 29, 2021
@blakerouse blakerouse self-assigned this Mar 29, 2021
@botelastic botelastic bot added needs_team Indicates that the issue/PR needs a Team:* label and removed needs_team Indicates that the issue/PR needs a Team:* label labels Mar 29, 2021
@blakerouse blakerouse marked this pull request as ready for review March 29, 2021 14:34
@elasticmachine
Copy link
Collaborator

Pinging @elastic/agent (Team:Agent)

@blakerouse
Copy link
Contributor Author

/package

@elasticmachine
Copy link
Collaborator

elasticmachine commented Mar 29, 2021

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview

Expand to view the summary

Build stats

  • Build Cause: Pull request #24817 updated

  • Start Time: 2021-03-31T18:27:25.675+0000

  • Duration: 51 min 4 sec

  • Commit: 7d1da80

Test stats 🧪

Test Results
Failed 0
Passed 6588
Skipped 16
Total 6604

Trends 🧪

Image of Build Times

Image of Tests

💚 Flaky test report

Tests succeeded.

Expand to view the summary

Test stats 🧪

Test Results
Failed 0
Passed 6588
Skipped 16
Total 6604

@blakerouse
Copy link
Contributor Author

/package

Copy link
Contributor

@simitt simitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Log path defaults to $STATE_PATH/logs but that can also log to a completely different path by setting LOG_PATH environment.

Providing LOGS_PATH=/myapp/logs when starting the container does not log anything at this directory. All logs are always logged under $STATE_PATH/data/logs.

return fmt.Errorf("syncing download directory to STATE_PATH(%s) failed: %s", pathCfg.State, err)
}
paths.SetTop(filepath.Join(pathCfg.State, "data"))
paths.SetConfig(pathCfg.State)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you make this configurable independently of the pathCfg.State?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could? What is the use case? Don't like to add an ENV unless it is useful.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a usecase with an existing directory structure:

/app
/app/config
/app/logs
/app/data

When setting STATE_PATH=/app the agent expects the configuration at /app/.

var err error
var client *kibana.Client
executable, err := os.Executable()
if err != nil {
return err
}

// set paths early so all action below use the defined paths
err = handlePaths(cfg.Agent.Paths)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The APM Server is extracted from the Elastic Agent's download folder. At the moment this is done before the Elastic Agent gets started (https://github.com/elastic/beats/pull/24817/files#diff-1f8e1e2079524931fa8f3cf121667a25baa2b998557136500d5e7596b46e9187R159), so also before you are setting the path variables here. The path variables need to be set as the very first action.

If you prefer I can change the order to make this work with the cloud mode after your PR gets merged.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think because of me pushing an update I broke the link your provided. Can you provide it again so I can see what you are talking about. Would be happy to fix it in this branch.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@blakerouse
Copy link
Contributor Author

Log path defaults to $STATE_PATH/logs but that can also log to a completely different path by setting LOG_PATH environment.

Providing LOGS_PATH=/myapp/logs when starting the container does not log anything at this directory. All logs are always logged under $STATE_PATH/data/logs.

Yes I question if the LOGS_PATH should just be removed. Being that the Elastic Agent logs to stdout when running in container mode, there is nothing to write to in that directory. I only added for brevity with the --path.logs, which does work when you want it to actually log to a file.

But for a container it should write to stdout, which is how it is setup now. That way a log collector for docker or kubernetes will automatically pick up the logs.

@blakerouse
Copy link
Contributor Author

Providing LOGS_PATH=/myapp/logs when starting the container does not log anything at this directory. All logs are always logged under $STATE_PATH/data/logs.

Adding more to this the Elastic Agent expects that logs exists at $STATE_PATH/data/logs/*. This is because the Elastic Agent itself ships those logs to Elasticsearch (it can be a different one). Adding the ability to change this path and still allow the Elastic Agent to ship them its going to make it even more confusing and error prone I feel.

Why doesn't setting STATE_PATH=/myapp and reading from /myapp/data/logs not work?

blakerouse and others added 3 commits March 30, 2021 12:41
* unify ENV for home, logs, config, data
* add CONFIG_PATH
* set paths and copy downloads before extracting APM Server
* restrict path config to ENV
@@ -77,3 +77,4 @@
- Add TLS support for Fleet Server {pull}24142[24142]
- Add support for Fleet Server running under Elastic Agent {pull}24220[24220]
- Add CA support to Elastic Agent docker image {pull}24486[24486]
- Add STATE_PATH to Elastic Agent docker image {pull}24817[24817]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also add LOGS_PATH and CONFIG_PATH

blakerouse added a commit to blakerouse/beats that referenced this pull request Mar 31, 2021
…ndled (elastic#24817)

* Adjust paths to allow setting data path when running in container mode.

* Fix ability for elastic-agent to have proper state in Docker.

* Add changelog entry.

* Fix lint.

* Fix issue with bootstrapping with Fleet Server.

* Fix lint.

* additional fixes

* unify ENV for home, logs, config, data
* add CONFIG_PATH
* set paths and copy downloads before extracting APM Server
* restrict path config to ENV

* Cleanup some issues with cloud mode.

* Fix issue with CONFIG_PATH when empty. Improve sub-process agent in enroll to exit when Elastic Agent crashes.

* Cleanup the paths to be correct.

* Update changelog.

* Fix unit tests.

Co-authored-by: simitt <silvia.mitter@elastic.co>
(cherry picked from commit 1f1fae5)
blakerouse added a commit that referenced this pull request Mar 31, 2021
…ndled (#24817) (#24879)

* Adjust paths to allow setting data path when running in container mode.

* Fix ability for elastic-agent to have proper state in Docker.

* Add changelog entry.

* Fix lint.

* Fix issue with bootstrapping with Fleet Server.

* Fix lint.

* additional fixes

* unify ENV for home, logs, config, data
* add CONFIG_PATH
* set paths and copy downloads before extracting APM Server
* restrict path config to ENV

* Cleanup some issues with cloud mode.

* Fix issue with CONFIG_PATH when empty. Improve sub-process agent in enroll to exit when Elastic Agent crashes.

* Cleanup the paths to be correct.

* Update changelog.

* Fix unit tests.

Co-authored-by: simitt <silvia.mitter@elastic.co>
(cherry picked from commit 1f1fae5)
axw added a commit to axw/apm-server that referenced this pull request Jun 13, 2021
Due to some changes in elastic-agent
(elastic/beats#24817), injection
of the apm-server binary became ineffective and we have
been running system tests with the published artifacts.

Artifacts (such as the apm-server) are now unpacked into
state/data/install/<artifact>. The state/data/install
directory is expected to be owned by the elastic-agent
user, so we can no longer bind mount the apm-server binary.
Instead, we now create a custom Docker image and copy in
the apm-server and apm-server.yml files.
axw added a commit to elastic/apm-server that referenced this pull request Jun 14, 2021
Due to some changes in elastic-agent
(elastic/beats#24817), injection
of the apm-server binary became ineffective and we have
been running system tests with the published artifacts.

Artifacts (such as the apm-server) are now unpacked into
state/data/install/<artifact>. The state/data/install
directory is expected to be owned by the elastic-agent
user, so we can no longer bind mount the apm-server binary.
Instead, we now create a custom Docker image and copy in
the apm-server and apm-server.yml files.
mergify bot pushed a commit to elastic/apm-server that referenced this pull request Jun 14, 2021
Due to some changes in elastic-agent
(elastic/beats#24817), injection
of the apm-server binary became ineffective and we have
been running system tests with the published artifacts.

Artifacts (such as the apm-server) are now unpacked into
state/data/install/<artifact>. The state/data/install
directory is expected to be owned by the elastic-agent
user, so we can no longer bind mount the apm-server binary.
Instead, we now create a custom Docker image and copy in
the apm-server and apm-server.yml files.

(cherry picked from commit 301caed)
stuartnelson3 pushed a commit to stuartnelson3/apm-server that referenced this pull request Jun 14, 2021
Due to some changes in elastic-agent
(elastic/beats#24817), injection
of the apm-server binary became ineffective and we have
been running system tests with the published artifacts.

Artifacts (such as the apm-server) are now unpacked into
state/data/install/<artifact>. The state/data/install
directory is expected to be owned by the elastic-agent
user, so we can no longer bind mount the apm-server binary.
Instead, we now create a custom Docker image and copy in
the apm-server and apm-server.yml files.
axw added a commit to elastic/apm-server that referenced this pull request Jun 15, 2021
Due to some changes in elastic-agent
(elastic/beats#24817), injection
of the apm-server binary became ineffective and we have
been running system tests with the published artifacts.

Artifacts (such as the apm-server) are now unpacked into
state/data/install/<artifact>. The state/data/install
directory is expected to be owned by the elastic-agent
user, so we can no longer bind mount the apm-server binary.
Instead, we now create a custom Docker image and copy in
the apm-server and apm-server.yml files.

(cherry picked from commit 301caed)

Co-authored-by: Andrew Wilkins <axw@elastic.co>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Team:Elastic-Agent Label for the Agent team v7.13.0
Projects
None yet
3 participants