Improve Upgrade-related logging #3382

Merged (9 commits) on Sep 14, 2023

@ycombinator ycombinator commented Sep 8, 2023

What does this PR do?

This PR makes several improvements to logs emitted by the Agent upgrade process:

  • Changes all DEBUG level messages to INFO level in the upgrade process, whether that's the steps that happen in the Agent or in the Upgrade Watcher.
    • The only exception to this is log messages where files/directories from the Agent artifact are being unpacked, as logging those at INFO level is unhelpfully verbose.
  • Logs the Agent's version (including hash) and PID as early as possible after start up.
  • Logs the Upgrade Watcher's version (including hash) and PID as early as possible after start up.
  • Logs the Upgrade Watcher's PID when that process is spawned from the "main" Agent process.
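
The behavior listed above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: it uses Go's standard `log/slog` package as a stand-in for the Agent's own logger, and the `version` and `commit` constants, the `agent.commit` key, and all log messages are assumptions made up for the example. `process.pid` and `agent.version` are ECS field names.

```go
package main

import (
	"context"
	"log/slog"
	"os"
)

// Hypothetical build metadata; a real binary would set these at build time.
const (
	version = "8.11.0-SNAPSHOT"
	commit  = "abcdef"
)

// startupAttrs returns the fields logged once, as early as possible after start.
// "process.pid" and "agent.version" are ECS field names; "agent.commit" is an
// assumed key chosen for this sketch.
func startupAttrs(pid int) []slog.Attr {
	return []slog.Attr{
		slog.Int("process.pid", pid),
		slog.String("agent.version", version),
		slog.String("agent.commit", commit),
	}
}

func main() {
	// The minimum level is INFO, so Debug records are dropped.
	handler := slog.NewJSONHandler(os.Stdout, &slog.HandlerOptions{Level: slog.LevelInfo})
	logger := slog.New(handler)

	// Version, hash, and PID are logged before anything else happens.
	logger.LogAttrs(context.Background(), slog.LevelInfo, "Elastic Agent started",
		startupAttrs(os.Getpid())...)

	// Upgrade steps are logged at INFO so they appear with default settings.
	logger.Info("Upgrade step completed", "step", "download")

	// Per-file unpacking stays at DEBUG and is not emitted here.
	logger.Debug("Unpacking file", "path", "data/elastic-agent-abcdef/elastic-agent")
}
```

Running the sketch prints two JSON lines (the startup record and the upgrade step); the unpacking message is suppressed because it is below the configured level.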

Why is it important?

To increase visibility into the Agent upgrade process, so we can tell from logs exactly where and how upgrades might have failed.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding changes to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in ./changelog/fragments using the changelog tool
  • I have added an integration test or an E2E test

How to test this PR locally

This PR changes two code paths that run in different binaries: the upgrade code in the coordinator's execution path, which is executed by the pre-upgrade Agent binary, and the Upgrade Watcher code, which is executed by the post-upgrade Agent binary. Two separate tests are therefore necessary.

Testing the changes in the upgrade code in the coordinator's execution path

  1. Build Elastic Agent package from this PR's branch.
    EXTERNAL=true SNAPSHOT=true DEV=true PLATFORMS=linux/arm64 PACKAGES=tar.gz mage package
    
  2. Install it.
  3. Start checking the Agent logs for upgrade-related logs.
    hash=$(sudo elastic-agent version --binary-only --yaml | grep commit | cut -w -f3 | cut -c1-6)
    sudo tail -F /Library/Elastic/Agent/data/elastic-agent-$hash/logs/elastic-agent-$(date +%Y%m%d).ndjson | grep -i upgrade
    
  4. In a separate window, trigger an upgrade to another version of Agent.
    sudo elastic-agent upgrade 8.9.2
    
  5. Verify that we have more upgrade-related logs in the Agent logs than before this PR.

Testing the changes in the upgrade watcher code

  1. Install an older version of Agent, but one that's >= 8.10.0, since that's the version where the pre-upgrade Agent spawns the Upgrade Watcher using the post-upgrade Agent binary.
  2. In a separate window, build Elastic Agent package from this PR's branch.
    EXTERNAL=true SNAPSHOT=true DEV=true PLATFORMS=linux/arm64 PACKAGES=tar.gz mage package
    
  3. Trigger an upgrade to the built Agent.
    sudo elastic-agent upgrade 8.11.0-SNAPSHOT --source-uri=file://$(pwd)/build/distributions --skip-verify
    
  4. Check the Upgrade Watcher logs for upgrade-related logs.
    hash=$(sudo elastic-agent version --binary-only --yaml | grep commit | cut -w -f3 | cut -c1-6)
    sudo tail -n +1 -F /Library/Elastic/Agent/data/elastic-agent-$hash/logs/elastic-agent-watcher-$(date +%Y%m%d).ndjson | grep -i upgrade
    
  5. Verify that we have more upgrade-related logs in the Upgrade Watcher logs than before this PR.

Related issues

mergify bot commented Sep 8, 2023

This pull request does not have a backport label. Could you fix it @ycombinator? 🙏
To fix up this pull request, you need to add the backport labels for the needed branches, such as:

  • backport-v./d./d./d is the label to automatically backport to the 8./d branch. /d is the digit

NOTE: backport-skip has been added to this pull request.

@mergify mergify bot added the backport-skip label Sep 8, 2023

elasticmachine commented Sep 8, 2023

💚 Build Succeeded

Build stats

  • Start Time: 2023-09-11T16:51:21.473+0000

  • Duration: 25 min 9 sec

Test stats 🧪

Test Results

  • Failed: 0
  • Passed: 6285
  • Skipped: 59
  • Total: 6344

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages.

  • run integration tests : Run the Elastic Agent Integration tests.

  • run end-to-end tests : Generate the packages and run the E2E Tests.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

mergify bot commented Sep 8, 2023

This pull request is now in conflicts. Could you fix it? 🙏
To fix up this pull request, you can check it out locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b improve-upgrade-logging upstream/improve-upgrade-logging
git merge upstream/main
git push upstream improve-upgrade-logging

@ycombinator ycombinator added the Team:Elastic-Agent Label for the Agent team label Sep 8, 2023
@ycombinator ycombinator marked this pull request as ready for review September 8, 2023 16:26
@ycombinator ycombinator requested a review from a team as a code owner September 8, 2023 16:26
@elasticmachine

Pinging @elastic/elastic-agent (Team:Elastic-Agent)

@elasticmachine

🌐 Coverage report

Name | Metrics % (covered/total) | Diff
Packages | 98.78% (81/82) | 👍
Files | 66.212% (194/293) | 👍
Classes | 65.562% (356/543) | 👍
Methods | 52.627% (1122/2132) | 👍
Lines | 38.241% (12789/33443) | 👎 -0.001
Conditionals | 100.0% (0/0) | 💚

@ycombinator ycombinator enabled auto-merge (squash) September 12, 2023 20:30
@ycombinator

buildkite test this

@elastic-sonarqube

SonarQube Quality Gate

Quality Gate failed

Failed condition: 17.2% Coverage on New Code (is less than 40%)

See analysis details on SonarQube

cmacknz commented Sep 14, 2023

Force merging to get past the SonarQube gate; we don't need unit test coverage for log level changes.

@cmacknz cmacknz merged commit 66fa5a7 into elastic:main Sep 14, 2023
20 of 24 checks passed
mergify bot pushed a commit that referenced this pull request Sep 14, 2023
* Log all upgrade-related messages at INFO level

* Print Agent PID and version at startup

* Print Upgrade Watcher PID and Agent version at startup

* Use ECS keys

* Log invoked upgrade watcher PID along with invoking Agent PID

* Add CHANGELOG entry

* Revert to DEBUG level logging on file extractions

* Removing redundant log line

* Running mage fmt

(cherry picked from commit 66fa5a7)

# Conflicts:
#	internal/pkg/agent/application/upgrade/crash_checker.go
@ycombinator ycombinator deleted the improve-upgrade-logging branch September 14, 2023 01:03
pierrehilbert pushed a commit that referenced this pull request Sep 14, 2023
* Improve Upgrade-related logging (#3382)

* Log all upgrade-related messages at INFO level

* Print Agent PID and version at startup

* Print Upgrade Watcher PID and Agent version at startup

* Use ECS keys

* Log invoked upgrade watcher PID along with invoking Agent PID

* Add CHANGELOG entry

* Revert to DEBUG level logging on file extractions

* Removing redundant log line

* Running mage fmt

(cherry picked from commit 66fa5a7)

# Conflicts:
#	internal/pkg/agent/application/upgrade/crash_checker.go

* Resolve conflict

---------

Co-authored-by: Shaunak Kashyap <ycombinator@gmail.com>