You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Describe the enhancement:
Within fleet there would be an additional drop down option to 'repair agent' that would task remote agents to attempt to repair unhealthy agents. The user would then get a confirmation prompt that would let them know that this could remove unsent logs from the agent. When 'repair agent' is used the agent would run multiple actions to attempt to repair or reset the agent to a working state. These actions could include things like deleting all downloads, any cached files that may be corrupted, etc. It would also reset the agent state within fleet.
Describe a specific use case for the enhancement or feature:
An agent is in an unhealthy state following an attempted upgrade where the system powered off during the upgrade leaving partial downloads on the system. Using the 'repair agent' button would clear all downloads and restore the agent to a healthy state.
An agent is upgraded but the final 'ack' for the upgrade doesn't make it to fleet. This leaves the agent in a healthy state but the status with fleet isn't correctly synchronized. The repair agent button will re-synchronize the agent and fleet so the current state is accurate.
The text was updated successfully, but these errors were encountered:
An agent is in an unhealthy state following an attempted upgrade where the system powered off during the upgrade leaving partial downloads on the system
We don't download input or integrations anymore, only the the next agent version during upgrades. That process should be well behaved during poorly timed reboots, but there have been bugs that left the agent in a partially working state because of missing symlinks or Unix socket paths. I don't have a good example off the top of my head to link to.
The repair here would just triggering the post-install process again to fix up any paths or broken links. This is probably still useful to handle "unknown unknown" bugs, but I would prefer to get the upgrade to a place where we don't need this.
An agent is upgraded but the final 'ack' for the upgrade doesn't make it to fleet. This leaves the agent in a healthy state but the status with fleet isn't correctly synchronized. The repair agent button will re-synchronize the agent and fleet so the current state is accurate.
We should consider if having the currently installed agent version upgrade to itself would repair most issues, that would simplify the repair process into a variation of the upgrade process. #2780
Describe the enhancement:
Within fleet there would be an additional drop down option to 'repair agent' that would task remote agents to attempt to repair unhealthy agents. The user would then get a confirmation prompt that would let them know that this could remove unsent logs from the agent. When 'repair agent' is used the agent would run multiple actions to attempt to repair or reset the agent to a working state. These actions could include things like deleting all downloads, any cached files that may be corrupted, etc. It would also reset the agent state within fleet.
Describe a specific use case for the enhancement or feature:
The text was updated successfully, but these errors were encountered: