[Agent-Upgrade]: For Linux .tar deploy; Agent goes Unhealthy on upgrade with Endpoint Security #148
Comments
Pinging @elastic/fleet (Team:Fleet)
@manishgupta-qasource Please review.
Reviewed & assigned to @blakerouse. CC: @EricDavisX
We had other recent upgrade tests passing successfully, so I believe this will be specific either to Ubuntu (unlikely?) or to something in use or set up on the VM. @amolnater-qasource it would be helpful to confirm whether a straight install of 7.15-snapshot works on that same Ubuntu 20 image. If not, then it isn't necessarily an upgrade bug, which helps reduce triage effort. Thank you for testing on AWS-based Ubuntu; that is very helpful already.
Yeah, I wonder if this is really an upgrade issue and not more of an issue with Endpoint? It seems that the upgrade worked but Endpoint is having an issue.
Hi @EricDavisX, we have revalidated the Linux agent upgrade issue too; however, today we are not able to upgrade the Linux tar agent and the agent is getting We are successfully able to upgrade Windows and MAC with Endpoint Security and getting no It seems these upgrade issues on Linux are due to a vSphere issue, as on AWS Ubuntu 20 we are able to upgrade agents successfully with no errors. Build details: cc: @blakerouse
Glad to hear it was just an environment issue.
Further, we will re-test this issue once the vSphere Linux machines no longer show errors, and will share the test results here. Thanks
Hi @blakerouse
Build details: This issue is not reproducible on AWS machines. Thanks
Hi @EricDavisX
Build details: Thanks
Hi @EricDavisX
Build details: Logs: Please let us know if anything else is required.
Thanks for the continued follow-up. This comment is interesting to unpack; I wonder if our process is getting hung and timing out needlessly, then continuing on successfully after the fact. @michalpristas do you have any thoughts? @andresrc I'm curious whether we should put this on our 'urgent review' list to spend time on, or whether we want to accept the risk given that other Ubuntu 20 hosts work OK, and only the ones in the team vSphere cluster seem to be configured with something preventative (or something delaying the success).
Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)
I just tested this on Ubuntu 20.04. I used Staging Cloud to create a 7.14.2 stack. I enrolled Elastic Agent 7.14.2 on the Ubuntu 20.04 host into the Default Policy. I added the Endpoint Security integration and the Elastic Agent reported Healthy. I then upgraded the stack to 7.15.1 in the staging cloud. Once that was complete, I selected Upgrade agent from the Kibana UI. The 7.14.2 Elastic Agent upgraded to 7.15.1 successfully and reported Healthy. I let it run for a few hours and it remains Healthy.
@amolnater-qasource Looking at the logs from your last comment, I do see the following:
It seems Endpoint Security stayed degraded for up to 5 minutes.
Hi @blakerouse VMs used: Build details: We observed the following inconsistent behaviours when installing/upgrading Linux .tar agents:
Ubuntu 20 logs: We will re-attempt this once a new snapshot build is available. cc: @EricDavisX
@amolnater-qasource Are you testing upgrading an Elastic Agent that is also running the Fleet Server? I think we need to separate upgrading an Elastic Agent without a Fleet Server from upgrading an Elastic Agent with a Fleet Server. My testing used Fleet Server in the cloud, with release versions of the Stack and Elastic Agent, which do not have the current Kibana issue. Can you try the steps I performed to see whether your host succeeds there? It would be great to have a baseline of a known working path so we can determine the bad path.
We have updated the elastic-agent.yml with Build details: Thanks
Still a client timeout. Either increase the timeout even more, or check connectivity and whether the connection is being dropped by a firewall.
Hi @michalpristas Please let us know if anything else is required. Thanks
.tar agent went Unhealthy on upgrade with Endpoint Security
Hi @EricDavisX
Build details: We are successfully able to upgrade Windows and MAC agents. Thanks
Hey @amolnater-qasource, is this still relevant?
@blakerouse can you confirm which file and value can be changed on the Agent host to increase the timeout so we can re-test this? We had discussed that a 10-minute value might be high enough to add more confidence in the file download.
We have a similar SDH reporting that changing that value does not fix the issue. So either the setting is not working, or another timeout that we do not currently know about is occurring.
Hi @jlind23
Build details: Please let us know if anything else is required from our end.
@blakerouse how can we move this forward, then?
@jlind23 I am going to start working on why it's not working and on a proper fix.
Hi @blakerouse
cc: @jlind23
On further testing, we were able to successfully upgrade the 7.17 Linux .tar Fleet Server on CentOS 8 (on the 2nd attempt).
Hey @blakerouse, could you please take a look here? You were working on a fix; did you find anything?
Hi @jlind23
Build details: Hence we are closing this issue.
Hi @jlind23
TESTED WITH UBUNTU VSPHERE VM Logs: Hence we are re-opening this issue.
I would expect 7.17 to 8.1 to also work.
I would also expect the same for 7.17->8.1. I am also of the school that waits for the first minor release before upgrading.
I am experiencing similar issues while upgrading from 8.0.0 to 8.0.1. Some agents upgraded fine, others didn't. The Fleet Server is one of those stubborn ones. The base system is always Debian 11.2 on a VMware machine. The files themselves download fine with curl and wget in less than two seconds. When the agent tries to download them, the files are created in the download directory but stay at 0 bytes. I have tried downloading the files manually into the download folder and retrying the update, and that seems to work around the issue. I am just not sure whether it uses the manually downloaded files if they exist, or whether the download randomly worked that time.
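To spot the stalled-download symptom described above (artifact files created but left at 0 bytes), a small diagnostic like the following could list empty files in the agent's download directory. This is a minimal sketch and not part of Elastic Agent: the thread does not give the download directory's path, so the script assumes you pass it as the first argument, and `find_empty_downloads` is a hypothetical helper name.

```python
import sys
from pathlib import Path


def find_empty_downloads(download_dir):
    """Return the sorted names of regular files in download_dir that are 0 bytes.

    A 0-byte artifact suggests the file was created but no data was ever
    written to it. Subdirectories are ignored; only the top level is scanned.
    """
    root = Path(download_dir)
    return sorted(
        p.name for p in root.iterdir() if p.is_file() and p.stat().st_size == 0
    )


if __name__ == "__main__":
    # Hypothetical usage: python check_downloads.py /path/to/agent/downloads
    # Defaults to the current directory when no argument is given.
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    for name in find_empty_downloads(target):
        print(f"0-byte artifact: {name}")
```

Running it before and after an upgrade attempt would show whether the files stay empty, which helps separate a transfer that never starts from a timeout that fires partway through.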
As discussed with @blakerouse, closing this one in favour of #104.
Kibana version: 7.15.0 Snapshot Kibana Cloud environment
Host OS and Browser version: VSphere Ubuntu and MAC, All
Build details:
Preconditions:
Steps to reproduce:
7.14.1 release
agent.
Unhealthy after upgrade.
Debug level Logs:
logs.zip
endpoint-000000.zip
Note:
qa-ubuntu20.04-desktop and mac qa-mac-bigsur-11.0.1-release-nosip-clone-base
Expected Result:
7.14.1 Ubuntu .tar agent should upgrade to 7.15.0 with Endpoint Security and should remain Healthy.
Screenshots: