
[MAC-12 Agent]: Agent goes to offline state for more than 5 minutes and then gets back healthy when installed with --delay-enroll #139

Closed
amolnater-qasource opened this issue Nov 9, 2021 · 12 comments
Labels
bug Something isn't working impact:medium QA:Validated Validated by the QA Team Team:Elastic-Agent Label for the Agent team Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team

Comments

@amolnater-qasource

Kibana version: 7.16.0 BC-3 Kibana Cloud environment

Host OS and Browser version: MAC, All

Build details:

Build: 45816
Commit: acaa761f4ce46680fd7cfbeba03a652c72dc786b
Artifact Link: https://staging.elastic.co/7.16.0-8dc8b6a1/downloads/beats/elastic-agent/elastic-agent-7.16.0-darwin-x86_64.tar.gz

Preconditions:

  1. 7.16.0 BC-3 Kibana Cloud environment should be available.

Steps to reproduce:

  1. Log in to the Kibana environment.
  2. Run the Agent install command for the macOS agent with the --delay-enroll flag.
  3. Reboot the Mac.
  4. After the reboot, the Agent appears Healthy in the Fleet UI.
  5. Observe that the agent then goes Offline for more than 5 minutes before returning to Healthy.

Expected Result:
The agent should not remain Offline for more than 5 minutes when installed with --delay-enroll.

Note:

  • When installed without --delay-enroll, the agent goes Offline for about 3 minutes and then returns to Healthy.
  • When installed with the --delay-enroll flag, however, the Offline period is noticeably longer.
@amolnater-qasource amolnater-qasource added bug Something isn't working impact:medium Team:Elastic-Agent Label for the Agent team labels Nov 9, 2021
@elasticmachine
Collaborator

Pinging @elastic/elastic-agent (Team:Elastic-Agent)

@amolnater-qasource
Author

@manishgupta-qasource Please review.

@manishgupta-qasource

Reviewed and mentioned @andresrc.

CC: @EricDavisX

@jlind23
Contributor

jlind23 commented Nov 9, 2021

@amolnater-qasource Why is this a bug? Because it stays offline for too long? What offline duration was validated before?
@andresrc @nimarezainia do you remember what was validated?

@EricDavisX
Contributor

@amolnater-qasource can you elaborate further on steps 2 and 4/5? I think the --delay-enroll workflow here is non-standard. Usually that option is used to install an Agent, and then the VM is shut down and immediately saved in that state (the Agent has never run). When the VM later boots, the Agent comes alive. Right?

For step 2: if you are seeing the Agent come alive before the reboot, that is a separate bug, I believe, but we should check the product docs, or Nima's stance if no docs exist.

For steps 4/5:

  4. After reboot, Agent appears Healthy on Fleet UI after reboot.
  5. Observe agent goes to Offline state for more than 5 minutes and then gets back to Healthy state.

I think what you are saying is that the Agent shows as Healthy (briefly) after the reboot, then goes offline, and then comes back after 5 minutes? If so, that is 2 minutes longer than a regular install without that option. Yes?

Presuming so, I think this is confusing but perhaps not an impactful enough bug to fix right now, as long as the Agent works once it comes up fully and works in the 'golden image' use case described above. Would you mind testing that for comparison, please? Once we confirm the usual --delay-enroll spec, we can check that the test cases match; it needs specific steps. Exploratory tests can find interesting bugs and confusing behavior, but we may not fix them all. It is good for the team to be aware and to decide once we've confirmed what's going on.

I assume this doesn't happen on the other Linux or macOS versions we've tested with this Stack/Agent version?
Sorry for all the questions; work toward new OS support is always fun. :) Thanks for the testing!

@amolnater-qasource
Author

Hi @jlind23
We logged this issue because we expect the agent to return to a running, Healthy state within 60-120 seconds.
However, on macOS 12, when installed with the --delay-enroll flag, the agent stays offline for approximately 5 minutes.

We revalidated this today on macOS 11 and macOS 12, using the Default policy with only the System integration.
Details are shared below:

| OS | --delay-enroll | Observation 1 | Observation 2 |
|---|---|---|---|
| macOS Big Sur 11 | Without | Agent is installed and appears Healthy on Fleet UI, with no logs under the Agent Logs tab. | Agent goes offline for 2 minutes, then returns to Healthy with logs under the Agent Logs tab. |
| macOS Big Sur 11 | With | Machine is rebooted after running the install command and, after the reboot, appears Healthy on Fleet UI, with no logs under the Agent Logs tab. | Agent goes offline for 2 minutes, then returns to Healthy with logs under the Agent Logs tab. |
| macOS Monterey 12 | Without | Agent is installed and appears Healthy on Fleet UI, with no logs under the Agent Logs tab. | Agent goes offline for 2-3 minutes, then returns to Healthy with logs under the Agent Logs tab. |
| macOS Monterey 12 | With | Machine is rebooted after running the install command and, after the reboot, appears Healthy on Fleet UI, with no logs under the Agent Logs tab. | Agent goes offline for 4-5 minutes, then returns to Healthy with logs under the Agent Logs tab. |

cc: @EricDavisX
Please let us know if anything else is required from our end.
Thanks

@jlind23 jlind23 added the Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team label Nov 10, 2021
@elasticmachine
Collaborator

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

@jlind23
Contributor

jlind23 commented Nov 10, 2021

@amolnater-qasource so the only difference is that on macOS 12, --delay-enroll increases the offline time, right?

@amolnater-qasource
Author

Hi @jlind23

on macOS 12, --delay-enroll increases the offline time, right?

Yes, the offline time is much longer with --delay-enroll on macOS 12.
When the agent is offline for 4-5 minutes, users may mistake it for faulty agent behaviour, especially since no logs appear under the Agent > Logs tab during that time.

Thanks

@EricDavisX
Contributor

This was deemed 'medium' impact and does not prevent usage, so it will not stop us from claiming support for macOS 12 / M1 silicon environments. @nimarezainia @jlind23, confirming. I'm sending a wrap-up email on macOS 12 support shortly.

@jlind23
Contributor

jlind23 commented Dec 2, 2021

Adding @AndersonQ for visibility, as he will take care of the M1 development.

@jlind23 jlind23 transferred this issue from elastic/beats Mar 7, 2022
@ph ph mentioned this issue Mar 7, 2022
@amolnater-qasource
Author

Hi @jlind23
We revalidated installing the macOS 12 agent on the 8.3 snapshot using the --delay-enroll flag.

  • When installed with the --delay-enroll flag, the agent no longer goes Offline after the machine is rebooted.

Build details:
BUILD: 52512
COMMIT: df225b213b188c81888141cee2ec191424fc0649
Artifact Link: https://snapshots.elastic.co/8.3.0-5c1ff35f/downloads/beats/elastic-agent/elastic-agent-8.3.0-SNAPSHOT-darwin-x86_64.tar.gz

Hence we are closing this issue.
Thanks

@amolnater-qasource amolnater-qasource added the QA:Validated Validated by the QA Team label May 9, 2022