-
Notifications
You must be signed in to change notification settings - Fork 138
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Inconsistent] Endpoint fails to connect to Agent after Stop-Start agent from Services on Windows #2547
Comments
@amolnater-qasource Please review. |
Secondary review for this ticket is Done. |
FYI @cmacknz |
Endpoint is indeed degraded in the attached diagnostics: - id: endpoint-default
state:
state: 2
message: 'Healthy: communicating with endpoint service'
units:
? unittype: 0
unitid: endpoint-default-f525194d-bfe6-4106-9047-e9ec8ef568cf
: state: 3
message: Applied policy {f525194d-bfe6-4106-9047-e9ec8ef568cf}
payload:
error:
code: 0
message: Success
? unittype: 1
unitid: endpoint-default
: state: 3
message: Applied policy {f525194d-bfe6-4106-9047-e9ec8ef568cf}
payload:
error:
code: 0
message: Success
features_idx: 0
version_info:
name: Endpoint
version: 8.7.1 When the agent restarts it appears to discover that endpoint is already running correctly according to the most recent logs: {"log.level":"info","@timestamp":"2023-04-25T09:06:57.751Z","log.origin":{"file.name":"coordinator/coordinator.go","file.line":983},"message":"Spawned new component endpoint-default: Starting: endpoint service runtime","log":{"source":"elastic-agent"},"component":{"id":"endpoint-default","state":"STARTING"},"ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2023-04-25T09:06:57.751Z","log.origin":{"file.name":"coordinator/coordinator.go","file.line":983},"message":"Spawned new unit endpoint-default: Starting: endpoint service runtime","log":{"source":"elastic-agent"},"component":{"id":"endpoint-default","state":"STARTING"},"unit":{"id":"endpoint-default","type":"output","state":"STARTING"},"ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2023-04-25T09:06:57.751Z","log.origin":{"file.name":"coordinator/coordinator.go","file.line":983},"message":"Spawned new unit endpoint-default-f525194d-bfe6-4106-9047-e9ec8ef568cf: Starting: endpoint service runtime","log":{"source":"elastic-agent"},"component":{"id":"endpoint-default","state":"STARTING"},"unit":{"id":"endpoint-default-f525194d-bfe6-4106-9047-e9ec8ef568cf","type":"input","state":"STARTING"},"ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2023-04-25T09:06:59.381Z","log.logger":"component.runtime.endpoint-default.service_runtime","log.origin":{"file.name":"runtime/service_command.go","file.line":69},"message":"2023-04-25 09:06:59: info: Main.cpp:390 Verifying existing installation","context":"command output","ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2023-04-25T09:06:59.381Z","log.logger":"component.runtime.endpoint-default.service_runtime","log.origin":{"file.name":"runtime/service_command.go","file.line":69},"message":"2023-04-25 09:06:59: info: InstallLib.cpp:630 Running [C:\\Program Files\\Elastic\\Endpoint\\elastic-endpoint.exe] [version --log stdout]","context":"command output","ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2023-04-25T09:07:00.318Z","log.logger":"component.runtime.endpoint-default.service_runtime","log.origin":{"file.name":"runtime/service_command.go","file.line":69},"message":"2023-04-25 09:07:00: info: InstallLib.cpp:669 Installed endpoint is expected version (version: 8.7.1, compiled: Fri Apr 21 14:00:00 2023, branch: 8.7, commit: 4addadb6f43eac8edccceba3b9f87a1e188d03b9)","context":"command output","ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2023-04-25T09:07:00.318Z","log.logger":"component.runtime.endpoint-default.service_runtime","log.origin":{"file.name":"runtime/service_command.go","file.line":69},"message":"2023-04-25 09:07:00: info: Util.cpp:2005 Endpoint Service is running.","context":"command output","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2023-04-25T09:07:22.432Z","log.origin":{"file.name":"coordinator/coordinator.go","file.line":989},"message":"Component state changed endpoint-default (STARTING->HEALTHY): Healthy: communicating with endpoint service","log":{"source":"elastic-agent"},"component":{"id":"endpoint-default","state":"HEALTHY","old_state":"STARTING"},"ecs.version":"1.6.0"}
{"log.level":"warn","@timestamp":"2023-04-25T09:07:22.432Z","log.origin":{"file.name":"coordinator/coordinator.go","file.line":987},"message":"Unit state changed endpoint-default (STARTING->DEGRADED): Applied policy {f525194d-bfe6-4106-9047-e9ec8ef568cf}","log":{"source":"elastic-agent"},"component":{"id":"endpoint-default","state":"HEALTHY"},"unit":{"id":"endpoint-default","type":"output","state":"DEGRADED","old_state":"STARTING"},"ecs.version":"1.6.0"}
{"log.level":"warn","@timestamp":"2023-04-25T09:07:42.436Z","log.origin":{"file.name":"coordinator/coordinator.go","file.line":987},"message":"Unit state changed endpoint-default-f525194d-bfe6-4106-9047-e9ec8ef568cf (STARTING->DEGRADED): Applied policy {f525194d-bfe6-4106-9047-e9ec8ef568cf}","log":{"source":"elastic-agent"},"component":{"id":"endpoint-default","state":"HEALTHY"},"unit":{"id":"endpoint-default-f525194d-bfe6-4106-9047-e9ec8ef568cf","type":"input","state":"DEGRADED","old_state":"STARTING"},"ecs.version":"1.6.0"} This appears to line up with this sequence from the endpoint logs: {"@timestamp":"2023-04-25T09:06:55.2715926Z","agent":{"id":"a24dc70b-9757-4b9d-92f3-1664a3113b42","type":"endpoint"},"ecs":{"version":"1.11.0"},"log":{"level":"error","origin":{"file":{"line":123,"name":"AgentConnectionInfo.cpp"}}},"message":"AgentConnectionInfo.cpp:123 Agent process is not root/admin or validation failed, disconnecting","process":{"pid":7680,"thread":{"id":7604}}}
{"@timestamp":"2023-04-25T09:06:55.2715926Z","agent":{"id":"a24dc70b-9757-4b9d-92f3-1664a3113b42","type":"endpoint"},"ecs":{"version":"1.11.0"},"log":{"level":"warning","origin":{"file":{"line":180,"name":"AgentConnectionInfo.cpp"}}},"message":"AgentConnectionInfo.cpp:180 Failed to established stage 1 connection to agent","process":{"pid":7680,"thread":{"id":7604}}}
{"@timestamp":"2023-04-25T09:06:55.2715926Z","agent":{"id":"a24dc70b-9757-4b9d-92f3-1664a3113b42","type":"endpoint"},"ecs":{"version":"1.11.0"},"log":{"level":"error","origin":{"file":{"line":646,"name":"AgentComms.cpp"}}},"message":"AgentComms.cpp:646 Unable to retrieve connection info from Agent(Agent is not running as root)","process":{"pid":7680,"thread":{"id":7604}}}
{"@timestamp":"2023-04-25T09:07:01.3065017Z","agent":{"id":"a24dc70b-9757-4b9d-92f3-1664a3113b42","type":"endpoint"},"ecs":{"version":"1.11.0"},"log":{"level":"info","origin":{"file":{"line":128,"name":"AgentConnectionInfo.cpp"}}},"message":"AgentConnectionInfo.cpp:128 Validated agent (6460) is root/admin","process":{"pid":7680,"thread":{"id":7604}}}
{"@timestamp":"2023-04-25T09:07:01.3065017Z","agent":{"id":"a24dc70b-9757-4b9d-92f3-1664a3113b42","type":"endpoint"},"ecs":{"version":"1.11.0"},"log":{"level":"info","origin":{"file":{"line":136,"name":"AgentConnectionInfo.cpp"}}},"message":"AgentConnectionInfo.cpp:136 Established stage 1 connection to agent","process":{"pid":7680,"thread":{"id":7604}}}
{"@timestamp":"2023-04-25T09:07:02.3156873Z","agent":{"id":"a24dc70b-9757-4b9d-92f3-1664a3113b42","type":"endpoint"},"ecs":{"version":"1.11.0"},"log":{"level":"info","origin":{"file":{"line":700,"name":"AgentComms.cpp"}}},"message":"AgentComms.cpp:700 Connecting to Agent.","process":{"pid":7680,"thread":{"id":7604}}}
{"@timestamp":"2023-04-25T09:07:17.2392486Z","agent":{"id":"a24dc70b-9757-4b9d-92f3-1664a3113b42","type":"endpoint"},"ecs":{"version":"1.11.0"},"log":{"level":"info","origin":{"file":{"line":236,"name":"BulkQueueConsumer.cpp"}}},"message":"BulkQueueConsumer.cpp:236 Sent 257 documents to Elasticsearch","process":{"pid":7680,"thread":{"id":4892}}}
{"@timestamp":"2023-04-25T09:07:22.431162Z","agent":{"id":"a24dc70b-9757-4b9d-92f3-1664a3113b42","type":"endpoint"},"ecs":{"version":"1.11.0"},"log":{"level":"info","origin":{"file":{"line":728,"name":"AgentComms.cpp"}}},"message":"AgentComms.cpp:728 Agent connection established.","process":{"pid":7680,"thread":{"id":7604}}}
{"@timestamp":"2023-04-25T09:07:22.431162Z","agent":{"id":"a24dc70b-9757-4b9d-92f3-1664a3113b42","type":"endpoint"},"ecs":{"version":"1.11.0"},"log":{"level":"info","origin":{"file":{"line":55,"name":"BigChief.cpp"}}},"message":"BigChief.cpp:55 Agent process notification for protection: 6460","process":{"pid":7680,"thread":{"id":7604}}}
{"@timestamp":"2023-04-25T09:07:22.431162Z","agent":{"id":"a24dc70b-9757-4b9d-92f3-1664a3113b42","type":"endpoint"},"ecs":{"version":"1.11.0"},"log":{"level":"info","origin":{"file":{"line":76,"name":"BigChief.cpp"}}},"message":"BigChief.cpp:76 Agent process [6460] protection result: Success","process":{"pid":7680,"thread":{"id":7604}}}
{"@timestamp":"2023-04-25T09:07:22.431162Z","agent":{"id":"a24dc70b-9757-4b9d-92f3-1664a3113b42","type":"endpoint"},"ecs":{"version":"1.11.0"},"log":{"level":"info","origin":{"file":{"line":485,"name":"AgentContext.cpp"}}},"message":"AgentContext.cpp:485 Successfully created reader/writer for CheckinV2Api","process":{"pid":7680,"thread":{"id":9132}}}
{"@timestamp":"2023-04-25T09:07:22.4338289Z","agent":{"id":"a24dc70b-9757-4b9d-92f3-1664a3113b42","type":"endpoint"},"ecs":{"version":"1.11.0"},"log":{"level":"info","origin":{"file":{"line":550,"name":"AgentContext.cpp"}}},"message":"AgentContext.cpp:550 Not applying config from agent as it has not changed","process":{"pid":7680,"thread":{"id":9132}}} To me it appears like endpoint correctly marks itself as DEGRADED when the agent process is detected as stopped but then never clears it. @brian-mckinney any insight into why endpoint never clears its DEGRADED status after it re-establishes a connection with the agent? How would the DEGRADED state here be fixed? It's not clear to me that this is an issue with agent itself. |
@cmacknz I think there is a logic bug.
cc: @nfritts |
@harshitgupta-qasource This might be resolved. Can we verify if this bug exists in 8.8 BC2? |
Hi Team, We have revalidated this issue on the latest 8.8.0 BC3 Kibana cloud environment and found it fixed now. Observations:
Build details: Hence, we are closing this issue and marking as QA: Validated. Thanks |
Hi Team While testing the 8.8.1 BC2 Kibana, we have observed this issue reproducible. Observations:
Build details: Agent Diagnostic logs: elastic-agent-diagnostics-2023-06-06T06-54-58Z-00.zip Hence, we are re-opening this issue. Please let us know if anything else is required from our end. Thanks! |
@nfritts @pierrehilbert can you please take a look at this again? |
Hi @jlind23 While testing the 8.8.2 BC1 Kibana, we have observed this issue is still reproducible. Observations:
Build details: Agent Diagnostic logs: Please let us know if anything else is required from our end. Thanks! |
Hi Team, We have revalidated this issue on the latest 8.8.2 BC2 and 8.9.0 BC1 Kibana cloud environment and found it fixed now. Observations:
Build details: VERSION: 8.8.2 Hence, we are closing this issue and marking as QA: Validated. Thanks |
Hi Team, While testing the 8.9.0 BC4 Kibana, we have observed this issue is again reproducible inconsistently. Observations:
Build details: Agent Diagnostic logs: Hence, we are reopening this issue and marking it as Inconsistent. Please let us know if anything else is required from our end. Thanks! |
Pinging @elastic/elastic-agent (Team:Elastic-Agent) |
|
Kibana version: 8.7.1 BC3 Kibana cloud environment
Host OS: Windows
Build details:
VERSION: 8.7.1
BUILD: 61211
COMMIT: fde13a9709d0191b3eccee3aa20af50be8ef7e7e
Artifact Link: https://staging.elastic.co/8.7.1-14a43a6a/summary-8.7.1.html
Preconditions:
Steps to reproduce:
Failed to connect to Agent
Agent Diagnostics:
elastic-agent-diagnostics-2023-04-25T09-30-56Z-00.zip
Screenshot
Expected Result:
Endpoint should connect to Agent after Stop-Start agent from Services on Windows
Impacted Testcases
#150205
#35378
The text was updated successfully, but these errors were encountered: