Skip to content
Merged
45 changes: 45 additions & 0 deletions troubleshoot/ingest/fleet/common-problems.md
Original file line number Diff line number Diff line change
Expand Up @@ -59,6 +59,7 @@ Find troubleshooting information for {{fleet}}, {{fleet-server}}, and {{agent}}
* [Hosted {{agent}} is offline](#hosted-agent-offline)
* [APM & {{fleet}} fails to upgrade to 8.x on {{ecloud}}](#hosted-agent-8-x-upgrade-fail)
* [Air-gapped {{agent}} upgrade can fail due to an inaccessible PGP key](#pgp-key-download-fail)
* [{{agent}} upgrade fails on Windows with exit status `0xc0000142`](#agent-upgrade-fail-windows)
* [{{agents}} are unable to connect after removing the {{fleet-server}} integration](#fleet-server-integration-removed)
* [{{agent}} Out of Memory errors on Kubernetes](#agent-oom-k8s)
* [Error when running {{agent}} commands with `sudo`](#agent-sudo-error)
Expand Down Expand Up @@ -652,6 +653,50 @@ curl -u elastic:<password> --request POST \
In versions 8.9 and above, an {{agent}} upgrade may fail when the upgrader can’t access a PGP key required to verify the binary signature. For details and a workaround, refer to the [PGP key download fails in an air-gapped environment](https://www.elastic.co/guide/en/fleet/8.9/release-notes-8.9.0.html#known-issue-3375) known issue in the version 8.9.0 Release Notes or to the [workaround documentation](https://github.com/elastic/elastic-agent/blob/main/docs/pgp-workaround.md) in the elastic-agent GitHub repository.


## {{agent}} upgrade fails on Windows with exit status `0xc0000142` [agent-upgrade-fail-windows]

During an {{agent}} upgrade on Windows, {{agent}} spawns a "watcher" process that monitors the upgrade process. Windows attempts to create a temporary console for this process. If Windows can't create this console, the watcher process initialization fails with error code `0xc0000142` (`STATUS_DLL_INIT_FAILED`), resulting in an upgrade failure. {{agent}} logs this error at the `info` level.

The error is caused by Windows [desktop heap exhaustion](https://learn.microsoft.com/en-us/troubleshoot/windows-server/performance/desktop-heap-limitation-out-of-memory). When {{agent}} runs as a [Windows service application](https://learn.microsoft.com/en-us/dotnet/framework/windows-services/introduction-to-windows-service-applications), it uses the service desktop, and shares the desktop heap with other running services. If a service process is using windowing resources, but is failing to release them, this may exhaust the desktop heap and affect {{agent}}.

:::{note}
Interactively-run instances of `elastic-agent.exe` are not subject to this limitation. Only instances running as a service are potentially affected.
:::

To resolve the issue, you can try the following:

- **Update {{agent}} immediately after a system reboot**

A system reboot destroys and recreates the desktop heap, resolving any prior exhaustion.
Because many memory leaks are gradual, updating {{agent}} immediately after a system reboot may allow {{agent}} to upgrade before the memory leaking application exhausts the desktop heap.

:::{tip}
A [cold startup](https://learn.microsoft.com/en-us/windows-hardware/drivers/kernel/distinguishing-fast-startup-from-wake-from-hibernation) resets kernel memory, but a fast startup or a wake from hibernation does not.
A regular reboot (for example, `shutdown /r /t 0`) results in a cold startup, and resets the desktop heap.
:::

- **Update third-party service applications**

As standard Windows tools such as Task Manager and Process Explorer do not attribute desktop heap usage by application, you have to consider updating all third-party processes that are running as a service. To list these applications, use the following PowerShell command:

```powershell
PS C:\> Get-Process | Where {$_.SI -eq 0} | Where {$_.MainModule.FileVersionInfo.ProductName -and (-not (($_.MainModule.FileVersionInfo.CompanyName -eq "Microsoft Corporation") -and ($_.MainModule.FileVersionInfo.ProductName -like "*Windows*"))) } | ForEach-Object { $_.MainModule.FileVersionInfo.ProductName + ' - ' + $_.Path }
```

You can then install any updates from the listed applications' manufacturers.

- **Terminate or uninstall third-party service applications**

You can try terminating or uninstalling non-critical third-party service applications before updating {{agent}}.
Terminating a process releases its desktop heap resources.

Note that the {{agent}} update process does not require a significant amount of desktop heap resources, so a successful {{agent}} update following the termination or uninstallation of a service application does not necessarily mean that the application was exhausting the desktop heap.

- **Resize the desktop heap**

As a short-term solution, follow the steps described in the [Microsoft guide](https://learn.microsoft.com/en-us/troubleshoot/windows-server/performance/desktop-heap-limitation-out-of-memory) to increase the size of the desktop heap. Note that if a service application is causing a memory leak, increasing the size of the desktop heap may only postpone the desktop heap exhaustion.


## {{agents}} are unable to connect after removing the {{fleet-server}} integration [fleet-server-integration-removed]

When you use {{fleet}}-managed {{agent}}, at least one {{agent}} needs to be running the [{{fleet-server}} integration](https://docs.elastic.co/integrations/fleet_server). In case the policy containing this integration is accidentally removed from {{agent}}, all other agents will not be able to be managed. However, the {{agents}} will continue to send data to their configured output.
Expand Down
Loading