Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Ingest Manager] Assigning policy/config takes too long #77613

Closed
mtojek opened this issue Sep 16, 2020 · 11 comments · Fixed by #78493
Closed

[Ingest Manager] Assigning policy/config takes too long #77613

mtojek opened this issue Sep 16, 2020 · 11 comments · Fixed by #78493
Assignees
Labels
Team:Fleet Team label for Observability Data Collection Fleet team

Comments

@mtojek
Copy link
Contributor

mtojek commented Sep 16, 2020

Hi,

@ycombinator and I are working on system tests for integrations. While debugging a basic use case with Apache logs, we identified an issue around assigning policy to the Agent:

➜  apache git:(system-test-apache) /Users/marcin.tojek/go/bin/elastic-package test system -v
2020/09/16 15:17:14 DEBUG Enable verbose logging
Run system tests for the package
2020/09/16 15:17:14  INFO setting up service...
2020/09/16 15:17:14 DEBUG setting up service using Docker Compose service deployer
2020/09/16 15:17:14 DEBUG running command: /usr/local/bin/docker-compose -f /Users/marcin.tojek/go/src/github.com/elastic/integrations/packages/apache/_dev/deploy/docker-compose.yml -p elastic-package-service up -d
Creating network "elastic-package-service_default" with the default driver
Creating elastic-package-service_apache_1 ... done
2020/09/16 15:17:16 DEBUG creating temp file /Users/marcin.tojek/.elastic-package/tmp/service_logs/stdout to hold service container elastic-package-service_apache_1 STDOUT
2020/09/16 15:17:16 DEBUG creating temp file /Users/marcin.tojek/.elastic-package/tmp/service_logs/stderr to hold service container elastic-package-service_apache_1 STDERR
2020/09/16 15:17:16 DEBUG redirecting service container elastic-package-service_apache_1 STDOUT and STDERR to temp files
2020/09/16 15:17:16 DEBUG attaching service container elastic-package-service_apache_1 to stack network elastic-package-stack_default
2020/09/16 15:17:16 DEBUG adding service container elastic-package-service_apache_1 internal ports to context
2020/09/16 15:17:16 DEBUG running command: /usr/local/bin/docker-compose -f /Users/marcin.tojek/go/src/github.com/elastic/integrations/packages/apache/_dev/deploy/docker-compose.yml -p elastic-package-service config
2020/09/16 15:17:17  INFO creating test policy...
2020/09/16 15:17:17 DEBUG POST http://127.0.0.1:5601/api/ingest_manager/agent_policies
2020/09/16 15:17:17 DEBUG {"name":"ep-test-system-apache-access-20200916T15:17:17Z","description":"test policy created by elastic-package test system for data stream apache/access","namespace":"ep"}
2020/09/16 15:17:19  INFO adding package datastream to test policy...
2020/09/16 15:17:19 DEBUG POST http://127.0.0.1:5601/api/ingest_manager/package_policies
2020/09/16 15:17:19 DEBUG {"name":"apache-access","description":"","namespace":"ep","policy_id":"f15ef160-f81e-11ea-89b3-47f49158ae7d","enabled":true,"output_id":"","inputs":[{"type":"logfile","enabled":true,"streams":[{"id":"logfile-apache.access","enabled":true,"data_stream":{"type":"logs","dataset":"apache.access"},"vars":{"paths":{"value":["/tmp/service_logs/stdout"],"type":"text"}}}],"vars":{}}],"package":{"name":"apache","title":"Apache","version":"0.1.4"}}
2020/09/16 15:17:28  INFO deleting old data in data stream...
2020/09/16 15:17:28  INFO assigning package datastream to agent...
2020/09/16 15:17:28 DEBUG PUT http://127.0.0.1:5601/api/ingest_manager/fleet/agents/4f9beed8-13a8-4f5b-bf4d-b5aaaf1897ed/reassign
2020/09/16 15:17:28 DEBUG { "policy_id": "f15ef160-f81e-11ea-89b3-47f49158ae7d" }
2020/09/16 15:17:29  INFO checking for expected data in data stream...
2020/09/16 15:17:29 DEBUG found 0 hits in logs-apache.access-ep data stream
2020/09/16 15:17:30 DEBUG found 0 hits in logs-apache.access-ep data stream
2020/09/16 15:17:31 DEBUG found 0 hits in logs-apache.access-ep data stream
2020/09/16 15:17:32 DEBUG found 0 hits in logs-apache.access-ep data stream
2020/09/16 15:17:33 DEBUG found 0 hits in logs-apache.access-ep data stream
(...)
2020/09/16 15:19:26 DEBUG found 0 hits in logs-apache.access-ep data stream
2020/09/16 15:19:27 DEBUG found 0 hits in logs-apache.access-ep data stream
2020/09/16 15:19:28 DEBUG found 0 hits in logs-apache.access-ep data stream
2020/09/16 15:19:29 DEBUG found 0 hits in logs-apache.access-ep data stream
System test for apache/access dataset failed
2020/09/16 15:19:30 DEBUG reassigning original policy back to agent...
2020/09/16 15:19:30 DEBUG PUT http://127.0.0.1:5601/api/ingest_manager/fleet/agents/4f9beed8-13a8-4f5b-bf4d-b5aaaf1897ed/reassign
2020/09/16 15:19:30 DEBUG { "policy_id": "5aec14b0-f81e-11ea-89b3-47f49158ae7d" }
2020/09/16 15:19:30 DEBUG deleting test policy...
2020/09/16 15:19:30 DEBUG POST http://127.0.0.1:5601/api/ingest_manager/agent_policies/delete
2020/09/16 15:19:30 DEBUG { "agentPolicyId": "f15ef160-f81e-11ea-89b3-47f49158ae7d" }
2020/09/16 15:19:33  INFO tearing down service...
2020/09/16 15:19:33  INFO tearing down service using docker compose runner
2020/09/16 15:19:33 DEBUG running command: /usr/local/bin/docker-compose -f /Users/marcin.tojek/go/src/github.com/elastic/integrations/packages/apache/_dev/deploy/docker-compose.yml -p elastic-package-service down
Stopping elastic-package-service_apache_1 ... done
Removing elastic-package-service_apache_1 ... done
Removing network elastic-package-service_default
Error: error running package system tests: system test for apache/access dataset failed

The reason behind this is the timeout problem on the Agent side:

{"log.level":"error","@timestamp":"2020-09-16T13:15:00.450Z","log.origin":{"file.name":"application/fleet_gateway.go","file.line":176},"message":"Could not communicate with Checking API will retry, error: fail to checkin to fleet: Post \"http://kibana:5601/api/ingest_manager/fleet/agents/4f9beed8-13a8-4f5b-bf4d-b5aaaf1897ed/checkin?\": net/http: request canceled (Client.Timeout exceeded while awaiting headers)","ecs.version":"1.5.0"}
{"log.level":"debug","@timestamp":"2020-09-16T13:16:45.721Z","log.origin":{"file.name":"kibana/client.go","file.line":170},"message":"Request method: POST, path: /api/ingest_manager/fleet/agents/4f9beed8-13a8-4f5b-bf4d-b5aaaf1897ed/checkin","ecs.version":"1.5.0"}
{"log.level":"error","@timestamp":"2020-09-16T13:18:15.620Z","log.origin":{"file.name":"application/fleet_gateway.go","file.line":176},"message":"Could not communicate with Checking API will retry, error: fail to checkin to fleet: Post \"http://kibana:5601/api/ingest_manager/fleet/agents/4f9beed8-13a8-4f5b-bf4d-b5aaaf1897ed/checkin?\": net/http: request canceled (Client.Timeout exceeded while awaiting headers)","ecs.version":"1.5.0"}

As it's important for the system tests to be quick and simple, the workaround like xpack.ingestManager.fleet.pollingRequestTimeout: 60000 doesn't seem to be the right choice. We appreciate any optimization on the path here to make the test execution shorter.

@mtojek mtojek added the Team:Fleet Team label for Observability Data Collection Fleet team label Sep 16, 2020
@elasticmachine
Copy link
Contributor

Pinging @elastic/ingest-management (Team:Ingest Management)

@EricDavisX
Copy link
Contributor

are we sure the policy isn't applied? a lack of data may be related to a lack of activity on the host possibly? I don't think we're not seeing this in manual usage or other e2e tests. @mdelapenya has helped set up the Agent / Fleet e2e and he could help discuss our usage if desired.

@nchaulet
Copy link
Member

I do not thing the fleet checkin timeout is the issue here, the timeout is used in the checkin if there is no config change. If there is a config change the checkin will be immediate. If there is a config change the request will be the timeout duration.

@mtojek
Copy link
Contributor Author

mtojek commented Sep 16, 2020

are we sure the policy isn't applied? a lack of data may be related to a lack of activity on the host possibly?

In our setup everything is in place - in this case sample access logs. Once the filebeat picks the config and starts processing files it is matter of seconds.

@ph
Copy link
Contributor

ph commented Sep 16, 2020

as noted from @nchaulet the change should be immediate, I would be assuming the action_store has indeed the latest config.

@michalpristas and @nchaulet can you pair on that issue.

@mtojek what is the delay that you are seeing?

@mtojek
Copy link
Contributor Author

mtojek commented Sep 16, 2020

Here is a trace from the latest failed test run:

➜  apache git:(system-test-apache) /Users/marcin.tojek/go/bin/elastic-package test system -v
2020/09/16 17:51:05 DEBUG Enable verbose logging
Run system tests for the package
2020/09/16 17:51:05  INFO setting up service...
2020/09/16 17:51:05 DEBUG setting up service using Docker Compose service deployer
2020/09/16 17:51:05 DEBUG running command: /usr/local/bin/docker-compose -f /Users/marcin.tojek/go/src/github.com/elastic/integrations/packages/apache/_dev/deploy/docker-compose.yml -p elastic-package-service up -d
Creating network "elastic-package-service_default" with the default driver
Creating elastic-package-service_apache_1 ... done
2020/09/16 17:51:07 DEBUG creating temp file /Users/marcin.tojek/.elastic-package/tmp/service_logs/stdout to hold service container elastic-package-service_apache_1 STDOUT
2020/09/16 17:51:07 DEBUG creating temp file /Users/marcin.tojek/.elastic-package/tmp/service_logs/stderr to hold service container elastic-package-service_apache_1 STDERR
2020/09/16 17:51:07 DEBUG redirecting service container elastic-package-service_apache_1 STDOUT and STDERR to temp files
2020/09/16 17:51:07 DEBUG attaching service container elastic-package-service_apache_1 to stack network elastic-package-stack_default
2020/09/16 17:51:07 DEBUG adding service container elastic-package-service_apache_1 internal ports to context
2020/09/16 17:51:07 DEBUG running command: /usr/local/bin/docker-compose -f /Users/marcin.tojek/go/src/github.com/elastic/integrations/packages/apache/_dev/deploy/docker-compose.yml -p elastic-package-service config
2020/09/16 17:51:07  INFO creating test policy...
2020/09/16 17:51:07 DEBUG POST http://127.0.0.1:5601/api/ingest_manager/agent_policies
2020/09/16 17:51:07 DEBUG {"name":"ep-test-system-apache-access-20200916T17:51:07Z","description":"test policy created by elastic-package test system for data stream apache/access","namespace":"ep"}
2020/09/16 17:51:10  INFO adding package datastream to test policy...
2020/09/16 17:51:10 DEBUG POST http://127.0.0.1:5601/api/ingest_manager/package_policies
2020/09/16 17:51:10 DEBUG {"name":"apache-access","description":"","namespace":"ep","policy_id":"6f68abe0-f834-11ea-b5c3-6f5f5f464533","enabled":true,"output_id":"","inputs":[{"type":"logfile","enabled":true,"streams":[{"id":"logfile-apache.access","enabled":true,"data_stream":{"type":"logs","dataset":"apache.access"},"vars":{"paths":{"value":["/tmp/service_logs/stdout"],"type":"text"}}}],"vars":{}}],"package":{"name":"apache","title":"Apache","version":"0.1.4"}}
2020/09/16 17:51:19  INFO deleting old data in data stream...
2020/09/16 17:51:19  INFO assigning package datastream to agent...
2020/09/16 17:51:19 DEBUG PUT http://127.0.0.1:5601/api/ingest_manager/fleet/agents/1ccef44c-3a32-4a19-94a9-85674cd3508e/reassign
2020/09/16 17:51:19 DEBUG { "policy_id": "6f68abe0-f834-11ea-b5c3-6f5f5f464533" }
2020/09/16 17:51:20  INFO checking for expected data in data stream...
2020/09/16 17:51:20 DEBUG found 0 hits in logs-apache.access-ep data stream
2020/09/16 17:51:21 DEBUG found 0 hits in logs-apache.access-ep data stream
2020/09/16 17:51:22 DEBUG found 0 hits in logs-apache.access-ep data stream
2020/09/16 17:51:23 DEBUG found 0 hits in logs-apache.access-ep data stream
2020/09/16 17:51:24 DEBUG found 0 hits in logs-apache.access-ep data stream
(...)
2020/09/16 17:53:16 DEBUG found 0 hits in logs-apache.access-ep data stream
2020/09/16 17:53:17 DEBUG found 0 hits in logs-apache.access-ep data stream
2020/09/16 17:53:18 DEBUG found 0 hits in logs-apache.access-ep data stream
2020/09/16 17:53:19 DEBUG found 0 hits in logs-apache.access-ep data stream
2020/09/16 17:53:20 DEBUG found 0 hits in logs-apache.access-ep data stream
System test for apache/access dataset failed
2020/09/16 17:53:21 DEBUG reassigning original policy back to agent...
2020/09/16 17:53:21 DEBUG PUT http://127.0.0.1:5601/api/ingest_manager/fleet/agents/1ccef44c-3a32-4a19-94a9-85674cd3508e/reassign
2020/09/16 17:53:21 DEBUG { "policy_id": "e0f87020-f833-11ea-b5c3-6f5f5f464533" }
2020/09/16 17:53:21 DEBUG deleting test policy...
2020/09/16 17:53:21 DEBUG POST http://127.0.0.1:5601/api/ingest_manager/agent_policies/delete
2020/09/16 17:53:21 DEBUG { "agentPolicyId": "6f68abe0-f834-11ea-b5c3-6f5f5f464533" }
2020/09/16 17:53:24  INFO tearing down service...
2020/09/16 17:53:24  INFO tearing down service using docker compose runner
2020/09/16 17:53:24 DEBUG running command: /usr/local/bin/docker-compose -f /Users/marcin.tojek/go/src/github.com/elastic/integrations/packages/apache/_dev/deploy/docker-compose.yml -p elastic-package-service down
Stopping elastic-package-service_apache_1 ... done
Removing elastic-package-service_apache_1 ... done
Removing network elastic-package-service_default
Error: error running package system tests: system test for apache/access dataset failed

17:51:19 ~ 17:53:21 - we assumed that 2 minutes should more than enough.

For the second time:

2020/09/16 17:55:44 DEBUG adding service container elastic-package-service_apache_1 internal ports to context
2020/09/16 17:55:44 DEBUG running command: /usr/local/bin/docker-compose -f /Users/marcin.tojek/go/src/github.com/elastic/integrations/packages/apache/_dev/deploy/docker-compose.yml -p elastic-package-service config
2020/09/16 17:55:44  INFO creating test policy...
2020/09/16 17:55:44 DEBUG POST http://127.0.0.1:5601/api/ingest_manager/agent_policies
2020/09/16 17:55:44 DEBUG {"name":"ep-test-system-apache-access-20200916T17:55:44Z","description":"test policy created by elastic-package test system for data stream apache/access","namespace":"ep"}
2020/09/16 17:55:47  INFO adding package datastream to test policy...
2020/09/16 17:55:47 DEBUG POST http://127.0.0.1:5601/api/ingest_manager/package_policies
2020/09/16 17:55:47 DEBUG {"name":"apache-access","description":"","namespace":"ep","policy_id":"14726d60-f835-11ea-b5c3-6f5f5f464533","enabled":true,"output_id":"","inputs":[{"type":"logfile","enabled":true,"streams":[{"id":"logfile-apache.access","enabled":true,"data_stream":{"type":"logs","dataset":"apache.access"},"vars":{"paths":{"value":["/tmp/service_logs/stdout"],"type":"text"}}}],"vars":{}}],"package":{"name":"apache","title":"Apache","version":"0.1.4"}}
2020/09/16 17:55:50  INFO deleting old data in data stream...
2020/09/16 17:55:50  INFO assigning package datastream to agent...
2020/09/16 17:55:50 DEBUG PUT http://127.0.0.1:5601/api/ingest_manager/fleet/agents/1ccef44c-3a32-4a19-94a9-85674cd3508e/reassign
2020/09/16 17:55:50 DEBUG { "policy_id": "14726d60-f835-11ea-b5c3-6f5f5f464533" }
2020/09/16 17:55:51  INFO checking for expected data in data stream...
2020/09/16 17:55:51 DEBUG found 0 hits in logs-apache.access-ep data stream
2020/09/16 17:55:52 DEBUG found 0 hits in logs-apache.access-ep data stream
2020/09/16 17:55:53 DEBUG found 0 hits in logs-apache.access-ep data stream
2020/09/16 17:55:54 DEBUG found 0 hits in logs-apache.access-ep data stream
2020/09/16 17:55:55 DEBUG found 0 hits in logs-apache.access-ep data stream
2020/09/16 17:55:56 DEBUG found 0 hits in logs-apache.access-ep data stream
(...)
2020/09/16 17:57:45 DEBUG found 0 hits in logs-apache.access-ep data stream
2020/09/16 17:57:46 DEBUG found 0 hits in logs-apache.access-ep data stream
2020/09/16 17:57:47 DEBUG found 0 hits in logs-apache.access-ep data stream
2020/09/16 17:57:48 DEBUG found 0 hits in logs-apache.access-ep data stream
2020/09/16 17:57:49 DEBUG found 90 hits in logs-apache.access-ep data stream
System test for apache/access dataset passed!

17:55:50 ~ 17:57:49 - it passed just before timeout.

Here is another successful run:

➜  apache git:(system-test-apache) elastic-package test -v
2020/09/16 18:03:21 DEBUG Enable verbose logging
Run test suite for the package
Run system tests for the package
2020/09/16 18:03:21  INFO setting up service...
2020/09/16 18:03:21 DEBUG setting up service using Docker Compose service deployer
2020/09/16 18:03:21 DEBUG running command: /usr/local/bin/docker-compose -f /Users/marcin.tojek/go/src/github.com/elastic/integrations/packages/apache/_dev/deploy/docker-compose.yml -p elastic-package-service up -d
Creating network "elastic-package-service_default" with the default driver
Creating elastic-package-service_apache_1 ... done
2020/09/16 18:03:23 DEBUG creating temp file /Users/marcin.tojek/.elastic-package/tmp/service_logs/stdout to hold service container elastic-package-service_apache_1 STDOUT
2020/09/16 18:03:23 DEBUG creating temp file /Users/marcin.tojek/.elastic-package/tmp/service_logs/stderr to hold service container elastic-package-service_apache_1 STDERR
2020/09/16 18:03:23 DEBUG redirecting service container elastic-package-service_apache_1 STDOUT and STDERR to temp files
2020/09/16 18:03:23 DEBUG attaching service container elastic-package-service_apache_1 to stack network elastic-package-stack_default
2020/09/16 18:03:23 DEBUG adding service container elastic-package-service_apache_1 internal ports to context
2020/09/16 18:03:23 DEBUG running command: /usr/local/bin/docker-compose -f /Users/marcin.tojek/go/src/github.com/elastic/integrations/packages/apache/_dev/deploy/docker-compose.yml -p elastic-package-service config
2020/09/16 18:03:24  INFO creating test policy...
2020/09/16 18:03:24 DEBUG POST http://127.0.0.1:5601/api/ingest_manager/agent_policies
2020/09/16 18:03:24 DEBUG {"name":"ep-test-system-apache-access-20200916T18:03:24Z","description":"test policy created by elastic-package test system for data stream apache/access","namespace":"ep"}
2020/09/16 18:03:26  INFO adding package datastream to test policy...
2020/09/16 18:03:26 DEBUG POST http://127.0.0.1:5601/api/ingest_manager/package_policies
2020/09/16 18:03:26 DEBUG {"name":"apache-access","description":"","namespace":"ep","policy_id":"26496c90-f836-11ea-b5c3-6f5f5f464533","enabled":true,"output_id":"","inputs":[{"type":"logfile","enabled":true,"streams":[{"id":"logfile-apache.access","enabled":true,"data_stream":{"type":"logs","dataset":"apache.access"},"vars":{"paths":{"value":["/tmp/service_logs/stdout"],"type":"text"}}}],"vars":{}}],"package":{"name":"apache","title":"Apache","version":"0.1.4"}}
2020/09/16 18:03:30  INFO deleting old data in data stream...
2020/09/16 18:03:30  INFO assigning package datastream to agent...
2020/09/16 18:03:30 DEBUG PUT http://127.0.0.1:5601/api/ingest_manager/fleet/agents/1ccef44c-3a32-4a19-94a9-85674cd3508e/reassign
2020/09/16 18:03:30 DEBUG { "policy_id": "26496c90-f836-11ea-b5c3-6f5f5f464533" }
2020/09/16 18:03:31  INFO checking for expected data in data stream...
2020/09/16 18:03:31 DEBUG found 0 hits in logs-apache.access-ep data stream
2020/09/16 18:03:32 DEBUG found 0 hits in logs-apache.access-ep data stream
2020/09/16 18:03:33 DEBUG found 0 hits in logs-apache.access-ep data stream
2020/09/16 18:03:34 DEBUG found 0 hits in logs-apache.access-ep data stream
2020/09/16 18:03:35 DEBUG found 0 hits in logs-apache.access-ep data stream
2020/09/16 18:03:36 DEBUG found 0 hits in logs-apache.access-ep data stream
2020/09/16 18:03:37 DEBUG found 0 hits in logs-apache.access-ep data stream
2020/09/16 18:03:38 DEBUG found 0 hits in logs-apache.access-ep data stream
2020/09/16 18:03:39 DEBUG found 0 hits in logs-apache.access-ep data stream
2020/09/16 18:03:40 DEBUG found 0 hits in logs-apache.access-ep data stream
2020/09/16 18:03:41 DEBUG found 0 hits in logs-apache.access-ep data stream
2020/09/16 18:03:42 DEBUG found 0 hits in logs-apache.access-ep data stream
2020/09/16 18:03:43 DEBUG found 0 hits in logs-apache.access-ep data stream
2020/09/16 18:03:44 DEBUG found 0 hits in logs-apache.access-ep data stream
2020/09/16 18:03:45 DEBUG found 0 hits in logs-apache.access-ep data stream
2020/09/16 18:03:46 DEBUG found 0 hits in logs-apache.access-ep data stream
2020/09/16 18:03:47 DEBUG found 0 hits in logs-apache.access-ep data stream
2020/09/16 18:03:48 DEBUG found 0 hits in logs-apache.access-ep data stream
2020/09/16 18:03:49 DEBUG found 16 hits in logs-apache.access-ep data stream
System test for apache/access dataset passed!

As you see this one it's relatively quick.

@nchaulet
Copy link
Member

@mtojek let me investigate a little more but looks like there is case I did not cover with config reassignment

@ycombinator
Copy link
Contributor

Hi @nchaulet, any progress on this issue? Just wondering since I'm about to merge a PR with a workaround in place: https://github.com/elastic/elastic-package/pull/111/files#diff-6d81fc25f6617cc18c88e703716754b9R23-R24. Thanks!

@nchaulet
Copy link
Member

nchaulet commented Oct 1, 2020

@ycombinator I started on this but is a bigger change than what I thought #78493

@ycombinator
Copy link
Contributor

Thanks for the update @nchaulet. I've subscribed to your PR now.

@ycombinator
Copy link
Contributor

Thanks for getting this fixed @nchaulet. Appreciate it! ❤️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Team:Fleet Team label for Observability Data Collection Fleet team
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants