Nested modules sometimes fail to propagate exports #6199

Closed
wildum opened this issue Jan 19, 2024 · 1 comment · Fixed by #6212
Assignees
Labels
bug: Something isn't working
flow: Related to Grafana Agent Flow
frozen-due-to-age: Locked due to a period of inactivity. Please open new issues or PRs if more discussion is needed.

Comments

@wildum
Contributor

wildum commented Jan 19, 2024

What's wrong?

This branch contains the test that sometimes triggers the bug and a hacky fix: https://github.com/grafana/agent/compare/experiment-modules-bug?expand=1

More context from @rfratto after a quick investigation: the worker pool is shared between all controllers and is used for evaluating updated components. Items are queued into the worker pool using SubmitWithKey.
Note this line:
Adding a job with a key that is already queued is a no-op (even if the submitted function is different).
The bug is that the key is the local ID of a component, not the global ID. Because you have multiple components with the local ID of testcomponents.passthrough.pt, the worker queue gets confused and drops events for one or both of them.
It's essentially a race condition that depends on whether the worker queue processes events from both components quickly enough. If it doesn't, events for one of them get dropped.
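
To make the failure mode concrete, here is a minimal sketch of how a keyed queue can drop an evaluation when two controllers submit under the same local ID. This is not the agent's actual worker pool: keyedQueue, Drain, and evaluateWithLocalKeys are hypothetical names, and only the "already queued key is a no-op" behaviour of SubmitWithKey is taken from the description above.

```go
package main

import (
	"fmt"
	"sync"
)

// keyedQueue is a hypothetical stand-in for the shared worker pool.
type keyedQueue struct {
	mu      sync.Mutex
	pending map[string]func()
	order   []string
}

func newKeyedQueue() *keyedQueue {
	return &keyedQueue{pending: make(map[string]func())}
}

// SubmitWithKey queues fn under key. If the key is already queued, the call
// is a no-op and returns false, even if fn is a different function.
func (q *keyedQueue) SubmitWithKey(key string, fn func()) bool {
	q.mu.Lock()
	defer q.mu.Unlock()
	if _, queued := q.pending[key]; queued {
		return false
	}
	q.pending[key] = fn
	q.order = append(q.order, key)
	return true
}

// Drain runs every queued job in submission order and empties the queue.
func (q *keyedQueue) Drain() {
	q.mu.Lock()
	jobs := make([]func(), 0, len(q.order))
	for _, key := range q.order {
		jobs = append(jobs, q.pending[key])
	}
	q.pending = make(map[string]func())
	q.order = nil
	q.mu.Unlock()

	for _, job := range jobs {
		job()
	}
}

// evaluateWithLocalKeys submits one evaluation per controller, all keyed by
// the same local ID, before the queue gets a chance to run anything. It
// returns the controllers whose evaluation actually ran.
func evaluateWithLocalKeys(controllers []string, localID string) []string {
	q := newKeyedQueue()
	var evaluated []string
	for _, c := range controllers {
		c := c
		q.SubmitWithKey(localID, func() { evaluated = append(evaluated, c) })
	}
	q.Drain()
	return evaluated
}

func main() {
	got := evaluateWithLocalKeys(
		[]string{"module.file.a", "module.file.b"},
		"testcomponents.passthrough.pt",
	)
	fmt.Println(got) // only the first controller's evaluation ran
}
```

If the queue had drained between the two submissions, both evaluations would have run, which matches the timing-dependent behaviour described above.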

Without the fix, you can avoid the problem by having a different label on every component. It also seems that you need a different label on the export blocks. We don't know yet why this is the case for the exports block, this part should be investigated as it might reveal another bug.

Steps to reproduce

To trigger the bug, modify the test in the branch above and put the same label on both components (for example, put the label "pt" on both "testcomponents.passthrough" components). If you run the test several times without the hacky fix, you will see that it sometimes either fails or pauses for about a second before succeeding.

System information

No response

Software version

v0.39.0

Configuration

No response

Logs

Here it paused for about one second:
ts=2024-01-19T17:34:14.091856Z level=info msg="finished node evaluation" controller_id=module.file.default/module.file.default node_id=export.output duration=5.708µs
ts=2024-01-19T17:34:15.231655Z level=info msg="starting complete graph evaluation" controller_id=module.file.default trace_id=5b53b50575f1983e1f94ecbcd863632f
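
For reference, the gap between the two log lines above can be computed from their ts fields. This is a small sketch using only the Go standard library; gapBetween is a hypothetical helper name, and the two timestamps are copied from the log lines.

```go
package main

import (
	"fmt"
	"time"
)

// gapBetween parses two RFC 3339 timestamps and returns second minus first.
func gapBetween(first, second string) (time.Duration, error) {
	a, err := time.Parse(time.RFC3339Nano, first)
	if err != nil {
		return 0, err
	}
	b, err := time.Parse(time.RFC3339Nano, second)
	if err != nil {
		return 0, err
	}
	return b.Sub(a), nil
}

func main() {
	gap, err := gapBetween("2024-01-19T17:34:14.091856Z", "2024-01-19T17:34:15.231655Z")
	if err != nil {
		panic(err)
	}
	fmt.Println(gap) // prints 1.139799s
}
```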
@wildum added the bug and flow labels Jan 19, 2024
@rfratto
Member

rfratto commented Jan 19, 2024

We don't know yet why this is the case for the exports block, this part should be investigated as it might reveal another bug.

I looked more into this and I have an explanation now: the export blocks are also submitted to the worker queue for processing, and they're impacted by the same local key issue: they're submitted to the queue as export.output, and so some changed exports were getting ignored.

That means the hacky fix in the branch is the right general approach: items must be submitted to the worker queue with a globally unique ID, because the worker queue is shared.
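
As a rough illustration of that approach (not the change that was merged), the sketch below keys submissions by a global ID instead of the local node ID. It assumes the global ID is the controller ID joined with the local ID, which is a guess based on the controller_id and node_id fields in the logs; node, globalID, and countAccepted are hypothetical names.

```go
package main

import (
	"fmt"
	"path"
)

// node identifies something to evaluate: the controller that owns it and
// its ID within that controller.
type node struct {
	controllerID string
	nodeID       string
}

// globalID joins the controller ID and the local node ID (assumed format).
func globalID(n node) string {
	return path.Join(n.controllerID, n.nodeID)
}

// countAccepted simulates a queue that ignores a key that is already queued
// and reports how many submissions it accepts.
func countAccepted(nodes []node, keyOf func(node) string) int {
	queued := make(map[string]struct{})
	for _, n := range nodes {
		key := keyOf(n)
		if _, dup := queued[key]; dup {
			continue // duplicate key: the submission is dropped
		}
		queued[key] = struct{}{}
	}
	return len(queued)
}

func main() {
	nodes := []node{
		{"module.file.a", "export.output"},
		{"module.file.b", "export.output"},
	}
	local := countAccepted(nodes, func(n node) string { return n.nodeID })
	global := countAccepted(nodes, globalID)
	fmt.Println("accepted with local keys:", local)   // 1
	fmt.Println("accepted with global keys:", global) // 2
}
```

With local keys the second export.output submission collides with the first; with global keys both are accepted.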

@thampiotr self-assigned this Jan 22, 2024
@github-actions bot added the frozen-due-to-age label Feb 21, 2024
@github-actions bot locked as resolved and limited conversation to collaborators Feb 21, 2024
3 participants