Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Verify all receivers to make sure they follow the correct order of operations #5909

Open
tigrannajaryan opened this issue Oct 25, 2021 · 12 comments
Labels
comp: receiver Receiver never stale Issues marked with this label will be never staled and automatically removed

Comments

@tigrannajaryan
Copy link
Member

tigrannajaryan commented Oct 25, 2021

See open-telemetry/opentelemetry-collector#4262:

// The receivers that receive data via a protocol that support acknowledgements
// MUST follow this order of operations:
// - Receive data from some sender (typically from a network).
// - Push received data to the pipeline by calling nextConsumer.Consume*() function.
// - Acknowledge successful data receipt to the sender if Consume*() succeeded or
//   return a failure to the sender if Consume*() returned an error.
// This ensures there are strong delivery guarantees once the data is acknowledged
// by the Collector.

Ideally we want this to be verified by automated tests.

UPDATE: In the spirit of the same strong delivery guarantees, the receivers that use checkpointing (e.g. filelog receiver) must save the checkpoint AFTER they push the data to the pipeline (after Consume*() call returns).

@github-actions
Copy link
Contributor

github-actions bot commented Nov 7, 2022

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

@github-actions github-actions bot added the Stale label Nov 7, 2022
povilasv referenced this issue in coralogix/opentelemetry-collector-contrib Dec 19, 2022
Signed-off-by: Bogdan <bogdandrutu@gmail.com>

Signed-off-by: Bogdan <bogdandrutu@gmail.com>
@github-actions
Copy link
Contributor

github-actions bot commented Apr 3, 2023

This issue has been closed as inactive because it has been stale for 120 days with no activity.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Apr 3, 2023
@tigrannajaryan tigrannajaryan reopened this Apr 3, 2023
@tigrannajaryan
Copy link
Member Author

@open-telemetry/collector-maintainers is there a way to prevent certain issues from being closed as inactive? The fact that this was inactive for 120 days does not make it less true :-)

@atoulme atoulme added the never stale Issues marked with this label will be never staled and automatically removed label Apr 3, 2023
@atoulme
Copy link
Contributor

atoulme commented Apr 3, 2023

You can add the "never stale" tag to the issue. I have done this now.

@atoulme
Copy link
Contributor

atoulme commented Apr 17, 2023

Is this a manual audit? We can add a test as part of the component test suite but not sure how to test receivers that actively scrape endpoints.

@tigrannajaryan
Copy link
Member Author

Automated tests are of course much more preferable. For periodic scrapers this is probably less important (if datapoints is lost typically nothing catastrophic happens since they are usually cumulative). I would focus on the network receivers.

@tigrannajaryan tigrannajaryan changed the title Audit all receivers to make sure the follow the correct order operations Verify all receivers to make sure the follow the correct order operations Apr 18, 2023
@tigrannajaryan
Copy link
Member Author

Updated the description to also explain expectations from non-network receivers that use checkpointing.

@atoulme
Copy link
Contributor

atoulme commented Apr 18, 2023

OK so if I was to be methodical about it, I would do this:

  • add metadata entry that identifies which receivers do checkpointing (probably a warning entry)
  • open a master issue for all those and for each, run an issue that manually audits the behavior
  • open a separate issue down the line to investigate automated testing going forward.

@dmitryax
Copy link
Member

@atoulme looks like a good plan, thanks! Couple Qs:

add metadata entry that identifies which receivers do checkpointing (probably a warning entry)

Not sure I understand why a warning is needed here. Checkpointing is needed for any receiver doing continuous reading from an external source, e.g., a file. So it's a preferred behavior, not a warning.

open a master issue for all those and for each, run an issue that manually audits the behavior

I believe this issue can serve that purpose. We can create subtasks here for each receiver.

@atoulme
Copy link
Contributor

atoulme commented Apr 18, 2023

I agree on the warning thing. The status metadata currently can contain warnings, or at least that's what we're working towards. Warnings are over broad in their definition right now - you can think of them as "features", or "particularities'. One such warning being discussed is a component being "stateful" (with quotes because being stateful is itself a discussion). So if you prefer a more precise term, we can take that up and discuss. I just want to point out that we have a way to identify components that use checkpointing as part of their metadata.

@dmitryax
Copy link
Member

I just want to point out that we have a way to identify components that use checkpointing as part of their metadata.

Makes sense. We can probably introduce more categorization to the receivers, e.g.:

  • push model network receivers (return errors in case of failures)
  • periodic scraping of external sources (drops are allowed)
  • continuous reading of external sources (with checkpointing)

@atoulme
Copy link
Contributor

atoulme commented Apr 18, 2023

Peeling off #21049 as a separate issue to address this, lgtm!

@tigrannajaryan tigrannajaryan changed the title Verify all receivers to make sure the follow the correct order operations Verify all receivers to make sure they follow the correct order operations Apr 18, 2023
@tigrannajaryan tigrannajaryan changed the title Verify all receivers to make sure they follow the correct order operations Verify all receivers to make sure they follow the correct order of operations Apr 18, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
comp: receiver Receiver never stale Issues marked with this label will be never staled and automatically removed
Projects
None yet
Development

No branches or pull requests

4 participants