Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[otel-arrow/receiver] Receiver concurrency fixes; readability improvements & restructuring #205

Merged
merged 20 commits into from
May 31, 2024

Conversation

jmacd
Copy link
Contributor

@jmacd jmacd commented May 28, 2024

Restructure receiver code to improve readability. There are a number of metrics that are incremented when a batch starts being processed and are decremented when the batch is finished, but the control flow that maintained the balance of these updates was convoluted.

The root-cause of #204 is that Arrow batches meant for a consumer to be processed in order were processed out-of-order. There was a large function body which served two purposes: consume Arrow data of the appropriate kind, enter data for the pipeline to consume next. This had to be split into two parts and should have been done as part of #181. (I, as reviewer, missed this and find, in hindsight, that the code is not easy to follow.)

This improves the code structure by moving all stateful aspects of starting/finishing a request into a new inFlightData object which has a deferrable method to finish the request. Here, we keep:

  1. The inFlightWG done count
  2. The active requests metric
  3. The active items metric
  4. The active bytes metric
  5. The bytes-acquired from the semaphore
  6. A per-request span covering Arrow decode
  7. Netstat-related instrumentation

Authorization now happens before acquiring from the semaphore.

A number of fmt.Errorf() calls are replaced with status.Errorf(...) and a specific error code. The tests are updated to be more specific. Several Arrow tests were accidentally canceling the test before an expected error condition was actually tested, they have been audited and improved.

One new concurrent-receiver test was added.

Fixes #204.

@jmacd jmacd changed the title Receiver concurrency fixes [otel-arrow/receiver] Receiver concurrency fixes; readability improvements & restructuring May 29, 2024
@jmacd jmacd requested a review from kristinapathak May 29, 2024 00:20
@jmacd
Copy link
Contributor Author

jmacd commented May 29, 2024

@kristinapathak Proposing to add you as a reviewer in this repository. This change is a substantial rewrite but not a substantial change of behavior, so it's worth a careful review and should help you learn the receiver code very well. Thank you! 😁 🚀

@jmacd jmacd marked this pull request as ready for review May 29, 2024 00:23
Copy link

@kristinapathak kristinapathak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few questions. The arrow.go file is long - might be worth splitting up in a separate PR. 🙂

Copy link
Contributor

@lquerel lquerel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally, it seems much clearer to me, but I have a small doubt about the test (see my comment regarding the last unit test). It's probably because I haven't understood everything. With a few additional explanations, I would certainly be able to approve this commit.

@jmacd
Copy link
Contributor Author

jmacd commented May 30, 2024

Reviewers, please take another look.

@jmacd
Copy link
Contributor Author

jmacd commented May 31, 2024

For the test flake above, #206

Copy link
Contributor

@lquerel lquerel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jmacd jmacd merged commit c5c9d7e into open-telemetry:main May 31, 2024
2 checks passed
@jmacd jmacd deleted the jmacd/deconcur branch May 31, 2024 20:38
codeboten pushed a commit to open-telemetry/opentelemetry-collector-contrib that referenced this pull request Jun 11, 2024
[![Mend
Renovate](https://app.renovatebot.com/images/banner.svg)](https://renovatebot.com)

This PR contains the following updates:

| Package | Change | Age | Adoption | Passing | Confidence |
|---|---|---|---|---|---|
|
[github.com/open-telemetry/otel-arrow](https://togithub.com/open-telemetry/otel-arrow)
| `v0.23.0` -> `v0.24.0` |
[![age](https://developer.mend.io/api/mc/badges/age/go/github.com%2fopen-telemetry%2fotel-arrow/v0.24.0?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|
[![adoption](https://developer.mend.io/api/mc/badges/adoption/go/github.com%2fopen-telemetry%2fotel-arrow/v0.24.0?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|
[![passing](https://developer.mend.io/api/mc/badges/compatibility/go/github.com%2fopen-telemetry%2fotel-arrow/v0.23.0/v0.24.0?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|
[![confidence](https://developer.mend.io/api/mc/badges/confidence/go/github.com%2fopen-telemetry%2fotel-arrow/v0.23.0/v0.24.0?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|

---

> [!WARNING]
> Some dependencies could not be looked up. Check the Dependency
Dashboard for more information.

---

### Release Notes

<details>
<summary>open-telemetry/otel-arrow
(github.com/open-telemetry/otel-arrow)</summary>

###
[`v0.24.0`](https://togithub.com/open-telemetry/otel-arrow/releases/tag/v0.24.0)

[Compare
Source](https://togithub.com/open-telemetry/otel-arrow/compare/v0.23.0...v0.24.0)

Jitter is applied to once per process, not once per stream.
[open-telemetry/otel-arrow#199
Network statistics tracing instrumentation simplified.
[open-telemetry/otel-arrow#201
Protocol includes use of more gRPC codes.
[open-telemetry/otel-arrow#202
Receiver concurrency bugfix.
[open-telemetry/otel-arrow#205
Concurrent batch processor size==0 bugfix.
[open-telemetry/otel-arrow#208
New integration testing.
[open-telemetry/otel-arrow#210
Use gRPC Status codes in the Arrow exporter.
[open-telemetry/otel-arrow#211
Fix stream-shutdown race in Arrow receiver.
[open-telemetry/otel-arrow#212
Avoid work for already-canceled requests.
[open-telemetry/otel-arrow#213
Call IPCReader.Err() after reader loop.
[open-telemetry/otel-arrow#215
Update to Arrow-Go v16.1.0.
[open-telemetry/otel-arrow#218
Update to OpenTelemetry Collector v0.102.x.
[open-telemetry/otel-arrow#219

</details>

---

### Configuration

📅 **Schedule**: Branch creation - "on tuesday" (UTC), Automerge - At any
time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you
are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the
rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update
again.

---

- [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check
this box

---

This PR has been generated by [Mend
Renovate](https://www.mend.io/free-developer-tools/renovate/). View
repository job log
[here](https://developer.mend.io/github/open-telemetry/opentelemetry-collector-contrib).

<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzNy4zOTMuMCIsInVwZGF0ZWRJblZlciI6IjM3LjM5My4wIiwidGFyZ2V0QnJhbmNoIjoibWFpbiIsImxhYmVscyI6WyJkZXBlbmRlbmNpZXMiLCJyZW5vdmF0ZWJvdCJdfQ==-->

---------

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: opentelemetrybot <107717825+opentelemetrybot@users.noreply.github.com>
Co-authored-by: Yang Song <songy23@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Receiver crash report: concurrency bug
3 participants