
[processor/interval] Implement the main logic #32054

Merged: 32 commits into open-telemetry:main from the processor/interval branch, Apr 30, 2024

Conversation

@RichieSams (Contributor) commented Apr 1, 2024

Description:

This PR implements the main logic of this new processor.

Link to tracking Issue:
#29461
This is a re-opening of #30827.

Testing:
I added a test for the main aggregation / export behavior, but I need to add more to test state expiry

Documentation:

I updated the README with the updated configs, but I should add some examples as well.


This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Apr 16, 2024
@RichieSams RichieSams force-pushed the processor/interval branch 2 times, most recently from dd599e7 to a902abc, on April 17, 2024 21:07
@github-actions github-actions bot removed the Stale label Apr 18, 2024
@RichieSams RichieSams marked this pull request as ready for review April 18, 2024 12:16
@RichieSams (Contributor Author)

@sh0rez @jpkrohling This PR is ready for review now.

Can you help me understand why the tests are failing? They say that go.sum is missing a reference to github.com/open-telemetry/opentelemetry-collector-contrib/internal/exp/metrics.

But in the go.mod I have:

replace github.com/open-telemetry/opentelemetry-collector-contrib/internal/exp/metrics => ../../internal/exp/metrics

which matches all the other processors. It compiles fine locally, so I'm a bit stumped...
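For reference, the usual wiring for these intra-repo modules is a require plus a local replace, with go.sum kept in sync by go mod tidy. The snippet below is only an illustration of that shape (the pseudo-version is a placeholder), not the exact contents of this PR's go.mod:

require github.com/open-telemetry/opentelemetry-collector-contrib/internal/exp/metrics v0.0.0-00010101000000-000000000000

replace github.com/open-telemetry/opentelemetry-collector-contrib/internal/exp/metrics => ../../internal/exp/metrics

Running go mod tidy inside the module directory is what regenerates go.sum to match these directives.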

@RichieSams (Contributor Author) commented Apr 18, 2024

I just discovered golden.ReadMetrics(filePath string) in another component, which could really simplify my test code. I'll take a look at using it.
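For anyone curious what that simplification might look like, here is a minimal sketch only. It assumes the usual pkg/golden and pkg/pdatatest/pmetrictest helpers (golden.ReadMetrics returning (pmetric.Metrics, error), pmetrictest.CompareMetrics for the diff) and a hypothetical runProcessor helper that feeds input through the processor:

import (
	"path/filepath"
	"testing"

	"github.com/stretchr/testify/require"

	"github.com/open-telemetry/opentelemetry-collector-contrib/pkg/golden"
	"github.com/open-telemetry/opentelemetry-collector-contrib/pkg/pdatatest/pmetrictest"
)

func TestAggregation(t *testing.T) {
	// read the expected output from a golden YAML file instead of
	// building pmetric.Metrics by hand in the test body
	expected, err := golden.ReadMetrics(filepath.Join("testdata", "output.yaml"))
	require.NoError(t, err)

	actual := runProcessor(t) // hypothetical helper

	require.NoError(t, pmetrictest.CompareMetrics(expected, actual,
		pmetrictest.IgnoreTimestamp(),
	))
}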

@RichieSams (Contributor Author)

@sh0rez @djaglowski @jpkrohling I have updated the tests, and this is ready for review again. (Though I'm still stumped as to why the CI tests are failing...)

@sh0rez (Contributor) left a comment


It appears to me this processor can be simplified quite a lot (and be more efficient) by having aggregate create what export needs. Some pseudocode of the idea:

  • Keep the last datapoint per stream in memory, replacing it when an incoming datapoint has a more recent .Timestamp().
  • Store the datapoints in a pmetric.Metrics, populating resource, scope, and metric entries as needed from the incoming sample, with maps for lookups (they hold only the pointer types, no actual data).
  • Export then becomes trivial: return the already-built pmetric.Metrics and clear the state.
type State struct {
	mu  sync.Mutex
	raw pmetric.Metrics

	rms map[identity.Resource]pmetric.ResourceMetrics
	sms map[identity.Scope]pmetric.ScopeMetrics
	ms  map[identity.Metric]pmetric.Metric

	nums map[identity.Stream]pmetric.NumberDataPoint
}

func (s *State) Update(res Resource, scope Scope, m Metric) {
	s.mu.Lock()
	defer s.mu.Unlock()

	for _, dp := range m.DataPoints { // pseudo: iterate the metric's datapoints
		id := identity.OfStream(dp)
		aggr := s.Number(id, res, scope, m)
		// keep only the most recent datapoint per stream
		if aggr.Timestamp() < dp.Timestamp() {
			dp.CopyTo(aggr)
		}
	}
}

func (s *State) Number(id identity.Stream, res Resource, scope Scope, m Metric) pmetric.NumberDataPoint {
	// returns or creates the pmetric.NumberDataPoint in s.nums for the given stream.
	// populates the required rms, sms and ms entries in raw as needed
}

func (s *State) Export() pmetric.Metrics {
	s.mu.Lock()
	defer s.mu.Unlock()

	// hand back the already-built batch and start a fresh one
	out := s.raw
	s.raw = pmetric.NewMetrics()
	clear(s.rms)
	clear(s.sms)
	clear(s.ms)
	clear(s.nums)

	return out
}

I think this would minimize memory usage and require a lot less data conversion.

Does this make sense, or did I miss something important?
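For illustration, here is one way the lookup-or-create step inside Number could go. This is a minimal sketch only: it assumes the identity package exposes OfResource / OfScope / OfMetric constructors of roughly this shape, uses the concrete pdata types for the res / scope / m parameters, and hard-codes a Sum metric; it is not the actual code in the PR.

func (s *State) Number(id identity.Stream, res pcommon.Resource, scope pcommon.InstrumentationScope, m pmetric.Metric) pmetric.NumberDataPoint {
	if dp, ok := s.nums[id]; ok {
		return dp
	}

	// lazily create the resource / scope / metric containers inside s.raw,
	// so Export can hand back s.raw without any further conversion
	rid := identity.OfResource(res) // assumed constructor
	rm, ok := s.rms[rid]
	if !ok {
		rm = s.raw.ResourceMetrics().AppendEmpty()
		res.CopyTo(rm.Resource())
		s.rms[rid] = rm
	}

	sid := identity.OfScope(rid, scope) // assumed constructor
	sm, ok := s.sms[sid]
	if !ok {
		sm = rm.ScopeMetrics().AppendEmpty()
		scope.CopyTo(sm.Scope())
		s.sms[sid] = sm
	}

	mid := identity.OfMetric(sid, m) // assumed constructor
	mm, ok := s.ms[mid]
	if !ok {
		mm = sm.Metrics().AppendEmpty()
		mm.SetName(m.Name())
		mm.SetUnit(m.Unit())
		mm.SetEmptySum() // sketch assumes a Sum; real code must handle other types
		s.ms[mid] = mm
	}

	dp := mm.Sum().DataPoints().AppendEmpty()
	s.nums[id] = dp
	return dp
}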

Resolved (outdated) review threads:
  • internal/exp/metrics/streams/streams.go (2)
  • processor/intervalprocessor/README.md (2)
  • processor/intervalprocessor/config.go (1)
  • processor/intervalprocessor/internal/metrics/metrics.go (1)
  • processor/intervalprocessor/processor.go (4)
@RichieSams (Contributor Author)

It appears to me this processor can be simplified quite a lot (and be more efficient) by having aggregate create what export needs

I like your idea! It would also solve the issue that NewMetric() should only be used in testing. I'll take a swing at it today

@RichieSams (Contributor Author)

@sh0rez Updated. Ready for re-review as you're able :)

@RichieSams RichieSams force-pushed the processor/interval branch 4 times, most recently from 6de69ee to 920a778, on April 23, 2024 19:47
@RichieSams RichieSams requested a review from sh0rez April 24, 2024 12:36
@sh0rez (Contributor) commented Apr 24, 2024

@RichieSams any specific reason to force-push this PR? It makes reviewing really hard, as recent changes can't be easily identified.

processor/intervalprocessor/processor.go (resolved, outdated thread)
"go.opentelemetry.io/collector/pdata/pmetric"
)

type DataPointSlice[DP DataPoint[DP]] interface {
Contributor

These types are only used locally, once in aggregateDataPoints; they don't need an extra package, right?

Contributor Author

Yes, that package has gotten much, much smaller since the original design. That said, I'm working on a PR for another component and need the same interfaces there, so I'm considering putting them in internal/exp/metrics/streams. See:

RichieSams@d16a215#diff-45d5f87d8148e341ebb68b4125ebe4296ccfbb1a9b63a45009358e7a0b617d26

Thoughts?
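For readers following along, the generic constraints under discussion are roughly of this shape. The method sets below are illustrative (pcommon is go.opentelemetry.io/collector/pdata/pcommon), not the exact interfaces in the PR:

type DataPointSlice[DP DataPoint[DP]] interface {
	// mirrors the shape of pmetric's concrete slice types
	Len() int
	At(i int) DP
	AppendEmpty() DP
}

type DataPoint[Self any] interface {
	Timestamp() pcommon.Timestamp
	CopyTo(dest Self)
}

The idea is that pmetric.NumberDataPointSlice, pmetric.HistogramDataPointSlice, and friends already satisfy this shape with their concrete datapoint types, which is what lets the aggregation loop be written once.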

}

func (p *Processor) exportMetrics() {
md := func() pmetric.Metrics {
Contributor

why the closure?

Contributor Author

So that the lock only lasts for the fetch, not for the duration of ConsumeMetrics, since defer runs on function exit, not scope exit.
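To make the pattern concrete, a minimal sketch; the field names (stateLock, state, nextConsumer, ctx, logger) are placeholders, not the PR's actual fields:

func (p *Processor) exportMetrics() {
	// the anonymous function scopes the deferred Unlock: it runs when the
	// closure returns, so the state lock is already released before the
	// (potentially slow) downstream ConsumeMetrics call below
	md := func() pmetric.Metrics {
		p.stateLock.Lock()
		defer p.stateLock.Unlock()
		return p.state.Export()
	}()

	if err := p.nextConsumer.ConsumeMetrics(p.ctx, md); err != nil {
		p.logger.Error("failed to export metrics", zap.Error(err))
	}
}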

@RichieSams (Contributor Author) commented Apr 24, 2024

@sh0rez

any specific reason to force-push this PR? makes reviewing really hard, as recent changes can't be easily identified

Heh, a combination of habits. At work we're not allowed to have merge commits, so I generally rebase, squash my commits locally, and force-push. GitHub has little buttons to show the diff between force-pushes, but it's not ideal, since it includes all the changes from main as well.

Since this project squashes on merge anyway, I'll use a merge commit next time to catch up to HEAD; then you can see my changes as individual commits.

@RichieSams (Contributor Author)

@djaglowski @evan-bradley @TylerHelmuth Any further questions or concerns?

@RichieSams (Contributor Author)

Can someone re-kick the coverage task? It's failing due to rate limiting:

info - 2024-04-29 18:27:29,491 -- The PR is happening in a forked repo. Using tokenless upload.
info - 2024-04-29 18:27:29,671 -- Process Commit creating complete
debug - 2024-04-29 18:27:29,671 -- Commit creating result --- {"result": "RequestResult(error=RequestError(code='HTTP Error 429', params={}, description='{\"detail\":\"Tokenless has reached GitHub rate limit. Please upload using a token: https://docs.codecov.com/docs/adding-the-codecov-token. Expected available in 418 seconds.\"}'), warnings=[], status_code=429, text='{\"detail\":\"Tokenless has reached GitHub rate limit. Please upload using a token: https://docs.codecov.com/docs/adding-the-codecov-token. Expected available in 418 seconds.\"}')"}
error - 2024-04-29 18:27:29,672 -- Commit creating failed: {"detail":"Tokenless has reached GitHub rate limit. Please upload using a token: https://docs.codecov.com/docs/adding-the-codecov-token. Expected available in 418 seconds."}
Error: Codecov: Failed to properly create commit: The process '/home/runner/work/_actions/Wandalen/codecov-action/dist/codecov' failed with exit code 1
Error: Process returned exit code 1
Launched as "/home/runner/runners/2.316.0/externals/node20/bin/node /home/runner/work/_actions/Wandalen/wretry.action/v3.4.0_js_action/src/Runner.mjs /home/runner/work/_actions/Wandalen/codecov-action/dist/index.js"
Launched at "/home/runner/work/opentelemetry-collector-contrib/opentelemetry-collector-contrib"
Attempts exhausted, made 10 attempts :
Attempt #1 started at : 18:24:56 GMT+0000 (Coordinated Universal Time)
Attempt #2 started at : 18:25:13 GMT+0000 (Coordinated Universal Time)
Attempt #3 started at : 18:25:30 GMT+0000 (Coordinated Universal Time)
Attempt #4 started at : 18:25:47 GMT+0000 (Coordinated Universal Time)
Attempt #5 started at : 18:26:04 GMT+0000 (Coordinated Universal Time)
Attempt #6 started at : 18:26:20 GMT+0000 (Coordinated Universal Time)
Attempt #7 started at : 18:26:37 GMT+0000 (Coordinated Universal Time)
Attempt #8 started at : 18:26:54 GMT+0000 (Coordinated Universal Time)
Attempt #9 started at : 18:27:11 GMT+0000 (Coordinated Universal Time)
Attempt #10 started at : 18:27:28 GMT+0000 (Coordinated Universal Time)

@RichieSams (Contributor Author)

Is this ok to merge if I update to main once more?

@djaglowski djaglowski merged commit 4f96a92 into open-telemetry:main Apr 30, 2024
170 checks passed
@github-actions github-actions bot added this to the next release milestone Apr 30, 2024
@RichieSams RichieSams deleted the processor/interval branch April 30, 2024 14:53
rimitchell pushed a commit to rimitchell/opentelemetry-collector-contrib that referenced this pull request May 8, 2024