Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[pkg/stanza/operator/recombine] aggregate the latter part of the split-log due to triggering the size limit #21241

Closed
yutingcaicyt opened this issue Apr 28, 2023 · 3 comments
Assignees
Labels
enhancement New feature or request pkg/stanza

Comments

@yutingcaicyt
Copy link
Contributor

Component(s)

pkg/stanza

Is your feature request related to a problem? Please describe.

At present, when using the recombine operator to combine the log entries with 'is_first_entry', if a batch triggers the size limit, it will aggregate the entries in the batch and flush it. However, when subsequent entries arrive, they will be sent one by one. For example, if we set the "max_batch_size" as 50, a complete log that consists of 100 entries comes into the recombine operator will be split like 50 + 1 + 1...+ 1, but I hope it will be 50 + 50.

Describe the solution you'd like

The reason for this phenomenon is this code:
`
// When matching on first entry, never batch partial first. Just emit immediately

case !matches && r.matchIndicatesFirst() && r.batchMap[s] == nil:
	r.addToBatch(ctx, e, s)
	return r.flushSource(s)

`
One way to solve this problem is that we can control the 'flushSource' and not delete the 'sourceBatch' from the batchMap when the size limit is triggered.

Describe alternatives you've considered

`
// When matching on first entry, never batch partial first. Just emit immediately

case !matches && r.matchIndicatesFirst() && r.batchMap[s] == nil:
	r.addToBatch(ctx, e, s)
	return r.flushSource(s)

`
Another way is deleting these codes directly. I think the code may be not necessary if only the 'is_first_entry' is correctly configured. But this requires more thinking because it will change the original logic.

Additional context

If possible I would like to work on this feature.

@yutingcaicyt yutingcaicyt added enhancement New feature or request needs triage New item requiring triage labels Apr 28, 2023
@github-actions
Copy link
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@atoulme atoulme removed the needs triage New item requiring triage label Apr 29, 2023
@atoulme
Copy link
Contributor

atoulme commented Apr 29, 2023

Happy to assign to you to look into it. Would you please create a performance test to see how this affects the performance?

@yutingcaicyt
Copy link
Contributor Author

Happy to assign to you to look into it. Would you please create a performance test to see how this affects the performance?

OK, I will test it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request pkg/stanza
Projects
None yet
Development

No branches or pull requests

3 participants