fix: use proper key formatting as per stanza persister #3879

Merged
merged 2 commits into from
Dec 12, 2023

Conversation

VihasMakwana
Contributor

@VihasMakwana VihasMakwana commented Nov 2, 2023

Description: Migrating offsets from SCK to SCK-Otel doesn't work because we use incorrect keys to populate the boltdb cache.
The proper keys are
file_input.knownFiles, journald_input.lastReadCursor
but we prepend $. to them, so the migrated offsets are not picked up and logs are ingested again (duplicated).
Refer to the scoped persister for the current key format.

Note: I'm currently in the process of testing this on my Mac. I'd appreciate your review in the meantime.
Testing: Manually tested it on SCK and SCK-OTeL with a locally built image
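
To illustrate why the prefix matters, here is a minimal sketch, assuming the scoped persister builds its lookup key by joining the operator scope and the key name with a dot. The helper names scopedKey and lookup are hypothetical, and an in-memory map stands in for boltdb; this is not the stanza or migrator code.

```go
package main

import "fmt"

// scopedKey is a hypothetical helper that mirrors the idea of a scoped
// persister: the operator scope and the key name are joined with a dot.
func scopedKey(scope, name string) string {
	return scope + "." + name
}

// lookup reports whether a checkpoint exists under the key the reader expects.
// The map is an assumed stand-in for the boltdb bucket.
func lookup(store map[string][]byte, scope, name string) bool {
	_, ok := store[scopedKey(scope, name)]
	return ok
}

func main() {
	// Migrated offsets written under the old, "$."-prefixed key.
	buggy := map[string][]byte{"$.file_input.knownFiles": []byte("offsets")}
	// Migrated offsets written under the key format this PR switches to.
	fixed := map[string][]byte{"file_input.knownFiles": []byte("offsets")}

	fmt.Println(lookup(buggy, "file_input", "knownFiles")) // false
	fmt.Println(lookup(fixed, "file_input", "knownFiles")) // true
}
```

When the lookup misses, the reader finds no stored offsets and starts over, which would match the duplicated events described in this PR.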

@VihasMakwana VihasMakwana requested review from a team as code owners November 2, 2023 21:37
Contributor

github-actions bot commented Nov 2, 2023

CLA Assistant Lite bot All contributors have signed the CLA ✍️ ✅

@VihasMakwana
Contributor Author

I have read the CLA Document and I hereby sign the CLA

srv-gh-o11y-gdi-cla added a commit to splunk/cla-agreement that referenced this pull request Nov 2, 2023
@VihasMakwana
Contributor Author

VihasMakwana commented Nov 3, 2023

Behavior after the fix. You can pull registry.hub.docker.com/vihasdocker/splunk to test it out.
In the image below, SCK ingested up to Hello 46.
I then continued with SCK-OTeL (with registry.hub.docker.com/vihasdocker/splunk) and it resumed from Hello 47. No duplicates.

Screenshot 2023-11-03 at 5 22 48 PM

@VihasMakwana
Contributor Author

The following is the current behavior (buggy migration). As you can see, each event is read twice, even though fluentd left off at Hello 46. This is because of the incorrect key we use in boltdb.

Screenshot 2023-11-03 at 5 27 52 PM

Offset int64
Fingerprint *Fingerprint
Offset int64
FileAttributes map[string]any
Contributor

What is FileAttributes doing?

Contributor Author

We're converting fluentd's format to the following struct, as per OTeL's format. We get Offset and Fingerprint by reading fluentd's pos_file, but the FileAttributes field is left nil.
This causes a panic when the filelogreceiver tries to access this field while decoding it (nil reference). Setting it to an empty map resolves this.

Screenshot 2023-11-04 at 1 28 06 AM
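
The thread does not show the exact code path in the receiver that fails, so the following is only a sketch of the general Go behavior that may be involved: reading from a nil map is safe, but writing to one panics. The checkpoint type and the setAttr helper are hypothetical; only the field names Offset and FileAttributes come from the diff.

```go
package main

import "fmt"

// checkpoint is a hypothetical stand-in for the migrated reader metadata;
// only the field names come from the diff in this PR.
type checkpoint struct {
	Offset         int64
	FileAttributes map[string]any
}

// setAttr writes an attribute and reports whether the write panicked.
func setAttr(c *checkpoint, key string, value any) (panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	// Assigning into a nil map panics; assigning into an empty map does not.
	c.FileAttributes[key] = value
	return false
}

func main() {
	nilMap := &checkpoint{Offset: 128}
	emptyMap := &checkpoint{Offset: 128, FileAttributes: map[string]any{}}

	fmt.Println(setAttr(nilMap, "log.file.name", "a.log"))   // true
	fmt.Println(setAttr(emptyMap, "log.file.name", "a.log")) // false
}
```

Initializing the field to an empty map before persisting the struct, as described above, avoids the first case.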

Contributor

Makes sense, thanks!

@@ -110,7 +110,7 @@ func (m *Migrator) MigrateCustomPos(matches []string) {
 	if err != nil {
 		log.Printf("error creating a new DB client for host file checkpoints: %v", err)
 	}
-	err = client.Set("$.file_input.knownFiles", buf.Bytes())
+	err = client.Set("file_input.knownFiles", buf.Bytes())
Contributor

Have we always had this issue? Is there a breaking change in stanza that occurred?

Contributor Author

Well, not sure tbh. I need to dig through the commit history.

Contributor

I am pretty sure we have functional tests on the chart that test this. If not, we must create some or we will have more surprises.

Contributor Author

We don't have any tests for migration from SCK. I'll create a PR in the chart repo covering this part.
If I understand correctly, we're moving to Golang-based functional tests, right?

Contributor Author

@atoulme here's a PR to add functional test cases covering this. signalfx/splunk-otel-collector-chart#1024
PTAL! Thanks

@atoulme
Contributor

atoulme commented Nov 29, 2023

Please fix the CI and we can look at getting this in.

@VihasMakwana
Contributor Author

VihasMakwana commented Nov 30, 2023

Please fix the CI and we can look at getting this in.

Done.

@atoulme
Contributor

atoulme commented Dec 7, 2023

Please rebase to get the fix on hadoop tests.

@atoulme
Contributor

atoulme commented Dec 8, 2023

I rebased for you.

@atoulme atoulme merged commit 3e563ae into signalfx:main Dec 12, 2023
44 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Dec 12, 2023