-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[pkg/stanza] Fileconsumer TestFlushPeriodEOF test failing on Windows #32715
Comments
Pinging code owners:
%s See Adding Labels via Comments if you do not have permissions to add labels yourself. |
Could this be related to #32100? @OverOrion @ChrsMark |
I think it is highly likely to be related. I'll try to have a look at it this week, not sure if I can before SIG though. EDIT: maybe it has something to do with using \r\n instead of \n? |
That's likely :) (windows 🙈). |
So for some reason on Windows the second token is not collected if we don't append a new line ( It seems to be fixed at #32734. |
See #32715 This also adds a bit more debugging info for other tests which fail on the same expectation, since it's not very obvious what was expected vs actually found.
…telemetry#32946) See open-telemetry#32715 This also adds a bit more debugging info for other tests which fail on the same expectation, since it's not very obvious what was expected vs actually found.
When I checked the related code there wasn't anything that stood out Maybe in the future it would be worth investigating all the skipped tests on Windows, hopefully they could be (re)enabled. I think this issue could be closed since #32946 got merged @crobert-1. |
I still think that we might have an issue on windows 🤔 . To my mind the test clearly proves that on windows we don't manage the "hanging" situation properly (see #32734 (comment)), right? |
Yes we should leave this open until the issue is fixed on windows. Skipping is just a hack to not get in everyone else's way. |
I took a look at this and it seems a frequent issue when running tests on Windows: timing and synchronization. From my debugging the test code is relying on the 5ns flush period to ensure that the read of the second token, the You can see the timing differences with the following code: package main
import (
"fmt"
"runtime"
"time"
)
func main() {
// Count how many times the loop runs until 5 nanoseconds have passed.
start := time.Now()
count := 0
for {
count++
if time.Since(start) > 5*time.Nanosecond {
break
}
}
end := time.Now()
fmt.Println("GOOS:", runtime.GOOS)
fmt.Println("Count:", count)
fmt.Println("Duration:", end.Sub(start))
}
|
Thanks for looking into and explaining this @pjanotti. The same problem is likely behind many of the flaky tests we've previously disabled on windows. I spent a little time reading up on this and found that the core Golang packages have some strategies for dealing with this in their own tests and benchmarks. Notably, I found this PR which appears to implement a solution, but the latest source has moved away from it for reasons I didn't find. I think it would be quite a benefit to the repo if we could figure out a reasonable solution like this and use it whenever timing is involved in tests. |
Maybe something like https://github.com/golang/go/blob/5881ae742fc7eaa9b7d61b4ba37598c42aaa4753/src/testing/testing_windows.go#L47-L70 could do the trick here? However, is my assumption correct that the time precision issue can hit a real use case as well instead of only the tests? I wonder if instead of if time.Since(s.LastDataChange) > period we could define+use a golang/go#67066 also looks related. |
@ChrsMark, I looked briefly at the most recent |
In any case this issue looks to be a valid one. Shall we remove the |
Component(s)
pkg/stanza
Describe the issue you're reporting
Failing CI/CD links:
https://github.com/open-telemetry/opentelemetry-collector-contrib/actions/runs/8849403200/job/24301322178
https://github.com/open-telemetry/opentelemetry-collector-contrib/actions/runs/8850888073/job/24306184699?pr=32529
Failure output:
This has been failing pretty consistently (but not quite everytime) for the last few days, I haven't been able to determine why it recently started failing more frequently.
The text was updated successfully, but these errors were encountered: