Intermittent test failures in Kafka integration #1735

Closed
tomas-langer opened this issue May 6, 2020 · 4 comments

Labels: bug (Something isn't working), MP, P1

Comments

@tomas-langer
Member

Branch master
KafkaCdiExtensionTest fails on line 468 with a timeout. Since the timeout is configured to 30 seconds, this is most likely a race condition or a bug.
See https://builds.helidon.io/51A9EA2726241E0093DFA5D5139F3E6B/logs/14

@tomas-langer tomas-langer added the bug and MP labels May 6, 2020
@tomas-langer tomas-langer added this to Needs triage in Backlog via automation May 6, 2020
@m0mus m0mus added this to the 2.0.0 milestone May 7, 2020
@m0mus m0mus added the P2 label May 7, 2020
@m0mus m0mus moved this from Needs triage to High priority in Backlog May 7, 2020
@danielkec danielkec self-assigned this May 11, 2020
@danielkec
Contributor

Confirmed, see the attached flaky-kafka-test.log

@jbescos
Member

jbescos commented May 11, 2020

I missed this, I can take it.

@jbescos
Member

jbescos commented May 11, 2020

I am not able to reproduce this locally on my machine after 40 tries.

I don't think there is a race condition: before every test we wait until the KafkaConsumer has its partitions assigned. This is done in the @BeforeEach method, after the comment 'Wait till consumers are ready'. Only after that do we produce new events, so the consumers should be ready to consume them.

It could be a bug, but then I should have been able to reproduce it after so many tries.
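The "wait until ready, then produce" ordering described above can be sketched with a small polling helper. This is only an illustration, not the actual code in KafkaCdiExtensionTest: the class `ReadinessWait`, the method `awaitCondition`, and the `partitionsAssigned` flag are hypothetical names, and the flag merely stands in for "the consumer has partitions assigned".

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

// Hypothetical sketch: not taken from the Helidon test sources.
public class ReadinessWait {

    // Polls the condition every 10 ms until it holds or the timeout elapses.
    // Returns true if the condition held, false on timeout or interruption.
    public static boolean awaitCondition(BooleanSupplier condition, Duration timeout) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (System.nanoTime() - deadline < 0) {
            if (condition.getAsBoolean()) {
                return true;
            }
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt status
                return false;
            }
        }
        // One last check in case the condition became true right at the deadline.
        return condition.getAsBoolean();
    }

    public static void main(String[] args) {
        // Assumption: this flag stands in for "partitions are assigned".
        AtomicBoolean partitionsAssigned = new AtomicBoolean(false);

        // Simulate the assignment arriving asynchronously after about 100 ms.
        CompletableFuture.runAsync(
                () -> partitionsAssigned.set(true),
                CompletableFuture.delayedExecutor(100, TimeUnit.MILLISECONDS));

        // Only produce events once the (simulated) consumer reports ready.
        boolean ready = awaitCondition(partitionsAssigned::get, Duration.ofSeconds(5));
        System.out.println("consumer ready: " + ready);
    }
}
```

If the wait itself times out, the test can fail early with a clear message instead of hitting the 30 second timeout later while waiting for events.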

Other possibility:

In the file Daniel attached, we can see many events after the line '==========> test error()' that belong to the test someEventsNoAckWithOnePartition().

That is because in someEventsNoAckWithOnePartition() some events are not committed, so every test executed after it reads those events again and again, which adds some load.

Correct me if I'm wrong, but the build tool we use now seems to be about twice as slow as the previous one. Maybe that is the reason.
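The theory above can be illustrated with a deliberately simplified in-memory model. It is not the Kafka client API: `Partition`, `poll(boolean)`, and `eventsSeenBySecondTest` are hypothetical names, and each `poll` call models a fresh consumer in the same group that starts from the last committed offset.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a toy model of committed offsets, not real Kafka code.
public class UncommittedOffsetDemo {

    // Stand-in for one partition plus the committed offset of one consumer group.
    static final class Partition {
        private final List<String> log = new ArrayList<>();
        private int committedOffset = 0;

        void produce(String event) {
            log.add(event);
        }

        // Returns every event from the committed offset onward.
        // The offset only advances when the caller asks for a commit.
        List<String> poll(boolean commit) {
            List<String> batch = new ArrayList<>(log.subList(committedOffset, log.size()));
            if (commit) {
                committedOffset = log.size();
            }
            return batch;
        }
    }

    // Returns how many events a second test receives after a first test
    // produced and read 10 events, with or without committing them.
    public static int eventsSeenBySecondTest(boolean firstTestCommits) {
        Partition partition = new Partition();
        for (int i = 0; i < 10; i++) {
            partition.produce("first-" + i);
        }
        partition.poll(firstTestCommits);   // first test reads its events
        partition.produce("second-0");      // second test produces one event
        return partition.poll(true).size(); // second test reads from the committed offset
    }

    public static void main(String[] args) {
        System.out.println("first test commits:   " + eventsSeenBySecondTest(true));
        System.out.println("first test no commit: " + eventsSeenBySecondTest(false));
    }
}
```

In this model the second test sees 1 event when the first test commits and 11 when it does not, which matches the extra events observed in the log after a no-ack test has run.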

For the time being I will decrease the number of events in the test so we can verify if this theory is correct or not.

@danielkec
Contributor

Fixed by #1918

Backlog automation moved this from High priority to Closed Jun 25, 2020
5 participants