Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TestWindow_Distribution is flaky #1586

Closed
Luflosi opened this issue Feb 26, 2022 · 3 comments · Fixed by #1589
Closed

TestWindow_Distribution is flaky #1586

Luflosi opened this issue Feb 26, 2022 · 3 comments · Fixed by #1589
Assignees
Labels
effort/hours Estimated to take one or several hours exp/beginner Can be confidently tackled by newcomers good first issue Good issue for new contributors kind/bug A bug in existing code (including security flaws) P1 High: Likely tackled by core team if no one steps up

Comments

@Luflosi
Copy link

Luflosi commented Feb 26, 2022

Additional information:

  • OS: macOS
  • IPFS Cluster version: 0.14.5
  • Installation method: built from source

Describe the bug:
I tried updating the version of ipfs-cluster in Nixpkgs from 0.14.4 to 0.14.5: NixOS/nixpkgs#160460. A bot built the new version and ran the test suite but the test TestWindow_Distribution failed four times: 1, 2, 3, 4
Sometimes the test TestWindow_Distribution/increasing_latency_distribution failed and sometimes TestWindow_Distribution/random_latency_distribution failed.
Here are the important error messages from all test failures in an arbitrary order:

  • window_test.go:400: want: [4 3 3 2 2 1 1], got: [4 3 3 3 3 1 1]
  • window_test.go:400: want: [4 3 3 2 2 1 1], got: [5 4 3 2 2 1 1]
  • window_test.go:400: want: [18 11 8 7 9 3 1 4], got: [19 12 9 8 9 4 1 5]
  • window_test.go:400: want: [18 11 8 7 9 3 1 4], got: [19 12 9 8 9 4 1 5]
  • window_test.go:400: want: [18 11 8 7 9 3 1 4], got: [18 11 8 7 9 4 1 4]

The test failures only occurred on macOS (both x86-64 and aarch64), while the tests did not fail on Linux, but I don't think this has anything to do with the OS. Maybe the macOS build bots are just slower or more overloaded than the Linux bots.
As you can see above, the values in the got array are often one higher than the values in the want array. I think the tests depend on the precise timing of the scheduler and general execution speed. A simple fix would be to sleep for 100s of milliseconds by multiplying with 100 instead of sleeping for 10s of milliseconds here:
https://github.com/ipfs/ipfs-cluster/blob/9e35b7bc9ba7fa3ad6153531ebd6d6a8c985d7d4/monitor/metrics/window_test.go#L372-L379
Alternatively, allowing the values in the got array to be one higher than the values in the want array would also work.

@Luflosi Luflosi added kind/bug A bug in existing code (including security flaws) need/triage Needs initial labeling and prioritization labels Feb 26, 2022
@welcome
Copy link

welcome bot commented Feb 26, 2022

Thank you for submitting your first issue to this repository! A maintainer will be here shortly to triage and review.
In the meantime, please double-check that you have provided all the necessary information to make this process easy! Any information that can help save additional round trips is useful! We currently aim to give initial feedback within two business days. If this does not happen, feel free to leave a comment.
Please keep an eye on how this issue will be labeled, as labels give an overview of priorities, assignments and additional actions requested by the maintainers:

  • "Priority" labels will show how urgent this is for the team.
  • "Status" labels will show if this is ready to be worked on, blocked, or in progress.
  • "Need" labels will indicate if additional input or analysis is required.

Finally, remember to use https://discuss.ipfs.io if you just need general support.

@hsanjuan
Copy link
Collaborator

Thank you for this report. I think this function is a left over from #939 where we removed the part of the code that used it. I have no idea why it would start failing though. I am inclined to just remove it and the tests.

@hsanjuan hsanjuan added effort/hours Estimated to take one or several hours exp/beginner Can be confidently tackled by newcomers good first issue Good issue for new contributors help wanted Seeking public contribution on this issue P1 High: Likely tackled by core team if no one steps up and removed need/triage Needs initial labeling and prioritization labels Feb 26, 2022
hsanjuan added a commit that referenced this issue Feb 28, 2022
@hsanjuan hsanjuan added this to the Release v0.14.6 milestone Feb 28, 2022
@hsanjuan hsanjuan removed the help wanted Seeking public contribution on this issue label Feb 28, 2022
@hsanjuan hsanjuan self-assigned this Feb 28, 2022
@Luflosi
Copy link
Author

Luflosi commented Feb 28, 2022

Thanks for the quick fix!

Luflosi added a commit to Luflosi/nixpkgs that referenced this issue Mar 1, 2022
https://github.com/ipfs/ipfs-cluster/releases/tag/v0.14.5
The patch removes a flaky test, see ipfs-cluster/ipfs-cluster#1586.
Also build ipfs-cluster with Go 1.17 as it builds fine and upstream is even using that version.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
effort/hours Estimated to take one or several hours exp/beginner Can be confidently tackled by newcomers good first issue Good issue for new contributors kind/bug A bug in existing code (including security flaws) P1 High: Likely tackled by core team if no one steps up
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants