Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/telemetry/internal/upload: TestRun_Concurrent failures #67737

Open
gopherbot opened this issue May 30, 2024 · 15 comments
Open

x/telemetry/internal/upload: TestRun_Concurrent failures #67737

gopherbot opened this issue May 30, 2024 · 15 comments
Assignees
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. telemetry x/telemetry issues
Milestone

Comments

@gopherbot
Copy link
Contributor

#!watchflakes
default <- pkg == "golang.org/x/telemetry/internal/upload" && test == "TestRun_Concurrent"

Issue created automatically to collect these failures.

Example (log):

=== RUN   TestRun_Concurrent
    run_test.go:66: 17:16:43.368589 findwork.go:35: Finding work: mode on, asof 2023-05-31 00:00:00 +0000 UTC
    run_test.go:66: 17:16:43.368622 findwork.go:35: Finding work: mode on, asof 2023-05-31 00:00:00 +0000 UTC
    run_test.go:66: 17:16:43.368646 findwork.go:35: Finding work: mode on, asof 2023-05-31 00:00:00 +0000 UTC
    run_test.go:66: 17:16:43.369582 findwork.go:35: Finding work: mode on, asof 2023-05-31 00:00:00 +0000 UTC
    run_test.go:66: 17:16:43.369752 findwork.go:50: Collecting count file upload.test-devel-darwin-amd64-2024-05-08.v1.count
    run_test.go:66: 17:16:43.369861 findwork.go:50: Collecting count file upload.test-devel-darwin-amd64-2024-05-08.v1.count
    run_test.go:66: 17:16:43.369616 findwork.go:50: Collecting count file upload.test-devel-darwin-amd64-2024-05-08.v1.count
    run_test.go:66: 17:16:43.370231 findwork.go:50: Collecting count file upload.test-devel-darwin-amd64-2024-05-09.v1.count
    run_test.go:66: 17:16:43.370245 findwork.go:50: Collecting count file upload.test-devel-darwin-amd64-2024-05-09.v1.count
...
    run_test.go:66: 17:16:43.388046 reports.go:113: remove /Users/swarming/.swarming/w/ir/x/t/TestRun_Concurrent2569094852/001/local/upload.test-devel-darwin-amd64-2024-05-21.v1.count: no such file or directory failed to remove /Users/swarming/.swarming/w/ir/x/t/TestRun_Concurrent2569094852/001/local/upload.test-devel-darwin-amd64-2024-05-21.v1.count
    run_test.go:66: 17:16:43.388069 reports.go:64: Failed to create report for 2024-05-23: local report /Users/swarming/.swarming/w/ir/x/t/TestRun_Concurrent2569094852/001/local/local.2024-05-23.json already exists
    run_test.go:66: 17:16:43.388083 run.go:157: Uploading 0 reports
    run_test.go:66: 17:16:43.387658 reports.go:113: remove /Users/swarming/.swarming/w/ir/x/t/TestRun_Concurrent2569094852/001/local/upload.test-devel-darwin-amd64-2024-05-21.v1.count: no such file or directory failed to remove /Users/swarming/.swarming/w/ir/x/t/TestRun_Concurrent2569094852/001/local/upload.test-devel-darwin-amd64-2024-05-21.v1.count
    run_test.go:66: 17:16:43.388107 reports.go:64: Failed to create report for 2024-05-23: local report /Users/swarming/.swarming/w/ir/x/t/TestRun_Concurrent2569094852/001/local/local.2024-05-23.json already exists
    run_test.go:66: 17:16:43.388133 run.go:157: Uploading 1 reports
    run_test.go:66: 17:16:43.389401 upload.go:111: Uploaded 2024-05-23.json to "http://127.0.0.1:51325/2024-05-23"
    run_test.go:66: 17:16:43.389493 upload.go:111: Uploaded 2024-05-16.json to "http://127.0.0.1:51325/2024-05-16"
    run_test.go:593: uploaded 13 total observations, want 14
--- FAIL: TestRun_Concurrent (0.46s)

watchflakes

@gopherbot gopherbot added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label May 30, 2024
@gopherbot
Copy link
Contributor Author

Found new dashboard test flakes for:

#!watchflakes
default <- pkg == "golang.org/x/telemetry/internal/upload" && test == "TestRun_Concurrent"
2024-05-22 23:36 x_telemetry-go1.21-darwin-amd64_13 telemetry@39ace7a4 release-branch.go1.21@9488a444 x/telemetry/internal/upload.TestRun_Concurrent (log)
=== RUN   TestRun_Concurrent
    run_test.go:66: 17:16:43.368589 findwork.go:35: Finding work: mode on, asof 2023-05-31 00:00:00 +0000 UTC
    run_test.go:66: 17:16:43.368622 findwork.go:35: Finding work: mode on, asof 2023-05-31 00:00:00 +0000 UTC
    run_test.go:66: 17:16:43.368646 findwork.go:35: Finding work: mode on, asof 2023-05-31 00:00:00 +0000 UTC
    run_test.go:66: 17:16:43.369582 findwork.go:35: Finding work: mode on, asof 2023-05-31 00:00:00 +0000 UTC
    run_test.go:66: 17:16:43.369752 findwork.go:50: Collecting count file upload.test-devel-darwin-amd64-2024-05-08.v1.count
    run_test.go:66: 17:16:43.369861 findwork.go:50: Collecting count file upload.test-devel-darwin-amd64-2024-05-08.v1.count
    run_test.go:66: 17:16:43.369616 findwork.go:50: Collecting count file upload.test-devel-darwin-amd64-2024-05-08.v1.count
    run_test.go:66: 17:16:43.370231 findwork.go:50: Collecting count file upload.test-devel-darwin-amd64-2024-05-09.v1.count
    run_test.go:66: 17:16:43.370245 findwork.go:50: Collecting count file upload.test-devel-darwin-amd64-2024-05-09.v1.count
...
    run_test.go:66: 17:16:43.388046 reports.go:113: remove /Users/swarming/.swarming/w/ir/x/t/TestRun_Concurrent2569094852/001/local/upload.test-devel-darwin-amd64-2024-05-21.v1.count: no such file or directory failed to remove /Users/swarming/.swarming/w/ir/x/t/TestRun_Concurrent2569094852/001/local/upload.test-devel-darwin-amd64-2024-05-21.v1.count
    run_test.go:66: 17:16:43.388069 reports.go:64: Failed to create report for 2024-05-23: local report /Users/swarming/.swarming/w/ir/x/t/TestRun_Concurrent2569094852/001/local/local.2024-05-23.json already exists
    run_test.go:66: 17:16:43.388083 run.go:157: Uploading 0 reports
    run_test.go:66: 17:16:43.387658 reports.go:113: remove /Users/swarming/.swarming/w/ir/x/t/TestRun_Concurrent2569094852/001/local/upload.test-devel-darwin-amd64-2024-05-21.v1.count: no such file or directory failed to remove /Users/swarming/.swarming/w/ir/x/t/TestRun_Concurrent2569094852/001/local/upload.test-devel-darwin-amd64-2024-05-21.v1.count
    run_test.go:66: 17:16:43.388107 reports.go:64: Failed to create report for 2024-05-23: local report /Users/swarming/.swarming/w/ir/x/t/TestRun_Concurrent2569094852/001/local/local.2024-05-23.json already exists
    run_test.go:66: 17:16:43.388133 run.go:157: Uploading 1 reports
    run_test.go:66: 17:16:43.389401 upload.go:111: Uploaded 2024-05-23.json to "http://127.0.0.1:51325/2024-05-23"
    run_test.go:66: 17:16:43.389493 upload.go:111: Uploaded 2024-05-16.json to "http://127.0.0.1:51325/2024-05-16"
    run_test.go:593: uploaded 13 total observations, want 14
--- FAIL: TestRun_Concurrent (0.46s)
2024-05-22 23:36 x_telemetry-go1.22-darwin-amd64_11 telemetry@39ace7a4 release-branch.go1.22@74565750 x/telemetry/internal/upload.TestRun_Concurrent (log)
=== RUN   TestRun_Concurrent
    run_test.go:66: 18:17:30.218262 findwork.go:35: Finding work: mode on, asof 2023-05-31 00:00:00 +0000 UTC
    run_test.go:66: 18:17:30.218366 findwork.go:35: Finding work: mode on, asof 2023-05-31 00:00:00 +0000 UTC
    run_test.go:66: 18:17:30.218650 findwork.go:50: Collecting count file upload.test-devel-darwin-amd64-2024-05-08.v1.count
    run_test.go:66: 18:17:30.218810 findwork.go:50: Collecting count file upload.test-devel-darwin-amd64-2024-05-08.v1.count
    run_test.go:66: 18:17:30.218918 findwork.go:50: Collecting count file upload.test-devel-darwin-amd64-2024-05-09.v1.count
    run_test.go:66: 18:17:30.218911 findwork.go:35: Finding work: mode on, asof 2023-05-31 00:00:00 +0000 UTC
    run_test.go:66: 18:17:30.219034 findwork.go:35: Finding work: mode on, asof 2023-05-31 00:00:00 +0000 UTC
    run_test.go:66: 18:17:30.219058 findwork.go:50: Collecting count file upload.test-devel-darwin-amd64-2024-05-08.v1.count
    run_test.go:66: 18:17:30.219069 findwork.go:50: Collecting count file upload.test-devel-darwin-amd64-2024-05-09.v1.count
...
    run_test.go:66: 18:17:30.226116 reports.go:64: Failed to create report for 2024-05-23: local report /Users/swarming/.swarming/w/ir/x/t/TestRun_Concurrent2249583122/001/local/local.2024-05-23.json already exists
    run_test.go:66: 18:17:30.226158 reports.go:113: remove /Users/swarming/.swarming/w/ir/x/t/TestRun_Concurrent2249583122/001/local/upload.test-devel-darwin-amd64-2024-05-20.v1.count: no such file or directory failed to remove /Users/swarming/.swarming/w/ir/x/t/TestRun_Concurrent2249583122/001/local/upload.test-devel-darwin-amd64-2024-05-20.v1.count
    run_test.go:66: 18:17:30.226175 reports.go:113: remove /Users/swarming/.swarming/w/ir/x/t/TestRun_Concurrent2249583122/001/local/upload.test-devel-darwin-amd64-2024-05-21.v1.count: no such file or directory failed to remove /Users/swarming/.swarming/w/ir/x/t/TestRun_Concurrent2249583122/001/local/upload.test-devel-darwin-amd64-2024-05-21.v1.count
    run_test.go:66: 18:17:30.226184 reports.go:64: Failed to create report for 2024-05-23: local report /Users/swarming/.swarming/w/ir/x/t/TestRun_Concurrent2249583122/001/local/local.2024-05-23.json already exists
    run_test.go:66: 18:17:30.226191 run.go:157: Uploading 0 reports
    run_test.go:66: 18:17:30.226164 run.go:157: Uploading 0 reports
    run_test.go:66: 18:17:30.226540 upload.go:111: Uploaded 2024-05-23.json to "http://127.0.0.1:58936/2024-05-23"
    run_test.go:66: 18:17:30.226624 upload.go:111: Uploaded 2024-05-16.json to "http://127.0.0.1:58936/2024-05-16"
    run_test.go:593: uploaded 13 total observations, want 14
--- FAIL: TestRun_Concurrent (0.20s)
2024-05-22 23:36 x_telemetry-go1.22-darwin-amd64_14 telemetry@39ace7a4 release-branch.go1.22@74565750 x/telemetry/internal/upload.TestRun_Concurrent (log)
=== RUN   TestRun_Concurrent
    run_test.go:66: 18:58:02.893007 findwork.go:35: Finding work: mode on, asof 2023-05-31 00:00:00 +0000 UTC
    run_test.go:66: 18:58:02.895442 findwork.go:50: Collecting count file upload.test-devel-darwin-amd64-2024-05-08.v1.count
    run_test.go:66: 18:58:02.896315 findwork.go:35: Finding work: mode on, asof 2023-05-31 00:00:00 +0000 UTC
    run_test.go:66: 18:58:02.897332 findwork.go:50: Collecting count file upload.test-devel-darwin-amd64-2024-05-09.v1.count
    run_test.go:66: 18:58:02.897724 findwork.go:50: Collecting count file upload.test-devel-darwin-amd64-2024-05-08.v1.count
    run_test.go:66: 18:58:02.904669 findwork.go:50: Collecting count file upload.test-devel-darwin-amd64-2024-05-10.v1.count
    run_test.go:66: 18:58:02.908755 findwork.go:35: Finding work: mode on, asof 2023-05-31 00:00:00 +0000 UTC
    run_test.go:66: 18:58:02.908466 findwork.go:50: Collecting count file upload.test-devel-darwin-amd64-2024-05-09.v1.count
    run_test.go:66: 18:58:02.909854 findwork.go:50: Collecting count file upload.test-devel-darwin-amd64-2024-05-08.v1.count
...
    run_test.go:66: 18:58:02.982806 findwork.go:97: Already uploaded: 2024-05-23.json
    run_test.go:66: 18:58:02.983172 reports.go:34: Last week: 2024-05-23, today: 2024-05-30
    run_test.go:66: 18:58:02.983244 run.go:157: Uploading 0 reports
    run_test.go:66: 18:58:02.994336 findwork.go:35: Finding work: mode on, asof 2023-05-31 00:00:00 +0000 UTC
    run_test.go:66: 18:58:02.994948 findwork.go:97: Already uploaded: 2024-05-16.json
    run_test.go:66: 18:58:02.995046 findwork.go:97: Already uploaded: 2024-05-23.json
    run_test.go:66: 18:58:02.995576 reports.go:34: Last week: 2024-05-23, today: 2024-05-30
    run_test.go:66: 18:58:02.995652 run.go:157: Uploading 0 reports
    run_test.go:593: uploaded 13 total observations, want 14
--- FAIL: TestRun_Concurrent (0.98s)
2024-05-22 23:36 x_telemetry-go1.22-windows-amd64 telemetry@39ace7a4 release-branch.go1.22@74565750 x/telemetry/internal/upload.TestRun_Concurrent (log)
=== RUN   TestRun_Concurrent
    run_test.go:66: 18:25:34.050596 findwork.go:35: Finding work: mode on, asof 2023-05-31 00:00:00 +0000 UTC
    run_test.go:66: 18:25:34.050596 findwork.go:35: Finding work: mode on, asof 2023-05-31 00:00:00 +0000 UTC
    run_test.go:66: 18:25:34.050596 findwork.go:50: Collecting count file upload.test-devel-windows-amd64-2024-05-08.v1.count
    run_test.go:66: 18:25:34.050596 findwork.go:50: Collecting count file upload.test-devel-windows-amd64-2024-05-08.v1.count
    run_test.go:66: 18:25:34.050596 findwork.go:35: Finding work: mode on, asof 2023-05-31 00:00:00 +0000 UTC
    run_test.go:66: 18:25:34.051487 findwork.go:50: Collecting count file upload.test-devel-windows-amd64-2024-05-09.v1.count
    run_test.go:66: 18:25:34.051487 findwork.go:35: Finding work: mode on, asof 2023-05-31 00:00:00 +0000 UTC
    run_test.go:66: 18:25:34.051487 findwork.go:50: Collecting count file upload.test-devel-windows-amd64-2024-05-08.v1.count
    run_test.go:66: 18:25:34.051487 findwork.go:35: Finding work: mode on, asof 2023-05-31 00:00:00 +0000 UTC
...
    run_test.go:66: 18:25:34.061359 reports.go:113: remove C:\b\s\w\ir\x\t\TestRun_Concurrent2928643688\001\local\upload.test-devel-windows-amd64-2024-05-18.v1.count: The system cannot find the file specified. failed to remove C:\b\s\w\ir\x\t\TestRun_Concurrent2928643688\001\local\upload.test-devel-windows-amd64-2024-05-18.v1.count
    run_test.go:66: 18:25:34.061359 reports.go:113: remove C:\b\s\w\ir\x\t\TestRun_Concurrent2928643688\001\local\upload.test-devel-windows-amd64-2024-05-19.v1.count: The system cannot find the file specified. failed to remove C:\b\s\w\ir\x\t\TestRun_Concurrent2928643688\001\local\upload.test-devel-windows-amd64-2024-05-19.v1.count
    run_test.go:66: 18:25:34.061359 reports.go:113: remove C:\b\s\w\ir\x\t\TestRun_Concurrent2928643688\001\local\upload.test-devel-windows-amd64-2024-05-20.v1.count: The system cannot find the file specified. failed to remove C:\b\s\w\ir\x\t\TestRun_Concurrent2928643688\001\local\upload.test-devel-windows-amd64-2024-05-20.v1.count
    run_test.go:66: 18:25:34.062229 reports.go:113: remove C:\b\s\w\ir\x\t\TestRun_Concurrent2928643688\001\local\upload.test-devel-windows-amd64-2024-05-21.v1.count: The system cannot find the file specified. failed to remove C:\b\s\w\ir\x\t\TestRun_Concurrent2928643688\001\local\upload.test-devel-windows-amd64-2024-05-21.v1.count
    run_test.go:66: 18:25:34.062229 reports.go:64: Failed to create report for 2024-05-23: local report C:\b\s\w\ir\x\t\TestRun_Concurrent2928643688\001\local\local.2024-05-23.json already exists
    run_test.go:66: 18:25:34.062229 run.go:157: Uploading 1 reports
    run_test.go:66: 18:25:34.063844 upload.go:111: Uploaded 2024-05-16.json to "http://127.0.0.1:49870/2024-05-16"
    run_test.go:66: 18:25:34.063844 upload.go:111: Uploaded 2024-05-23.json to "http://127.0.0.1:49870/2024-05-23"
    run_test.go:593: uploaded 13 total observations, want 14
--- FAIL: TestRun_Concurrent (0.30s)

watchflakes

@gopherbot gopherbot added the telemetry x/telemetry issues label May 30, 2024
@gopherbot gopherbot added this to the Unreleased milestone May 30, 2024
@gopherbot
Copy link
Contributor Author

Found new dashboard test flakes for:

#!watchflakes
default <- pkg == "golang.org/x/telemetry/internal/upload" && test == "TestRun_Concurrent"
2024-06-03 22:45 x_telemetry-gotip-linux-arm telemetry@f2b69109 go@535f81e7 x/telemetry/internal/upload.TestRun_Concurrent (log)
=== RUN   TestRun_Concurrent
    run_test.go:66: 15:37:57.528458 findwork.go:35: Finding work: mode on, asof 2023-06-05 00:00:00 +0000 UTC
    run_test.go:66: 15:37:57.528573 findwork.go:50: Collecting count file upload.test-devel-linux-arm-2024-05-14.v1.count
    run_test.go:66: 15:37:57.528621 findwork.go:50: Collecting count file upload.test-devel-linux-arm-2024-05-15.v1.count
    run_test.go:66: 15:37:57.528463 findwork.go:35: Finding work: mode on, asof 2023-06-05 00:00:00 +0000 UTC
    run_test.go:66: 15:37:57.528753 findwork.go:50: Collecting count file upload.test-devel-linux-arm-2024-05-14.v1.count
    run_test.go:66: 15:37:57.528814 findwork.go:50: Collecting count file upload.test-devel-linux-arm-2024-05-15.v1.count
    run_test.go:66: 15:37:57.528863 findwork.go:50: Collecting count file upload.test-devel-linux-arm-2024-05-16.v1.count
    run_test.go:66: 15:37:57.528913 findwork.go:50: Collecting count file upload.test-devel-linux-arm-2024-05-17.v1.count
    run_test.go:66: 15:37:57.528961 findwork.go:50: Collecting count file upload.test-devel-linux-arm-2024-05-18.v1.count
...
    run_test.go:66: 15:37:57.531860 reports.go:113: remove /home/swarming/.swarming/w/ir/x/t/TestRun_Concurrent3059370928/001/local/upload.test-devel-linux-arm-2024-05-24.v1.count: no such file or directory failed to remove /home/swarming/.swarming/w/ir/x/t/TestRun_Concurrent3059370928/001/local/upload.test-devel-linux-arm-2024-05-24.v1.count
    run_test.go:66: 15:37:57.531877 reports.go:113: remove /home/swarming/.swarming/w/ir/x/t/TestRun_Concurrent3059370928/001/local/upload.test-devel-linux-arm-2024-05-25.v1.count: no such file or directory failed to remove /home/swarming/.swarming/w/ir/x/t/TestRun_Concurrent3059370928/001/local/upload.test-devel-linux-arm-2024-05-25.v1.count
    run_test.go:66: 15:37:57.531896 reports.go:113: remove /home/swarming/.swarming/w/ir/x/t/TestRun_Concurrent3059370928/001/local/upload.test-devel-linux-arm-2024-05-26.v1.count: no such file or directory failed to remove /home/swarming/.swarming/w/ir/x/t/TestRun_Concurrent3059370928/001/local/upload.test-devel-linux-arm-2024-05-26.v1.count
    run_test.go:66: 15:37:57.531913 reports.go:113: remove /home/swarming/.swarming/w/ir/x/t/TestRun_Concurrent3059370928/001/local/upload.test-devel-linux-arm-2024-05-27.v1.count: no such file or directory failed to remove /home/swarming/.swarming/w/ir/x/t/TestRun_Concurrent3059370928/001/local/upload.test-devel-linux-arm-2024-05-27.v1.count
    run_test.go:66: 15:37:57.531931 reports.go:64: Failed to create report for 2024-05-29: local report /home/swarming/.swarming/w/ir/x/t/TestRun_Concurrent3059370928/001/local/local.2024-05-29.json already exists
    run_test.go:66: 15:37:57.531944 run.go:157: Uploading 0 reports
    run_test.go:66: 15:37:57.531905 upload.go:111: Uploaded 2024-05-22.json to "http://127.0.0.1:39247/2024-05-22"
    run_test.go:66: 15:37:57.531994 upload.go:54: open /home/swarming/.swarming/w/ir/x/t/TestRun_Concurrent3059370928/001/local/2024-05-29.json: no such file or directory reading /home/swarming/.swarming/w/ir/x/t/TestRun_Concurrent3059370928/001/local/2024-05-29.json
    run_test.go:593: uploaded 21 total observations, want 14
--- FAIL: TestRun_Concurrent (0.07s)

watchflakes

@gopherbot
Copy link
Contributor Author

Found new dashboard test flakes for:

#!watchflakes
default <- pkg == "golang.org/x/telemetry/internal/upload" && test == "TestRun_Concurrent"
2024-06-21 19:41 x_telemetry-gotip-darwin-amd64_11 telemetry@a740542b go@e8ee1dc4 x/telemetry/internal/upload.TestRun_Concurrent (log)
=== RUN   TestRun_Concurrent
    run_test.go:66: 17:56:25.996832 findwork.go:35: Finding work: mode on, asof 2023-06-24 00:00:00 +0000 UTC
    run_test.go:66: 17:56:25.997280 findwork.go:35: Finding work: mode on, asof 2023-06-24 00:00:00 +0000 UTC
    run_test.go:66: 17:56:25.997360 findwork.go:50: Collecting count file upload.test-devel-darwin-amd64-2024-06-01.v1.count
    run_test.go:66: 17:56:25.997388 findwork.go:50: Collecting count file upload.test-devel-darwin-amd64-2024-06-01.v1.count
    run_test.go:66: 17:56:25.997675 findwork.go:50: Collecting count file upload.test-devel-darwin-amd64-2024-06-02.v1.count
    run_test.go:66: 17:56:25.997719 findwork.go:35: Finding work: mode on, asof 2023-06-24 00:00:00 +0000 UTC
    run_test.go:66: 17:56:25.997793 findwork.go:50: Collecting count file upload.test-devel-darwin-amd64-2024-06-01.v1.count
    run_test.go:66: 17:56:25.997819 findwork.go:35: Finding work: mode on, asof 2023-06-24 00:00:00 +0000 UTC
    run_test.go:66: 17:56:25.997837 findwork.go:50: Collecting count file upload.test-devel-darwin-amd64-2024-06-02.v1.count
...
    run_test.go:66: 17:56:26.004049 findwork.go:35: Finding work: mode on, asof 2023-06-24 00:00:00 +0000 UTC
    run_test.go:66: 17:56:26.004074 findwork.go:71: Uploadable: 2024-06-09.json
    run_test.go:66: 17:56:26.004125 findwork.go:71: Uploadable: 2024-06-16.json
    run_test.go:66: 17:56:26.004188 reports.go:34: Last week: , today: 2024-06-23
    run_test.go:66: 17:56:26.004197 run.go:157: Uploading 2 reports
    run_test.go:66: 17:56:26.004518 upload.go:80: Failed to acquire lock /Users/swarming/.swarming/w/ir/x/t/TestRun_Concurrent2517570491/001/upload/2024-06-09.json.lock: open /Users/swarming/.swarming/w/ir/x/t/TestRun_Concurrent2517570491/001/upload/2024-06-09.json.lock: file exists
    run_test.go:66: 17:56:26.004645 upload.go:111: Uploaded 2024-06-09.json to "http://127.0.0.1:50119/2024-06-09"
    run_test.go:66: 17:56:26.004818 upload.go:111: Uploaded 2024-06-16.json to "http://127.0.0.1:50119/2024-06-16"
    run_test.go:593: uploaded 13 total observations, want 14
--- FAIL: TestRun_Concurrent (0.18s)
2024-06-21 19:41 x_telemetry-gotip-darwin-arm64_11 telemetry@a740542b go@e8ee1dc4 x/telemetry/internal/upload.TestRun_Concurrent (log)
=== RUN   TestRun_Concurrent
    run_test.go:66: 17:56:43.716603 findwork.go:35: Finding work: mode on, asof 2023-06-24 00:00:00 +0000 UTC
    run_test.go:66: 17:56:43.716686 findwork.go:50: Collecting count file upload.test-devel-darwin-arm64-2024-06-01.v1.count
    run_test.go:66: 17:56:43.716715 findwork.go:35: Finding work: mode on, asof 2023-06-24 00:00:00 +0000 UTC
    run_test.go:66: 17:56:43.716726 findwork.go:50: Collecting count file upload.test-devel-darwin-arm64-2024-06-02.v1.count
    run_test.go:66: 17:56:43.716758 findwork.go:50: Collecting count file upload.test-devel-darwin-arm64-2024-06-01.v1.count
    run_test.go:66: 17:56:43.716761 findwork.go:50: Collecting count file upload.test-devel-darwin-arm64-2024-06-03.v1.count
    run_test.go:66: 17:56:43.716788 findwork.go:50: Collecting count file upload.test-devel-darwin-arm64-2024-06-02.v1.count
    run_test.go:66: 17:56:43.716821 findwork.go:50: Collecting count file upload.test-devel-darwin-arm64-2024-06-04.v1.count
    run_test.go:66: 17:56:43.716826 findwork.go:35: Finding work: mode on, asof 2023-06-24 00:00:00 +0000 UTC
...
    run_test.go:66: 17:56:43.719520 reports.go:113: remove /opt/s/w/ir/x/t/TestRun_Concurrent4277224640/001/local/upload.test-devel-darwin-arm64-2024-06-11.v1.count: no such file or directory failed to remove /opt/s/w/ir/x/t/TestRun_Concurrent4277224640/001/local/upload.test-devel-darwin-arm64-2024-06-11.v1.count
    run_test.go:66: 17:56:43.719527 reports.go:113: remove /opt/s/w/ir/x/t/TestRun_Concurrent4277224640/001/local/upload.test-devel-darwin-arm64-2024-06-12.v1.count: no such file or directory failed to remove /opt/s/w/ir/x/t/TestRun_Concurrent4277224640/001/local/upload.test-devel-darwin-arm64-2024-06-12.v1.count
    run_test.go:66: 17:56:43.719537 reports.go:113: remove /opt/s/w/ir/x/t/TestRun_Concurrent4277224640/001/local/upload.test-devel-darwin-arm64-2024-06-13.v1.count: no such file or directory failed to remove /opt/s/w/ir/x/t/TestRun_Concurrent4277224640/001/local/upload.test-devel-darwin-arm64-2024-06-13.v1.count
    run_test.go:66: 17:56:43.719545 reports.go:113: remove /opt/s/w/ir/x/t/TestRun_Concurrent4277224640/001/local/upload.test-devel-darwin-arm64-2024-06-14.v1.count: no such file or directory failed to remove /opt/s/w/ir/x/t/TestRun_Concurrent4277224640/001/local/upload.test-devel-darwin-arm64-2024-06-14.v1.count
    run_test.go:66: 17:56:43.719549 reports.go:64: Failed to create report for 2024-06-16: local report /opt/s/w/ir/x/t/TestRun_Concurrent4277224640/001/local/local.2024-06-16.json already exists
    run_test.go:66: 17:56:43.719553 run.go:157: Uploading 0 reports
    run_test.go:66: 17:56:43.719748 upload.go:111: Uploaded 2024-06-09.json to "http://127.0.0.1:50679/2024-06-09"
    run_test.go:66: 17:56:43.719831 upload.go:111: Uploaded 2024-06-16.json to "http://127.0.0.1:50679/2024-06-16"
    run_test.go:593: uploaded 13 total observations, want 14
--- FAIL: TestRun_Concurrent (0.07s)

watchflakes

@gopherbot
Copy link
Contributor Author

Found new dashboard test flakes for:

#!watchflakes
default <- pkg == "golang.org/x/telemetry/internal/upload" && test == "TestRun_Concurrent"
2024-06-24 21:43 x_telemetry-go1.21-darwin-amd64-nocgo telemetry@210d439e release-branch.go1.21@dfb00670 x/telemetry/internal/upload.TestRun_Concurrent (log)
=== RUN   TestRun_Concurrent
    run_test.go:66: 19:13:28.334433 findwork.go:35: Finding work: mode on, asof 2023-06-26 00:00:00 +0000 UTC
    run_test.go:66: 19:13:28.334727 findwork.go:35: Finding work: mode on, asof 2023-06-26 00:00:00 +0000 UTC
    run_test.go:66: 19:13:28.334868 findwork.go:35: Finding work: mode on, asof 2023-06-26 00:00:00 +0000 UTC
    run_test.go:66: 19:13:28.335085 findwork.go:50: Collecting count file upload.test-devel-darwin-amd64-2024-06-03.v1.count
    run_test.go:66: 19:13:28.335225 findwork.go:50: Collecting count file upload.test-devel-darwin-amd64-2024-06-03.v1.count
    run_test.go:66: 19:13:28.335280 findwork.go:50: Collecting count file upload.test-devel-darwin-amd64-2024-06-03.v1.count
    run_test.go:66: 19:13:28.335449 findwork.go:50: Collecting count file upload.test-devel-darwin-amd64-2024-06-04.v1.count
    run_test.go:66: 19:13:28.335464 findwork.go:50: Collecting count file upload.test-devel-darwin-amd64-2024-06-04.v1.count
    run_test.go:66: 19:13:28.335556 findwork.go:50: Collecting count file upload.test-devel-darwin-amd64-2024-06-04.v1.count
...
    run_test.go:66: 19:13:28.346808 upload.go:80: Failed to acquire lock /Users/swarming/.swarming/w/ir/x/t/TestRun_Concurrent2987023683/001/upload/2024-06-11.json.lock: open /Users/swarming/.swarming/w/ir/x/t/TestRun_Concurrent2987023683/001/upload/2024-06-11.json.lock: file exists
    run_test.go:66: 19:13:28.347559 upload.go:111: Uploaded 2024-06-11.json to "http://127.0.0.1:49287/2024-06-11"
    run_test.go:66: 19:13:28.347867 upload.go:111: Uploaded 2024-06-18.json to "http://127.0.0.1:49287/2024-06-18"
    run_test.go:66: 19:13:28.359325 findwork.go:35: Finding work: mode on, asof 2023-06-26 00:00:00 +0000 UTC
    run_test.go:66: 19:13:28.359426 findwork.go:97: Already uploaded: 2024-06-11.json
    run_test.go:66: 19:13:28.359443 findwork.go:97: Already uploaded: 2024-06-18.json
    run_test.go:66: 19:13:28.359506 reports.go:34: Last week: 2024-06-18, today: 2024-06-25
    run_test.go:66: 19:13:28.359519 run.go:157: Uploading 0 reports
    run_test.go:593: uploaded 13 total observations, want 14
--- FAIL: TestRun_Concurrent (0.40s)
2024-06-24 21:43 x_telemetry-go1.21-darwin-amd64_14 telemetry@210d439e release-branch.go1.21@dfb00670 x/telemetry/internal/upload.TestRun_Concurrent (log)
=== RUN   TestRun_Concurrent
    run_test.go:66: 19:11:04.892926 findwork.go:35: Finding work: mode on, asof 2023-06-26 00:00:00 +0000 UTC
    run_test.go:66: 19:11:04.893910 findwork.go:35: Finding work: mode on, asof 2023-06-26 00:00:00 +0000 UTC
    run_test.go:66: 19:11:04.894110 findwork.go:50: Collecting count file upload.test-devel-darwin-amd64-2024-06-03.v1.count
    run_test.go:66: 19:11:04.894714 findwork.go:50: Collecting count file upload.test-devel-darwin-amd64-2024-06-03.v1.count
    run_test.go:66: 19:11:04.894845 findwork.go:50: Collecting count file upload.test-devel-darwin-amd64-2024-06-04.v1.count
    run_test.go:66: 19:11:04.895121 findwork.go:50: Collecting count file upload.test-devel-darwin-amd64-2024-06-04.v1.count
    run_test.go:66: 19:11:04.895161 findwork.go:50: Collecting count file upload.test-devel-darwin-amd64-2024-06-05.v1.count
    run_test.go:66: 19:11:04.895300 findwork.go:50: Collecting count file upload.test-devel-darwin-amd64-2024-06-05.v1.count
    run_test.go:66: 19:11:04.895521 findwork.go:50: Collecting count file upload.test-devel-darwin-amd64-2024-06-06.v1.count
...
    run_test.go:66: 19:11:04.908140 reports.go:64: Failed to create report for 2024-06-11: local report /Users/swarming/.swarming/w/ir/x/t/TestRun_Concurrent2257027360/001/local/local.2024-06-11.json already exists
    run_test.go:66: 19:11:04.908298 upload.go:111: Uploaded 2024-06-18.json to "http://127.0.0.1:52771/2024-06-18"
    run_test.go:66: 19:11:04.908367 reports.go:113: remove /Users/swarming/.swarming/w/ir/x/t/TestRun_Concurrent2257027360/001/local/upload.test-devel-darwin-amd64-2024-06-11.v1.count: no such file or directory failed to remove /Users/swarming/.swarming/w/ir/x/t/TestRun_Concurrent2257027360/001/local/upload.test-devel-darwin-amd64-2024-06-11.v1.count
    run_test.go:66: 19:11:04.908401 reports.go:113: remove /Users/swarming/.swarming/w/ir/x/t/TestRun_Concurrent2257027360/001/local/upload.test-devel-darwin-amd64-2024-06-12.v1.count: no such file or directory failed to remove /Users/swarming/.swarming/w/ir/x/t/TestRun_Concurrent2257027360/001/local/upload.test-devel-darwin-amd64-2024-06-12.v1.count
    run_test.go:66: 19:11:04.908509 reports.go:113: remove /Users/swarming/.swarming/w/ir/x/t/TestRun_Concurrent2257027360/001/local/upload.test-devel-darwin-amd64-2024-06-13.v1.count: no such file or directory failed to remove /Users/swarming/.swarming/w/ir/x/t/TestRun_Concurrent2257027360/001/local/upload.test-devel-darwin-amd64-2024-06-13.v1.count
    run_test.go:66: 19:11:04.908541 reports.go:113: remove /Users/swarming/.swarming/w/ir/x/t/TestRun_Concurrent2257027360/001/local/upload.test-devel-darwin-amd64-2024-06-14.v1.count: no such file or directory failed to remove /Users/swarming/.swarming/w/ir/x/t/TestRun_Concurrent2257027360/001/local/upload.test-devel-darwin-amd64-2024-06-14.v1.count
    run_test.go:66: 19:11:04.908554 reports.go:64: Failed to create report for 2024-06-18: local report /Users/swarming/.swarming/w/ir/x/t/TestRun_Concurrent2257027360/001/local/local.2024-06-18.json already exists
    run_test.go:66: 19:11:04.908567 run.go:157: Uploading 0 reports
    run_test.go:593: uploaded 13 total observations, want 14
--- FAIL: TestRun_Concurrent (0.40s)

watchflakes

@findleyr findleyr self-assigned this Jul 2, 2024
@findleyr
Copy link
Contributor

findleyr commented Jul 2, 2024

Looks like a real race. I will take a look.

@findleyr
Copy link
Contributor

findleyr commented Jul 2, 2024

Not a race: looks like a timezone / day boundary bug 🤦.

@gopherbot
Copy link
Contributor Author

Change https://go.dev/cl/596316 mentions this issue: internal/upload: use UTC time in TestRun_Concurrent

@gopherbot
Copy link
Contributor Author

Found new dashboard test flakes for:

#!watchflakes
default <- pkg == "golang.org/x/telemetry/internal/upload" && test == "TestRun_Concurrent"
2024-07-01 17:54 x_telemetry-go1.21-windows-arm64 telemetry@4e29c787 release-branch.go1.21@12e9b968 x/telemetry/internal/upload.TestRun_Concurrent (log)
=== RUN   TestRun_Concurrent
    run_test.go:66: 21:03:41.860501 findwork.go:35: Finding work: mode on, asof 2023-07-03 00:00:00 +0000 UTC
    run_test.go:66: 21:03:41.861858 findwork.go:50: Collecting count file upload.test-devel-windows-arm64-2024-06-11.v1.count
    run_test.go:66: 21:03:41.861858 findwork.go:50: Collecting count file upload.test-devel-windows-arm64-2024-06-12.v1.count
    run_test.go:66: 21:03:41.861858 findwork.go:50: Collecting count file upload.test-devel-windows-arm64-2024-06-13.v1.count
    run_test.go:66: 21:03:41.861858 findwork.go:50: Collecting count file upload.test-devel-windows-arm64-2024-06-14.v1.count
    run_test.go:66: 21:03:41.861858 findwork.go:50: Collecting count file upload.test-devel-windows-arm64-2024-06-15.v1.count
    run_test.go:66: 21:03:41.861858 findwork.go:50: Collecting count file upload.test-devel-windows-arm64-2024-06-16.v1.count
    run_test.go:66: 21:03:41.861858 findwork.go:50: Collecting count file upload.test-devel-windows-arm64-2024-06-17.v1.count
    run_test.go:66: 21:03:41.862877 findwork.go:50: Collecting count file upload.test-devel-windows-arm64-2024-06-18.v1.count
...
    run_test.go:66: 21:03:41.870249 upload.go:80: Failed to acquire lock C:\Users\swarming\.swarming\w\ir\x\t\TestRun_Concurrent2162911338\001\upload\2024-06-19.json.lock: open C:\Users\swarming\.swarming\w\ir\x\t\TestRun_Concurrent2162911338\001\upload\2024-06-19.json.lock: The file exists.
    run_test.go:66: 21:03:41.870781 upload.go:80: Failed to acquire lock C:\Users\swarming\.swarming\w\ir\x\t\TestRun_Concurrent2162911338\001\upload\2024-06-12.json.lock: open C:\Users\swarming\.swarming\w\ir\x\t\TestRun_Concurrent2162911338\001\upload\2024-06-12.json.lock: The file exists.
    run_test.go:66: 21:03:41.869954 upload.go:80: Failed to acquire lock C:\Users\swarming\.swarming\w\ir\x\t\TestRun_Concurrent2162911338\001\upload\2024-06-12.json.lock: open C:\Users\swarming\.swarming\w\ir\x\t\TestRun_Concurrent2162911338\001\upload\2024-06-12.json.lock: The file exists.
    run_test.go:66: 21:03:41.871311 upload.go:111: Uploaded 2024-06-12.json to "http://127.0.0.1:59857/2024-06-12"
    run_test.go:66: 21:03:41.871859 upload.go:111: Uploaded 2024-06-19.json to "http://127.0.0.1:59857/2024-06-19"
    run_test.go:66: 21:03:41.871859 upload.go:80: Failed to acquire lock C:\Users\swarming\.swarming\w\ir\x\t\TestRun_Concurrent2162911338\001\upload\2024-06-26.json.lock: open C:\Users\swarming\.swarming\w\ir\x\t\TestRun_Concurrent2162911338\001\upload\2024-06-26.json.lock: The file exists.
    run_test.go:66: 21:03:41.872907 upload.go:111: Uploaded 2024-06-26.json to "http://127.0.0.1:59857/2024-06-26"
    run_test.go:66: 21:03:41.872907 upload.go:111: Uploaded 2024-06-19.json to "http://127.0.0.1:59857/2024-06-19"
    run_test.go:593: uploaded 21 total observations, want 14
--- FAIL: TestRun_Concurrent (0.59s)

watchflakes

@gopherbot
Copy link
Contributor Author

Found new dashboard test flakes for:

#!watchflakes
default <- pkg == "golang.org/x/telemetry/internal/upload" && test == "TestRun_Concurrent"
2024-07-03 14:31 x_telemetry-gotip-windows-arm64 telemetry@5dc3c48f go@7d19d508 x/telemetry/internal/upload.TestRun_Concurrent (log)
=== RUN   TestRun_Concurrent
    run_test.go:67: uploader #0: 19:16:21.122773 findwork.go:35: Finding work: mode on asof 2023-07-04 00:00:00 +0000 UTC
    run_test.go:67: uploader #0: 19:16:21.122773 findwork.go:50: Collecting count file upload.test-devel-windows-arm64-2024-06-12.v1.count
    run_test.go:67: uploader #0: 19:16:21.122773 findwork.go:50: Collecting count file upload.test-devel-windows-arm64-2024-06-13.v1.count
    run_test.go:67: uploader #0: 19:16:21.122773 findwork.go:50: Collecting count file upload.test-devel-windows-arm64-2024-06-14.v1.count
    run_test.go:67: uploader #0: 19:16:21.122773 findwork.go:50: Collecting count file upload.test-devel-windows-arm64-2024-06-15.v1.count
    run_test.go:67: uploader #0: 19:16:21.122773 findwork.go:50: Collecting count file upload.test-devel-windows-arm64-2024-06-16.v1.count
    run_test.go:67: uploader #0: 19:16:21.122773 findwork.go:50: Collecting count file upload.test-devel-windows-arm64-2024-06-17.v1.count
    run_test.go:67: uploader #0: 19:16:21.123820 findwork.go:50: Collecting count file upload.test-devel-windows-arm64-2024-06-18.v1.count
    run_test.go:67: uploader #0: 19:16:21.123820 findwork.go:50: Collecting count file upload.test-devel-windows-arm64-2024-06-19.v1.count
...
    run_test.go:67: uploader #3: 19:16:21.142872 reports.go:34: Last week: 2024-06-20, today: 2024-07-03
    run_test.go:67: uploader #3: 19:16:21.142872 run.go:157: Uploading 1 reports
    run_test.go:67: uploader #1: 19:16:21.142872 upload.go:80: Failed to acquire lock C:\Users\swarming\.swarming\w\ir\x\t\TestRun_Concurrent3441354803\001\upload\2024-06-27.json.lock: open C:\Users\swarming\.swarming\w\ir\x\t\TestRun_Concurrent3441354803\001\upload\2024-06-27.json.lock: The file exists.
    run_test.go:67: uploader #2: 19:16:21.142872 upload.go:80: Failed to acquire lock C:\Users\swarming\.swarming\w\ir\x\t\TestRun_Concurrent3441354803\001\upload\2024-06-27.json.lock: open C:\Users\swarming\.swarming\w\ir\x\t\TestRun_Concurrent3441354803\001\upload\2024-06-27.json.lock: The file exists.
    run_test.go:67: uploader #3: 19:16:21.143384 upload.go:80: Failed to acquire lock C:\Users\swarming\.swarming\w\ir\x\t\TestRun_Concurrent3441354803\001\upload\2024-06-27.json.lock: open C:\Users\swarming\.swarming\w\ir\x\t\TestRun_Concurrent3441354803\001\upload\2024-06-27.json.lock: The file exists.
    run_test.go:67: uploader #4: 19:16:21.143384 upload.go:111: Uploaded 2024-06-27.json to "http://127.0.0.1:50692/2024-06-27"
    run_test.go:67: uploader #0: 19:16:21.143384 upload.go:111: Uploaded 2024-06-20.json to "http://127.0.0.1:50692/2024-06-20"
    run_test.go:67: uploader #0: 19:16:21.143936 upload.go:54: open C:\Users\swarming\.swarming\w\ir\x\t\TestRun_Concurrent3441354803\001\local\2024-06-27.json: The system cannot find the file specified. reading C:\Users\swarming\.swarming\w\ir\x\t\TestRun_Concurrent3441354803\001\local\2024-06-27.json
    run_test.go:597: uploaded 21 total observations, want 14
--- FAIL: TestRun_Concurrent (0.34s)

watchflakes

@gopherbot
Copy link
Contributor Author

Found new dashboard test flakes for:

#!watchflakes
default <- pkg == "golang.org/x/telemetry/internal/upload" && test == "TestRun_Concurrent"
2024-07-11 21:24 x_telemetry-go1.23-windows-arm64 telemetry@9d76c3d5 release-branch.go1.23@7dff7439 x/telemetry/internal/upload.TestRun_Concurrent (log)
=== RUN   TestRun_Concurrent
    run_test.go:67: uploader #4: 21:26:10.574207 findwork.go:35: Finding work: mode on asof 2023-07-12 00:00:00 +0000 UTC
    run_test.go:67: uploader #4: 21:26:10.574207 findwork.go:50: Collecting count file upload.test-go1.23rc1-windows-arm64-2024-06-20.v1.count
    run_test.go:67: uploader #4: 21:26:10.574207 findwork.go:50: Collecting count file upload.test-go1.23rc1-windows-arm64-2024-06-21.v1.count
    run_test.go:67: uploader #4: 21:26:10.575238 findwork.go:50: Collecting count file upload.test-go1.23rc1-windows-arm64-2024-06-22.v1.count
    run_test.go:67: uploader #4: 21:26:10.575238 findwork.go:50: Collecting count file upload.test-go1.23rc1-windows-arm64-2024-06-23.v1.count
    run_test.go:67: uploader #4: 21:26:10.575238 findwork.go:50: Collecting count file upload.test-go1.23rc1-windows-arm64-2024-06-24.v1.count
    run_test.go:67: uploader #4: 21:26:10.575238 findwork.go:50: Collecting count file upload.test-go1.23rc1-windows-arm64-2024-06-25.v1.count
    run_test.go:67: uploader #4: 21:26:10.575238 findwork.go:50: Collecting count file upload.test-go1.23rc1-windows-arm64-2024-06-26.v1.count
    run_test.go:67: uploader #4: 21:26:10.575238 findwork.go:50: Collecting count file upload.test-go1.23rc1-windows-arm64-2024-06-27.v1.count
...
    run_test.go:67: uploader #0: 21:26:10.592651 upload.go:80: Failed to acquire lock C:\Users\swarming\.swarming\w\ir\x\t\TestRun_Concurrent3567831069\001\upload\2024-07-03.json.lock: open C:\Users\swarming\.swarming\w\ir\x\t\TestRun_Concurrent3567831069\001\upload\2024-07-03.json.lock: The file exists.
    run_test.go:67: uploader #1: 21:26:10.594179 upload.go:111: Uploaded 2024-06-26.json to "http://127.0.0.1:53164/2024-06-26"
    run_test.go:67: uploader #1: 21:26:10.594179 upload.go:54: open C:\Users\swarming\.swarming\w\ir\x\t\TestRun_Concurrent3567831069\001\local\2024-07-03.json: The process cannot access the file because it is being used by another process. reading C:\Users\swarming\.swarming\w\ir\x\t\TestRun_Concurrent3567831069\001\local\2024-07-03.json
    run_test.go:67: uploader #4: 21:26:10.594713 upload.go:111: Uploaded 2024-07-03.json to "http://127.0.0.1:53164/2024-07-03"
    run_test.go:67: uploader #3: 21:26:10.594179 upload.go:80: Failed to acquire lock C:\Users\swarming\.swarming\w\ir\x\t\TestRun_Concurrent3567831069\001\upload\2024-07-03.json.lock: open C:\Users\swarming\.swarming\w\ir\x\t\TestRun_Concurrent3567831069\001\upload\2024-07-03.json.lock: The file exists.
    run_test.go:67: uploader #4: 21:26:10.595434 upload.go:80: Failed to acquire lock C:\Users\swarming\.swarming\w\ir\x\t\TestRun_Concurrent3567831069\001\upload\2024-07-10.json.lock: open C:\Users\swarming\.swarming\w\ir\x\t\TestRun_Concurrent3567831069\001\upload\2024-07-10.json.lock: The file exists.
    run_test.go:67: uploader #1: 21:26:10.595947 upload.go:111: Uploaded 2024-07-10.json to "http://127.0.0.1:53164/2024-07-10"
    run_test.go:67: uploader #2: 21:26:10.595947 upload.go:111: Uploaded 2024-07-03.json to "http://127.0.0.1:53164/2024-07-03"
    run_test.go:597: uploaded 21 total observations, want 14
--- FAIL: TestRun_Concurrent (0.33s)

watchflakes

@gopherbot gopherbot reopened this Jul 12, 2024
@findleyr
Copy link
Contributor

Ah, I real race. Looking.

@gopherbot
Copy link
Contributor Author

Change https://go.dev/cl/598036 mentions this issue: internal/upload: fix upload locking

@findleyr
Copy link
Contributor

With x/tools/cmd/stress, I was able to reproduce this failure easily.

The CL above fixes the original locking logic to serve its intended purpose. But many more races exist around deleting counter files:

  • Counter files can be missing or empty when creating the report
  • Counter files can be missing or empty when computing the counter date span.

The fix above may reduce the flake rate to undetectable levels, but uploading is not concurrency safe. We need lock around the entire process of collecting, parsing, and aggregating reports, not just the final upload step, or give up on making uploading concurrency safe and rely on the Go command's existing guards.

@findleyr findleyr modified the milestones: Unreleased, Go1.24 Jul 12, 2024
gopherbot pushed a commit to golang/telemetry that referenced this issue Jul 12, 2024
The upload locking logic left a brief race that could lead to duplicate
uploads, as the existence of an uploaded report was checked before
locking, not after. Fix this.

As described in golang/go#67737, more races exist that could lead to
broken or partial uploads, but this should prevent overcounting uploads,
as was originally intended.

This closes golang/go#65970, as it is the last thing I feel comfortable
doing relative to the upload process. More refactoring will have to wait
until 1.24.

For golang/go#67737
Fixes golang/go#65970

Change-Id: Iadd58402bbe3fb32b4daf479d8d800eaef47c370
Reviewed-on: https://go-review.googlesource.com/c/telemetry/+/598036
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
Auto-Submit: Robert Findley <rfindley@google.com>
@gopherbot
Copy link
Contributor Author

Found new dashboard test flakes for:

#!watchflakes
default <- pkg == "golang.org/x/telemetry/internal/upload" && test == "TestRun_Concurrent"
2024-07-12 20:54 x_telemetry-go1.21-linux-ppc64_power10 telemetry@1f628dd3 release-branch.go1.21@e073febe x/telemetry/internal/upload.TestRun_Concurrent (log)
=== RUN   TestRun_Concurrent
    run_test.go:67: uploader #1: 20:55:04.072748 findwork.go:35: Finding work: mode on asof 2023-07-13 00:00:00 +0000 UTC
    run_test.go:67: uploader #1: 20:55:04.073105 findwork.go:50: Collecting count file upload.test-devel-linux-ppc64-2024-06-21.v1.count
    run_test.go:67: uploader #1: 20:55:04.073283 findwork.go:50: Collecting count file upload.test-devel-linux-ppc64-2024-06-22.v1.count
    run_test.go:67: uploader #1: 20:55:04.073329 findwork.go:50: Collecting count file upload.test-devel-linux-ppc64-2024-06-23.v1.count
    run_test.go:67: uploader #1: 20:55:04.073545 findwork.go:50: Collecting count file upload.test-devel-linux-ppc64-2024-06-24.v1.count
    run_test.go:67: uploader #1: 20:55:04.073811 findwork.go:50: Collecting count file upload.test-devel-linux-ppc64-2024-06-25.v1.count
    run_test.go:67: uploader #1: 20:55:04.073926 findwork.go:50: Collecting count file upload.test-devel-linux-ppc64-2024-06-26.v1.count
    run_test.go:67: uploader #1: 20:55:04.074127 findwork.go:50: Collecting count file upload.test-devel-linux-ppc64-2024-06-27.v1.count
    run_test.go:67: uploader #1: 20:55:04.074325 findwork.go:50: Collecting count file upload.test-devel-linux-ppc64-2024-06-28.v1.count
...
    run_test.go:67: uploader #0: 20:55:04.082128 findwork.go:97: Already uploaded: 2024-06-25.json
    run_test.go:67: uploader #0: 20:55:04.082260 findwork.go:97: Already uploaded: 2024-07-02.json
    run_test.go:67: uploader #0: 20:55:04.082277 findwork.go:97: Already uploaded: 2024-07-09.json
    run_test.go:67: uploader #0: 20:55:04.082335 reports.go:34: Last week: 2024-07-09, today: 2024-07-12
    run_test.go:67: uploader #0: 20:55:04.082350 run.go:157: Uploading 2 reports
    run_test.go:67: uploader #0: 20:55:04.082375 upload.go:54: open /home/swarming/.swarming/w/ir/x/t/TestRun_Concurrent1538619139/001/local/2024-07-02.json: no such file or directory reading /home/swarming/.swarming/w/ir/x/t/TestRun_Concurrent1538619139/001/local/2024-07-02.json
    run_test.go:67: uploader #0: 20:55:04.082402 upload.go:54: open /home/swarming/.swarming/w/ir/x/t/TestRun_Concurrent1538619139/001/local/2024-07-09.json: no such file or directory reading /home/swarming/.swarming/w/ir/x/t/TestRun_Concurrent1538619139/001/local/2024-07-09.json
    run_test.go:67: uploader #4: 20:55:04.082242 upload.go:54: open /home/swarming/.swarming/w/ir/x/t/TestRun_Concurrent1538619139/001/local/2024-07-09.json: no such file or directory reading /home/swarming/.swarming/w/ir/x/t/TestRun_Concurrent1538619139/001/local/2024-07-09.json
    run_test.go:589: unexpected end of JSON input
--- FAIL: TestRun_Concurrent (0.04s)
2024-07-12 20:54 x_telemetry-go1.21-linux-ppc64_power8 telemetry@1f628dd3 release-branch.go1.21@e073febe x/telemetry/internal/upload.TestRun_Concurrent (log)
=== RUN   TestRun_Concurrent
    run_test.go:67: uploader #4: 20:55:14.969167 findwork.go:35: Finding work: mode on asof 2023-07-13 00:00:00 +0000 UTC
    run_test.go:67: uploader #4: 20:55:14.969462 findwork.go:50: Collecting count file upload.test-devel-linux-ppc64-2024-06-21.v1.count
    run_test.go:67: uploader #4: 20:55:14.969676 findwork.go:50: Collecting count file upload.test-devel-linux-ppc64-2024-06-22.v1.count
    run_test.go:67: uploader #4: 20:55:14.969895 findwork.go:50: Collecting count file upload.test-devel-linux-ppc64-2024-06-23.v1.count
    run_test.go:67: uploader #4: 20:55:14.969950 findwork.go:50: Collecting count file upload.test-devel-linux-ppc64-2024-06-24.v1.count
    run_test.go:67: uploader #4: 20:55:14.970206 findwork.go:50: Collecting count file upload.test-devel-linux-ppc64-2024-06-25.v1.count
    run_test.go:67: uploader #4: 20:55:14.970444 findwork.go:50: Collecting count file upload.test-devel-linux-ppc64-2024-06-26.v1.count
    run_test.go:67: uploader #4: 20:55:14.970677 findwork.go:50: Collecting count file upload.test-devel-linux-ppc64-2024-06-27.v1.count
    run_test.go:67: uploader #4: 20:55:14.970875 findwork.go:50: Collecting count file upload.test-devel-linux-ppc64-2024-06-28.v1.count
...
    run_test.go:67: uploader #0: 20:55:14.980839 reports.go:34: Last week: , today: 2024-07-12
    run_test.go:67: uploader #0: 20:55:14.980852 run.go:157: Uploading 3 reports
    run_test.go:67: uploader #3: 20:55:14.981444 upload.go:111: Uploaded 2024-07-07.json to "http://127.0.0.1:44341/2024-07-07"
    run_test.go:67: uploader #3: 20:55:14.981515 upload.go:80: Failed to acquire lock /home/swarming/.swarming/w/ir/x/t/TestRun_Concurrent2137502188/001/upload/2024-06-23.json.lock: open /home/swarming/.swarming/w/ir/x/t/TestRun_Concurrent2137502188/001/upload/2024-06-23.json.lock: file exists
    run_test.go:67: uploader #0: 20:55:14.981856 upload.go:111: Uploaded 2024-06-23.json to "http://127.0.0.1:44341/2024-06-23"
    run_test.go:67: uploader #0: 20:55:14.981921 upload.go:80: Failed to acquire lock /home/swarming/.swarming/w/ir/x/t/TestRun_Concurrent2137502188/001/upload/2024-06-30.json.lock: open /home/swarming/.swarming/w/ir/x/t/TestRun_Concurrent2137502188/001/upload/2024-06-30.json.lock: file exists
    run_test.go:67: uploader #0: 20:55:14.981954 upload.go:54: open /home/swarming/.swarming/w/ir/x/t/TestRun_Concurrent2137502188/001/local/2024-07-07.json: no such file or directory reading /home/swarming/.swarming/w/ir/x/t/TestRun_Concurrent2137502188/001/local/2024-07-07.json
    run_test.go:67: uploader #3: 20:55:14.982016 upload.go:111: Uploaded 2024-06-30.json to "http://127.0.0.1:44341/2024-06-30"
    run_test.go:589: unexpected end of JSON input
--- FAIL: TestRun_Concurrent (0.08s)
2024-07-12 20:54 x_telemetry-go1.22-linux-arm telemetry@1f628dd3 release-branch.go1.22@4e548f2c x/telemetry/internal/upload.TestRun_Concurrent (log)
=== RUN   TestRun_Concurrent
    run_test.go:67: uploader #4: 20:59:25.420736 findwork.go:35: Finding work: mode on asof 2023-07-13 00:00:00 +0000 UTC
    run_test.go:67: uploader #4: 20:59:25.420830 findwork.go:50: Collecting count file upload.test-devel-linux-arm-2024-06-21.v1.count
    run_test.go:67: uploader #4: 20:59:25.420876 findwork.go:50: Collecting count file upload.test-devel-linux-arm-2024-06-22.v1.count
    run_test.go:67: uploader #4: 20:59:25.420917 findwork.go:50: Collecting count file upload.test-devel-linux-arm-2024-06-23.v1.count
    run_test.go:67: uploader #4: 20:59:25.420973 findwork.go:50: Collecting count file upload.test-devel-linux-arm-2024-06-24.v1.count
    run_test.go:67: uploader #4: 20:59:25.421017 findwork.go:50: Collecting count file upload.test-devel-linux-arm-2024-06-25.v1.count
    run_test.go:67: uploader #4: 20:59:25.421058 findwork.go:50: Collecting count file upload.test-devel-linux-arm-2024-06-26.v1.count
    run_test.go:67: uploader #4: 20:59:25.421096 findwork.go:50: Collecting count file upload.test-devel-linux-arm-2024-06-27.v1.count
    run_test.go:67: uploader #4: 20:59:25.421141 findwork.go:50: Collecting count file upload.test-devel-linux-arm-2024-06-28.v1.count
...
    run_test.go:67: uploader #4: 20:59:25.425038 upload.go:111: Uploaded 2024-07-02.json to "http://127.0.0.1:34677/2024-07-02"
    run_test.go:67: uploader #1: 20:59:25.425373 upload.go:111: Uploaded 2024-07-02.json to "http://127.0.0.1:34677/2024-07-02"
    run_test.go:67: uploader #3: 20:59:25.429488 findwork.go:35: Finding work: mode on asof 2023-07-13 00:00:00 +0000 UTC
    run_test.go:67: uploader #3: 20:59:25.429565 findwork.go:97: Already uploaded: 2024-06-25.json
    run_test.go:67: uploader #3: 20:59:25.429590 findwork.go:97: Already uploaded: 2024-07-02.json
    run_test.go:67: uploader #3: 20:59:25.429601 findwork.go:97: Already uploaded: 2024-07-09.json
    run_test.go:67: uploader #3: 20:59:25.429635 reports.go:34: Last week: 2024-07-09, today: 2024-07-12
    run_test.go:67: uploader #3: 20:59:25.429652 run.go:157: Uploading 0 reports
    run_test.go:597: uploaded 21 total observations, want 14
--- FAIL: TestRun_Concurrent (0.07s)

watchflakes

@gopherbot
Copy link
Contributor Author

Found new dashboard test flakes for:

#!watchflakes
default <- pkg == "golang.org/x/telemetry/internal/upload" && test == "TestRun_Concurrent"
2024-07-19 15:28 x_telemetry-go1.22-linux-riscv64 telemetry@4b97e47a release-branch.go1.22@4c50f916 x/telemetry/internal/upload.TestRun_Concurrent (log)
=== RUN   TestRun_Concurrent
    run_test.go:67: uploader #2: 23:39:36.751468 findwork.go:35: Finding work: mode on asof 2023-07-20 00:00:00 +0000 UTC
    run_test.go:67: uploader #2: 23:39:36.751873 findwork.go:50: Collecting count file upload.test-devel-linux-riscv64-2024-06-28.v1.count
    run_test.go:67: uploader #2: 23:39:36.752110 findwork.go:50: Collecting count file upload.test-devel-linux-riscv64-2024-06-29.v1.count
    run_test.go:67: uploader #2: 23:39:36.752332 findwork.go:50: Collecting count file upload.test-devel-linux-riscv64-2024-06-30.v1.count
    run_test.go:67: uploader #2: 23:39:36.752548 findwork.go:50: Collecting count file upload.test-devel-linux-riscv64-2024-07-01.v1.count
    run_test.go:67: uploader #2: 23:39:36.752787 findwork.go:50: Collecting count file upload.test-devel-linux-riscv64-2024-07-02.v1.count
    run_test.go:67: uploader #2: 23:39:36.753004 findwork.go:50: Collecting count file upload.test-devel-linux-riscv64-2024-07-03.v1.count
    run_test.go:67: uploader #2: 23:39:36.753226 findwork.go:50: Collecting count file upload.test-devel-linux-riscv64-2024-07-04.v1.count
    run_test.go:67: uploader #2: 23:39:36.753433 findwork.go:50: Collecting count file upload.test-devel-linux-riscv64-2024-07-05.v1.count
...
    run_test.go:67: uploader #2: 23:39:36.774268 reports.go:268: Created 2024-07-18.json, deleting 1 count files
    run_test.go:67: uploader #2: 23:39:36.774510 reports.go:113: remove /home/swarming/.swarming/w/ir/x/t/TestRun_Concurrent1546992897/001/local/upload.test-devel-linux-riscv64-2024-07-11.v1.count: no such file or directory failed to remove /home/swarming/.swarming/w/ir/x/t/TestRun_Concurrent1546992897/001/local/upload.test-devel-linux-riscv64-2024-07-11.v1.count
    run_test.go:67: uploader #2: 23:39:36.774722 reports.go:68: Ready to upload: 2024-07-18.json
    run_test.go:67: uploader #2: 23:39:36.774879 run.go:157: Uploading 3 reports
    run_test.go:67: uploader #2: 23:39:36.775131 upload.go:54: open /home/swarming/.swarming/w/ir/x/t/TestRun_Concurrent1546992897/001/local/2024-07-04.json: no such file or directory reading /home/swarming/.swarming/w/ir/x/t/TestRun_Concurrent1546992897/001/local/2024-07-04.json
    run_test.go:67: uploader #2: 23:39:36.775350 upload.go:54: open /home/swarming/.swarming/w/ir/x/t/TestRun_Concurrent1546992897/001/local/2024-07-11.json: no such file or directory reading /home/swarming/.swarming/w/ir/x/t/TestRun_Concurrent1546992897/001/local/2024-07-11.json
    run_test.go:67: uploader #2: 23:39:36.775585 upload.go:54: open /home/swarming/.swarming/w/ir/x/t/TestRun_Concurrent1546992897/001/local/2024-07-18.json: no such file or directory reading /home/swarming/.swarming/w/ir/x/t/TestRun_Concurrent1546992897/001/local/2024-07-18.json
    run_test.go:589: error unmarshalling uploaded report: unexpected end of JSON input
        contents:
--- FAIL: TestRun_Concurrent (0.27s)

watchflakes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. telemetry x/telemetry issues
Projects
Status: Active
Development

No branches or pull requests

2 participants