
Attempting to fix seemingly random failures on CI #224

Merged: 15 commits merged into main on Jan 11, 2024

Conversation

dotsdl (Member) commented Jan 9, 2024

Unfortunately, I am unable to reproduce failures like this locally. My working hypothesis is that our use of fixtures is impacting us in hard-to-pin-down ways, so I am scoping them down gradually here.

codecov-commenter commented Jan 9, 2024

Codecov Report

Attention: 1 line in your changes is missing coverage. Please review.

Comparison: base (fe8551d) 82.19% compared to head (6b451a3) 81.75%.

File | Patch % | Missing Lines
alchemiscale/interface/client.py | 50.00% | 1 ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #224      +/-   ##
==========================================
- Coverage   82.19%   81.75%   -0.45%     
==========================================
  Files          23       23              
  Lines        2937     2937              
==========================================
- Hits         2414     2401      -13     
- Misses        523      536      +13     


dotsdl (Member, Author) commented Jan 10, 2024

After much gnashing of teeth, I think I have narrowed down the cause of the random CI failures to a race condition. My hypothesis, which has yet to be invalidated:

  • the call to user_client.create_network completes, but when user_client.get_network_transformations is called later, no Transformation ScopedKeys are returned, suggesting that the server-side query doesn't see them yet
  • adding a while loop to call get_network_transformations repeatedly until Transformations are populated appears to address the issue
    • mostly: there are cases where the Transformation objects are present, but their connecting nodes aren't fully populated yet
  • it remains unclear how this race condition could arise; it may be that the calls to Neo4j that create the AlchemicalNetwork via py2neo return before Neo4j has finished creating all nodes and relationships in the DB, and the low amount of processing power allocated to the CI worker means this creation may still be incomplete when the next call in the tests occurs

The solution may be to try pulling the full AlchemicalNetwork, retrying until it succeeds, and only then proceed to the remaining tests. A code comment explaining why this is necessary would be sufficient.
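The polling workaround described above can be sketched as follows. This is a hypothetical illustration: the helper name, timeout, and interval are mine, not the project's code; `user_client.get_network_transformations` follows the name used in this discussion.

```python
import time

def wait_for_transformations(user_client, network_sk,
                             timeout=30.0, interval=0.5):
    """Poll until server-side Transformation creation becomes visible.

    The timeout ensures a genuine failure surfaces as an error instead
    of the test hanging forever. Values here are illustrative.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        tf_sks = user_client.get_network_transformations(network_sk)
        if tf_sks:
            return tf_sks
        time.sleep(interval)
    raise TimeoutError(
        f"Transformations for {network_sk} not visible after {timeout}s"
    )
```

The same pattern could wrap pulling the full AlchemicalNetwork, retrying until all nodes and relationships are populated before the remaining tests run.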

dotsdl (Member, Author) commented Jan 11, 2024

6 successful attempts in a row! I think we might be good. 😁

@dotsdl dotsdl merged commit 697a936 into main Jan 11, 2024
4 checks passed
@dotsdl dotsdl deleted the ci-stochastic-failure-fix branch January 11, 2024 19:00