
Wait for NFS file synchronization #7927

Merged: 1 commit merged on Jun 12, 2024


@DanSava (Contributor) commented May 17, 2024

Issue
Resolves #7788
Resolves #7715

  • PR title captures the intent of the changes, and is fitting for release notes.
  • Added appropriate release note label
  • Commit history is consistent and clean, in line with the contribution guidelines.
  • Make sure tests pass locally (after every commit!)

When applicable

  • When there are user facing changes: Updated documentation
  • New behavior or changes to existing untested code: Ensured that unit tests are added (See Ground Rules).
  • Large PR: Prepare changes in small commits for more convenient review
  • Bug fix: Add regression test for the bug
  • Bug fix: Create Backport PR to latest release

@DanSava DanSava force-pushed the nfs_reliability branch 7 times, most recently from 68e9f2b to 3e4ef61, on May 24, 2024 09:47
@berland berland changed the title "Wait for nsf file synchronization" to "Wait for NFS file synchronization" on May 24, 2024
@DanSava DanSava force-pushed the nfs_reliability branch 2 times, most recently from 6f62342 to e3c98e6, on May 27, 2024 07:39
@codecov-commenter commented May 27, 2024

Codecov Report

Attention: Patch coverage is 94.64286% with 6 lines in your changes missing coverage. Please review.

Project coverage is 86.57%. Comparing base (37c4d20) to head (f0e1f39).
Report is 5 commits behind head on main.

Files                             Patch %   Lines
src/ert/scheduler/scheduler.py    88.37%    5 Missing ⚠️
src/ert/scheduler/job.py          97.14%    1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7927      +/-   ##
==========================================
+ Coverage   86.51%   86.57%   +0.06%     
==========================================
  Files         383      383              
  Lines       23742    23830      +88     
  Branches      629      635       +6     
==========================================
+ Hits        20540    20632      +92     
+ Misses       3126     3125       -1     
+ Partials       76       73       -3     
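The totals in this diff are self-consistent: on both sides Lines = Hits + Misses + Partials (20540 + 3126 + 76 = 23742 and 20632 + 3125 + 73 = 23830). The short sketch below reproduces the two headline percentages. It assumes Codecov's documented ratio hits / (hits + misses + partials) and its default display rounding (round down to two decimals); `coverage_pct` is a hypothetical helper, not part of ert or Codecov.

```python
import math


def coverage_pct(hits: int, misses: int, partials: int, precision: int = 2) -> float:
    """Return hits / (hits + misses + partials) as a percentage.

    Assumption: the result is truncated (rounded down) to `precision`
    decimal places, mirroring Codecov's default display rounding.
    """
    lines = hits + misses + partials
    factor = 10 ** precision
    return math.floor(hits / lines * 100 * factor) / factor


# Totals taken from the coverage diff above (main vs. #7927).
base = coverage_pct(hits=20540, misses=3126, partials=76)  # 23742 lines
head = coverage_pct(hits=20632, misses=3125, partials=73)  # 23830 lines
print(base, head, round(head - base, 2))  # → 86.51 86.57 0.06
```

Rounding half-up instead would report the head as 86.58%, so the 86.57% in the report is consistent with truncation. The patch figure follows the same way: 6 missing lines at 94.64286% implies 106 of 112 patch lines covered (106/112 ≈ 0.9464286).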


@DanSava DanSava force-pushed the nfs_reliability branch 6 times, most recently from eae7f66 to 1afef4a, on May 28, 2024 07:33
@DanSava DanSava self-assigned this May 28, 2024
@eivindjahren (Contributor) left a comment


Just some initial comments to simplify reading to make it easier to look for errors.

src/_ert_forward_model_runner/runner.py (outdated, resolved)
src/ert/config/ert_config.py (outdated, resolved)
src/ert/ensemble_evaluator/evaluator.py (outdated, resolved)
src/ert/ensemble_evaluator/evaluator.py (outdated, resolved)
src/ert/ensemble_evaluator/evaluator.py (outdated, resolved)
src/ert/scheduler/scheduler.py (outdated, resolved)
src/ert/scheduler/scheduler.py (outdated, resolved)
src/ert/scheduler/scheduler.py (outdated, resolved)
src/ert/scheduler/job.py (outdated, resolved)
src/ert/scheduler/job.py (outdated, resolved)
@DanSava DanSava force-pushed the nfs_reliability branch 2 times, most recently from 9a5a685 to 029556f, on June 3, 2024 11:20
@DanSava DanSava force-pushed the nfs_reliability branch 4 times, most recently from 5513084 to a1543c6, on June 6, 2024 10:23
@DanSava DanSava requested a review from eivindjahren June 6, 2024 11:22
@eivindjahren (Contributor) commented Jun 7, 2024

Looks like one of the tests is flaky:

FAILED tests/unit_tests/ensemble_evaluator/test_scheduler.py::test_scheduler_receives_checksum_and_waits_for_disk_sync - RuntimeError: Task <Task pending name='Task-1839' coro=<Event.wait() running at /opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/asyncio/locks.py:309> cb=[_done_callback() at /opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/site-packages/_ert/async_utils.py:39, _release_waiter(<Future pendi...c6aee8520>()]>)() at /opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/asyncio/tasks.py:429]> got Future <Future pending> attached to a different loop

https://github.com/equinor/ert/actions/runs/9414224549/job/25932738059?pr=7927
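For context on the error above: the failing run uses Python 3.8, where `asyncio.Event()` binds to the loop returned by `get_event_loop()` at construction time. If an Event is created outside the loop that later awaits it (for example in test setup that runs under a different loop), `Event.wait()` creates its future on the wrong loop and the awaiting task raises "got Future ... attached to a different loop". This is a general explanation, not a diagnosis of this particular test. The minimal sketch below shows the usual remedy of creating the Event inside the coroutine driven by `asyncio.run()`; `wait_for_sync` and `main` are hypothetical names.

```python
import asyncio


async def wait_for_sync(ready: asyncio.Event, timeout: float) -> bool:
    """Wait until `ready` is set; return False if `timeout` seconds pass first."""
    try:
        await asyncio.wait_for(ready.wait(), timeout)
    except asyncio.TimeoutError:
        return False
    return True


async def main() -> bool:
    # Create the Event inside the coroutine driven by asyncio.run(), so it is
    # bound to the same loop that later awaits it (on Python 3.8 an Event binds
    # to the loop from get_event_loop() when it is constructed).
    ready = asyncio.Event()
    # Simulate the "files are synced" signal arriving a little later.
    asyncio.get_running_loop().call_later(0.01, ready.set)
    return await wait_for_sync(ready, timeout=1.0)


if __name__ == "__main__":
    print(asyncio.run(main()))  # → True
```

Since Python 3.10 the binding happens lazily on first use, but reusing one Event across separate `asyncio.run()` calls can still fail, so creating such primitives inside the running loop remains the safer pattern.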

@eivindjahren (Contributor) left a comment


Looks good now!

Ensures that the files generated by a job match their expected checksums: it waits for the job's runpath to appear in the scheduler's checksum dictionary, verifies the checksum of each file, and logs any errors or discrepancies.
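The check described above could look roughly like the following minimal sketch. It is not the code merged in this PR: the function names (`verify_checksums`, `md5sum`), the layout of the checksum dictionary (runpath → {relative file name: expected MD5 hex digest}), the choice of MD5, and polling with a timeout are all assumptions made for illustration.

```python
import asyncio
import hashlib
import logging
from pathlib import Path
from typing import Dict

logger = logging.getLogger(__name__)


def md5sum(path: Path) -> str:
    """Return the MD5 hex digest of a file, read in 64 KiB chunks."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


async def verify_checksums(
    checksums: Dict[str, Dict[str, str]],
    runpath: str,
    timeout: float = 60.0,
    poll_interval: float = 0.1,
) -> bool:
    """Hypothetical sketch of the job-side checksum check.

    Waits for `runpath` to show up in `checksums` (assumed layout:
    runpath -> {relative file name: expected MD5 hex digest}), then
    compares each file on disk with its expected digest. Problems are
    logged; the return value says whether every file matched.
    """
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    # Poll until the checksum message for this runpath has been received.
    while runpath not in checksums:
        if loop.time() >= deadline:
            logger.error("No checksum information received for %s", runpath)
            return False
        await asyncio.sleep(poll_interval)

    all_match = True
    for name, expected in checksums[runpath].items():
        path = Path(runpath) / name
        if not path.exists():
            logger.error("File %s does not exist", path)
            all_match = False
        elif md5sum(path) != expected:
            logger.error("Checksum mismatch for %s", path)
            all_match = False
    return all_match
```

Polling keeps the sketch short; a scheduler-side implementation might instead await an `asyncio.Event` that is set when the checksum message arrives. The sketch also checks each file once, whereas a check aimed at NFS lag might keep re-reading a missing or mismatching file until the deadline.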
@DanSava DanSava merged commit c1d2e21 into equinor:main Jun 12, 2024
38 checks passed
@DanSava DanSava deleted the nfs_reliability branch June 12, 2024 12:13
Labels: None yet
Projects: None yet
Participants: 4