Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consolidate config file and dict paths for substitution config #3887

Conversation

valentin-krasontovitsch
Copy link
Contributor

Issue
Resolves #3802

Approach
Small changes here and there to make the acceptance test in question pass.

I recommend reading commit by commit, it will make the review easierer (I suppose)

Pre review checklist

  • Added appropriate release note label
  • PR title captures the intent of the changes, and is fitting for release notes.
  • Commit history is consistent and clean, in line with the contribution guidelines.

Adding labels helps the maintainers when writing release notes. This is the list of release note labels.

Copy link
Contributor

@eivindjahren eivindjahren left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Just need to finish making the tests pass and rebase the "Fix formatting" commit :)

@valentin-krasontovitsch valentin-krasontovitsch force-pushed the fix-subst-config-3802 branch 4 times, most recently from 5b831ee to 2a18f6c Compare September 14, 2022 20:12
@@ -55,7 +55,7 @@ static queue_config_type *queue_config_alloc_empty() {
queue_config->driver_type = NULL_DRIVER;
queue_config->user_mode = false;
queue_config->max_submit = 2; // Default value
queue_config->num_cpu = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this change would be the correct if this code was following normal conventions. However, I think something non-standard is going on. num_cpu=0 is clearly not correct, but perhaps this is signalling some sort of "don't force the number of cpus" behavior. I think we will have to dig a bit to see what is going on before we make a change here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm gonna try to investigate a bit then! thanks for pointing it out.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

found the first hint in src/ert/ensemble_evaluator/builder/_ensemble_builder.py, in the function from_legacy, lines 124 ff:

    num_cpu = res_config.queue_config.num_cpu
    if num_cpu == 0:
        num_cpu = res_config.ecl_config.num_cpu

... with this change, we will never get the number of CPUs from the ECL config. the ecl config uses ecl_util_get_num_cpu found in lib/ecl/ecl_util.cpp, which tries to read the # of cpus from the DATA file, and returns a default value of 1 if it doesn't find nothing in the DATA file.

under what circumstances would num_cpu be zero in queue config? the line I changed is in queue_config_alloc_empty, which is called in queue_config_alloc_default. after queue_config_alloc_empty, there is a call to queue_config_init, which actually tries to parse CPU_NUM from the config content. So this is only the case if CPU_NUM is not given in the config file!

as a short aside, the config dict path is different, it tries to assign the config dict value for NUM_CPU directly. Wonder what happens if that's None 🤔

so let's say NUM_CPU is not present in the config content, and thus in the config file, and say it is set in a DATA file, using whatever mechanism is expected there, so that ecl_config.num_cpu is actually greater than 1. Then we have a problem, as the old behaviour was to prefer the ecl_config number, but after this change, we would just go with 1. So perhaps I should change the above lines into something like

    queue_config_num_cpu = res_config.queue_config.num_cpu
    res_config_num_cpu = res_config.ecl_config.num_cpu
    if queue_config_num_cpu == 1 and res_config_num_cpu >= queue_config_num_cpu:
        num_cpu = res_config_num_cpu
    else:
        num_cpu = queue_config_num_cpu

even though this is kind of verbose and maybe too complicated? but it might reflect the intended logic better?

this also makes me wonder why so far at least three different configs (subst, queue, ecl) need / use a number of CPUs. perhaps this particular issue is something that can be resolved down the line, during our current milestone, in a more meaningful way (then guessing what the logic should be here)...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and we have the same pattern in src/ert/_c_wrappers/job_queue/queue.py, in add_job_from_run_arg:

    num_cpu = res_config.queue_config.num_cpu
    if num_cpu == 0:
        num_cpu = res_config.ecl_config.num_cpu

what do you think about the solution I proposed above, @eivindjahren ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that seem to be all instances of usages of queue_config's num_cpu

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like this code is signaling that 0 means use res_config.ecl_config.num_cpu. That is not great, but I think we should clean that up in a separate PR. Isn't it possible to just recreate how this number is instantiated in this PR and then clean this particular behavior up separately?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'll see if i can get away in this PR without those changes. i think something made me introduce them, but perhaps it was just confusion ; )

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have removed that commit, and rather pushed it with changes as discussed here in #3919

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh and i fixed the test that was failing and triggered me to do the changes in question to work without those changes - basically just needed to make sure that the same value for NUM_CPU was present both in config file and config dict (the former was not the case)

@eivindjahren
Copy link
Contributor

I believe we can merge everything except bb97391 which we will have to investigate further before we do anything.

@valentin-krasontovitsch
Copy link
Contributor Author

@eivindjahren i think i have resolved the problematic commit (removed it and fixed the failing test), and all checks have passed (yeahy), so I would appreciate a hopefully final review - you may focus on the last four commits, if you wish, which can also be rebased to make things more compact

Copy link
Contributor

@eivindjahren eivindjahren left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe your refactoring of the test might be doing something incorrectly. Other than that squash ef7b95e and ef7b95e and I think we are ready to go!

@valentin-krasontovitsch
Copy link
Contributor Author

valentin-krasontovitsch commented Sep 21, 2022

squash ef7b95e and ef7b95e and I think we are ready to go!

i'm sorry that's the same commit twice, probably you meant two different commits here?

ping @eivindjahren

@eivindjahren
Copy link
Contributor

eivindjahren commented Sep 22, 2022

So I am thinking we should have "Write whole config dict for tests" and "Correct test config file writing" in the same commit and we drop "Refactoring in test" and
"Remove breaking line in test". Then we can merge.

@codecov-commenter
Copy link

Codecov Report

Merging #3887 (0078a08) into main (9ac2efd) will increase coverage by 0.03%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main    #3887      +/-   ##
==========================================
+ Coverage   58.56%   58.59%   +0.03%     
==========================================
  Files         545      545              
  Lines       41463    41447      -16     
  Branches     3765     3765              
==========================================
+ Hits        24281    24287       +6     
+ Misses      16181    16160      -21     
+ Partials     1001     1000       -1     
Impacted Files Coverage Δ
src/ert/_c_wrappers/enkf/subst_config.py 92.40% <100.00%> (+0.19%) ⬆️
src/ert/services/_base_service.py 90.04% <0.00%> (-0.46%) ⬇️
src/ert/shared/main.py 80.69% <0.00%> (ø)
src/ert/services/_storage_main.py 26.31% <0.00%> (+0.26%) ⬆️
src/clib/lib/res_util/block_fs.cpp 67.62% <0.00%> (+0.40%) ⬆️
src/ert/gui/tools/plot/plot_window.py 20.11% <0.00%> (+0.65%) ⬆️
src/ert/services/storage_service.py 75.47% <0.00%> (+19.91%) ⬆️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@@ -349,6 +349,7 @@ def test_res_config_dict_constructor(setup_case):
# add a missing entries to config file
with open("user_config.ert", "a+") as ert_file:
ert_file.write("JOB_SCRIPT ../../../script.sh\n")
ert_file.write("NUM_CPU 0\n")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This entry is optional. Why do we need to add it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because in the config dict, it's set to zero, and the config file path sets that value to 1 if it's not given. so it's to make the test pass.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we want the config_dict path to also set it to 1 then?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably, but what i'm trying to say is that there's a config dict config_data_new which literally contains the NUM_CPU key, with value 1, in this test - see line 373 / 374. So since the test sets a specific value in the config dict, it seemed appropriate to set an explicit value in the file as well (in particular since the specific value differs from the config file path default value).

honestly off the top of my head right now i don't know what the config dict path does by default, and i think it's outside the scope of this PR (it's actually in the scope of #3918 and #3919)

i believe fixing these defaults was part of changes that we agreed to put in their own PR

Copy link
Contributor

@eivindjahren eivindjahren left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

SubstConfig has different default when passed with config_dict and filname
3 participants