Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Supported integer suffix for step headings in the user config file - config_reader #473

Closed

Conversation

sverhoeven
Copy link
Contributor

@sverhoeven sverhoeven commented Jun 10, 2022

You are about to submit a new Pull Request. Before continuing make sure you read the contributing guidelines and you comply with the following criteria:

  • You have stick to Python. Talk with us before adding other programming languages to HADDOCK3
  • Your PR is about CNS
  • Your code is well documented: proper docstrings and explanatory comments for those tricky parts
  • You structured the code into small functions as much as possible. You can use classes if there's a (state) purpose
  • code follows our coding style
  • You wrote tests for the new code
  • tox tests pass. Run tox command inside the repository folder
  • -test.cfg examples execute without errors. Inside examples/ run python run_tests.py -b
  • PR does not add any install dependencies unless permission granted by the HADDOCK team
  • PR does not break licensing
  • Your PR is about writing documentation for already existing code 🔥
  • Your PR is about writing tests for already existing code :godmode:

In https://github.com/i-VRESSE/workflow-builder I use a toml library to write the config file. The toml library does not support repeats of the same header. To repeat headers I need to make them unique by appending an index and putting quotes around the module name.

This PR adds support to haddock3 to parse config file like:

[headerone]
val = 10

['headerone.2']
val = 20

['headerone.3']
val = 30

to

{
  "headerone": {'val': 10},
  "headerone.1": {'val': 20},
  "headerone.2": {'val': 30},
}

This config file can be written or loaded by any standard compliant toml library and should make it easier for haddock3 config files to be produced or consumed by other programs.

Parses ['headerone.1'] as [headerone]

This config file can be written or loaded by any standard compliant toml library.
@codecov-commenter
Copy link

codecov-commenter commented Jun 10, 2022

Codecov Report

Merging #473 (b9d74a2) into main (dcb1b6f) will increase coverage by 0.12%.
The diff coverage is 97.29%.

@@            Coverage Diff             @@
##             main     #473      +/-   ##
==========================================
+ Coverage   74.66%   74.78%   +0.12%     
==========================================
  Files         105      105              
  Lines        6926     6963      +37     
==========================================
+ Hits         5171     5207      +36     
- Misses       1755     1756       +1     
Impacted Files Coverage Δ
src/haddock/gear/config_reader.py 95.55% <93.33%> (-0.21%) ⬇️
tests/test_config_reader.py 95.93% <100.00%> (+0.44%) ⬆️
tests/test_gear_config_writer.py 100.00% <100.00%> (ø)

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@joaomcteixeira
Copy link
Member

Hi @sverhoeven

Good addition!

The only caveat I see on this is the difference between the two config styles for users. But I perfectly understand the restraints required by the web interface. It's just a matter that in the official examples we advertise only the one without integers. FYI, we have already a config writer module in case it helps you.

Some technical aspect I am trying to sort out is that our config requires the multiline lists to have an empty line afterwards. I think I need to sort that out before this PR can go in because toml does not have such a requirement.

Another thing, you also need to operate in this function:

def _update_key_number(key, d, sep='.', offset=0):

You need to key.split(".")[0] before the function applies the index. You can add that inside the function itself. I would rather not skip the function in case there is a mix of "with index" and "without index" in the config file. In the later case, the index should be ignored and haddock3 should handle it internally (by the _update_key_number function).

And, I think this is all. 😉

p.s. - an alternative would that the web interface create a toml string (internally) and then quickly remove the header integers before saving the string to the file. I don't know if this is possible in the web interface. But, it's an idea. With the python json module, for example, it is possible to generate the string before saving it to the file.

Cheers,

Had to use separate regexps to make sure digit in config file is disregarded.
@sverhoeven
Copy link
Contributor Author

The test_read_config test has been expanded to test mix of "with index" and "without index".
Added a short description in the module doc string that headers with index is a thing.

haddock3 and the workflow-builder both need to read and write config files.
it would be nice that both can read "with index" or "without index" and haddock3 only writes "without index" and the workflow-builder only writes "with index" .

@joaomcteixeira
Copy link
Member

Okay, I will accommodate that. We can have:

  1. reading with and without index automatically parsed by the function
  2. writing with or without index by boolean flag.

Sounds good? ☺️

Internally, the dictionaries entering the writing function have indexes (by obviously reasons). In writing, indexes are removed here:

module_key = get_module_name(key)

for example:

>>> get_module_name("topoaa.1")
"topoaa"

@rvhonorato
Copy link
Member

Thanks for the contribution @sverhoeven, having haddock3 support toml out of the box will be very helpful for future integrations!

@rvhonorato
Copy link
Member

it would be nice that both can read "with index" or "without index" and haddock3 only writes "without index" and the workflow-builder only writes "with index" .

@joaomcteixeira are you working on this or should I try?

@rvhonorato rvhonorato added users feature New feature request labels Jul 29, 2022
@rvhonorato rvhonorato added this to the v3.0.0 stable release milestone Jul 29, 2022
@joaomcteixeira
Copy link
Member

You can work on it.

Copy link
Member

@joaomcteixeira joaomcteixeira left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have now reviewed this pull request. Sorry for the long delay, as it was still "draft" I thought it was still a work in progress. It seems very okay to merge. Thanks!

@joaomcteixeira joaomcteixeira marked this pull request as ready for review August 1, 2022 09:41
@joaomcteixeira
Copy link
Member

it would be nice that both can read "with index" or "without index" and haddock3 only writes "without index" and the workflow-builder only writes "with index" .

I would leave this for another Pull Request following #518

We can work with toml reading edited strings instead of files directly, i suggest.

@joaomcteixeira joaomcteixeira changed the title Support toml with repeated modules Supported integer suffix for step headings in the user config file - config_reader Aug 1, 2022
@joaomcteixeira joaomcteixeira added enhancement Enhancing an existing feature of adding a new one and removed users feature New feature request labels Aug 1, 2022
@joaomcteixeira
Copy link
Member

lint issues above are addressed here: #523

@sverhoeven
Copy link
Contributor Author

Thanks for fixing the linting errors, I have merged main and now CI and tox -e lint are green.

@joaomcteixeira
Copy link
Member

Hi @sverhoeven

a quick question before merging. Why do the headers need to be quoted to have the indexes? Is it because to avoid conflicts with parameters that are dictionaries? If so, I think we can safely edit the regexes because no parameters start with digits. Let me know, Cheers.

@sverhoeven
Copy link
Contributor Author

Hi @sverhoeven

a quick question before merging. Why do the headers need to be quoted to have the indexes? Is it because to avoid conflicts with parameters that are dictionaries? If so, I think we can safely edit the regexes because no parameters start with digits. Let me know, Cheers.

TOML libraries treat . in table name with and without quotes differently.

For example

[foo]
x = 1

['foo.1']
x = 2

[bar]
y = 1

[bar.1]
y = 2

will be parsed as

{
    "foo": {"x": 1}, 
    "foo.1": {"x": 2}, 
    "bar": {
        "y": 1, 
        "1": {"y": 2}
    }
 }

Having [bar] and [bar.1] will give keys on different levels, while with quotes the keys are on same level.
So using quotes gives a data structure, more like what the parser does with repeated table names (without digits or qoutes).

@joaomcteixeira joaomcteixeira mentioned this pull request Aug 7, 2022
9 tasks
joaomcteixeira added a commit to joaomcteixeira/haddock3 that referenced this pull request Aug 8, 2022
Co-authored-by: Stefan Verhoeven <stefan.verhoeven@gmail.com>
@joaomcteixeira
Copy link
Member

closed by #539
Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Enhancing an existing feature of adding a new one
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants