7 feature request argument parsing #8

schackartk · 2022-03-09T00:12:46Z

Overview

This pull request implements argparse so that the user is less likely to need to edit source code in main.py. However, more work will need to be done to include parameters related to environmental condotions like ph, etc.

Other than implementing argparse, functionality is the same. Some things are still a bit awkward because I didn't want to change too much beyond that.

Affected files

The following files have changes:

main.py: add shebang line, add argument parsing and validation
automate_tests.sh: update arguments to align with argparse
README.md: describe current functionality and arguments

Notes

main.py

There were a few things that may need to be changed to work most efficiently and predictably.

The relationship between --ligand, --ligand_type, and --ligand_seq is a bit complex and can probably be improved. Ideally, I think --ligand would be optional, yielding a default of None. This makes more sense than having to use --ligand False. Then --ligand_type, and --ligand_seq could also be optional with a default of None (instead of an empty string). Only when --ligand is present, you validate the others are there and if not parser.error(). I also think the authors should consider if --ligand_seq is truly required if --l;igand is either 'peptide', 'DNA' or 'RNA'. Currently this is enforced (by parser.error()), but if it is actually optional, that should be updated.

I left the code that uses different params based on whether it is run as local or cluster, but I am not sure if it is necessary. I especially think that the hard-coded paths used when it is cluster should be removed, and turned into arguments. In which case, it is the same as the usual arguments, and may make --device obsolete if there is no difference between local and cluster.

I implemented wildcards to help the user find their MMB paths (lib and executable) within the --mmb_dir and --mmb. I am hoping the defaults will make it so users don't have to change this argument.

I removed the operating system argument and instead used platform to detect it. This new implementation has only been tested on my WSL system, so please check this works. One issue is if the result of platform.system().lower() doesn't match an expected value on mac. Initially mine returned Linux, which is why I ran lower() to make it 'linux' which is compatible with the previous implementation.

Lots of argument validation now happens in get_args(), so hopefully more helpful error messages are produced.

I added a feature so that both --aptamer and --ligand_seq can be names of files. In that case, the file contents are read in and used as the sequences. Literal strings can still be used instead of file names.

Readme.md

I hope my additions are helpful in describing the current functionality.

One thing I was uncertain is the description of ligand type saying "(default: Amber14)" I didn't see this anywhere that params were set. It is not the default to any arguments I set up. If this needs to be a default, please take note of this.

Conclusions

Currently, all modes in automate_tests.sh run for me, so it seems that these changes are compatible. It would be great to have unit and integration tests with pytest to confirm.

Please check that it works on MacOS still, as I have only tested on WSL.

No additional dependencies have been added, only core libraries were used.

Please feel free to make any changes you see fit or discuss!

taoliu032 · 2022-03-13T12:34:58Z

@schackartk - Thanks for putting in the work! Very busy week - have you tested the code (such as running the automated test script) after incorporating these functions?

Edit: never mind, I just saw your "Conclusions" that you had tested the automated tests

taoliu032 · 2022-03-14T04:57:54Z

Thank you so much for your updated code!!! I have several questions/comments regarding your changes:

The relationship between --ligand, --ligand_type, and --ligand_seq is a bit complex and can probably be improved. Ideally, I think --ligand would be optional, yielding a default of None. This makes more sense than having to use --ligand False. Then --ligand_type, and --ligand_seq could also be optional with a default of None (instead of an empty string). Only when --ligand is present, you validate the others are there and if not parser.error(). I also think the authors should consider if --ligand_seq is truly required if --l;igand is either 'peptide', 'DNA' or 'RNA'. Currently this is enforced (by parser.error()), but if it is actually optional, that should be updated.

I used camelCase for flag names like --ligandSeq, do you think it is better to use underscore notation like "--ligand_seq"? Perhaps I should've changed --run_num to --runNum to be consistent...
About --ligand, --ligand_type, and --ligand_seq: I agree their dependency can be improved, 100% concur.
- When I designed these arguments, I wanted a user to be explicit about their systems - I think it helps a user including me to be clear about what they are about to simulate. Therefore I didn't make the three arguments optional, even in the case where a ligand is not required.
- I chose their default values a bit arbitrarily, e.g., --ligand has a default value of string 'False' rather than None or boolean False. And I used these default values in the 'if' statements later in the code. The default values could be changed to None; if so, we will update several places in the rest of the code.
- Yes, I think the --ligand_seq is necessary for peptide or DNA or RNA because we can make use of their primary structures. We ask users to provide a pdb file of ligand structure which should contain its sequence, if applicable; so the users can easily find out the sequence, which I think is safer and easier than writing a script to detect the sequence, especially given the range of pdb file format diversity.
- I will think more about these three arguments, depending on any future features related to them. It's exciting that they are taken care of in the CL in a much better way now! Thank you!

I left the code that uses different params based on whether it is run as local or cluster, but I am not sure if it is necessary. I especially think that the hard-coded paths used when it is cluster should be removed, and turned into arguments. In which case, it is the same as the usual arguments, and may make --device obsolete if there is no difference between local and cluster.

Yeah you are right - we don't have to specify the file paths for 'local' and 'cluster' separately. I made the params['device'] to account for any difference between my computer (macos) and our cluster (linux), which is necessary for a module in the component to find its library files.
In practice, when a user needs to customize the pipeline for themselves (such as using an alternative package for a module), they may encounter other issues including difference between their local computers and their clusters, if any. Therefore, even if --device option seems redundant, I would recommend keeping it.
But I think it's a good idea to make file paths as arguments! Furthermore, we were thinking about making a yaml file to specify the input configurations - this could be another feature to add in the future.

I implemented wildcards to help the user find their MMB paths (lib and executable) within the --mmb_dir and --mmb. I am hoping the defaults will make it so users don't have to change this argument.

That is a good idea! The released MMB packages have gradually converged to a consistent way of organizing the files, but the MMB executable could be named just "MMB" or "MMB.2_17.Linux" for example. Therefore, I appended an extra "*" for MMB executable path as shown in my review comment.

I removed the operating system argument and instead used platform to detect it. This new implementation has only been tested on my WSL system, so please check this works. One issue is if the result of platform.system().lower() doesn't match an expected value on mac. Initially mine returned Linux, which is why I ran lower() to make it 'linux' which is compatible with the previous implementation.

I like the idea of detecting platform OS. However, when I ran platform.system() on my macos computer, it returned 'Darwin' 😆, nowhere near what I wanted it to be (something like 'macos'). So perhaps I would just keep it as an explicit string, because if a user sets out to run this simulation pipeline somewhere, the platform OS is unlikely to change very often. To clarify: it returned "WSL" or "Linux" on your WSL system?

Lots of argument validation now happens in get_args(), so hopefully more helpful error messages are produced.

Yeah, those error messages look good!

I added a feature so that both --aptamer and --ligand_seq can be names of files. In that case, the file contents are read in and used as the sequences. Literal strings can still be used instead of file names.

Nice!

I also reviewed the code, which should show up below here. Learned a lot from your code!

taoliu032

First off, thank you again for putting in the time! I really like the help message you included in the README! Second, I added comments (and found a little bug) for this PR. Some changes will necessitate changes to other part of the pipeline as well - which is relatively easy to me.

main.py

README.MD

taoliu032 · 2022-03-23T19:01:14Z

Hi @schackartk - if you are okay with it, I can merge your PR, then make the changes mentioned in my review comments :)

schackartk · 2022-03-23T21:52:21Z

I used camelCase for flag names like --ligandSeq, do you think it is better to use underscore notation like "--ligand_seq"? Perhaps I should've changed --run_num to --runNum to be consistent...

Most python tools will use either snake_case or skewer-case. When using skewer case, the - is implicity converted to _ by argparse. So for instance if you have an argument called --out-dir, it is accessed using args.out_dir since snakecase is the community standard in Python. You can use camelCase, is it syntactically correct, but it propagates into the code, such as args.ligandSeq, which is not Pythonic.

schackartk · 2022-03-23T21:58:04Z

Hi @schackartk - if you are okay with it, I can merge your PR, then make the changes mentioned in my review comments :)

Go for it! I am glad to see you found my changes useful.

taoliu032 · 2022-03-24T00:29:00Z

I used camelCase for flag names like --ligandSeq, do you think it is better to use underscore notation like "--ligand_seq"? Perhaps I should've changed --run_num to --runNum to be consistent...

Most python tools will use either snake_case or skewer-case. When using skewer case, the - is implicity converted to _ by argparse. So for instance if you have an argument called --out-dir, it is accessed using args.out_dir since snakecase is the community standard in Python. You can use camelCase, is it syntactically correct, but it propagates into the code, such as args.ligandSeq, which is not Pythonic.

Thanks for explaining it! I wasn't a big fan of snake_case because the "_" not long takes space but also requires pressing two keys (shift and -) to type lol
But it is a bit easier to read than camelCase, and since it is the community standard, I shall follow the convention as well :)

taoliu032 · 2022-03-24T00:30:57Z

Hi @schackartk - if you are okay with it, I can merge your PR, then make the changes mentioned in my review comments :)

Go for it! I am glad to see you found my changes useful.

Awesome! Oh yes, super useful! I will wait for you to take a look at the only remaining review comment above about aptamerSeq. Then I will process the merging. Thanks a lot again!!!

taoliu032

Will approve these changes, then I will make the further changes mentioned in the comments and discussions. Thanks again!

Complete the changes mentioned in the comments of PR#8: #8

* add shebang * add argument parsing * begin validating arguments * fix small inconsistencies * update argument formatting * improve directory handling * automatically detect OS * revert to run_num as argument name * update readme to reflect argparse implementation * revert to --run_num argument name * broaden default mmb location flag wildcards * fix a few typos * remove accidental extra characters * remove old command line parsing * use argparse to validate --mode === The commits above were made by schackartk and were merged into the topic branch by the PR #8 === * Complete PR#8 Complete the changes mentioned in the comments of the PR #8 * Update README.MD * Two more fixes in main.py 1. Add an argparse argument to specify OS in command line. 2. Only overwrite the output directory (eg, ".../run1") when `-f/--force` and '--pick_up No' in command line. Then update the help message of main.py in the README.MD according to the new main.py * Update argparse in main.py * Update argparse in main.py Co-authored-by: Kenneth Schackart <schackartk1@gmail.com>

schackartk added 14 commits March 8, 2022 11:18

add shebang

50ba13f

add argument parsing

d669fa0

begin validating arguments

044d903

fix small inconsistencies

eab3e0f

update argument formatting

97f1d41

improve directory handling

aaaed06

automatically detect OS

944499f

revert to run_num as argument name

4414fa7

update readme to reflect argparse implementation

4f0bc5c

revert to --run_num argument name

19f09eb

broaden default mmb location flag wildcards

40ef553

fix a few typos

49eab6e

remove accidental extra characters

fe379bc

remove old command line parsing

3d3fe1a

schackartk linked an issue Mar 9, 2022 that may be closed by this pull request

Feature Request: Argument parsing #7

Closed

InfluenceFunctional requested a review from taoliu032 March 9, 2022 00:15

This was referenced Mar 9, 2022

Enhancement: Avoid runtime exception when "run" folder exists #10

Closed

Bug: Runtime exception when params['workdir'] does not exist #9

Closed

Enhancement: More control over output location #12

Open

use argparse to validate --mode

aa75cbc

schackartk mentioned this pull request Mar 9, 2022

Bug: Mysterious error when using invalid mode #13

Closed

taoliu032 reviewed Mar 14, 2022

View reviewed changes

taoliu032 added the enhancement New feature or request label Mar 18, 2022

taoliu032 approved these changes May 3, 2022

View reviewed changes

taoliu032 merged commit 945e479 into siminegroup:7-feature-request-argument-parsing May 3, 2022

taoliu032 added a commit that referenced this pull request May 3, 2022

Complete PR#8

006af1f

Complete the changes mentioned in the comments of PR#8: #8

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

7 feature request argument parsing #8

7 feature request argument parsing #8

schackartk commented Mar 9, 2022

taoliu032 commented Mar 13, 2022 •

edited

Loading

taoliu032 commented Mar 14, 2022

taoliu032 left a comment

taoliu032 commented Mar 23, 2022

schackartk commented Mar 23, 2022

schackartk commented Mar 23, 2022

taoliu032 commented Mar 24, 2022

taoliu032 commented Mar 24, 2022

taoliu032 left a comment

7 feature request argument parsing #8

7 feature request argument parsing #8

Conversation

schackartk commented Mar 9, 2022

Overview

Affected files

Notes

main.py

Readme.md

Conclusions

taoliu032 commented Mar 13, 2022 • edited Loading

taoliu032 commented Mar 14, 2022

taoliu032 left a comment

Choose a reason for hiding this comment

taoliu032 commented Mar 23, 2022

schackartk commented Mar 23, 2022

schackartk commented Mar 23, 2022

taoliu032 commented Mar 24, 2022

taoliu032 commented Mar 24, 2022

taoliu032 left a comment

Choose a reason for hiding this comment

taoliu032 commented Mar 13, 2022 •

edited

Loading