Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

7 feature request argument parsing #8

Conversation

schackartk
Copy link
Contributor

Overview

This pull request implements argparse so that the user is less likely to need to edit source code in main.py. However, more work will need to be done to include parameters related to environmental condotions like ph, etc.

Other than implementing argparse, functionality is the same. Some things are still a bit awkward because I didn't want to change too much beyond that.

Affected files

The following files have changes:

  • main.py: add shebang line, add argument parsing and validation
  • automate_tests.sh: update arguments to align with argparse
  • README.md: describe current functionality and arguments

Notes

main.py

There were a few things that may need to be changed to work most efficiently and predictably.

The relationship between --ligand, --ligand_type, and --ligand_seq is a bit complex and can probably be improved. Ideally, I think --ligand would be optional, yielding a default of None. This makes more sense than having to use --ligand False. Then --ligand_type, and --ligand_seq could also be optional with a default of None (instead of an empty string). Only when --ligand is present, you validate the others are there and if not parser.error(). I also think the authors should consider if --ligand_seq is truly required if --l;igand is either 'peptide', 'DNA' or 'RNA'. Currently this is enforced (by parser.error()), but if it is actually optional, that should be updated.

I left the code that uses different params based on whether it is run as local or cluster, but I am not sure if it is necessary. I especially think that the hard-coded paths used when it is cluster should be removed, and turned into arguments. In which case, it is the same as the usual arguments, and may make --device obsolete if there is no difference between local and cluster.

I implemented wildcards to help the user find their MMB paths (lib and executable) within the --mmb_dir and --mmb. I am hoping the defaults will make it so users don't have to change this argument.

I removed the operating system argument and instead used platform to detect it. This new implementation has only been tested on my WSL system, so please check this works. One issue is if the result of platform.system().lower() doesn't match an expected value on mac. Initially mine returned Linux, which is why I ran lower() to make it 'linux' which is compatible with the previous implementation.

Lots of argument validation now happens in get_args(), so hopefully more helpful error messages are produced.

I added a feature so that both --aptamer and --ligand_seq can be names of files. In that case, the file contents are read in and used as the sequences. Literal strings can still be used instead of file names.

Readme.md

I hope my additions are helpful in describing the current functionality.

One thing I was uncertain is the description of ligand type saying "(default: Amber14)" I didn't see this anywhere that params were set. It is not the default to any arguments I set up. If this needs to be a default, please take note of this.

Conclusions

Currently, all modes in automate_tests.sh run for me, so it seems that these changes are compatible. It would be great to have unit and integration tests with pytest to confirm.

Please check that it works on MacOS still, as I have only tested on WSL.

No additional dependencies have been added, only core libraries were used.

Please feel free to make any changes you see fit or discuss!

@taoliu032
Copy link
Collaborator

taoliu032 commented Mar 13, 2022

@schackartk - Thanks for putting in the work! Very busy week - have you tested the code (such as running the automated test script) after incorporating these functions?

Edit: never mind, I just saw your "Conclusions" that you had tested the automated tests

@taoliu032
Copy link
Collaborator

Thank you so much for your updated code!!! I have several questions/comments regarding your changes:

The relationship between --ligand, --ligand_type, and --ligand_seq is a bit complex and can probably be improved. Ideally, I think --ligand would be optional, yielding a default of None. This makes more sense than having to use --ligand False. Then --ligand_type, and --ligand_seq could also be optional with a default of None (instead of an empty string). Only when --ligand is present, you validate the others are there and if not parser.error(). I also think the authors should consider if --ligand_seq is truly required if --l;igand is either 'peptide', 'DNA' or 'RNA'. Currently this is enforced (by parser.error()), but if it is actually optional, that should be updated.

  • I used camelCase for flag names like --ligandSeq, do you think it is better to use underscore notation like "--ligand_seq"? Perhaps I should've changed --run_num to --runNum to be consistent...
  • About --ligand, --ligand_type, and --ligand_seq: I agree their dependency can be improved, 100% concur.
    • When I designed these arguments, I wanted a user to be explicit about their systems - I think it helps a user including me to be clear about what they are about to simulate. Therefore I didn't make the three arguments optional, even in the case where a ligand is not required.
    • I chose their default values a bit arbitrarily, e.g., --ligand has a default value of string 'False' rather than None or boolean False. And I used these default values in the 'if' statements later in the code. The default values could be changed to None; if so, we will update several places in the rest of the code.
    • Yes, I think the --ligand_seq is necessary for peptide or DNA or RNA because we can make use of their primary structures. We ask users to provide a pdb file of ligand structure which should contain its sequence, if applicable; so the users can easily find out the sequence, which I think is safer and easier than writing a script to detect the sequence, especially given the range of pdb file format diversity.
    • I will think more about these three arguments, depending on any future features related to them. It's exciting that they are taken care of in the CL in a much better way now! Thank you!

I left the code that uses different params based on whether it is run as local or cluster, but I am not sure if it is necessary. I especially think that the hard-coded paths used when it is cluster should be removed, and turned into arguments. In which case, it is the same as the usual arguments, and may make --device obsolete if there is no difference between local and cluster.

  • Yeah you are right - we don't have to specify the file paths for 'local' and 'cluster' separately. I made the params['device'] to account for any difference between my computer (macos) and our cluster (linux), which is necessary for a module in the component to find its library files.
  • In practice, when a user needs to customize the pipeline for themselves (such as using an alternative package for a module), they may encounter other issues including difference between their local computers and their clusters, if any. Therefore, even if --device option seems redundant, I would recommend keeping it.
  • But I think it's a good idea to make file paths as arguments! Furthermore, we were thinking about making a yaml file to specify the input configurations - this could be another feature to add in the future.

I implemented wildcards to help the user find their MMB paths (lib and executable) within the --mmb_dir and --mmb. I am hoping the defaults will make it so users don't have to change this argument.

  • That is a good idea! The released MMB packages have gradually converged to a consistent way of organizing the files, but the MMB executable could be named just "MMB" or "MMB.2_17.Linux" for example. Therefore, I appended an extra "*" for MMB executable path as shown in my review comment.

I removed the operating system argument and instead used platform to detect it. This new implementation has only been tested on my WSL system, so please check this works. One issue is if the result of platform.system().lower() doesn't match an expected value on mac. Initially mine returned Linux, which is why I ran lower() to make it 'linux' which is compatible with the previous implementation.

  • I like the idea of detecting platform OS. However, when I ran platform.system() on my macos computer, it returned 'Darwin' 😆, nowhere near what I wanted it to be (something like 'macos'). So perhaps I would just keep it as an explicit string, because if a user sets out to run this simulation pipeline somewhere, the platform OS is unlikely to change very often. To clarify: it returned "WSL" or "Linux" on your WSL system?

Lots of argument validation now happens in get_args(), so hopefully more helpful error messages are produced.

  • Yeah, those error messages look good!

I added a feature so that both --aptamer and --ligand_seq can be names of files. In that case, the file contents are read in and used as the sequences. Literal strings can still be used instead of file names.

  • Nice!

I also reviewed the code, which should show up below here. Learned a lot from your code!

Copy link
Collaborator

@taoliu032 taoliu032 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First off, thank you again for putting in the time! I really like the help message you included in the README! Second, I added comments (and found a little bug) for this PR. Some changes will necessitate changes to other part of the pipeline as well - which is relatively easy to me.

main.py Show resolved Hide resolved
main.py Show resolved Hide resolved
main.py Show resolved Hide resolved
main.py Show resolved Hide resolved
main.py Show resolved Hide resolved
main.py Show resolved Hide resolved
main.py Show resolved Hide resolved
main.py Show resolved Hide resolved
README.MD Show resolved Hide resolved
README.MD Show resolved Hide resolved
@taoliu032 taoliu032 added the enhancement New feature or request label Mar 18, 2022
@taoliu032
Copy link
Collaborator

Hi @schackartk - if you are okay with it, I can merge your PR, then make the changes mentioned in my review comments :)

@schackartk
Copy link
Contributor Author

I used camelCase for flag names like --ligandSeq, do you think it is better to use underscore notation like "--ligand_seq"? Perhaps I should've changed --run_num to --runNum to be consistent...

Most python tools will use either snake_case or skewer-case. When using skewer case, the - is implicity converted to _ by argparse. So for instance if you have an argument called --out-dir, it is accessed using args.out_dir since snakecase is the community standard in Python. You can use camelCase, is it syntactically correct, but it propagates into the code, such as args.ligandSeq, which is not Pythonic.

@schackartk
Copy link
Contributor Author

Hi @schackartk - if you are okay with it, I can merge your PR, then make the changes mentioned in my review comments :)

Go for it! I am glad to see you found my changes useful.

@taoliu032
Copy link
Collaborator

I used camelCase for flag names like --ligandSeq, do you think it is better to use underscore notation like "--ligand_seq"? Perhaps I should've changed --run_num to --runNum to be consistent...

Most python tools will use either snake_case or skewer-case. When using skewer case, the - is implicity converted to _ by argparse. So for instance if you have an argument called --out-dir, it is accessed using args.out_dir since snakecase is the community standard in Python. You can use camelCase, is it syntactically correct, but it propagates into the code, such as args.ligandSeq, which is not Pythonic.

Thanks for explaining it! I wasn't a big fan of snake_case because the "_" not long takes space but also requires pressing two keys (shift and -) to type lol
But it is a bit easier to read than camelCase, and since it is the community standard, I shall follow the convention as well :)

@taoliu032
Copy link
Collaborator

Hi @schackartk - if you are okay with it, I can merge your PR, then make the changes mentioned in my review comments :)

Go for it! I am glad to see you found my changes useful.

Awesome! Oh yes, super useful! I will wait for you to take a look at the only remaining review comment above about aptamerSeq. Then I will process the merging. Thanks a lot again!!!

Copy link
Collaborator

@taoliu032 taoliu032 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will approve these changes, then I will make the further changes mentioned in the comments and discussions. Thanks again!

@taoliu032 taoliu032 merged commit 945e479 into siminegroup:7-feature-request-argument-parsing May 3, 2022
taoliu032 added a commit that referenced this pull request May 3, 2022
Complete the changes mentioned in the comments of PR#8: #8
taoliu032 added a commit that referenced this pull request May 13, 2022
* add shebang

* add argument parsing

* begin validating arguments

* fix small inconsistencies

* update argument formatting

* improve directory handling

* automatically detect OS

* revert to run_num as argument name

* update readme to reflect argparse implementation

* revert to --run_num argument name

* broaden default mmb location flag wildcards

* fix a few typos

* remove accidental extra characters

* remove old command line parsing

* use argparse to validate --mode

=== The commits above were made by schackartk and were merged into the topic branch by the PR #8 ===

* Complete PR#8

Complete the changes mentioned in the comments of the PR #8

* Update README.MD

* Two more fixes in main.py

1. Add an argparse argument to specify OS in command line.
2. Only overwrite the output directory (eg, ".../run1") when `-f/--force` and '--pick_up  No' in command line.
Then update the help message of main.py in the README.MD according to the new main.py

* Update argparse in main.py

* Update argparse in main.py

Co-authored-by: Kenneth Schackart <schackartk1@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Feature Request: Argument parsing
2 participants