End repair #63
* develop: (61 commits) add biopython as a install requirement. move end repair to proper directory. remove touch and move ref_genome_path_list to output file. specify 95% of the machines memory in the config rather than 100% check for fq and fastq convert to using fstrings Cleaned up comments Basic working version Separated argparse into its own function, as discussed in #47 add update to readme instructions. CLI debugging and misc code fixes make sure it selects the correct config for rotary run. add swap sh for subprocess check call. add command line flag for forcing ovewrites with rotary init. more edits to capture entire std err output more code for running snakemake. add call snakemake. modularize Python code. fix command line error when no command is specified. update git ignore recreate scripts folder ... # Conflicts: # .gitignore # rotary/rules/rotary.smk
New hmm download
@jmtsuji I added e4dceff to add the dependencies to the other requirements files. So, three files require dependency updates.
You could technically remove The
The big projects run these steps using GitHub Actions (https://docs.github.com/en/actions) to do the build and upload steps. |
…directory so it can be rerun.
…in sample creation and during rotary init. Changes so it checks the entire file extension even if there are multiple.
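A minimal sketch of what checking the entire file extension could look like, in case it helps later readers of this thread. The extension list and the function names `is_fastq` and `full_extension` are assumptions for illustration, not the code in this commit.

```python
from pathlib import Path

# Hypothetical set of accepted extensions; the list used in rotary may differ.
FASTQ_EXTENSIONS = ('.fastq', '.fq', '.fastq.gz', '.fq.gz')


def is_fastq(path):
    """Return True if the file name ends with one of the FASTQ extensions."""
    name = Path(path).name.lower()
    # str.endswith accepts a tuple, so multi-part extensions are matched whole.
    return name.endswith(FASTQ_EXTENSIONS)


def full_extension(path):
    """Join every suffix of the file name, e.g. '.fastq.gz' for 'reads.fastq.gz'."""
    return ''.join(Path(path).suffixes)
```

Matching with `endswith` is more forgiving than joining `Path.suffixes` when a sample name itself contains dots, since `suffixes` would then include part of the name.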
Great job putting this together! Looks like it's coming together splendidly. I left a lot of comments, so keep going. Right now, your code looks like the following:
samtools_sort_args = [dependency_dict['samtools'], 'sort', '-@', str(threads), '-m', f'{threads_mem_mb}M']
subprocess.run(samtools_sort_args, check=True, input=samtools_view.stdout, stdout=bam_handle, stderr=logfile_handle)
I want to move your code to using So you can just run:
|
What are your thoughts on removing the CLI dependency check and the |
The following code should be moved to a function, as it is repeated multiple times:
if append_log is True:
    write_mode = 'a'
elif append_log is False:
    write_mode = 'w'
else:
    raise ValueError
I would also add a message, e.g. ValueError(fstring) / error.log, to explain why the value error is being raised. |
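For reference, a minimal sketch of the kind of helper being suggested here. The function name `get_write_mode` and the error wording are assumptions, not necessarily what ended up in repair.py.

```python
def get_write_mode(append_log):
    """Translate the append_log flag into a file mode for open().

    The name and the message text are hypothetical; only the True/False
    branching comes from the snippet quoted above.
    """
    if append_log is True:
        return 'a'
    if append_log is False:
        return 'w'
    # Explain why the error is raised instead of raising a bare ValueError.
    raise ValueError(f'append_log must be True or False, but got {append_log!r}')
```

Each call site could then use `open(log_path, get_write_mode(append_log))` rather than repeating the if/elif/else block.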
@LeeBergstrand Thanks for the explanation and for updating the other requirement files! I'll keep this in mind when updating dependencies in the future. Your overview of how the pip and conda installers interact is very helpful. |
Moved to a function |
@LeeBergstrand Actually, based on my previous testing, I am pretty sure that I can already call the underlying programs like samtools without a path, even if shell = False. However, I decided to explicitly set the path to the programs to avoid any weird PATH conflict issues that might happen. Do you think that such path conflicts are basically impossible? If so, I can simplify the code. |
@LeeBergstrand I like the CLI dependency check because it tells the user upfront that a dependency is missing. Otherwise, they might have to run 50% of the pipeline before they run into the missing dependency issue. In future, I am thinking of using the dependency check function to also grab the version of the dependency tool to add to the log. So I would be happy to leave the function in the code... what are your thoughts? |
@LeeBergstrand Thanks! Appreciate the comments. I left several responses and some unresolved threads, so please have a look and let me know your thoughts... we can move forward toward merging this PR based on your feedback. |
Bug fixes: env pathing and file extensions
Given the installation, these pathing issues are generally unlikely. Because Given the above I think we can simplify the code. |
If the end repair pipeline takes a while, I would keep the dependency check in. I think logging the dependency versions would be helpful. |
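A rough sketch of an upfront dependency check that also grabs a version string for the log, as discussed above. The function names are hypothetical, and the `--version` flag is an assumption: it works for many tools, but each dependency would need to be checked individually.

```python
import shutil
import subprocess


def check_dependencies(names):
    """Resolve each program name on PATH and fail early if any are missing."""
    resolved = {name: shutil.which(name) for name in names}
    missing = sorted(name for name, path in resolved.items() if path is None)
    if missing:
        raise FileNotFoundError(f'Missing dependencies: {", ".join(missing)}')
    return resolved


def get_version(program_path, version_flag='--version'):
    """Return the first line a program prints for its version flag.

    Assumes the tool accepts version_flag; some tools print the version to
    STDERR, so fall back to that stream when STDOUT is empty.
    """
    result = subprocess.run([program_path, version_flag],
                            capture_output=True, text=True, check=False)
    lines = (result.stdout or result.stderr).strip().splitlines()
    return lines[0] if lines else ''
```

Running `check_dependencies` before any real work means the user learns about a missing tool immediately, and the resolved paths it returns can be passed to `get_version` when writing the log header.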
… stdin.stdout into function.
OK, got it. I have simplified the code! |
End repair might take several minutes (or longer if a large assembly is input), so let's leave |
* refactor-subcommands: move subprocess logging to function, invert check being default, move stdin.stdout into function. # Conflicts: # rotary/repair.py
@jmtsuji We are pretty close to being able to merge this in—one last issue other than removing the Is this the correct function argument mapping? In the other runs commands, you mapped |
@LeeBergstrand Yes, this is correct. This particular tool writes the log info to STDOUT, so I had to direct STDOUT to the logfile. I wanted to retain STDERR info as well in case anything got printed there. (By the way, this is the tool that I plan to replace with in-house code). |
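To illustrate the mapping described here, a small sketch of one way to send a tool's STDOUT to the logfile while keeping anything printed to STDERR. The function name `run_logged` is hypothetical, and merging STDERR into the same file is only one option; the actual call in repair.py may map the streams differently.

```python
import subprocess


def run_logged(args, log_path, append_log=False):
    """Run a command, writing STDOUT and STDERR to the same logfile."""
    write_mode = 'a' if append_log else 'w'
    with open(log_path, write_mode) as logfile_handle:
        # stderr=subprocess.STDOUT sends STDERR to wherever STDOUT is going.
        subprocess.run(args, check=True,
                       stdout=logfile_handle, stderr=subprocess.STDOUT)
```

With `check=True`, a non-zero exit status raises `subprocess.CalledProcessError`, so a failed run is not silently recorded as a success.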
Great, I wanted to confirm. |
…ers with --verbose)
@LeeBergstrand Dependency dict and logger changes are addressed. End-to-end tests are passing. OK to merge? |
@jmtsuji Great. Feel free to merge. |
Addresses #58 so that repair.py is installed as part of rotary and is integrated into the snakemake workflow. Also includes usage of TIGRFAM HMMs.
@LeeBergstrand would you mind taking a look before this PR is merged? (Thanks so much for your support for the install of repair.py!!)