Remove docker command wrappers; CPSR refactor #193

Merged
merged 33 commits into from Jul 11, 2022

Conversation

pdiakumis
Collaborator

@pdiakumis pdiakumis commented Jul 3, 2022

  • Cleaning up the codebase to get rid of the ('legacy') docker wrappers.
  • Cleaned up the CPSR script in the process. Refactored its functions into the other files to fit with the PCGR modularisation.

Still got a bit of testing to do during the week so keeping this as a draft PR.

TODO:

  • Remove the rm -f calls, which scare me a bit. We can replace them with os.remove().
  • Test CPSR
  • Test PCGR
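
For the rm -f item, a minimal sketch of what the replacement could look like. The helper name remove_if_exists and the file name are hypothetical and not taken from the PCGR codebase; the only real API used is os.remove() from the standard library.

```python
import os
import tempfile


def remove_if_exists(path):
    """Delete one file; do nothing if it is already gone (like `rm -f`).

    Hypothetical helper for illustration. Returns True if a file was
    removed and False if there was nothing to remove.
    """
    try:
        os.remove(path)
        return True
    except FileNotFoundError:
        return False


# Usage with a throwaway file (the file name is made up for the example).
with tempfile.TemporaryDirectory() as tmpdir:
    target = os.path.join(tmpdir, "intermediate.vcf")
    open(target, "w").close()
    removed_first = remove_if_exists(target)   # file existed
    still_there = os.path.exists(target)
    removed_second = remove_if_exists(target)  # already gone, no error
```

Unlike a shell call, os.remove() does no glob expansion and refuses to delete directories, so a malformed path fails with an exception instead of removing more than intended.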

@pdiakumis
Collaborator Author

Okay, think this is ready for review whenever you're back on board @sigven.
Summary of changes:

  • General changes:
    • use python f-strings throughout
    • apply 4-space indentation (instead of a mix of 3 and 2-space)
    • improve logging (some messages were duplicated in the logs because several loggers shared the same name).
  • pcgr/main.py:
    • remove docker_uid and no_docker CLI args
    • simplify --maf_gnomad.. help msg
    • fix logging
    • remove docker stuff
  • pcgr/cpsr.py:
    • move GE_panels to pcgr/pcgr_vars.py
    • fix indentation to 4 spaces
    • fix logging
    • add support for --version option
    • move check_args and verify_input_files to pcgr/arg_checker.py
  • pcgr/pcgr_vars.py:
    • moved GE_panels here for displaying in CPSR help message.
  • pcgr/utils.py:
    • removed docker-related stuff, added get_loftee_dir and get_cpsr_version funcs, moved is_integer func here.
  • scripts/annoutils.py:
    • removed is_valid_vcf func since we've deprecated EBIvariation/vcf-validator
    • removed duplicated funcs to use those from pcgr/utils.py.
  • scripts/pcgr_vcfanno.py:
    • mostly indentation, debug/logger refactor
  • scripts/cpsr_validate_input.py and scripts/pcgr_validate_input.py:
    • use a multiallelic_list list to collect multiallelic variants and display them in the log as a single list, instead of printing them line by line. The output is similar to the example below.
    • fix logger
  • scripts/pcgr_summarise.py:
    • use funcs from pcgr/utils.py
    • better use of debug option
    • use vars_no_csq list to track and display variants with no VEP CSQ tag in the log in a single list, instead of printing them all line by line. Example:

Before:

2022-07-04 04:35:22 - pcgr-summarise - INFO - PCGR - STEP 3: Cancer gene annotations with pcgr-summarise
2022-07-04 04:35:22 - pcgr-summarise - INFO - pcgr_summarise.py /projects/sigverse/out/SBJ02242-somatic-PASS.pcgr_ready.vep.vcfanno.vcf.gz 0 0 /projects/sigverse/data/grch38 --debug
2022-07-04 04:35:22 - pcgr-gene-annotate - WARNING - Variant record g.1:7952954C>T has no CSQ tag from VEP (--vep_no_intergenic flag set?)  - skipping variant
2022-07-04 04:35:22 - pcgr-gene-annotate - WARNING - Variant record g.1:149040189C>T has no CSQ tag from VEP (--vep_no_intergenic flag set?)  - skipping variant
2022-07-04 04:35:22 - pcgr-gene-annotate - WARNING - Variant record g.1:164551593C>T has no CSQ tag from VEP (--vep_no_intergenic flag set?)  - skipping variant
2022-07-04 04:35:22 - pcgr-gene-annotate - WARNING - Variant record g.1:164859485A>T has no CSQ tag from VEP (--vep_no_intergenic flag set?)  - skipping variant
2022-07-04 04:35:22 - pcgr-gene-annotate - WARNING - Variant record g.1:164859486C>T has no CSQ tag from VEP (--vep_no_intergenic flag set?)  - skipping variant
[...]

After:

2022-07-04 05:07:13 - pcgr-gene-annotate - WARNING - The following 809 records do not have a CSQ tag from VEP (was --vep_no_intergenic flag set?) - skipping these variants:
----
g.1:7952954C>T, g.1:149040189C>T, g.1:164551593C>T, g.1:164859485A>T, g.1:164859486C>T, g.1:164860609C>T, g.1:164861170C>T, g.1:164866284G>A, [...]
  • pcgr/arg_checker.py:
    • refactored two large CPSR functions and moved them here
    • remove docker stuff
    • host_directories -> pcgr_paths
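
The vars_no_csq and multiallelic_list changes follow the same pattern: collect the affected variants in a list and emit one warning at the end. A rough sketch of that pattern is below. It is not the code from scripts/pcgr_summarise.py; the function name warn_missing_csq and the toy records are assumptions, and only the logger name and message wording are copied from the "After" log output above.

```python
import logging


def warn_missing_csq(variant_ids, logger):
    """Log a single warning listing every variant without a CSQ tag.

    Hypothetical helper for illustration. Returns the message that was
    logged, or None if the list is empty.
    """
    if not variant_ids:
        return None
    msg = (
        f"The following {len(variant_ids)} records do not have a CSQ tag from VEP "
        f"(was --vep_no_intergenic flag set?) - skipping these variants:\n"
        f"----\n{', '.join(variant_ids)}"
    )
    logger.warning(msg)
    return msg


logger = logging.getLogger("pcgr-gene-annotate")

# Toy input: (variant id, CSQ value). The third record and its CSQ value
# are made up so that at least one variant is kept.
records = [
    ("g.1:7952954C>T", None),
    ("g.1:149040189C>T", None),
    ("g.1:12345A>G", "placeholder_csq"),
]

vars_no_csq = []
for variant_id, csq in records:
    if csq is None:
        # Remember the variant instead of logging it right away.
        vars_no_csq.append(variant_id)
        continue
    # ... annotate the variant here ...

msg = warn_missing_csq(vars_no_csq, logger)
```

With hundreds of skipped variants this produces one log entry rather than one per variant, which is what the Before/After comparison shows.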

@pdiakumis pdiakumis marked this pull request as ready for review July 5, 2022 15:54
@pdiakumis pdiakumis requested a review from sigven July 5, 2022 15:54
@sigven sigven merged commit 79e27ee into master Jul 11, 2022
@pdiakumis pdiakumis deleted the docker_unwrap branch July 11, 2022 07:40