Remove docker command wrappers; CPSR refactor #193

Merged
merged 33 commits into from Jul 11, 2022
33 commits
fb1a6fa  pcgr_validate_input: no default output_dir (pdiakumis, Jul 3, 2022)
5441ffd  utils: remove docker references (pdiakumis, Jul 3, 2022)
8403aa3  utils: get loftee dir (pdiakumis, Jul 3, 2022)
f9a1d8e  cpsr_validate_input: no default output_dir (pdiakumis, Jul 3, 2022)
ffd9d41  arg_checker: host_directories -> pcgr_paths (pdiakumis, Jul 3, 2022)
fccf351  modularise cpsr funcs/vars (pdiakumis, Jul 3, 2022)
14f5b2a  cpsr: modularise funcs/vars; remove docker wrappers (pdiakumis, Jul 3, 2022)
0b538cd  pcgr main: remove docker wrappers (pdiakumis, Jul 3, 2022)
190d78d  cpsr.py: fix indentation (pdiakumis, Jul 3, 2022)
b4c673a  main.py: shorten maf help msg (pdiakumis, Jul 3, 2022)
99ca87e  Genomics England PanelApp -> GEP (pdiakumis, Jul 3, 2022)
2a26837  cleanup help msg (pdiakumis, Jul 3, 2022)
0f06070  remove trailing whitespace (pdiakumis, Jul 3, 2022)
6c15590  refactor/align pcgr_validate_input (pdiakumis, Jul 3, 2022)
a17828e  refactor utils; remove EBI vcf-validation; use debug in scripts (pdiakumis, Jul 3, 2022)
b646554  pcgr_summarise: fix indentation (pdiakumis, Jul 3, 2022)
bdc4822  pcgr_vcfanno: fix indentation (pdiakumis, Jul 3, 2022)
467f4ca  pcgr_vcfanno: add debug/logger (pdiakumis, Jul 3, 2022)
d5eb7d1  pcgr_summarise: use vars_no_csq for succinct logging (pdiakumis, Jul 4, 2022)
d298908  main.py: refactor debug/logging (pdiakumis, Jul 4, 2022)
c76b2e7  cpsr: fix indentation (pdiakumis, Jul 4, 2022)
1b73cad  cpsr: logging (pdiakumis, Jul 4, 2022)
4932e7a  cpsr: support version checking (pdiakumis, Jul 5, 2022)
5ff22ae  pcgr: log multiallelics in list (pdiakumis, Jul 5, 2022)
3c9aeb3  cpsr: log multiallelics in list; fix debug (pdiakumis, Jul 5, 2022)
7c1771b  pcgr/cpsr: remove --docker_uid, --no_docker options (pdiakumis, Jul 5, 2022)
a457007  log up to 100 of multiallelic/intergenic records (pdiakumis, Jul 6, 2022)
af1f475  output TRUE/FALSE in log for "skip intergenic" (pdiakumis, Jul 7, 2022)
2a4a539  remove NEWS.md from pcgrr (using CHANGELOG instead) (pdiakumis, Jul 7, 2022)
d3f740b  use unzip_if_gzipped_cmd when simplifying and sorting input_vcf (pdiakumis, Jul 7, 2022)
817e873  fix ICGC-PCAWG rendering (pdiakumis, Jul 9, 2022)
e1d5be5  fix vcf unzipping (pdiakumis, Jul 9, 2022)
1e560fc  fix (pdiakumis, Jul 9, 2022)
292 changes: 237 additions & 55 deletions pcgr/arg_checker.py

Large diffs are not rendered by default.

990 changes: 338 additions & 652 deletions pcgr/cpsr.py

Large diffs are not rendered by default.

344 changes: 126 additions & 218 deletions pcgr/main.py

Large diffs are not rendered by default.

63 changes: 55 additions & 8 deletions pcgr/pcgr_vars.py
@@ -2,6 +2,14 @@

 from pcgr._version import __version__

+PCGR_VERSION = __version__
+DB_VERSION = '20220203'
+VEP_VERSION = '105'
+GENCODE_VERSION = '39'
+NCBI_BUILD_MAF = 'GRCh38'
+VEP_ASSEMBLY = 'GRCh38'
+MAX_VARIANTS_FOR_REPORT = 500_000
+
 tsites = {
     0: 'Any',
     1: 'Adrenal Gland',
@@ -37,12 +45,51 @@
 }

 tumor_sites = '\n'.join([f'{k} = {tsites[k]}' for k in tsites]) # for displaying in help
-PCGR_VERSION = __version__
-DB_VERSION = '20220203'
-VEP_VERSION = '105'
-GENCODE_VERSION = '39'
-NCBI_BUILD_MAF = 'GRCh38'
-VEP_ASSEMBLY = 'GRCh38'
-MAX_VARIANTS_FOR_REPORT = 500000
-DOCKER_IMAGE_VERSION = f'sigven/pcgr:{PCGR_VERSION}'

+GE_panels = {
+    0: "CPSR exploratory cancer predisposition panel (n = 433, GEP / TCGA Germline Study / Cancer Gene Census / Other)",
+    1: "Adult solid tumours cancer susceptibility (GEP)",
+    2: "Adult solid tumours for rare disease (GEP)",
+    3: "Bladder cancer pertinent cancer susceptibility (GEP)",
+    4: "Brain cancer pertinent cancer susceptibility (GEP)",
+    5: "Breast cancer pertinent cancer susceptibility (GEP)",
+    6: "Childhood solid tumours cancer susceptibility (GEP)",
+    7: "Colorectal cancer pertinent cancer susceptibility (GEP)",
+    8: "Endometrial cancer pertinent cancer susceptibility (GEP)",
+    9: "Familial Tumours Syndromes of the central & peripheral Nervous system (GEP)",
+    10: "Familial breast cancer (GEP)",
+    11: "Familial melanoma (GEP)",
+    12: "Familial prostate cancer (GEP)",
+    13: "Familial rhabdomyosarcoma (GEP)",
+    14: "GI tract tumours (GEP)",
+    15: "Genodermatoses with malignancies (GEP)",
+    16: "Haematological malignancies cancer susceptibility (GEP)",
+    17: "Haematological malignancies for rare disease (GEP)",
+    18: "Head and neck cancer pertinent cancer susceptibility (GEP)",
+    19: "Inherited MMR deficiency (Lynch Syndrome) - GEP",
+    20: "Inherited non-medullary thyroid cancer (GEP)",
+    21: "Inherited ovarian cancer (without breast cancer) (GEP)",
+    22: "Inherited pancreatic cancer (GEP)",
+    23: "Inherited polyposis (GEP)",
+    24: "Inherited predisposition to acute myeloid leukaemia (AML) (GEP)",
+    25: "Inherited predisposition to GIST (GEP)",
+    26: "Inherited renal cancer (GEP)",
+    27: "Inherited phaeochromocytoma and paraganglioma (GEP)",
+    28: "Melanoma pertinent cancer susceptibility (GEP)",
+    29: "Multiple endocrine tumours (GEP)",
+    30: "Multiple monogenic benign skin tumours (GEP)",
+    31: "Neuroendocrine cancer pertinent cancer susceptibility (GEP)",
+    32: "Neurofibromatosis Type 1 (GEP)",
+    33: "Ovarian cancer pertinent cancer susceptibility (GEP)",
+    34: "Parathyroid Cancer (GEP)",
+    35: "Prostate cancer pertinent cancer susceptibility (GEP)",
+    36: "Renal cancer pertinent cancer susceptibility (GEP)",
+    37: "Rhabdoid tumour predisposition (GEP)",
+    38: "Sarcoma cancer susceptibility (GEP)",
+    39: "Sarcoma susceptbility (GEP)",
+    40: "Thyroid cancer pertinent cancer susceptibility (GEP)",
+    41: "Tumour predisposition - childhood onset (GEP)",
+    42: "Upper gastrointestinal cancer pertinent cancer susceptibility (GEP)"
+}
+
+panels = '\n'.join([f'{k} = {GE_panels[k]}' for k in GE_panels]) # for displaying in help
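The `panels` string is built "for displaying in help". A minimal sketch of how such a string might be wired into an `argparse` help message follows. The two-entry dict is a shortened stand-in for `GE_panels`, and the option name `--panel_id` and the help wording are assumptions for illustration, not taken from this diff.

```python
import argparse

# Shortened stand-in for GE_panels (illustration only; the real dict has 43 entries).
GE_panels = {
    0: "CPSR exploratory cancer predisposition panel (n = 433, GEP / TCGA Germline Study / Cancer Gene Census / Other)",
    1: "Adult solid tumours cancer susceptibility (GEP)",
}

# Same join expression as in pcgr_vars.py: one "id = name" line per panel.
panels = '\n'.join([f'{k} = {GE_panels[k]}' for k in GE_panels])


def build_parser():
    # RawTextHelpFormatter keeps the newlines embedded in `panels`;
    # the default formatter would re-wrap them into one paragraph.
    parser = argparse.ArgumentParser(formatter_class=argparse.RawTextHelpFormatter)
    parser.add_argument(
        "--panel_id",  # assumed option name, for illustration
        type=int,
        choices=sorted(GE_panels),
        default=0,
        help=f"Virtual gene panel to report on:\n{panels}",
    )
    return parser


if __name__ == "__main__":
    build_parser().print_help()
```

Note that `argparse` applies `%`-formatting to help strings, so a panel name containing a literal `%` would need to be escaped as `%%`; none of the names in this diff do.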
49 changes: 23 additions & 26 deletions pcgr/utils.py
@@ -6,23 +6,6 @@
 import os
 import platform

-def get_docker_user_id(docker_user_id):
-    logger = getlogger('pcgr-get-OS')
-    uid = ''
-    if docker_user_id:
-        uid = docker_user_id
-    elif platform.system() == 'Linux' or platform.system() == 'Darwin' or sys.platform == 'darwin' or sys.platform == 'linux2' or sys.platform == 'linux':
-        uid = os.getuid()
-    else:
-        if platform.system() == 'Windows' or sys.platform == 'win32' or sys.platform == 'cygwin':
-            uid = getpass.getuser()
-
-    if uid == '':
-        warn_msg = (f'Was not able to get user id/username for logged-in user on the underlying platform '
-                    f'(platform.system(): {platform.system()}, sys.platform: {sys.platform}, now running PCGR as root')
-        logger.warning(warn_msg)
-        uid = 'root'
-    return uid

 def getlogger(logger_name):
     logger = logging.getLogger(logger_name)
@@ -60,25 +43,25 @@ def check_subprocess(logger, command, debug):
         print(e.output.decode())
         exit(0)

-def script_path(env, bin_script, docker_run):
+def script_path(env, bin_script):
     """Returns e.g. /path/conda/envs/{env}/{bin_script}
     """
-    prefix = conda_env_path(env, docker_run)
+    prefix = conda_env_path(env)
     return os.path.join(prefix, bin_script)

-def conda_env_path(env, docker_run):
+def conda_env_path(env):
     """Construct absolute path to a conda env
     using the current activated env as a prefix.
     e.g. /path/to/conda/envs/{env}
     """
-    if docker_run:
-        env_path = f'/opt/mambaforge/envs/{env}'
-    else:
-        cp = os.path.normpath(os.environ.get('CONDA_PREFIX')) # /path/to/conda/envs/FOO
-        env_dir = os.path.dirname(cp) # /path/to/conda/envs
-        env_path = os.path.join(env_dir, env) # /path/to/conda/envs/{env}
+    cp = os.path.normpath(os.environ.get('CONDA_PREFIX')) # /path/to/conda/envs/FOO
+    env_dir = os.path.dirname(cp) # /path/to/conda/envs
+    env_path = os.path.join(env_dir, env) # /path/to/conda/envs/{env}
     return env_path
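With the Docker branch gone, `conda_env_path` always resolves a sibling env relative to `CONDA_PREFIX`. The sketch below reproduces that lookup in a form that can be run without an active conda env. The `environ` parameter and the `RuntimeError` guard are assumptions added for illustration and are not part of this PR.

```python
import os


def conda_env_path(env, environ=None):
    """Resolve a sibling conda env from CONDA_PREFIX, as in the diff above.

    `environ` is a hypothetical parameter so the lookup can be exercised
    with a fake environment mapping.
    """
    environ = os.environ if environ is None else environ
    prefix = environ.get("CONDA_PREFIX")
    if prefix is None:
        # Assumed guard: without it, os.path.normpath(None) raises TypeError.
        raise RuntimeError("CONDA_PREFIX is not set; activate a conda env first")
    cp = os.path.normpath(prefix)      # /path/to/conda/envs/FOO
    env_dir = os.path.dirname(cp)      # /path/to/conda/envs
    return os.path.join(env_dir, env)  # /path/to/conda/envs/{env}


def script_path(env, bin_script, environ=None):
    return os.path.join(conda_env_path(env, environ), bin_script)


fake_env = {"CONDA_PREFIX": "/opt/conda/envs/pcgr/"}
print(script_path("pcgrr", "bin/Rscript", fake_env))
# On POSIX this prints: /opt/conda/envs/pcgrr/bin/Rscript
```

One consequence of the `dirname` step: if the active env were the base env (for example `CONDA_PREFIX=/opt/conda`), the result would be `/opt/{env}` rather than a path under `envs/`, so the lookup relies on a named env being active.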

+def get_loftee_dir():
+    return script_path("pcgr", "share/loftee")
+
 def get_pcgr_bin():
     """Return abs path to e.g. conda/env/pcgr/bin
     """
@@ -101,3 +84,17 @@ def get_perl_exports():
     perl_path_parent = os.path.dirname(perl_path) # /conda/env/pcgr/bin
     out = f"unset PERL5LIB && export PATH={perl_path_parent}:\"$PATH\""
     return out
+
+def is_integer(n):
+    try:
+        float(n)
+    except ValueError:
+        return False
+    else:
+        return float(n).is_integer()
+
+def get_cpsr_version():
+    # use pcgrr's Rscript to grab cpsr's R pkg version
+    rscript = script_path("pcgrr", "bin/Rscript")
+    v_cmd = f"{rscript} -e 'x <- paste0(\"cpsr \", as.character(packageVersion(\"cpsr\"))); cat(x, \"\n\")'"
+    return subprocess.check_output(v_cmd, shell=True).decode("utf-8")
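To make the behaviour of the new `is_integer` helper concrete, here is a small runnable check. The function body is copied from the hunk above; the list of sample inputs is made up for illustration.

```python
def is_integer(n):
    # Body copied from the utils.py hunk above.
    try:
        float(n)
    except ValueError:
        return False
    else:
        return float(n).is_integer()


# Sample inputs (illustrative): strings and numbers that float() accepts,
# plus one string it rejects.
cases = ["3", "3.0", "3.5", "abc", 7, 1e3, "inf", "nan"]
for value in cases:
    print(f"{value!r:>6} -> {is_integer(value)}")
```

Numeric strings such as `"3"` and `"3.0"` count as integers, `"3.5"`, `"inf"` and `"nan"` do not, and a non-numeric string returns `False`. Only `ValueError` is caught, so a value such as `None` raises `TypeError` instead of returning `False`; callers that may pass `None` would need their own check.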
26 changes: 0 additions & 26 deletions pcgrr/NEWS.md

This file was deleted.

4 changes: 2 additions & 2 deletions pcgrr/vignettes/output.Rmd
@@ -320,8 +320,8 @@ A VCF file containing annotated, somatic calls (single nucleotide variants and i
 | `COSMIC_MUTATION_ID` | Mutation identifier in [Catalog of somatic mutations in cancer](http://cancer.sanger.ac.uk/cancergenome/projects/cosmic/) database, as provided by VEP |
 | `TCGA_PANCANCER_COUNT` | Raw variant count across all TCGA tumor types |
 | `TCGA_FREQUENCY` | Frequency of variant across TCGA tumor types. Format: `tumortype| percent affected|affected cases|total cases` |
-| `ICGC_PCAWG_OCCURRENCE` | Mutation occurrence in [ICGC|PCAWG](http://docs.icgc.org/pcawg/). By project: `project_code|affected_donors|tested_donors|frequency` |
-| `ICGC_PCAWG_AFFECTED_DONORS` | Number of donors with the current mutation in [ICGC|PCAWG](http://docs.icgc.org/pcawg/) |
+| `ICGC_PCAWG_OCCURRENCE` | Mutation occurrence in [ICGC-PCAWG](http://docs.icgc.org/pcawg/). By project: `project_code|tumor_type|affected_donors|tested_donors|frequency` |
+| `ICGC_PCAWG_AFFECTED_DONORS` (**?**) | Number of donors with the current mutation in [ICGC-PCAWG](http://docs.icgc.org/pcawg/) |
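The updated `ICGC_PCAWG_OCCURRENCE` description adds a `tumor_type` field to the pipe-delimited format. A hedged sketch of how a consumer might split that five-field format follows. The function and type names are hypothetical, the comma separator between per-project entries and the numeric types of the last three fields are assumptions, and the sample value is invented rather than taken from real PCGR output.

```python
from typing import List, NamedTuple


class PcawgOccurrence(NamedTuple):
    project_code: str
    tumor_type: str
    affected_donors: int
    tested_donors: int
    frequency: float


def parse_pcawg_occurrence(value: str, entry_sep: str = ",") -> List[PcawgOccurrence]:
    """Split project_code|tumor_type|affected_donors|tested_donors|frequency entries.

    Assumption: multiple projects are joined by `entry_sep`.
    """
    records = []
    for entry in value.split(entry_sep):
        if not entry:
            continue  # tolerate empty values and trailing separators
        code, tumor_type, affected, tested, freq = entry.split("|")
        records.append(
            PcawgOccurrence(code, tumor_type, int(affected), int(tested), float(freq))
        )
    return records


# Invented sample value, for illustration only.
sample = "PROJ-A|Tumor_A|2|100|0.02,PROJ-B|Tumor_B|1|50|0.02"
for record in parse_pcawg_occurrence(sample):
    print(record)
```

An entry in the older four-field layout would fail the five-way unpacking with a `ValueError`, which makes a format mismatch visible rather than silently shifting fields.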

##### _Clinical associations_
