In [4]:
library("dndscv")
library("readxl")

mutations = read.table('../analysis/merged_maf_filtered.txt', 
                       sep="\t", check.names=FALSE, header=TRUE, row.names=NULL)
metasamples = as.data.frame(read_excel('../Metadata.xlsx'))

In [5]:
# Keep only included tumor-normal paired samples that are not gDNA
# and that belong to VAR2 group E or F
df = metasamples[(metasamples$INCLUDED == "Y") 
                 & (metasamples$TUMOR_ONLY == "No")
                 & (metasamples$SAMPLE_TYPE != "gDNA")
                 & (metasamples$VAR2 %in% c('E', 'F')), 
                 c('PATIENT_ID', 'SAMPLE_ID')]
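# Index the metadata by SAMPLE_ID so the selected samples can be looked up by name
rownames(metasamples) = metasamples$SAMPLE_ID
meta_filtered = metasamples[df$SAMPLE_ID,]
# Build "<SAMPLE_ID>-<PATIENT_ID>" barcodes to match Tumor_Sample_Barcode in the MAF
SAMPLES = paste(df$SAMPLE_ID, df$PATIENT_ID, sep="-")
rownames(meta_filtered) = SAMPLES
# Keep the five columns dndscv expects, in order: sample, chromosome, position, ref, alt
mutations_filtered = mutations[mutations$Tumor_Sample_Barcode %in% SAMPLES, 
                               c('Tumor_Sample_Barcode', 'Chromosome', 'Start_Position', 'Reference_Allele', 'Tumor_Seq_Allele2')]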
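
Because the filter above matches on barcodes assembled with paste(), a selected sample whose barcode does not appear in the MAF is dropped silently. A minimal sketch of a check for that, using made-up toy vectors (samples_toy and barcodes_toy are hypothetical stand-ins for SAMPLES and mutations$Tumor_Sample_Barcode):

```r
# Toy stand-ins; the values are invented for illustration only
samples_toy  = c("S1-P1", "S2-P1", "S3-P2")           # plays the role of SAMPLES
barcodes_toy = c("S1-P1", "S1-P1", "S3-P2", "S9-P4")  # plays the role of Tumor_Sample_Barcode

# Selected samples that have no rows at all in the mutation table
missing_from_maf = setdiff(samples_toy, unique(barcodes_toy))
print(missing_from_maf)
```

An empty result would mean every selected sample contributes at least one mutation. A non-empty result may point to a barcode format mismatch, or simply to a sample with no called mutations.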

In [6]:
# Group E: select the sample barcodes and their mutations
sel = rownames(meta_filtered[meta_filtered$VAR2 == 'E',])
mut = mutations_filtered[mutations_filtered$Tumor_Sample_Barcode %in% sel,]
# Run dNdScv against the GRCh38 reference without covariates (cv=NULL)
dndsout = dndscv(mut, refdb="../dndscv_data/data/RefCDS_human_GRCh38.p12.rda", cv=NULL)

# Keep genes with an unadjusted global p-value below 0.05; the q-value is
# reported alongside. which() drops rows whose p-value is NA.
sel_cv = dndsout$sel_cv
signif_genes = sel_cv[which(sel_cv$pglobal_cv < 0.05), c("gene_name", "pglobal_cv", "qglobal_cv")]
rownames(signif_genes) = NULL
signif_genes$pglobal_cv = round(signif_genes$pglobal_cv, digits=3)
signif_genes$qglobal_cv = round(signif_genes$qglobal_cv, digits=3)
write.table(signif_genes, file='../analysis/analysis_responders/E/genes_table_dndscv.txt', 
            sep="\t", row.names=FALSE, col.names=TRUE, quote=FALSE)

[1] Loading the environment...

[2] Annotating the mutations...

“Mutations observed in contiguous sites within a sample. Please annotate or remove dinucleotide or complex substitutions for best results.”
“Same mutations observed in different sampleIDs. Please verify that these are independent events and remove duplicates otherwise.”
    Note: 25 mutations removed for exceeding the limit of mutations per gene per sample (see the max_muts_per_gene_per_sample argument in dndscv)

[3] Estimating global rates...

[4] Running dNdSloc...

[5] Running dNdScv...

    Regression model for substitutions: no covariates were used (theta = 0.00254).

    Regression model for indels (theta = 1.79)
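
The second warning in the output above asks whether identical mutations seen in different sample IDs are independent events. A minimal sketch of how such mutations could be listed for review, assuming a table with the same columns as mutations_filtered (mut_toy and its values are hypothetical):

```r
# Toy stand-in for mutations_filtered; the values are invented for illustration only
mut_toy = data.frame(
  Tumor_Sample_Barcode = c("S1-P1", "S2-P1", "S3-P2", "S3-P2"),
  Chromosome           = c("1", "1", "2", "3"),
  Start_Position       = c(100, 100, 200, 300),
  Reference_Allele     = c("A", "A", "C", "G"),
  Tumor_Seq_Allele2    = c("T", "T", "G", "A"),
  stringsAsFactors     = FALSE
)

# Key that identifies a mutation regardless of which sample carries it
key = paste(mut_toy$Chromosome, mut_toy$Start_Position,
            mut_toy$Reference_Allele, mut_toy$Tumor_Seq_Allele2, sep=":")

# Count the distinct samples carrying each mutation
n_samples = tapply(mut_toy$Tumor_Sample_Barcode, key, function(x) length(unique(x)))

# Mutations present in more than one sample, and the rows that carry them
shared = names(n_samples[n_samples > 1])
shared_rows = mut_toy[key %in% shared, ]
print(shared_rows)
```

In this toy table the shared mutation comes from two samples of the same patient (both barcodes end in P1), which is the kind of case the warning suggests verifying. Whether to deduplicate such rows depends on the study design and is not decided here.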



In [7]:
# Group F: select the sample barcodes and their mutations
sel = rownames(meta_filtered[meta_filtered$VAR2 == 'F',])
mut = mutations_filtered[mutations_filtered$Tumor_Sample_Barcode %in% sel,]
# Run dNdScv against the GRCh38 reference without covariates (cv=NULL)
dndsout = dndscv(mut, refdb="../dndscv_data/data/RefCDS_human_GRCh38.p12.rda", cv=NULL)

# Keep genes with an unadjusted global p-value below 0.05; the q-value is
# reported alongside. which() drops rows whose p-value is NA.
sel_cv = dndsout$sel_cv
signif_genes = sel_cv[which(sel_cv$pglobal_cv < 0.05), c("gene_name", "pglobal_cv", "qglobal_cv")]
rownames(signif_genes) = NULL
signif_genes$pglobal_cv = round(signif_genes$pglobal_cv, digits=3)
signif_genes$qglobal_cv = round(signif_genes$qglobal_cv, digits=3)
write.table(signif_genes, file='../analysis/analysis_responders/F/genes_table_dndscv.txt', 
            sep="\t", row.names=FALSE, col.names=TRUE, quote=FALSE)

[1] Loading the environment...

[2] Annotating the mutations...

“Mutations observed in contiguous sites within a sample. Please annotate or remove dinucleotide or complex substitutions for best results.”
“Same mutations observed in different sampleIDs. Please verify that these are independent events and remove duplicates otherwise.”
    Note: 1 mutations removed for exceeding the limit of mutations per gene per sample (see the max_muts_per_gene_per_sample argument in dndscv)

[3] Estimating global rates...

[4] Running dNdSloc...

[5] Running dNdScv...

    Regression model for substitutions: no covariates were used (theta = 0.000166).

    Regression model for indels (theta = 2.83)
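
Both cells select genes on the unadjusted pglobal_cv, so the resulting tables can contain genes that would not pass a multiple-testing threshold. A minimal sketch of the difference, assuming q-values are Benjamini-Hochberg adjusted p-values; sel_cv_toy and its p-values are invented, and the 0.1 cutoff is an illustrative choice rather than something set in this notebook:

```r
# Toy stand-in for dndsout$sel_cv; gene names and p-values are invented
sel_cv_toy = data.frame(
  gene_name  = paste0("GENE_", LETTERS[1:10]),
  pglobal_cv = c(1e-6, 0.01, 0.04, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9),
  stringsAsFactors = FALSE
)

# Assumption: q-values obtained by Benjamini-Hochberg adjustment of the p-values
sel_cv_toy$qglobal_cv = p.adjust(sel_cv_toy$pglobal_cv, method="BH")

# Selection on the raw p-value (as in the cells above) versus on the q-value
by_p = sel_cv_toy[which(sel_cv_toy$pglobal_cv < 0.05), "gene_name"]
by_q = sel_cv_toy[which(sel_cv_toy$qglobal_cv < 0.1), "gene_name"]
print(by_p)
print(by_q)
```

With these toy values the p-value filter keeps three genes and the q-value filter keeps two. On a genome-wide table the gap is typically much larger, which is worth keeping in mind when reading the exported gene tables.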

