fusion@gene[AB]@ensemblId not filled in by importStarfusion #16

plijnzaad opened this issue Feb 15, 2018 · 2 comments

In the latest version, the importStarfusion function does not fill in the ensemblId slot of the fusion partners. It would be nice to have. This is using output from STAR-fusion 1.2.0 (run on CentOS 7, 3.10.0-693.11.6.el7.x86_64), analyzed on Mac OSX (Darwin PMC-GEN003 15.6.0 Darwin Kernel Version 15.6.0: Tue Jan 9 20:12:05 PST 2018; root:xnu-3248.73.5~1/RELEASE_X86_64 x86_64 i386 MacBookPro12).

The LeftGene and RightGene columns of the star-fusion.fusion_predictions.abridged.tsv file look like MT-ATP6^ENSG00000198899.2
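To make the format concrete, here is a minimal sketch of how such a value could be split into gene symbol and Ensembl ID in base R. The variable names are just for illustration, the second input value is made up, and dropping the version suffix (the trailing `.2`) is an assumption about what the ensemblId slot should hold.

```r
# Values in the LeftGene/RightGene format; the second one is a made-up placeholder
genes <- c("MT-ATP6^ENSG00000198899.2", "GENEX^ENSG00000000000.1")

# Split on the literal "^" separator (fixed = TRUE avoids regex escaping)
parts <- strsplit(as.character(genes), "^", fixed = TRUE)

# First field is the gene symbol, second the versioned Ensembl gene ID
symbol <- vapply(parts, `[`, character(1), 1)
ensembl_versioned <- vapply(parts, `[`, character(1), 2)

# Strip the trailing version suffix, e.g. ".2"
ensembl <- sub("\\.[0-9]+$", "", ensembl_versioned)

print(data.frame(symbol, ensembl, stringsAsFactors = FALSE))
```

The `as.character()` call matters on R 3.x, where `read.table` returns factors by default and `strsplit` refuses non-character input.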

> sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: OS X El Capitan 10.11.6

Matrix products: default
BLAS: /opt/local/Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.dylib
LAPACK: /opt/local/Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
 [1] grid      stats4    parallel  stats     datasets  graphics  grDevices
 [8] utils     methods   base     

other attached packages:
 [1] chimeraviz_1.4.1       ensembldb_2.2.0        AnnotationFilter_1.3.1
 [4] GenomicFeatures_1.30.3 AnnotationDbi_1.40.0   Biobase_2.38.0        
 [7] Gviz_1.22.2            GenomicRanges_1.30.1   GenomeInfoDb_1.14.0   
[10] Biostrings_2.46.0      XVector_0.18.0         IRanges_2.12.0        
[13] S4Vectors_0.16.0       BiocGenerics_0.24.0    uuutils_1.48          
[16] gplots_3.0.1          

loaded via a namespace (and not attached):
 [1] ProtGenerics_1.10.0           bitops_1.0-6                 
 [3] matrixStats_0.53.1            devtools_1.13.4              
 [5] bit64_0.9-7                   RColorBrewer_1.1-2           
 [7] progress_1.1.2                httr_1.3.1                   
 [9] rprojroot_1.3-2               tools_3.4.3                  
[11] backports_1.1.2               DT_0.4                       
[13] R6_2.2.2                      rpart_4.1-11                 
[15] KernSmooth_2.23-15            Hmisc_4.1-1                  
[17] DBI_0.7-15                    lazyeval_0.2.1               
[19] colorspace_1.3-2              nnet_7.3-12                  
[21] withr_2.1.1                   gridExtra_2.3                
[23] prettyunits_1.0.2             RMySQL_0.10.13               
[25] bit_1.1-12                    curl_3.1                     
[27] compiler_3.4.3                git2r_0.21.0                 
[29] htmlTable_1.11.2              DelayedArray_0.4.1           
[31] rtracklayer_1.38.3            caTools_1.17.1               
[33] scales_0.5.0                  checkmate_1.8.5              
[35] readr_1.1.1                   RCircos_1.2.0                
[37] stringr_1.2.0                 digest_0.6.15                
[39] Rsamtools_1.30.0              foreign_0.8-69               
[41] rmarkdown_1.8                 pkgconfig_2.0.1              
[43] base64enc_0.1-3               dichromat_2.0-0              
[45] htmltools_0.3.6               BSgenome_1.46.0              
[47] htmlwidgets_1.0               rlang_0.1.6                  
[49] rstudioapi_0.7                RSQLite_2.0                  
[51] BiocInstaller_1.28.0          shiny_1.0.5                  
[53] BiocParallel_1.12.0           gtools_3.5.0                 
[55] acepack_1.4.1                 VariantAnnotation_1.24.5     
[57] RCurl_1.95-4.10               magrittr_1.5                 
[59] GenomeInfoDbData_1.0.0        Formula_1.2-2                
[61] Matrix_1.2-12                 Rcpp_0.12.15                 
[63] munsell_0.4.3                 stringi_1.1.6                
[65] yaml_2.1.16                   SummarizedExperiment_1.8.1   
[67] zlibbioc_1.24.0                
[69] plyr_1.8.4                    AnnotationHub_2.10.1         
[71] blob_1.1.0                    gdata_2.18.0                 
[73] lattice_0.20-35               splines_3.4.3                
[75] hms_0.4.1                     knitr_1.19                   
[77] pillar_1.1.0                  biomaRt_2.34.2               
[79] XML_3.98-1.9                  evaluate_0.10.1              
[81] biovizBase_1.26.0             latticeExtra_0.6-28          
[83] data.table_1.10.4-3           httpuv_1.3.5                 
[85] gtable_0.2.0                  assertthat_0.2.0             
[87] ggplot2_2.2.1                 mime_0.5                     
[89] xtable_1.8-2                  ArgumentCheck_0.10.2         
[91] survival_2.41-3               tibble_1.4.2                 
[93] GenomicAlignments_1.14.1      memoise_1.1.0                
[95] cluster_2.0.6                 interactiveDisplayBase_1.16.0
[97] BiocStyle_2.6.1              
I quickly concocted a workaround; maybe it is of use to someone (too much in a hurry to do this as a proper pull request, sorry :-)

.ensid <- function(gene) {
    ## "MT-ATP6^ENSG00000198899.2" -> "ENSG00000198899"
    ## as.character() guards against factor columns from read.table
    gsub(perl=TRUE, "\\.\\d+$", "",
         unlist(lapply(strsplit(as.character(gene), "\\^"), function(p) p[2])))
}

addEnsemblIds <- function(file, fusions) {
    ## Specific to STAR-fusion output
    ## import misses the ens id's, add them here
    ## Usage: fusions <- addEnsemblIds(file, fusions)
    table <- read.table(file=file,
                        sep="\t", quote="", header=TRUE,
                        comment.char="", row.names=NULL,
                        stringsAsFactors=FALSE)
    if (nrow(table) != length(fusions))
      stop("Number of fusions found in ", file,
           " unequal to that in fusions argument")
    if (is.null(table$LeftGene) || is.null(table$RightGene))
      stop("Missing columns LeftGene and/or RightGene in ", file)
    ensA <- .ensid(table$LeftGene)
    ensB <- .ensid(table$RightGene)

    ## lapply keeps the result a list of Fusion objects
    lapply(seq_along(fusions), function(i) {
        f <- fusions[[i]]
        f@geneA@ensemblId <- ensA[i]
        f@geneB@ensemblId <- ensB[i]
        f                               # return the modified copy
    })
}                                       #addEnsemblIds
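One thing to watch for when adapting a helper like this: slot assignment in R changes a local copy, so the function passed to lapply/sapply has to return the modified object explicitly. The sketch below shows the difference using two hypothetical stand-in classes (MockGene and MockFusion are not chimeraviz classes, they only mimic the geneA/ensemblId nesting).

```r
library(methods)

# Hypothetical stand-ins for the real classes, for illustration only
setClass("MockGene", representation(ensemblId = "character"))
setClass("MockFusion", representation(geneA = "MockGene"))

fusions <- list(new("MockFusion",
                    geneA = new("MockGene", ensemblId = NA_character_)))
ids <- c("ENSG00000198899")

# Without an explicit return value, each element is just the value of the
# last assignment, i.e. a character string rather than a fusion object
wrong <- lapply(seq_along(fusions), function(i) {
    f <- fusions[[i]]
    f@geneA@ensemblId <- ids[i]
})

# Returning f hands back the updated copy
right <- lapply(seq_along(fusions), function(i) {
    f <- fusions[[i]]
    f@geneA@ensemblId <- ids[i]
    f
})

print(class(wrong[[1]]))
print(right[[1]]@geneA@ensemblId)
```

The original `fusions` list is untouched in both cases, which is why the result has to be assigned back, as in `fusions <- addEnsemblIds(file, fusions)`.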

@plijnzaad plijnzaad reopened this Feb 15, 2018
Thank you! :) I've pushed a fix for this, which will be available in chimeraviz version 1.4.2 in the release version of Bioconductor and in chimeraviz version 1.5.4 in the devel version of Bioconductor.

