fusion@gene[AB]@ensemblId not filled in by importStarfusion #16

Closed
plijnzaad opened this issue Feb 15, 2018 · 2 comments

Comments

@plijnzaad

Hi,

In the latest version, the importStarfusion function does not fill in the ensemblId slot of the fusion partners. It would be nice to have. This is using output from STAR-fusion 1.2.0 (run on CentOS 7, 3.10.0-693.11.6.el7.x86_64), analyzed on Mac OS X (Darwin PMC-GEN003 15.6.0 Darwin Kernel Version 15.6.0: Tue Jan 9 20:12:05 PST 2018; root:xnu-3248.73.5~1/RELEASE_X86_64 x86_64 i386 MacBookPro12).

The LeftGene and RightGene columns of the star-fusion.fusion_predictions.abridged.tsv file look like MT-ATP6^ENSG00000198899.2
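For reference, a minimal base-R sketch of how a field in that format can be split into a gene symbol and an unversioned Ensembl ID. It assumes the layout is always SYMBOL^ENSEMBL_ID.VERSION; the second and third input values are made up for illustration and do not come from real STAR-fusion output.

```r
# Sketch: split gene fields of the form SYMBOL^ENSEMBL_ID.VERSION.
# Only the first value is taken from the report; the others are invented.
genes <- c("MT-ATP6^ENSG00000198899.2", "GENEA^ENSG00000000001.10", "NOID")

# Split on a literal "^" (fixed = TRUE avoids regex escaping).
parts <- strsplit(genes, "^", fixed = TRUE)

# First piece is the gene symbol.
symbol <- vapply(parts, function(p) p[1], character(1))

# Second piece is the versioned Ensembl ID; NA when there is no "^".
versioned <- vapply(parts, function(p) p[2], character(1))

# Drop the trailing ".<version>" suffix, if present.
ensembl_id <- sub("\\.[0-9]+$", "", versioned)

print(data.frame(symbol, ensembl_id, stringsAsFactors = FALSE))
```

Fields without a "^" separator end up with an NA Ensembl ID rather than an error, which may or may not be what you want for your data.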

> sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: OS X El Capitan 10.11.6

Matrix products: default
BLAS: /opt/local/Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.dylib
LAPACK: /opt/local/Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
 [1] grid      stats4    parallel  stats     datasets  graphics  grDevices
 [8] utils     methods   base     

other attached packages:
 [1] chimeraviz_1.4.1       ensembldb_2.2.0        AnnotationFilter_1.3.1
 [4] GenomicFeatures_1.30.3 AnnotationDbi_1.40.0   Biobase_2.38.0        
 [7] Gviz_1.22.2            GenomicRanges_1.30.1   GenomeInfoDb_1.14.0   
[10] Biostrings_2.46.0      XVector_0.18.0         IRanges_2.12.0        
[13] S4Vectors_0.16.0       BiocGenerics_0.24.0    uuutils_1.48          
[16] gplots_3.0.1          

loaded via a namespace (and not attached):
 [1] ProtGenerics_1.10.0           bitops_1.0-6                 
 [3] matrixStats_0.53.1            devtools_1.13.4              
 [5] bit64_0.9-7                   RColorBrewer_1.1-2           
 [7] progress_1.1.2                httr_1.3.1                   
 [9] rprojroot_1.3-2               tools_3.4.3                  
[11] backports_1.1.2               DT_0.4                       
[13] R6_2.2.2                      rpart_4.1-11                 
[15] KernSmooth_2.23-15            Hmisc_4.1-1                  
[17] DBI_0.7-15                    lazyeval_0.2.1               
[19] colorspace_1.3-2              nnet_7.3-12                  
[21] withr_2.1.1                   gridExtra_2.3                
[23] prettyunits_1.0.2             RMySQL_0.10.13               
[25] bit_1.1-12                    curl_3.1                     
[27] compiler_3.4.3                git2r_0.21.0                 
[29] htmlTable_1.11.2              DelayedArray_0.4.1           
[31] rtracklayer_1.38.3            caTools_1.17.1               
[33] scales_0.5.0                  checkmate_1.8.5              
[35] readr_1.1.1                   RCircos_1.2.0                
[37] stringr_1.2.0                 digest_0.6.15                
[39] Rsamtools_1.30.0              foreign_0.8-69               
[41] rmarkdown_1.8                 pkgconfig_2.0.1              
[43] base64enc_0.1-3               dichromat_2.0-0              
[45] htmltools_0.3.6               BSgenome_1.46.0              
[47] htmlwidgets_1.0               rlang_0.1.6                  
[49] rstudioapi_0.7                RSQLite_2.0                  
[51] BiocInstaller_1.28.0          shiny_1.0.5                  
[53] BiocParallel_1.12.0           gtools_3.5.0                 
[55] acepack_1.4.1                 VariantAnnotation_1.24.5     
[57] RCurl_1.95-4.10               magrittr_1.5                 
[59] GenomeInfoDbData_1.0.0        Formula_1.2-2                
[61] Matrix_1.2-12                 Rcpp_0.12.15                 
[63] munsell_0.4.3                 stringi_1.1.6                
[65] yaml_2.1.16                   SummarizedExperiment_1.8.1   
[67] zlibbioc_1.24.0               org.Hs.eg.db_3.5.0           
[69] plyr_1.8.4                    AnnotationHub_2.10.1         
[71] blob_1.1.0                    gdata_2.18.0                 
[73] lattice_0.20-35               splines_3.4.3                
[75] hms_0.4.1                     knitr_1.19                   
[77] pillar_1.1.0                  biomaRt_2.34.2               
[79] XML_3.98-1.9                  evaluate_0.10.1              
[81] biovizBase_1.26.0             latticeExtra_0.6-28          
[83] data.table_1.10.4-3           httpuv_1.3.5                 
[85] gtable_0.2.0                  assertthat_0.2.0             
[87] ggplot2_2.2.1                 mime_0.5                     
[89] xtable_1.8-2                  ArgumentCheck_0.10.2         
[91] survival_2.41-3               tibble_1.4.2                 
[93] GenomicAlignments_1.14.1      memoise_1.1.0                
[95] cluster_2.0.6                 interactiveDisplayBase_1.16.0
[97] BiocStyle_2.6.1              
@plijnzaad
Author

I quickly concocted a workaround; maybe this is of use to someone (too much in a hurry to do this as a proper pull request, sorry :-)


.ensid <- function(gene){
    gsub(perl=TRUE, "\\.\\d+$","",
         unlist(lapply(strsplit(gene, "\\^"), function(p)p[2])))
}


addEnsemblIds <- function(file, fusions) {
    ## Specific to STAR-fusion output
    ## import misses the ens id's, add them here
    ## Usage: fusions <- addEnsemblIds(file,fusions)
    table <- read.table(file=file,
                          sep="\t", as.is=TRUE, quote="", header=TRUE,
                          comment.char="", row.names=NULL)
    if(nrow(table) != length(fusions))
      stop("Number of fusions found in ", file,
           " unequal to that in fusions argument")
    if (is.null(table$LeftGene) || is.null(table$RightGene))
      stop("Missing columns LeftGene and/or RightGene in ", file)
    ensA <- .ensid(table$LeftGene)
    ensB <- .ensid(table$RightGene)

    ## lapply always returns a list; seq_along is safe for empty input
    lapply(seq_along(fusions), function(i) {
        f <- fusions[[i]]
        f@geneA@ensemblId <- ensA[i]
        f@geneB@ensemblId <- ensB[i]
        f
    })
}                                       #addEnsemblIds
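
The usage comment in addEnsemblIds asks for the result to be assigned back because R copies S4 objects on modification, so the input list itself is never changed. Here is a small self-contained sketch of that behaviour. MockPartnerGene and MockFusion are hypothetical stand-ins defined only for this example (they are not the chimeraviz classes), and all Ensembl IDs except the first are placeholders.

```r
library(methods)

# Hypothetical stand-ins with only the slots used in the workaround.
setClass("MockPartnerGene", representation(ensemblId = "character"))
setClass("MockFusion",
         representation(geneA = "MockPartnerGene", geneB = "MockPartnerGene"))

new_fusion <- function() {
  new("MockFusion",
      geneA = new("MockPartnerGene", ensemblId = NA_character_),
      geneB = new("MockPartnerGene", ensemblId = NA_character_))
}

fusions <- list(new_fusion(), new_fusion())
ensA <- c("ENSG00000198899", "ENSG00000000001")  # second ID is a placeholder
ensB <- c("ENSG00000000002", "ENSG00000000003")  # placeholders

updated <- lapply(seq_along(fusions), function(i) {
  f <- fusions[[i]]             # local copy of the i-th object
  f@geneA@ensemblId <- ensA[i]  # nested slot assignment modifies the copy
  f@geneB@ensemblId <- ensB[i]
  f                             # return the modified copy
})

# The original list is untouched; only the returned list carries the IDs.
print(is.na(fusions[[1]]@geneA@ensemblId))
print(updated[[1]]@geneA@ensemblId)
```

So with the real objects, the call would be fusions <- addEnsemblIds(file, fusions), exactly as in the usage comment.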


plijnzaad reopened this Feb 15, 2018
@stianlagstad
Owner

Thank you! :) I've pushed a fix for this, which will be available in chimeraviz 1.4.2 in the release version of Bioconductor and in chimeraviz 1.5.4 in the devel version.
