http://intron-splicing-order.online:3838/iso/
Install the required packages within R:
install.packages(c("readr", "Rcpp", "dplyr", "igraph", "dbscan", "stringr", "gtools", "rstudioapi", "gridExtra", "BiocManager"))
BiocManager::install("lpsymphony")  # lpsymphony is distributed through Bioconductor
Java: Oracle JDK 8 or JRE 8
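The tool expects Java 8. One quick way to confirm which Java is on your PATH is to inspect the first line of `java -version`, which is printed to stderr. The sketch below assumes the two common version-string layouts (`"1.8.0_281"` for Java 8 and earlier, `"11.0.2"` for Java 9 and later); `java_major_version` is a hypothetical helper, not part of the original distribution.

```bash
#!/usr/bin/env bash
# Hypothetical helper: extract the Java major version from the first line of
# `java -version` output. Assumed formats:
#   java version "1.8.0_281"       (Java 8 and older: 1.<major>.x)
#   openjdk version "11.0.2" ...   (Java 9 and newer: <major>.x)
java_major_version() {
  local line="$1" ver
  # Take the text between the first pair of double quotes
  ver=$(printf '%s\n' "$line" | sed -n 's/^[^"]*"\([^"]*\)".*/\1/p')
  # Drop the legacy "1." prefix used by Java 8 and earlier
  if [[ "$ver" == 1.* ]]; then
    ver=${ver#1.}
  fi
  # Keep only the leading number (cut at the first "." or "_")
  printf '%s\n' "${ver%%[._]*}"
}
```

For example, `java_major_version "$(java -version 2>&1 | head -n 1)"` should print `8` on a JDK 8 or JRE 8 installation.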
Align the reads with STAR, minimap2, or another aligner, then index the resulting BAM file:
samtools index <bam_file>
Then run isoLarge on the indexed BAM file:
java -jar java/isoLarge.jar -i anno/hg19_gencode_from_ucsc.bed -ibam <bam_file> -o <output_file> -t <optional INT e.g. 90>
The last parameter, -t, is optional and sets the minimum number of nucleotides that must be aligned on the intron side of an intron-exon junction.
Please put the output file under data/, because the R code treats data/ as the directory of intron splicing order pair files.
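The indexing and detection steps above can be wrapped in one function that also places the result under data/. This is a minimal sketch under stated assumptions: `run_iso` is a hypothetical name, it assumes it is called from the directory that contains `java/`, `anno/` and `data/` (the relative paths used in the command above), and it only looks for an index named `<bam_file>.bai`, which is what `samtools index` writes by default.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper (not part of the original tool): index a BAM if needed,
# then run isoLarge and write the pairs file into data/.
# Assumes the current directory contains java/, anno/ and data/.
run_iso() {
  local bam="$1" name="$2" min_nt="${3:-}"
  local out="data/${name}"

  if [[ ! -f "$bam" ]]; then
    echo "BAM file not found: $bam" >&2
    return 1
  fi

  # samtools index writes <bam>.bai by default; skip indexing if it already exists
  if [[ ! -f "${bam}.bai" ]]; then
    samtools index "$bam" || return 1
  fi

  mkdir -p data
  local -a args
  args=(-jar java/isoLarge.jar
        -i anno/hg19_gencode_from_ucsc.bed
        -ibam "$bam"
        -o "$out")
  # -t is optional: minimum nucleotides aligned on the intron side of a junction
  if [[ -n "$min_nt" ]]; then
    args+=(-t "$min_nt")
  fi
  java "${args[@]}"
}
```

For example, `run_iso sample.bam sample_pairs.tsv 90` (hypothetical file names) would index `sample.bam` if needed and write `data/sample_pairs.tsv`; omit the last argument to leave `-t` unset.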
Output format
Column | Meaning |
---|---|
Column #1 | Transcript id |
Column #2 | Intron 1: coordinates of the intron spliced later (relatively slower) |
Column #3 | Intron 2: coordinates of the intron spliced earlier (relatively faster); also includes detected junctions |
Column #4 | Strand |
Column #5 | Deprecated |
Column #6 | Number of reads supporting this splicing order (intron 1 spliced after intron 2) |
Column #7 | Number of reads in which both introns are spliced |
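Once a pairs file exists, a filter on read support is a natural first look at the data. The sketch below assumes the file is tab-separated with no header line; the column layout follows the table above, but the delimiter is an assumption. `filter_pairs` keeps pairs with at least `min` reads in column 6 and prints columns 1 to 4, 6 and 7, and `count_pairs_per_transcript` counts the passing pairs per transcript. Both names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; assume a tab-separated pairs file without a header line.

# Keep pairs supported by at least MIN reads in column 6 (default 1) and print
# transcript, intron 1, intron 2, strand, order-support count, both-spliced count.
filter_pairs() {
  local file="$1" min="${2:-1}"
  awk -F'\t' -v OFS='\t' -v min="$min" \
    '$6 >= min { print $1, $2, $3, $4, $6, $7 }' "$file"
}

# Count, per transcript (column 1), the pairs that pass the same threshold.
count_pairs_per_transcript() {
  local file="$1" min="${2:-1}"
  awk -F'\t' -v min="$min" \
    '$6 >= min { n[$1]++ } END { for (t in n) print t "\t" n[t] }' "$file" | sort
}
```

The threshold is applied to column 6 only; column 7 is passed through so that order support can be compared with the number of reads in which both introns are already spliced.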
If you are not working in RStudio, edit run.R to set the working directory to intron_order.
Source the following R script in RStudio:
intron_order/code/run_human.R
A mapping between gene IDs, transcript IDs, and gene symbols can easily be obtained from the Ensembl BioMart server; see the following file for an example:
data/hg19_ensembl_gene_id_trans_id_map.tsv
Column names are:
gene_id,trans_id,gene_symbol
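To attach gene IDs and symbols to the transcript IDs in column 1 of a pairs file, the map can be joined on `trans_id`. This sketch assumes the map is tab-separated, non-empty, and starts with the header row `gene_id`, `trans_id`, `gene_symbol`. Because BioMart transcript IDs are usually unversioned while the annotation may carry version suffixes such as `.2`, it strips a trailing `.<digits>` from the pairs-file ID before lookup; drop that `sub()` line if your IDs already match. `annotate_pairs` is a hypothetical helper, and unmatched transcripts get `NA`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prepend gene_id and gene_symbol to each pairs-file row.
# Assumes a non-empty, tab-separated map with a header row
# (gene_id, trans_id, gene_symbol) and a tab-separated pairs file.
annotate_pairs() {
  local map="$1" pairs="$2"
  awk -F'\t' -v OFS='\t' '
    FNR == NR {                        # first file: the gene/transcript map
      if (FNR > 1) {                   # skip the header row
        gene[$2] = $1
        sym[$2] = $3
      }
      next
    }
    {
      id = $1
      sub(/\.[0-9]+$/, "", id)         # drop a version suffix such as .2, if any
      g = (id in gene) ? gene[id] : "NA"
      s = (id in sym) ? sym[id] : "NA"
      print g, s, $0                   # original row is kept unchanged
    }' "$map" "$pairs"
}
```

For example, `annotate_pairs data/hg19_ensembl_gene_id_trans_id_map.tsv data/<output_file>` would print each pair row with its gene ID and symbol in front.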