1.1. Manual: running JUM (v1.3.12)
- Under your working folder (`/user/home/AS_analysis`), you should have the following input files ready for JUM analysis ("*" stands for each sample name: ctrl1, ctrl2, ctrl3, treat1, treat2 and treat3):
  - `*Aligned.out.sam` files
  - `*SJ.out.tab` files
  - `*Aligned.out_sorted.bam` files
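Before starting, it can help to confirm that all three input files exist for every sample. The following is a minimal sketch, not part of the JUM package: the function name `check_jum_inputs` is hypothetical, and it assumes the sample name is directly prefixed to each suffix (for example `ctrl1SJ.out.tab`), as the file names used later in this manual suggest.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with JUM): report missing input files.
# Usage: check_jum_inputs <working_dir> <sample> [<sample> ...]
check_jum_inputs() {
    local dir="$1"; shift
    local missing=0 sample suffix
    for sample in "$@"; do
        # The three suffixes listed in this manual.
        for suffix in Aligned.out.sam SJ.out.tab Aligned.out_sorted.bam; do
            if [ ! -f "$dir/${sample}${suffix}" ]; then
                echo "missing: $dir/${sample}${suffix}" >&2
                missing=1
            fi
        done
    done
    # Return 0 only if every expected file was found.
    return "$missing"
}

# Example call (commented out so the snippet has no side effects):
# check_jum_inputs /user/home/AS_analysis ctrl1 ctrl2 ctrl3 treat1 treat2 treat3
```

The function prints each missing path to standard error and returns a non-zero status if anything is absent, so it can gate the rest of a pipeline script.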
- Let us also assume that you have downloaded the JUM software package version 1.3.12 under your home directory and unzipped it, so all the scripts are now in `/user/home/JUM_1.3.12`.
- Run `JUM_2-1.sh`:
$ bash /user/home/JUM_1.3.12/JUM_2-1.sh
- Create subdirectories for each condition. For example, control and treatment:
$ mkdir control
$ mkdir treatment
- Copy the files with suffix `SJ.out.tab_strand_symbol_scaled` from the current directory to the corresponding condition subdirectories:
$ cp ctrl1SJ.out.tab_strand_symbol_scaled control/
$ cp ctrl2SJ.out.tab_strand_symbol_scaled control/
$ cp ctrl3SJ.out.tab_strand_symbol_scaled control/
$ cp treat1SJ.out.tab_strand_symbol_scaled treatment/
$ cp treat2SJ.out.tab_strand_symbol_scaled treatment/
$ cp treat3SJ.out.tab_strand_symbol_scaled treatment/
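With many replicates, the per-file copy commands can be wrapped in a loop. This is a sketch only: `copy_scaled_junctions` is a hypothetical helper, not a JUM script, and it assumes it is run from the main working directory where the `*SJ.out.tab_strand_symbol_scaled` files live.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with JUM): copy the scaled junction
# files of the given samples into a condition subdirectory.
# Usage: copy_scaled_junctions <condition> <sample> [<sample> ...]
copy_scaled_junctions() {
    local condition="$1"; shift
    local sample
    # -p makes this a no-op if the subdirectory already exists.
    mkdir -p "$condition"
    for sample in "$@"; do
        # Stop at the first file that cannot be copied.
        cp "${sample}SJ.out.tab_strand_symbol_scaled" "$condition/" || return 1
    done
}

# Example calls (commented out so the snippet has no side effects):
# copy_scaled_junctions control ctrl1 ctrl2 ctrl3
# copy_scaled_junctions treatment treat1 treat2 treat3
```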
- Copy the file `UNION_junc_coor_with_junction_ID.txt` to each of the condition subdirectories:
$ cp UNION_junc_coor_with_junction_ID.txt control/
$ cp UNION_junc_coor_with_junction_ID.txt treatment/
- In each of the subdirectories, run `JUM_2-2.sh` as follows and then return to the main directory:
$ bash /user/home/JUM_1.3.12/JUM_2-2.sh "directory" "junction_read_threshold" "file_number" "condition"
#directory: path of the downloaded JUM package
#junction_read_threshold - for junction filtering: JUM keeps as valid junctions for downstream analysis only those splice junctions that have more than this number of unique reads mapped to them in at least #file_number samples out of all replicates under the condition
#file_number - for junction filtering: JUM keeps as valid junctions for downstream analysis only those splice junctions that have more than #junction_read_threshold unique reads mapped to them in at least this number of samples out of all replicates under the condition
#condition: the name of the condition (for example, control)
For example:
$ cd control
$ bash /user/home/JUM_1.3.12/JUM_2-2.sh /user/home/JUM_1.3.12 5 2 control
$ cd ..
$ cd treatment
$ bash /user/home/JUM_1.3.12/JUM_2-2.sh /user/home/JUM_1.3.12 5 2 treatment
$ cd ..
NOTES
For `#junction_read_threshold` and `#file_number`, users need to choose values based on the RNA-seq sequencing depth and the number of replicates they have for each condition. Below are a few examples and general rules of thumb:
- If users have 3 replicates that are relatively deeply sequenced (~30M+ reads for Drosophila and ~50M+ reads for human samples, for example), it is reasonable to filter for junctions that have more than 5 (or 10) reads in 2 replicates of one condition, or in all 3 replicates of one condition.
- If users only have two replicates, it is reasonable to filter for junctions that have more than 5 (or 10) reads in both replicates of one condition.
- If users have 4 or more replicates, it is reasonable to filter for junctions that have more than 5 (or 10) reads in at least (total # replicates - 1) samples of one condition.
- Users can choose a different `#file_number` for each condition, depending on how many replicates each condition has.
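The rules of thumb above can be written down as a small function. This is only an illustration: `suggest_file_number` is a hypothetical name, it picks the more permissive option (2) for 3 replicates, and rejecting fewer than 2 replicates is an assumption of this sketch rather than a rule stated in this manual.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with JUM): suggest a #file_number
# value from the number of replicates in one condition.
suggest_file_number() {
    local replicates="$1"
    if [ "$replicates" -lt 2 ]; then
        # Assumption of this sketch: the manual gives no rule below 2 replicates.
        echo "need at least 2 replicates" >&2
        return 1
    elif [ "$replicates" -le 3 ]; then
        # 2 replicates: both; 3 replicates: 2 (all 3 is the stricter alternative).
        echo 2
    else
        # 4 or more replicates: total number of replicates minus 1.
        echo $((replicates - 1))
    fi
}

# Example: a condition with 5 replicates.
suggest_file_number 5
```

Sequencing depth still matters for choosing `#junction_read_threshold`; the function only covers the replicate-count part of the guidance.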
- Copy the files with suffix `junction_counts.txt` and `formatted.txt` from each of the condition subdirectories to the main/parent directory:
$ cp control/*junction_counts.txt .
$ cp treatment/*junction_counts.txt .
$ cp control/*formatted.txt .
$ cp treatment/*formatted.txt .
- Run `JUM_2-3.sh` under the main directory (this step may take up to a couple of hours, depending on the number of samples):
$ bash /user/home/JUM_1.3.12/JUM_2-3.sh "directory" "junction_read_threshold" "file_number" "IR_read_threshold" "read_length"
#directory: path of the downloaded JUM package
#junction_read_threshold: as in step 5 (JUM_2-2.sh)
#file_number: as in step 5 (JUM_2-2.sh). When this parameter differs between conditions, choose the larger number
#IR_read_threshold - IR filter: JUM keeps as potential true IR events only those IR events that have more than this number of unique reads mapped to the upstream exon-intron and downstream intron-exon boundaries
#read_length: the length of the RNA-seq reads
For example:
$ bash /user/home/JUM_1.3.12/JUM_2-3.sh /user/home/JUM_1.3.12 5 2 5 100
NOTES
- Step 7 (`JUM_2-3.sh`) will generate a new folder called `JUM_diff/` in the main directory with the results.
- Enter the `JUM_diff/` folder and run the R script with a user-provided experiment design file (txt format; a template is provided in the package) for differential AS analysis.
$ cd JUM_diff/
$ Rscript /user/home/JUM_1.3.12/R_script_JUM.R experiment_design.txt > outputFile.Rout 2> errorFile.Rout
An example experiment_design.txt file is shown below:
Condition
ctrl1 control
ctrl2 control
ctrl3 control
treat1 treatment
treat2 treatment
treat3 treatment
NOTES
- It is important to make sure that in the experiment_design.txt file the sample names and the condition names follow the same alphabetical order. For example, here the control samples (ctrl1, ctrl2, ctrl3) all start with "c", so they come alphabetically before the treatment samples (treat1, treat2, treat3), which all start with "t"; accordingly, the condition name "control" for the control samples also comes alphabetically before the condition name "treatment" for the treatment samples.
- `R_script_JUM.R` will output a file called `AS_differential.txt`.
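The ordering requirement in the note above can be checked mechanically before running the R script. The sketch below is not part of JUM: `check_design_order` is a hypothetical name, and it assumes the design file has one header line followed by whitespace-separated "sample condition" rows, as in the example. It compares names in the C locale, which may differ from the ordering R applies under other locales.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with JUM): verify that sorting rows by
# sample name also leaves the condition names in sorted order.
# Usage: check_design_order <experiment_design.txt>
check_design_order() {
    local design="$1"
    local conditions
    # Skip the header line, sort rows by sample name, keep the condition column.
    conditions=$(tail -n +2 "$design" \
        | awk 'NF >= 2 { print $1, $2 }' \
        | LC_ALL=C sort -k1,1 \
        | awk '{ print $2 }')
    # sort -c exits 0 when its input is already sorted (ties are allowed).
    if printf '%s\n' "$conditions" | LC_ALL=C sort -c 2>/dev/null; then
        echo "ok"
    else
        echo "sample order and condition order disagree" >&2
        return 1
    fi
}

# Example call (commented out so the snippet has no side effects):
# check_design_order experiment_design.txt
```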
- (Make sure that Step 8 successfully generates a new file called `AS_differential.txt` in the `JUM_diff/` folder. Otherwise you can refer to the file `errorFile.Rout` for troubleshooting.) Now run `JUM_3.sh` in the `JUM_diff/` folder as follows:
$ bash /user/home/JUM_1.3.12/JUM_3.sh "directory" "pvalue|adjusted_pvalue" "stat_threshold" "#samples" "#control|treated_samples"
#directory: path of the downloaded JUM package
#pvalue|adjusted_pvalue - choice of statistical measure for significance test: type "pvalue" or "adjusted_pvalue"
#stat_threshold: a number, threshold for statistical cutoff
##samples: the total number of samples from all conditions
##control|treated_samples: the number of control samples or the number of treated samples, whichever is smaller
For example:
$ bash /user/home/JUM_1.3.12/JUM_3.sh /user/home/JUM_1.3.12 pvalue 0.05 6 3
NOTES
- We recommend that users run one round using `pvalue 0.05` at this step. This is the most permissive statistical setting for profiling significantly differentially spliced AS events, and it is handy to keep a version of this result around, especially while users are still experimenting with the optimal statistical cutoff. When stricter cutoffs are needed, users can then filter the `pvalue 0.05` results with simple Linux commands that search for AS events satisfying the stricter cutoffs, instead of running step 9 again.
- `JUM_3.sh` will generate a new folder called `FINAL_JUM_OUTPUT` that contains all the results.
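As an illustration of the filtering idea in the note above, the sketch below keeps only rows whose value in a chosen column is below a stricter cutoff. It rests on assumptions not confirmed by this manual: that the result file is tab-delimited with a single header line, and that the user has looked up which column holds the p-value. The function name `filter_by_cutoff`, the column index and the file names in the example are all hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with JUM): keep the header plus rows whose
# numeric value in <column> is strictly below <cutoff>.
# Usage: filter_by_cutoff <file> <column_index> <cutoff>
filter_by_cutoff() {
    local file="$1" column="$2" cutoff="$3"
    awk -F'\t' -v col="$column" -v cut="$cutoff" '
        NR == 1 { print; next }                  # always keep the header line
        $col ~ /^[0-9.eE+-]+$/ &&                # skip non-numeric values such as NA
        ($col + 0) < (cut + 0) { print }         # force a numeric comparison
    ' "$file"
}

# Example call (commented out; file name and column index are placeholders):
# filter_by_cutoff some_result_file.txt 7 0.01 > some_result_file_p0.01.txt
```

Whether the cutoff should be strict (`<`) or inclusive (`<=`) is a choice left to the user; the sketch uses a strict comparison.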
- Run `JUM_4.sh` in the folder `FINAL_JUM_OUTPUT` as follows:
$ cd FINAL_JUM_OUTPUT
$ bash /user/home/JUM_1.3.12/JUM_4.sh "directory" "pvalue|adjusted_pvalue" "stat_threshold" "sample_#_condition_1" "sample_#_condition_2" "refFlat"
#directory: path of the downloaded JUM package
#pvalue|adjusted_pvalue: as in step 9
#stat_threshold: as in step 9
#sample_#_condition_1: the number of samples for condition 1, the condition that is alphabetically listed first in the `experiment_design.txt` file
#sample_#_condition_2: the number of samples for condition 2, the condition that is alphabetically listed second in the `experiment_design.txt` file
#refFlat: a `refFlat.txt` file. Such a file should be available from the UCSC genome browser for different organisms, and users can download it to the current working directory. Note that JUM does **NOT** depend on any a priori knowledge of annotation to perform AS analysis. This file is used only to associate the final differential AS results from JUM with known genes, for the convenience of users. If an AS event is not mapped to any known gene, it will be marked as "NONE" in the associated gene track.
For example:
$ bash /user/home/JUM_1.3.12/JUM_4.sh /user/home/JUM_1.3.12 pvalue 0.05 3 3 refFlat.txt
`JUM_4.sh` will output files with names of the form `*_sorted_with_dpsi.txt`. These are the final output files.