Skip to content

6. Running JUM for differential AS analysis among multiple experimental conditions (multiple tissues, patients, time series experiments etc.)

Qingqing Wang edited this page Jul 5, 2019 · 6 revisions

Running JUM for differential AS among multiple experimental conditions

  1. Suppose we are comparing the differential AS profiles between control and two treatment conditions (treatA and treatB), and each condition has three replicates. Under your working folder (/user/home/AS_analysis), you should have the following input files ready for JUM analysis ("*" indicates ctrl1, ctrl2, ctrl3, treatA_1, treatA_2, treatA_3, and treatB_1, treatB_2, treatB_3):
    • *Aligned.out.sam files
    • *SJ.out.tab files
    • *Aligned.out_sorted.bam files

Let us also assume that you have downloaded the JUM software package version 2.0.2 for multi-conditions under your home directory and unzipped, so all the scripts are now in /user/home/JUM_2.0.2).

  1. Run JUM_2-1.sh:

    $ bash /user/home/JUM_2.0.2/JUM_2-1.sh
  2. Create subdirectories for each condition. For example, control, treatA, and treatB:

    $ mkdir control
    $ mkdir treatA
    $ mkdir treatB
  3. Copy files with suffix SJ.out.tab_strand_symbol_scaled from the current directory to the corresponding condition subdirectories, respectively:

    $ cp ctrl1SJ.out.tab_strand_symbol_scaled control/
    $ cp ctrl2SJ.out.tab_strand_symbol_scaled control/
    $ cp ctrl3SJ.out.tab_strand_symbol_scaled control/
    $ cp treatA_1SJ.out.tab_strand_symbol_scaled treatA/
    $ cp treatA_2SJ.out.tab_strand_symbol_scaled treatA/
    $ cp treatA_3SJ.out.tab_strand_symbol_scaled treatA/
    $ cp treatB_1SJ.out.tab_strand_symbol_scaled treatB/
    $ cp treatB_2SJ.out.tab_strand_symbol_scaled treatB/
    $ cp treatB_3SJ.out.tab_strand_symbol_scaled treatB/
  4. Copy the file UNION_junc_coor_with_junction_ID.txt to each of the condition subdirectories:

    $ cp UNION_junc_coor_with_junction_ID.txt control/
    $ cp UNION_junc_coor_with_junction_ID.txt treatA/
    $ cp UNION_junc_coor_with_junction_ID.txt treatB/
  5. In each of the subdirectories, run JUM-2-2.sh as follows and then return to the current/main directory:

    $ bash /user/home/JUM_2.0.2/JUM_2-2.sh --Folder "directory" --Threshold "junction_read_threshold" --Filenum "file_number" --Condition "condition"
    #--Folder: path of the downloaded JUM package
    #--Threshold - for junction filtering: JUM will filter for splice junctions that have more than this # of unique reads mapped to it in at least #file_number samples out of all replicates under the condition as valid junctions for downstream analysis
    #--Filenum - for junction filtering: JUM will filter for splice junctions that have more than #read_threshold of unique reads mapped to it in at least this # samples out of all replicates under the condition as valid junctions for downstream analysis
    #--Condition: the name of the condition, for example, control

    for example:

    $ cd control
    $ bash /user/home/JUM_2.0.2/JUM_2-2.sh --Folder /user/home/JUM_2.0.2 --Threshold 5 --Filenum 2 --Condition control
    $ cd ..
    $ cd treatA
    $ bash /user/home/JUM_2.0.2/JUM_2-2.sh --Folder /user/home/JUM_2.0.2 --Threshold 5 --Filenum 2 --Condition treatA
    $ cd ..
    $ cd treatB
    $ bash /user/home/JUM_2.0.2/JUM_2-2.sh --Folder /user/home/JUM_2.0.2 --Threshold 5 --Filenum 2 --Condition treatB
    $ cd ..

    NOTES

    For #--Threshold and #--Filenum, users need to choose based on the RNA-seq sequencing depth and number of replicates they have for each condition. Below are a few examples and general rule of thumb:

    • If users have 3 replicates that are relatively deeply sequenced (~30M+ for drosophila and ~50M+ for human samples, for example), it is reasonable to filter for junctions that have more than 5 (or 10) reads in 2 replicates of one condition, or all 3 replicates of one condition.
    • If users only have two replicates, it is reasonable to filter for junctions that have more than 5 (or 10) reads in both replicates of one condition.
    • If users have 4 or even more replicates, it is reasonable to filter for junctions that have more than 5 (or 10) reads in at least (total # replicates - 1) samples of one condition.
    • users can choose different #--Filenum for each condition, depends on how many replicates each condition has.
  6. Copy the files with suffix junction_counts.txt and formatted.txt from each of the condition subdirectories to the main/parent directory:

    $ cp control/*junction_counts.txt .
    $ cp treatA/*junction_counts.txt .
    $ cp treatB/*junction_counts.txt .
    $ cp control/*formatted.txt .
    $ cp treatA/*formatted.txt .
    $ cp treatB/*formatted.txt .
  7. Considering that there may be a lot of input files for multi-condition comparison, users may want to process the bam files and the sorting first. For each *Aligned.out_sorted.bam file, do the following (using ctrl1Aligned.out_sorted.bam as an example):

    $ bedtools bamtobed -i ctrl1Aligned.out_sorted.bam > ctrl1Aligned.out.bed
    $ sort -k1,1 -k2,2n ctrl1Aligned.out.bed > ctrl1Aligned.out.sorted.bed
  8. Run JUM_A_multi_1.sh under the main directory:

    $ bash /user/home/JUM_2.0.2/JUM_A_multi_1.sh --Folder "directory" --JuncThreshold "junction_read_threshold" --fileNum_threshold "file_number" --IRthreshold "IR_read_threshold" --Readlength "read_length" --Thread "thread_num"
      #--Folder: path of the downloaded JUM package.
      #--JuncThreshold: as in step 5
      #--fileNum_threshold: as fileNum in step 5.  In the situation when this parameter is different for different conditions, choose the biggest fileNum in step 5.
      #--IRthreshold - IR filter: JUM will filter for IR events that have more than this # of unique reads mapped to the upstream exon-intron and downstream intron-exon boundaries in as potential true IR events
      #--Readlength: the length of the RNA-seq reads
      #--Thread: number of threads for multi-threading processing of sam/bam files

    for example:

    bash /user/home/JUM_2.0.2/JUM_A_multi_1.sh --Folder /user/home/JUM_2.0.2 --JuncThreshold 5 --fileNum_threshold 2 --IRthreshold 5 --Readlength 100 --Thread 3
  9. Create subdirectories for each condition for downstream intron retention analysis. For example, control_IR, treatA_IR, and treatB_IR:

    $ mkdir control_IR
    $ mkdir treatA_IR
    $ mkdir treatB_IR
  10. Copy files with suffix junction_counts_combined_intron_retention_event_list.txt from the current directory to the corresponding condition subdirectories, respectively:

    $ cp ctrl*junction_counts_combined_intron_retention_event_list.txt control_IR/
    $ cp treatA*junction_counts_combined_intron_retention_event_list.txt treatA_IR/
    $ cp treatB*junction_counts_combined_intron_retention_event_list.txt treatB_IR/
  11. In each of the subdirectories, run the following perl script as follows and then return to the current/main directory:

    $ bash /user/home/JUM_2.0.2/Identify_intron_retention_event_exist_in_all_samples.pl *junction_counts_combined_intron_retention_event_list.txt "combined_file_name" "file_number"
    #"combined_file_name": customized output name for that condition, see an example below
    #"file_number": same as fileNum in step 5 for each condition, respectively.

    for example:

    $ cd control_IR
    $ perl /user/home/JUM_2.0.2/Identify_intron_retention_event_exist_in_all_samples.pl *junction_counts_combined_intron_retention_event_list.txt ctrl_junction_counts_intron_retention_in_all_samples_list.txt 2
    $ cd ..
    $ cd treatA_IR
    $ perl /user/home/JUM_2.0.2/Identify_intron_retention_event_exist_in_all_samples.pl *junction_counts_combined_intron_retention_event_list.txt treatA_junction_counts_intron_retention_in_all_samples_list.txt 2
    $ cd ..
    $ cd treatB_IR
    $ perl /user/home/JUM_2.0.2/Identify_intron_retention_event_exist_in_all_samples.pl *junction_counts_combined_intron_retention_event_list.txt treatB_junction_counts_intron_retention_in_all_samples_list.txt 2
    $ cd ..
  12. Combine output files from the previous step as follows:

    $ cat control_IR/control_junction_counts_intron_retention_in_all_samples_list.txt treatA_IR/treatA_junction_counts_intron_retention_in_all_samples_list.txt treatB_IR/treatB_junction_counts_intron_retention_in_all_samples_list.txt | sort -u > All_junction_counts_intron_retention_in_all_samples_sorted_list.txt
    
  13. Run JUM_A_multi_2.sh under the main directory:

    $ bash /user/home/JUM_2.0.2/JUM_A_multi_2.sh --Folder "directory" --JuncThreshold "junction_read_threshold" --fileNum_threshold "file_number" --IRthreshold "IR_read_threshold" --Readlength "read_length" --Thread "thread_num"
      #--Folder: path of the downloaded JUM package.
      #--JuncThreshold: as in step 5
      #--fileNum_threshold: as fileNum in step 5.  In the situation when this parameter is different for different conditions, choose the biggest fileNum in step 5.
      #--IRthreshold - IR filter: JUM will filter for IR events that have more than this # of unique reads mapped to the upstream exon-intron and downstream intron-exon boundaries in as potential true IR events
      #--Readlength: the length of the RNA-seq reads
      #--Thread: number of threads for multi-threading processing of sam/bam files

    for example:

    bash /user/home/JUM_2.0.2/JUM_A_multi_2.sh --Folder /user/home/JUM_2.0.2 --JuncThreshold 5 --fileNum_threshold 2 --IRthreshold 5 --Readlength 100 --Thread 3

    NOTES

    • Step 13 will generate a new folder called JUM_diff/ in the main directory with the results.
  14. Enter the JUM_diff/ folder. Users now need to re-distribute the files in JUM_diff/ into subdirectories for comparing different treatment conditions with the control condition (or multiple other times points with the time 0 point). Users can create ctrl_vs_treatA_JUM_diff/ and ctrl_vs_treatB_JUM_diff, each subdirectory should contain input files for the specific condition and the control condition under comparison. Take the subdirectory ctrl_vs_treatA_JUM_diff/ for example, it should contain the following files for treatA and control conditions:

    UNION_junc_coor_with_junction_ID_more_than_5_read_in_at_least_2_samples.txt
    more_than_5_profiled_total_AS_event_junction_first_processing_for_JUM_reference_building.txt
    combined_AS_JUM.gff
    ctrl1_combined_count.txt
    ctrl2_combined_count.txt
    ctrl3_combined_count.txt
    treatA_1_combined_count.txt
    treatA_2_combined_count.txt
    treatA_3_combined_count.txt
    ctrl1Aligned.out_coverage.bed
    ctrl2Aligned.out_coverage.bed
    ctrl3Aligned.out_coverage.bed
    treatA_1Aligned.out_coverage.bed
    treatA_2Aligned.out_coverage.bed
    treatA_3Aligned.out_coverage.bed
    experiment_design.txt

    An example experiment_design.txt file is shown below:

          condition
    ctrl1 control
    ctrl2 control
    ctrl3 control
    treatA_1 treatA
    treatA_2 treatA
    treatA_3 treatA

    NOTES

    • It is important to make sure that in the experiment_design.txt file the sample naming and condition naming are in the same alphabetic order. For example, here control samples (ctrl1,2,3) all start with "c" so they are alphabatically before the treatment samples (treatA_1,2,3) that all start with "t"; accordingly, the condition name "control" for control samples is also alphabatically before the condition name "treatment" for treatment samples.
    • R_script_JUM.R will output a file called AS_differential.txt. Make sure that Step 2 successfully generates a new file called AS_differential.txt in the ctrl_vs_treatA_JUM_diff/ folder. Otherwise you can refer to the files outputFile.Rout and errorFile.Rout for troubleshooting.

Then run the R script under the current ctrl_vs_treatA_JUM_diff/ folder:

$ cd ctrl_vs_treatA_JUM_diff/
$ Rscript /user/home/JUM_2.0.2/R_script_JUM.R experiment_design.txt > outputFile.Rout 2> errorFile.Rout

Perform the same steps under each subdirectory (ctrl_vs_treatA_JUM_diff/, ctrl_vs_treatB_JUM_diff), respectively, to get the splicing changes in each treatment condition compared to the common control condition.

  1. Now run JUM_B.sh in each ctrl_vs_treatA_JUM_diff/, ctrl_vs_treatB_JUM_diff subdirectory, respectively, as follows:

    $ bash /user/home/JUM_2.0.2/JUM_B.sh --Folder "directory" --Test "pvalue|adjusted_pvalue" --Cutoff "stat_threshold" --TotalFileNum "total#samples" --Condition1_fileNum_threshold "thre_file_num_1" --Condition2_fileNum_threshold "thre_file_num_2" --Condition1SampleName "condition_1sample" --Condition2SampleName "condition_2sample" 
      #--Folder: path of the downloaded JUM package.
      #--Test - choice of statistical measure for significance test: type "pvalue" or "adjusted_pvalue".
      #--Cutoff: a number, threshold for statistical cutoff.
      #--TotalFileNum: the number of total samples from all conditions adding together.
      #--Condition1_fileNum_threshold: same as specified in JUM_A.sh.
      #--Condition2_fileNum_threshold: same as specified in JUM_A.sh.
      #--Condition1SampleName: same as specified in JUM_A.sh.
      #--Condition2SampleName: same as specified in JUM_A.sh.

    for example:

    $ bash /user/home/JUM_2.0.2/JUM_B.sh --Folder /user/home/JUM_2.0.2 --Test pvalue --Cutoff 0.05 --TotalFileNum 6 --Condition1_fileNum_threshold 2 --Condition2_fileNum_threshold 2 --Condition1SampleName ctrl1,ctrl2,ctrl3 --Condition2SampleName treatA_1,treatA_2,treatA_3

    NOTES

    • We recommend the users run at least one round using pvalue 0.05 at this step. This is the most generous statistical setting to profile for significantly differentially spliced AS events and it will be handy to keep a version of this result around, especially when users are still experimenting with the optimal statistical cutoff. When in need of more strict statistical cutoffs, users can easily filter the pvalue 0.05 analysis results using simple commands of linux that searches for AS events satisfying more strict cutoffs, like adjusted pvalues and level of splicing changes, instead of running step 3 again.
    • If users would like to have an idea of the total number of AS events in the sample that can be detected by the sequencing depth at hand, they should run one round using pvalue 1. This setting will output every profiled AS events in all categories from the samples, be it changed between the two conditions or not. This will provide a complete atlas of all AS events detectable by the current sequencing depth.
    • For users that have multiple condition comparisons (e.g. control versus drug_treatment1 and drug_treatment2), it is also recommended that users run this step once with pvalue 1. This will facilitate comparisons of significantly changed AS events across different conditions versus control (e.g. changed AS events brought by drug_treatment1 compared to control versus changed AS events brought by drug_treatment2 compared to control). Users can easily filter the results for more strict statistical cutoffs with the final results run with pvalue 1. Also, JUM output all AS events with their coordinates as IDs. So it will be convenient to compare the results under each condition comparison.
    • JUM_B.sh will generate a new folder called FINAL_JUM_OUTPUT_$Test_$Cutoff that contains the results for downstream analysis.
  2. Run JUM_C.sh in the folder FINAL_JUM_OUTPUT_$Test_$Cutoff as follows:

    $ cd FINAL_JUM_OUTPUT_$Test_$Cutoff
    $ bash /user/home/JUM_2.0.2/JUM_C.sh --Folder "directory" --Test "pvalue|adjusted_pvalue" --Cutoff "stat_threshold" --TotalCondition1FileNum "total_sample_#_condition_1" --TotalCondition2FileNum "total_sample_#_condition_2" --REF "refFlat"
      #--Folder: path of the downloaded JUM package.
      #--Test: same as in running JUM_B.sh.
      #--Cutoff: same as in running JUM_B.sh.
      #--TotalCondition1FileNum: the total number of samples under condition 1 that is alphabetically listed first in the `experiment_design.txt` file.
      #--TotalCondition2FileNum: the total number of samples under condition 2 that is alphabetically listed second in the `experiment_design.txt` file.
      #--REF: a `refFlat.txt` file (a type of transcriptome annotation file, also called genePred format.   

    For example:

    $ bash /user/home/JUM_2.0.2/JUM_C.sh --Folder /user/home/JUM_2.0.2 --Test pvalue --Cutoff 0.05 --TotalCondition1FileNum 3 --TotalCondition2FileNum 3 --REF refFlat.txt

    NOTES

    • JUM_C.sh will output files with the suffix: *_final_simplified.txt and *_final_detailed.txt. These are the final output files from JUM.
    • The --REF refFlat.txt file should contain 11 columns that specify the following information respectively: 1-geneName 2-transcript_name 3-chrom 4-strand 5-txStart 6-txEnd 7-cdsStart 8-cdsEnd 9-exonCount 10-exonStarts 11-exonEnds). This file should be available from UCSC genome browser for different organisms and users can download it to the current working directory. If users only have other formats of annotation such as gff3 and gff files, users can easily convert these into the genePred format by binary scripts such as the gff3ToGenePred converter from the UCSC genome browser website: http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/. Users may want to double check the converted format and make re-arrangement of columns so that the final format matches the refFlat format. Note, JUM does NOT depend on any priori knowledge of annotation to perform AS analysis. This file here is for associating the final differential AS results from JUM to known genes for the convenience of users. If an AS event is not mapped to any known gene, it will be marked as "NONE" in the associated gene track.