Skip to content

A script used to calculate percent spliced-in (PSI) values from bulk RNA sequencing data.

Notifications You must be signed in to change notification settings

jalwillcox/Calculate-PSI

Repository files navigation

Calculate Percent Spliced-In (PSI)

This script compares the PSI between case and control samples for RNA-Seq data.

PSI is calculated on an individual basis and then averaged, NOT by comparing all cases merged vs. all controls merged.

If you just want to calculate PSI (no case/control comparison), label all samples as "case" or exclude the -c flag.

PSI Overview

Required Programs

This script uses the following programs; alternate versions may also work:

NOTE An old version of bedtools (v2.23.0) is necessary to produce the appropriately formatted output with coverageBed.

This program also uses some form of the following files:

The following Rscripts are included with this pipeline:

  • configure-psi-byIndiv.R combine individual PSI output and calculate average PSI across samples
  • comp-PSI.R compare cases to control (includes uncorrected p-values)
  • plot-psi.R generates a plot PSI plot for the transcript with the most exons, comparing cases and controls; the R package ggplot2 is required
  • plot-psi-noComp.R generates a plot for the transcript with the most exons, WITHOUT a comparison; the R package ggplot2 is required

Flags

Flag Argument Default Description
-b filename (required) - a file with col1=sample_id, col2=bam, and col2=case/control (example BAM list below)
-e "G"/"T" (required) - "G" to calculate for a gene or "T" to calculate for a transcript
-g gene/transcript ID (required) - gene/transcript of interest
-c no argument (optional) - compare cases and controls (must be specified in bamlist)
-f GTF file (optional) ./hg38.gencode.v27.primary_assembly.annotation.gtf the full path to the reference GTF file
-m temp directory (optional) ./ a temporary directory to store intermediate files
-n reference genome (optional) ./Homo_sapiens_assembly38.fasta the full path to the reference genome
-o string (optional) <GENE/TRANSCRIPT>_PSI the basename for output files
-p no argument (optional) - generate a plot of the PSI for the transcript with the highest exon count (requires ggplot2 R package)
-r integer (optional) 50 the length of the read. Use the shortest sample read length, longer reads are trimmed
-u no argument (optional) - Use unpaired reads
-x no argument (optional) - Remove output bam files after run (helpful if space is an issue)
-h no argument (optional) - print usage

Example Usage

For the TTN gene with NO comparison:

get-psi-byIndi-github.sh -e G -g TTN -b bamlist.txt -o TTN_PSI

For the MYH6-201 transcript comparing cases and controls and generating a plot:

get-psi-byIndi-github.sh -e T -g MYH6-201 -b bamlist.txt -cp -o MYH6-201_PSI

Example BAM List

sample1 /path/to/case1/bam/sample1.bam case
sample2 /path/to/case2/bam/sample2.bam case
sample3 /path/to/control1/bam/sample3.bam control
sample4 /path/to/control2/bam/sample4.bam control

Output

File Description
outname.fa a pseudo-genome extending 10kb on either side of the gene of interest
(can be loaded as genome into IGV)
outname.gff a GFF file for all exons used
(can be loaded into IGV with the outname.fa genome)
outname.gtf a GTF file for all exons used
(can be loaded into IGV with the outname.fa genome)
*.bam reads aligned to outname.fa
(1 BAM/sample; can be loaded into IGV with the outname.fa genome)
*junctions.bed a junctions file for each sample
(1 BED/sample; can be loaded into IGV with the outname.fa genome)
*-case.psi a PSI file for each case sample
(columns: exon ID, exon length, N included reads, N excluded reads, and PSI)
*-control.psi a PSI file for each control sample, requires -c flag
(columns: exon ID, exon length, N included reads, N excluded reads, and PSI)
gene.bed a bed file with the coordinates used to generate outname.fa
case-tot.psi a file containing the PSI for all cases along with the average
control-tot.psi a file containing the PSI for all controls along with the average, requires -c flag
psi-comparison.txt a file comparing cases and controls, requires -c flag
(columns: exon ID, average case PSI, average control PSI, exon, transcript, uncorrected t-test p-value)
*pdf a plot of the PSI, requires -p flag (see below for example)
outname-ref_coords.txt the exon coordinates relative to the reference genome (as opposed to outname.fa)

Example case/control plot (TTN-214):

get-psi-byIndi-github.sh -e T -g TTN-214 -b bamlist.txt -c -o TTN-214-comp_PSI -p

TTN-214-comp_PSI-out/TTN-214-comp_PSI-plot.pdf

Releases

No releases published

Packages

No packages published