PB_FLIP

PacBio Fusion and Long Isoform Pipeline (PB_FLIP).

Overview

PacBio Fusion and Long Isoform Pipeline (PB_FLIP) incorporates a suite of RNA-Seq software analysis tools and scripts to identify expressed gene fusion partners and isoforms.

Software Dependencies

Python 3.7
Snakemake 6.9.1
cDNA_Cupcake 28.0.0
SQANTI3 4.2
Salmon 0.14.1
STAR 2.6.1
minimap2 2.22
picard 2.26.1
pbsv 2.6.2
pbmm2 1.4.0
pbbam 1.6.0
bam2fastx 1.3
SnpEff 5.0

Required External Databases

FusionHubDB
DisGeNETDB 7.0
- Download curated_gene_disease_associations.tsv from the above link.
isoannotlitegff3
- Download Homo_sapiens_GRCh38_Ensembl_86.zip or Mus_musculus_GRCm38_Ensembl_86.zip
STAR Genome Index
- Provide the STAR index folder used for short-read junction support
- Human and mouse references can be downloaded from GENCODE. The pipeline was tested with Human Genome release version 38.

Add the absolute paths to these four resources to config/case.yml:

DISGENET:
  /data/pbflip/DisGeNET/curated_gene_disease_associations.tsv

FUSIONHUBDB:
  /data/pbflip/FusionDatabase/Fusionhub_global_summary.txt

REFERENCES:
    genome: /data/pbflip/isoseq_db/genomes/hg38.fa
    annotation: /data/pbflip/isoseq_db/genomes/gencode.v32.annotation.gtf
    isoannotlitegff3: /data/pbflip/isoseq_db/Homo_sapiens_GRCh38_Ensembl_86.gff3

GENOMEINDEX:
  star_index: /data/pbflip/star_index
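
Before launching a run, it can help to confirm that each of these entries is an absolute path that exists on disk. The following is a minimal sketch, not part of PB_FLIP: the helper names `lookup` and `path_problems` are hypothetical, and the key layout is assumed to match the config/case.yml fragment above.

```python
import os

# Key paths taken from the config/case.yml fragment above.
PATH_KEYS = [
    ("DISGENET",),
    ("FUSIONHUBDB",),
    ("REFERENCES", "genome"),
    ("REFERENCES", "annotation"),
    ("REFERENCES", "isoannotlitegff3"),
    ("GENOMEINDEX", "star_index"),
]


def lookup(config, keys):
    """Follow a sequence of keys through nested dicts; return None if absent."""
    node = config
    for key in keys:
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def path_problems(config):
    """Return one message per entry that is unset, relative, or missing on disk."""
    problems = []
    for keys in PATH_KEYS:
        name = ".".join(keys)
        value = lookup(config, keys)
        if not isinstance(value, str):
            problems.append(f"{name}: not set")
        elif not os.path.isabs(value):
            problems.append(f"{name}: not an absolute path: {value}")
        elif not os.path.exists(value):
            problems.append(f"{name}: path does not exist: {value}")
    return problems
```

The `config` argument is a plain dictionary; one way to obtain it would be `yaml.safe_load` from PyYAML applied to config/case.yml. An empty return value means all six entries passed the checks.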

Required Other Files

TX2G:
  "/data/pbflip/isoseq_db/gencode.v32.annotation.tr2g_gtf.tsv"

To create gencode.v32.annotation.tr2g_gtf.tsv from the annotation GTF:

grep -w "exon" gencode.v32_SIRVome_isoforms_ERCCs_longSIRVs_200709a_C_170612a.gtf \
        | cut -f9 | cut -f1,2,4 -d";" \
        | sed 's/gene_id //g' | sed 's/; transcript_id / /' \
        | sed 's/; gene_name / /'| uniq > gencode.v32.annotation.tr2g_gtf.tsv
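
The shell pipeline above selects attributes by position (fields 1, 2 and 4 of the attribute column), so it depends on the attribute order in the GTF. As an illustration only, here is a Python sketch that extracts the same three attributes by name. The function name `tx2g_rows` is hypothetical, and unlike the shell command it strips the quotes and removes all duplicate rows rather than only adjacent ones, so compare its output with the file the pipeline expects before substituting it.

```python
import re

# Matches GTF attributes of the form: key "value"
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def tx2g_rows(gtf_lines):
    """Yield unique (gene_id, transcript_id, gene_name) tuples from exon records."""
    seen = set()
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip GTF header comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # keep only exon features
        attrs = dict(ATTR_RE.findall(fields[8]))
        row = (
            attrs.get("gene_id"),
            attrs.get("transcript_id"),
            attrs.get("gene_name"),
        )
        if None in row or row in seen:
            continue  # skip incomplete or already reported rows
        seen.add(row)
        yield row


if __name__ == "__main__":
    import sys

    # Usage (hypothetical script name): python tx2g.py annotation.gtf > tx2g.tsv
    with open(sys.argv[1]) as handle:
        for row in tx2g_rows(handle):
            print(" ".join(row))
```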

Dependencies and Conda Environment Setup

Set up a conda environment for the PB_FLIP pipeline:

cd ~
mkdir apps
cd apps
wget https://repo.anaconda.com/miniconda/Miniconda3-py37_4.10.3-Linux-x86_64.sh
bash Miniconda3-py37_4.10.3-Linux-x86_64.sh

Follow the on-screen instructions to complete the installation.

Then, activate the base environment.

conda activate base

To install some of the Python dependencies:

conda env update --name base --file environment.yml

At this point, you can follow the instructions in sandbox_installer.sh to install the rest of the dependencies.

How to Run PB_FLIP?

Clone the repository to your local machine

git clone https://github.com/nch-igm/PBFLIP
cd PBFLIP

Edit config/case.yml
To run the pipeline, issue the following command, which uses 16 CPU threads:

snakemake -f -p -j 16 -c 16 --latency-wait 20

How to Run the PB_FLIP Docker Container?

The PB_FLIP container was evaluated in an AWS sandbox environment (16 CPUs, 128 GB RAM and 500 GB disk space).

Clone the repository to your local machine

git clone https://github.com/nch-igm/PBFLIP
cd PBFLIP

Edit config/case.yml
Create a folder called pbflip

mkdir pbflip

Download all the required databases to the pbflip directory, as described in Required External Databases
Copy the config folder to the pbflip folder

pbflip/
├── Brain_Reference_SIRV_4_C99_I95
├── config
├── DisGeNET
├── FusionDatabase
├── isoseq_db
└── star_index

To run the Docker container on your local machine, issue the following command, which runs the pipeline on 18 CPU threads:

docker run -d --rm -v "$(pwd)/pbflip:/data/pbflip" \
            -e 'configfile=/data/pbflip/config/case.yml' \
            -e 'threads=18' \
            -e 'result_dir=/data/pbflip' \
            public.ecr.aws/nch-igm/pb-flip:public

The final results will be under $(pwd)/pbflip/working_dir
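
If the container is launched from a script rather than an interactive shell, the same invocation can be assembled as an argument list, which avoids shell quoting problems with the volume path. This is only a sketch: the function name `docker_run_args` is hypothetical, and the image name, mount point and environment variables are copied from the command above.

```python
import os
import shlex


def docker_run_args(data_dir, threads=18,
                    image="public.ecr.aws/nch-igm/pb-flip:public"):
    """Build the argument list for the docker run command shown above."""
    host_dir = os.path.abspath(data_dir)  # plays the role of $(pwd)/pbflip
    return [
        "docker", "run", "-d", "--rm",
        "-v", f"{host_dir}:/data/pbflip",
        "-e", "configfile=/data/pbflip/config/case.yml",
        "-e", f"threads={threads}",
        "-e", "result_dir=/data/pbflip",
        image,
    ]


if __name__ == "__main__":
    # Print the command for inspection; pass the list to subprocess.run to execute it.
    print(" ".join(shlex.quote(arg) for arg in docker_run_args("pbflip")))
```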

Inputs

Before you run PB_FLIP, you need the following input files from a SMRT Link analysis. These files are located in $SMRT_ROOT/userdata/jobs_root/0000/0000000/0000000002/outputs/.

cluster_report: cluster_report.csv

hq_transcripts: hq_isoforms.fasta

flnc: flnc.bam
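
A quick way to confirm that a SMRT Link outputs directory contains all three inputs is sketched below. The function name `locate_inputs` is hypothetical, and the file names are assumed to be exactly those listed above.

```python
from pathlib import Path

# Config key -> file name, as listed in the Inputs section above.
SMRTLINK_INPUTS = {
    "cluster_report": "cluster_report.csv",
    "hq_transcripts": "hq_isoforms.fasta",
    "flnc": "flnc.bam",
}


def locate_inputs(outputs_dir):
    """Return (found, missing): absolute paths by config key, and absent file names."""
    outputs_dir = Path(outputs_dir)
    found, missing = {}, []
    for key, filename in SMRTLINK_INPUTS.items():
        path = outputs_dir / filename
        if path.is_file():
            found[key] = str(path.resolve())
        else:
            missing.append(filename)
    return found, missing
```

The `found` dictionary uses the same keys as the SMRTLINKFILES entries of the configuration file, so its values can be copied there once `missing` is empty.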

Configuration File

CASENAME : A name for your project. This is your current working directory name

SMRTLINKFILES

version : The pipeline currently supports only data generated with SMRT Link version 10 or above

cluster_report : Path to the cluster report file generated by the SMRT Link analysis

hq_transcripts : Path to the HQ transcripts file generated by the SMRT Link analysis

flnc : Path to the full-length non-concatemer (FLNC) BAM file generated by the SMRT Link analysis

ILLUMINASHORTREADS : Full paths to the short-read files (read 1 and read 2), if available

REFERENCES

species : Species of the sample; human (hs) and mouse (mm) are currently supported

genome : Full path to genome file

annotation : Full path to annotation file

isoannotlitegff3 : Full path to IsoAnnotLite annotation file

COLLAPSEPARAM : cDNA_Cupcake/ToFU collapse_isoforms_by_sam.py parameters

FILTERBYCOUNTS : cDNA_Cupcake/ToFU filter_by_count.py parameters

PBSVCALLERPARAM : pbsv parameters

PBBAM : pbindex and bam2fastq

MAPPERS : Short-read and long-read mappers used in the pipeline

GENOMEINDEX : Path to Genome index for STAR aligner

TX2G : Transcripts to gene association file for your species.

PICARD : Full path to picard software

SNPEFF : Full paths to the snpEff.jar and SnpSift.jar files

ISOSEQSCRIPTS : A collection of scripts from cDNA_Cupcake/ToFU and SQANTI3

LIBPATHS : PYTHONPATH for your conda environment

DISGENET : Full path to DisGeNET file

FUSIONHUBDB : Full path to the file downloaded from FusionHUB
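
The entries above can be checked mechanically before a run. The sketch below is illustrative and makes two assumptions that the pipeline itself may not share: that every section listed above except ILLUMINASHORTREADS is mandatory, and that the SMRT Link version is given as a number or a dotted string such as "10.1". The names `missing_sections` and `smrtlink_version_ok` are hypothetical.

```python
# Top-level sections described above; ILLUMINASHORTREADS is treated as
# optional because short reads are only used "if available" (assumption).
REQUIRED_SECTIONS = [
    "CASENAME", "SMRTLINKFILES", "REFERENCES", "COLLAPSEPARAM",
    "FILTERBYCOUNTS", "PBSVCALLERPARAM", "PBBAM", "MAPPERS",
    "GENOMEINDEX", "TX2G", "PICARD", "SNPEFF", "ISOSEQSCRIPTS",
    "LIBPATHS", "DISGENET", "FUSIONHUBDB",
]


def missing_sections(config):
    """Return the required top-level sections absent from the config dict."""
    return [name for name in REQUIRED_SECTIONS if name not in config]


def smrtlink_version_ok(version, minimum=10):
    """True if the major component of the SMRT Link version is at least `minimum`."""
    major = int(str(version).split(".")[0])
    return major >= minimum
```

For example, a configuration containing only CASENAME would report the remaining fifteen sections as missing, and a version value of 9 would be rejected.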
