Skip to content
No description, website, or topics provided.
Python
Branch: master
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
rampage_alu
.gitignore
LICENSE
README.md
workflow.jpg

README.md

Pipeline to identify expressed Alu elements using RAMPAGE

A schematic flow shows the pipeline

workflow

Prerequisites

Softwares

Python libraries

Usage

Step 1:

Fetch proper read pairs and remove PCR dunplicates.

Usage: rm_pcr.py [options] <rampage>...

Options:
    -h --help                      Show help message.
    --version                      Show version.
    -p THREAD --thread=THREAD      Threads. [default: 5]
    -o OUTPUT --output=OUTPUT      Output directory. [default: rampage_peak]
    --min=MIN                      Minimum read counts. [default: 1]
  • Inputs: BAM files of RAMPAGE (<rampage>...)
  • Output: A output folder containing relevant files (-o OUTPUT)
    • rampage_plus_5end.bed: BED file of the 5' end of plus strand read pairs
    • rampage_plus_3read.bed: BED file of the 3' end of plus strand read pairs
    • rampage_minus_5end.bed: BED file of the 5' end of minus strand read pairs
    • rampage_minus_3read.bed: BED file of the 3' end of minus strand read pairs
    • rampage_link.bed: BED file linking the 5' and 3' ends of read pairs

Note:

  1. If there are multiple RAMPAGE BAM files (different replicates) derived from the same samples, you could simply list them afterwards.
  2. You could run with multiple threads using -p THREAD.
  3. You could set the minimum read filter for read pairs using --min=MIN.

Example:

rm_pcr.py -o rampage_peak rampage_rep1.bam rampage_rep2.bam

Step 2:

Call peaks using 5' end of RAMPAGE read pairs

Usage: call_peak.py [options] <rampagedir>

Options:
    -h --help                      Show help message.
    -v --version                   Show version.
    -l LENGTH                      Feature length for F-seq. [default: 30]
    --wig                          Create Wig files.
    -p PERCENT                     Retained percent of reads in resized peaks.
                                   [default: 0.95]
  • Input: the output folder created by rm_pcr.py
  • Output: rampage_peaks.txt under the input folder

Format of rampage_peaks.txt:

Field Description
Chrom Chromosome
Start Start of peak region
End End of peak region
Name peak
Score 0
Strand Strand of peak
Peak peak site
Height Height of peak site
Peak reads Reads of (peak site ± 2 bp)
Total Total reads of peak region
Start_Fseq Start of F-seq peak region
End_Fseq End of F-seq peak region
RPM RPM of peak region

Note:

  1. You could set feature length for F-seq peak calling using -l LENGTH.
  2. You could create wig files by setting --wig.
  3. You could run with multiple threads using -p THREAD.

Example:

call_peak.py rampage_peak

Step 3:

Calculate entropy for RAMPAGE peaks

Usage: entropy.py [options] <rampagedir>

Options:
    -h --help                      Show help message.
    --version                      Show version.
    -p THREAD --thread=THREAD      Threads. [default: 5]
  • Input: the output folder created by rm_pcr.py
  • Output: rampage_entropy.txt under the input folder

Format of rampage_entropy.txt:

The first thirteen columns of rampage_entropy.txt are the same as rampage_peaks.txt.

The additional two columns are listed below:

Field Description
Entropy Entropy of RAMPAGE peak
3' end 3' end of read pairs in peak

Note:

  1. You could run with multiple threads using -p THREAD.

Example:

entropy.py rampage_peak

Step 4:

Annotate expressed Alu elements

Usage: annotate_alu.py [options] -f ref (-a alu | -r rep) <rampagedir>

Options:
    -h --help                      Show help message.
    --version                      Show version.
    -f ref                         Gene annotations.
    -t type                        File type of gene annotations.
                                   [default: ref]
    --promoter region              Promoter region. [default: 250]
    -a alu                         Alu annotations (BED format).
    -r rep                         Repeatmasker annotations (RMSK format).
    --extend length                Alu extended length. [default: 50]
    --entropy entropy              Entropy cutoff. [default: 2.5]
    --span span                    Span cutoff. [default: 1000]
    --coverage coverage            Coverage cutoff. [default: 0.5]
    -o out                         Output file. [default: alu_peak.txt]
  • Input:
    • gene annotation file (-f ref)
    • Alu annotation file (-a alu or -r rep)
    • the output folder created by rm_pcr.py
  • Output: expressed Alu file (-o out)

Format of expressed Alu file:

The first fifteen columns are the same as rampage_entropy.txt.

The additional six columns are listed below:

Field Description
Chrom Chromosome of Alu
Start Start of Alu
End End of Alu
Name Name of Alu
Score 0
Strand Strand of Alu

Note:

  1. The the default format of gene annotation file is Gene Predictions and RefSeq Genes with Gene Names format. You could use gene annotation file in BED format by setting -t bed.
  2. If using Repeatmask annotation file, you could download them from UCSC.
  3. You could set the promoter region length using --promoter region.
  4. You could set the Alu annotation extension length using --extend length.
  5. You could set entropy cutoff using --entropy entropy.
  6. You could set RAMPAGE effective length cutoff using --span span.
  7. You could set Alu coverage cutoff using --coverage coverage.

Example:

annotate_alu.py -f ref.txt -a alu.bed -o alu_peak.txt rampage_peak

Citation

Zhang XO, Gingeras TR, Weng Z#. Genome-wide analysis of polymerase III-transcribed Alu elements suggests cell type-specific enhancer function. Genome Res. 2019, 29:1402-1414

License

Copyright (C) 2018-2019 Xiao-Ou Zhang. See the LICENSE file for license rights and limitations (MIT).

You can’t perform that action at this time.