This repository contains the code for training and running APARENT2, a deep residual neural network that predicts human 3' UTR alternative polyadenylation (APA) and cleavage magnitude at base-pair resolution. APARENT2 is an updated version of the original APARENT model.
Contact jlinder2 (at) stanford.edu for any questions about the model or data.
Web Prediction Tool
We have hosted a publicly accessible web application where users can predict APA isoform abundance and variant effects with APARENT2 and visualize the results.
The web prediction tool is located at https://apa.cs.washington.edu.
APARENT2 can be installed by cloning or forking the GitHub repository:
git clone https://github.com/johli/aparent-resnet.git
cd aparent-resnet
python setup.py install
APARENT2 requires the following packages to be installed:
- Python >= 3.6
- Tensorflow == 1.13.1
- Keras == 2.2.4
- h5py == 2.10.0
- Scipy >= 1.2.1
- Numpy >= 1.16.2
- Isolearn >= 0.2.0 (github)
- [Optional] Pandas >= 0.24.2
- [Optional] Matplotlib >= 3.1.1
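One way to check an environment against the list above is sketched below. The version pins are copied from the list. The helper names (`as_tuple`, `satisfies`, `installed_version`, `check_environment`) are hypothetical and not part of APARENT2. The distribution names are assumed to match what was installed, and the version parsing is deliberately simplified (it only looks at leading digits of each dot-separated field).

```python
import operator
from itertools import takewhile

# Required packages, copied from the list above (optional ones omitted).
REQUIREMENTS = {
    "tensorflow": ("==", "1.13.1"),
    "keras": ("==", "2.2.4"),
    "h5py": ("==", "2.10.0"),
    "scipy": (">=", "1.2.1"),
    "numpy": (">=", "1.16.2"),
    "isolearn": (">=", "0.2.0"),
}

_OPS = {"==": operator.eq, ">=": operator.ge}


def as_tuple(version):
    """Turn '1.13.1' into (1, 13, 1); stops at the first non-numeric field."""
    parts = []
    for piece in version.split("."):
        digits = "".join(takewhile(str.isdigit, piece))
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def satisfies(installed, op, required):
    """Compare two version strings, padding the shorter one with zeros."""
    a, b = as_tuple(installed), as_tuple(required)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return _OPS[op](a, b)


def installed_version(name):
    """Return the installed version string, or None if the package is missing."""
    try:
        from importlib.metadata import PackageNotFoundError, version
    except ImportError:  # Python < 3.8: fall back to setuptools
        import pkg_resources
        try:
            return pkg_resources.get_distribution(name).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def check_environment(requirements=REQUIREMENTS):
    """Return a list of human-readable problems (empty if all pins are met)."""
    problems = []
    for name, (op, required) in requirements.items():
        found = installed_version(name)
        if found is None:
            problems.append("%s is not installed (need %s %s)" % (name, op, required))
        elif not satisfies(found, op, required):
            problems.append("%s %s found (need %s %s)" % (name, found, op, required))
    return problems
```

Calling `check_environment()` and printing each returned string gives a quick summary of any mismatched or missing packages before running the notebooks.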
Example: Scoring Variants with APARENT2
The following notebook demonstrates how to use the APARENT2 model to score polyadenylation signal variants: aparent2_score_variants.ipynb.
Genome-wide In-silico Saturation Mutagenesis (Human)
The following Google Drive folder contains in-silico saturation mutagenesis predictions for all polyadenylation signals found in PolyADB V3 (transcript-wide). The file 'aparent2_ism_scores_polyadb_v3.csv.gz' contains all data. The file 'aparent2_ism_scores_polyadb_v3_cutoff.csv.gz' contains only variants with more than a 1.25-fold increase or decrease in isoform odds. The data columns 'delta_logodds' and 'delta_usage' contain variant isoform log odds ratios and isoform proportion differences (with respect to PolyADB measurements) for polyadenylation occurring anywhere within +/- 100 bp of the canonical cleavage site. The columns 'delta_logodds_narrow' and 'delta_usage_narrow' contain log odds ratios and proportion differences for cleavage that occurs +0 bp to +50 bp immediately downstream of the canonical core hexamer motif. The data columns 'pas_position_hg19' and 'pas_position_hg38' indicate the start coordinate of the core hexamer.
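As a rough illustration of how these columns might be used, the sketch below filters rows by the 1.25-fold odds cutoff and converts a log odds ratio into a shifted isoform proportion. It assumes that 'delta_logodds' is a natural-log odds ratio (the description does not state the base). The helper names `passes_cutoff`, `variant_usage` and `read_large_effects` are hypothetical and not part of the APARENT2 code base.

```python
import csv
import gzip
import math

FOLD_CUTOFF = 1.25  # fold change in isoform odds quoted in the description


def passes_cutoff(delta_logodds, fold=FOLD_CUTOFF):
    """True if the odds change by more than `fold` in either direction.

    Assumption: delta_logodds is a natural-log odds ratio.
    """
    return abs(delta_logodds) > math.log(fold)


def variant_usage(ref_usage, delta_logodds):
    """Shift a reference isoform proportion (0 < ref_usage < 1) by a log odds ratio."""
    ref_logit = math.log(ref_usage / (1.0 - ref_usage))
    return 1.0 / (1.0 + math.exp(-(ref_logit + delta_logodds)))


def read_large_effects(path, column="delta_logodds"):
    """Yield rows of a gzipped CSV whose `column` value exceeds the cutoff."""
    with gzip.open(path, "rt", newline="") as handle:
        for row in csv.DictReader(handle):
            if passes_cutoff(float(row[column])):
                yield row


# Example usage (requires the downloaded file to be present):
# for row in read_large_effects("aparent2_ism_scores_polyadb_v3.csv.gz"):
#     print(row["pas_position_hg38"], row["delta_logodds"])
```

Under these assumptions, a variant with a log odds ratio of `math.log(3)` applied to a reference proportion of 0.5 gives a variant proportion of approximately 0.75, which corresponds to a proportion difference of about 0.25.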
The 3' UTR MPRA data (both the training data and the measured variant data) are available at the original APARENT GitHub repository. For reference, the link below leads to the data repository.
The newest version of the data has been re-processed with the following additional improvements compared to the original published version:
- Exact cleavage positions have been mapped for the Alien1 Random MPRA Sublibrary.
- A 20 nt random barcode upstream of the USE in the Alien1 Sublibrary has been included in the sequence.
Note: The code for the updated data processing is located at the original APARENT GitHub, but was never published.
The following notebook scripts contain benchmark comparison results for various prediction tasks as well as other analyses performed in the paper.