Skip to content

openASO is a project designed to identify regulatory regions of an RNA that can be targeted by antisense oligonucleotides.

License

Notifications You must be signed in to change notification settings

lackeylela/openASO

Repository files navigation

openASO

Design effective antisense oligonucleotides

Can machine learning be used to identify aspects of RNA targets that make antisense oligonucleotide (ASOs) more effective? This project seeks to develop datasets and tools to understand important features of effective ASO targets to improve ASO design for experimental and clinical applications!

Antisense oligonucleotides (ASOs) have a high potential therapeutic value as they can be used to target any transcript and regulate its expression. One factor in choosing an effective ASO is identifying an open, unstructured region of the target RNA. However, RNA is a flexible molecule best represented by an ensemble of structures. Does using ensemble modeling to identify open regions within the target RNA promote selection of more suitable ASO sequences? To answer this question we will use machine learning to ask whether ensemble guided structure models perform better than traditional minimum free energy structure using publicly available datasets of ASOs and their targeting efficiency. Other known attributes that affect ASO targeting will be incorporated in both algorithms including RNA binding protein data (eCLIP) data. Finally, we will build a visualization tool where users can input their gene of interest and map regions suitable for further ASO design and testing.

Scripts

Determine ASO targets from sequence alone We developed a script based on BLAST to identify the top gene target for ASOs with no known gene target information.

Determine ASO binding site to a transcript We developed an R script to take a list of ASOs and their target gene and identify where the ASO hybridizes to a reduced set of transcripts (MANE).

Determine transcript binding site genomic coordinates We developed an R script to take transcript ASO binding information and identify the genomic coordinates to link to other datasources (eCLIP, etc).

Feature generation We developed several R scripts to identify features of ASO binding sites including where the ASO binds in the transcript (UTR vs coding), GC content of the ASO, RNA structure of the target RNA and protein binding site overlap.

Team members

Lela Lackey was the team lead and coordinator. Team members worked in three major groups including:

Data processing

Axel Hauduc, Ivan Jimenez, Jayashree Kumar, James Adler and Kimberly Wellman are members of the data processing group of team openASO. This group took ASO sequences and incomplete gene target information, and built a data frame from which new features could be analyzed. This entailed data clean up and standardization of the gene names to HGNC nomenclature. Transcript sequences of ASO targets were also included so that ASOs could be mapped to transcript coordinates as well as genome coordinates. Some of the features included in this data frame were secondary structure, RNA-binding protein occupancy, and frequency of SNP variation.

Machine Learning

Shawn Hsueh, Alex Sweeten and Tony Shen are members of the machine learning team of openASO. THey worked on the machine learning algorithm. They used sci-kit learn to develop a SVM Regression algorithm. This will be used on the final dataset including target RNA secondary structure, RNA-binding protein occupancy, and SNP variation.

Visualization

Sophia Shen, Owen Tsai, Helena Catena Sánchez and Kelly Wei are members of the visualization team of openASO. They made an interactive R Shiny table, where you can load the ASO data in tsv format and visualize it. There is an interactive table where you can select and filter by table attributes. We plan to expand this and show a representation of each transcript and where the different ASO bind and their features.

Data sources:

ENCODE – eCLIP data

Gene sequence and architecture resource

REDIportal – A-to-I editing sites

Genomic variation

McQuisten KA, Peek AS. Identification of sequence motifs significantly associated with antisense activity. BMC Bioinformatics. 2007;8:184. doi:10.1186/1471-2105-8-184 PubMed

About

openASO is a project designed to identify regulatory regions of an RNA that can be targeted by antisense oligonucleotides.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages