Project 5: A framework to evaluate profiles from DNA-binding site collections represented in peak sequences from ChIP-Seq assays #6

ttimbers · 2016-06-13T15:09:45Z

Project:
Transcription factor binding sites are important regulatory elements found upstream or downstream of a gene's transcription start site. These DNA-binding sites are not exact sequences; they are often represented as a matrix of positional probabilities, and they appear to have slightly different affinities across ChIP-Seq assays. Here, we propose a framework to evaluate profiles from DNA-binding site collections (JASPAR, HocoMoco, UniPROBE, Jolma et al., TRANSFAC) against what is found in peaks called from ChIP-Seq assays. The input is a position weight matrix (PWM) representing the DNA profile for a binding site of interest. The first step would automatically query the ENCODE project's API for experiments targeting the gene that corresponds to the profile. The sequences at the respective peaks would then be extracted and scanned with the PWM. The goal is to measure how well the PWM agrees with the experimental data. The output would be a summary of the profile's representation across sequences, along with statistics on the number of possible matches found per sequence. Depending on which experiments are queried, further aims can include:

Comparing profiles from alternative databases and versions to identify the most accurate representation for each experiment.
Determining whether a database better represents a given organism's binding sites (mouse or human).
Using the same approach, identifying profiles for binding sites that are not targeted by the experiment but are frequently located in the peaks.
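
The first step described above, querying ENCODE for experiments that target a given factor, could be sketched roughly as follows. This is a minimal sketch, not a tested client: the helper names `build_experiment_query` and `extract_accessions` are hypothetical, and the search parameters (`assay_title`, `target.label`, the organism field) and the `@graph` / `accession` response fields are assumptions about the ENCODE search API that should be checked against its current documentation. It only builds the URL and reads an already-decoded JSON response; the HTTP request itself is left out.

```python
from urllib.parse import urlencode

ENCODE_BASE = "https://www.encodeproject.org"


def build_experiment_query(target_label, organism="Homo sapiens", limit="all"):
    """Build a search URL for ChIP-seq experiments targeting one factor.

    All parameter names and values below are assumptions about the
    ENCODE search API, not something confirmed in this thread.
    """
    params = [
        ("type", "Experiment"),
        ("assay_title", "ChIP-seq"),  # assumed facet value
        ("target.label", target_label),
        ("replicates.library.biosample.donor.organism.scientific_name", organism),
        ("status", "released"),
        ("format", "json"),
        ("limit", limit),
    ]
    return ENCODE_BASE + "/search/?" + urlencode(params)


def extract_accessions(response):
    """Collect experiment accessions from a decoded search response.

    Assumes hits are listed under "@graph" and carry an "accession" key.
    """
    return [hit["accession"] for hit in response.get("@graph", []) if "accession" in hit]
```

Running the same query with `organism="Mus musculus"` would be one way to approach the mouse versus human comparison listed above.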

Ideally, this project would involve about 1.5-2.0 days of development and 1-1.5 days of experimentation, attempting to answer questions using the project. Useful skills for this project would include:

- Software development, scripting, object-oriented programming, REST APIs.
- Experience with transcription factor binding sites, motif discovery.
- Prior research with transcription factors and co-factor interactions.

Project Lead: Manuel Belmadani / @mbelmadani / Industry Professional / University of British Columbia

sjackman · 2016-06-24T22:02:14Z

We're planning to have a Docker image with a bunch of bioinformatics software preinstalled running on machines at the BC Cancer Agency Genome Sciences Centre during the Hackathon. Which bioinformatics software do you plan to use for your project? In particular, is there any software that you plan to use that is not already listed here? http://www.bcgsc.ca/services/orca

mbelmadani · 2016-06-27T21:45:13Z

Hi Shaun,

Most of the tools I had in mind are either straightforward to install or already in the ORCA image. But just in case, here's a few extras I was thinking of using:

MOODS - https://www.cs.helsinki.fi/group/pssmfind/ - PWM matching algorithms

Also, MOODS requires a C++ compiler and probably the package python-dev (headers needed to build Python C extensions.)
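
For context on what a PWM matching tool computes, here is a naive pure-Python sketch of scanning sequences with a log-odds matrix and summarizing matches per sequence. It does not use MOODS or its API, and it is far slower than a dedicated tool. The function names, the pseudocount, the uniform background of 0.25, and the threshold are all assumptions chosen for illustration.

```python
import math

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Turn per-position base counts into log2-odds scores (uniform background assumed)."""
    pwm = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + 4 * pseudocount
        pwm.append({
            b: math.log2(((column.get(b, 0) + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def scan(pwm, seq, threshold):
    """Return (offset, strand, score) for each window scoring >= threshold.

    Offsets on the "-" strand are relative to the reverse-complemented sequence.
    """
    seq = seq.upper()
    width = len(pwm)
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for i in range(len(s) - width + 1):
            window = s[i:i + width]
            if any(base not in BASES for base in window):
                continue  # skip windows containing N or other ambiguity codes
            score = sum(pwm[j][window[j]] for j in range(width))
            if score >= threshold:
                hits.append((i, strand, score))
    return hits


def summarize(pwm, sequences, threshold):
    """Count matches per sequence and the fraction of sequences with at least one."""
    per_sequence = [len(scan(pwm, s, threshold)) for s in sequences]
    with_match = sum(1 for n in per_sequence if n > 0)
    return {
        "sequences": len(per_sequence),
        "with_match": with_match,
        "fraction_with_match": with_match / len(per_sequence) if per_sequence else 0.0,
        "matches_per_sequence": per_sequence,
    }
```

Calling `summarize` on the peak sequences from one experiment would give the kind of per-sequence match statistics mentioned in the project description.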

I see that MEME is listed in the ORCA software. Is this the entire MEME Suite, or just the MEME motif discovery tool? The MEME Suite also includes a bunch of relevant tools we may use, so I wouldn't mind having the suite installed, if possible. http://meme-suite.org/doc/download.html?man_type=web

On the same page, there's the "Motif Databases" link, which we'd probably need. They're just plain text files, but if you could also pre-download them onto the image, please let me know where they will sit on the filesystem!
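
Since these databases are plain text, loading them should not need much code. Below is a minimal sketch of a reader for MEME-style motif files, assuming each motif has a `MOTIF` line followed by a `letter-probability matrix:` header that carries a `w=` width. The function name `parse_meme` and the two example motifs are made up for illustration; real database files may contain fields this sketch ignores.

```python
import re

# Made-up example in MEME minimal motif format (not real database content).
EXAMPLE = """MEME version 4

ALPHABET= ACGT

MOTIF motif_A hypothetical
letter-probability matrix: alength= 4 w= 2 nsites= 10 E= 0
 0.7 0.1 0.1 0.1
 0.1 0.1 0.7 0.1

MOTIF motif_B
letter-probability matrix: alength= 4 w= 3 nsites= 10 E= 0
 0.25 0.25 0.25 0.25
 0.1 0.7 0.1 0.1
 0.1 0.1 0.1 0.7
"""


def parse_meme(text):
    """Return {motif_id: [[pA, pC, pG, pT], ...]} from MEME-format text."""
    motifs = {}
    name = None
    lines = iter(text.splitlines())
    for line in lines:
        line = line.strip()
        if line.startswith("MOTIF"):
            name = line.split()[1]  # first token after MOTIF is the identifier
        elif line.startswith("letter-probability matrix") and name is not None:
            match = re.search(r"\bw=\s*(\d+)", line)
            if match is None:
                raise ValueError("matrix header for %s has no w= field" % name)
            width = int(match.group(1))
            # The next `width` lines each hold one position's probabilities.
            motifs[name] = [[float(x) for x in next(lines).split()] for _ in range(width)]
    return motifs
```

Reading a file would then be `parse_meme(open(path).read())`, with `path` pointing to wherever the databases end up on the image.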

That's all I can think of right now. Thanks for looking into this!

Cheers,

sjackman · 2016-07-01T01:57:22Z

Hi, Manuel. I believe MEME is the whole suite.
http://meme-suite.org/meme-software/4.10.1/meme_4.10.1_3.tar.gz
It's installed using Homebrew/Linuxbrew: `brew install homebrew/science/meme`
See http://brew.sh and http://linuxbrew.sh

I'll create a ticket to install MOODS. hackseq/October_2016#41

We'll download data/databases at the start of the Hackathon, unless they're unusually large and downloading them is expected to delay being productive, in which case we can look into downloading them in advance.

mbelmadani · 2016-07-11T06:16:15Z

That's great, thanks!

It should be fine to download the databases on the first day of the Hackathon.

mbelmadani · 2016-07-20T21:12:54Z

I'll start listing some papers that might provide context for this project. I'm not expecting anyone to read all of them, but some of the ideas they introduce may well be relevant. I'll also update/edit this post with more content as I go.

Background:

Potentially useful ideas from the literature:

Reliable scaling of position weight matrices for binding strength comparisons between transcription factors
The orientation of transcription factor binding site motifs in gene promoter regions: does it matter?
Syntax compensates for poor binding sites to encode tissue specificity of developmental enhancers
Systematic discovery and characterization of regulatory motifs in ENCODE TF binding experiments
Variation in transcription factor binding among humans
A systematic, large-scale comparison of transcription factor binding site models

Brittdrog changed the title ~~Develop aframework to evaluate profiles from DNA-binding site collections versus what is found in peaks called from ChIP-Seq assays~~ A framework to evaluate profiles from DNA-binding site collections represented in peak sequences from ChIP-Seq assays Jun 13, 2016

sjackman mentioned this issue Jul 1, 2016

Install requested software hackseq/October_2016#41

Closed

2 tasks

sjackman changed the title ~~A framework to evaluate profiles from DNA-binding site collections represented in peak sequences from ChIP-Seq assays~~ Project 5: A framework to evaluate profiles from DNA-binding site collections represented in peak sequences from ChIP-Seq assays Aug 28, 2016
