Project 5: A framework to evaluate profiles from DNA-binding site collections represented in peak sequences from ChIP-Seq assays #6
Comments
We're planning to have a Docker image with a bunch of bioinformatics software preinstalled running on machines at the BC Cancer Agency Genome Sciences Centre during the Hackathon. Which bioinformatics software do you plan to use for your project? In particular, is there any software that you plan to use that is not already listed here? http://www.bcgsc.ca/services/orca
Hi Shaun,

Most of the tools I had in mind are either straightforward to install or already in the ORCA image. But just in case, here are a few extras I was thinking of using:

MOODS - https://www.cs.helsinki.fi/group/pssmfind/ - PWM matching algorithms

Also, MOODS requires a C++ compiler and probably the package python-dev (the headers needed to build Python C extensions).

I see that MEME is listed in the ORCA software. Is this the entire MEME Suite, or just the MEME motif discovery tool? The MEME Suite also includes a bunch of relevant tools we may use, so I wouldn't mind having the whole suite installed, if possible. http://meme-suite.org/doc/download.html?man_type=web

On the same page, there's the "Motif Databases" link, which we'd probably need. They're just plain text files, but it would help if you could also pre-download them onto the image. Let me know where they will sit on the filesystem!

That's all I can think of right now. Thanks for looking into this!

Cheers,
Hi, Manuel. I believe MEME is the whole suite. I'll create a ticket to install MOODS: hackseq/October_2016#41. We'll download data and databases at the start of the Hackathon, unless they're unusually large and the download is expected to delay being productive, in which case we can look into downloading them in advance.
That's great, thanks! It should be fine to download the databases the first day of the Hackathon.
Project:
Transcription factor binding sites are important regulatory elements found upstream or downstream of a gene's transcription start site. These DNA-binding sites are non-exact, are often represented by positional probabilities in a matrix, and also appear to have slightly different affinities across different ChIP-Seq assays. Here, we propose a framework to evaluate profiles from DNA-binding site collections (JASPAR, HocoMoco, UniPROBE, Jolma et al., TRANSFAC) against what is found in peaks called from ChIP-Seq assays. The input is a position-weight matrix (PWM) representing the DNA profile for a given binding site of interest. The first part would automatically query the ENCODE project's API for experiments targeting the appropriate gene for the profile. The sequences at the respective peaks would then be extracted and scanned using the PWM. The goal is to find how well the PWM agrees with what's found in experimental data. The output would be a summary of the profile's representation across sequences, and statistics on the number of possible matches found per sequence. Depending on which experiments are queried, further aims can include:
Ideally, this project would be about 1.5-2.0 days of development, and 1-1.5 days of experimentation and attempts to answer questions using the project. Useful skills for this project would include:
- Software development, scripting, object-oriented programming, REST APIs.
Project Lead: Manuel Belmadani / @mbelmadani / Industry Professional / University of British Columbia
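The scanning and summary steps described in the project text could be sketched roughly as follows. This is only a minimal illustration under stated assumptions, not the project's implementation: the function names (`pfm_to_log_odds`, `scan`, `summarize`), the uniform background, the pseudocount, the score threshold, and the toy motif and peak sequences are all hypothetical, and only the forward strand is scanned. A real run would more likely rely on a dedicated tool such as MOODS and would also consider the reverse complement.

```python
import math

BASES = "ACGT"


def pfm_to_log_odds(pfm, background=0.25, pseudocount=0.5):
    """Turn a position frequency matrix (one dict of base counts per
    position) into a log-odds PWM against a uniform background.
    The pseudocount and background values are illustrative assumptions."""
    pwm = []
    for column in pfm:
        total = sum(column[b] for b in BASES) + pseudocount * len(BASES)
        pwm.append({
            b: math.log2(((column[b] + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm


def scan(sequence, pwm, threshold):
    """Return (offset, score) for each forward-strand window whose
    summed log-odds score reaches the threshold."""
    width = len(pwm)
    seq = sequence.upper()
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if any(base not in BASES for base in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(pwm[j][base] for j, base in enumerate(window))
        if score >= threshold:
            hits.append((i, score))
    return hits


def summarize(peak_sequences, pwm, threshold):
    """Count matches per peak sequence and report simple statistics."""
    counts = [len(scan(s, pwm, threshold)) for s in peak_sequences]
    n = len(counts)
    return {
        "sequences": n,
        "with_match": sum(1 for c in counts if c > 0),
        "matches_per_sequence": counts,
        "mean_matches": sum(counts) / n if n else 0.0,
    }


# Toy motif (hypothetical): strongly prefers the 3-mer "TGA".
toy_pfm = [
    {"A": 0, "C": 0, "G": 0, "T": 10},
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 10, "C": 0, "G": 0, "T": 0},
]
toy_pwm = pfm_to_log_odds(toy_pfm)

# Toy "peak" sequences (hypothetical stand-ins for ChIP-Seq peak regions).
toy_peaks = ["ACTGACC", "TTTTTT", "TGATGA", "NNTGAN"]

summary = summarize(toy_peaks, toy_pwm, threshold=5.0)
print(summary)
```

The per-sequence counts and the fraction of peaks with at least one match correspond to the "statistics on the number of possible matches found per sequence" mentioned above. The threshold here is picked by hand for the toy motif; in practice it would more plausibly be derived from a p-value or a fraction of the maximum attainable score.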