This repository has been archived by the owner on Jan 3, 2018. It is now read-only.

Bioinformatics python capstone lesson #608

Open
wants to merge 12 commits into
base: master
from

rbeagrie
Contributor

This lesson is intended to be a 1.5-hour demo that will sit at the end of a novice Python bootcamp and show learners how to apply some of their new skills to bioinformatics. It is currently a work in progress; contributions and suggestions are very welcome!

@mitar

mitar commented Jul 22, 2014

cc @janezd, @BlazZupan

@shaunagm

What kind of feedback and suggestions are you looking for?

Also, what is the audience for the lesson? Presumably they won't all be genomics experts. It seems useful to provide some domain context and explanation for the example.

@gvwilson
Contributor

In this case we're assuming they are knowledgeable about genomics, but
not about programming (or using software generally). Feedback on pace,
organization, and intelligibility is most desired - the more specific
the better.

@drlabratory
Contributor

I think some more explanation about the output of the various commands would help. The output of bedtools intersect loses me a bit. It's a hazard of bioinformatics: each program has its own output. I _think_ I know what the output means, but it would be better if the instructor reminded me.

Do exons and CpG islands overlap significantly?
===============================================

OK, now we've explored the capabilities of BedTools a little, let's use it to answer a biological question. Let's try to answer if CpG islands and exons overlap significantly - that is, do the overlap more than we would expect by random chance?

Typo: "the overlap" -> "they overlap".

Contributor

Ah, nice! Here is the biological question I was looking for up front! I think it is worth repeating: keep it here, but also introduce the question in the intro.

Contributor

Oh also - BedTools is camel case here but not elsewhere.

@rbeagrie
Contributor Author

OK this is up to date with development as of now. Reviews very welcome!

Forking the github repository
=============================

The data and code that you need to work on are located at [](git@github.com:rbeagrie/bedtools-example.git). You are going to want to make changes to the code, but this repository is owned by someone else. Instead of creating a new project, you want to [fork](../../gloss.html#fork) it; i.e., clone it on GitHub. You can do this using the GitHub web interface:
Contributor

The link should be bedtools-example.git, and the fork button link is broken as well.

~~~
{:class="out"}

Again, we see that the order of the bed files is important. We get an output line for every feature in `data/cpg.bed`, even if there is no overlap with `data/exons.bed`.
Contributor

Good point. This seems really important. Maybe worth reversing the procedure to show that the results are not the same in both directions?
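
To make the asymmetry concrete, here is a minimal pure-Python sketch of the idea. It is not bedtools and does not reproduce its output format; the function names (`overlaps`, `left_outer_intersect`) and the three CpG islands and single exon are made up for illustration. It only mimics the behaviour described above: every feature in the first file produces at least one output row, whether or not it overlaps anything in the second.

```python
# Toy illustration only: this is NOT bedtools, and the intervals are invented.
# Intervals are (chrom, start, end) with half-open coordinates, as in BED files.

def overlaps(a, b):
    """True if two intervals are on the same chromosome and share a base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def left_outer_intersect(features_a, features_b):
    """Pair each feature in A with the features in B that overlap it.

    Every feature in A yields at least one row; a feature with no overlap
    is paired with None.
    """
    rows = []
    for a in features_a:
        hits = [b for b in features_b if overlaps(a, b)]
        if hits:
            rows.extend((a, b) for b in hits)
        else:
            rows.append((a, None))
    return rows


# Hypothetical data: three CpG islands, one exon.
cpg = [("chr1", 100, 200), ("chr1", 500, 600), ("chr1", 900, 950)]
exons = [("chr1", 150, 300)]

cpg_vs_exons = left_outer_intersect(cpg, exons)
exons_vs_cpg = left_outer_intersect(exons, cpg)

print(len(cpg_vs_exons))  # 3 rows: one per CpG island
print(len(exons_vs_cpg))  # 1 row: one per exon
```

Swapping the two inputs changes both the number of rows and what each row describes, which is the point a reversed example in the lesson would make.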


So we know what the overlap is between CpG islands and exons. How do we know whether this is significant? How do we know what we would expect by random chance?

BedTools includes a command called shuffle which will randomize one of our input files. Let's try randomizing the CpG islands:
Contributor

Maybe add in a quick definition of bootstrapping? It can be kind of a buzzword, so it is fun for folks to know they are doing it!
e.g. "We are able to bootstrap our results by randomizing our actual data to generate a null distribution of CpG islands across chromosomal locations. We can use this distribution to estimate how likely we would be to get this extent of overlap by chance."
This kind of "shuffle your data and see what happens" approach comes up often, and I think it would be good to do some generalizing here, before the specifics of doing it for this data.
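
A generic version of the shuffle-and-compare idea might look like the sketch below. It is a rough illustration under stated assumptions, not the lesson's method and not how bedtools shuffle is implemented: it uses a single hypothetical chromosome of length 100,000, invented intervals, and made-up helper names (`count_overlapping`, `shuffle_intervals`, `empirical_p_value`). Strictly speaking, relocating features at random gives a randomization (permutation-style) test; "bootstrap" is often used loosely for the same idea.

```python
import random

# Toy sketch: intervals are (start, end), half-open, on one hypothetical chromosome.

def count_overlapping(features_a, features_b):
    """Number of intervals in A that overlap at least one interval in B."""
    return sum(
        any(a_start < b_end and b_start < a_end for b_start, b_end in features_b)
        for a_start, a_end in features_a
    )


def shuffle_intervals(features, chrom_length, rng):
    """Move each interval to a uniformly random position, keeping its length."""
    shuffled = []
    for start, end in features:
        length = end - start
        new_start = rng.randrange(0, chrom_length - length + 1)
        shuffled.append((new_start, new_start + length))
    return shuffled


def empirical_p_value(features_a, features_b, chrom_length, n_shuffles=1000, seed=0):
    """Estimate how often shuffled A overlaps B at least as much as observed."""
    rng = random.Random(seed)
    observed = count_overlapping(features_a, features_b)
    at_least_as_extreme = 0
    for _ in range(n_shuffles):
        shuffled = shuffle_intervals(features_a, chrom_length, rng)
        if count_overlapping(shuffled, features_b) >= observed:
            at_least_as_extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (at_least_as_extreme + 1) / (n_shuffles + 1)


# Invented data: five "CpG islands", each placed to overlap one of five "exons".
exons = [(1000, 1200), (5000, 5300), (20000, 20400), (60000, 60250), (90000, 90500)]
cpg = [(950, 1050), (5100, 5200), (20350, 20500), (60100, 60300), (90400, 90600)]

observed, p_value = empirical_p_value(cpg, exons, chrom_length=100_000)
print(observed, p_value)
```

The shuffled counts form the null distribution; the fraction of shuffles that match or beat the observed overlap is the estimate of how likely that overlap is by chance.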

@rbeagrie
Contributor Author

Just to clarify for anyone coming back to this PR, it's not ready for merging and needs some more work (which I'm planning to do after the Cambridge SWC bootcamp at the end of August). In the meantime, any more comments or contributions on the work so far would be very welcome.

@gvwilson
Contributor

@rbeagrie is this one good to merge? We'd like to clear things before #759.

@rbeagrie
Contributor Author

No, unfortunately this is not ready for merging. It needs maybe another half day of work and I've been struggling to find the time recently. Is there anything I can do to help smooth over #759? Should this be in its own repo, for example?

@gvwilson
Contributor

I think it's self-contained enough that you'll be able to redirect it to
a new repo after the break, so shall we close the PR for now and you can
resubmit in a couple of weeks?

@rbeagrie
Contributor Author

Yep, sounds good to me!

@gvwilson gvwilson assigned gvwilson and unassigned abostroem Aug 4, 2016
@gvwilson gvwilson removed their assignment Apr 26, 2017

8 participants