Biology 723: Statistical Computing for Biologists

Bio 723 (formerly Bio 313) is a course I offer at Duke University. The focus of this course is statistical computing for the biological sciences with an emphasis on common multivariate statistical methods and techniques for exploratory data analysis. A major goal of the course is to help graduate students in the biological sciences develop practical insights into methods that they are likely to encounter in their own research, and the potential advantages and pitfalls that come with their use.

In terms of mathematical perspectives, the course emphasizes a geometric approach to understanding multivariate statistics. I try to help students develop an intuition for the geometry of vector spaces and discuss topics like correlation and regression in terms of angles between vectors, dot products, and projection.
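
As a rough illustration of this geometric view, here is a minimal Python sketch. It is not taken from the course materials; the function names and the toy data are made up for the example. After mean-centering two variables, their correlation is the cosine of the angle between the two vectors, and the simple regression slope comes from projecting the centered `y` onto the centered `x`.

```python
import math

def center(v):
    """Subtract the mean so the vector describes deviations from the mean."""
    m = sum(v) / len(v)
    return [a - m for a in v]

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def norm(v):
    """Euclidean length of a vector."""
    return math.sqrt(dot(v, v))

def correlation(x, y):
    """Pearson correlation as the cosine of the angle between centered vectors."""
    xc, yc = center(x), center(y)
    return dot(xc, yc) / (norm(xc) * norm(yc))

def regression_slope(x, y):
    """Slope of y regressed on x: the coefficient of the projection of
    centered y onto centered x."""
    xc, yc = center(x), center(y)
    return dot(xc, yc) / dot(xc, xc)

# Toy data, invented for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]

r = correlation(x, y)
# Clamp to [-1, 1] to guard against floating-point rounding before acos.
angle_deg = math.degrees(math.acos(max(-1.0, min(1.0, r))))
b = regression_slope(x, y)

print("correlation (cosine of angle):", r)
print("angle between centered vectors (degrees):", angle_deg)
print("regression slope (projection coefficient):", b)
```

A small angle between the centered vectors corresponds to a correlation near 1, a right angle to a correlation of 0, and vectors pointing in opposite directions to a correlation of -1.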

In terms of practical computing, I use both R and Python in the course, sprinkled with a few tools from the Unix tool chest, and even a bit of LaTeX. I have chosen to use both R and Python because while R is a powerful statistical environment, with a wealth of available libraries, I don't think it's ideal as a general programming language and computing platform. Python has a clear syntax, a wider range of modules, and is more suitable as a "glue" language for building bioinformatics pipelines and such.

Starting in Fall 2011, I've put all the course materials under a Creative Commons License (see below) in a public repository on GitHub. Hosting the course materials on GitHub has a number of advantages from my perspective, including version control and a wiki. I plan to implement updates and corrections to the course in 'real time'. For example, any bugs or mistakes in my examples, key commands left out of code listings, etc. will be corrected and committed during class sessions. My goal is to have the material in this repository always be the most up-to-date and correct version. As a consequence, I won't be providing any hard copies of handouts, as they would inevitably be 'out of date' before the end of every class session.

I also like the idea of making my course materials freely available for re-use and re-mixing. In the past several years I've had some very positive feedback from a number of postdocs and students who haven't taken the course but used the course materials found on my lab wiki for self-study. Moving the course to GitHub is my modest attempt at making the course materials more widely available for anyone who is interested.

Wiki

The Bio 723 Wiki is hosted on GitHub. Links to lecture slides, hands-on materials, data sets, and readings are all posted to the wiki site, so I recommend you navigate over there.

Syllabus

A PDF version of the course syllabus is available here. As with all syllabi, it's a coarse guide, not a set of marching orders. It too will evolve over the semester.

Getting started

You can get started by installing the software tools, familiarizing yourself with the command-line, and configuring your computing environment. Links and instructions can be found on the Course Tools page.

License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 United States License (CC BY-NC-SA 3.0).

![CC license logo](http://i.creativecommons.org/l/by-nc-sa/3.0/us/88x31.png)