
hsharrison/open-source-science-workshop


Open-Source Science Workshop

Summer 2014, University of Connecticut


Thanks to Software Carpentry for ideas and general inspiration; I'm using their bootcamps as a model.


This workshop will teach computing skills that will increase your productivity as a scientist. Some inspiration is taken from the open-source software community, toward the goals of open science and reproducible research. The core topics will be the Python programming language and general best practices. Secondary topics will be:

  • version control using Git and Mercurial,
  • the Unix shell, and
  • document generation using Markdown and LaTeX.

Motivation

We scientists need a certain amount of computer literacy in order to collect, analyze, and visualize our data. We often learn how to do one specific thing at a time, when it comes up (e.g., how to compute an FFT in MATLAB). Because most of us are self-taught, we never learn general software-development principles, such as how to organize code for easy reuse. As a result, scientists often waste time on duplicated effort. We continually run into problems that software developers have already solved.

In particular, we have a lot to learn from the open-source software community, and not just about programming. The open-source software development process is similar to how we believe science should work. It is completely transparent, with all the failed attempts, design decisions, and even the communication between developers publicly visible. In science, on the other hand, 99% of the time we only see the final product. In the face of a questionable finding, a replication can be attempted, but the actual data collection and analysis process cannot be closely scrutinized.

With programming skills, scientists can not only be more productive but also adopt a more reliable workflow. If we learn from the open-source community, this has the welcome side effect of making our workflows more visible, maintainable, and reproducible.

The goal is for participants to be able to prevent these kinds of situations:

  • Inability to understand code written two months earlier.
  • Documents with filenames like manuscript_revisions_v3_comments (2).
  • Inability to reproduce a result due to change in software or loss of the exact steps taken.
  • A programming workflow dependent on copy-paste, code templates, or commenting and uncommenting.
  • Reluctance to make a small analysis change because of how many steps will need to be re-done to recreate the manuscript.
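As a small illustration of the "commenting and uncommenting" pattern above, here is a minimal sketch, assuming a simple analysis step that computes a mean with or without outlier removal. Instead of keeping two versions of the code and toggling between them by commenting lines out, the choice becomes a function parameter. The function name `summarize` and its `trim_outliers` and `cutoff` parameters are hypothetical, chosen only for illustration.

```python
from statistics import mean, pstdev


def summarize(values, trim_outliers=False, cutoff=3.0):
    """Return the mean of `values`, optionally excluding outliers.

    Hypothetical example: the analysis choice is a parameter rather than
    a block of code that gets commented and uncommented.
    """
    if trim_outliers:
        center = mean(values)
        spread = pstdev(values)
        # Keep values within `cutoff` population standard deviations of the mean.
        # With cutoff >= 1, at least one value always survives.
        values = [v for v in values if abs(v - center) <= cutoff * spread]
    return mean(values)
```

Calling `summarize(data)` and `summarize(data, trim_outliers=True)` side by side keeps both versions of the analysis available, so neither has to be commented out, and a reader can see exactly which choice produced a given result.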

For more along these lines, and a preview of some specific points, see Wilson et al. (2014), Best practices for scientific computing.

(After writing this, I found a piece that makes the same points more eloquently.)

Schedule of topics

Each session will be three hours, including a break. The material will be hands-on whenever possible.

  1. The Unix shell, version control.
  2. Python basics, built-in data structures.
  3. Iterators and generators, object-oriented programming.
  4. Functional programming, array programming.
  5. The scientific Python ecosystem.
  6. Data analysis.
  7. Defensive programming, development strategies.
  8. Figures and visualizations, automating experiments.
  9. Dynamic documents, makefiles.
  10. Overflow session.

I am open to feedback on this list and may modify it as we go depending on everyone's priorities and interests.

Setup

Ideally, everyone will come to the first day with everything already installed on their laptops. Please follow the instructions at software-carpentry.org, with the following revisions:

  • We will not be using SQL, so skip that part.
  • Windows users: the software-carpentry page links to an older version of Git for Windows. Get the latest version here.
  • Create an account at Bitbucket (sign up with your .edu email address and you will automatically get the academic plan). Follow Bitbucket's guide to setting up Git and Mercurial. This goes a bit beyond the software-carpentry guide: it also directs you to install Mercurial, and links both Git and Mercurial to your Bitbucket account.
  • Optionally, install PyCharm Community Edition. It's the IDE I use for Python and I can't recommend it enough; you may also find it advantageous to use the same tools as I do. However, any editor will do, including Spyder, which will be installed with Anaconda if you follow the software-carpentry guide.
  • You will still want a lightweight text editor. I recommend Sublime Text 3 on all platforms, though it doesn't really matter.
  • Toward the end of the workshop, we will use Pandoc for document generation. Also follow the instructions at that link for installing LaTeX on your platform.

Please let me know if you run into problems getting set up.