Table of Contents
Date: December 16, 2015
Venue: Grand Conference Room, 12th Floor, 2525 West End Avenue
Instructor: Chris Fonnesbeck
The goal of this workshop is to very briefly introduce health policy researchers to the Python programming language, and illustrate how it can be used for data cleaning, exploration and analysis. Given the constraints of a two-hour time slot, we will try to use the first part of the workshop to provide an overview of programming in Python, including data structures, control flow and function-writing, while the second part will be an example-driven tour of Python's data analysis functionality, with links to additional external information for those interested in learning more about any given aspect of the material.
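To give a flavor of the Part 1 material, the sketch below touches each topic mentioned above: a list and a dictionary as data structures, for loops and an if/else branch for control flow, and a small function. It is not taken from the workshop notebooks. The hospital names, counts, and the 10% threshold are hypothetical values chosen only for illustration. It uses str.format rather than f-strings so that it also runs under Python 3.5, the version this tutorial targets.

```python
# Hypothetical example data: a list of (hospital, readmissions, discharges) tuples
visits = [
    ("Hospital A", 12, 150),
    ("Hospital B", 30, 200),
    ("Hospital C", 5, 80),
]


def readmission_rate(readmissions, discharges):
    """Return readmissions per discharge as a fraction (0.0 if no discharges)."""
    if discharges == 0:
        return 0.0
    return readmissions / discharges


# Build a dictionary mapping each hospital name to its computed rate
rates = {}
for name, readmissions, discharges in visits:
    rates[name] = readmission_rate(readmissions, discharges)

# Control flow: flag hospitals whose rate exceeds a hypothetical 10% threshold
for name, rate in rates.items():
    if rate > 0.10:
        print("{}: {:.1%} (above threshold)".format(name, rate))
    else:
        print("{}: {:.1%}".format(name, rate))
```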
For participants who would like to follow along with the workshop as a hands-on tutorial, I have provided instructions below on how to install Python on your computer.
Part 1: Introduction to Python
Part 2: Data Analysis Using Pandas
If you wish to follow along with the tutorial materials on your own machine, you will need to download and install Python, and set it up with the appropriate data analysis packages. I recommend using the Anaconda Python distribution because it is easy to get up and running and comes bundled with most of the packages you will need.
To download the course materials, you can either clone this repository using git, or, if you are not familiar with git, just download a zip file of the repository using the button near the upper right-hand side of this page.
Go to the Anaconda download page and download the Python 3.5 installer for your operating system.
Double click the Anaconda installer. A user interface will pop up to guide you through the installation.
For details on installing Anaconda Python, see the official guide.
If you installed Anaconda successfully, then in a new terminal window you should have access to the conda command. Try typing it in and running it; if you get a "command not found" error, then the Anaconda directory is probably not in your PATH. Please contact the tutorial instructor for assistance.
You can also try opening Python by running the python command from the terminal. You should get a Python prompt that looks something like this:
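If you prefer to check your setup from within Python, the following sketch (an addition, not one of the course files) prints which interpreter is running and uses the standard library's shutil.which to search your PATH for conda, much as the shell does when you type a command. It assumes Python 3.3 or later, where shutil.which is available. You could save it under any name, for example a hypothetical check_install.py, and run it with python check_install.py.

```python
import shutil
import sys

# Which interpreter is running? With Anaconda on your PATH, this path should
# point somewhere inside the Anaconda installation directory.
print("Python executable:", sys.executable)
print("Python version:", sys.version.split()[0])

# shutil.which searches the directories listed in PATH, the same way the
# shell does when you type a command; it returns None if nothing is found.
conda_path = shutil.which("conda")
if conda_path is None:
    print("conda was not found on your PATH")
else:
    print("conda found at:", conda_path)

# The tutorial environment pins Python 3.5, so warn about older interpreters.
if sys.version_info < (3, 5):
    print("Warning: this tutorial assumes Python 3.5 or later")
```

If conda is reported as missing here but the installer finished without errors, the PATH problem described above is the most likely cause.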
A really nice feature of using Anaconda for data analysis is that you can create isolated environments for particular projects you are working on. In the environment, you control exactly which packages (and which versions) are available, making it easier to replicate or share development environments between machines or people.
I have created a file called environment.yml that contains instructions for creating an environment for this tutorial. The contents of that file are simply as follows:
name: healthpolicy
dependencies:
- python=3.5
- numpy
- pandas
- jupyter
- matplotlib
It is just a list of the packages we need, along with a name for the environment.
Move into the directory that contains the tutorial materials (by default, conda env create reads environment.yml from the current directory) and run this conda command to create the virtual environment:
conda env create
This will automatically fetch and install the packages from Anaconda's repository and set up the environment for you. If successful, you should see something like the following in your terminal:
Now you can enter your environment with this command:
source activate healthpolicy
Your command line prompt should look something like this:
(healthpolicy)user_name@machine_name:~$
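Once the environment is active, a quick way to confirm that the packages listed in environment.yml were installed is to try importing them from Python. This is a hedged sketch rather than part of the course materials. It assumes that checking the notebook module is a reasonable stand-in for the jupyter entry, since jupyter is a metapackage that pulls in several components rather than a single importable module.

```python
import importlib

# Import names for the packages listed in environment.yml. "notebook" stands
# in for the "jupyter" metapackage (an assumption about what gets installed).
required = ["numpy", "pandas", "matplotlib", "notebook"]

missing = []
for name in required:
    try:
        module = importlib.import_module(name)
    except ImportError:
        missing.append(name)
    else:
        # Most scientific packages expose a __version__ attribute
        print(name, getattr(module, "__version__", "unknown version"))

if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All tutorial packages are available.")
```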
To leave the environment, use this command:
source deactivate
The course materials are provided in the form of Jupyter notebooks. Jupyter is an interactive, web-based interface to Python that lets you combine code, text, and other supporting media in a single document. Having set up and activated your environment above, you can run Jupyter from the command line as follows:
jupyter notebook
This will automatically open your web browser to a list of the files in the project directory, including the notebooks containing the workshop materials, which have a .ipynb extension. Clicking on any of these files will open that notebook and allow you to work with it interactively.
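As a first cell in a new notebook, something like the following sketch can confirm that pandas works before you start on Part 2. The state codes and enrollment figures are invented toy values for illustration, not workshop data.

```python
import pandas as pd

# Hypothetical toy data, invented for illustration only
df = pd.DataFrame({
    "state": ["TN", "TN", "KY", "KY"],
    "year": [2014, 2015, 2014, 2015],
    "enrollees": [1200, 1350, 900, 1010],
})

# Quick exploration: the type of each column...
print(df.dtypes)

# ...and the total number of enrollees per state, summed across years
totals = df.groupby("state")["enrollees"].sum()
print(totals)
```

In a notebook, the last expression in a cell is displayed automatically, so writing totals on its own line would show the same result without print.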