# First Python Notebook: Scripting your way to the story

By Ben Welsh

A step-by-step guide to analyzing data with Python and the Jupyter Notebook.

This tutorial will teach you how to use computer programming tools to analyze data by exploring contributors to campaigns for and against Proposition 64, a ballot measure asking California voters to decide if recreational marijuana should be legalized.

This guide was developed by Ben Welsh for an [Oct. 2, 2016, "watchdog workshop" organized by Investigative Reporters and Editors](http://ire.org/events-and-training/event/2819/2841/) at San Diego State University's school of journalism.



## Prelude: Prerequisites

Before you can begin, your computer needs the following tools installed and working.

1. A [command-line interface](https://en.wikipedia.org/wiki/Command-line_interface) to interact with your computer
2. Version 2.7 of the [Python](http://python.org/download/releases/2.7.6/) programming language
3. The [pip](https://pip.pypa.io/en/latest/installing.html) package manager and [virtualenv](http://www.virtualenv.org/en/latest/) environment manager for Python

### Command-line interface

Unless something is wrong with your computer, there should be a way to open a window that lets you type in commands. Different operating systems give this tool slightly different names, but they all have some form of it, and there are alternative programs you can install as well.

On Windows you can find the command-line interface by opening the "command prompt." Here are instructions for [Windows 8](http://windows.microsoft.com/en-us/windows/command-prompt-faq#1TC=windows-8) and [earlier versions](http://windows.microsoft.com/en-us/windows-vista/open-a-command-prompt-window). On Apple computers, you open the ["Terminal" application](http://blog.teamtreehouse.com/introduction-to-the-mac-os-x-command-line). Ubuntu Linux comes with a program of the [same name](http://askubuntu.com/questions/38162/what-is-a-terminal-and-how-do-i-open-and-use-it).

### Python

If you are using Mac OSX or a common flavor of Linux, Python is probably already installed. You can check which version, if any, is waiting for you by typing the following into your terminal.

```bash
python -V
```

If you don't have Python installed (a more likely fate for Windows users), try downloading and installing [Python 2.7.12](https://www.python.org/downloads/release/python-2712/).

In Windows, it's also crucial to make sure that the Python program is available on your system's ``PATH`` so it can be called from anywhere on the command line. [This screencast](http://showmedo.com/videotutorials/video?name=960000&fromSeriesID=96) can guide you through that process.

Python 2.7 is preferred but you can probably find a way to make most of this tutorial work with other versions if you futz a little.
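If you want to confirm the version from inside Python itself, something like the following should work. This is a minimal sketch rather than part of the original workshop material, and the variable names `major`, `minor` and `note` are only illustrative.

```python
import sys

# sys.version_info holds the running interpreter's version numbers
major, minor = sys.version_info[0], sys.version_info[1]
print("Running Python %d.%d" % (major, minor))

# Compare against the 2.7 release this tutorial prefers
if (major, minor) == (2, 7):
    note = "This matches the version the tutorial prefers."
else:
    note = "This differs from 2.7, so some steps may need small adjustments."
print(note)
```

Save it in a file or paste it into the interactive `python` prompt; it only reads version information and changes nothing on your computer.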

### pip and virtualenv

The [pip package manager](https://pip.pypa.io/en/latest/) makes it easy to install open-source libraries that expand what you're able to do with Python. Later, we will use it to install everything needed for this tutorial, including the Jupyter Notebook.

If you don't have it already, you can get pip by following [these instructions](https://pip.pypa.io/en/latest/installing.html). In Windows, it's necessary to make sure that the Python ``Scripts`` directory is available on your system's ``PATH`` so it can be called from anywhere on the command line. [This screencast](http://showmedo.com/videotutorials/video?name=960000&fromSeriesID=96) can help.

Verify pip is installed with the following.

```bash
pip -V
```

The [virtualenv environment manager](http://www.virtualenv.org/en/latest/) makes it possible to create an isolated corner of your computer where all the different tools you use to build an application are sealed off.

It might not be obvious why you need this, but it quickly becomes important when you need to juggle different tools
for different projects on one computer. By developing your applications inside separate virtualenv environments, you can use different versions of the same third-party Python libraries without a conflict. You can also more easily recreate your project on another machine, handy when you want to copy your code to a server that publishes pages on the Internet.

You can check if virtualenv is installed with the following.

```bash
virtualenv --version
```

If you don't have it, install it with pip.

```bash
pip install virtualenv
# If you're on a Mac or Linux and get an error saying you lack permissions, try again as a superuser.
sudo pip install virtualenv
```

If that doesn't work, [try following this advice](http://virtualenv.readthedocs.org/en/latest/installation.html).

## Act 1: Hello Jupyter Notebook

Start by creating a new development environment with virtualenv in your terminal. Name it after this project.

```bash
virtualenv first-python-notebook
```

Jump into the directory it created.

```bash
cd first-python-notebook
```

Turn on the new virtualenv, which will instruct your terminal to only use those libraries installed
inside its sealed space. You only need to create the virtualenv once, but you'll need to repeat these
"activation" steps each time you return to working on this project.

```bash
# In Linux or Mac OSX try this...
. bin/activate
# In Windows it might take something more like...
cd Scripts
activate
cd ..
```

Use ``pip`` on the command line to install [Jupyter Notebook](http://jupyter.org/), an open-source tool for writing and sharing Python scripts.

```bash
pip install jupyter
```

Start up the notebook from your terminal.

```bash
jupyter notebook
```

That will open up a new tab in your default web browser that looks something like this:

![](http://jupyter.readthedocs.io/en/latest/_images/tryjupyter_file.png)

Click the "New" button in the upper right and create a new Python 2 notebook. 

## Act 2: Hello Python

1. 2+2
2. Basic variable assignment: foo+bar
3. Download the CSV of Prop. 64 data
4. Read it in with ``open()``
5. Use a ``for`` loop to print it out line by line
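The steps outlined above might look something like the following in a notebook cell. This is a rough sketch under stated assumptions: the guide does not give a download address for the Prop. 64 CSV, so the download step is left out and a tiny made-up file stands in for it. The file name, the sample rows and every column name except `last_name` are hypothetical.

```python
import os
import tempfile

# Simple arithmetic
total = 2 + 2
print(total)

# Basic variable assignment
foo = 2
bar = 3
print(foo + bar)

# Hypothetical stand-in for the downloaded Prop. 64 CSV.
# These rows are invented for illustration and are not real contributions.
sample_csv = (
    "committee_name,position,last_name,amount\n"
    "Hypothetical Yes Committee,support,Garcia,500\n"
    "Hypothetical Yes Committee,support,Lee,250\n"
    "Hypothetical No Committee,oppose,Smith,100\n"
)
csv_path = os.path.join(tempfile.mkdtemp(), "prop64_sample.csv")
with open(csv_path, "w") as f:
    f.write(sample_csv)

# Read the file with open() and loop through it line by line
lines = []
with open(csv_path, "r") as f:
    for line in f:
        line = line.rstrip("\n")  # drop the trailing newline before printing
        lines.append(line)
        print(line)
```

Each ``print`` call is given a single value so the cell behaves the same way in Python 2.7 and Python 3.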

## Act 3: Hello agate 

1. ``pip install agate``
2. Open the file with agate
3. Print the table structure
4. Print the number of rows in the table
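Assuming agate has been installed with ``pip install agate``, the remaining steps could be sketched as below. The sample file is a made-up stand-in for the real Prop. 64 CSV, and every column name except `last_name` is an assumption. The `try`/`except` guard is only there so the cell prints a reminder instead of crashing when agate is missing.

```python
import os
import tempfile

# agate is a third-party library; install it first with `pip install agate`
try:
    import agate
except ImportError:
    agate = None

# Hypothetical sample standing in for the downloaded Prop. 64 CSV.
# These rows are invented for illustration and are not real contributions.
sample_csv = (
    "committee_name,position,last_name,amount\n"
    "Hypothetical Yes Committee,support,Garcia,500\n"
    "Hypothetical Yes Committee,support,Lee,250\n"
    "Hypothetical No Committee,oppose,Smith,100\n"
)
csv_path = os.path.join(tempfile.mkdtemp(), "prop64_sample.csv")
with open(csv_path, "w") as f:
    f.write(sample_csv)

row_count = None
if agate is None:
    print("agate is not installed; run `pip install agate` and try again")
else:
    # Open the file with agate, which infers a data type for each column
    table = agate.Table.from_csv(csv_path)
    # Print the table structure: column names and their inferred types
    table.print_structure()
    # Print the number of rows in the table
    row_count = len(table.rows)
    print(row_count)
```

With the real data you would pass the path of the downloaded file to ``agate.Table.from_csv`` instead of the temporary sample.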

## Act 4: Hello analysis

1. Sort by amount descending and print the top 10
2. Sum up the total contribution amount
3. Filter to support/oppose
4. Sort and print the top 10 for each
5. Sum up the total contribution amount for each
6. Group and count/sum by committee name
7. Group and count/sum by the ``last_name`` field
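To make the logic of these steps concrete, here is a plain-Python sketch that uses only the standard library. It is a stand-in for illustration, not the agate calls the workshop would use. The sample rows, the helper names (`top`, `sum_amounts`, `group_totals`) and every column name except `last_name` are hypothetical.

```python
import csv
from collections import defaultdict

# Hypothetical sample rows; these are not real contributions
sample = """committee_name,position,last_name,amount
Hypothetical Yes Committee,support,Garcia,500
Hypothetical Yes Committee,support,Lee,250
Hypothetical No Committee,oppose,Smith,100
Hypothetical No Committee,oppose,Garcia,50
"""

rows = list(csv.DictReader(sample.splitlines()))
for row in rows:
    row["amount"] = float(row["amount"])  # CSV values arrive as text


def top(rows, n=10):
    """Return the n largest contributions, biggest first."""
    return sorted(rows, key=lambda r: r["amount"], reverse=True)[:n]


def sum_amounts(rows):
    """Add up the amount column."""
    return sum(r["amount"] for r in rows)


def group_totals(rows, field):
    """Return {value: [count, sum of amounts]} for the given field."""
    groups = defaultdict(lambda: [0, 0.0])
    for r in rows:
        groups[r[field]][0] += 1
        groups[r[field]][1] += r["amount"]
    return dict(groups)


# Sort by amount descending and total everything
top_overall = top(rows)
grand_total = sum_amounts(rows)

# Filter to each side, then sort and sum each one
support = [r for r in rows if r["position"] == "support"]
oppose = [r for r in rows if r["position"] == "oppose"]
top_support, top_oppose = top(support), top(oppose)
support_total, oppose_total = sum_amounts(support), sum_amounts(oppose)

# Group and count/sum by committee name and by last name
by_committee = group_totals(rows, "committee_name")
by_last_name = group_totals(rows, "last_name")

for r in top_overall:
    print("%s %s" % (r["last_name"], r["amount"]))
print(grand_total)
```

Grouping by `last_name` lumps together everyone who shares a surname, which is why the two hypothetical Garcia rows are combined here even though they sit on opposite sides.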

## Act 5: Hello viz

1. Install a charting library
2. Make a bar chart of the top 10 contributors for each side
3. Export the data to a CSV for your graphics department
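The outline does not name a charting library, so the sketch below swaps in a text-only bar chart built from ``#`` characters and then writes the same numbers to a CSV with Python's built-in ``csv`` module. The contributor figures, the output file name and the helpers `text_bar` and `open_for_csv_writing` are all hypothetical.

```python
import csv
import os
import sys
import tempfile

# Hypothetical top contributors per side; real values would come from the analysis
top_contributors = {
    "support": [("Garcia", 500), ("Lee", 250)],
    "oppose": [("Smith", 100), ("Garcia", 50)],
}


def text_bar(value, max_value, width=40):
    """Return a run of '#' characters scaled so max_value fills `width`."""
    return "#" * int(round(width * float(value) / max_value))


# Scale every bar against the single largest contribution
largest = max(amount for rows in top_contributors.values() for _, amount in rows)
for side in ("support", "oppose"):
    print(side)
    for name, amount in top_contributors[side][:10]:
        print("%-10s %s %d" % (name, text_bar(amount, largest), amount))


def open_for_csv_writing(path):
    """Open a file the way the csv module expects on Python 2 or Python 3."""
    if sys.version_info[0] >= 3:
        return open(path, "w", newline="")
    return open(path, "wb")


# Export the same data to a CSV file
out_path = os.path.join(tempfile.mkdtemp(), "top_contributors.csv")
with open_for_csv_writing(out_path) as f:
    writer = csv.writer(f)
    writer.writerow(["side", "last_name", "amount"])
    for side in ("support", "oppose"):
        for name, amount in top_contributors[side]:
            writer.writerow([side, name, amount])
print(out_path)
```

A graphical charting library would replace the `text_bar` loop. The CSV export works the same way regardless of which one you choose.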

## What is Proposition 64?

The use and sale of marijuana for recreational purposes is illegal in California. [Proposition 64](http://www.oag.ca.gov/system/files/initiatives/pdfs/15-0103%20%28Marijuana%29_1.pdf), scheduled to appear on the November 8 ballot, asks voters if it ought to be legalized. A "yes" vote would support legalization. A "no" vote would oppose it. A similar measure, [Proposition 19](http://articles.latimes.com/print/2010/nov/03/local/la-me-pot-20101103-1), was defeated in 2010.

[According to California's Secretary of State](http://www.sos.ca.gov/campaign-lobbying/cal-access-resources/measure-contributions/marijuana-legalization-initiative-statute/), more than $16 million has been raised to campaign in support of Prop. 64 as of September 20. Just over $2 million has been raised to oppose it.

### W

