# First Python Notebook: Stanford Fall 2022 edition

A guide to analyzing money in politics with the [Python](https://www.python.org/) programming language and a [Jupyter](https://jupyter.org/) notebook
   
By [Ben Welsh](https://palewi.re/who-is-ben-welsh/)

First developed in 2016, ["First Python Notebook"](https://palewi.re/docs/first-python-notebook/) is a tutorial that guides students through a data-driven investigation of money in California politics. It is most commonly taught as a six-hour, in-person class. This document is an updated and abbreviated spin-off for Stanford students in the fall 2022 semester.

You will learn just enough of the Python computer programming language to work with the [pandas](https://pandas.pydata.org/) library, a popular open-source tool for analyzing data. The course will teach you how to read, filter, join, group, aggregate and rank structured data by developing an investigation of the money raised to campaign for ballot measures in the upcoming November election.

The class, which we will code together live below, walks through [the standard "First Python Notebook" tutorial](https://palewi.re/docs/first-python-notebook/index.html), substituting in data files from the current campaign.

In [1]:
2+2

4

In [2]:
number = 2 + 3 + 3

In [3]:
number

8

In [5]:
import pandas as pd

In [18]:
my_list = [1, 3, 5, 7, 9, 3, 3, 3, 62, 2.5]

In [19]:
my_series = pd.Series(my_list)
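
A note on why the results below print with decimal points such as `62.0`: the list mixes whole numbers with `2.5`, so pandas stores the entire Series as floating-point numbers. The sketch below illustrates that behavior; the names `ints_only` and `mixed` are made up for this example.

```python
import pandas as pd

# A Series built only from whole numbers keeps an integer dtype
ints_only = pd.Series([1, 3, 5])

# Adding a single decimal value makes pandas store every element as a float
mixed = pd.Series([1, 3, 5, 2.5])

print(ints_only.dtype)
print(mixed.dtype)

# The whole number 5 is now stored as the float 5.0
print(mixed.max())
```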

In [20]:
my_series.sum()

98.5

In [21]:
my_series.max()

62.0

In [22]:
my_series.min()

1.0

In [23]:
my_series.mean()

9.85

In [24]:
my_series.median()

3.0

In [25]:
my_series.std()

18.47528378973138
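
It may help to know which standard deviation this is. By default, `Series.std()` divides by `n - 1` (the sample standard deviation); passing `ddof=0` divides by `n` instead (the population standard deviation). The sketch below compares the two and cross-checks them against Python's built-in `statistics` module. The variable names `sample_std` and `population_std` are only for illustration.

```python
import math
import statistics

import pandas as pd

my_list = [1, 3, 5, 7, 9, 3, 3, 3, 62, 2.5]
my_series = pd.Series(my_list)

# Default behavior: ddof=1, so the sum of squared deviations is divided by n - 1
sample_std = my_series.std()

# ddof=0 divides by n instead, which gives a slightly smaller number
population_std = my_series.std(ddof=0)

# The standard library offers the same two flavors as stdev() and pstdev()
assert math.isclose(sample_std, statistics.stdev(my_list))
assert math.isclose(population_std, statistics.pstdev(my_list))

print(sample_std, population_std)
```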

In [26]:
my_series.describe()

count    10.000000
mean      9.850000
std      18.475284
min       1.000000
25%       3.000000
50%       3.000000
75%       6.500000
max      62.000000
dtype: float64
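
The `25%`, `50%` and `75%` rows are quantiles. When a quantile falls between two values, pandas interpolates linearly between them by default, which is why `75%` comes out as 6.5 even though 6.5 is not in the list. The following sketch recomputes that figure by hand; `position`, `lower`, `fraction` and `by_hand` are names invented for this example.

```python
import pandas as pd

my_series = pd.Series([1, 3, 5, 7, 9, 3, 3, 3, 62, 2.5])

# quantile() returns the same figure describe() reports in its 75% row
q75 = my_series.quantile(0.75)

# Reproduce it by hand: sort the values, then find the fractional position
ordered = my_series.sort_values().tolist()
position = 0.75 * (len(ordered) - 1)  # 6.75, between the 7th and 8th values
lower = int(position)                 # index of the value just below
fraction = position - lower           # how far to move toward the next value
by_hand = ordered[lower] + fraction * (ordered[lower + 1] - ordered[lower])

print(q75, by_hand)
```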

In [28]:
committee_list = pd.read_csv("committees.csv")

In [30]:
committee_list.head() # first five rows

|   | filer_id | name | proposition | position |
|---|---|---|---|---|
| 0 | 1451868 | WOMEN FOR REPRODUCTIVE FACTS- NO ON PROP 1 | 1 | oppose |
| 1 | 1449991 | CALIFORNIA TOGETHER, NO ON PROPOSITION 1 | 1 | oppose |
| 2 | 1357909 | ATKINS BALLOT MEASURE COMMITTEE; YES ON PROPOS... | 1 | support |
| 3 | 1452181 | PLANNED PARENTHOOD ADVOCACY PROJECT LOS ANGELE... | 1 | support |
| 4 | 1425966 | NO ON THE GAMBLING POWER GRAB: A COMMITTEE OF ... | 26 | oppose |


In [31]:
committee_list.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 21 entries, 0 to 20
Data columns (total 4 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   filer_id     21 non-null     int64 
 1   name         21 non-null     object
 2   proposition  21 non-null     int64 
 3   position     21 non-null     object
dtypes: int64(2), object(2)
memory usage: 572.0+ bytes
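
The `info()` output shows that pandas guessed `int64` for `filer_id`. Since an ID is a label rather than a quantity, one option is to ask `read_csv` to load it as text with the `dtype` parameter. The sketch below is self-contained: instead of opening `committees.csv`, it reads two rows copied from the `head()` output through `io.StringIO`, and it assumes the file's columns match the four listed by `info()`.

```python
import io

import pandas as pd

# Two rows copied from the head() output; StringIO stands in for the real file.
# Assumption: the CSV has exactly the four columns reported by info().
csv_text = """filer_id,name,proposition,position
1451868,WOMEN FOR REPRODUCTIVE FACTS- NO ON PROP 1,1,oppose
1449991,"CALIFORNIA TOGETHER, NO ON PROPOSITION 1",1,oppose
"""

# dtype maps a column name to the type pandas should use for it
sample = pd.read_csv(io.StringIO(csv_text), dtype={"filer_id": str})

# filer_id is now text, while proposition is still parsed as a number
print(sample.filer_id.tolist())
print(sample.proposition.tolist())
```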


In [36]:
committee_list.position # selecting a single column from a DataFrame returns a Series

0      oppose
1      oppose
2     support
3     support
4      oppose
5     support
6      oppose
7     support
8      oppose
9      oppose
10    support
11    support
12    support
13     oppose
14     oppose
15     oppose
16    support
17    support
18    support
19    support
20     oppose
Name: position, dtype: object
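
Dot notation is one of two ways to pull out a column. Bracket notation does the same thing and also works when a column name contains spaces or clashes with a DataFrame method. The sketch below uses a small stand-in for `committee_list` built from the first five rows shown above, leaving out the `name` column because some of its values are truncated in the output.

```python
import pandas as pd

# Stand-in for committee_list: the first five rows from head(), minus the name column
committee_list = pd.DataFrame({
    "filer_id": [1451868, 1449991, 1357909, 1452181, 1425966],
    "proposition": [1, 1, 1, 1, 26],
    "position": ["oppose", "oppose", "support", "support", "oppose"],
})

# Dot notation and bracket notation return the same Series
by_attribute = committee_list.position
by_bracket = committee_list["position"]
assert by_attribute.equals(by_bracket)

# A list of names inside the brackets returns a smaller DataFrame instead
subset = committee_list[["proposition", "position"]]

print(type(by_bracket).__name__, type(subset).__name__)
```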

In [37]:
committee_list.position.value_counts()

support    11
oppose     10
Name: position, dtype: int64
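
Counts are often easier to compare as shares. `value_counts` accepts `normalize=True` to return proportions instead of totals. This sketch rebuilds the 11 "support" and 10 "oppose" values as a standalone Series so it can run without the CSV file; `shares` is a name chosen for the example.

```python
import pandas as pd

# Rebuild the position column from the counts above: 11 support, 10 oppose
position = pd.Series(["support"] * 11 + ["oppose"] * 10, name="position")

# normalize=True divides each count by the total number of rows
shares = position.value_counts(normalize=True)

# Multiply by 100 and round to read the result as percentages
print((shares * 100).round(1))
```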

In [38]:
committee_list.proposition.value_counts()

30    5
1     4
26    3
27    3
29    3
31    2
28    1
Name: proposition, dtype: int64
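
Two natural next steps are filtering to a single proposition and counting positions within each proposition. The sketch below shows one way to do each, using only the first five rows from the `head()` output as stand-in data, so its numbers cover those five rows rather than the full file. The names `first_five`, `prop_1` and `table` are invented for this example.

```python
import pandas as pd

# Stand-in data: proposition and position for the first five rows shown by head()
first_five = pd.DataFrame({
    "filer_id": [1451868, 1449991, 1357909, 1452181, 1425966],
    "proposition": [1, 1, 1, 1, 26],
    "position": ["oppose", "oppose", "support", "support", "oppose"],
})

# Filter: keep only the rows where the comparison is True
prop_1 = first_five[first_five.proposition == 1]

# Cross-tabulate: one row per proposition, one column per position
table = pd.crosstab(first_five.proposition, first_five.position)

print(prop_1)
print(table)
```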