Cleaned up totals page a bit
palewire committed Feb 26, 2022
1 parent 19893fe commit 9ea3f2e
Showing 2 changed files with 25 additions and 26 deletions.
2 changes: 1 addition & 1 deletion docs/src/index.md
@@ -35,7 +35,7 @@ dataframe
columns
filters
merge
totals/index
totals
sort_values/index
groupby/index
compute
49 changes: 24 additions & 25 deletions docs/src/totals/index.md → docs/src/totals.md
@@ -11,18 +11,25 @@ kernelspec:
name: python3
---

```{include} ../_templates/nav.html
```{include} ./_templates/nav.html
```

# Totals

In some ways, your database is no different from a human source. Getting a good story requires careful, thorough questioning. In this section we will move ahead by conducting an interview with pandas to pursue our quest of finding out the biggest donors to Proposition 64.
In some ways, your database is no different from a human source. Getting a good story requires careful, thorough questioning.

Using tricks we learned as far back as {doc}`chapter three </pandas/index>`, we can start off by answering a simple question: What is the total sum of Proposition 64 contributions that have been reported?
In this section we will use pandas to interview our data as we continue our quest to find out the biggest donors for and against Proposition 64.

## Summing a column
```{contents} Sections
:depth: 1
:local:
```

## Sum a column

To answer that let's start by getting our hands on `amount`, the column from the contributions DataFrame with the numbers in it. We can do that just as we did with other columns earlier.
Using tricks we learned as far back as [chapter two](pandas.md), we can start off by answering a simple question: What is the total sum of Proposition 64 contributions that have been reported?

To answer that, let’s start by getting our hands on `amount`, the column in the contributions DataFrame that holds the numbers. We can do that just as we did with other columns earlier.

```{code-cell}
:tags: [hide-cell]
@@ -39,17 +46,13 @@ merged_prop = merged_everything[merged_everything.prop_name == my_prop]
merged_prop.amount
```

Now we can add up the column's total using the pandas method [sum], just as we did when we were first getting started with pandas.
Now we can add up the column's total using the pandas method [sum](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.sum.html), just as we did when we were first getting started.

```{code-cell}
merged_prop.amount.sum()
```
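
As a minimal sketch of the same idea, assume a small made-up table standing in for `merged_prop` (the donor labels and amounts below are invented for illustration). Selecting a column returns a pandas Series, and `sum` adds up its values.

```python
import pandas as pd

# Hypothetical stand-in for merged_prop; these rows are invented, not real filings.
toy_contributions = pd.DataFrame(
    {
        "donor": ["A", "B", "C"],
        "amount": [100.0, 250.0, 1000.0],
    }
)

# Selecting a single column returns a Series.
amounts = toy_contributions.amount

# Series.sum adds up every value in the column.
total = amounts.sum()
print(total)
```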

And printed out below your cell, there's our answer.

We've completed our first piece of analysis and discovered the total amount spent on this proposition.

Time to run off to Twitter and publish our results to the world, right?
We've completed our first piece of analysis and discovered the total amount spent on this proposition. Time to run off to Twitter and publish our results to the world, right?

Wrong.

@@ -59,44 +62,43 @@ The total we generated is not the overall total raised in the campaign, and it i

Why?

In California, campaigns are [only required] to disclose the names of donors who give over \$100, so our data is missing all of the donors who gave less than that amount.
In California, campaigns are [only required](http://www.documentcloud.org/documents/2781363-460-2016-01.html#document/p10) to disclose the names of donors who give over \$100, so our data is missing all of the donors who gave less than that amount.

The cutoff varies, and there are some exceptions, but the same thing is true in other states and also at the federal level in races for Congress and the White House.

The overall totals are instead reported on cover sheets included with disclosure reports that lump together all the smaller contributions as part of a grand total. Those are the records most commonly cited to total up a campaign's fundraising.

The result is that an itemized list of contributions, like the one we have, cannot be used to calculate a grand total. That's true in California and virtually anywhere else you work with campaign data. Overlooking that limitation is a rookie mistake routinely made by analysts new to this field.
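
To make that gap concrete, here is a minimal sketch with invented figures. None of these numbers come from real filings, and the variable names are hypothetical. It assumes a cover sheet reports small, unitemized contributions as a single lump sum alongside the itemized list.

```python
# All figures below are hypothetical, for illustration only.
itemized_contributions = [100.0, 250.0, 5000.0]  # donors disclosed by name
unitemized_lump_sum = 1200.0  # small donations reported only as one total

# What an itemized list like ours can tell us.
itemized_total = sum(itemized_contributions)

# What a cover sheet would report as the grand total.
grand_total = itemized_total + unitemized_lump_sum

# Summing the itemized list alone understates the campaign's fundraising.
shortfall = grand_total - itemized_total
print(itemized_total, grand_total, shortfall)
```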

But that doesn't mean our data is worthless. We just have to use it responsibly. In many cases, professional campaign reporters will refer to an analysis drawn from a list like ours as applying only to "large donors."
But that doesn't mean our data are worthless. We just have to use our list responsibly. In many cases, professional campaign reporters will refer to an analysis like ours as applying only to "large donors."

Since large donors typically account for most of the money, the results are still significant. And the high level of detail included in each record — like the donor's name, employer and occupation — makes the limitations worth working through.

## Which side got more large donations?
## Which side raised more?

Adding up a big total is all well and good. But we're aiming for something more nuanced.

We want to separate the money spent supporting the proposition from the money opposing it. Then we want to find out who raised more.

To answer that question, let's return to the filtering technique we learned in {doc}`chapter seven </filter/index>`.
We want to separate the money spent supporting the proposition from the money opposing it. Then we want to find out which side raised more.

First let's look at the column we're going to filter by, `committee_position`.
To answer that question, let's return to the filtering technique we learned in [chapter seven](filters.md). Let's look at the column we're going to filter by, `committee_position`.

```{code-cell}
merged_prop.committee_position.value_counts()
```
Now let's filter our `merged_prop` table down using that column and the pandas filtering method that combines a column, an operator and the value we want to filter by. Let's stick the result in a variable.

Filter our `merged_prop` table down using that column and the pandas filtering method that combines a column, an operator and the value we want to filter by. Let's stick the result in a variable.

```{code-cell}
support = merged_prop[merged_prop.committee_position == 'SUPPORT']
```

Now let's repeat all that for opposing contributions. First the filter into a new variable.
Repeat that for the opposing contributions, again storing the filtered rows in a new variable.

```{code-cell}
oppose = merged_prop[merged_prop.committee_position == 'OPPOSE']
```
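
The filter pattern used for both sides can be sketched on a tiny invented table. The column name `committee_position` and the values `SUPPORT` and `OPPOSE` come from the tutorial; the rows and the `toy` names are hypothetical.

```python
import pandas as pd

# Hypothetical rows standing in for merged_prop.
toy = pd.DataFrame(
    {
        "committee_position": ["SUPPORT", "OPPOSE", "SUPPORT"],
        "amount": [500.0, 200.0, 300.0],
    }
)

# Comparing a column to a value yields a boolean Series, one True/False per row.
is_support = toy.committee_position == "SUPPORT"

# Passing that boolean Series in square brackets keeps only the True rows.
toy_support = toy[is_support]
toy_oppose = toy[toy.committee_position == "OPPOSE"]

print(len(toy_support), len(toy_oppose))
```

Writing the comparison inline, as in the second filter, is equivalent to naming the boolean Series first; the named version just makes the intermediate step visible.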

Now sum up the total disclosed contributions to each for comparison. First the opposition.
Sum up the total disclosed contributions to each side for comparison. First the opposition.

```{code-cell}
oppose.amount.sum()
@@ -108,11 +110,8 @@ Then the supporters.
support.amount.sum()
```

The support is clearly larger. But what percent is it of the overall disclosed total? We can find out by combining two `sum` calculations using the division operator.
The support is clearly larger. But what share of the overall disclosed total does it represent? We can find out by dividing one `sum` calculation by another using Python’s built-in division operator.

```{code-cell}
support.amount.sum() / merged_prop.amount.sum()
```
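
The division returns a fraction between 0 and 1. A minimal sketch, assuming invented rows in place of `merged_prop`, shows one way to turn that fraction into a rounded percentage. The `toy` table and its amounts are hypothetical.

```python
import pandas as pd

# Hypothetical rows standing in for merged_prop.
toy = pd.DataFrame(
    {
        "committee_position": ["SUPPORT", "OPPOSE", "SUPPORT"],
        "amount": [500.0, 200.0, 300.0],
    }
)

# Total for one side, and total for every disclosed contribution.
support_total = toy[toy.committee_position == "SUPPORT"].amount.sum()
overall_total = toy.amount.sum()

# The fraction of disclosed money that went to the supporting side.
share = support_total / overall_total

# Multiply by 100 and round to present the share as a percentage.
percent = round(share * 100, 1)
print(f"{percent}%")
```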

[only required]: http://www.documentcloud.org/documents/2781363-460-2016-01.html#document/p10
[sum]: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.sum.html
