Cleaned up totals page a bit

palewire · Feb 26, 2022 · 9ea3f2e
1 parent 19893fe
commit 9ea3f2e

Showing 2 changed files with 25 additions and 26 deletions.
diff --git a/docs/src/index.md b/docs/src/index.md
@@ -35,7 +35,7 @@ dataframe
 columns
 filters
 merge
-totals/index
+totals
 sort_values/index
 groupby/index
 compute

diff --git a/docs/src/totals/index.md b/docs/src/totals.md
@@ -11,18 +11,25 @@ kernelspec:
   name: python3
 ---
 
-```{include} ../_templates/nav.html
+```{include} ./_templates/nav.html
 ```
 
 # Totals
 
-In some ways, your database is no different from a human source. Getting a good story requires careful, thorough questioning. In this section we will move ahead by conducting an interview with pandas to pursue our quest of finding out the biggest donors to Proposition 64.
+In some ways, your database is no different from a human source. Getting a good story requires careful, thorough questioning.
 
-Using tricks we learned as far back as {doc}`chapter three </pandas/index>`, we can start off by answering a simple question: What is the total sum of Proposition 64 contributions that have been reported?
+In this section we will use pandas to interview our data as we continue our quest to find out the biggest donors for and against Proposition 64.
 
-## Summing a column
+```{contents} Sections
+  :depth: 1
+  :local:
+```
+
+## Sum a column
 
-To answer that let's start by getting our hands on `amount`, the column from the contributions DataFrame with the numbers in it. We can do that just as we did with other columns earlier.
+Using tricks we learned as far back as [chapter two](pandas.md), we can start off by answering a simple question: What is the total sum of Proposition 64 contributions that have been reported?
+
+To answer that let’s start by getting our hands on `amount`, the column from the contributions DataFrame with numbers in it. We can do that just as we did with other columns earlier.
 
 ```{code-cell}
 :tags: [hide-cell]
@@ -39,17 +46,13 @@ merged_prop = merged_everything[merged_everything.prop_name == my_prop]
 merged_prop.amount
 ```
 
-Now we can add up the column's total using the pandas method [sum], just as we did when we were first getting started with pandas.
+Now we can add up the column's total using the pandas method [sum](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.sum.html), just as we did when we were first getting started.
 
 ```{code-cell}
 merged_prop.amount.sum()
 ```
 
-And printed out below your cell, there's our answer.
-
-We've completed our first piece of analysis and discovered the total amount spent on this proposition.
-
-Time to run off to Twitter and publish our results to the world, right?
+We've completed our first piece of analysis and discovered the total amount spent on this proposition. Time to run off to Twitter and publish our results to the world, right?
 
 Wrong.
 
@@ -59,44 +62,43 @@ The total we generated is not the overall total raised in the campaign, and it i
 
 Why?
 
-In California, campaigns are [only required] to disclose the names of donors who give over \$100, so our data is missing all of the donors who gave less than that amount.
+In California, campaigns are [only required](http://www.documentcloud.org/documents/2781363-460-2016-01.html#document/p10) to disclose the names of donors who give over \$100, so our data is missing all of the donors who gave less than that amount.
 
 The cutoff varies, and there are some exceptions, but the same thing is true in other states and also at the federal level in races for Congress and the White House.
 
 The overall totals are instead reported on cover sheets included with disclosure reports that lump together all the smaller contributions as part of a grand total. Those are the records most commonly cited to total up a campaign's fundraising.
 
 The result is that an itemized list of contributions, like the one we have, cannot be used to calculate a grand total. That's true in California and virtually anywhere else you work with campaign data. Overlooking that limitation is a rookie mistake routinely made by analysts new to this field.
 
-But that doesn't mean our data is worthless. We just have to use it responsibly. In many cases, professional campaign reporters will refer to an analysis drawn from a list like ours as applying only to "large donors."
+But that doesn't mean our data are worthless. We just have to use our list responsibly. In many cases, professional campaign reporters will refer to an analysis like ours as applying only to "large donors."
 
 Since large donors typically account for most of the money, the results are still significant. And the high level of detail included in each record — like the donor's name, employer and occupation — makes the limitations worth working through.
 
-## Which side got more large donations?
+## Which side raised more?
 
 Adding up a big total is all well and good. But we're aiming for something more nuanced.
 
-We want to separate the money spent supporting the proposition from the money opposing it. Then we want to find out who raised more.
-
-To answer that question, let's return to the filtering technique we learned in {doc}`chapter seven </filter/index>`.
+We want to separate the money spent supporting the proposition from the money opposing it. Then we want to find out which side raised more.
 
-First let's look at the column we're going to filter by, `committee_position`.
+To answer that question, let's return to the filtering technique we learned in [chapter seven](filters.md). Let's look at the column we're going to filter by, `committee_position`.
 
 ```{code-cell}
 merged_prop.committee_position.value_counts()
 ```
-Now let's filter our `merged_prop` table down using that column and the pandas filtering method that combines a column, an operator and the value we want to filter by. Let's stick the result in a variable.
+
+Filter our `merged_prop` table down using that column and the pandas filtering method that combines a column, an operator and the value we want to filter by. Let's stick the result in a variable.
 
 ```{code-cell}
 support = merged_prop[merged_prop.committee_position == 'SUPPORT']
 ```
 
-Now let's repeat all that for opposing contributions. First the filter into a new variable.
+Repeat all that for opposing contributions. First the filter into a new variable.
 
 ```{code-cell}
 oppose = merged_prop[merged_prop.committee_position == 'OPPOSE']
 ```
 
-Now sum up the total disclosed contributions to each for comparison. First the opposition.
+Sum up the total disclosed contributions to each for comparison. First the opposition.
 
 ```{code-cell}
 oppose.amount.sum()
@@ -108,11 +110,8 @@ Then the supporters.
 support.amount.sum()
 ```
 
-The support is clearly larger. But what percent is it of the overall disclosed total? We can find out by combining two `sum` calculations using the division operator.
+The support is clearly larger. But what percent is it of the overall disclosed total? We can find out by combining two `sum` calculations using Python’s built-in division operator.
 
 ```{code-cell}
 support.amount.sum() / merged_prop.amount.sum()
 ```
-
-[only required]: http://www.documentcloud.org/documents/2781363-460-2016-01.html#document/p10
-[sum]: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.sum.html
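
For readers following this diff without the tutorial's data at hand, the filter-and-sum pattern the revised page walks through can be sketched on a toy table. This is a minimal sketch, not the tutorial's own code: the three rows are made up for illustration, the variable name `contributions` is hypothetical, and only the column names `committee_position` and `amount` come from the diff.

```python
import pandas as pd

# Hypothetical stand-in for merged_prop; the amounts are invented for illustration.
contributions = pd.DataFrame(
    {
        "committee_position": ["SUPPORT", "SUPPORT", "OPPOSE"],
        "amount": [1000.0, 500.0, 500.0],
    }
)

# Filter with a column, an operator and a value, as the page describes.
support = contributions[contributions.committee_position == "SUPPORT"]
oppose = contributions[contributions.committee_position == "OPPOSE"]

# Sum the disclosed contributions on each side.
support_total = support.amount.sum()
oppose_total = oppose.amount.sum()

# Share of the disclosed total that went to the supporting side.
support_share = support_total / contributions.amount.sum()

print(support_total, oppose_total, support_share)
```

As the page itself cautions, a share computed this way covers only itemized contributions from large donors, not the campaign's grand total.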