Commit

Clean up
palewire committed Mar 14, 2023
1 parent 6a963c8 commit ccde062
Showing 5 changed files with 16 additions and 16 deletions.
2 changes: 1 addition & 1 deletion docs/src/charts.md
@@ -130,7 +130,7 @@ What important facet of the data is this chart *not* showing? There are two Robi
We have that `latimes_make` column in our original dataframe, but it got lost when we created our ranking because we didn't include it in our `groupby` command. We can fix that by scrolling back up our notebook and adding it to the command. You will need to replace what's there with a list containing both columns we want to keep.

```{code-cell}
-accident_counts = accident_list.groupby(["latimes_make", "latimes_make_and_model"]).size().reset_index()
+accident_counts = accident_list.groupby(["latimes_make", "latimes_make_and_model"]).size().rename("accidents").reset_index()
```

Rerun all of the cells below to update everything you're working with. Now if you inspect the ranking you should see the `latimes_make` column included.
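A minimal sketch of what the updated two-column `groupby` returns, assuming a small made-up `accident_list` (the rows are invented for illustration, not the tutorial's data):

```python
import pandas as pd

# Hypothetical stand-in for the tutorial's accident data
accident_list = pd.DataFrame({
    "latimes_make": ["ROBINSON", "ROBINSON", "ROBINSON", "BELL"],
    "latimes_make_and_model": ["ROBINSON R44", "ROBINSON R44", "ROBINSON R22", "BELL 206"],
})

# Grouping on both columns keeps latimes_make in the output;
# rename() names the size Series so reset_index() produces an "accidents" column
accident_counts = (
    accident_list.groupby(["latimes_make", "latimes_make_and_model"])
    .size()
    .rename("accidents")
    .reset_index()
)
print(accident_counts)
```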
Expand Down
10 changes: 5 additions & 5 deletions docs/src/columns.md
@@ -16,7 +16,7 @@ kernelspec:

# Columns

-We’ll begin with the `latimes_make_and_model` column, which records the standardized name of each helicopter that crashed. To access its contents separate from the rest of the DataFrame, append a period to the variable followed by the column’s name.
+We’ll begin with the `latimes_make_and_model` column, which records the standardized name of each helicopter that crashed. To access its contents separate from the rest of the DataFrame, append a pair of flat brackets with the column’s name in quotes inside.

```{code-cell}
:tags: [hide-cell]
@@ -27,13 +27,13 @@ accident_list = pd.read_csv("https://raw.githubusercontent.com/palewire/first-py

```{code-cell}
:tags: [show-input]
-accident_list.latimes_make_and_model
+accident_list['latimes_make_and_model']
```

That will list the column out as a `Series`, just like the ones we created from scratch earlier. Just as we did then, you can now start tacking on additional methods that will analyze the contents of the column.

````{note}
-You can also access columns a second way, like this: `accident_list['latimes_make_and_model']`. This method isn’t as pretty, but it’s required if your column has a space in its name, which would break the simpler dot-based method.
+You can also access columns a second way, like this: `accident_list.latimes_make_and_model`. This method is quicker to type, but it won't work if your column has a space in its name. So we're teaching the universal bracket method instead.
````
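A minimal sketch of two cases where bracket access works and dot access does not, assuming a made-up DataFrame whose column names are invented for illustration. Besides names with spaces, a column that shares its name with a DataFrame method (here `count`) is shadowed by that method under dot access.

```python
import pandas as pd

# Hypothetical frame: one column name has a space, another matches a method name
df = pd.DataFrame({"total fatalities": [1, 2], "count": [3, 4]})

# Brackets work for any column name; `df.total fatalities` is not even valid Python
print(df["total fatalities"].sum())  # 3

# df.count resolves to the DataFrame.count method, not the "count" column
print(callable(df.count))  # True
print(df["count"].tolist())  # [3, 4]
```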

## Count a column's values
@@ -44,7 +44,7 @@ There’s another built-in pandas tool that will total up the frequency of value

```{code-cell}
:tags: [show-input]
-accident_list.latimes_make_and_model.value_counts()
+accident_list['latimes_make_and_model'].value_counts()
```

Congratulations, you've made your first finding. With that little line of code, you've calculated an important fact: During the period being studied, the Robinson R44 had more fatal accidents than any other helicopter.
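A minimal sketch of how `value_counts` ranks a column, assuming a tiny invented sample in place of the real accident list:

```python
import pandas as pd

# Hypothetical sample column, not the tutorial's data
accident_list = pd.DataFrame({
    "latimes_make_and_model": [
        "ROBINSON R44", "BELL 206", "ROBINSON R44", "ROBINSON R22", "ROBINSON R44",
    ],
})

# value_counts() tallies each distinct value and sorts the most frequent first
counts = accident_list["latimes_make_and_model"].value_counts()
print(counts)
```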
@@ -55,7 +55,7 @@ You may notice that even though the result has two columns, pandas did not retur

```{code-cell}
:tags: [show-input]
-accident_list.latimes_make_and_model.value_counts().reset_index()
+accident_list['latimes_make_and_model'].value_counts().reset_index()
```
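The column names that `value_counts().reset_index()` produces depend on the pandas version: pandas 2.0 and later name them after the original column and `count`, while older releases used `index` and the original column name. A minimal sketch, assuming invented sample data, that names both pieces explicitly so the output looks the same either way:

```python
import pandas as pd

# Hypothetical sample column, not the tutorial's data
accident_list = pd.DataFrame({
    "latimes_make_and_model": ["ROBINSON R44", "BELL 206", "ROBINSON R44"],
})

# Name the counts and the index before resetting, so the resulting
# columns do not depend on the installed pandas version
ranking = (
    accident_list["latimes_make_and_model"]
    .value_counts()
    .rename("accidents")
    .rename_axis("latimes_make_and_model")
    .reset_index()
)
print(ranking)
```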

Why does a Series behave differently than a DataFrame? Why does `reset_index` have such a weird name?
6 changes: 3 additions & 3 deletions docs/src/compute.md
@@ -32,14 +32,14 @@ In many cases, it’s no more complicated than combining two series using a math

```{code-cell}
:tags: [show-input]
-merged_list.accidents / merged_list.total_hours
+merged_list['accidents'] / merged_list['total_hours']
```

The resulting series can be added to your dataframe by assigning it to a new column. You name your column by providing it as a quoted string inside of flat brackets. Let's call this column something brief and clear like `per_hour`.

```{code-cell}
:tags: [show-input]
-merged_list['per_hour'] = merged_list.accidents / merged_list.total_hours
+merged_list['per_hour'] = merged_list['accidents'] / merged_list['total_hours']
```
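A minimal sketch of the row-by-row division behind `per_hour`, assuming invented accident and flight-hour figures rather than the FAA survey numbers:

```python
import pandas as pd

# Hypothetical numbers, not the FAA survey figures
merged_list = pd.DataFrame({
    "latimes_make_and_model": ["MODEL A", "MODEL B"],
    "accidents": [4, 10],
    "total_hours": [200_000, 1_000_000],
})

# Each row's accidents are divided by that same row's hours (matched on the index)
merged_list["per_hour"] = merged_list["accidents"] / merged_list["total_hours"]
print(merged_list)
```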

Which, like everything else, you can inspect with the `head` command.
@@ -53,5 +53,5 @@ You can see that the result is in [scientific notation](https://en.wikipedia.org

```{code-cell}
:tags: [show-input]
-merged_list['per_100k_hours'] = (merged_list.accidents / merged_list.total_hours) * 100_000
+merged_list['per_100k_hours'] = merged_list['per_hour'] * 100_000
```
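A minimal sketch, assuming invented figures, showing that reusing `per_hour` gives the same values as recomputing the division inline, which is why the shorter form is a safe substitution:

```python
import pandas as pd

# Hypothetical numbers, not the FAA survey figures
merged_list = pd.DataFrame({"accidents": [4, 10], "total_hours": [200_000, 1_000_000]})
merged_list["per_hour"] = merged_list["accidents"] / merged_list["total_hours"]

# Shorter form: scale the existing per_hour column
merged_list["per_100k_hours"] = merged_list["per_hour"] * 100_000

# Longer form: redo the division before scaling; the values match
old_style = (merged_list["accidents"] / merged_list["total_hours"]) * 100_000
print(merged_list["per_100k_hours"].tolist() == old_style.tolist())  # True
```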
4 changes: 2 additions & 2 deletions docs/src/filters.md
@@ -40,14 +40,14 @@ In the next cell we will ask pandas to narrow down our list of accidents to just

```{code-cell}
:tags: [show-input]
-accident_list[accident_list.state == my_state]
+accident_list[accident_list['state'] == my_state]
```

Now we should save the results of that filter into a new variable separate from the full list we imported from the CSV file. Since it includes only the sites for the state we want, let’s call it `my_accidents`.

```{code-cell}
:tags: [show-input]
-my_accidents = accident_list[accident_list.state == my_state]
+my_accidents = accident_list[accident_list['state'] == my_state]
```
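A minimal sketch of how the bracketed comparison filters rows, assuming invented rows and a hypothetical `my_state` value (the real data's state codes may be formatted differently):

```python
import pandas as pd

# Hypothetical rows, not the tutorial's data
accident_list = pd.DataFrame({
    "state": ["CA", "TX", "CA", "NY"],
    "latimes_make_and_model": ["ROBINSON R44", "BELL 206", "ROBINSON R22", "ROBINSON R44"],
})
my_state = "CA"  # hypothetical choice

# The comparison returns a True/False Series with one value per row
mask = accident_list["state"] == my_state
print(mask.tolist())  # [True, False, True, False]

# Putting that Series inside the brackets keeps only the rows marked True
my_accidents = accident_list[mask]
print(len(my_accidents))  # 2
```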

To check our work and find out how many records are left after the filter, let's run the DataFrame inspection commands we learned earlier.
10 changes: 5 additions & 5 deletions docs/src/groupby.md
@@ -43,7 +43,7 @@ The result is much like `value_counts`, but we're allowed run to all kinds of st

```{code-cell}
:tags: [show-input]
-accident_list.groupby("latimes_make_and_model").total_fatalities.sum()
+accident_list.groupby("latimes_make_and_model")['total_fatalities'].sum()
```
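A minimal sketch of selecting a column with brackets after `groupby` and summing it, assuming invented fatality counts:

```python
import pandas as pd

# Hypothetical rows, not the tutorial's data
accident_list = pd.DataFrame({
    "latimes_make_and_model": ["ROBINSON R44", "BELL 206", "ROBINSON R44"],
    "total_fatalities": [2, 1, 3],
})

# Group the rows by model, pick one column with brackets, then add it up per group
fatalities = accident_list.groupby("latimes_make_and_model")["total_fatalities"].sum()
print(fatalities)
```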

Again our data has come back as an ugly Series. To reformat it as a pretty DataFrame use the `reset_index` method again.
@@ -53,18 +53,18 @@ Again our data has come back as an ugly Series. To reformat it as a pretty DataF
accident_list.groupby("latimes_make_and_model").size().reset_index()
```

-Now save that as a variable.
+You can clean up the `0` column name assigned by pandas with the `rename` method.

```{code-cell}
:tags: [show-input]
-accident_counts = accident_list.groupby("latimes_make_and_model").size().reset_index()
+accident_list.groupby("latimes_make_and_model").size().rename("accidents").reset_index()
```

-You can clean up the `0` column name assigned by pandas with the `rename` method. The `inplace` option, found on many pandas methods, will save the change to your variable automatically.
+Now save that as a variable.

```{code-cell}
:tags: [show-input]
-accident_counts.rename(columns={0: "accidents"}, inplace=True)
+accident_counts = accident_list.groupby("latimes_make_and_model").size().rename("accidents").reset_index()
```

The result is a DataFrame with the accident totals we'll want to merge with the FAA survey data to calculate rates.
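A minimal sketch comparing the removed `inplace` rename with the new chained `rename`, assuming invented sample rows. Naming the Series before `reset_index` means the `0` column never appears, and both routes end at the same table:

```python
import pandas as pd

# Hypothetical rows, not the tutorial's data
accident_list = pd.DataFrame({
    "latimes_make_and_model": ["ROBINSON R44", "BELL 206", "ROBINSON R44"],
})

# Old approach: reset_index() on the unnamed size Series creates a column called 0,
# which is renamed in a second step
old = accident_list.groupby("latimes_make_and_model").size().reset_index()
old = old.rename(columns={0: "accidents"})

# New approach: name the Series first, so reset_index() uses that name directly
new = accident_list.groupby("latimes_make_and_model").size().rename("accidents").reset_index()

print(old.equals(new))  # True
print(new)
```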