# Calculations on Groups

The [WIFIRE Lab](https://wifire.ucsd.edu/index.php/) uses footage streamed from a collection of wireless cameras across California to research and model the spread of wildfires. Suppose WIFIRE has a new camera, and they want to send it to whichever county has the largest fires -- they've asked you to calculate the average fire size for each of the counties.

We know from the basics of querying that we can ask for just the rows of a single county and then calculate the average fire size in acres using some of the Series calculations we've learned about. Let's do this for San Diego County.

In [None]:
import babypandas as bpd

fires = bpd.read_csv("data/calfire-full.csv").set_index("name")

In [None]:
fires

In [None]:
san_diego = fires.loc[fires.get('county') == 'San Diego County']
san_diego.get('acres').mean()

And we could use the same pattern to find the average fire size for Yolo County.

In [None]:
yolo = fires.loc[fires.get('county') == 'Yolo County']
yolo.get('acres').mean()

And we could write the same pattern for every county in our dataset...

As you can imagine, we don't want this querying process to be so manual. Namely, it would take a ton of time to write and a ton of time to run, and it would require us to look through the dataset and find the names of all of the counties.

How many counties are there again? Yikes.

In [None]:
fires.get('county').unique()
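
To make the cost concrete, here is a rough sketch of what the fully manual approach amounts to: the same query-then-average pattern, repeated once per county in a loop. It uses plain pandas (which babypandas wraps) and a tiny made-up table, so the `toy_fires` name, the counties, and the acre values are illustrative assumptions, not rows from the real dataset.

```python
import pandas as pd

# Toy stand-in for the fires table; these values are invented for illustration.
toy_fires = pd.DataFrame({
    "county": ["San Diego County", "Yolo County", "San Diego County", "Yolo County"],
    "acres": [100.0, 40.0, 300.0, 20.0],
})

# Repeat the same query-then-average pattern once per county.
manual_means = {}
for county in toy_fires.get("county").unique():
    rows = toy_fires.loc[toy_fires.get("county") == county]
    manual_means[county] = float(rows.get("acres").mean())

print(manual_means)
```

Every pass through the loop scans the whole table again, which is part of why this approach gets slow as the number of counties grows.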

Fortunately, this pattern is common enough that it has a special function to carry it out.

## GroupBy

Whenever we want to perform a calculation on all the rows that belong to a single *group*, and we want to repeat that calculation for every one of our groups, we can use the `.groupby` method.

```html
<table>.groupby('<column_name>').<calculation>()
```

When we call `.groupby` on a DataFrame, we pass a column name as the argument and are essentially telling Babypandas to split up our DataFrame based on the values of that column. Any rows which have the same value in that column will show up in the same group.
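
As a mental model only (this is not how babypandas implements it), the splitting step can be sketched in plain Python: walk through the rows and file each one under the value found in its grouping column. The rows and the `split_by` helper below are hypothetical and exist just for this illustration.

```python
from collections import defaultdict

# Invented rows for illustration; each dict plays the role of one table row.
rows = [
    {"county": "San Diego County", "acres": 100.0},
    {"county": "Yolo County", "acres": 40.0},
    {"county": "San Diego County", "acres": 300.0},
]

def split_by(rows, column):
    """Place rows that share a value in `column` into the same group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[column]].append(row)
    return dict(groups)

groups = split_by(rows, "county")
for value, members in groups.items():
    print(value, "has", len(members), "row(s)")
```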

In [None]:
fires.groupby('county')

We don't get a table back when doing this step. Instead, we get a special type of object which serves as a placeholder for the next step.

The groupby object owns a handful of methods that we can use to perform calculations on each group, such as `.mean`, `.sum`, `.max`, and `.count` -- the count method simply counts how many rows there are in each group. Notice that all of these calculations return only *one* number for each group. So once the method operates on a group, all of the rows in that group essentially get condensed -- or *aggregated* -- into a single row. For this reason, these calculations are called {dterm}`aggregation functions`.

```{note}
You may already know of these calculations by a different name: 'summary statistics'. These are ways to summarize certain aspects of the data into a single number.
```
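
To see the "many rows in, one number out" idea in isolation, here is a small plain-Python sketch that applies a few aggregation functions to groups that have already been split. The group contents are made up, the `aggregate` helper is hypothetical, and `statistics.mean`, `max`, and `len` merely stand in for the `.mean`, `.max`, and `.count` methods.

```python
from statistics import mean

# Invented acre values, already split into one list per county.
acres_by_county = {
    "San Diego County": [100.0, 300.0],
    "Yolo County": [40.0, 20.0, 30.0],
}

def aggregate(groups, func):
    """Collapse every group to a single value by applying `func` to it."""
    return {key: func(values) for key, values in groups.items()}

print(aggregate(acres_by_county, mean))  # one average per county
print(aggregate(acres_by_county, max))   # one largest value per county
print(aggregate(acres_by_county, len))   # one row count per county
```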

In [None]:
fires.groupby('county').mean()

Whoa there, we only wanted the average acres burned in each county, but we got the average of every numeric column.

This is because the aggregation function called on the groupby object is applied to every column it can be applied to. Our original table also has 'unit' and 'cause' columns, which both contain strings, but we don't see an average cause show up because you can't take the mean of strings! So, the aggregation function applies to all *possible* columns and just drops the rest.

To avoid confusion, and to avoid forcing the computer to do unnecessary work, it's good practice to select only the columns you need before conducting the groupby -- you should select the column you want to group by, and any columns you want to perform the aggregation function on.

In [None]:
counties_and_acres = fires.get(['county', 'acres'])
counties_and_acres.groupby('county').mean()

Much better.

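Notice that the result is no longer indexed by fire name. The index has been replaced by the grouping column, so each row is now labeled with its county.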
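
Here is a small sketch of the index being replaced by the grouping column, using plain pandas (which babypandas wraps) and a made-up table. Because the county names become row labels after grouping, a single county's average can be looked up with `.loc`. The `toy_fires` table and its values are invented for illustration.

```python
import pandas as pd

# Toy stand-in for the fires table; these values are invented for illustration.
toy_fires = pd.DataFrame({
    "county": ["San Diego County", "Yolo County", "San Diego County", "Yolo County"],
    "acres": [100.0, 40.0, 300.0, 20.0],
})

avg_acres = toy_fires.get(["county", "acres"]).groupby("county").mean()

# The county names are now the index (row labels), not a regular column.
print(list(avg_acres.index))
print(list(avg_acres.columns))

# Look up one group's result by its label.
sd_avg = float(avg_acres.get("acres").loc["San Diego County"])
print(sd_avg)
```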

```{jupytertip}
What are all of the methods you can call on a groupby object? Just like with string methods, type `.` after a groupby object and then press {kbd}`Tab` to see them. Note that some of the possible methods are *not* aggregation functions.
```

## Grouping by multiple columns

[motivating example] [now multiple levels to the index]