# Calculations on Groups

The [WIFIRE Lab](https://wifire.ucsd.edu/index.php/) uses footage streamed from a collection of wireless cameras across California to research and model the spread of wildfires. Suppose WIFIRE plans to send a new camera to a single county. They want the camera to have as much impact as possible, so they've asked you to determine which county has the largest average fire size.

We know from the basics of querying that we can ask for just the rows of a certain county, and then find the average fire size (in acres) for, say, San Diego County.

In [None]:
import babypandas as bpd

fires = bpd.read_csv("data/calfire-full.csv").set_index("name")

In [None]:
san_diego = fires.loc[fires.get('county') == 'San Diego County']
san_diego.get('acres').mean()

And we could do the same pattern for Yolo County.

In [None]:
yolo = fires.loc[fires.get('county') == 'Yolo County']
yolo.get('acres').mean()

And we could write the same pattern for every county in our dataset... but that has a few issues. Namely, it would take a ton of time to write, a ton of time to run, and it requires us to look through the dataset to find the names of all the counties.

How many counties are there again? Yikes.

In [None]:
fires.get('county').unique()

Fortunately, this pattern is common enough that it has a special function to carry it out.

## GroupBy

Whenever we want to perform a calculation on all the rows that belong to a single *group*, and we want to repeat that calculation for every one of our groups, we can use the `.groupby` method.

In [None]:
fires.groupby('county').mean().get('acres')

When we call `.groupby` on a DataFrame, we pass a column name as the argument and are essentially telling Babypandas to group up our DataFrame based on the values of that column. Any rows that have the same value in that column will show up in the same group.
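To make the grouping idea concrete, here is a minimal sketch on a tiny made-up table. It assumes plain `pandas` behaves like babypandas for these calls, and the table `toy`, its county names, and its values are all hypothetical. It compares the filter-then-average pattern from earlier with the grouped version.

```python
import pandas as pd  # assumption: babypandas mirrors pandas for these calls

# Hypothetical miniature stand-in for the fires table (made-up values).
toy = pd.DataFrame({
    "county": ["A County", "B County", "A County", "B County", "B County"],
    "acres": [10.0, 40.0, 30.0, 20.0, 60.0],
})

# Manual pattern: query one county at a time, then take the mean.
manual_a = toy.loc[toy.get("county") == "A County"].get("acres").mean()
manual_b = toy.loc[toy.get("county") == "B County"].get("acres").mean()

# Grouped pattern: rows sharing a county land in the same group,
# and the mean is computed once per group.
grouped = toy.groupby("county").mean().get("acres")

print(manual_a, manual_b)
print(grouped)
```

Both approaches should give the same per-county averages. The grouped version never needs the county names spelled out.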

In [None]:
fires.groupby('county')

Notice that we don't get a table back when doing this step. Instead, we get a special type of object which serves as a placeholder for the next step.

The groupby object owns a handful of methods that we can use to perform calculations, such as `.mean`, `.sum`, `.max`, and `.count`. Once we call one of these, the rows in each group are aggregated into a single row by applying that calculation to every column it can be applied to. For that reason, we call them {dterm}`aggregation functions`.
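As a rough illustration of these methods, the sketch below applies each one to the same groupby object. It again uses plain `pandas` as a stand-in for babypandas (an assumption), and the table `toy` and its values are hypothetical.

```python
import pandas as pd  # assumption: babypandas mirrors pandas for these calls

# Hypothetical miniature stand-in for the fires table (made-up values).
toy = pd.DataFrame({
    "county": ["A County", "B County", "A County", "B County", "B County"],
    "acres": [10.0, 40.0, 30.0, 20.0, 60.0],
})

groups = toy.groupby("county")

mean_acres = groups.mean().get("acres")    # average fire size per county
total_acres = groups.sum().get("acres")    # total acres burned per county
largest_fire = groups.max().get("acres")   # largest single fire per county
num_fires = groups.count().get("acres")    # number of rows (fires) per county

print(total_acres)
print(num_fires)
```

Each call collapses a group down to one value per column, so the choice of method determines which question about the groups gets answered.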

```{jupytertip}
Just like with string methods, you can type `.` after a groupby object and then press {kbd}`Tab` to see all of the possible functions that can be used.
```

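Notice two things about the result of an aggregation. First, the calculation is carried out for every column it can be applied to, not just the one we care about, which is why we use `.get('acres')` afterward to pull out a single column. Second, the index changes: the column we grouped by becomes the index of the result, with one row per group.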
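A small sketch can show what the aggregated table looks like: every remaining column is aggregated, and the grouping column becomes the index. It assumes plain `pandas` as a stand-in for babypandas, and the table `toy`, including the `days_burned` column, is hypothetical.

```python
import pandas as pd  # assumption: babypandas mirrors pandas for these calls

# Hypothetical miniature table with two numeric columns (made-up values).
toy = pd.DataFrame({
    "county": ["A County", "B County", "A County", "B County", "B County"],
    "acres": [10.0, 40.0, 30.0, 20.0, 60.0],
    "days_burned": [1, 4, 3, 2, 6],  # hypothetical column for illustration
})

result = toy.groupby("county").mean()

# Every remaining column was averaged, not only 'acres'.
print(list(result.columns))

# The grouping column is no longer a regular column; it is now the index,
# with one row per distinct county.
print(result.index.name)
print(list(result.index))
```

Because `county` is now the index, a single county's row can be looked up by its label, for example with `result.loc["A County"]`.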

## Grouping by multiple columns