## Split-Apply-Combine operations with `.groupby()`

Our data often consists of individual observations or events. To make sense of the patterns in the data, it can be helpful to aggregate the data within categorical groups. This general approach is known as the Split-Apply-Combine strategy for data analysis, as described in [this classic paper by Hadley Wickham](https://www.jstatsoft.org/article/view/v040i01/v40i01.pdf).

- In SQL this is done with `GROUP BY`
- In Tableau this is done with Level of Detail (LOD) calculations
- In R this is done with the `dplyr` package
- **In Python with Pandas this is done with `.groupby()`**

### Groupby

From [the Pandas `.groupby()` documentation](https://pandas.pydata.org/pandas-docs/stable/user_guide/groupby.html):

**By “group by” we are referring to a process involving one or more of the following steps:**

- **Splitting** the data into groups based on some criteria
- **Applying** a function to each group independently
- **Combining** the results into a data structure
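To make the three steps concrete, here is a minimal sketch that carries them out by hand on a small made-up DataFrame. The `city` and `sales` columns and their values are hypothetical, chosen only for illustration. In practice `.groupby()` handles all three steps for us, as the last line shows.

```python
import pandas as pd

# Hypothetical data, for illustration only
df = pd.DataFrame({
    "city": ["Oslo", "Lima", "Oslo", "Lima", "Oslo"],
    "sales": [10, 20, 30, 40, 50],
})

# Split: iterating over a GroupBy yields (key, sub-DataFrame) pairs
pieces = {city: sub for city, sub in df.groupby("city")}

# Apply: compute a summary for each piece independently
totals = {city: sub["sales"].sum() for city, sub in pieces.items()}

# Combine: gather the per-group results into a single Series
combined = pd.Series(totals, name="sales")

# The usual one-step equivalent
one_step = df.groupby("city")["sales"].sum()
```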

After splitting, in **the apply step**, we do something to the groups, like:

- **Aggregation**: compute a summary statistic (or statistics) for each group, like group sums or means, or group sizes / counts
- **Transformation**: perform some group-specific computations and return a like-indexed object, such as standardizing data (z-score) within a group, or filling NAs within each group with a value derived from that group
- **Filtration**: discard some groups, according to a group-wise computation that evaluates True or False, such as discarding data that belongs to groups with only a few members, or filtering out data based on the group sum or mean
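The three kinds of apply step map onto three Pandas methods: an aggregation such as `.sum()`, `.transform()`, and `.filter()`. The sketch below tries each on a tiny made-up DataFrame. The `group` and `value` column names, the data, and the size threshold of 3 are assumptions for illustration.

```python
import pandas as pd

# Hypothetical data: group "a" has three rows, group "b" has two
df = pd.DataFrame({
    "group": ["a", "a", "a", "b", "b"],
    "value": [1.0, 2.0, 3.0, 10.0, 20.0],
})
grouped = df.groupby("group")["value"]

# Aggregation: one summary value per group
sums = grouped.sum()

# Transformation: z-score within each group; the result keeps the
# same index (and length) as the original data
zscores = grouped.transform(lambda s: (s - s.mean()) / s.std())

# Filtration: keep only the rows belonging to groups with at least 3 members
filtered = df.groupby("group").filter(lambda g: len(g) >= 3)
```

Note the different output shapes: `sums` has one entry per group, `zscores` has one entry per original row, and `filtered` is a subset of the original rows.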


#### Example 

As Jake VanderPlas shows in the [Aggregation and Grouping](http://nbviewer.jupyter.org/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/03.08-Aggregation-and-Grouping.ipynb) section of his excellent [Python Data Science Handbook](https://github.com/jakevdp/PythonDataScienceHandbook), an archetypal example of a `groupby()` operation with a sum aggregation is:

<img src='images/split-apply-combine.svg' width=600>
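A group-wise sum of this kind can be sketched in a few lines of Pandas. The `key` and `data` column names and the values below are assumptions made for this sketch, not necessarily the exact data in the figure.

```python
import pandas as pd

# Small illustrative DataFrame: three keys, each appearing twice
df = pd.DataFrame({
    "key": ["A", "B", "C", "A", "B", "C"],
    "data": [1, 2, 3, 4, 5, 6],
})

# Split on "key", apply a sum to each group, combine into one DataFrame
result = df.groupby("key").sum()
print(result)
```

The result is a DataFrame indexed by `key`, with one row per group holding the sum of that group's `data` values.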