# Problem Set 2.9: Aggregations

[Click here to open this notebook in your browser](https://leifwalsh.github.io/data-analysis-problem-sets/lab/index.html?path=2-pandas-basics/2.9-aggregations/2.9-aggregations.ipynb)

Learn how to summarize numerical data with different aggregations like `sum()` and `mean()`.

In this notebook, we'll continue working with NFL data:

In [None]:
import pandas as pd

In [None]:
standings = pd.read_csv("standings.csv")  # From http://www.habitatring.com/standings.csv
standings.head()

Which years do we have coverage for?

In [None]:
standings["season"].unique()

We've seen one form of aggregation before: the `.describe()` method:

In [None]:
standings.loc[:, ["wins", "losses", "ties"]].describe()

I think it makes sense that the median (50%) is 8 wins, 8 losses, and 0 ties (since there are 16 games in the regular season). But it doesn't have to be this way!

Now, think about the mean. Why are the mean wins and losses the same? Why are they less than 8? Does this make sense to you?

## Aggregation Functions

There are a lot of aggregation functions available on a DataFrame; see the [pandas reference on computations and descriptive stats](https://pandas.pydata.org/pandas-docs/stable/reference/frame.html#computations-descriptive-stats). We can't cover them all, but we'll try out a few here.

The basic stats you might be familiar with are available. All of these are computed by `.describe()`, but if you need a specific one you can use these:

In [None]:
standings[["wins", "losses", "ties"]].mean()

In [None]:
standings[["wins", "losses", "ties"]].median()

In [None]:
standings[["wins", "losses", "ties"]].max()

In [None]:
standings[["wins", "losses", "ties"]].min()

In [None]:
standings[["wins", "losses", "ties"]].count()

There are others that `.describe()` doesn't give us:

In [None]:
standings[["wins", "losses", "ties"]].sum()
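
As a minimal sketch of the same idea, the toy table below (invented values, not rows from `standings.csv`) compares the output of `.describe()` with two aggregations it leaves out, `.sum()` and `.var()`:

```python
import pandas as pd

# Hypothetical toy data; the numbers are made up for illustration.
toy = pd.DataFrame({
    "wins": [10, 6, 8, 8],
    "losses": [6, 10, 8, 7],
    "ties": [0, 0, 0, 1],
})

described = toy.describe()  # rows: count, mean, std, min, 25%, 50%, 75%, max
totals = toy.sum()          # column totals, which describe() does not report
variances = toy.var()       # sample variance, which describe() does not report either

print(described.index.tolist())
print(totals)
print(variances)
```

Printing `described.index` is a quick way to check which statistics `.describe()` already covers before reaching for a separate method.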

When you compute an aggregation over a DataFrame with multiple columns, you get a Series back with one value per column, as we just saw. It aggregates over each column individually.

If you aggregate a single column (a Series) instead, you get a single number back:

In [None]:
standings["wins"].mean()
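
To make the difference in return types concrete, here is a small sketch on a made-up table (the values are hypothetical, not taken from `standings.csv`). It also shows a detail that is easy to trip over: selecting one column with double brackets still gives a DataFrame, so the aggregation returns a one-element Series rather than a number.

```python
import pandas as pd

# Hypothetical toy data; the numbers are made up for illustration.
toy = pd.DataFrame({
    "wins": [10, 6, 8, 8],
    "losses": [6, 10, 8, 7],
})

per_column = toy[["wins", "losses"]].mean()  # DataFrame -> Series, one value per column
single = toy["wins"].mean()                  # Series -> a single number
one_column_frame = toy[["wins"]].mean()      # one-column DataFrame -> one-element Series

print(type(per_column).__name__)
print(single)
print(type(one_column_frame).__name__)
```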

## Aggregating subsets

So far, we've been aggregating over the whole table. What if we want to break things down by team?

One way is to combine what we already learned (filtering to subsets of data with `.loc[]`) with aggregations:

In [None]:
standings.loc[standings["team"] == "PIT", ["wins", "losses", "ties"]].sum()
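
It can help to split that one-liner into two steps. The sketch below uses a made-up table (the team rows and numbers are hypothetical), and the name `is_pit` is just an illustrative variable, not something from the notebook:

```python
import pandas as pd

# Hypothetical toy data; the numbers are made up for illustration.
toy = pd.DataFrame({
    "team": ["PIT", "PIT", "BAL", "BAL"],
    "wins": [10, 8, 6, 12],
    "losses": [6, 8, 10, 4],
    "ties": [0, 0, 0, 0],
})

# Step 1: build a boolean Series that is True only for the rows we want.
is_pit = toy["team"] == "PIT"

# Step 2: keep those rows and the numeric columns, then aggregate.
pit_totals = toy.loc[is_pit, ["wins", "losses", "ties"]].sum()

print(is_pit.tolist())
print(pit_totals)
```

The filter decides which rows take part, and the aggregation then runs only over what is left.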

Now let's do that for every team. First, we need to know which teams exist:

In [None]:
standings["team"].unique()

We can loop over those teams like this:

In [None]:
for team in standings["team"].unique():
    print(team)

Now, can you compute the wins, losses, and ties for each team?

## Examples

In [None]:
# Re-read data just in case:
standings = pd.read_csv("standings.csv")
standings

### Example 1

Compute the sums of wins, losses, and ties for the AFC:

### Example 2

Compute the sums of wins, losses, and ties for every division:

### Example 3

Compute the average number of points scored and allowed per season (and check that the result makes sense):

### Example 4

Count the number of teams in each season (check the pandas documentation linked above: is there a descriptive statistic you can use?):