# Alternative Groupby Syntax

This chapter covers a few alternative syntaxes for performing aggregations with the `groupby` method. It has the potential to confuse beginning pandas users, since these syntaxes give you no additional power for data analysis. However, many pandas users rely on them, so it's important to know that they exist. Let's begin by reading in the San Francisco employee compensation dataset.

In [None]:
import pandas as pd
import numpy as np
sf_emp = pd.read_csv('../data/sf_employee_compensation.csv')
sf_emp.head(3)

## Aggregating a single column

So far, we have set each new column name equal to a two-item tuple containing the aggregating column and the aggregating function inside the `agg` method.

In [None]:
sf_emp.groupby('organization group').agg(mean_salary=('salaries', 'mean'))

### Alternative - use a dictionary

Instead of a tuple, you can use a dictionary to map the aggregating column to the aggregating function. The generic syntax takes on the following form:

```python
df.groupby('grouping column').agg({'aggregating column': 'aggregating function'})
```


Although this syntax uses less code, it does not allow you to rename columns during the aggregation.
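The dictionary can also map several columns, each to its own function, and you can still rename the results afterward with `rename`. The sketch below uses a small hypothetical DataFrame whose column names mirror `sf_emp`; the values are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for sf_emp: same column names, made-up values
df = pd.DataFrame({
    'organization group': ['Public Works', 'Public Works', 'Culture', 'Culture'],
    'salaries': [80000, 100000, 60000, 70000],
    'benefits': [30000, 34000, 20000, 26000],
})

# Each key is an aggregating column, each value is its aggregating function
result = df.groupby('organization group').agg({'salaries': 'mean', 'benefits': 'max'})

# The output columns keep their original names, so rename them in a separate step
result = result.rename(columns={'salaries': 'mean_salary', 'benefits': 'max_benefits'})
print(result)
```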

In [None]:
sf_emp.groupby('organization group').agg({'salaries': 'mean'})

### Alternative - select the column with the brackets

Instead of using a dictionary, you can select the aggregating column by placing it in brackets directly after the `groupby` method and then pass the aggregating function as a string to the `agg` method. The generic syntax takes on the following form:

```python
df.groupby('grouping column')['aggregating column'].agg('aggregating function')
```

In [None]:
sf_emp.groupby('organization group')['salaries'].agg('mean')
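The brackets matter: a single column name returns a Series, while a list of column names (double brackets) returns a DataFrame. A minimal sketch with hypothetical data standing in for `sf_emp`:

```python
import pandas as pd

# Hypothetical data standing in for sf_emp
df = pd.DataFrame({
    'organization group': ['Public Works', 'Public Works', 'Culture', 'Culture'],
    'salaries': [80000, 100000, 60000, 70000],
})

grouped = df.groupby('organization group')

# A single column name selects a SeriesGroupBy, so the result is a Series
as_series = grouped['salaries'].agg('mean')

# A list of column names selects a DataFrameGroupBy, so the result is a DataFrame
as_frame = grouped[['salaries']].agg('mean')

print(type(as_series).__name__)
print(type(as_frame).__name__)
```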

You can even bypass the `agg` method and use the name of the aggregation as a method directly after the brackets.

In [None]:
sf_emp.groupby('organization group')['salaries'].mean()
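Passing the string to `agg` and calling the method directly produce the same result. The check below uses hypothetical data; `pd.testing.assert_series_equal` raises an `AssertionError` if the two Series differ.

```python
import pandas as pd

# Hypothetical data standing in for sf_emp
df = pd.DataFrame({
    'organization group': ['Public Works', 'Public Works', 'Culture', 'Culture'],
    'salaries': [80000, 100000, 60000, 70000],
})

grouped = df.groupby('organization group')['salaries']

via_agg = grouped.agg('mean')   # aggregation named as a string
via_method = grouped.mean()     # aggregation called as a method

# Passes silently when both spellings give identical Series
pd.testing.assert_series_equal(via_agg, via_method)

# Other common aggregations such as median, min, max, and count work the same way
print(grouped.median())
```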

### Possible advantage - allows for multiple aggregating functions

Using any of these alternative syntaxes lets you apply multiple aggregating functions with less code. Here, we use a dictionary to map the aggregating column to a list of all the aggregating functions we want.

In [None]:
sf_emp.groupby('organization group').agg({'salaries': ['mean', 'min', 'max']})
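Mapping one column to a list of functions produces columns with a two-level index. The sketch below, on hypothetical data, shows that structure, one common way to flatten it by joining the levels with an underscore, and the bracket syntax, which returns single-level columns named after the functions.

```python
import pandas as pd

# Hypothetical data standing in for sf_emp
df = pd.DataFrame({
    'organization group': ['Public Works', 'Public Works', 'Culture', 'Culture'],
    'salaries': [80000, 100000, 60000, 70000],
})

# Dictionary syntax: the columns form a MultiIndex of (column, function) pairs
multi = df.groupby('organization group').agg({'salaries': ['mean', 'min', 'max']})
print(multi.columns.tolist())

# Flatten by joining the two levels, e.g. ('salaries', 'mean') -> 'salaries_mean'
flat = multi.copy()
flat.columns = ['_'.join(col) for col in multi.columns]

# Bracket syntax with a list of functions: single-level columns 'mean', 'min', 'max'
bracket = df.groupby('organization group')['salaries'].agg(['mean', 'min', 'max'])
print(bracket)
```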

The original syntax requires more code, but I prefer it. The dictionary version above returns columns with a two-level index (`salaries` on top and the function names below), whereas the original syntax returns a DataFrame with a single level of column labels, each named exactly as we choose.

In [None]:
sf_emp.groupby('organization group').agg(mean_salary=('salaries', 'mean'),
                                         min_salary=('salaries', 'min'),
                                         max_salary=('salaries', 'max'))
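The same named output can be written in two other ways that pandas supports: `pd.NamedAgg`, a named tuple with `column` and `aggfunc` fields that is equivalent to the plain two-item tuple, and keyword arguments passed to `agg` after selecting a single column with brackets. A sketch on hypothetical data, checking that both give the same DataFrame:

```python
import pandas as pd

# Hypothetical data standing in for sf_emp
df = pd.DataFrame({
    'organization group': ['Public Works', 'Public Works', 'Culture', 'Culture'],
    'salaries': [80000, 100000, 60000, 70000],
})

# pd.NamedAgg spells out the (column, function) pair with field names
named = df.groupby('organization group').agg(
    mean_salary=pd.NamedAgg(column='salaries', aggfunc='mean'),
    max_salary=pd.NamedAgg(column='salaries', aggfunc='max'),
)

# With one column already selected, each keyword maps a new name to a function
series_named = df.groupby('organization group')['salaries'].agg(
    mean_salary='mean', max_salary='max'
)

# Raises AssertionError if the two results differ
pd.testing.assert_frame_equal(named, series_named)
print(named)
```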

## No Aggregating Columns

You do not actually need to select aggregating columns when you call an aggregation method, such as `mean`, directly on the grouped object. In that case, every non-grouping column is aggregated. Here, we set `numeric_only` to `True` so that only the numeric columns are averaged, and we round the result to the nearest thousand with `round(-3)`.

In [None]:
(sf_emp.groupby(['year', 'organization group'])
       .mean(numeric_only=True)
       .head()
       .round(-3))
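A small sketch of what `numeric_only=True` does, assuming a hypothetical DataFrame with a text column `job` that cannot be averaged: the grouping columns move into the index, the text column is dropped, and every remaining numeric column is aggregated.

```python
import pandas as pd

# Hypothetical data: 'job' is text, 'salaries' and 'benefits' are numeric
df = pd.DataFrame({
    'year': [2019, 2019, 2020, 2020],
    'organization group': ['Culture', 'Culture', 'Culture', 'Public Works'],
    'job': ['Curator', 'Guard', 'Curator', 'Engineer'],
    'salaries': [61000, 59000, 64000, 98000],
    'benefits': [21000, 19000, 22000, 35000],
})

# No column selection: numeric_only=True excludes 'job' from the mean
result = df.groupby(['year', 'organization group']).mean(numeric_only=True)
print(result)
```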

## Exercises

Execute the cell below to read in the flights dataset and then use it for the following exercises.

In [None]:
import pandas as pd
flights = pd.read_csv('../data/flights.csv', parse_dates=['date'])
flights.head(3)

### Exercise 1

<span style="color:green; font-size:16px">Use a dictionary in the `groupby` `agg` method to calculate the mean, median, min, and max of the air time for every airline.</span>

### Exercise 2

<span style="color:green; font-size:16px">Without using the `agg` method, calculate the number of unique destinations for each airline.</span>

### Exercise 3

<span style="color:green; font-size:16px">Calculate the mean of every numeric column for each airline and origin without using the `agg` method.</span>