# Pivot tables

The pandas groupby method provides a powerful and flexible way to summarize a dataframe.  It will feel right at home to those who are familiar with databases and SQL.

Pandas also provides another way to summarize a dataframe, pivot_table, which covers much of the same ground. You can produce many of the same results with pivot_table as you can with groupby.  pivot_table may be more comfortable for those with a spreadsheet background who are familiar with pivot tables in spreadsheets.

In [None]:
%matplotlib inline
import geopandas as gpd

raptor_buffer = gpd.read_file('data/intersections.gpkg', layer = 'raptor_buffer')
raptor_buffer.sort_values('Nest_ID').head()

In [None]:
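# numeric_only=True skips non-numeric columns (such as geometry) that cannot be summed
raptor_buffer.groupby(['Project']).sum(numeric_only=True).head()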

We can replicate this output using pivot_table.  **NOTE:** Here pivot_table is called as a function on the pandas module itself (pd.pivot_table) rather than on a dataframe. Dataframes also have an equivalent pivot_table method.

We pass in the dataframe as the first parameter, followed by the column to use as the index and the name of the aggregation function (aggfunc) that you want to use for summarization. If aggfunc is omitted, pivot_table computes the mean.

In [None]:
import pandas as pd

pd.pivot_table(raptor_buffer, index='Project', aggfunc='sum').head()
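
To make the equivalence concrete without the geopackage, here is a minimal sketch on a small made-up dataframe. The name toy and all of its values are hypothetical; only the column names are borrowed from raptor_buffer.

```python
import pandas as pd

# Hypothetical stand-in for raptor_buffer; the values are made up for illustration
toy = pd.DataFrame({
    'Project': ['A', 'A', 'B', 'B', 'B'],
    'area_ha': [1.0, 2.0, 3.0, 4.0, 5.0],
    'length_m': [10.0, 20.0, 30.0, 40.0, 50.0],
})

# Three ways to sum every numeric column per project
by_group = toy.groupby('Project').sum()
by_pivot = pd.pivot_table(toy, index='Project', aggfunc='sum')
by_method = toy.pivot_table(index='Project', aggfunc='sum')

print(by_pivot)
```

All three calls should produce the same per-project totals, so the choice between them is mostly a matter of which style you find easier to read.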

If you don't want to see a sum of ALL the numeric columns you can restrict the columns to a single one or a list of columns using the values parameter.

In [None]:
pd.pivot_table(raptor_buffer, index='Project', values='area_ha', aggfunc='sum').head()

In [None]:
pd.pivot_table(raptor_buffer, index='Project', values=['area_ha', 'length_m'], aggfunc='sum').head()

Most of the other parameters accept either a single value or a list as well. Let's add another level of indexing.

In [None]:
pd.pivot_table(raptor_buffer, index=['Project', 'recentspec'], values=['area_ha', 'length_m'], aggfunc='sum').head()

Let's add another aggregate function as well.

In [None]:
pd.pivot_table(raptor_buffer, index=['Project', 'recentspec'], values=['area_ha', 'length_m'], aggfunc=['sum', 'count']).head()
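
Passing a list to aggfunc produces two-level column labels, which can be confusing the first time you try to select from the result. The sketch below uses a hypothetical toy dataframe (made-up values and species names) to show one way to pick out parts of such a summary.

```python
import pandas as pd

# Hypothetical stand-in for raptor_buffer; values and species names are made up
toy = pd.DataFrame({
    'Project': ['A', 'A', 'B', 'B', 'B'],
    'recentspec': ['hawk', 'hawk', 'hawk', 'eagle', 'eagle'],
    'area_ha': [1.0, 2.0, 3.0, 4.0, 5.0],
    'length_m': [10.0, 20.0, 30.0, 40.0, 50.0],
})

summary = pd.pivot_table(toy, index=['Project', 'recentspec'],
                         values=['area_ha', 'length_m'],
                         aggfunc=['sum', 'count'])

# Each column label is an (aggfunc, value) pair
print(summary.columns.tolist())

# Select every 'sum' column, or a single (aggfunc, value) column
sums_only = summary['sum']
area_counts = summary[('count', 'area_ha')]
print(sums_only)
```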

Columns can be subdivided further using the columns parameter.  This takes a categorical variable and produces the same set of output columns for each of its categories. Let's use the recentstat column.

In [None]:
pd.pivot_table(raptor_buffer, index=['Project', 'recentspec'], values=['area_ha', 'length_m'], columns='recentstat', aggfunc=['sum', 'count']).head()

In my opinion, however, it is preferable to add this variable as another index level, which provides the same results formatted differently.

In [None]:
pd.pivot_table(raptor_buffer, index=['Project', 'recentspec', 'recentstat'], values=['area_ha', 'length_m'], aggfunc=['sum', 'count']).head()
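
The two layouts hold the same numbers. As a rough illustration on a hypothetical toy dataframe (made-up values and status labels), the sketch below builds both and shows that unstacking the extra index level recovers the columns layout. Combinations with no rows show up as NaN in the wide form and are simply absent from the long form.

```python
import pandas as pd

# Hypothetical stand-in for raptor_buffer; values and status labels are made up
toy = pd.DataFrame({
    'Project': ['A', 'A', 'B', 'B', 'B'],
    'recentstat': ['active', 'inactive', 'active', 'active', 'active'],
    'area_ha': [1.0, 2.0, 3.0, 4.0, 5.0],
})

# Wide layout: one column per status
wide = pd.pivot_table(toy, index='Project', columns='recentstat',
                      values='area_ha', aggfunc='sum')

# Long layout: status becomes a second index level
long = pd.pivot_table(toy, index=['Project', 'recentstat'],
                      values='area_ha', aggfunc='sum')

# Moving the status level back into the columns reproduces the wide layout
rewidened = long['area_ha'].unstack('recentstat')

print(wide)
print(long)
```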

If you want a total for each column, you can set the margins parameter to True. This adds a final row labelled All.

In [None]:
pd.pivot_table(raptor_buffer, index=['Project', 'recentspec', 'recentstat'], values=['area_ha', 'length_m'], margins=True, aggfunc=['sum', 'count'])
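
The effect of margins is easier to see on a small table. This is a minimal sketch on a hypothetical toy dataframe with made-up values; margins_name is an optional pivot_table parameter for relabelling the totals row, and the label Total is just an example.

```python
import pandas as pd

# Hypothetical stand-in for raptor_buffer; the values are made up for illustration
toy = pd.DataFrame({
    'Project': ['A', 'A', 'B', 'B', 'B'],
    'area_ha': [1.0, 2.0, 3.0, 4.0, 5.0],
})

# margins=True appends a totals row, labelled 'All' by default
with_totals = pd.pivot_table(toy, index='Project', values=['area_ha'],
                             aggfunc='sum', margins=True)

# margins_name changes the label of that row
renamed = pd.pivot_table(toy, index='Project', values=['area_ha'],
                         aggfunc='sum', margins=True, margins_name='Total')

print(with_totals)
```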

The transpose() method can be applied to any dataframe in order to switch the index and columns. It is most useful on smaller dataframes such as those that result from summary methods.  Let's remove the Project column from the index to create a smaller dataframe.

In [None]:
pd.pivot_table(raptor_buffer, index=['recentspec', 'recentstat'], values=['area_ha', 'length_m'], aggfunc=['sum', 'count'])

And then transpose it.

In [None]:
pd.pivot_table(raptor_buffer, index=['recentspec', 'recentstat'], values=['area_ha', 'length_m'], aggfunc=['sum', 'count']).transpose()
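
As a small self-contained check of what transpose() does, the sketch below summarizes a hypothetical toy dataframe (made-up values and species names) and flips the result. The .T attribute is a shorthand for the same operation.

```python
import pandas as pd

# Hypothetical stand-in for raptor_buffer; values and species names are made up
toy = pd.DataFrame({
    'recentspec': ['hawk', 'hawk', 'eagle', 'owl', 'owl'],
    'area_ha': [1.0, 2.0, 3.0, 4.0, 5.0],
    'length_m': [10.0, 20.0, 30.0, 40.0, 50.0],
})

summary = pd.pivot_table(toy, index='recentspec',
                         values=['area_ha', 'length_m'], aggfunc='sum')

# Species move from the rows to the columns, measures from the columns to the rows
flipped = summary.transpose()

print(summary.shape, flipped.shape)
print(flipped)
```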