# `pivot` vs. `pivot_table` in Pandas and Polars


It's easy to mix up `pivot` and `pivot_table`, since they have similar names and do similar jobs. Here's the difference:

1. `pivot` - Reshape a table *without* aggregation in Pandas, *optionally* aggregate in Polars
2. `pivot_table` - Reshape a table *with* aggregation in Pandas

`pivot_table` aggregates all rows that share the same index and column values. To pivot a table without aggregation, use `pivot` instead. In Pandas, however, `pivot` raises an error when an index/column combination appears more than once. You can work around that by adding a cumulative count that makes each combination unique and including it as an extra level of the index.
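To make the duplicate problem concrete, here is a minimal sketch on a made-up three-row frame (`demo` and its values are illustrative only, not the data used in the rest of this post). It shows `pivot` failing on a repeated index/column pair, and one way of building a counter that makes the pair unique.

```python
import pandas as pd

# Hypothetical toy data: ("d1", "X") appears twice, so (Date, Category) is not unique.
demo = pd.DataFrame(
    {
        "Date": ["d1", "d1", "d2"],
        "Category": ["X", "X", "X"],
        "Values": [1, 2, 3],
    }
)

try:
    demo.pivot(index="Date", columns="Category", values="Values")
    pivot_failed = False
except ValueError:
    # pandas refuses to reshape when an index/column pair occurs more than once
    pivot_failed = True

# Number the repeats within each (Date, Category) pair: 0, 1, 0
demo["count"] = demo.groupby(["Date", "Category"]).cumcount()

# (Date, count) is now unique within each category, so pivot succeeds
demo_wide = demo.pivot(index=["Date", "count"], columns="Category", values="Values")
```

The counter only exists to disambiguate rows, so it can be dropped from the index once the reshape is done.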

Here is an example of both methods, plus a bonus for how to do both of these in Polars using a single `pivot` method.

In [4]:
import datetime
import pandas as pd
import polars as pl

long_df = pd.DataFrame(
    {
        "Date": 
        [
            datetime.datetime.strptime("01-01-2020", '%m-%d-%Y').date(),
            datetime.datetime.strptime("01-01-2020", '%m-%d-%Y').date(),
            datetime.datetime.strptime("01-02-2020", '%m-%d-%Y').date(),
            datetime.datetime.strptime("01-02-2020", '%m-%d-%Y').date(),
            datetime.datetime.strptime("01-03-2020", '%m-%d-%Y').date(),
            datetime.datetime.strptime("01-04-2020", '%m-%d-%Y').date(),
            datetime.datetime.strptime("01-04-2020", '%m-%d-%Y').date(),
            datetime.datetime.strptime("01-01-2020", '%m-%d-%Y').date(),
            datetime.datetime.strptime("01-01-2020", '%m-%d-%Y').date(),
            datetime.datetime.strptime("01-02-2020", '%m-%d-%Y').date(),
            datetime.datetime.strptime("01-02-2020", '%m-%d-%Y').date(),
            datetime.datetime.strptime("01-03-2020", '%m-%d-%Y').date(),
            datetime.datetime.strptime("01-04-2020", '%m-%d-%Y').date(),
            datetime.datetime.strptime("01-04-2020", '%m-%d-%Y').date()
        ], 
        "Category": 
        [
            "category_X", "category_X", "category_X", 
            "category_X", "category_X", "category_X", 
            "category_X", "category_Y", "category_Y", 
            "category_Y", "category_Y", "category_Y", 
            "category_Y", "category_Y"
        ], 
        "Values": [30, 40, 20, 30, 40, 50, 60, 25, 30, 42, 54, 21, 23, 30]
    }
)

pl_long_df = pl.DataFrame(long_df)

long_df

,Date,Category,Values
0,2020-01-01,category_X,30
1,2020-01-01,category_X,40
2,2020-01-02,category_X,20
3,2020-01-02,category_X,30
4,2020-01-03,category_X,40
5,2020-01-04,category_X,50
6,2020-01-04,category_X,60
7,2020-01-01,category_Y,25
8,2020-01-01,category_Y,30
9,2020-01-02,category_Y,42


# Long to Wide: No aggregation + duplicates

In [None]:
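# Number repeated rows within each (Date, Category) pair so that
# (Date, count) is unique for every category
long_df['count'] = long_df.groupby(['Date', 'Category']).cumcount()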

wide_df = (
    long_df.pivot(index=['Date', 'count'], columns='Category', values='Values')
           .droplevel('count') 
           .rename_axis(None, axis=1)
)

wide_df


Date,category_X,category_Y
2020-01-01,30,25
2020-01-01,40,30
2020-01-02,20,42
2020-01-02,30,54
2020-01-03,40,21
2020-01-04,50,23
2020-01-04,60,30


# Long to Wide: With Aggregation

In [None]:
wide_df_agg = (
    long_df.pivot_table(index='Date', columns='Category', values='Values', aggfunc='sum')
           .rename_axis(None, axis=1)
)

wide_df_agg

Date,category_X,category_Y
2020-01-01,70,55
2020-01-02,50,96
2020-01-03,40,21
2020-01-04,110,53
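The cell above passes `aggfunc='sum'` explicitly. As a small sketch of why that matters, the example below uses a made-up frame (`demo` is hypothetical) to compare the explicit sum with what `pivot_table` does when `aggfunc` is left out, which is to average the duplicates.

```python
import pandas as pd

# Hypothetical toy data with one duplicated (Date, Category) pair
demo = pd.DataFrame(
    {
        "Date": ["d1", "d1", "d2"],
        "Category": ["X", "X", "X"],
        "Values": [30, 40, 20],
    }
)

# No aggfunc given: pandas falls back to the mean of the duplicates
demo_mean = demo.pivot_table(index="Date", columns="Category", values="Values")

# Explicit aggfunc: duplicates are summed instead
demo_sum = demo.pivot_table(
    index="Date", columns="Category", values="Values", aggfunc="sum"
)
```

Here the two `d1` rows become 35.0 under the default and 70 under `aggfunc="sum"`, so it is worth stating the aggregation you want rather than relying on the default.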


# Pivot with Polars

In Polars, a single `pivot` method covers both cases. By default it performs no aggregation, and it raises an error if an index/column combination holds more than one value. Pass `aggregate_function` to aggregate the duplicates instead.

In [15]:
# Number repeated rows within each (Date, Category) pair (cum_count starts at 1)
pl_long_df = (
    pl_long_df.with_columns(
        count = pl.col('Date').cum_count().over(['Date', 'Category'])
    )
)

pl_wide_df = (
    pl_long_df.pivot(on='Category', index=['Date', 'count'], values='Values')
              .drop('count')
)

pl_wide_df

Date,category_X,category_Y
date,i64,i64
2020-01-01,30,25
2020-01-01,40,30
2020-01-02,20,42
2020-01-02,30,54
2020-01-03,40,21
2020-01-04,50,23
2020-01-04,60,30


In [18]:
pl_wide_df_agg = pl_long_df.pivot(on='Category', index='Date', values='Values', aggregate_function='sum')

pl_wide_df_agg

Date,category_X,category_Y
date,i64,i64
2020-01-01,70,55
2020-01-02,50,96
2020-01-03,40,21
2020-01-04,110,53
