# Building a Synthesizer for Pivot

In this tutorial, we will build a simple but efficient synthesizer for the Pandas [pivot](https://pandas.pydata.org/pandas-docs/version/0.22/generated/pandas.DataFrame.pivot.html) function. More concretely, given an input dataframe and a desired output dataframe, our synthesizer will find the arguments to pass to `pivot` so that it transforms the input into the desired output.

## Arguments Generator for Pivot
First, let us define a generator that will serve as the synthesis engine for pivot by enumerating possible argument combinations. Here is a first version, which simply selects one of the columns (or the default value `None`) for each argument.

```python
from atlas import generator

@generator
def pivot_args_generator(input_df):
    # Each Select(...) is a choice point: the generator enumerates every
    # combination of choices, so each argument is None or one column name.
    arg_columns = Select([None] + list(input_df.columns))
    arg_index = Select([None] + list(input_df.columns))
    arg_values = Select([None] + list(input_df.columns))

    return {'index': arg_index, 'columns': arg_columns, 'values': arg_values}
```
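As a rough mental model only (this does not reproduce atlas's internal machinery), the three `Select` calls behave like a Cartesian product over the per-argument options. The plain-Python sketch below makes that explicit with `itertools.product`; the function name `naive_pivot_args` is hypothetical.

```python
import itertools

import pandas as pd


def naive_pivot_args(input_df):
    """Hypothetical plain-Python stand-in for the first generator:
    every argument is either None or one of the column names."""
    options = [None] + list(input_df.columns)
    # product() varies the last element fastest, so 'values' changes most often.
    for columns, index, values in itertools.product(options, repeat=3):
        yield {'index': index, 'columns': columns, 'values': values}


df = pd.DataFrame({
    'foo': ['one', 'one', 'one', 'two', 'two', 'two'],
    'bar': ['A', 'B', 'C', 'A', 'B', 'C'],
    'baz': [10, 20, 30, 40, 50, 60],
})
candidates = list(naive_pivot_args(df))
print(len(candidates))  # 4 options for each of 3 arguments: 4**3 = 64
print(candidates[0])    # the all-None combination comes first
```

With three columns there are four options per argument, so 64 candidates in total, not all of which `pivot` accepts.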

Now let's run it on a sample dataframe:

```python
import pandas as pd
df = pd.DataFrame({
    'foo': ['one', 'one', 'one', 'two', 'two', 'two'],
    'bar': ['A', 'B', 'C', 'A', 'B', 'C'],
    'baz': [10, 20, 30, 40, 50, 60],
})
df
```

|   | foo | bar | baz |
|---|-----|-----|-----|
| 0 | one | A   | 10  |
| 1 | one | B   | 20  |
| 2 | one | C   | 30  |
| 3 | two | A   | 40  |
| 4 | two | B   | 50  |
| 5 | two | C   | 60  |


```python
for args in pivot_args_generator.generate(df):
    print(args)
```

```
{'index': None, 'columns': None, 'values': None}
{'index': None, 'columns': None, 'values': 'foo'}
{'index': None, 'columns': None, 'values': 'bar'}
{'index': None, 'columns': None, 'values': 'baz'}
{'index': 'foo', 'columns': None, 'values': None}
{'index': 'foo', 'columns': None, 'values': 'foo'}
{'index': 'foo', 'columns': None, 'values': 'bar'}
{'index': 'foo', 'columns': None, 'values': 'baz'}
{'index': 'bar', 'columns': None, 'values': None}
{'index': 'bar', 'columns': None, 'values': 'foo'}
{'index': 'bar', 'columns': None, 'values': 'bar'}
{'index': 'bar', 'columns': None, 'values': 'baz'}
{'index': 'baz', 'columns': None, 'values': None}
{'index': 'baz', 'columns': None, 'values': 'foo'}
{'index': 'baz', 'columns': None, 'values': 'bar'}
{'index': 'baz', 'columns': None, 'values': 'baz'}
{'index': None, 'columns': 'foo', 'values': None}
{'index': None, 'columns': 'foo', 'values': 'foo'}
{'index': None, 'columns': 'foo', 'values': 'bar'}
{'index': None, 'columns': 'foo', 'value
```

However, not all the argument combinations printed above are valid. Not convinced? Let's try executing them.

```python
for args in pivot_args_generator.generate(df):
    print(args, df.pivot(**args))
```

```
ValueError: cannot label index with a null key
```

Pandas raised an error because our generator is not *precise*: it produces argument combinations that cause the pandas `pivot` function to fail. The [documentation](https://pandas.pydata.org/pandas-docs/version/0.22/generated/pandas.DataFrame.pivot.html) for pivot contains a number of easy-to-miss constraints. For example, the `columns` argument is not marked *optional*, even though it is a keyword argument, so the generator below never leaves it as `None`. The generator below incorporates all of these constraints.
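Before moving on, it can help to measure how imprecise the naive enumeration is. The sketch below re-creates the naive candidates with `itertools.product` (an assumption standing in for the atlas generator), calls `pivot` on each, and records failures instead of stopping at the first exception. The exact counts may differ between pandas versions, since the text follows the 0.22 documentation and newer releases may treat an explicitly passed `None` differently.

```python
import itertools

import pandas as pd

df = pd.DataFrame({
    'foo': ['one', 'one', 'one', 'two', 'two', 'two'],
    'bar': ['A', 'B', 'C', 'A', 'B', 'C'],
    'baz': [10, 20, 30, 40, 50, 60],
})

options = [None] + list(df.columns)
ok, failed = [], []
for columns, index, values in itertools.product(options, repeat=3):
    args = {'index': index, 'columns': columns, 'values': values}
    try:
        df.pivot(**args)
        ok.append(args)
    except Exception as exc:  # bad combinations raise e.g. KeyError or ValueError
        failed.append((args, type(exc).__name__))

print(f"{len(ok)} succeeded, {len(failed)} raised")
```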

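```python
@generator
def pivot_args_generator(input_df):
    def dup_filter(cand):
        # Keep an index candidate only if no (index, columns) pair repeats;
        # duplicate pairs make pivot fail. arg_columns is already bound
        # by the time filter() calls this function.
        try:
            return not any(input_df[[cand, arg_columns]].duplicated())
        except Exception:
            # If the check itself fails, keep the candidate.
            return True

    # columns is required, so None is not offered here.
    arg_columns = Select(input_df.columns)
    # index is None (keep the existing index) or another column passing dup_filter.
    arg_index = Select([None] + list(filter(dup_filter, set(input_df.columns) - {arg_columns})))

    # With a MultiIndex input and no index column, only values=None is tried.
    if input_df.index.nlevels > 1 and arg_index is None:
        arg_values = None
    else:
        arg_values = Select(set(input_df.columns) | {None})

    return {'columns': arg_columns, 'index': arg_index, 'values': arg_values}
```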
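The `dup_filter` check encodes one of those hidden constraints: `pivot` needs every (index, columns) pair to occur at most once, otherwise it cannot decide which value belongs in a cell. The standalone sketch below uses a hypothetical helper, `has_duplicate_pairs`, that mirrors `dup_filter`, plus one extra row appended to the sample data, to show both the check and the error it prevents.

```python
import pandas as pd


def has_duplicate_pairs(df, index, columns):
    """Hypothetical helper mirroring dup_filter: True if some
    (index, columns) pair appears in more than one row."""
    return bool(df[[index, columns]].duplicated().any())


df = pd.DataFrame({
    'foo': ['one', 'one', 'one', 'two', 'two', 'two'],
    'bar': ['A', 'B', 'C', 'A', 'B', 'C'],
    'baz': [10, 20, 30, 40, 50, 60],
})
# Append a second ('A', 'one') row so that pair is no longer unique.
dup_df = pd.concat(
    [df, pd.DataFrame({'foo': ['one'], 'bar': ['A'], 'baz': [70]})],
    ignore_index=True,
)

print(has_duplicate_pairs(df, 'bar', 'foo'))      # False
print(has_duplicate_pairs(dup_df, 'bar', 'foo'))  # True

pivot_error = None
try:
    dup_df.pivot(index='bar', columns='foo', values='baz')
except ValueError as exc:
    pivot_error = exc  # keep the exception so it can be inspected
    print("pivot failed:", exc)
```

Filtering such candidates out up front is cheaper than calling `pivot` and catching the exception.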

```python
for args in pivot_args_generator.generate(df):
    print(args)
```

```
{'columns': 'foo', 'index': None, 'values': 'baz'}
{'columns': 'foo', 'index': None, 'values': 'bar'}
{'columns': 'foo', 'index': None, 'values': 'foo'}
{'columns': 'foo', 'index': None, 'values': None}
{'columns': 'foo', 'index': 'baz', 'values': 'baz'}
{'columns': 'foo', 'index': 'baz', 'values': 'bar'}
{'columns': 'foo', 'index': 'baz', 'values': 'foo'}
{'columns': 'foo', 'index': 'baz', 'values': None}
{'columns': 'foo', 'index': 'bar', 'values': 'baz'}
{'columns': 'foo', 'index': 'bar', 'values': 'bar'}
{'columns': 'foo', 'index': 'bar', 'values': 'foo'}
{'columns': 'foo', 'index': 'bar', 'values': None}
{'columns': 'bar', 'index': None, 'values': 'baz'}
{'columns': 'bar', 'index': None, 'values': 'bar'}
{'columns': 'bar', 'index': None, 'values': 'foo'}
{'columns': 'bar', 'index': None, 'values': None}
{'columns': 'bar', 'index': 'baz', 'values': 'baz'}
{'columns': 'bar', 'index': 'baz', 'values': 'bar'}
{'columns': 'bar', 'index': 'baz', 'values': 'foo'}
{'columns': 'bar', 'inde
```

## Building a Brute-Force Synthesizer
With `pivot_args_generator` in place, we are ready to build our first brute-force synthesizer for pivot. Given an input-output pair, we simply enumerate and execute argument combinations until we find one whose result matches the desired output.

```python
from atlas.synthesis.pandas.checker import Checker

def synthesize(input_df, output_df):
    # Try candidates in generation order and stop at the first one
    # whose pivot result matches the desired output.
    for args in pivot_args_generator.generate(input_df):
        result = input_df.pivot(**args)
        if Checker.check(result, output_df):
            print("Solution Found:", args)
            break
```

Try it out!

```python
desired_output = pd.DataFrame({'one': {'A': 10, 'B': 20, 'C': 30}, 'two': {'A': 40, 'B': 50, 'C': 60}})
desired_output
```

|   | one | two |
|---|-----|-----|
| A | 10  | 40  |
| B | 20  | 50  |
| C | 30  | 60  |

```python
synthesize(df, desired_output)
```

```
Solution Found: {'columns': 'foo', 'index': 'bar', 'values': 'baz'}
```
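For readers without atlas installed, the same brute-force idea can be sketched in plain pandas. This is a hypothetical stand-alone variant, not the atlas implementation: candidates come from `itertools.product`, combinations that make `pivot` raise are skipped, and `DataFrame.equals` substitutes for atlas's `Checker.check`, whose exact comparison rules are not reproduced here. `None` arguments are omitted rather than passed, so pandas applies its own defaults.

```python
import itertools

import pandas as pd


def synthesize_pivot(input_df, output_df):
    """Hypothetical brute-force search for pivot arguments that turn
    input_df into output_df; returns the first match or None."""
    options = [None] + list(input_df.columns)
    for columns in input_df.columns:  # columns is required, so never None
        for index, values in itertools.product(options, repeat=2):
            args = {'columns': columns, 'index': index, 'values': values}
            # Omit None arguments so pandas falls back to its defaults.
            kwargs = {k: v for k, v in args.items() if v is not None}
            try:
                result = input_df.pivot(**kwargs)
            except Exception:
                continue  # invalid combination: pivot raised, try the next one
            # equals() compares shape, axis labels, dtypes and values;
            # axis names such as 'bar' and 'foo' are not compared.
            if result.equals(output_df):
                return args
    return None


df = pd.DataFrame({
    'foo': ['one', 'one', 'one', 'two', 'two', 'two'],
    'bar': ['A', 'B', 'C', 'A', 'B', 'C'],
    'baz': [10, 20, 30, 40, 50, 60],
})
desired_output = pd.DataFrame({'one': {'A': 10, 'B': 20, 'C': 30},
                               'two': {'A': 40, 'B': 50, 'C': 60}})
print(synthesize_pivot(df, desired_output))
```

Catching exceptions trades precision for simplicity: every invalid candidate still costs a `pivot` call, which is exactly the overhead a precise generator avoids.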
