## Initial Setup

This may take a few minutes.

```python
import pandas as pd
import numpy as np
import gauss.engines
import gauss.ui

engine = gauss.engines.create_pandas_lite_engine()
```

## About

Gauss (AutoPandas 2.0) is a programming-by-example (PbE) system for table transformations in Pandas. PbE systems generally accept plain input-output examples. Prior systems for table transformations, such as AutoPandas and Morpheus, accept input and output tables as examples and synthesize a program that maps one to the other. This approach has two main flaws:

1. Providing a full output table can be tedious and error-prone, and often defeats the purpose of synthesis, especially for large tables.
2. Much useful information is thrown away. For example, the user already knows what type of aggregation is being performed, but a plain output table does not record it.

Gauss extends the input-output example modality for table transformations. It gives the user a rich interface for constructing **partial outputs** (partial tables) with an array of operators, covering simple operations such as addition and subtraction. Gauss transparently captures these user interactions and generalizes from the partial output to synthesize better code, faster.
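Here is a minimal sketch that makes the idea of a partial output concrete. It assumes a partial table can be modeled as a mapping from (row position, column name) to the value the user filled in. The helper `matches_partial` and the two candidate programs are hypothetical illustrations. They are not Gauss's API or its internal representation. Under this model, a candidate program is consistent with the specification if its output agrees with every filled-in cell, and blank cells impose no constraint.

```python
import pandas as pd

def matches_partial(candidate: pd.DataFrame, partial: dict) -> bool:
    """Hypothetical check: does `candidate` agree with every filled-in cell?

    `partial` maps (row_position, column_name) -> expected value.
    Cells the user left blank are simply absent from the dict.
    """
    for (row, col), expected in partial.items():
        # A missing column or row cannot match a filled-in cell
        if col not in candidate.columns or row >= len(candidate):
            return False
        if candidate[col].iloc[row] != expected:
            return False
    return True

inputs = pd.DataFrame({"a": [2, 4], "b": [3, 5]})

# The user fills in only one output cell: c = 5 in the first row
partial = {(0, "c"): 5}

# Two candidate programs a synthesizer might consider
add_cols = inputs.assign(c=inputs["a"] + inputs["b"])
mul_cols = inputs.assign(c=inputs["a"] * inputs["b"])

print(matches_partial(add_cols, partial))  # True: 2 + 3 == 5
print(matches_partial(mul_cols, partial))  # False: 2 * 3 == 6
```

A single filled-in value can still be matched by several programs. Capturing how the user produced it (for example, by applying an addition operator), as Gauss does, narrows the candidates further.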

In this tutorial, you will learn the basics of the interface provided by Gauss, and how to use it to synthesize code for table transformation tasks that are easy to describe but relatively hard to automate.

## Interface Tutorial

In this section, we go through the basics of using the tool. This part does not require writing any code. We aim to cover the following:

1. Familiarizing yourself with the interface and its components.
2. Understanding and using the partial output editor.
3. Interpreting the output of the synthesizer (Gauss).

**Step 1**: First, create a DataFrame on which to perform transformations. It serves as the *input* part of the specification.

```python
df = pd.DataFrame([["Pants", 50, 70], ["Pants", 100, 90], ["Shirts", 80, 110]],
                  columns=["Type", "Low", "High"])
df
```

**Step 2**: Load the DataFrame into the Gauss UI.

```python
gauss.ui.start_synthesis([df], engine=engine)
```

## Example Tasks

#### Example Task #1

Given an input table with one row per statistic and one column per year:
```
         Metric     Y1      Y2      Y3      Y4      Y5
0        means  0.5200  0.5700  0.6000  0.6300  0.6300
1       stddev  0.1328  0.1321  0.1303  0.1266  0.1225
2  upper_range  0.6600  0.7000  0.7300  0.7500  0.7500
3  lower_range  0.3900  0.4400  0.4700  0.5000  0.5100
```

Transform the table into a transposed form of sorts: a `year` column containing `Y1`, `Y2`, `Y3`, `Y4` and `Y5`, with `means`, `stddev`, `upper_range` and `lower_range` as the new columns. The result should look roughly like the following, where blanks represent omitted cells.
```
      year  lower_range  means  stddev  upper_range
0       Y1                                         
1       Y2                                         
2       Y3                                         
3       Y4                                         
4       Y5                                         
```

```python
input_table = pd.DataFrame({
    'Metric': ['means', 'stddev', 'upper_range', 'lower_range'], 
    'Y1': [0.52, 0.1328, 0.66, 0.39],
    'Y2': [0.57, 0.1321, 0.7, 0.44], 
    'Y3': [0.6, 0.1303, 0.7300, 0.4700], 
    'Y4': [0.63, 0.1266, 0.7500, 0.5000],
    'Y5': [0.63, 0.1225, 0.7500, 0.5100]
})
input_table
```

```python
gauss.ui.start_synthesis([input_table], engine=engine)
```
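For reference, here is one hand-written pandas program that produces the target table. It is a sketch for checking what the synthesizer returns, not necessarily the program Gauss will synthesize. It unpivots with `melt` and then re-pivots with `pivot`. `pivot` sorts the metric columns alphabetically, which matches the column order shown above.

```python
import pandas as pd

# Same data as the Example Task #1 input table
input_table = pd.DataFrame({
    'Metric': ['means', 'stddev', 'upper_range', 'lower_range'],
    'Y1': [0.52, 0.1328, 0.66, 0.39],
    'Y2': [0.57, 0.1321, 0.70, 0.44],
    'Y3': [0.60, 0.1303, 0.73, 0.47],
    'Y4': [0.63, 0.1266, 0.75, 0.50],
    'Y5': [0.63, 0.1225, 0.75, 0.51],
})

# Unpivot to long form: one row per (Metric, year) pair
long_form = input_table.melt(id_vars='Metric', var_name='year', value_name='value')

# Pivot back: years become rows, metrics become columns (sorted alphabetically)
result = long_form.pivot(index='year', columns='Metric', values='value').reset_index()
result.columns.name = None  # drop the leftover 'Metric' label on the column axis
print(result)
```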

#### Example Task #2

##### Description

Given a table of low and high prices for each product:
```
    Product  Low  High
0     Phone   40    90
1        TV   50    80
2   Speaker   10    70
3  Computer  100   160
```
Produce a table that gives the price range (the difference between the high and low prices) for each product. The result should look roughly like the following, where blanks represent omitted cells.
```
    Product  Range 
0     Phone    50
1        TV    
2   Speaker    
3  Computer    
```

```python
input_table = pd.DataFrame({
    'Product': ['Phone', 'TV', 'Speaker', 'Computer'], 
    'Low': [40, 50, 10, 100], 
    'High': [90, 80, 70, 160]
})
input_table
```

```python
gauss.ui.start_synthesis([input_table], engine=engine)
```
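Here is one hand-written pandas program for this task, for comparison with Gauss's output (the synthesized code may differ). It subtracts `Low` from `High` column-wise and keeps only the product names and ranges.

```python
import pandas as pd

input_table = pd.DataFrame({
    'Product': ['Phone', 'TV', 'Speaker', 'Computer'],
    'Low': [40, 50, 10, 100],
    'High': [90, 80, 70, 160],
})

# Vectorized column subtraction computes every row's range at once
result = input_table.assign(Range=input_table['High'] - input_table['Low'])

# Keep only the columns shown in the target table
result = result[['Product', 'Range']]
print(result)
```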

## Practice Tasks

#### Practice Task #1

##### Description
Given an input table with populations for each city in a region:
```
         Region      City      Population
0           foo      bar            10000
1           foo      baz            23000
2           Hex      Add            27810
3           Hex      Dad            35010
4           Hex      Fed            40770
```
Create a table showing the total population for each region, i.e.:
```
      Region  Population
0       foo        33000  
1       Hex       103590 
```
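For the small `foo`/`Hex` table above, a hand-written pandas equivalent is a group-by sum. This is a sketch of one possible program, not necessarily what Gauss synthesizes. The `sort=False` argument keeps the regions in order of first appearance. The default sort would place `Hex` before `foo`, because uppercase letters sort before lowercase ones.

```python
import pandas as pd

cities = pd.DataFrame({
    'Region': ['foo', 'foo', 'Hex', 'Hex', 'Hex'],
    'City': ['bar', 'baz', 'Add', 'Dad', 'Fed'],
    'Population': [10000, 23000, 27810, 35010, 40770],
})

# Sum populations within each region; as_index=False keeps 'Region' as a column
totals = cities.groupby('Region', as_index=False, sort=False)['Population'].sum()
print(totals)
```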
##### Exercise

Use Gauss to synthesize a `pandas` program that performs a similar transformation on this input:

```python
input_table = pd.DataFrame({
    'Region': ['Misthalin', 'Kandarin', 'Kandarin', 'Misthalin', 'Misthalin'],
    'City': ['Lumbridge', 'Ardougne', 'Catherby', 'Varrock', 'Draynor Village'],
    'Population': [72, 950, 32, 1744, 29]
})
input_table
```

```python
gauss.ui.start_synthesis([input_table], engine=engine)
```