# Table-GPT 
This notebook provides instructions and code to reproduce the main results in our paper.

## Step 1: Load Results
The experiment results for each model are saved as JSON Lines (`.jsonl`) files under the `results` folder.

In [1]:
import pandas as pd
result_gpt = pd.read_json("results/result_GPT-3.5.jsonl", lines=True)
result_table_gpt = pd.read_json("results/result_Table-GPT-3.5.jsonl", lines=True)
result_chatgpt = pd.read_json("results/result_ChatGPT.jsonl", lines=True)
result_table_chatgpt = pd.read_json("results/result_Table-ChatGPT.jsonl", lines=True)

In [2]:
display(result_gpt.head(5))

,task,dataset,case,setting,prompt,completion,label
0,Column Finding,Spreadsheets-CF,Excel10013,Zero-Shot,# Task Description: Please look at the table b...,"{""result"": ""BalanceLeftTD""}","{""result"": ""Current Month""}"
1,Column Finding,Spreadsheets-CF,Excel10017,Zero-Shot,# Task Description: Please look at the table b...,"{""result"": ""Day Name""}","{""result"": ""Day Name""}"
2,Column Finding,Spreadsheets-CF,Excel10022,Zero-Shot,# Task Description: Please look at the table b...,"{""result"": ""Percentage SCS_ 6&7""}","{""result"": ""EO""}"
3,Column Finding,Spreadsheets-CF,Excel10026,Zero-Shot,# Task Description: Please look at the table b...,"{""result"": ""WORLD VISION""}","{""result"": ""OXFAM""}"
4,Column Finding,Spreadsheets-CF,Excel10028,Zero-Shot,# Task Description: Please look at the table b...,"{""result"": ""Autism""}","{""result"": ""Intellectual Disabilities""}"
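
The `completion` and `label` columns both hold JSON strings with a single `result` key. As a rough sketch of how one such pair could be compared, the snippet below parses both strings and checks the answers for equality. This is not the `Evaluator` implementation, which is not shown in this notebook: the helper name `exact_match` and the exact-match criterion are assumptions made for illustration, and the actual metric may differ per task.

```python
import json


def exact_match(completion, label):
    """Return True if both JSON strings carry the same "result" value.

    Hypothetical helper for illustration only; the real Evaluator may
    apply a different metric depending on the task.
    """
    try:
        predicted = json.loads(completion).get("result")
    except (json.JSONDecodeError, AttributeError, TypeError):
        # Malformed or non-object model output is counted as a miss.
        return False
    expected = json.loads(label).get("result")
    return predicted == expected


# Two (completion, label) pairs copied from the preview above.
print(exact_match('{"result": "Day Name"}', '{"result": "Day Name"}'))
print(exact_match('{"result": "BalanceLeftTD"}', '{"result": "Current Month"}'))
```

Under this assumed criterion, the first pair counts as correct and the second as incorrect.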


## Step 2: Evaluate Performance
We provide an `Evaluator` class that computes the performance score for each task. The following code uses it to score each model on every combination of task, setting, and dataset.

In [3]:
from evaluator import Evaluator


def evaluate_model(result, model_name):
    """Score one model per (task, dataset), with one score column per setting."""
    # Compute the performance score for each dataset
    zeroshot = []
    fewshot = []
    
    for (task, setting, dataset), res in result.groupby(["task", "setting", "dataset"]):
        evaluator = Evaluator(task=task)
        score = evaluator.compute_score(res)
        eval_result = {
            "task": task,
            "dataset": dataset,
            f"{model_name} {setting}": round(score, 3)
        }
        if setting == "Zero-Shot":
            zeroshot.append(eval_result)
        else:
            fewshot.append(eval_result)
            
    zeroshot = pd.DataFrame(zeroshot).set_index(["task", "dataset"])
    fewshot = pd.DataFrame(fewshot).set_index(["task", "dataset"])
    scores = pd.concat([zeroshot, fewshot], axis=1)
    return scores

scores_gpt = evaluate_model(result_gpt, "GPT-3.5")
scores_table_gpt = evaluate_model(result_table_gpt, "Table-GPT-3.5")
scores_chatgpt = evaluate_model(result_chatgpt, "ChatGPT")
scores_table_chatgpt = evaluate_model(result_table_chatgpt, "Table-ChatGPT")

Here is what the evaluation results look like.

In [4]:
display(scores_table_gpt)

task,dataset,Table-GPT-3.5 Zero-Shot,Table-GPT-3.5 Few-Shot
Column Finding,Spreadsheets-CF,0.713,0.817
Column Type Annotation,Efthymiou,0.886,0.847
Column Type Annotation,Limaye,0.755,0.853
Column Type Annotation,Sherlock,0.449,0.538
Column Type Annotation,T2D,0.875,0.915
Data Imputation,ExcelSynthetic,0.558,0.625
Entity Matching,Amazon-Google,0.657,0.676
Entity Matching,Beer,0.727,0.923
Entity Matching,DBLP-ACM,0.847,0.912
Entity Matching,DBLP-GoogleScholar,0.861,0.896


## Step 3: Compare Evaluation Results
We compare the results of different models (Table 3 in the paper).

In [5]:
summary = pd.concat([scores_gpt, scores_table_gpt, scores_chatgpt, scores_table_chatgpt], axis=1)
summary = summary[["GPT-3.5 Zero-Shot", "Table-GPT-3.5 Zero-Shot", "GPT-3.5 Few-Shot", "Table-GPT-3.5 Few-Shot", "ChatGPT Zero-Shot", "Table-ChatGPT Zero-Shot", "ChatGPT Few-Shot", "Table-ChatGPT Few-Shot"]]
summary = summary.loc[["Column Finding", "Column Type Annotation", "Missing Value Identification", "Table Question", "Data Imputation", "Entity Matching", "Error Detection", "Schema Matching", "Row-to-Row Transformation"]]
display(summary)

task,dataset,GPT-3.5 Zero-Shot,Table-GPT-3.5 Zero-Shot,GPT-3.5 Few-Shot,Table-GPT-3.5 Few-Shot,ChatGPT Zero-Shot,Table-ChatGPT Zero-Shot,ChatGPT Few-Shot,Table-ChatGPT Few-Shot
Column Finding,Spreadsheets-CF,0.461,0.713,0.683,0.817,0.699,0.807,0.804,0.849
Column Type Annotation,Efthymiou,0.757,0.886,0.784,0.847,0.824,0.882,0.806,0.861
Column Type Annotation,Limaye,0.683,0.755,0.719,0.853,0.742,0.769,0.832,0.854
Column Type Annotation,Sherlock,0.332,0.449,0.528,0.538,0.455,0.483,0.521,0.553
Column Type Annotation,T2D,0.776,0.875,0.83,0.915,0.828,0.887,0.853,0.912
Missing Value Identification,Column (no seperator),0.261,0.294,0.383,0.441,0.299,0.351,0.468,0.474
Missing Value Identification,Column (with seperator),0.305,0.457,0.519,0.643,0.422,0.52,0.635,0.665
Missing Value Identification,Row (no seperator),0.768,0.851,0.774,0.882,0.822,0.84,0.859,0.894
Missing Value Identification,Row (with seperator),0.875,0.959,0.917,0.976,0.923,0.936,0.96,0.968
Table Question,WikiTest,0.45,0.486,0.455,0.478,0.513,0.521,0.52,0.528
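
The side-by-side comparison relies on `pd.concat(..., axis=1)` aligning rows on the shared `(task, dataset)` index. The toy example below illustrates that alignment. The task names, dataset names, and numbers are made up for illustration and are not results from the paper.

```python
import pandas as pd

# Toy score tables (made-up numbers, not results from the paper).
index_a = pd.MultiIndex.from_tuples(
    [("Task A", "ds1"), ("Task A", "ds2")], names=["task", "dataset"]
)
index_b = pd.MultiIndex.from_tuples(
    [("Task A", "ds1"), ("Task B", "ds3")], names=["task", "dataset"]
)
scores_a = pd.DataFrame({"Model A": [0.5, 0.6]}, index=index_a)
scores_b = pd.DataFrame({"Model B": [0.7, 0.8]}, index=index_b)

# axis=1 places the tables side by side and aligns rows on (task, dataset).
# The default outer join keeps every row and fills the gaps with NaN.
toy_summary = pd.concat([scores_a, scores_b], axis=1)
print(toy_summary)
```

A row that exists in only one of the inputs is kept, with `NaN` in the columns of the other input.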


We can then average the scores over the datasets of each task. These per-task averages are the results shown in Figure 8 and Figure 9 in the paper.

In [6]:
avg = summary.groupby("task").agg("mean").round(3)
display(avg)

task,GPT-3.5 Zero-Shot,Table-GPT-3.5 Zero-Shot,GPT-3.5 Few-Shot,Table-GPT-3.5 Few-Shot,ChatGPT Zero-Shot,Table-ChatGPT Zero-Shot,ChatGPT Few-Shot,Table-ChatGPT Few-Shot
Column Finding,0.461,0.713,0.683,0.817,0.699,0.807,0.804,0.849
Column Type Annotation,0.637,0.741,0.715,0.788,0.712,0.755,0.753,0.795
Data Imputation,0.423,0.558,0.57,0.625,0.524,0.594,0.609,0.649
Entity Matching,0.23,0.778,0.779,0.863,,0.839,,0.894
Error Detection,0.068,0.604,0.328,0.548,0.068,0.6,0.404,0.618
Missing Value Identification,0.552,0.64,0.648,0.736,0.616,0.662,0.73,0.75
Row-to-Row Transformation,,,0.531,0.651,,,0.635,0.687
Schema Matching,1.0,1.0,1.0,1.0,0.857,1.0,1.0,1.0
Table Question,0.45,0.486,0.455,0.478,0.513,0.521,0.52,0.528
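
To make the averaging step concrete, here is a small sketch of the same `groupby("task").agg("mean")` call on toy data. The names and numbers are made up for illustration. It shows two pandas behaviors that affect how the table above should be read: each dataset counts equally within its task, and `NaN` entries are skipped, so a cell is `NaN` only when every dataset of that task is missing for that column.

```python
import numpy as np
import pandas as pd

# Toy per-dataset scores (made-up numbers, not results from the paper).
index = pd.MultiIndex.from_tuples(
    [("Task A", "ds1"), ("Task A", "ds2"), ("Task B", "ds3")],
    names=["task", "dataset"],
)
toy = pd.DataFrame(
    {"Model A": [0.5, 0.7, np.nan], "Model B": [0.4, np.nan, 0.9]},
    index=index,
)

# Group on the "task" index level and take the unweighted mean per column.
# NaN values are ignored; a group with only NaN values yields NaN.
toy_avg = toy.groupby("task").agg("mean").round(3)
print(toy_avg)
```

Because `groupby` sorts group keys by default, the tasks in the averaged table appear in alphabetical order rather than in the order used for `summary`.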
