# Mix Self-Consistency Notebook 

<a href="https://colab.research.google.com/github/run-llama/llama-hub/blob/main/llama_hub/llama_packs/tables/mix_slf_consistency/mix_self_consistency.ipynb" target="_parent">
<img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In this notebook, we highlight the mix self-consistency method proposed in the paper ["Rethinking Tabular Data Understanding with Large Language Models"](https://arxiv.org/pdf/2312.16702v1.pdf) by Liu et al.

LLMs can reason over tabular data in two main ways:
1. Textual reasoning via direct prompting
2. Symbolic reasoning via program synthesis (e.g. Python, SQL)

The key insight of the paper is that different reasoning pathways work well on different tasks. By aggregating results from both pathways with a self-consistency mechanism, the method achieves SoTA performance.
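
As a rough illustration of the aggregation idea, the sketch below takes a majority vote over answers sampled from both pathways. It is a minimal example under stated assumptions, not the pack's implementation: the helper names `normalize` and `aggregate_by_majority` and the sample answers are hypothetical.

```python
from collections import Counter


def normalize(answer: str) -> str:
    """Lowercase and strip whitespace so equivalent answers compare equal."""
    return answer.strip().lower()


def aggregate_by_majority(textual_answers, symbolic_answers):
    """Return the most frequent normalized answer across both pathways.

    Ties are broken in favor of the answer that was seen first.
    """
    votes = Counter(normalize(a) for a in [*textual_answers, *symbolic_answers])
    if not votes:
        raise ValueError("need at least one answer to aggregate")
    answer, _count = votes.most_common(1)[0]
    return answer


# Hypothetical outputs from several sampled reasoning paths.
textual = ["2004", "2003", "2004 "]
symbolic = ["2004", "2005"]
print(aggregate_by_majority(textual, symbolic))  # 2004
```

Pooling the votes means neither pathway has to be right on every question; an answer only needs to be the most common one overall.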

We implemented the method using the prompts described in the paper, adapting them where needed to get it working. That said, this is marked as beta, so there may still be kinks to work through. Do you have suggestions or contributions to improve its robustness? Let us know!

# Download Data

We use the [WikiTableQuestions dataset](https://ppasupat.github.io/WikiTableQuestions/) (Pasupat and Liang 2015) as our test dataset.

WikiTableQuestions is a question-answering dataset over various semi-structured tables taken from Wikipedia. These tables range in size from a few rows and columns to many rows. Some columns may also contain multi-part information (e.g. a temperature column may contain both Fahrenheit and Celsius).
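
To make the multi-part column case concrete, here is a small sketch of splitting such a column into numeric parts with pandas. The table, the column names, and the cell format (`72°F (22°C)`) are all made up for illustration; real WikiTableQuestions columns vary and would need their own patterns.

```python
import pandas as pd

# Hypothetical table: the data below is made up for illustration only.
weather = pd.DataFrame(
    {
        "City": ["Springfield", "Shelbyville"],
        "Avg. Temperature": ["72°F (22°C)", "50°F (10°C)"],
    }
)

# Pull the two numbers out of each cell into separate numeric columns.
pattern = (
    r"(?P<fahrenheit>-?\d+(?:\.\d+)?)\s*°F\s*"
    r"\((?P<celsius>-?\d+(?:\.\d+)?)\s*°C\)"
)
parts = weather["Avg. Temperature"].str.extract(pattern).astype(float)
weather = weather.join(parts)
print(weather[["City", "fahrenheit", "celsius"]])
```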

In [8]:
!wget "https://github.com/ppasupat/WikiTableQuestions/releases/download/v1.0.2/WikiTableQuestions-1.0.2-compact.zip" -O data.zip
!unzip data.zip

Let's visualize some examples.

In [1]:
import pandas as pd

examples = pd.read_table('WikiTableQuestions/data/training-before300.tsv')

examples.head()

|    | id   | utterance                                          | context             | targetValue       |
|---:|:-----|:---------------------------------------------------|:--------------------|:------------------|
|  0 | nt-0 | what was the last year where this team was a p...  | csv/204-csv/590.csv | 2004              |
|  1 | nt-1 | in what city did piotr's last 1st place finish...  | csv/204-csv/622.csv | Bangkok, Thailand |
|  2 | nt-2 | which team won previous to crettyard?              | csv/204-csv/772.csv | Wolfe Tones       |
|  3 | nt-3 | how many more passengers flew to los angeles t...  | csv/203-csv/515.csv | 12467             |
|  4 | nt-4 | who was the opponent in the first game of the ...  | csv/204-csv/495.csv | Derby County      |


Let's load the table that can be used as context to answer the question in the first example.

In [2]:
example = examples.iloc[0]
table = pd.read_csv('WikiTableQuestions/' + example['context'])

In [3]:
table

|    | Year | Division | League              | Regular Season | Playoffs        | Open Cup        | Avg. Attendance |
|---:|-----:|---------:|:--------------------|:---------------|:----------------|:----------------|----------------:|
|  0 | 2001 |        2 | USL A-League        | 4th, Western   | Quarterfinals   | Did not qualify |            7169 |
|  1 | 2002 |        2 | USL A-League        | 2nd, Pacific   | 1st Round       | Did not qualify |            6260 |
|  2 | 2003 |        2 | USL A-League        | 3rd, Pacific   | Did not qualify | Did not qualify |            5871 |
|  3 | 2004 |        2 | USL A-League        | 1st, Western   | Quarterfinals   | 4th Round       |            5628 |
|  4 | 2005 |        2 | USL First Division  | 5th            | Quarterfinals   | 4th Round       |            6028 |
|  5 | 2006 |        2 | USL First Division  | 11th           | Did not qualify | 3rd Round       |            5575 |
|  6 | 2007 |        2 | USL First Division  | 2nd            | Semifinals      | 2nd Round       |            6851 |
|  7 | 2008 |        2 | USL First Division  | 11th           | Did not qualify | 1st Round       |            8567 |
|  8 | 2009 |        2 | USL First Division  | 1st            | Semifinals      | 3rd Round       |            9734 |
|  9 | 2010 |        2 | USSF D-2 Pro League | 3rd, USL (3rd) | Quarterfinals   | 3rd Round       |           10727 |
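
For intuition about the symbolic pathway, the sketch below answers the first example's question with a hand-written pandas filter of the kind a program-synthesis path might produce. It is only an illustration, not output from the pack: `table_subset` is a hypothetical name, and it holds just the two columns needed, copied from the table above.

```python
import pandas as pd

# Two columns copied from the table above; the remaining columns are omitted.
table_subset = pd.DataFrame(
    {
        "Year": [2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010],
        "League": ["USL A-League"] * 4
        + ["USL First Division"] * 5
        + ["USSF D-2 Pro League"],
    }
)

# Question: "what was the last year where this team was a part of the usl a-league?"
in_a_league = table_subset["League"] == "USL A-League"
last_year = int(table_subset.loc[in_a_league, "Year"].max())
print(last_year)  # 2004
```

The result matches the `targetValue` of `2004` listed for example `nt-0`.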


## Load Pack / Setup

Now we use `download_llama_pack` to load the Mix Self-Consistency LlamaPack (you can also import the module directly if you are using the llama-hub package).

We can also optionally set up observability/tracing to observe the intermediate steps.

In [11]:
# Option: if developing with the llama_hub package
# from llama_hub.llama_packs.tables.mix_self_consistency.base import (
#     MixSelfConsistencyQueryEngine,
# )

# Option: download llama_pack
from llama_index.llama_pack import download_llama_pack

download_llama_pack(
    "MixSelfConsistencyPack",
    "./mix_self_consistency_pack",
    skip_load=True,
    # leave the below line commented out if using the notebook on main
    # llama_hub_url="https://raw.githubusercontent.com/run-llama/llama-hub/suo/table_qa",
)
# Import the query engine from the directory the pack was downloaded into
from mix_self_consistency_pack.base import MixSelfConsistencyQueryEngine

In [7]:
from llama_index.llms import OpenAI
from mix_self_consistency_pack.base import MixSelfConsistencyQueryEngine

llm = OpenAI()
query_engine = MixSelfConsistencyQueryEngine(
    table=table, 
    llm=llm,
    text_paths=1,
    symbolic_paths=1,
    aggregation_mode='self_evaluation',
    verbose=True,
)

In [8]:
query_engine.query(example['utterance'])

Textual Reasoning Path 1/1
> Running module c1ca8d49-12bb-4901-84af-332f6c71d750 with input: 
title: Untitled Table
question: what was the last year where this team was a part of the usl a-league?
table: |    |   Year |   Division | League              | Regular Season   | Playoffs        | Open Cup        | Avg. Attendance   |
|---:|-------:|-----------:|:--------------------|:-----------------|:----...

> Running module 0fd90385-40c5-42f8-9d44-7d3e1524ac4c with input: 
messages: You are an advanced AI capable of analyzing and understanding information within tables. Read the table below regarding "Untitled Table".

|    |   Year |   Division | League              | Regular Se...

> Running module a03b95e2-6925-46a5-bb80-3987183d946d with input: 
input: assistant: To determine the last year in which this team was a part of the USL A-League, we need to analyze the "Division" column in the table. 

Starting fro

KeyError: 'title'