# Table Question Answering based on Tapex: Inference example

In this notebook, we show how to use our pre-trained Tapex-based TableQA model to answer questions over a table. The pre-trained model is available on Hugging Face and is fine-tuned on the WikiTableQuestions dataset.

## Dependencies

If you have not already done so, install PrimeQA with the notebooks extras before getting started.

In [1]:
from primeqa.tableqa.tapex.tapex_component import TapexReader
import pandas as pd

## Instantiating TapexReader using a config with a pre-trained model from Hugging Face

This model was trained using the PrimeQA library and uploaded to Hugging Face.

In [2]:
reader = TapexReader("../../primeqa/tableqa/tapex/configs/tapex_config_inference_wtq.json")

# Load the table as a dict mapping each column name to a list of cell strings
data = {
    "Actors": ["Brad Pitt", "Leonardo Di Caprio", "George Clooney"],
    "Number of movies": ["87", "53", "69"],
}
print(pd.DataFrame.from_dict(data))

reading the config from  ../../primeqa/tableqa/tapex/configs/tapex_config_inference_wtq.json
               Actors Number of movies
0           Brad Pitt               87
1  Leonardo Di Caprio               53
2      George Clooney               69
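
In practice a table often already lives in a pandas DataFrame rather than a hand-written dictionary. The following is a minimal sketch, assuming the reader expects the same layout as `data` above (column names mapped to lists of cell strings). The helper name `dataframe_to_table_dict` is hypothetical and not part of PrimeQA.

```python
import pandas as pd


def dataframe_to_table_dict(df: pd.DataFrame) -> dict:
    """Convert a DataFrame into a {column name: [cell strings]} dict.

    Hypothetical helper (not part of PrimeQA): it mirrors the layout of the
    `data` dict used in this notebook, where every cell is a string.
    """
    # Cast every cell to str, then emit one list per column.
    return df.astype(str).to_dict(orient="list")


df = pd.DataFrame(
    {
        "Actors": ["Brad Pitt", "Leonardo Di Caprio", "George Clooney"],
        "Number of movies": [87, 53, 69],  # numeric here, strings after conversion
    }
)
table = dataframe_to_table_dict(df)
```

The resulting `table` could then be passed in place of `data`, under the assumption stated above about the expected input layout.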


The natural language queries can be passed as a list of strings.

In [3]:
queries = ["how many movies Brad Pitt acted in", "Name the actor who has been in 53 movies"]
answers = reader.predict(data, queries)
print("answers", answers)

in predict for TapexModel with data:  {'Actors': ['Brad Pitt', 'Leonardo Di Caprio', 'George Clooney'], 'Number of movies': ['87', '53', '69']}  ,queries: ['how many movies Brad Pitt acted in', 'Name the actor who has been in 53 movies']
{"time":"2023-01-27 16:44:26,883", "name": "primeqa.tableqa.tapex.tapex_component", "level": "INFO", "message": "loading from config at ../../primeqa/tableqa/tapex/configs/tapex_config_inference_wtq.json"}
{"time":"2023-01-27 16:44:26,884", "name": "primeqa.tableqa.tapex.tapex_component", "level": "INFO", "message": "loading from config at ../../primeqa/tableqa/tapex/configs/tapex_config_inference_wtq.json"}
{"time":"2023-01-27 16:44:27,290", "name": "primeqa.tableqa.tapex.tapex_component", "level": "INFO", "message": "Training/evaluation parameters Seq2SeqTrainingArguments(
_n_gpu=0,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
bf16=False,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=

loading configuration file config.json from cache at /u/jaydesen/.cache/huggingface/hub/models--PrimeQA--tableqa_tapex_wtq/snapshots/6684181ee6ed047224a011f4076057c16640e964/config.json
Model config BartConfig {
  "_name_or_path": "PrimeQA/tableqa_tapex_wtq",
  "activation_dropout": 0.0,
  "activation_function": "gelu",
  "architectures": [
    "BartForConditionalGeneration"
  ],
  "attention_dropout": 0.1,
  "bos_token_id": 0,
  "classifier_dropout": 0.0,
  "d_model": 1024,
  "decoder_attention_heads": 16,
  "decoder_ffn_dim": 4096,
  "decoder_layerdrop": 0.0,
  "decoder_layers": 12,
  "decoder_start_token_id": 2,
  "dropout": 0.1,
  "encoder_attention_heads": 16,
  "encoder_ffn_dim": 4096,
  "encoder_layerdrop": 0.0,
  "encoder_layers": 12,
  "eos_token_id": 2,
  "forced_bos_token_id": 0,
  "forced_eos_token_id": 2,
  "id2label": {
    "0": "LABEL_0",
    "1": "LABEL_1",
    "2": "LABEL_2"
  },
  "init_std": 0.02,
  "is_encoder_decoder": true,
  "label2id": {
    "LABEL_0": 0,
    "L

answers {'how many movies Brad Pitt acted in': ' 87', 'Name the actor who has been in 53 movies': ' leonardo di caprio'}
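
Note that the predicted strings in the output above carry a leading space (for example `' 87'`). A minimal sketch for tidying them follows, assuming `predict` returns a dict mapping each query to an answer string as printed above. `clean_answers` is a hypothetical helper, not a PrimeQA function.

```python
def clean_answers(answers: dict) -> dict:
    """Strip surrounding whitespace from each predicted answer string."""
    return {query: answer.strip() for query, answer in answers.items()}


# Sample dict copied from the output printed above.
raw_answers = {
    "how many movies Brad Pitt acted in": " 87",
    "Name the actor who has been in 53 movies": " leonardo di caprio",
}
cleaned = clean_answers(raw_answers)
```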


## Using TapexReader for evaluation with eval()
TapexReader can also run evaluation. Here we use a different config file, suited to demonstrating eval() in this notebook.

In [4]:
reader = TapexReader("../../primeqa/tableqa/tapex/configs/tapex_config_eval_wtq.json")
reader.eval()

reading the config from  ../../primeqa/tableqa/tapex/configs/tapex_config_eval_wtq.json
{"time":"2023-01-27 16:44:58,684", "name": "primeqa.tableqa.tapex.tapex_component", "level": "INFO", "message": "loading from config at ../../primeqa/tableqa/tapex/configs/tapex_config_eval_wtq.json"}
{"time":"2023-01-27 16:44:58,685", "name": "primeqa.tableqa.tapex.tapex_component", "level": "INFO", "message": "loading from config at ../../primeqa/tableqa/tapex/configs/tapex_config_eval_wtq.json"}


PyTorch: setting up devices
The default value for the training argument `--report_to` will change in v5 (from all installed integrations to none). In v5, you will need to use `--report_to all` to get the same behavior as now. You should start updating your code and make this info disappear :-).


{"time":"2023-01-27 16:44:58,711", "name": "primeqa.tableqa.tapex.tapex_component", "level": "INFO", "message": "Training/evaluation parameters Seq2SeqTrainingArguments(
_n_gpu=0,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
bf16=False,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
ddp_timeout=1800,
debug=[],
deepspeed=None,
disable_tqdm=False,
do_eval=False,
do_predict=False,
do_train=False,
eval_accumulation_steps=None,
eval_delay=0,
eval_steps=1,
evaluation_strategy=no,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[],
fsdp_min_num_params=0,
fsdp_transformer_layer_cls_to_wrap=None,
full_determinism=False,
generation_max_length=None,
generation_num_beams=None,
gradient_accumulation_steps=8,
gradient_checkpointing=False,
greater_is_better=None,
group_by_length=False,
half_

loading configuration file config.json from cache at /u/jaydesen/.cache/huggingface/hub/models--PrimeQA--tableqa_tapex_wtq/snapshots/6684181ee6ed047224a011f4076057c16640e964/config.json
Model config BartConfig {
  "_name_or_path": "PrimeQA/tableqa_tapex_wtq",
  "activation_dropout": 0.0,
  "activation_function": "gelu",
  "architectures": [
    "BartForConditionalGeneration"
  ],
  "attention_dropout": 0.1,
  "bos_token_id": 0,
  "classifier_dropout": 0.0,
  "d_model": 1024,
  "decoder_attention_heads": 16,
  "decoder_ffn_dim": 4096,
  "decoder_layerdrop": 0.0,
  "decoder_layers": 12,
  "decoder_start_token_id": 2,
  "dropout": 0.1,
  "encoder_attention_heads": 16,
  "encoder_ffn_dim": 4096,
  "encoder_layerdrop": 0.0,
  "encoder_layers": 12,
  "eos_token_id": 2,
  "forced_bos_token_id": 0,
  "forced_eos_token_id": 2,
  "id2label": {
    "0": "LABEL_0",
    "1": "LABEL_1",
    "2": "LABEL_2"
  },
  "init_std": 0.02,
  "is_encoder_decoder": true,
  "label2id": {
    "LABEL_0": 0,
    "L






max_steps is given, it will override any value given in num_train_epochs


{"time":"2023-01-27 16:45:08,160", "name": "primeqa.tableqa.tapex.tapex_component", "level": "INFO", "message": "*** Evaluate ***"}


***** Running Evaluation *****
  Num examples = 10
  Batch size = 4


max_eval_samples is set as:  10


***** eval metrics *****
  eval_denotation_accuracy =        0.7
  eval_loss                =     1.7337
  eval_runtime             = 0:00:17.59
  eval_samples             =         10
  eval_samples_per_second  =      0.568
  eval_steps_per_second    =       0.17
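
For intuition about `eval_denotation_accuracy`: denotation accuracy is the fraction of examples whose predicted answer matches the gold answer, so 0.7 on 10 samples corresponds to 7 correct predictions. The sketch below is a simplified illustration that assumes a match means string equality after stripping whitespace and lowercasing. The metric used by the actual evaluation code may normalize answers differently, and the function name and toy lists are made up for this example.

```python
def denotation_accuracy(predictions, references):
    """Fraction of predictions equal to their reference after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0

    def normalize(text):
        # Assumed normalization: trim whitespace and ignore case.
        return text.strip().lower()

    correct = sum(
        normalize(p) == normalize(r) for p, r in zip(predictions, references)
    )
    return correct / len(predictions)


# Made-up toy data: 7 of 10 predictions match their reference.
references = [str(i) for i in range(10)]
predictions = references[:7] + ["x", "y", "z"]
accuracy = denotation_accuracy(predictions, references)
```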
