# Table Question Answering based on Tapex: Inference example

In this notebook, we show how to use our pretrained TAPEX-based TableQA model to answer questions over a table. The pretrained model is available on Hugging Face and is fine-tuned on the WikiTableQuestions dataset.

## Dependencies

If you have not already done so, install PrimeQA with the notebooks extras before getting started.

In [1]:
from primeqa.tableqa.tapex.tapex_component import TapexReader
import pandas as pd

## Loading the pretrained model from Hugging Face

This model was trained using the PrimeQA library and uploaded to Hugging Face.

In [None]:
# TapexReader can be instantiated using a config file which specifies the class arguments and training hyperparameters. 

In [2]:
reader = TapexReader("../../primeqa/tableqa/tapex/configs/tapex_config_inference_wtq.json")

# Load the table
data = {
    "Actors": ["Brad Pitt", "Leonardo Di Caprio", "George Clooney"],
    "Number of movies": ["87", "53", "69"],
}
print(pd.DataFrame.from_dict(data))

reading the config from  ../../primeqa/tableqa/tapex/configs/tapex_config_inference_wtq.json
               Actors Number of movies
0           Brad Pitt               87
1  Leonardo Di Caprio               53
2      George Clooney               69
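
The table above is a dictionary that maps each column name to a list of string cells. If your table starts out as a DataFrame with numeric columns, the following minimal sketch converts it to the same shape. It assumes that the reader expects string cells, as in the example above; the variable names `df` and `table` are illustrative only.

```python
import pandas as pd

# Hypothetical table with a numeric column, for illustration only.
df = pd.DataFrame({
    "Actors": ["Brad Pitt", "Leonardo Di Caprio", "George Clooney"],
    "Number of movies": [87, 53, 69],
})

# Cast every cell to str, then convert to a {column: [cells]} dictionary,
# which is the same shape as the `data` dictionary used in this notebook.
table = df.astype(str).to_dict(orient="list")
print(table)
```

The resulting `table` could then be passed in place of `data`.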


The natural language queries can be passed as a list of strings.

In [3]:
queries = [
    "how many movies Brad Pitt acted in",
    "Name the actor who has been in 53 movies",
]
answers = reader.predict(data, queries)
print("answers", answers)

in predict for TapexModel with data:  {'Actors': ['Brad Pitt', 'Leonardo Di Caprio', 'George Clooney'], 'Number of movies': ['87', '53', '69']}  ,queries: ['how many movies Brad Pitt acted in', 'Name the actor who has been in 53 movies']
loading from config at  ../../primeqa/tableqa/tapex/configs/tapex_config_inference_wtq.json
{"time":"2023-01-08 15:05:36,346", "name": "primeqa.tableqa.tapex.tapex_component", "level": "INFO", "message": "Training/evaluation parameters Seq2SeqTrainingArguments(
_n_gpu=0,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
bf16=False,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
ddp_timeout=1800,
debug=[],
deepspeed=None,
disable_tqdm=False,
do_eval=False,
do_predict=False,
do_train=False,
eval_accumulation_steps=None,
eval_delay=0,
eval_steps=1000,
evaluation_strategy=no,
fp16

Downloading:   0%|          | 0.00/1.10k [00:00<?, ?B/s]

loading configuration file config.json from cache at /u/jaydesen/.cache/huggingface/hub/models--PrimeQA--tableqa_tapex_wtq/snapshots/6684181ee6ed047224a011f4076057c16640e964/config.json
Model config BartConfig {
  "_name_or_path": "PrimeQA/tableqa_tapex_wtq",
  "activation_dropout": 0.0,
  "activation_function": "gelu",
  "architectures": [
    "BartForConditionalGeneration"
  ],
  "attention_dropout": 0.1,
  "bos_token_id": 0,
  "classifier_dropout": 0.0,
  "d_model": 1024,
  "decoder_attention_heads": 16,
  "decoder_ffn_dim": 4096,
  "decoder_layerdrop": 0.0,
  "decoder_layers": 12,
  "decoder_start_token_id": 2,
  "dropout": 0.1,
  "encoder_attention_heads": 16,
  "encoder_ffn_dim": 4096,
  "encoder_layerdrop": 0.0,
  "encoder_layers": 12,
  "eos_token_id": 2,
  "forced_bos_token_id": 0,
  "forced_eos_token_id": 2,
  "id2label": {
    "0": "LABEL_0",
    "1": "LABEL_1",
    "2": "LABEL_2"
  },
  "init_std": 0.02,
  "is_encoder_decoder": true,
  "label2id": {
    "LABEL_0": 0,
    "L

Downloading:   0%|          | 0.00/999k [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/456k [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/957 [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/1.42k [00:00<?, ?B/s]

loading file vocab.json from cache at /u/jaydesen/.cache/huggingface/hub/models--PrimeQA--tableqa_tapex_wtq/snapshots/6684181ee6ed047224a011f4076057c16640e964/vocab.json
loading file merges.txt from cache at /u/jaydesen/.cache/huggingface/hub/models--PrimeQA--tableqa_tapex_wtq/snapshots/6684181ee6ed047224a011f4076057c16640e964/merges.txt
loading file added_tokens.json from cache at None
loading file special_tokens_map.json from cache at /u/jaydesen/.cache/huggingface/hub/models--PrimeQA--tableqa_tapex_wtq/snapshots/6684181ee6ed047224a011f4076057c16640e964/special_tokens_map.json
loading file tokenizer_config.json from cache at /u/jaydesen/.cache/huggingface/hub/models--PrimeQA--tableqa_tapex_wtq/snapshots/6684181ee6ed047224a011f4076057c16640e964/tokenizer_config.json


Downloading:   0%|          | 0.00/1.63G [00:00<?, ?B/s]

loading weights file pytorch_model.bin from cache at /u/jaydesen/.cache/huggingface/hub/models--PrimeQA--tableqa_tapex_wtq/snapshots/6684181ee6ed047224a011f4076057c16640e964/pytorch_model.bin
All model checkpoint weights were used when initializing BartForConditionalGeneration.

All the weights of BartForConditionalGeneration were initialized from the model checkpoint at PrimeQA/tableqa_tapex_wtq.
If your task is similar to the task the model of the checkpoint was trained on, you can already use BartForConditionalGeneration for predictions without further training.


answers {'how many movies Brad Pitt acted in': ' 87', 'Name the actor who has been in 53 movies': ' leonardo di caprio'}
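
The returned dictionary maps each query to its answer string, and the answers carry a leading space and are lowercased. A minimal sketch for tidying them before display follows; the `answers` literal is copied from the output above, and `cleaned` is an illustrative name.

```python
# Output copied from the cell above.
answers = {
    "how many movies Brad Pitt acted in": " 87",
    "Name the actor who has been in 53 movies": " leonardo di caprio",
}

# Strip the surrounding whitespace from every answer string.
cleaned = {query: answer.strip() for query, answer in answers.items()}

for query, answer in cleaned.items():
    print(f"{query} -> {answer}")
```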


In [None]:
# Here we show how to use TapexReader to run eval() as well. We use a different config file, suited to demonstrating eval() in this notebook.

In [2]:
reader = TapexReader("../../primeqa/tableqa/tapex/configs/tapex_config_eval_wtq.json")
reader.eval()

reading the config from  ../../primeqa/tableqa/tapex/configs/tapex_config_eval_wtq.json
loading from config
loading from config at  ../../primeqa/tableqa/tapex/configs/tapex_config_eval_wtq.json
{"time":"2023-01-09 15:04:10,579", "name": "primeqa.tableqa.tapex.tapex_component", "level": "INFO", "message": "Training/evaluation parameters Seq2SeqTrainingArguments(
_n_gpu=0,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
bf16=False,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
ddp_timeout=1800,
debug=[],
deepspeed=None,
disable_tqdm=False,
do_eval=False,
do_predict=False,
do_train=False,
eval_accumulation_steps=None,
eval_delay=0,
eval_steps=1,
evaluation_strategy=no,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[],
fsdp_min_num_params=0,
fsdp_transformer_layer_cls_to_wrap=Non

loading configuration file config.json from cache at /u/jaydesen/.cache/huggingface/hub/models--PrimeQA--tableqa_tapex_wtq/snapshots/6684181ee6ed047224a011f4076057c16640e964/config.json
Model config BartConfig {
  "_name_or_path": "PrimeQA/tableqa_tapex_wtq",
  "activation_dropout": 0.0,
  "activation_function": "gelu",
  "architectures": [
    "BartForConditionalGeneration"
  ],
  "attention_dropout": 0.1,
  "bos_token_id": 0,
  "classifier_dropout": 0.0,
  "d_model": 1024,
  "decoder_attention_heads": 16,
  "decoder_ffn_dim": 4096,
  "decoder_layerdrop": 0.0,
  "decoder_layers": 12,
  "decoder_start_token_id": 2,
  "dropout": 0.1,
  "encoder_attention_heads": 16,
  "encoder_ffn_dim": 4096,
  "encoder_layerdrop": 0.0,
  "encoder_layers": 12,
  "eos_token_id": 2,
  "forced_bos_token_id": 0,
  "forced_eos_token_id": 2,
  "id2label": {
    "0": "LABEL_0",
    "1": "LABEL_1",
    "2": "LABEL_2"
  },
  "init_std": 0.02,
  "is_encoder_decoder": true,
  "label2id": {
    "LABEL_0": 0,
    "L



  0%|          | 0/3 [00:00<?, ?it/s]



max_steps is given, it will override any value given in num_train_epochs


{"time":"2023-01-09 15:04:20,350", "name": "primeqa.tableqa.tapex.tapex_component", "level": "INFO", "message": "*** Evaluate ***"}


***** Running Evaluation *****
  Num examples = 10
  Batch size = 4


max_eval_samples is set as:  10


***** eval metrics *****
  eval_denotation_accuracy =        0.8
  eval_loss                =     3.4317
  eval_runtime             = 0:00:48.59
  eval_samples             =         10
  eval_samples_per_second  =      0.206
  eval_steps_per_second    =      0.062
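
The throughput figures above follow from the logged example count, batch size, and runtime. The sketch below reproduces that arithmetic and also illustrates denotation accuracy as a plain exact-match fraction. This is a simplification: the metric PrimeQA actually computes may normalize answers differently, and the `preds` and `golds` lists are hypothetical.

```python
import math


def denotation_accuracy(predictions, references):
    """Fraction of predictions equal to their reference after strip + lowercase."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    correct = sum(
        p.strip().lower() == r.strip().lower()
        for p, r in zip(predictions, references)
    )
    return correct / len(references)


# Hypothetical predictions and gold answers: 4 of 5 match.
preds = [" 87", " leonardo di caprio", " 69", " brad pitt", " 12"]
golds = ["87", "Leonardo Di Caprio", "69", "Brad Pitt", "13"]
accuracy = denotation_accuracy(preds, golds)

# Values taken from the evaluation log above.
num_examples = 10
batch_size = 4
runtime_seconds = 48.59

num_steps = math.ceil(num_examples / batch_size)
samples_per_second = round(num_examples / runtime_seconds, 3)
steps_per_second = round(num_steps / runtime_seconds, 3)

print(accuracy, num_steps, samples_per_second, steps_per_second)
```

With 10 examples and a batch size of 4, evaluation takes 3 steps, which matches the `0/3` progress bar in the output.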
