# Table Question Generation: Inference example

In this notebook, we show how to use a pretrained TableQG model to generate questions from tables. We first sample SQL queries from a given table, and then use a text-to-text transformer (T5) to transcribe each SQL query into a natural language question.
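The SQL side of this pipeline can be illustrated with a small sketch. It is a hypothetical simplification, not PrimeQA's implementation. The names `sample_simple_query` and `linearize` are made up for illustration, and the sketch only covers a plain `select` with one equality condition. The serialized layout is inferred from the `context` strings that `generate_questions` prints at the end of this notebook, which is the text the T5 model reads. The answer appears lowercased there, so the sketch lowercases it too.

```python
import random
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical sketch: NOT PrimeQA's sampler, only an illustration of the idea.


def sample_simple_query(
    table: Dict[str, Any], rng: random.Random
) -> Optional[Tuple[str, str, str, str]]:
    """Sample (select_col, cond_col, cond_value, answer) for a one-condition SELECT.

    Two distinct columns and one row are drawn at random; the query is kept only
    if the condition value matches exactly one row, so the answer is unambiguous.
    """
    header: List[str] = table["header"]
    rows: List[List[Any]] = table["rows"]
    select_idx, cond_idx = rng.sample(range(len(header)), 2)
    row = rng.choice(rows)
    cond_value = row[cond_idx]
    if sum(1 for r in rows if r[cond_idx] == cond_value) != 1:
        return None  # ambiguous condition; the caller may resample
    return header[select_idx], header[cond_idx], str(cond_value), str(row[select_idx])


def linearize(
    header: List[str],
    select_col: str,
    cond_col: str,
    cond_value: str,
    answer: str,
    agg: str = "select",
) -> str:
    """Serialize a one-condition query in the layout seen in the `context` field."""
    return (
        f"{agg} <<sep>> {select_col} <<sep>> {cond_col} <<cond>> equal <<cond>> "
        f"{cond_value} <<answer>> {answer.lower()} <<header>> "
        + " <<hsep>> ".join(header)
    )


if __name__ == "__main__":
    table = {
        "header": ["Player", "No.", "Nationality", "Position", "Years in Toronto", "School Team"],
        "rows": [
            ["Brad Lohaus", 33, "United States", "Forward-Center", "1996", "Iowa"],
            ["Art Long", 42, "United States", "Forward-Center", "2002-03", "Cincinnati"],
        ],
    }
    sampled = sample_simple_query(table, random.Random(0))
    if sampled is not None:
        print(linearize(table["header"], *sampled))
```

Calling `linearize(header, "School Team", "Player", "Brad Lohaus", "Iowa")` with the notebook's header reproduces the first `context` string shown in the generation output. The uniqueness check is one simple way to keep answers unambiguous. The real sampler may handle this differently.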

## Dependencies

If you have not already done so, install PrimeQA with the notebooks extras before getting started.

In [1]:
from primeqa.qg.models.qg_model import QGModel
from tabulate import tabulate # only used to visualize table


## Loading a pretrained model from Hugging Face

This model was trained with the PrimeQA library and uploaded to the Hugging Face Hub.

In [2]:
model_name = 'PrimeQA/t5-base-table-question-generator'
table_qg_model = QGModel(model_name, modality='table')


## Sample table

Tables should be passed as a `list` of `dict`s. Each `dict` corresponds to one table, with the keys `"header"` (the list of column names) and `"rows"` (a list of rows, each a list of cell values).
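It can be useful to check this structure before calling the model. The helper below is a hypothetical convenience, not part of PrimeQA. It checks only the layout described above: a list of dicts, each with `"header"` and `"rows"`, and every row with one cell per header column.

```python
from typing import Any, Dict, List


def validate_tables(table_list: List[Dict[str, Any]]) -> None:
    """Raise ValueError if table_list does not follow the header/rows layout.

    Hypothetical helper (not part of PrimeQA): checks that each table is a dict
    with "header" and "rows" keys and that every row has one cell per column.
    """
    if not isinstance(table_list, list):
        raise ValueError("tables must be passed as a list of dicts")
    for t_idx, table in enumerate(table_list):
        if not isinstance(table, dict) or "header" not in table or "rows" not in table:
            raise ValueError(f"table {t_idx}: expected a dict with 'header' and 'rows'")
        width = len(table["header"])
        for r_idx, row in enumerate(table["rows"]):
            if len(row) != width:
                raise ValueError(
                    f"table {t_idx}, row {r_idx}: {len(row)} cell(s) for {width} header column(s)"
                )


if __name__ == "__main__":
    good = [{"header": ["Player", "No."], "rows": [["Brad Lohaus", 33], ["Art Long", 42]]}]
    validate_tables(good)  # passes silently
    try:
        validate_tables([{"header": ["Player", "No."], "rows": [["Art Long"]]}])
    except ValueError as err:
        print(err)  # table 0, row 0: 1 cell(s) for 2 header column(s)
```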

In [4]:
table_list = [
    {
        "header": ["Player", "No.", "Nationality", "Position", "Years in Toronto", "School Team"],
        "rows": [
            ["Antonio Lang", 21, "United States", "Guard-Forward", "1999-2000", "Duke"],
            ["Voshon Lenard", 2, "United States", "Guard", "2002-03", "Minnesota"],
            ["Martin Lewis", 32, "United States", "Guard-Forward", "1996-97", "Butler CC (KS)"],
            ["Brad Lohaus", 33, "United States", "Forward-Center", "1996", "Iowa"],
            ["Art Long", 42, "United States", "Forward-Center", "2002-03", "Cincinnati"],
        ],
    }
]
# [optional] include an id_list aligned with table_list
id_list = ["abcID123"]

print(tabulate(table_list[0]['rows'], headers=table_list[0]['header'], tablefmt='grid'))

+---------------+-------+---------------+----------------+--------------------+----------------+
| Player        |   No. | Nationality   | Position       | Years in Toronto   | School Team    |
+===============+=======+===============+================+====================+================+
| Antonio Lang  |    21 | United States | Guard-Forward  | 1999-2000          | Duke           |
+---------------+-------+---------------+----------------+--------------------+----------------+
| Voshon Lenard |     2 | United States | Guard          | 2002-03            | Minnesota      |
+---------------+-------+---------------+----------------+--------------------+----------------+
| Martin Lewis  |    32 | United States | Guard-Forward  | 1996-97            | Butler CC (KS) |
+---------------+-------+---------------+----------------+--------------------+----------------+
| Brad Lohaus   |    33 | United States | Forward-Center | 1996               | Iowa           |
+---------------+-------+---------------+----------------+--------------------+----------------+
| Art Long      |    42 | United States | Forward-Center | 2002-03            | Cincinnati     |
+---------------+-------+---------------+----------------+--------------------+----------------+

## Generate questions

Several arguments control the type of questions generated.
#### Controls:
- `num_questions_per_instance`: Number of questions to generate per table. (default=`5`)
- `agg_prob`: Probability distribution over aggregate operations. It must be a probability vector of length 6, where each entry gives the probability of the corresponding aggregate, in the order `['select', 'maximum', 'minimum', 'count', 'sum', 'average']`. (default=`[1,0,0,0,0,0]`)
- `num_where_prob`: Probability vector of length 5 over the number of WHERE clauses to use when generating SQL queries. If a query with k WHERE clauses cannot be generated, the code tries k-1 clauses, and so on. (default=`[0,1,0,0,0]`)
- `ineq_prob`: Probability (a float) of generating inequality conditions in the SQL queries. (default=`0.0`)
- `id_list`: Optional list of table ids aligned with `table_list`. (default: empty list)
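As a rough mental model of these controls, the sketch below draws the shape of one SQL query from them. It is a hypothetical illustration, not PrimeQA's sampler. The names `check_prob_vector`, `sample_query_plan`, the `can_build` callback, and the `"inequality"` label are invented for this example. It relies only on what the list above states: the vector lengths, the aggregate order, and the fallback from k to k-1 WHERE clauses. Reading index i of `num_where_prob` as "use i WHERE clauses" is an assumption. It is consistent with the default `[0,1,0,0,0]` producing the single-condition queries in the output below.

```python
import random
from typing import Callable, Dict, List, Sequence

# Aggregate order as documented for `agg_prob`.
AGGREGATES = ["select", "maximum", "minimum", "count", "sum", "average"]


def check_prob_vector(probs: Sequence[float], length: int, name: str) -> None:
    """Raise ValueError unless probs has the given length and is a distribution."""
    if len(probs) != length:
        raise ValueError(f"{name} must have length {length}, got {len(probs)}")
    if any(p < 0 for p in probs) or abs(sum(probs) - 1.0) > 1e-6:
        raise ValueError(f"{name} must be non-negative and sum to 1")


def sample_query_plan(
    agg_prob: Sequence[float],
    num_where_prob: Sequence[float],
    ineq_prob: float,
    can_build: Callable[[int], bool],
    rng: random.Random,
) -> Dict[str, object]:
    """Hypothetical sketch: draw an aggregate, a WHERE-clause count, and operators."""
    check_prob_vector(agg_prob, 6, "agg_prob")
    check_prob_vector(num_where_prob, 5, "num_where_prob")
    agg = rng.choices(AGGREGATES, weights=agg_prob, k=1)[0]
    # Assumption: index i of num_where_prob means "use i WHERE clauses".
    k = rng.choices(range(5), weights=num_where_prob, k=1)[0]
    # Documented fallback: if k clauses cannot be built, try k-1, and so on.
    while k > 0 and not can_build(k):
        k -= 1
    # Each condition is an inequality with probability ineq_prob, else equality.
    ops: List[str] = ["inequality" if rng.random() < ineq_prob else "equal" for _ in range(k)]
    return {"aggregate": agg, "num_where": k, "operators": ops}


if __name__ == "__main__":
    plan = sample_query_plan(
        agg_prob=[1.0, 0, 0, 0, 0, 0],
        num_where_prob=[0, 1.0, 0, 0, 0],
        ineq_prob=0.0,
        can_build=lambda k: True,
        rng=random.Random(0),
    )
    print(plan)  # {'aggregate': 'select', 'num_where': 1, 'operators': ['equal']}
```

With the default settings every draw is deterministic: the only aggregate with non-zero weight is `select`, exactly one WHERE clause is requested, and `ineq_prob=0.0` never yields an inequality.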

In [5]:
table_qg_model.generate_questions(table_list,
                                  num_questions_per_instance=5,
                                  agg_prob=[1., 0, 0, 0, 0, 0],
                                  num_where_prob=[0, 1., 0, 0, 0],
                                  ineq_prob=0.0,
                                  id_list=id_list)

[{'context_id': 'abcID123',
  'context': 'select <<sep>> School Team <<sep>> Player <<cond>> equal <<cond>> Brad Lohaus <<answer>> iowa <<header>> Player <<hsep>> No. <<hsep>> Nationality <<hsep>> Position <<hsep>> Years in Toronto <<hsep>> School Team',
  'questions': ['What school team did Brad Lohaus play for?'],
  'answer': 'iowa'},
 {'context_id': 'abcID123',
  'context': 'select <<sep>> School Team <<sep>> Player <<cond>> equal <<cond>> Martin Lewis <<answer>> butler cc (ks) <<header>> Player <<hsep>> No. <<hsep>> Nationality <<hsep>> Position <<hsep>> Years in Toronto <<hsep>> School Team',
  'questions': ['What school team did Martin Lewis play for?'],
  'answer': 'butler cc (ks)'},
 {'context_id': 'abcID123',
  'context': 'select <<sep>> Position <<sep>> School Team <<cond>> equal <<cond>> Cincinnati <<answer>> forward-center <<header>> Player <<hsep>> No. <<hsep>> Nationality <<hsep>> Position <<hsep>> Years in Toronto <<hsep>> School Team',
  'questions': ['What is the posit