# Table Question Generation: Inference example

This notebook shows how to use a pretrained TableQG model to generate questions. The model first samples SQL queries from a given table, then uses a text-to-text transformer (T5) to translate each SQL query into a natural-language question.
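To make the first step concrete, here is a minimal sketch of sampling one SQL query from a table together with its answer. It is an assumed simplification for illustration, not PrimeQA's sampler: the function name `sample_select_query` and the query template are hypothetical, and the T5 step that turns the query into a question is not shown.

```python
import random


def sample_select_query(table, rng):
    """Sample a toy `SELECT ... WHERE ...` query and its answer.

    Illustrative only: an assumed simplification, not PrimeQA's sampler.
    """
    header, rows = table["header"], table["rows"]
    # Pick two distinct columns: one to select, one to condition on.
    select_idx, where_idx = rng.sample(range(len(header)), 2)
    # Pick an existing row so the WHERE value is guaranteed to match something.
    row = rng.choice(rows)
    sql = (
        f'SELECT "{header[select_idx]}" FROM table '
        f'WHERE "{header[where_idx]}" = "{row[where_idx]}"'
    )
    # The answer is every selected cell whose row satisfies the condition.
    answer = [r[select_idx] for r in rows if r[where_idx] == row[where_idx]]
    return sql, answer


table = {
    "header": ["Player", "No.", "School Team"],
    "rows": [["Antonio Lang", 21, "Duke"], ["Voshon Lenard", 2, "Minnesota"]],
}
sql, answer = sample_select_query(table, random.Random(0))
print(sql, answer)
```

Because the condition value is taken from a real row, the sampled query always has at least one answer.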

## Dependencies

If you have not already done so, install PrimeQA with the notebooks extras before getting started.

In [1]:
from primeqa.qg.models.qg_model import QGModel
from tabulate import tabulate # only used to visualize table


## Loading pretrained model from Hugging Face

This model was trained with the PrimeQA library and uploaded to Hugging Face.

In [2]:
model_name = 'PrimeQA/t5-base-table-question-generator'
table_qg_model = QGModel(model_name, modality='table')

## Sample table

Tables should be passed as a `list` of `dict`s. Each `dict` corresponds to one table and has the keys `"header"` and `"rows"`.
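Before calling the model, it can help to check that each table has this shape. The helper below is a hypothetical sketch, not part of PrimeQA; it additionally assumes that every row should have as many cells as the header.

```python
def validate_tables(tables):
    """Raise ValueError if any table lacks the expected "header"/"rows" layout.

    Hypothetical helper for illustration; not part of PrimeQA.
    """
    if not isinstance(tables, list):
        raise ValueError("tables must be a list of dicts")
    for i, table in enumerate(tables):
        if not isinstance(table, dict):
            raise ValueError(f"table {i} is not a dict")
        missing = {"header", "rows"} - table.keys()
        if missing:
            raise ValueError(f"table {i} is missing keys: {sorted(missing)}")
        # Assumption: each row has exactly one cell per header column.
        width = len(table["header"])
        for j, row in enumerate(table["rows"]):
            if len(row) != width:
                raise ValueError(
                    f"table {i}, row {j}: expected {width} cells, got {len(row)}"
                )
    return True


example = [{"header": ["Player", "No."], "rows": [["Antonio Lang", 21]]}]
validate_tables(example)
```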

In [5]:
table_list = [
    {
        "header": ["Player", "No.", "Nationality", "Position", "Years in Toronto", "School Team"],
        "rows": [
            ["Antonio Lang", 21, "United States", "Guard-Forward", "1999-2000", "Duke"],
            ["Voshon Lenard", 2, "United States", "Guard", "2002-03", "Minnesota"],
            ["Martin Lewis", 32, "United States", "Guard-Forward", "1996-97", "Butler CC (KS)"],
            ["Brad Lohaus", 33, "United States", "Forward-Center", "1996", "Iowa"],
            ["Art Long", 42, "United States", "Forward-Center", "2002-03", "Cincinnati"],
        ],
    },
    {
        "header": ["Employee", "No.", "Company", "Position", "year_of_appointment"],
        "rows": [
            ["John Doe", 142500, "IBM", "Researcher", "1999"],
            ["Niel Bridge", 177600, "Microsoft", "Manager", "2003"],
            ["Rick Jowes", 320022, "Google", "Engineer", "1996"],
            ["Ron Duff", 330110, "Meta", "Admin", "1998"],
            ["Sean Root", 420020, "Amazon", "Director", "2002"],
        ],
    },
]

id_list = ["abcID123"]

print(tabulate(table_list[0]['rows'], headers=table_list[0]['header'], tablefmt='grid'))
print(tabulate(table_list[1]['rows'], headers=table_list[1]['header'], tablefmt='grid'))

+---------------+-------+---------------+----------------+--------------------+----------------+
| Player        |   No. | Nationality   | Position       | Years in Toronto   | School Team    |
+===============+=======+===============+================+====================+================+
| Antonio Lang  |    21 | United States | Guard-Forward  | 1999-2000          | Duke           |
+---------------+-------+---------------+----------------+--------------------+----------------+
| Voshon Lenard |     2 | United States | Guard          | 2002-03            | Minnesota      |
+---------------+-------+---------------+----------------+--------------------+----------------+
| Martin Lewis  |    32 | United States | Guard-Forward  | 1996-97            | Butler CC (KS) |
+---------------+-------+---------------+----------------+--------------------+----------------+
| Brad Lohaus   |    33 | United States | Forward-Center | 1996               | Iowa           |
+---------------+-------+---------------+----------------+--------------------+----------------+
| Art Long      |    42 | Unit

## Generate questions

Several arguments control the type of questions generated.

#### Controls:
- `num_questions_per_instance`: Number of questions to generate per table (default: `5`).
- `agg_prob`: Probability distribution over aggregate operators. It should be a probability vector of length 6, where each entry gives the probability of the corresponding operator, in this order: `['select', 'maximum', 'minimum', 'count', 'sum', 'average']` (default: `[1,0,0,0,0,0]`).
- `num_where_prob`: Probability vector of length 5 over the number of WHERE clauses to use when generating SQL queries. If a query with k WHERE clauses cannot be generated, the code tries to generate one with k-1 clauses, and so on (default: `[0,1,0,0,0]`).
- `ineq_prob`: Probability of generating inequality clauses in the SQL queries, given as a float (default: `0.0`).
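
One way to read these controls is as sampling weights for the shape of each SQL query. The sketch below illustrates that reading only; it is not PrimeQA's code. The function `sample_query_shape` is hypothetical, it assumes that index i of `num_where_prob` means i WHERE clauses, and it does not model the k-1 fallback.

```python
import random

AGG_OPS = ["select", "maximum", "minimum", "count", "sum", "average"]


def sample_query_shape(agg_prob, num_where_prob, ineq_prob, rng):
    """Draw an aggregate, a WHERE-clause count, and one operator per clause.

    Assumed interpretation of the controls; not PrimeQA's implementation.
    """
    if len(agg_prob) != len(AGG_OPS):
        raise ValueError("agg_prob must have 6 entries")
    if len(num_where_prob) != 5:
        raise ValueError("num_where_prob must have 5 entries")
    # Aggregate operator, weighted by agg_prob.
    agg = rng.choices(AGG_OPS, weights=agg_prob, k=1)[0]
    # Assumption: index i of num_where_prob is the probability of i clauses.
    num_where = rng.choices(list(range(5)), weights=num_where_prob, k=1)[0]
    # Each clause becomes an inequality with probability ineq_prob.
    ops = [
        rng.choice([">", "<"]) if rng.random() < ineq_prob else "="
        for _ in range(num_where)
    ]
    return agg, num_where, ops


shape = sample_query_shape([1, 0, 0, 0, 0, 0], [0, 1, 0, 0, 0], 0.0, random.Random(0))
print(shape)
```

Under this reading, the default values always yield a plain `select` query with a single equality condition.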

In [7]:
table_qg_model.generate_questions(
    table_list,
    num_questions_per_instance=10,
    agg_prob=[1., 0, 0, 0, 0, 0],
    num_where_prob=[0, 1., 0, 0, 0],
    ineq_prob=0.0,
    id_list=[],
)

[{'question': "What is Voshon Lenard's nationality?",
  'answer': 'united states'},
 {'question': 'What position does the player from Iowa play?',
  'answer': 'forward-center'},
 {'question': 'What number has 1996-97 as the years in Toronto?',
  'answer': '32.0'},
 {'question': 'What is the nationality of player number 32.0?',
  'answer': 'united states'},
 {'question': 'What is the nationality of player number 32.0?',
  'answer': 'united states'},
 {'question': 'Name the school team for number 32.0',
  'answer': 'butler cc (ks)'},
 {'question': 'What is the nationality of player 2.0?',
  'answer': 'united states'},
 {'question': 'What school team did player 2.0 play for?',
  'answer': 'minnesota'},
 {'question': 'What position does number 33.0 play?',
  'answer': 'forward-center'},
 {'question': 'Which School Team has a Years in Toronto of 1999-2000?',
  'answer': 'duke'},
 {'question': 'Which company has a year_of_appointment of 1996.0?',
  'answer': 'google'},
 {'question': 'What po