
##### Copyright 2020 The Google AI Language Team Authors

Licensed under the Apache License, Version 2.0 (the "License");

In [None]:
# Copyright 2019 The Google AI Language Team Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

Running a fine-tuned TAPAS checkpoint
---
This notebook shows how to load a fine-tuned TAPAS model and make predictions with it. TAPAS was introduced in the paper [TAPAS: Weakly Supervised Table Parsing via Pre-training](https://arxiv.org/abs/2004.02349).

# Install the package


First, install the `tapas-table-parsing` package from PyPI.

In [None]:
! pip install tapas-table-parsing

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting tapas-table-parsing
  Downloading tapas_table_parsing-0.0.1.dev0-py3-none-any.whl (195 kB)
Collecting tensorflow-probability==0.10.1
  Downloading tensorflow_probability-0.10.1-py2.py3-none-any.whl (3.5 MB)
Collecting tensorflow~=2.2.0
  Downloading tensorflow-2.2.3-cp37-cp37m-manylinux2010_x86_64.whl (516.4 MB)
Collecting scikit-learn~=0.22.1
  Downloading scikit_learn-0.22.2.post1-cp37-cp37m-manylinux1_x86_64.whl (7.1 MB)
Collecting tf-slim~=1.1.0
  Downloading tf_slim-1.1.0-py2.py3-none-any.whl (352 kB)
Collecting frozendict==1.2
  Downloading frozendic

# Fetch models from Google Cloud Storage

Next, download a pretrained checkpoint from Google Cloud Storage. For the sake of speed, this is a base-sized model fine-tuned on [SQA](https://www.microsoft.com/en-us/download/details.aspx?id=54253). Note that the best results in the paper were obtained with a large model, which has 24 layers instead of 12.

In [None]:
! gsutil cp gs://tapas_models/2020_04_21/tapas_sqa_base.zip . && unzip tapas_sqa_base.zip

Copying gs://tapas_models/2020_04_21/tapas_sqa_base.zip...
/ [1 files][  1.0 GiB/  1.0 GiB]   17.8 MiB/s                                   
Operation completed over 1 objects/1.0 GiB.                                      
Archive:  tapas_sqa_base.zip
   creating: tapas_sqa_base/
  inflating: tapas_sqa_base/model.ckpt.data-00000-of-00001  
  inflating: tapas_sqa_base/model.ckpt.index  
  inflating: tapas_sqa_base/README.txt  
  inflating: tapas_sqa_base/vocab.txt  
  inflating: tapas_sqa_base/bert_config.json  
  inflating: tapas_sqa_base/model.ckpt.meta  
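To check which model size you downloaded (12 layers for base, 24 for the paper's large model), you can read `tapas_sqa_base/bert_config.json`. The sketch below assumes the file uses the standard BERT config keys `num_hidden_layers` and `hidden_size`; the helper names `load_bert_config` and `describe_model_size` are hypothetical, not part of the TAPAS package.

```python
import json


def load_bert_config(config_path):
  """Load a BERT-style JSON config file into a dict."""
  with open(config_path) as f:
    return json.load(f)


def describe_model_size(config):
  """Return (num_hidden_layers, hidden_size) from a BERT-style config dict.

  Assumes the standard BERT config keys are present.
  """
  return config["num_hidden_layers"], config["hidden_size"]


# Usage in this notebook, with the path from the unzip output above:
# layers, hidden = describe_model_size(
#     load_bert_config("tapas_sqa_base/bert_config.json"))
# print(f"{layers} layers, hidden size {hidden}")
```

A base-sized checkpoint should report 12 layers.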


# Imports

In [None]:
import tensorflow.compat.v1 as tf
import os
import shutil
import csv
import pandas as pd
import IPython
from IPython.display import display

tf.get_logger().setLevel('ERROR')

In [None]:
from tapas.utils import tf_example_utils
from tapas.protos import interaction_pb2
from tapas.utils import number_annotation_utils
from tapas.scripts import prediction_utils

# Load checkpoint for prediction

Here's the prediction code. It creates an `interaction_pb2.Interaction` protobuf object, which is the data structure we use to store examples, converts it to TensorFlow examples, and then calls the prediction script.
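The `predict` function takes the table as a pipe-delimited string whose first non-blank line is the header. The standalone sketch below mirrors that parsing step in plain Python so you can inspect the rows before they become an `Interaction`. The `validate_table` check is a hypothetical addition, not something the notebook does: because cells are split on `|`, a literal pipe inside a cell would split it in two and leave that row wider than the header.

```python
def parse_pipe_table(table_data):
  """Split a pipe-delimited string into rows of whitespace-stripped cells.

  Blank lines are skipped; the first remaining line is the header row,
  matching how `predict` builds its `table` list.
  """
  return [
      [cell.strip() for cell in line.split("|")]
      for line in table_data.split("\n")
      if line.strip()
  ]


def validate_table(table):
  """Raise ValueError if a data row's width differs from the header's."""
  header = table[0]
  for i, row in enumerate(table[1:], start=1):
    if len(row) != len(header):
      raise ValueError(
          f"row {i} has {len(row)} cells but the header has {len(header)}")
  return table
```

For example, `validate_table(parse_pipe_table(table_data))` run before calling `predict` would surface a malformed row early.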

In [None]:
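# The prediction script looks for checkpoints under results/sqa/model (the
# model directory for --output_dir="results" and --task="SQA"), so copy the
# downloaded checkpoint there as step 0 and write a `checkpoint` state file
# that points to it.
os.makedirs('results/sqa/tf_examples', exist_ok=True)
os.makedirs('results/sqa/model', exist_ok=True)
with open('results/sqa/model/checkpoint', 'w') as f:
  f.write('model_checkpoint_path: "model.ckpt-0"')
for suffix in ['.data-00000-of-00001', '.index', '.meta']:
  shutil.copyfile(f'tapas_sqa_base/model.ckpt{suffix}', f'results/sqa/model/model.ckpt-0{suffix}')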

In [None]:
max_seq_length = 512
vocab_file = "tapas_sqa_base/vocab.txt"
config = tf_example_utils.ClassifierConversionConfig(
    vocab_file=vocab_file,
    max_seq_length=max_seq_length,
    max_column_id=max_seq_length,
    max_row_id=max_seq_length,
    strip_column_names=False,
    add_aggregation_candidates=False,
)
converter = tf_example_utils.ToClassifierTensorflowExample(config)

def convert_interactions_to_examples(tables_and_queries):
  """Calls Tapas converter to convert interaction to example."""
  for idx, (table, queries) in enumerate(tables_and_queries):
    interaction = interaction_pb2.Interaction()
    for position, query in enumerate(queries):
      question = interaction.questions.add()
      question.original_text = query
      question.id = f"{idx}-0_{position}"
    for header in table[0]:
      interaction.table.columns.add().text = header
    for line in table[1:]:
      row = interaction.table.rows.add()
      for cell in line:
        row.cells.add().text = cell
    number_annotation_utils.add_numeric_values(interaction)
    for i in range(len(interaction.questions)):
      try:
        yield converter.convert(interaction, i)
      except ValueError as e:
        # `interaction.id` is never set above, so report the table and
        # question indices instead.
        print(f"Can't convert interaction {idx}, question {i}: {e}")

def write_tf_example(filename, examples):
  with tf.io.TFRecordWriter(filename) as writer:
    for example in examples:
      writer.write(example.SerializeToString())

def predict(table_data, queries):
  table = [list(map(lambda s: s.strip(), row.split("|"))) 
           for row in table_data.split("\n") if row.strip()]
  examples = convert_interactions_to_examples([(table, queries)])
  write_tf_example("results/sqa/tf_examples/test.tfrecord", examples)
  write_tf_example("results/sqa/tf_examples/random-split-1-dev.tfrecord", [])
  
  ! python -m tapas.run_task_main \
    --task="SQA" \
    --output_dir="results" \
    --noloop_predict \
    --test_batch_size={len(queries)} \
    --tapas_verbosity="ERROR" \
    --compression_type= \
    --init_checkpoint="tapas_sqa_base/model.ckpt" \
    --bert_config_file="tapas_sqa_base/bert_config.json" \
    --mode="predict" 2> error


  results_path = "results/sqa/model/test_sequence.tsv"
  all_coordinates = []
  df = pd.DataFrame(table[1:], columns=table[0])
  display(IPython.display.HTML(df.to_html(index=False)))
  print()
  with open(results_path) as csvfile:
    reader = csv.DictReader(csvfile, delimiter='\t')
    for row in reader:
      coordinates = prediction_utils.parse_coordinates(row["answer_coordinates"])
      all_coordinates.append(coordinates)
      # Predicted row indices exclude the header row, hence the +1.
      answers = ', '.join([table[r + 1][c] for r, c in coordinates])
      position = int(row['position'])
      print(">", queries[position])
      print(answers)
  return all_coordinates
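The last step of `predict` maps each predicted `(row, column)` pair back to cell text, shifting the row by one because `table[0]` is the header. The sketch below isolates that mapping so it can be tested without running the model. `coordinates_to_answers` and `format_predictions` are hypothetical helpers, and sorting by `position` is a defensive choice of ours (the notebook prints rows in file order) so the output follows the query order even if the TSV rows arrive in a different order.

```python
def coordinates_to_answers(table, coordinates):
  """Return the cell text for each predicted (row, column) pair.

  Row indices count data rows only; table[0] is the header, so shift by one,
  mirroring `table[row + 1][col]` in `predict`.
  """
  return [table[r + 1][c] for r, c in coordinates]


def format_predictions(table, queries, predictions):
  """Render (position, coordinates) pairs as '> question' / 'answers' lines."""
  lines = []
  # Sort on the position only, so the coordinate lists are never compared.
  for position, coordinates in sorted(predictions, key=lambda p: p[0]):
    lines.append(f"> {queries[position]}")
    lines.append(", ".join(coordinates_to_answers(table, coordinates)))
  return "\n".join(lines)
```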

# Predict

In [18]:
result = predict("""
name |	height	| mass	| hair_color	| skin_color |	eye_color	| birth_year |	gender |	homeworld |	species
Luke Skywalker    |	172    |	77    |	blond    |	fair    |	blue    |	19BBY    |	male    |	Tatooine    |	Human
C-3PO    |	167    |	75    |	NA    |	gold    |	yellow    |	112BBY    |	NA    |	Tatooine    |	Droid
R2-D2    |	96    |	32    |	NA    |	white, blue    |	red    |	33BBY    |	NA    |	Naboo    |	Droid
Darth Vader    |	202    |	136    |	none    |	white    |	yellow    |	41.9BBY    |	male    |	Tatooine    |	Human
Leia Organa    |	150    |	49    |	brown    |	light    |	brown    |	19BBY    |	female    |	Alderaan    |	Human
Owen Lars    |	178    |	120    |	brown, grey    |	light    |	blue    |	52BBY    |	male    |	Tatooine    |	Human
Beru Whitesun lars    |	165    |	75    |	brown    |	light    |	blue    |	47BBY    |	female    |	Tatooine    |	Human
R5-D4    |	97    |	32    |	NA    |	white, red    |	red    |	NA    |	NA    |	Tatooine    |	Droid
Biggs Darklighter    |	183    |	84    |	black    |	light    |	brown    |	24BBY    |	male    |	Tatooine    |	Human
Obi-Wan Kenobi    |	182    |	77    |	auburn, white    |	fair    |	blue-gray    |	57BBY    |	male    |	Stewjon    |	Human
Anakin Skywalker    |	188    |	84    |	blond    |	fair    |	blue    |	41.9BBY    |	male    |	Tatooine    |	Human
Wilhuff Tarkin    |	180    |	NA    |	auburn, grey    |	fair    |	blue    |	64BBY    |	male    |	Eriadu    |	Human
Chewbacca    |	228    |	112    |	brown    |	NA    |	blue    |	200BBY    |	male    |	Kashyyyk    |	Wookiee
Han Solo    |	180    |	80    |	brown    |	fair    |	brown    |	29BBY    |	male    |	Corellia    |	Human
Greedo    |	173    |	74    |	NA    |	green    |	black    |	44BBY    |	male    |	Rodia    |	Rodian
Jabba Desilijic Tiure    |	175    |	1358    |	NA    |	green-tan, brown    |	orange    |	600BBY    |	hermaphrodite    |	Nal Hutta    |	Hutt
Wedge Antilles    |	170    |	77    |	brown    |	fair    |	hazel    |	21BBY    |	male    |	Corellia    |	Human
Jek Tono Porkins    |	180    |	110    |	brown    |	fair    |	blue    |	NA    |	male    |	Bestine IV    |	Human
Yoda    |	66    |	17    |	white    |	green    |	brown    |	896BBY    |	male    |	NA    |	Yoda's species
Palpatine    |	170    |	75    |	grey    |	pale    |	yellow    |	82BBY    |	male    |	Naboo    |	Human
Boba Fett    |	183    |	78.2    |	black    |	fair    |	brown    |	31.5BBY    |	male    |	Kamino    |	Human
""", ["Who is the tallest character?",
      "How many characters are from Tatooine?",
      "What is the homeworld of Darth Vader?",
      "Who is from rodian species in the dataset?",
      "What is the homeworld of Luke Skywalker, Darth Vader, Owen Lars and C-3PO?"])

is_built_with_cuda: True
is_gpu_available: False
GPUs: []
Training or predicting ...
Evaluation finished after training step 0.


| name | height | mass | hair_color | skin_color | eye_color | birth_year | gender | homeworld | species |
|---|---|---|---|---|---|---|---|---|---|
| Luke Skywalker | 172 | 77.0 | blond | fair | blue | 19BBY | male | Tatooine | Human |
| C-3PO | 167 | 75.0 |  | gold | yellow | 112BBY |  | Tatooine | Droid |
| R2-D2 | 96 | 32.0 |  | white, blue | red | 33BBY |  | Naboo | Droid |
| Darth Vader | 202 | 136.0 | none | white | yellow | 41.9BBY | male | Tatooine | Human |
| Leia Organa | 150 | 49.0 | brown | light | brown | 19BBY | female | Alderaan | Human |
| Owen Lars | 178 | 120.0 | brown, grey | light | blue | 52BBY | male | Tatooine | Human |
| Beru Whitesun lars | 165 | 75.0 | brown | light | blue | 47BBY | female | Tatooine | Human |
| R5-D4 | 97 | 32.0 |  | white, red | red |  |  | Tatooine | Droid |
| Biggs Darklighter | 183 | 84.0 | black | light | brown | 24BBY | male | Tatooine | Human |
| Obi-Wan Kenobi | 182 | 77.0 | auburn, white | fair | blue-gray | 57BBY | male | Stewjon | Human |



> Who is the tallest character?
Darth Vader
> How many characters are from Tatooine?
Luke Skywalker, R5-D4, Darth Vader, Biggs Darklighter, Beru Whitesun lars, Owen Lars, Anakin Skywalker, C-3PO
> What is the homeworld of Darth Vader?
Tatooine
> Who is from rodian species in the dataset?
Greedo
> What is the homeworld of Luke Skywalker, Darth Vader, Owen Lars and C-3PO?
Tatooine, Tatooine, Tatooine
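SQA answers are sets of table cells, so this checkpoint answers "How many characters are from Tatooine?" by selecting the matching names rather than returning a number. A minimal sketch of deriving the count client-side from the coordinates that `predict` returns; `count_selected_rows` is a hypothetical helper, not part of the TAPAS library. It counts distinct rows so that several selected cells in the same row are counted once.

```python
def count_selected_rows(coordinates):
  """Answer a 'how many' question by counting distinct selected rows.

  Each (row, column) pair is one selected cell; using a set of row indices
  avoids double-counting rows with more than one selected cell.
  """
  return len({row for row, _ in coordinates})
```

In the run above, the eight Tatooine names sit in eight distinct rows, so `count_selected_rows(result[1])` should return 8, assuming `result` lists predictions in query order as the printout suggests.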
