
Pruning method in WTQ #88

Closed · sophgit opened this issue Nov 18, 2020 · 9 comments

sophgit commented Nov 18, 2020

Hello,

I am new to this topic and I'm currently trying to use the pruning/filtering method for long tables in the WTQ notebook.
I tried using the flag --prune_columns in the prediction function, but it still gives me "Can't convert interaction: error: Sequence too long".
What are the necessary steps to filter/prune long tables during prediction?

Thank you in advance.

ghost commented Nov 18, 2020

Thanks for your interest in TAPAS!

Can you provide some more details? In particular, the exact example (question + table) you're trying to process?

sophgit (Author) commented Nov 18, 2020

Thank you for your quick response. The questions asked were:

result2 = predict(holiday_list_of_list, [
    "Which people are there?",
    "What is the start date of Brittas Südfrankreich Urlaub?",
    "End date of Brittas Südfrankreich Urlaub?",
    "What is the total Duration of Britta Glatts Holidaystyle Urlaub?",
])

This is what the table looks like; it contains 36 rows:

[screenshot of the table]

The predictions worked perfectly when I dropped the last column, "TESTCATEGORY". But when I leave it in the dataframe, I get the error mentioned above.

eisenjulian (Collaborator) commented:

Thanks for the quick response @sophgit. To make debugging easier, do you mind sharing the table in a computer-friendly format, for example a list of lists? Even better, if you can share a colab that reproduces the error, that would be great; you can do that via Google Drive or by saving a GitHub gist from the Save menu.
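
For example, a list of lists in the shape the colab's predict helper expects: header row first and every cell a string. (The values below are made-up placeholders; only the TESTCATEGORY column name is from your screenshot.)

holiday_list_of_list = [
    ["Person", "Start date", "End date", "Duration", "TESTCATEGORY"],  # header row
    ["Britta Glatt", "01.06.2020", "14.06.2020", "14", "Südfrankreich"],
    ["Max Mustermann", "02.07.2020", "09.07.2020", "8", "Holidaystyle"],
]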

sophgit (Author) commented Nov 18, 2020

[shared the table/colab here; the attachment did not survive this export]

ghost commented Nov 18, 2020

Yes, we can open it.

I think the problem is that the current CLI call:

  ! python -m tapas.run_task_main \
    --task="WTQ" \
    --output_dir="results" \
    --noloop_predict \
    --test_batch_size={len(queries)} \
    --tapas_verbosity="ERROR" \
    --compression_type= \
    --reset_position_index_per_cell \
    --init_checkpoint="tapas_model/model.ckpt" \
    --bert_config_file="tapas_model/bert_config.json" \
    --mode="predict" 2> error \
    --prune_columns

only runs the predictions; it assumes that all TF examples have already been created.
The prune_columns flag doesn't affect prediction; it is only read in the CREATE_DATA mode.

The actual conversion that needs the pruning happens in the convert_interactions_to_examples function.
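
For comparison, this is roughly the kind of call where the flag would actually be read (a sketch; "wtq_data" is a placeholder and assumes the WTQ shared-task files have been downloaded into an input_dir, which the prediction colab does not set up):

  ! python -m tapas.run_task_main \
    --task="WTQ" \
    --input_dir="wtq_data" \
    --output_dir="results" \
    --prune_columns \
    --mode="create_data"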

ghost commented Nov 18, 2020

To add pruning to the colab you will have to create a token selector:

from tapas.utils import pruning_utils

# vocab_file and max_seq_length are the values already defined earlier in the
# colab (the model's vocab.txt and 512, respectively).
token_selector = pruning_utils.HeuristicExactMatchTokenSelector(
    vocab_file,
    max_seq_length,
    pruning_utils.SelectionType.COLUMN,
    use_previous_answer=True,
    use_previous_questions=True,
)
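
Note that the selector only needs to be created once; it can then be reused for every interaction. With SelectionType.COLUMN the pruning works at whole-column granularity, roughly: columns are scored by exact token matches against the question and kept until the example fits into max_seq_length.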

and then you can call it just before calling the converter:

    # Inside convert_interactions_to_examples, right before the existing
    # add_numeric_values(...) call:
    interaction = token_selector.annotated_interaction(interaction)
    number_annotation_utils.add_numeric_values(interaction)
    for i in range(len(interaction.questions)):
      try:
        yield converter.convert(interaction, i)
      except ValueError as e:
        print(f"Can't convert interaction: {interaction.id} error: {e}")

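Putting the pieces together, a minimal sketch of what the colab's convert_interactions_to_examples ends up looking like with pruning wired in (this assumes the interaction-building code from the standard WTQ prediction colab, plus the converter and token_selector defined above):

from tapas.protos import interaction_pb2
from tapas.utils import number_annotation_utils

def convert_interactions_to_examples(tables_and_queries):
  """Converts (table, queries) pairs to TF examples, pruning long tables first."""
  for idx, (table, queries) in enumerate(tables_and_queries):
    interaction = interaction_pb2.Interaction()
    for position, query in enumerate(queries):
      question = interaction.questions.add()
      question.original_text = query
      question.id = f"{idx}-0_{position}"
    for header in table[0]:
      interaction.table.columns.add().text = header
    for line in table[1:]:
      row = interaction.table.rows.add()
      for cell in line:
        row.cells.add().text = cell
    # New: prune columns with the token selector so the example fits.
    interaction = token_selector.annotated_interaction(interaction)
    number_annotation_utils.add_numeric_values(interaction)
    for i in range(len(interaction.questions)):
      try:
        yield converter.convert(interaction, i)
      except ValueError as e:
        print(f"Can't convert interaction: {interaction.id} error: {e}")
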
When I tried this, I realized there was a problem with apache_beam not being properly installed.
I had to work around it like this:

import apache_beam as beam

# pruning_utils only needs beam's metrics counters here, so replace them
# with no-op fakes that mirror beam.metrics.Metrics.counter(...).inc().
class FakeCounter:
  def inc(self, n=1):
    pass

class FakeMetrics:
  def counter(self, namespace, name):
    return FakeCounter()

class FakeMetricsModule:
  def __init__(self):
    self.Metrics = FakeMetrics()

beam.metrics = FakeMetricsModule()
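
(All this does is turn beam.metrics.Metrics.counter(...).inc() into a no-op, which seems to be the only part of beam this code path touches; the underlying install stays broken, so the restart fix below is the cleaner option.)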

ghost commented Nov 18, 2020

Looks like the apache_beam thing can also be fixed by restarting the runtime. See #89 for details.

sophgit (Author) commented Nov 19, 2020

Thank you so much!!! It seems to work. At least I don't get an error anymore and it does predict. Unfortunately the answers to the questions above are mainly incorrect now, but I'll see if I can work with that. :)

ghost commented Nov 19, 2020

Great that it's working for you now.

I am closing this issue; feel free to open a new one for any model quality problems, and we can see if there is something we can do about it.

ghost closed this as completed Nov 19, 2020