Pruning method in WTQ #88
Comments
Thanks for your interest in TAPAS! Can you provide some more details?
Thank you for your quick response. The questions asked were:

result2 = predict(holiday_list_of_list, ["Which people are there?", "What is the start date of Brittas Südfrankreich Urlaub?", "End date of Brittas Südfrankreich Urlaub?", "What is the total Duration of Britta Glatts Holidaystyle Urlaub?"])

This is what the table looks like (screenshot omitted); it contains 36 rows. The predictions worked perfectly when I dropped the last column "TESTCATEGORY", but when I leave it in the dataframe, I get the error mentioned above.
Thanks for the quick response @sophgit. In order to facilitate debugging, do you mind sharing the table in a computer-friendly format, for example a list of lists? Even better, if you can share a colab that reproduces the error that would be great, which you can do via Google Drive or by saving to a GitHub gist from the Save menu.
Yes, we can open it. I think the problem is that the current CLI call: ! python -m tapas.run_task_main \
--task="WTQ" \
--output_dir="results" \
--noloop_predict \
--test_batch_size={len(queries)} \
--tapas_verbosity="ERROR" \
--compression_type= \
--reset_position_index_per_cell \
--init_checkpoint="tapas_model/model.ckpt" \
--bert_config_file="tapas_model/bert_config.json" \
--mode="predict" 2> error \
--prune_columns

only runs the predictions and assumes that all TF examples have already been created. The actual conversion that should be affected happens in the convert_interactions_to_examples function.
To add pruning to the colab you will have to create a token selector (the snippet for that step is not reproduced here), and then you can call it just before calling the converter:

interaction = token_selector.annotated_interaction(interaction)
number_annotation_utils.add_numeric_values(interaction)
for i in range(len(interaction.questions)):
  try:
    yield converter.convert(interaction, i)
  except ValueError as e:
    print(f"Can't convert interaction: {interaction.id} error: {e}")

When I tried this I realized there was some problem with beam not being properly installed, so I worked around it by stubbing out its metrics module:

import apache_beam as beam
def fake_counter(namespace, message):
  class FakeCounter:
    def inc(self, increment=None, other=None):
      pass
  return FakeCounter()

class FakeMetrics:
  def __init__(self):
    self.counter = fake_counter

class FakeMetricsModule:
  def __init__(self):
    self.Metrics = FakeMetrics()

beam.metrics = FakeMetricsModule()
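As a quick sanity check that a stub like this behaves as intended, here is a self-contained version that uses `types.SimpleNamespace` in place of the real `apache_beam` module (an assumption made so the sketch runs even where Beam is not installed); counter calls simply become no-ops:

```python
import types

# Stand-in for the `apache_beam` module so this sketch runs without Beam.
beam = types.SimpleNamespace()

def fake_counter(namespace, message):
    class FakeCounter:
        def inc(self, increment=None, other=None):
            pass  # swallow the count instead of reporting it
    return FakeCounter()

class FakeMetrics:
    def __init__(self):
        self.counter = fake_counter

class FakeMetricsModule:
    def __init__(self):
        self.Metrics = FakeMetrics()

beam.metrics = FakeMetricsModule()

# Library-style calls against beam.metrics now no-op safely.
counter = beam.metrics.Metrics.counter("tapas", "converted_examples")
counter.inc()
counter.inc(5)
```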
Looks like the apache_beam thing can also be fixed by restarting the runtime. See #89 for details.
Thank you so much!!! It seems to work. At least I don't get an error anymore and it does predict. Unfortunately the answers to the questions above are mainly incorrect now, but I'll see if I can work with that. :)
Great that it's working for you now. I am closing this issue; feel free to open a new issue for any model quality problems and we can see if there is something we can do about it.
Hello,
I am new to this topic and I'm currently trying to use the pruning/filtering method for long tables in the WTQ notebook.
I tried using the flag --prune_columns in the prediction function, but it still gives me "Can't convert interaction: error: Sequence too long".
What are the necessary steps to filter/prune long tables during prediction?
Thank you in advance.