Poor performance of theorem predictor #10

ICanFlyGFC · 2022-09-19T09:01:51Z

Hello, Pan. Thank you for open-sourcing your code.

I downloaded the checkpoint from https://acl2021-intergps.s3.us-west-1.amazonaws.com/tp_model_best.pt,
but the evaluation results are empty. How can I get it to work normally? Thanks.

lupantech · 2022-09-19T15:45:05Z

Hi, thank you for your interest in our work!

This evaluation result is not normal. Would you mind sharing the script you were running and the log it printed? That would help me narrow down the cause.

Thanks!

Best,
Pan

ICanFlyGFC · 2022-09-20T00:35:13Z

Thank you for your reply!

The script is the same as yours; I only changed the output file name.

#!/usr/bin/env python
# coding: utf-8

import json
import ast
from tqdm import tqdm

import torch
from transformers import BartForConditionalGeneration, BartTokenizerFast

def evaluate(diagram_logic_file, text_logic_file, tokenizer_name, model_name, check_point, seq_num):

    test_lst = range(2401, 3002)

    ## read logic form files
    with open(diagram_logic_file) as f:
        diagram_logic_forms = json.load(f)
    with open(text_logic_file) as f:
        text_logic_forms = json.load(f)

    combined_logic_forms = {}
    for pid in test_lst:
        combined_logic_forms[pid] = diagram_logic_forms[str(pid)]['diagram_logic_forms'] + \
                                    text_logic_forms[str(pid)]['text_logic_forms']

    ## build tokenizer and model
    tokenizer = BartTokenizerFast.from_pretrained(tokenizer_name) # 'facebook/bart-base'
    model = BartForConditionalGeneration.from_pretrained(model_name).to(device) # 'facebook/bart-base'
    model.load_state_dict(torch.load(check_point))

    final = dict()
    for pid in tqdm(test_lst):
        input = str(combined_logic_forms[pid])
        tmp = tokenizer.encode(input)
        if len(tmp) > 1024:
            tmp = tmp[:1024]
        input = torch.LongTensor(tmp).unsqueeze(0).to(device)

        output = model.generate(input, bos_token_id=0, eos_token_id=2,
                             max_length=20, num_beams=10, num_return_sequences=seq_num)
        # print(out.size())

        ## refine output sequence
        seq = []
        for j in range(seq_num):
            res = tokenizer.decode(output[j].tolist())
            res = res.replace("</s>", "").replace("<s>", "").replace("<pad>", "")
            # print(res)
            try:
                res = ast.literal_eval(res) # string class to list class
            except Exception as e:
                res = []
            seq.append(res)

        final[str(pid)] = {"id": str(pid), "num_seqs": seq_num, "seq": seq}

    return final

if __name__ == '__main__':

    diagram_logic_file = '../data/geometry3k/logic_forms/diagram_logic_forms_annot.json'
    text_logic_file = '../data/geometry3k/logic_forms/text_logic_forms_annot_dissolved.json'

    check_point = 'models/tp_model_best.pt'
    output_file = 'results/test/pred_seqs_test_debugging.json'

    tokenizer_name = 'facebook/bart-base'
    model_name = 'facebook/bart-base'

    SEQ_NUM = 5

    device = torch.device('cuda:0')

    result = evaluate(diagram_logic_file, text_logic_file, tokenizer_name, model_name, check_point, SEQ_NUM)

    with open(output_file, 'w') as f:
        json.dump(result, f)

The log:

D:\Anaconda\envs\intergps\python.exe D:/WorkSpace/InterGPS-main/theorem_predict/eval_transformer.py
0%| | 0/601 [00:00<?, ?it/s]D:\Anaconda\envs\intergps\lib\site-packages\transformers\generation_utils.py:1839: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
next_indices = next_tokens // vocab_size
22%|██▏ | 135/601 [00:23<01:25, 5.43it/s]Token indices sequence length is longer than the specified maximum sequence length for this model (1569 > 1024). Running this sequence through the model will result in indexing errors
100%|██████████| 601/601 [01:42<00:00, 5.88it/s]

Process finished with exit code 0
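
A side note on the tokenizer warning in the log (1569 > 1024): the script clips long inputs with `tmp[:1024]`, which also drops the closing `</s>` token. Whether that has anything to do with the empty results is not established in this thread. The sketch below is only an illustration of a clip that keeps the end token; the helper name `truncate_keep_eos` is hypothetical, and the ids 0 and 2 are taken from the `bos_token_id` and `eos_token_id` arguments in the script.

```python
def truncate_keep_eos(token_ids, max_len=1024, eos_id=2):
    """Clip token_ids to max_len and end the clipped sequence with eos_id."""
    if len(token_ids) <= max_len:
        return list(token_ids)
    # Keep the first max_len - 1 ids and re-append the end-of-sequence id.
    return list(token_ids[:max_len - 1]) + [eos_id]


# Toy input with the same length as the sequence reported in the warning.
ids = [0] + [100] * 1567 + [2]
clipped = truncate_keep_eos(ids)
print(len(clipped), clipped[0], clipped[-1])  # 1024 0 2
```

If the checkpoint was trained with the plain `tmp[:1024]` clipping, keeping the same clipping at evaluation time may be the safer choice.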

Thanks!
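
One thing that may help with debugging: in the script above, any decoded sequence that `ast.literal_eval` cannot parse is silently replaced by `[]`, so an empty result file does not show what the model actually generated. Below is a minimal sketch of a parsing helper that reports the failure instead of hiding it. The name `parse_sequence` is hypothetical, and this assumes the decoded text is expected to be a Python list literal, as the script's comment suggests.

```python
import ast

SPECIAL_TOKENS = ("</s>", "<s>", "<pad>")


def parse_sequence(decoded):
    """Return (sequence, error); error is None when parsing succeeds."""
    text = decoded
    for token in SPECIAL_TOKENS:
        text = text.replace(token, "")
    text = text.strip()
    try:
        value = ast.literal_eval(text)
    except (ValueError, SyntaxError) as exc:
        # Keep the raw text so the failure can be inspected later.
        return [], f"{type(exc).__name__}: {text!r}"
    if not isinstance(value, list):
        return [], f"not a list: {text!r}"
    return value, None


print(parse_sequence("</s><s>[1, 2, 3]</s><pad>"))  # ([1, 2, 3], None)
print(parse_sequence("</s><s>[1, 2</s>"))           # empty list plus an error message
```

Printing the error for the first few problems should show whether the model output is unparseable text or whether the problem lies elsewhere.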

lupantech · 2022-09-20T00:52:16Z

Hi,

Below is my script:

cd symbolic_solver
python test.py --label final --strategy final

And the running log is here: https://github.com/lupantech/InterGPS/blob/main/symbolic_solver/logs/final/log-1612098244-predict_low-first_1.log.

The executed result is here: https://github.com/lupantech/InterGPS/blob/main/symbolic_solver/pred_results/final/logic_1612098244-predict_low-first_1.json.

ICanFlyGFC · 2022-09-20T01:10:37Z

Thank you Pan!

I can run your script and get the corresponding results, but I am focused on the theorem predictor.
I wonder how to generate ../theorem_predict/results/pred_seq_result_bart_epoch19_seq5.json.
I also found that many geometry problems can be solved by rules based on the formal language, without theorems.
Is it fair to say that theorem prediction is not that important in this paper?

Thanks!

Best,
Fucheng

lupantech · 2022-09-20T01:19:37Z

Hi Fucheng,

For the theorem predictor, you can follow the instructions at https://github.com/lupantech/InterGPS#theorem-predictor.

For the second question, yes. As we discussed in the paper, one of the main functions of the theorem predictor is to improve the search efficiency and thus improve the final accuracy, which is verified in Table 7 and Figure 5.

Best,
Pan

ICanFlyGFC · 2022-09-20T01:33:47Z

Thanks, Pan!

I followed the instructions at https://github.com/lupantech/InterGPS#theorem-predictor.
I downloaded the pre-trained model in step 4, but the evaluation results in step 5 are empty.

Thanks!

Best,
Fucheng
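
To make "empty" concrete, it can help to count how many problems have no usable prediction at all. The sketch below assumes the output format written by the evaluation script earlier in this thread (`{"id": ..., "num_seqs": ..., "seq": [...]}` per problem id); the function name `summarize_predictions` and the toy values are hypothetical.

```python
import json


def summarize_predictions(results):
    """Count problems and those whose predicted sequences are all empty."""
    all_empty = [pid for pid, entry in results.items()
                 if not any(entry["seq"])]
    return {
        "total": len(results),
        "all_empty": len(all_empty),
        "examples": all_empty[:5],  # a few ids to inspect by hand
    }


# Toy data in the script's output format (values are made up).
toy = {
    "2401": {"id": "2401", "num_seqs": 2, "seq": [[], []]},
    "2402": {"id": "2402", "num_seqs": 2, "seq": [[1, 2], []]},
}
print(json.dumps(summarize_predictions(toy)))

# To inspect a real run, load the file the script wrote, for example:
# with open("results/test/pred_seqs_test_debugging.json") as f:
#     print(json.dumps(summarize_predictions(json.load(f))))
```

If every problem lands in `all_empty`, the failure is systematic (for example in loading or decoding) rather than specific to a few long inputs.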

lupantech · 2022-09-20T01:41:19Z

Hi Fucheng,

I see. Would you mind if I look into your issue in a few days? I am working on some urgent deadlines and need more time to figure your problem out. For now, I think it is fine to skip the theorem predictor if you just want to reproduce our results.

I appreciate your understanding!

Best,
Pan

ICanFlyGFC · 2022-09-20T01:45:45Z

Thanks, Pan.

Sure.
Thank you for your work; I look forward to your new achievements. Your paper and code have inspired me a lot.

Best,
Fucheng

lupantech · 2022-09-20T01:47:17Z

Hi Fucheng,

Thanks! I am happy to help with your project as well!

Yours sincerely,
Pan
