answer contains <pad> leads to error in the example #90

deangeckt · 2021-11-01T09:50:00Z

Running the example raises a ValueError:

nlp("42 is the answer to life, universe and everything.")

ValueError Traceback (most recent call last)
in
----> 1 nlp("42 is the answer to life, universe and everything.")

~\question_generation\pipelines.py in __call__(self, inputs)
58 qg_examples = self._prepare_inputs_for_qg_from_answers_prepend(inputs, answers)
59 else:
---> 60 qg_examples = self._prepare_inputs_for_qg_from_answers_hl(sents, answers)
61
62 qg_inputs = [example['source_text'] for example in qg_examples]

~\question_generation\pipelines.py in _prepare_inputs_for_qg_from_answers_hl(self, sents, answers)
140 answer_text = answer_text.strip()
141
--> 142 ans_start_idx = sent.index(answer_text)
143
144 sent = f"{sent[:ans_start_idx]} {answer_text} {sent[ans_start_idx + len(answer_text): ]}"

ValueError: substring not found

While debugging _extract_answers(), I saw "<pad>" in dec (that is, in the decoded answer):

dec = [self.ans_tokenizer.decode(ids, skip_special_tokens=False) for ids in outs]

Could this be fixed by setting skip_special_tokens to True?

deangeckt · 2021-11-01T10:09:52Z

Also, in _prepare_inputs_for_qg_from_answers_hl(), I'd add this check to avoid further exceptions:

if answer_text not in sent:
    continue
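
A minimal sketch of how that guard behaves, written outside the pipeline so it runs on its own. The function name `highlight_answers` and the `marker` parameter are illustrative assumptions, not the repository's code; only the `strip()`, `index()` and `not in` steps come from the traceback and the check above.

```python
def highlight_answers(sent, answers, marker="<hl>"):
    """Return one highlighted copy of `sent` per answer found in it.

    Hypothetical stand-in for the highlighting loop; `marker` is an
    assumed placeholder, not necessarily what the pipeline uses.
    """
    highlighted = []
    for answer_text in answers:
        answer_text = answer_text.strip()
        # Guard: skip answers that do not occur verbatim in the sentence,
        # instead of letting str.index raise "substring not found".
        if answer_text not in sent:
            continue
        start = sent.index(answer_text)
        end = start + len(answer_text)
        highlighted.append(
            f"{sent[:start]}{marker} {answer_text} {marker}{sent[end:]}"
        )
    return highlighted


sent = "42 is the answer to life, the universe and everything."
skipped = highlight_answers(sent, ["<pad> 42"])  # answer not in sentence
kept = highlight_answers(sent, ["42"])           # answer found at index 0
```

Note that the guard only avoids the exception: an answer that still carries a stray special token is dropped silently, so no question would be generated for it.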

cdhx · 2021-12-20T12:13:08Z

Same issue.
This is the intermediate result during execution:

input: 42 is the answer to life, the universe and everything.
sents, answers: ['42 is the answer to life, the universe and everything.'] [['<pad> 42']]
answer: [['<pad> 42']]
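
These values are enough to reproduce the failure without loading the model. The sketch below is only an illustration: `find_answer` and `strip_pad` are hypothetical helpers, and `strip_pad` merely imitates the effect of removing the pad token from the decoded text; it is not the tokenizer's own logic.

```python
sent = "42 is the answer to life, the universe and everything."
answer = "<pad> 42"  # decoded answer as reported above


def find_answer(sent, answer_text):
    """Return the start index of the answer in `sent`, or None if absent."""
    try:
        return sent.index(answer_text.strip())
    except ValueError:  # the "substring not found" error from the traceback
        return None


def strip_pad(answer_text, pad_token="<pad>"):
    # Hypothetical helper: removes the literal pad token text and
    # surrounding whitespace from the decoded answer.
    return answer_text.replace(pad_token, "").strip()


raw_idx = find_answer(sent, answer)               # lookup with "<pad> 42"
clean_idx = find_answer(sent, strip_pad(answer))  # lookup with "42"
```

With the pad token left in, the lookup fails because "<pad> 42" never appears in the sentence; once it is removed, "42" is found at the start of the sentence.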

YiLing28 · 2021-12-28T11:28:29Z

In pipelines.py, line 90, set skip_special_tokens=True:
dec = [self.ans_tokenizer.decode(ids, skip_special_tokens=True) for ids in outs]
This may solve the problem.
