NameError: name 'batched_hypos' is not defined (mGENRE) #95

mrpeerat · 2023-02-03T09:17:22Z

Hi!

I ran the mGENRE example from the README:

import pickle

from genre.fairseq_model import mGENRE
from genre.trie import MarisaTrie, Trie

with open("../data/lang_title2wikidataID-normalized_with_redirect.pkl", "rb") as f:
    lang_title2wikidataID = pickle.load(f)

# memory efficient prefix tree (trie) implemented with `marisa_trie`
with open("../data/titles_lang_all105_marisa_trie_with_redirect.pkl", "rb") as f:
    trie = pickle.load(f)

# generate Wikipedia titles and language IDs
model = mGENRE.from_pretrained("../models/fairseq_multilingual_entity_disambiguation").eval()

model.sample(
    sentences=["[START] Einstein [END] era un fisico tedesco."],
    # Italian for "[START] Einstein [END] was a German physicist."
    prefix_allowed_tokens_fn=lambda batch_id, sent: [
        e for e in trie.get(sent.tolist()) if e < len(model.task.target_dictionary)
    ],
    text_to_id=lambda x: max(lang_title2wikidataID[
        tuple(reversed(x.split(" >> ")))
    ], key=lambda y: int(y[1:])),
    marginalize=True,
)

It fails with: NameError: name 'batched_hypos' is not defined

Thank you.

nicola-decao · 2023-02-03T09:24:25Z

Can you post the full error stack?

mrpeerat · 2023-02-03T12:03:03Z

Sure.

2023-02-03 09:13:50 | INFO | fairseq.tasks.fairseq_task | can_reuse_epoch_itr = False
2023-02-03 09:13:50 | INFO | fairseq.tasks.fairseq_task | reuse_dataloader = True
2023-02-03 09:13:50 | INFO | fairseq.tasks.fairseq_task | rebuild_batches = False
2023-02-03 09:13:50 | INFO | fairseq.tasks.fairseq_task | creating new batches for epoch 1
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In [137], line 1
----> 1 model.sample(
      2     sentences=["[START] Einstein [END] era un fisico tedesco."],
      3     # Italian for "[START] Einstein [END] was a German physicist."
      4     prefix_allowed_tokens_fn=lambda batch_id, sent: [
      5         e for e in trie.get(sent.tolist()) if e < len(model.task.target_dictionary)
      6     ],
      7     text_to_id=lambda x: max(lang_title2wikidataID[
      8         tuple(reversed(x.split(" >> ")))
      9     ], key=lambda y: int(y[1:])),
     10     marginalize=True,
     11 )

File ~/GENRE/genre/fairseq_model.py:53, in _GENREHubInterface.sample(self, sentences, beam, verbose, text_to_id, marginalize, marginalize_lenpen, max_len_a, max_len_b, **kwargs)
     36 batched_hypos = self.generate(
     37     tokenized_sentences,
     38     beam,
   (...)
     42     **kwargs,
     43 )
     45 outputs = [
     46     [
     47         {"text": self.decode(hypo["tokens"]), "score": hypo["score"]}
   (...)
     50     for hypos in batched_hypos
     51 ]
---> 53 outputs = post_process_wikidata(
     54     outputs, text_to_id=text_to_id, marginalize=marginalize
     55 )
     57 return outputs

File ~/GENRE/genre/utils.py:492, in post_process_wikidata(outputs, text_to_id, marginalize)
    486 outputs = [
    487     [{**hypo, "id": text_to_id(hypo["text"])} for hypo in hypos]
    488     for hypos in outputs
    489 ]
    491 if marginalize:
--> 492     for (i, hypos), hypos_tok in zip(enumerate(outputs), batched_hypos):
    493         outputs_dict = defaultdict(list)
    494         for hypo, hypo_tok in zip(hypos, hypos_tok):

NameError: name 'batched_hypos' is not defined
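
Reading the traceback: batched_hypos is assigned as a local variable inside sample (fairseq_model.py, line 36), but post_process_wikidata (utils.py, line 492) reads it without receiving it as a parameter, so the lookup fails at call time. Below is a minimal sketch of the same pattern. The functions caller, helper_broken and helper_fixed are hypothetical toy names for illustration, not GENRE code.

```python
def helper_broken(outputs):
    # `batched` is not a parameter, a local, or a module-level name here,
    # so looking it up raises NameError when the function runs.
    return list(zip(outputs, batched))


def helper_fixed(outputs, batched=None):
    # The caller's local value now arrives explicitly as an argument.
    return list(zip(outputs, batched))


def caller(helper):
    batched = ["tok_a", "tok_b"]  # local to caller; invisible to the helpers
    outputs = ["text_a", "text_b"]
    if helper is helper_fixed:
        return helper(outputs, batched=batched)
    return helper(outputs)


try:
    caller(helper_broken)
    error_message = None
except NameError as exc:
    error_message = str(exc)

fixed_result = caller(helper_fixed)
```

A local variable in the calling function is never visible inside the function it calls, which is why the value has to be passed explicitly.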

wanyanbin1998y · 2023-03-21T12:54:20Z

Has the problem been solved? How did you solve it?

highly0 · 2023-04-25T09:19:34Z

Same issue here. Any update?

EmanuelaBoros · 2023-08-05T09:38:33Z

The solution is to modify post_process_wikidata (in genre/utils.py) so that it receives batched_hypos as an argument:

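# these imports are needed by the function; they are likely already
# present at the top of genre/utils.py
from collections import defaultdict

import torch


def post_process_wikidata(outputs, text_to_id=False, marginalize=False,
                          batched_hypos=None, marginalize_lenpen=0.5):

    if text_to_id:
        # attach an "id" to every hypothesis by mapping its decoded text
        outputs = [
            [{**hypo, "id": text_to_id(hypo["text"])} for hypo in hypos]
            for hypos in outputs
        ]

        if marginalize:
            # batched_hypos carries the raw hypotheses (with "tokens"),
            # now passed in by the caller instead of read from an undefined name
            for (i, hypos), hypos_tok in zip(enumerate(outputs), batched_hypos):
                outputs_dict = defaultdict(list)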
                for hypo, hypo_tok in zip(hypos, hypos_tok):
                    outputs_dict[hypo["id"]].append(
                        {**hypo, "len": len(hypo_tok["tokens"])}
                    )

                outputs[i] = sorted(
                    [
                        {
                            "id": _id,
                            "texts": [hypo["text"] for hypo in hypos],
                            "scores": torch.stack([hypo["score"] for hypo in hypos]),
                            "score": torch.stack(
                                [
                                    hypo["score"]
                                    * hypo["len"]
                                    / (hypo["len"] ** marginalize_lenpen)
                                    for hypo in hypos
                                ]
                            ).logsumexp(-1),
                        }
                        for _id, hypos in outputs_dict.items()
                    ],
                    key=lambda x: x["score"],
                    reverse=True,
                )

    return outputs

Then pass batched_hypos through when calling it from _GENREHubInterface.sample (in genre/fairseq_model.py):

outputs = post_process_wikidata(
    outputs,
    text_to_id=text_to_id,
    marginalize=marginalize,
    batched_hypos=batched_hypos,
    marginalize_lenpen=marginalize_lenpen,
)
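
To see what the marginalize branch computes once batched_hypos is available, here is a rough pure-Python sketch of the same arithmetic. It uses plain floats instead of torch tensors, so it runs without the model. The helper names (logsumexp, marginalize_scores) and the toy hypotheses are made up for illustration, and treating "score" as a log-probability-like value is an assumption.

```python
import math
from collections import defaultdict


def logsumexp(values):
    # numerically stable log(sum(exp(v))) over a list of floats
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def marginalize_scores(hypos, lenpen=0.5):
    # group hypotheses that map to the same id
    grouped = defaultdict(list)
    for hypo in hypos:
        grouped[hypo["id"]].append(hypo)

    merged = [
        {
            "id": _id,
            "texts": [h["text"] for h in group],
            # same expression as in the fix: score * len / len ** lenpen,
            # combined across the group with logsumexp
            "score": logsumexp(
                [h["score"] * h["len"] / (h["len"] ** lenpen) for h in group]
            ),
        }
        for _id, group in grouped.items()
    ]
    return sorted(merged, key=lambda x: x["score"], reverse=True)


# toy data (hypothetical ids, texts, scores and token lengths)
hypos = [
    {"id": "Q1", "text": "A >> en", "score": -0.5, "len": 4},
    {"id": "Q1", "text": "A >> it", "score": -1.0, "len": 4},
    {"id": "Q2", "text": "B >> en", "score": -0.6, "len": 9},
]
ranked = marginalize_scores(hypos)
```

In this toy run the two hypotheses sharing id "Q1" are merged into one entry, which is why the per-hypothesis token lengths from batched_hypos are needed in the real function.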

mayank-soni pushed a commit to mayank-soni/GENRE that referenced this issue Sep 26, 2023

Edit as per issue facebookresearch#95

44d1fad
