NameError: name 'batched_hypos' is not defined (mGENRE) #95

Open
mrpeerat opened this issue Feb 3, 2023 · 5 comments

@mrpeerat

mrpeerat commented Feb 3, 2023

Hi!

I ran the mGENRE example from the README:

import pickle

from genre.fairseq_model import mGENRE
from genre.trie import MarisaTrie, Trie

with open("../data/lang_title2wikidataID-normalized_with_redirect.pkl", "rb") as f:
    lang_title2wikidataID = pickle.load(f)

# memory efficient prefix tree (trie) implemented with `marisa_trie`
with open("../data/titles_lang_all105_marisa_trie_with_redirect.pkl", "rb") as f:
    trie = pickle.load(f)

# generate Wikipedia titles and language IDs
model = mGENRE.from_pretrained("../models/fairseq_multilingual_entity_disambiguation").eval()

model.sample(
    sentences=["[START] Einstein [END] era un fisico tedesco."],
    # Italian for "[START] Einstein [END] was a German physicist."
    prefix_allowed_tokens_fn=lambda batch_id, sent: [
        e for e in trie.get(sent.tolist()) if e < len(model.task.target_dictionary)
    ],
    text_to_id=lambda x: max(lang_title2wikidataID[
        tuple(reversed(x.split(" >> ")))
    ], key=lambda y: int(y[1:])),
    marginalize=True,
)

And the error is NameError: name 'batched_hypos' is not defined

Thank you.

@nicola-decao
Contributor

Can you post the full error stack?

@mrpeerat
Author

mrpeerat commented Feb 3, 2023

Sure.

2023-02-03 09:13:50 | INFO | fairseq.tasks.fairseq_task | can_reuse_epoch_itr = False
2023-02-03 09:13:50 | INFO | fairseq.tasks.fairseq_task | reuse_dataloader = True
2023-02-03 09:13:50 | INFO | fairseq.tasks.fairseq_task | rebuild_batches = False
2023-02-03 09:13:50 | INFO | fairseq.tasks.fairseq_task | creating new batches for epoch 1
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In [137], line 1
----> 1 model.sample(
      2     sentences=["[START] Einstein [END] era un fisico tedesco."],
      3     # Italian for "[START] Einstein [END] was a German physicist."
      4     prefix_allowed_tokens_fn=lambda batch_id, sent: [
      5         e for e in trie.get(sent.tolist()) if e < len(model.task.target_dictionary)
      6     ],
      7     text_to_id=lambda x: max(lang_title2wikidataID[
      8         tuple(reversed(x.split(" >> ")))
      9     ], key=lambda y: int(y[1:])),
     10     marginalize=True,
     11 )

File ~/GENRE/genre/fairseq_model.py:53, in _GENREHubInterface.sample(self, sentences, beam, verbose, text_to_id, marginalize, marginalize_lenpen, max_len_a, max_len_b, **kwargs)
     36 batched_hypos = self.generate(
     37     tokenized_sentences,
     38     beam,
   (...)
     42     **kwargs,
     43 )
     45 outputs = [
     46     [
     47         {"text": self.decode(hypo["tokens"]), "score": hypo["score"]}
   (...)
     50     for hypos in batched_hypos
     51 ]
---> 53 outputs = post_process_wikidata(
     54     outputs, text_to_id=text_to_id, marginalize=marginalize
     55 )
     57 return outputs

File ~/GENRE/genre/utils.py:492, in post_process_wikidata(outputs, text_to_id, marginalize)
    486 outputs = [
    487     [{**hypo, "id": text_to_id(hypo["text"])} for hypo in hypos]
    488     for hypos in outputs
    489 ]
    491 if marginalize:
--> 492     for (i, hypos), hypos_tok in zip(enumerate(outputs), batched_hypos):
    493         outputs_dict = defaultdict(list)
    494         for hypo, hypo_tok in zip(hypos, hypos_tok):

NameError: name 'batched_hypos' is not defined
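
A note on the traceback above: `batched_hypos` is assigned inside `_GENREHubInterface.sample` (line 36 of fairseq_model.py), but `post_process_wikidata` in utils.py reads it without receiving it as a parameter. A local variable of the caller is not visible inside the callee, so the lookup fails as soon as the `if marginalize:` branch runs. The sketch below reproduces that in isolation; the function names and toy data are hypothetical stand-ins, not GENRE code.

```python
def post_process_broken(outputs, marginalize=False):
    # Mirrors the failing shape: `batched_hypos` is never passed in, so the
    # name lookup fails once the `marginalize` branch is taken.
    if marginalize:
        return list(zip(outputs, batched_hypos))  # raises NameError
    return outputs


def post_process_fixed(outputs, marginalize=False, batched_hypos=None):
    # Same logic, but the caller's data arrives as an explicit argument.
    if marginalize:
        return list(zip(outputs, batched_hypos))
    return outputs


def sample_broken():
    batched_hypos = [["tok_a"], ["tok_b"]]  # local to this function only
    outputs = ["text_a", "text_b"]
    return post_process_broken(outputs, marginalize=True)


def sample_fixed():
    batched_hypos = [["tok_a"], ["tok_b"]]
    outputs = ["text_a", "text_b"]
    return post_process_fixed(
        outputs, marginalize=True, batched_hypos=batched_hypos
    )


try:
    sample_broken()
    error_message = None
except NameError as exc:
    error_message = str(exc)

fixed_result = sample_fixed()
print(error_message)
print(fixed_result)
```

Since the failing line sits under `if marginalize:`, calling `model.sample(...)` with `marginalize=False` should avoid this particular error, at the cost of losing the marginalized scores. That is an inference from the traceback, not something tested here.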

@wanyanbin1998y

Has the problem been solved? How did you solve it?

@highly0

highly0 commented Apr 25, 2023

Same issues. Any update?

@EmanuelaBoros

The solution is to modify post_process_wikidata (in genre/utils.py) so that it receives batched_hypos and marginalize_lenpen as arguments:

def post_process_wikidata(outputs, text_to_id=False, marginalize=False,
                          batched_hypos=None, marginalize_lenpen=0.5):

    if text_to_id:
        outputs = [
            [{**hypo, "id": text_to_id(hypo["text"])} for hypo in hypos]
            for hypos in outputs
        ]

        if marginalize:
            for (i, hypos), hypos_tok in zip(enumerate(outputs), batched_hypos):
                outputs_dict = defaultdict(list)
                for hypo, hypo_tok in zip(hypos, hypos_tok):
                    outputs_dict[hypo["id"]].append(
                        {**hypo, "len": len(hypo_tok["tokens"])}
                    )

                outputs[i] = sorted(
                    [
                        {
                            "id": _id,
                            "texts": [hypo["text"] for hypo in hypos],
                            "scores": torch.stack([hypo["score"] for hypo in hypos]),
                            "score": torch.stack(
                                [
                                    hypo["score"]
                                    * hypo["len"]
                                    / (hypo["len"] ** marginalize_lenpen)
                                    for hypo in hypos
                                ]
                            ).logsumexp(-1),
                        }
                        for _id, hypos in outputs_dict.items()
                    ],
                    key=lambda x: x["score"],
                    reverse=True,
                )

    return outputs

Then pass the new arguments when calling it from _GENREHubInterface.sample (in genre/fairseq_model.py):

outputs = post_process_wikidata(
    outputs,
    text_to_id=text_to_id,
    marginalize=marginalize,
    batched_hypos=batched_hypos,
    marginalize_lenpen=marginalize_lenpen,
)
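
To make clearer what the marginalize branch computes once `batched_hypos` is available, here is a minimal pure-Python sketch of the same scoring rule: hypotheses that map to the same id are grouped, each log score is rescaled by `len / len ** marginalize_lenpen`, and the group is combined with logsumexp. It uses plain floats and `math` rather than torch tensors, and the helper names, ids, texts, and scores are all made-up placeholders.

```python
import math
from collections import defaultdict


def logsumexp(values):
    # Numerically stable log(sum(exp(v))) for a list of floats.
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def marginalize_scores(hypos, lenpen=0.5):
    """Group hypotheses by id and combine their length-rescaled log scores."""
    grouped = defaultdict(list)
    for hypo in hypos:
        grouped[hypo["id"]].append(hypo)
    merged = [
        {
            "id": _id,
            "texts": [h["text"] for h in group],
            "score": logsumexp(
                [h["score"] * h["len"] / (h["len"] ** lenpen) for h in group]
            ),
        }
        for _id, group in grouped.items()
    ]
    # Highest combined score first, as in the patched function.
    return sorted(merged, key=lambda x: x["score"], reverse=True)


# Placeholder data: "len" stands in for len(hypo_tok["tokens"]).
toy_hypos = [
    {"id": "Q_A", "text": "title one >> it", "score": -0.2, "len": 4},
    {"id": "Q_A", "text": "title one >> de", "score": -0.5, "len": 4},
    {"id": "Q_B", "text": "title two >> it", "score": -1.0, "len": 9},
]
ranked = marginalize_scores(toy_hypos)
for entry in ranked:
    print(entry["id"], entry["texts"], entry["score"])
```

The two `Q_A` hypotheses collapse into one entry whose score is higher than either alone, which is the point of marginalizing over several titles that resolve to the same id.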

mayank-soni pushed a commit to mayank-soni/GENRE that referenced this issue Sep 26, 2023