
Loading mgenre models is taking 44GB RAM #102

Open
banyous opened this issue Sep 25, 2023 · 0 comments
banyous commented Sep 25, 2023

When I run this test code, my machine's memory usage shoots up to 44 GB, even though the model files are only about 7 GB on disk. I know that .pkl files can take up much more space in memory than they do on disk. What I'm wondering is whether there's a way to shrink these models' memory footprint when loading them, so I can run the code on machines with less RAM?
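As an aside, the on-disk vs. in-memory gap is easy to reproduce with a small stdlib-only sketch (synthetic data, nothing GENRE-specific): pickle stores strings compactly and memoizes repeated objects, while the live dict pays per-object overhead for every tuple, string, and list.

```python
import pickle
import tracemalloc

# Synthetic stand-in for lang_title2wikidataID: (lang, title) -> [IDs].
# The sizes are illustrative only.
tracemalloc.start()
d = {("en", f"Title {i}"): [f"Q{i}"] for i in range(100_000)}
in_memory, _peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

on_disk = len(pickle.dumps(d, protocol=pickle.HIGHEST_PROTOCOL))
print(f"pickled: {on_disk / 1e6:.1f} MB, in memory: {in_memory / 1e6:.1f} MB")
# The in-memory footprint is several times the pickled size.
```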

```python
import pickle

from genre.fairseq_model import mGENRE
from genre.trie import MarisaTrie, Trie

# mapping from (language, title) pairs to Wikidata IDs
with open("../lang_title2wikidataID-normalized_with_redirect.pkl", "rb") as f:
    lang_title2wikidataID = pickle.load(f)

# memory-efficient prefix tree (trie) implemented with `marisa_trie`
with open("../titles_lang_all105_marisa_trie_with_redirect.pkl", "rb") as f:
    trie = pickle.load(f)

# generate Wikipedia titles and language IDs
model = mGENRE.from_pretrained(
    "../fairseq_multilingual_entity_disambiguation.tar.gz"
).eval()

model.sample(
    sentences=["[START] Einstein [END] era un fisico tedesco."],
    # Italian for "[START] Einstein [END] was a German physicist."
    prefix_allowed_tokens_fn=lambda batch_id, sent: [
        e for e in trie.get(sent.tolist()) if e < len(model.task.target_dictionary)
    ],
    # map a "title >> lang" string to the highest-numbered Wikidata ID
    text_to_id=lambda x: max(
        lang_title2wikidataID[tuple(reversed(x.split(" >> ")))],
        key=lambda y: int(y[1:]),
    ),
    marginalize=True,
)
```
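One option worth trying for the big lookup table: instead of unpickling the whole `lang_title2wikidataID` dict into RAM, convert it once to an on-disk key-value store and look entries up lazily. Below is a minimal sketch using the stdlib `shelve` module; the `"lang||title"` key encoding and function names are my own illustration, not part of GENRE, and I haven't measured it against the full 105-language mapping. (If the fairseq checkpoint itself dominates, casting it to FP16 with `model.half()` on supported hardware is another common reduction, also untested here.)

```python
import os
import shelve
import tempfile

# Hypothetical helpers: trade RAM for per-lookup disk reads.

def build_db(mapping, db_path):
    """One-off conversion: write each (lang, title) -> IDs pair to disk."""
    with shelve.open(db_path) as db:
        for (lang, title), ids in mapping.items():
            db[f"{lang}||{title}"] = ids  # shelve keys must be str


def lookup(db_path, lang, title):
    """Load only the requested value from disk, not the whole table."""
    with shelve.open(db_path, flag="r") as db:
        return db[f"{lang}||{title}"]


if __name__ == "__main__":
    tiny = {("it", "Einstein"): ["Q937"]}  # toy stand-in for the real dict
    path = os.path.join(tempfile.mkdtemp(), "lang_title2wikidataID.db")
    build_db(tiny, path)
    print(lookup(path, "it", "Einstein"))  # ['Q937']
```

The `text_to_id` lambda in the repro above could then call such a `lookup` instead of indexing the in-memory dict, at the cost of slower per-candidate lookups.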