This repository has been archived by the owner on Mar 1, 2024. It is now read-only.

Generating .t7 file for inferencing #65

Open
saswatidana opened this issue Jan 19, 2021 · 5 comments

Comments

@saswatidana

Hello,
I am trying to generate a .t7 file for a trained model. To do that, I am running scripts/generate_candidates.py. This Python script needs another input file, saved_candidates_ids. How do I create this saved_candidates_ids file?
Any pointers would help me run the inference code.

@ledw
Contributor

ledw commented Jan 27, 2021

@saswatidana Thanks for reporting this. We'll add instructions on generating candidates shortly.

@JinfengXiao

@ledw I'm also curious about how to generate the .t7 file for a trained model. What is the format of the saved_candidates_ids file required by scripts/generate_candidates.py? Could you give more detailed instructions on this?

@abhinavkulkarni

@ledw: Any updates on this?

I was also looking at the scripts/generate_candidates.py script, and it expects another pre-generated input file, saved_candidates_ids. Digging further into the code reveals that this is a torch tensor of the candidates' token_idxs.

Can you please let us know how to generate this file?

This is needed so that we can introduce new candidates from newer versions of Wikipedia.

Thanks!

@ledw-2

ledw-2 commented Jan 14, 2022

Sorry for not following up on this.
The token_idxs are generated with the BERT tokenizer. The tensor has shape batch x vec, where vec is the BERT token ID vector of each input.
It is the input "ids" returned by this function:
https://github.com/facebookresearch/BLINK/blob/main/blink/biencoder/data_process.py#L96
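Based on the description above, here is a minimal sketch of how such a batch x vec matrix of token IDs could be built. It imitates what the linked function appears to do (tokenize the description, optionally prepend the title plus a separator tag, truncate, add [CLS]/[SEP], convert to IDs, zero-pad to a fixed length), but it is not BLINK's code. `ToyTokenizer`, `candidate_ids`, `build_candidate_matrix`, and the `[unused2]` title tag are hypothetical stand-ins chosen for illustration; the toy whitespace tokenizer only mimics the interface of a real BERT tokenizer (`cls_token`, `sep_token`, `tokenize`, `convert_tokens_to_ids`).

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Assumption: a reserved BERT token separates title and description.
# "[unused2]" is used here for illustration only.
ENT_TITLE_TAG = "[unused2]"


class ToyTokenizer:
    """Hypothetical stand-in for a BERT tokenizer (whitespace split, toy vocab)."""

    cls_token = "[CLS]"
    sep_token = "[SEP]"

    def __init__(self) -> None:
        # id 0 is reserved for padding, as [PAD] is in BERT's vocab.
        self.vocab: Dict[str, int] = {"[PAD]": 0, "[CLS]": 101, "[SEP]": 102, ENT_TITLE_TAG: 3}

    def tokenize(self, text: str) -> List[str]:
        return text.lower().split()

    def convert_tokens_to_ids(self, tokens: List[str]) -> List[int]:
        # Toy behaviour: unseen tokens get a fresh id on first use.
        ids = []
        for tok in tokens:
            if tok not in self.vocab:
                self.vocab[tok] = 1000 + len(self.vocab)
            ids.append(self.vocab[tok])
        return ids


def candidate_ids(desc: str, tokenizer, max_seq_length: int, title: Optional[str] = None) -> List[int]:
    """Hypothetical helper: one fixed-length token-id vector for one candidate."""
    tokens = tokenizer.tokenize(desc)
    if title is not None:
        tokens = tokenizer.tokenize(title) + [ENT_TITLE_TAG] + tokens
    tokens = tokens[: max_seq_length - 2]  # leave room for [CLS] and [SEP]
    tokens = [tokenizer.cls_token] + tokens + [tokenizer.sep_token]
    ids = tokenizer.convert_tokens_to_ids(tokens)
    ids += [0] * (max_seq_length - len(ids))  # right-pad with the [PAD] id
    return ids


def build_candidate_matrix(
    candidates: Sequence[Tuple[str, str]], tokenizer, max_seq_length: int
) -> List[List[int]]:
    """Hypothetical helper: batch x vec list, one row per (title, description)."""
    return [candidate_ids(desc, tokenizer, max_seq_length, title) for title, desc in candidates]


if __name__ == "__main__":
    tok = ToyTokenizer()
    cands = [("Paris", "capital of France"), ("Lyon", "city in France")]
    for row in build_candidate_matrix(cands, tok, max_seq_length=8):
        print(row)
```

With the real BERT tokenizer the biencoder was trained with, the resulting rows could be wrapped with `torch.tensor(...)` and written with `torch.save(...)`. This assumes generate_candidates.py reads the file back with `torch.load`, which the thread implies but does not spell out.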

@abhinavkulkarni

Thanks @ledw-2: I was able to follow your reply and generate embeddings for candidate entities: #106 (comment)
