This repository has been archived by the owner on Mar 1, 2024. It is now read-only.

how to generate embeddings for all entities after we have the model? #21

Open

XRodriguez10 opened this issue Jun 24, 2020 · 7 comments

@XRodriguez10

I'm trying to train a biencoder model to support Chinese. After training the biencoder, how can I generate embeddings for all entities, like the provided file models/all_entities_large.t7?

@XRodriguez10 (Author)

Also, are you going to release instructions on how to train the models? Thanks!

@rajatinteros commented Jul 9, 2020

Hello all, I would also like some tips on training this architecture from scratch, and on how to use it as a pre-trained network on a custom dataset.

@anjalibhavan

Yes, same issue here! I would like to know how to use this for a custom dataset, and how to generate embeddings from the linked documents.

@izuna385 commented Jul 31, 2020

Hello, I just re-implemented hard-negative mining and scripts for encoding entities with the zeshel dataset from [Logeswaran et al., '19].
See here for your information. This repository might also be useful for re-implementing the encoding of all entities; a sketch of the core mining idea follows below.
Thanks.
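
For anyone who wants the general idea without digging through those repositories: hard-negative mining scores every entity with the current biencoder and keeps the top-scoring non-gold entities as extra negatives for the next training round. A minimal PyTorch sketch of that idea (the tensor names and the dot-product scorer are my assumptions, not the actual code from either repository):

```python
import torch

def mine_hard_negatives(mention_vecs, entity_vecs, gold_ids, k=10):
    """For each mention, return the top-k highest-scoring entities
    that are NOT the gold entity (the hard negatives).

    mention_vecs: (num_mentions, dim) tensor from the context encoder
    entity_vecs:  (num_entities, dim) tensor from the entity encoder
    gold_ids:     (num_mentions,) tensor of gold entity indices
    """
    # Dot-product scores between every mention and every entity.
    scores = mention_vecs @ entity_vecs.t()  # (num_mentions, num_entities)
    # Mask out the gold entity so it cannot be selected as a negative.
    scores[torch.arange(len(gold_ids)), gold_ids] = float("-inf")
    # The highest-scoring wrong entities are the hard negatives.
    return scores.topk(k, dim=1).indices     # (num_mentions, k)
```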

@abhinavkulkarni

Hi all,

You can refer to my comment #106 (comment) on generating embeddings for new candidates with an existing model; a rough sketch of the idea is below.
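
In outline, producing a file like all_entities_large.t7 amounts to running every entity (title plus description) through the trained candidate encoder and saving the stacked vectors with torch.save. Here is a minimal sketch using plain transformers/PyTorch, assuming a BERT-style candidate tower that uses the [CLS] vector as the embedding; the checkpoint path and helper name are illustrative, and BLINK's actual input formatting (e.g. its special entity marker token) may differ:

```python
import torch
from transformers import BertModel, BertTokenizer

# Assumption: the candidate (entity) tower of the trained biencoder is a
# BERT-style model; replace the path with your own checkpoint.
tokenizer = BertTokenizer.from_pretrained("bert-large-uncased")
encoder = BertModel.from_pretrained("path/to/candidate_encoder").eval()

@torch.no_grad()
def encode_all_entities(entities, batch_size=32, max_len=128):
    """entities: list of (title, description) pairs; returns an (N, dim) tensor."""
    vecs = []
    for i in range(0, len(entities), batch_size):
        batch = entities[i:i + batch_size]
        # Encode title and description as a standard sentence pair.
        inputs = tokenizer([t for t, _ in batch], [d for _, d in batch],
                           padding=True, truncation=True,
                           max_length=max_len, return_tensors="pt")
        out = encoder(**inputs)
        # Take the [CLS] vector as the entity embedding.
        vecs.append(out.last_hidden_state[:, 0, :])
    return torch.cat(vecs, dim=0)

# all_vecs = encode_all_entities(my_entities)
# torch.save(all_vecs, "all_entities.t7")  # loadable later with torch.load
```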

@JLUGQQ commented May 2, 2022

> I'm trying to train a biencoder model to support Chinese. After training the biencoder, how can I generate embeddings for all entities, like the provided file models/all_entities_large.t7?

I wonder whether you have trained this model on a Chinese dataset. If so, could you share your Chinese training dataset with me? I also want to use this model for Chinese, but I lack a Chinese dataset. Thank you very much!

@abhinavkulkarni

With regards to training a new model with custom data: yes, it is indeed possible. I would recommend first training a zero-shot entity linking (zeshel) model just to get the hang of the training process. The scripts to download and pre-process the zeshel data are in the repository. You can then replicate the same steps: bring your data into the same format as zeshel (sketched below), modify any hyperparameters (such as context length or the choice of BERT base model), and train your own model.
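
For reference, a processed zeshel-style training file is one JSON record per line. The sketch below shows what one record looks like; the values are invented and the exact field set may differ between versions of the pre-processing scripts, so verify against your own processed train.jsonl:

```python
import json

# One mention record in the zeshel-style jsonl format (invented values;
# check the field names against your own pre-processed train.jsonl).
example = {
    "context_left": "She starred in the animated film",
    "mention": "The Lion King",
    "context_right": ", which became a box-office success.",
    "label": "The Lion King is a 1994 American animated musical film ...",
    "label_title": "The Lion King",
    "label_id": 42,
}

with open("train.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(example, ensure_ascii=False) + "\n")
```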
