We generated two word2vec embeddings: ‘Embedding_v1’, in which synonymous terms for genes, diseases, and drugs were replaced by their preferred terms from external biomedical databases, and ‘Embedding_v2’, which used the same preprocessing and training process but without synonym replacement.
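The synonym substitution behind Embedding_v1 can be pictured with a minimal sketch. The mapping PREFERRED_TERMS, its entries, and the function normalize_tokens are hypothetical illustrations, not the lookup tables or code used to build the embeddings. The sketch also assumes single-token synonyms, whereas real biomedical synonyms are often multi-word.

```python
# Hypothetical synonym -> preferred-term mapping (illustrative entries only;
# the real mappings come from external biomedical databases).
PREFERRED_TERMS = {
    "acetylsalicylic_acid": "aspirin",
    "asa": "aspirin",
    "tumour": "tumor",
}


def normalize_tokens(tokens):
    """Replace each token that has a known preferred term; keep others as-is."""
    return [PREFERRED_TERMS.get(tok.lower(), tok) for tok in tokens]


# Toy input sentence, already tokenized on whitespace.
sentence = "patients received ASA daily".split()
normalized = normalize_tokens(sentence)
print(normalized)  # ['patients', 'received', 'aspirin', 'daily']
```

Under this assumption, a query against Embedding_v1 would be normalized the same way before lookup, while Embedding_v2 would be queried with the surface form as it appears in text.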
Both embeddings are available for download at the following links:
Embedding_v1
Embedding_v2
The sample code below loads the model, retrieves the top n most similar words, checks whether a word is in the model vocabulary, and gets a word vector.
Install gensim (e.g., pip install gensim) to load the word2vec model.
from gensim.models import KeyedVectors
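# Load the pretrained vectors stored in word2vec binary format.
model = KeyedVectors.load_word2vec_format('embedding.bin', binary=True)

# Example inputs (placeholders): a query term and the number of neighbors.
word, n = 'aspirin', 10

# Check whether the word is in the model vocabulary.
# `word in model` works in gensim 3.x and 4.x (model.wv.vocab was removed in 4.0).
if word in model:
    print('True')
    similar_words = model.most_similar(word, topn=n)  # top n similar words
    vector = model[word]  # vector for the word
else:
    print('False')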
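For intuition about what "top n similar words" means, gensim's most_similar ranks candidates by cosine similarity to the query vector. The sketch below reproduces that ranking in plain Python on a toy vocabulary. The vectors, the dictionary toy_vectors, and the helper names cosine and top_n_similar are made up for illustration; this is not gensim's implementation and the numbers are not taken from either embedding.

```python
import math

# Toy 3-dimensional vectors, invented for illustration only.
toy_vectors = {
    "aspirin": [0.9, 0.1, 0.0],
    "ibuprofen": [0.8, 0.2, 0.1],
    "insulin": [0.1, 0.9, 0.2],
    "diabetes": [0.0, 0.8, 0.3],
}


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def top_n_similar(word, vectors, n):
    """Return the n (word, similarity) pairs closest to `word`, best first."""
    if word not in vectors:
        return []  # out-of-vocabulary query: nothing to rank
    query = vectors[word]
    scored = [
        (other, cosine(query, vec))
        for other, vec in vectors.items()
        if other != word  # exclude the query word itself
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]


neighbors = top_n_similar("aspirin", toy_vectors, 2)
print([name for name, _ in neighbors])  # ['ibuprofen', 'insulin']
```

Returning an empty list for an out-of-vocabulary word is a design choice of this sketch; gensim raises a KeyError instead, which is why checking vocabulary membership first is useful.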