Word2vec-based-Networks

We generated two word2vec embeddings: ‘Embedding_v1’, in which synonymous terms for genes, diseases, and drugs were replaced by their preferred terms from external biomedical databases, and ‘Embedding_v2’, which was produced with the same preprocessing and training process but without synonym replacement.
Both embeddings are available for download under the following links:
Embedding_v1
Embedding_v2

Sample Code

The sample code below shows how to load the model, retrieve the top n most similar words, check whether a word is in the model vocabulary, and get a word's vector.

Read an embedding file

Prerequisites:

Install gensim to load the word2vec model.
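Assuming a standard Python environment, gensim can be installed from PyPI (the package name is the only thing required here; a virtual environment is optional but recommended):

```shell
pip install gensim
```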

from gensim.models import KeyedVectors

# Load the released embedding in binary word2vec format.
# This returns a KeyedVectors object, not a full Word2Vec model.
model = KeyedVectors.load_word2vec_format('embedding.bin', binary=True)

Get the top n similar words ranked by cosine similarity values

similar_words = model.most_similar(word, topn=n)
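Under the hood, `most_similar` ranks every other vocabulary entry by the cosine similarity between its vector and the query word's vector. A minimal pure-Python sketch of that ranking, using toy 3-dimensional vectors (the released embeddings are 300-dimensional; the words and values here are illustrative assumptions, not taken from the actual model):

```python
import math

# Toy vocabulary of low-dimensional word vectors (illustrative values only)
vocab = {
    "aspirin":   [1.0, 0.0, 0.0],
    "ibuprofen": [0.9, 0.1, 0.0],
    "diabetes":  [0.0, 1.0, 0.0],
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def most_similar(word, topn=2):
    """Rank all other vocabulary words by cosine similarity to `word`."""
    query = vocab[word]
    scores = [(w, cosine(query, v)) for w, v in vocab.items() if w != word]
    return sorted(scores, key=lambda s: s[1], reverse=True)[:topn]

# In this toy vocabulary, 'ibuprofen' is the nearest neighbour of 'aspirin'
print(most_similar("aspirin", topn=1))
```

gensim performs the same ranking with vectorized NumPy operations over the whole vocabulary, so the real `most_similar` call is far faster than this loop.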

Check if a word is in the vocabulary of the output model

# `model` is a KeyedVectors object, so it has no `.wv` attribute;
# in gensim 4 the vocabulary is exposed as `key_to_index`
if word in model.key_to_index:
    print('True')
else:
    print('False')

Get a word numerical vector of 300 dimensions

vector = model[word]
