This repository contains data files generated from a series of openly available word embeddings, used to learn terms related to COVID-19 concepts.
This work is maintained by Danielle Mowery at the University of Pennsylvania (dlmowery@pennmedicine.upenn.edu). Contributors include:
Soham Parikh, Anahita Davoudi, Shun Yu, Carolina Giraldo, Emily Schriver
Relevant publications are listed below. Please cite these manuscripts when using these data in other studies and when presenting or publishing your results.
- Parikh S, Davoudi A, Yu S, Giraldo C, Schriver E, Mowery DL. An Intrinsic and Extrinsic Evaluation of Learned COVID-19 Concepts using Open-Source Word Embeddings. medRxiv. 2020. https://www.medrxiv.org/content/10.1101/2020.12.29.20249005v1.full.pdf+html
This work uses the following 7 openly available word embedding resources:
- BioNLP Lab PubMed + PMC W2V: http://evexdb.org/pmresources/vec-space-models/
- BioNLP Lab Wiki + PubMed + PMC W2V: http://evexdb.org/pmresources/vec-space-models/
- BioASQ: http://bioasq.org/news/bioasq-releases-continuous-space-word-vectors-obtained-applying-word2vec-pubmed-abstracts
- Clinical Embeddings W2V300: https://github.com/gweissman/clinical_embeddings
- BioWordVec Extrinsic: https://figshare.com/articles/Improving_Biomedical_Word_Embeddings_with_Subword_Information_and_MeSH_Ontology/6882647
- BioWordVec Intrinsic: https://figshare.com/articles/dataset/Improving_Biomedical_Word_Embeddings_with_Subword_Information_and_MeSH_Ontology/6882647
- Standard GloVe Embeddings: https://nlp.stanford.edu/projects/glove/
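This README does not spell out how related terms are extracted from these embeddings, so the following is only a minimal sketch of one common approach: ranking vocabulary words by cosine similarity to a seed term. It is not necessarily the exact procedure used in the paper above. It assumes the vectors are in a plain-text format (a word followed by its space-separated components on each line, as in the GloVe `.txt` files, optionally preceded by a word2vec-style `<vocab_size> <dim>` header). Several of the resources above are distributed as binary files and would need converting to text first, for example with gensim's `KeyedVectors`. The function names `load_text_vectors`, `cosine`, and `related_terms` and the toy vectors are hypothetical illustrations, not part of this repository.

```python
import math
from typing import Dict, Iterable, List, Tuple

Vector = List[float]


def load_text_vectors(lines: Iterable[str]) -> Dict[str, Vector]:
    """Parse plain-text embeddings: one word per line followed by its vector.

    Hypothetical helper. Assumes whitespace-separated fields; a word2vec-style
    header line ("<vocab_size> <dim>") on the first line is skipped.
    """
    vectors: Dict[str, Vector] = {}
    for i, line in enumerate(lines):
        parts = line.split()
        if i == 0 and len(parts) == 2 and all(p.isdigit() for p in parts):
            continue  # word2vec text header, not a word vector
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def related_terms(query: str, vectors: Dict[str, Vector], k: int = 5) -> List[Tuple[str, float]]:
    """Return the k vocabulary words most cosine-similar to `query`.

    Brute-force scan over the whole vocabulary; returns [] for unknown words.
    """
    if query not in vectors:
        return []
    q = vectors[query]
    scored = [(w, cosine(q, v)) for w, v in vectors.items() if w != query]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy 2-dimensional vectors, invented purely for illustration.
    toy = [
        "3 2",
        "covid 1.0 0.1",
        "coronavirus 0.9 0.2",
        "banana -0.1 1.0",
    ]
    vecs = load_text_vectors(toy)
    print(related_terms("covid", vecs, k=2))
```

With the toy vectors, "coronavirus" ranks above "banana" for the seed "covid". The brute-force scan keeps the sketch dependency-free; for full-size vocabularies, a library with vectorized similarity search (such as gensim's `KeyedVectors.most_similar`) is the more practical choice.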