Text-mining close to 800,000 PubMed Articles
This project extracts titles from PubMed articles and uses Latent Dirichlet Allocation (LDA) to model their topics. The aim was to see how machine-learning topic modeling compares with the topics assigned manually under Medical Subject Headings (MeSH) terms by the NIH PubMed library. This was done by calculating the frequency and probability of words in the titles of Cardiovascular Case Reports and comparing them with the topics modeled by LDA.
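The word frequency and probability step could look roughly like the sketch below. It is only an illustration under stated assumptions: the sample titles, the stop-word list, and the helper names `tokenize` and `word_probabilities` are hypothetical and are not taken from the project's code. Probability here means a word's share of all remaining tokens after stop words are removed.

```python
from collections import Counter
import re

# Hypothetical sample titles; the real input would be titles extracted from PubMed.
titles = [
    "Acute myocardial infarction in a young adult: a case report",
    "Myocardial infarction following coronary artery dissection",
    "Coronary artery aneurysm: a case report",
]

# Small illustrative stop-word list (an assumption, not the project's actual list).
STOP_WORDS = {"a", "an", "in", "of", "the", "and", "following"}


def tokenize(title):
    """Lowercase a title and return its alphabetic tokens, minus stop words."""
    return [w for w in re.findall(r"[a-z]+", title.lower()) if w not in STOP_WORDS]


def word_probabilities(titles):
    """Return (counts, probs): raw word counts and their relative frequencies."""
    counts = Counter(w for t in titles for w in tokenize(t))
    total = sum(counts.values())
    probs = {w: c / total for w, c in counts.items()}
    return counts, probs


counts, probs = word_probabilities(titles)

# Show the most frequent words with their estimated probabilities.
for word, count in counts.most_common(5):
    print(f"{word}: count={count}, p={probs[word]:.3f}")
```

The resulting word probabilities can then be set side by side with the top words of each LDA topic to judge how closely the modeled topics follow the manually assigned category.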