In [17]:
from thematic_alignment import EmbeddingModel, cosine_alignment
import pandas as pd

In [2]:
# Aim & Scope of the PLOS Computational Biology journal
aim_and_scope = """
PLOS Computational Biology features works of exceptional significance that further our understanding of living systems at all scales—from molecules and cells, to patient populations and ecosystems—through the application of computational methods including applications of artificial intelligence and machine learning. Readers include life and computational scientists, who can take the important findings presented here to the next level of discovery.
Research articles model all aspects of biological systems and demonstrate novel scientific advances, through the introduction of novel methods, software, or tools, or through the application of computational methods to provide profound new biological insights. Research articles must be declared as belonging to a relevant section.
PLOS Computational Biology publishes three types of research articles: Research, Methods, and Software. Articles specifically designated as Methods or Software papers should describe outstanding new methods or software of exceptional importance that can provide new biological insights. The method or software must have the potential for being widely adopted by a broad community of users. Enhancements to existing published methods or software will only be considered if those enhancements bring exceptional new capabilities.
Generally, reliability and significance of biological discovery through computation should be validated and enriched by experimental studies and/or application to real-world data. Inclusion of experimental validation is not required for publication, but should be referenced where possible.
"""

# Create an instance of the EmbeddingModel with the "allenai-specter" model
embedding_model = EmbeddingModel("allenai-specter")
# Generate embedding for the Aim & Scope
aim_scope_embedding = embedding_model.embed(texts=[aim_and_scope], normalize_emb=True)

print("Aim & Scope embedding vector shape:", aim_scope_embedding.shape)

Aim & Scope embedding vector shape: (1, 768)
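
The `normalize_emb=True` flag is not documented in this notebook. A reasonable reading, stated here as an assumption, is that each embedding is scaled to unit Euclidean (L2) length. The helper `l2_normalize` below is hypothetical and is not part of `thematic_alignment`. It is a minimal pure-Python sketch of that operation on a toy 2-dimensional vector; real code would normally do this vectorized over the full 768-dimensional arrays.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length (hypothetical helper).

    Zero vectors are returned unchanged to avoid division by zero.
    """
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


# Toy example: a 3-4-5 triangle, so the norm is exactly 5.
toy = [3.0, 4.0]
unit = l2_normalize(toy)
unit_length = math.sqrt(sum(x * x for x in unit))
print(unit)
print(unit_length)
```

If the embeddings are normalized this way, the dot product of two embeddings equals their cosine similarity, which is why normalization is commonly applied before similarity scoring.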


In [10]:
raw_papers_df = pd.read_csv("../data/plos_computational_biology_raw_papers.csv")

# Generate embeddings for all paper abstracts
abstracts = raw_papers_df["abstract"].tolist()
abstracts_embeddings = embedding_model.embed(texts=abstracts, normalize_emb=True)
print("Abstracts embeddings vector shape:", abstracts_embeddings.shape)

# Add embeddings of each abstract to the paper dataframe
raw_papers_df["embedding"] = list(abstracts_embeddings)

Abstracts embeddings vector shape: (300, 768)


In [19]:
# Compute alignment scores between each paper abstract and the Aim & Scope of the journal
raw_papers_df["alignment_score"] = raw_papers_df["embedding"].apply(
    lambda emb: cosine_alignment(emb, aim_scope_embedding[0])
)
papers_alignment_df = raw_papers_df.copy()

papers_alignment_df

Unnamed: 0,title,abstract,year,venue,embedding,alignment_score
0,PLoS Computational Biology Issue Image | Vol. ...,The tomato flowers are characterized by posses...,2021,PLoS Computational Biology,"[0.011619969, 0.00064947864, -0.012559804, 0.0...",0.635306
1,PLoS Computational Biology Issue Image | Vol. ...,The tomato flowers are characterized by posses...,2021,PLoS Computational Biology,"[0.010219732, 0.00018373683, -0.012367555, 0.0...",0.629383
2,Improving reproducibility in computational bio...,"1 Department of Biomedical Engineering, Univer...",2020,PLoS Comput. Biol.,"[-0.037152324, 0.013525118, -0.034178115, -0.0...",0.752962
3,Ten simple rules for designing learning experi...,Wikipedia is the largest and most visited ency...,2020,PLoS Comput. Biol.,"[-0.030761879, 0.0223286, -0.014010919, -0.005...",0.836371
4,10 simple rules for teaching wet-lab experimen...,1 Joint Carnegie Mellon–University of Pittsbur...,2020,PLoS Comput. Biol.,"[-0.029311182, 0.030932372, -0.03213358, -0.00...",0.821232
...,...,...,...,...,...,...
295,Let's Make Those Book Chapters Open Too!,"As authors, many of us have had less than sati...",2013,PLoS Comput. Biol.,"[-0.0361535, 0.05083255, -0.017369438, -0.0172...",0.902531
296,Ten simple rules to aid in achieving a vision,"In a career that now spans 40 years, I have ha...",2019,PLoS Comput. Biol.,"[-0.034074217, 0.009217825, -0.03960444, -0.01...",0.918672
297,What Do I Want from the Publisher of the Future?,When I took on the role of Editor-in-Chief of ...,2010,PLoS Comput. Biol.,"[-0.04797162, 0.03185367, -0.026495604, -0.009...",0.823654
298,The Signal in the Genomes,Nostra culpa. Not only did we foist a hastily ...,2006,PLoS Comput. Biol.,"[0.0016529376, 0.0033801463, -0.014256189, 0.0...",0.820862
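
The internals of `cosine_alignment` are not shown in this notebook. Assuming it computes ordinary cosine similarity between a paper embedding and the Aim & Scope embedding, the scoring step can be sketched as follows. The function `cosine_similarity`, the toy vectors, and the paper names are all hypothetical and stand in for the real 768-dimensional embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (hypothetical helper).

    Returns 0.0 if either vector has zero length.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy stand-ins for the Aim & Scope embedding and three paper embeddings.
scope = [1.0, 0.0]
papers = {
    "paper_a": [2.0, 0.0],  # same direction as the scope vector
    "paper_b": [0.0, 3.0],  # orthogonal to the scope vector
    "paper_c": [1.0, 1.0],  # 45 degrees from the scope vector
}

# Score every paper against the scope, then rank from most to least aligned.
scores = {name: cosine_similarity(vec, scope) for name, vec in papers.items()}
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)

for name, score in ranked:
    print(f"{name}: {score:.4f}")
```

Under this assumption, a score near 1 means the abstract points in almost the same direction as the Aim & Scope text in embedding space, and sorting by the score gives a simple ranking of papers by thematic fit.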
