# Description

This is a template notebook intended to be run with different parameter values. The defaults are defined under *Settings/paths*, and the `# Parameters` cell overrides them for a specific run.

Using these settings, it loads an input file of paragraph pairs (original and revised) and applies the LLM-as-a-Judge approach: for each pair, an LLM decides which version is better written, or declares a tie.

# Modules

In [1]:
import pandas as pd
from IPython.display import display
from langchain.cache import SQLiteCache
from langchain.globals import set_llm_cache
from proj import conf
from proj.utils import llm_pairwise

# Settings/paths

In [2]:
# Input manuscript
REPO = None

INPUT_FILE = None
OUTPUT_FILE = None

# Model and its parameters
LLM_JUDGE = None
TEMPERATURE = None
MAX_TOKENS = 2000
SEED_INIT = 0

# Evaluation parameters
N_REPS = None
THROW_IF_FAILED = False

In [3]:
# Parameters
REPO = "pivlab/manubot-ai-editor-code-test-phenoplier-manuscript"
INPUT_FILE = "/home/miltondp/projects/others/manubot/manubot-ai-editor-code/base/results/paragraph_match/phenoplier-manuscript--gpt-3.5-turbo.pkl"
OUTPUT_FILE = "/home/miltondp/projects/others/manubot/manubot-ai-editor-code/base/results/llm_pairwise/phenoplier-manuscript--gpt-3.5-turbo--openai_gpt-3.5-turbo.pkl"
LLM_JUDGE = "openai:gpt-3.5-turbo"
TEMPERATURE = 0.5
MAX_TOKENS = 2000
SEED_INIT = 0
N_REPS = 5


In [4]:
conf.common.LLM_CACHE_DIR.mkdir(parents=True, exist_ok=True)
display(conf.common.LLM_CACHE_DIR)

PosixPath('/home/miltondp/projects/others/manubot/manubot-ai-editor-code/base/results/llm_cache')

# Set default LangChain cache file

In [5]:
default_cache_file = conf.common.LLM_CACHE_DIR / "default.db"
display(default_cache_file)
set_llm_cache(SQLiteCache(database_path=str(default_cache_file)))

PosixPath('/home/miltondp/projects/others/manubot/manubot-ai-editor-code/base/results/llm_cache/default.db')

# Load paragraphs

In [6]:
df = pd.read_pickle(INPUT_FILE)

In [7]:
df.shape

(63, 3)

In [8]:
df.head()

|   | section | original | modified |
|---|---|---|---|
| 0 | abstract | Genes act in concert with each other in specif... | How do genes interact to influence complex tra... |
| 1 | introduction | Genes work together in context-specific networ... | Genes work together in specific networks to ca... |
| 2 | introduction | Given the availability of gene expression data... | With the abundance of gene expression data ava... |
| 3 | introduction | These gene-gene interactions play a crucial ro... | Gene-gene interactions are essential in curren... |
| 4 | introduction | Here we propose PhenoPLIER, an omnigenic appro... | In this paper titled 'Projecting genetic assoc... |


In [9]:
df.iloc[0]["original"]

'Genes act in concert with each other in specific contexts to perform their functions. Determining how these genes influence complex traits requires a mechanistic understanding of expression regulation across different conditions. It has been shown that this insight is critical for developing new therapies. Transcriptome-wide association studies have helped uncover the role of individual genes in disease-relevant mechanisms. However, modern models of the architecture of complex traits predict that gene-gene interactions play a crucial role in disease origin and progression. Here we introduce PhenoPLIER, a computational approach that maps gene-trait associations and pharmacological perturbation data into a common latent representation for a joint analysis. This representation is based on modules of genes with similar expression patterns across the same conditions. We observe that diseases are significantly associated with gene modules expressed in relevant cell types, and our approach i

In [10]:
df.iloc[0]["modified"]

'How do genes interact to influence complex traits and disease mechanisms, and how can this knowledge be leveraged for therapeutic development? This paper introduces PhenoPLIER, a computational approach that integrates gene-trait associations and pharmacological data to analyze gene expression patterns across different conditions. By identifying modules of co-expressed genes associated with diseases and drug mechanisms, PhenoPLIER can accurately predict drug-disease pairs and infer mechanisms of action. Through a CRISPR screen focused on lipid regulation, we demonstrate that PhenoPLIER can prioritize functionally important genes within trait-associated modules. This approach highlights the importance of considering gene-gene interactions in understanding disease etiology and identifying potential therapeutic targets for drug repurposing.'

# Test run

In [11]:
t_json = llm_pairwise(
    df.iloc[0]["original"],
    df.iloc[0]["modified"],
    df.iloc[0]["section"],
    model_name=LLM_JUDGE,
    model_params={
        "temperature": TEMPERATURE,
        "max_tokens": MAX_TOKENS,
        "model_kwargs": {
            "seed": SEED_INIT,
        },
    },
    verbose=True,
)



[1m> Entering new LLMChain chain...[0m
Prompt after formatting:
[32;1m[1;3mSystem: You are an expert copyeditor with ample experience in scientific writing. You are assessing the quality of two versions of a paragraph from the Abstract of a scientific article.
Human: Evaluate the quality of the following paragraph by writing a list with positive (if any) and/or negative (if any) aspects on the following areas: 1) has a clear sentence structure, 2) is easy to follow, 3) is correct in grammar, 4) has no spelling errors.

Paragraph A: Genes act in concert with each other in specific contexts to perform their functions. Determining how these genes influence complex traits requires a mechanistic understanding of expression regulation across different conditions. It has been shown that this insight is critical for developing new therapies. Transcriptome-wide association studies have helped uncover the role of individual genes in disease-relevant mechanisms. However, modern models of th

In [12]:
t_json

{'best': 'Paragraph A',
 'rationale': 'Paragraph A has a slightly more detailed explanation of the computational approach, PhenoPLIER, and provides a more in-depth discussion on the role of gene-gene interactions in disease origin and progression. The sentence structure is clear, and the information is presented in a logical sequence.'}

In [13]:
type(t_json)

dict
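The test run shows that `llm_pairwise` returns a plain dict with `best` and `rationale` keys. Before recording such a result, one could check it against the three labels that appear in this notebook's output (`Paragraph A`, `Paragraph 1`, `tie`). The following is a minimal sketch; `check_judgment` and `VALID_LABELS` are hypothetical names, not part of `proj.utils`, and the sketch does not reflect how `llm_pairwise` itself handles failures.

```python
from typing import Optional

# Labels observed in this notebook's output (assumed to be the only valid ones).
VALID_LABELS = {"Paragraph A", "Paragraph 1", "tie"}


def check_judgment(res) -> Optional[dict]:
    """Hypothetical helper: return a normalized judgment, or None if it is unusable."""
    if not isinstance(res, dict):
        return None
    best = res.get("best")
    rationale = res.get("rationale")
    # Reject missing or unexpected labels and non-string rationales.
    if not isinstance(best, str) or best not in VALID_LABELS:
        return None
    if not isinstance(rationale, str):
        return None
    return {"winner": best, "rationale": rationale.strip()}


print(check_judgment({"best": "Paragraph A", "rationale": " Clearer flow. "}))
# {'winner': 'Paragraph A', 'rationale': 'Clearer flow.'}
print(check_judgment({"best": "Paragraph B", "rationale": "Unexpected label"}))
# None
```

Returning `None` instead of raising mirrors the spirit of `THROW_IF_FAILED = False`: a malformed answer can be skipped or logged without stopping the whole run.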

# Run

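Since the judge's outputs are stochastic (`TEMPERATURE` > 0), we repeat the pairwise comparison `N_REPS` times, using a different seed (`SEED_INIT + rep_idx`) in each repetition.

Each repetition writes to its own SQLite cache file (`rep0.db`, `rep1.db`, and so on), so re-running the notebook reuses earlier responses instead of calling the external API again.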
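The caching idea can be illustrated with the standard-library `sqlite3` module. This is not how LangChain's `SQLiteCache` is implemented; it is a simplified sketch in which `fake_judge` is a hypothetical stand-in for the external LLM call and each repetition gets its own in-memory database, mirroring the per-repetition `rep{idx}.db` files.

```python
import sqlite3

calls = []  # records every (simulated) external API call


def fake_judge(original, modified, seed):
    # Hypothetical stand-in for the external LLM call made by llm_pairwise.
    calls.append(seed)
    return "tie"


def cached_judge(conn, original, modified, seed):
    """Return the cached verdict for this pair, calling the judge only on a miss."""
    conn.execute("CREATE TABLE IF NOT EXISTS cache (key TEXT PRIMARY KEY, verdict TEXT)")
    key = original + "\x1f" + modified  # separator unlikely to appear in prose
    row = conn.execute("SELECT verdict FROM cache WHERE key = ?", (key,)).fetchone()
    if row is not None:
        return row[0]
    verdict = fake_judge(original, modified, seed)
    conn.execute("INSERT INTO cache (key, verdict) VALUES (?, ?)", (key, verdict))
    conn.commit()
    return verdict


SEED_INIT, N_REPS = 0, 3
pairs = [("original 1", "modified 1"), ("original 2", "modified 2")]

# One database per repetition, so repetitions never share cached answers.
caches = [sqlite3.connect(":memory:") for _ in range(N_REPS)]

for _ in range(2):  # the second pass simulates re-running the notebook
    for rep_idx, conn in enumerate(caches):
        for original, modified in pairs:
            cached_judge(conn, original, modified, SEED_INIT + rep_idx)

print(len(calls))  # 6: one call per (repetition, pair); the second pass is all cache hits
print(calls)       # [0, 0, 1, 1, 2, 2]

for conn in caches:
    conn.close()
```

Keeping one cache per repetition means identical prompts in different repetitions are still sent to the judge, which preserves the repeated sampling, while a rerun of the same repetition is served entirely from its cache.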

In [14]:
results = []

In [15]:
for rep_idx in range(N_REPS):
    # we cache prompt/results by repetition
    output_cache_file = conf.common.LLM_CACHE_DIR / f"rep{rep_idx}.db"
    set_llm_cache(SQLiteCache(database_path=str(output_cache_file)))

    print(f"{str(rep_idx).zfill(2)} ({output_cache_file.name}): ", end="", flush=True)

    for par_idx, par in df.iterrows():
        print(".", end="", flush=True)

        res = llm_pairwise(
            par["original"],
            par["modified"],
            par["section"],
            model_name=LLM_JUDGE,
            model_params={
                "temperature": TEMPERATURE,
                "max_tokens": MAX_TOKENS,
                "model_kwargs": {
                    "seed": SEED_INIT + rep_idx,
                },
            },
            throw_if_failed=THROW_IF_FAILED,
            verbose=False,
        )

        results.append(
            {
                "rep_index": rep_idx,
                "paragraph_index": par_idx,
                "paragraph_section": par["section"],
                "winner": res["best"],
                "rationale": res["rationale"],
            }
        )

    print(flush=True)

00 (rep0.db): ...............................................................
01 (rep1.db): ...............................................................
02 (rep2.db): ...............................................................
03 (rep3.db): ...............................................................
04 (rep4.db): ...............................................................

# Process results

In [16]:
winner_matchings = {
    "Paragraph A": "-1",  # Original
    "Paragraph 1": "1",  # Modified
    "tie": "0",
}
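The `winner_matchings` dictionary turns the judge's labels into a score on a -1 (original preferred), 0 (tie), 1 (modified preferred) scale. As a sketch of the same convention, mapping directly to floats with `Series.map` avoids the string round trip through `replace(...).apply(float)`. The data below are toy judgments made up for illustration, not results from this notebook, and `WINNER_SCORES` is a hypothetical name.

```python
import pandas as pd

# Same label-to-score convention as winner_matchings, but with numeric values.
WINNER_SCORES = {"Paragraph A": -1.0, "Paragraph 1": 1.0, "tie": 0.0}

# Toy judgments (illustrative only); "Paragraph B" plays an unexpected label.
toy = pd.DataFrame(
    {
        "paragraph_section": ["abstract", "abstract", "methods", "methods", "methods"],
        "winner": ["Paragraph A", "tie", "Paragraph 1", "Paragraph 1", "Paragraph B"],
    }
)

# Drop labels outside the mapping first; map() would otherwise turn them into NaN.
toy = toy[toy["winner"].isin(list(WINNER_SCORES))]
toy = toy.assign(winner_score=toy["winner"].map(WINNER_SCORES))

print(toy["winner_score"].tolist())  # [-1.0, 0.0, 1.0, 1.0]
print(toy.groupby("paragraph_section")["winner_score"].mean().to_dict())
# {'abstract': -0.5, 'methods': 1.0}
```

Filtering before mapping matters: an unexpected label would silently become `NaN` and be skipped by `mean()`, shrinking the denominator without any warning.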

In [17]:
df_results = pd.DataFrame(results)

In [18]:
df_results.shape

(315, 5)

In [19]:
df_results.head()

|   | rep_index | paragraph_index | paragraph_section | winner | rationale |
|---|---|---|---|---|---|
| 0 | 0 | 0 | abstract | Paragraph A | Paragraph A has a slightly more detailed expla... |
| 1 | 0 | 1 | introduction | Paragraph 1 | Paragraph 1 has a clear sentence structure, is... |
| 2 | 0 | 2 | introduction | Paragraph 1 | Paragraph 1 demonstrates slightly better reada... |
| 3 | 0 | 3 | introduction | tie | Both paragraphs demonstrate a high level of qu... |
| 4 | 0 | 4 | introduction | Paragraph 1 | Paragraph 1 has a slightly better structure wi... |


In [20]:
df_results["winner"].value_counts()

winner
Paragraph 1    157
tie            130
Paragraph A     28
Name: count, dtype: int64

In [21]:
df_results = df_results[df_results["winner"].isin(winner_matchings.keys())]

In [22]:
df_results.shape

(315, 5)

In [23]:
df_results = df_results.assign(
    winner_score=df_results["winner"].replace(winner_matchings).apply(float)
)

In [24]:
df_results.shape

(315, 6)

In [25]:
df_results.head()

|   | rep_index | paragraph_index | paragraph_section | winner | rationale | winner_score |
|---|---|---|---|---|---|---|
| 0 | 0 | 0 | abstract | Paragraph A | Paragraph A has a slightly more detailed expla... | -1.0 |
| 1 | 0 | 1 | introduction | Paragraph 1 | Paragraph 1 has a clear sentence structure, is... | 1.0 |
| 2 | 0 | 2 | introduction | Paragraph 1 | Paragraph 1 demonstrates slightly better reada... | 1.0 |
| 3 | 0 | 3 | introduction | tie | Both paragraphs demonstrate a high level of qu... | 0.0 |
| 4 | 0 | 4 | introduction | Paragraph 1 | Paragraph 1 has a slightly better structure wi... | 1.0 |


In [26]:
df_results.dtypes

rep_index              int64
paragraph_index        int64
paragraph_section     object
winner                object
rationale             object
winner_score         float64
dtype: object

In [27]:
df_results.groupby("paragraph_section")["winner_score"].mean()

paragraph_section
abstract                 -0.600000
discussion                0.500000
introduction              0.550000
methods                   0.424000
results                   0.433333
supplementary material    0.257143
Name: winner_score, dtype: float64
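Because each score is -1, 0, or 1, a section's mean score equals the share of judgments won by the modified paragraph minus the share won by the original; ties only dilute it. For example, the abstract's -0.6 above means the original was preferred in 60 percentage points more judgments than the revised version. The sketch below checks this identity with `pd.crosstab` on toy scores (illustrative only, not this notebook's results).

```python
import pandas as pd

# Toy scores (illustrative only): -1 = original wins, 0 = tie, 1 = modified wins.
toy = pd.DataFrame(
    {
        "paragraph_section": ["abstract"] * 5 + ["methods"] * 5,
        "winner_score": [-1.0, -1.0, -1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, -1.0],
    }
)

mean_score = toy.groupby("paragraph_section")["winner_score"].mean()

# Per-section shares of original wins (-1), ties (0), and modified wins (1).
shares = pd.crosstab(toy["paragraph_section"], toy["winner_score"], normalize="index")

# Mean score = P(modified wins) - P(original wins).
diff = shares[1.0] - shares[-1.0]

print(mean_score.round(3).to_dict())  # {'abstract': -0.6, 'methods': 0.2}
print(diff.round(3).to_dict())        # {'abstract': -0.6, 'methods': 0.2}
```

Reporting the three shares next to the mean separates "mostly ties" from "evenly split wins", which the mean alone cannot distinguish.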

# Save

In [28]:
df_results.to_pickle(OUTPUT_FILE)