# Journal2Wiki - Few-Shot Prompting of GPT-4 
In this notebook we apply our two-shot prompting method with GPT-4.

We generate wiki pages from NEJM journal articles that were scraped and saved to a CSV file.

The first two rows of the CSV file serve as the examples in the prompt.

The CSV file should have three columns: 'Acronym', 'Article', and 'Wiki'.
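Before spending API calls, it can help to confirm that a CSV file actually has this layout. The sketch below is a hypothetical, standard-library-only check (`check_journal_csv` and `REQUIRED_COLUMNS` are illustrative names, not part of the notebook). It tolerates the extra unnamed index column that pandas writes, and it counts the rows where both `Article` and `Wiki` are non-empty, since those are the rows the prompt and any later comparison need.

```python
import csv
import io

REQUIRED_COLUMNS = ("Acronym", "Article", "Wiki")

def check_journal_csv(fileobj):
    """Hypothetical helper: verify the required columns and count usable rows.

    Returns (number of rows with non-empty Article and Wiki, list of missing columns).
    """
    reader = csv.DictReader(fileobj)
    header = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in header]
    if missing:
        return 0, missing
    usable = 0
    for row in reader:
        # a row is only useful if both the article and its reference wiki are present
        if (row["Article"] or "").strip() and (row["Wiki"] or "").strip():
            usable += 1
    return usable, []

# In-memory example; the empty first header mimics the index column pandas writes
sample = io.StringIO(
    ',Acronym,Article,Wiki\n'
    '0,TRIAL A,Abstract text A,"Wiki A"\n'
    '1,TRIAL B,Abstract text B,\n'
)
print(check_journal_csv(sample))  # (1, [])
```

With the real file you would pass `open("articles_cleaned.csv", newline="", encoding="utf-8")` instead of the `StringIO` sample.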

In [58]:
# Libraries needed. Uncomment to install them.
# Not all of these are used here, but they were used during development.

#%pip install openai
#%pip install beautifulsoup4
#%pip install requests
#%pip install selenium
#%pip install bs4
#%pip install transformers datasets torch evaluate scikit-learn sentencepiece
#%pip install accelerate -U
#%pip install transformers[sentencepiece]
#%pip install wheel
#%pip install --upgrade oauth2client
#%pip install google
#%pip install google-colab
#%pip install gspread
#%pip install absl-py
#%pip install rouge_score 
#%pip install nltk
#%pip install plotly
#%pip install nbformat
#%pip install ipykernel


In [67]:
# imports used 
import pandas as pd
import numpy as np
from openai import OpenAI
import pickle
import os
import re
import json
import evaluate


# Read the CSV dataset into a pandas DataFrame

In [60]:
# articles_cleaned.csv came from 
csv="articles_cleaned.csv"
df = pd.read_csv(csv)


In [61]:
df

Unnamed: 0,Acronym,Article,Wiki
0,"ATLAS ACS-2, TIMI 51",Abstract BACKGROUND Acute coronary syndrome...,"""Rivaroxaban in Patients with Recent Acute Co..."
1,CAST I,Letters Abstract BACKGROUND AND METHODS. In ...,"""Mortality and morbidity in patients receivin..."
2,CHAMPION PHOENIX,Letters Abstract BACKGROUND The intensity of...,"""Effect of Platelet Inhibition with Cangrelor..."
3,CHARISMA,Letters Abstract BACKGROUND Dual antiplatele...,"""Clopidogrel and aspirin versus aspirin alone..."
4,COLCOT,Letters Abstract BACKGROUND Experimental an...,"""Efficacy and safety of low-dose colchicine a..."
...,...,...,...
326,UPLIFT,Letters Abstract BACKGROUND Previous studies...,"""A 4-Year Trial of Tiotropium in Chronic Obst..."
327,WISDOM,Letters 1 Comment Abstract BACKGROUND Treatm...,"""Withdrawal of inhaled glucocorticoids and ex..."
328,INPULSIS Trials,Letters Abstract BACKGROUND Nintedanib (form...,"""Efficacy and safety of nintedanib in idiopat..."
329,ASCEND (IPF),Letters Abstract BACKGROUND In two of three ...,"""A phase 3 trial of pirfenidone in patients w..."


In [62]:
# Drop the leftover index column ("Unnamed: 0") and any rows with missing values
df = df.loc[:, ~df.columns.str.contains('^Unnamed')]
df = df.dropna()



In [63]:
print(df)

                  Acronym                                            Article  \
0    ATLAS ACS-2, TIMI 51     Abstract BACKGROUND Acute coronary syndrome...   
1                  CAST I    Letters Abstract BACKGROUND AND METHODS. In ...   
2        CHAMPION PHOENIX    Letters Abstract BACKGROUND The intensity of...   
3                CHARISMA    Letters Abstract BACKGROUND Dual antiplatele...   
4                  COLCOT    Letters  Abstract BACKGROUND Experimental an...   
..                    ...                                                ...   
326                UPLIFT    Letters Abstract BACKGROUND Previous studies...   
327                WISDOM    Letters 1 Comment Abstract BACKGROUND Treatm...   
328       INPULSIS Trials    Letters Abstract BACKGROUND Nintedanib (form...   
329          ASCEND (IPF)    Letters Abstract BACKGROUND In two of three ...   
330                  RAVE    Letters Abstract BACKGROUND Cyclophosphamide...   

                                       

In [45]:
def journal2wiki_2shot(df, startrow=2, OPENAI_API_KEY="", max_rows=1, format="text"):
    # Build a two-shot prompt using the first two rows of df as examples,
    # then ask GPT-4 to write a wiki page for each requested row.
    #
    # df: DataFrame with columns "Article" and "Wiki" (and optionally "Acronym")
    # startrow: first row to process
    #   (defaults to 2 because rows 0 and 1 are used as examples in the prompt)
    # OPENAI_API_KEY: your OpenAI API key
    # max_rows: how many rows of df to run the task on
    # format: "text" or "json_object"
    #
    # Returns (responses, last row reached, name of the saved pkl file or "").

    index = 0  # rows index and index+1 of df are the two prompt examples
    pklname = ""
    model = "gpt-4-1106-preview"
    responses = []  # collected model outputs
    client = OpenAI(api_key=OPENAI_API_KEY)  # create the client once, not once per row

    stub = (
        f"Example Article: {df['Article'][index]}\n Example Wiki: {df['Wiki'][index]}\n"
        f"Example Article: {df['Article'][index+1]}\n Example Wiki: {df['Wiki'][index+1]}\n"
    )
    system_prompt = (
        "You are a helpful assistant designed to summarize Articles into Wiki format. "
        "Here are some Example Articles, corresponding Example Wiki "
        "followed by your Input Article. "
        "The Wiki format always has the following sections: "
        '"Clinical Question", "Bottom Line", "Major Points", '
        '"Guidelines", "Design", "Population", "Interventions", "Outcomes", '
        '"Criticisms", "Funding", "Further Reading".\n'
        f"Please respond with your Wiki in {format} format."
    )

    # Never run past the last row of df
    endrow = min(startrow + max_rows, len(df))
    if endrow <= startrow:
        print("Nothing to run: startrow is past the end of df.")
        return responses, startrow, pklname

    for i in range(startrow, endrow):
        try:
            query = f"{stub}\nArticle: {df['Article'][i]}\n Wiki:"
            messages = [
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": query},
            ]
            response = client.chat.completions.create(
                model=model,
                response_format={"type": format},
                messages=messages,
            )
        except Exception as exc:
            # Most often we exceeded the max tokens/day limit.
            # Save the results so far to a pkl file and stop.
            print(f"Stopped at row {i}: {exc}")
            if i > startrow:
                pklname = f'wiki_pages_text_{startrow}-{i-1}.pkl'
                with open(pklname, 'wb') as handle:
                    pickle.dump(responses, handle, protocol=pickle.HIGHEST_PROTOCOL)
                print(f"saved to {pklname}")
            else:
                print("Nothing was run, sorry!")
            return responses, i, pklname
        responses.append(response.choices[0].message.content)
        print(response.choices[0].message.content)

    # Save results to a pkl file named by the first and last row processed
    pklname = f'wiki_pages_text_{startrow}-{i}.pkl'
    with open(pklname, 'wb') as handle:
        pickle.dump(responses, handle, protocol=pickle.HIGHEST_PROTOCOL)
    print(f"saved to {pklname}")
    return responses, i, pklname
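Keeping prompt assembly separate from the API call makes the two-shot structure easy to inspect and test offline. The sketch below mirrors the wording of the prompt in `journal2wiki_2shot`, but `build_two_shot_messages` and `SECTIONS` are hypothetical names; it returns the `messages` list that the Chat Completions call takes. As far as we understand OpenAI's JSON mode documentation, the messages themselves must mention JSON when `response_format` is `json_object`, which the `{fmt}` placeholder satisfies here.

```python
SECTIONS = [
    "Clinical Question", "Bottom Line", "Major Points", "Guidelines", "Design",
    "Population", "Interventions", "Outcomes", "Criticisms", "Funding", "Further Reading",
]

def build_two_shot_messages(examples, article, fmt="text"):
    """Hypothetical helper: assemble system + user messages for a few-shot prompt.

    examples: list of (example_article, example_wiki) pairs, e.g. rows 0 and 1 of df
    article:  the new article to turn into a wiki page
    fmt:      "text" or "json_object"
    """
    section_list = ", ".join(f'"{s}"' for s in SECTIONS)
    system = (
        "You are a helpful assistant designed to summarize Articles into Wiki format. "
        "Here are some Example Articles, corresponding Example Wiki followed by your Input Article. "
        f"The Wiki format always has the following sections: {section_list}.\n"
        f"Please respond with your Wiki in {fmt} format."
    )
    # each example contributes one Article/Wiki pair; the query ends with an open "Wiki:"
    shots = "".join(f"Example Article: {a}\nExample Wiki: {w}\n" for a, w in examples)
    user = f"{shots}\nArticle: {article}\nWiki:"
    return [{"role": "system", "content": system}, {"role": "user", "content": user}]

msgs = build_two_shot_messages([("Art 1", "Wiki 1"), ("Art 2", "Wiki 2")], "New article")
print(msgs[1]["content"])
```

With the notebook's data, the examples would be `[(df["Article"][0], df["Wiki"][0]), (df["Article"][1], df["Wiki"][1])]`.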

In [65]:
def load_pickles(pickles):
    # Load a list of pkl files, each holding a list of generated wiki pages.
    # Returns:
    #   alltext:       every page from every file, in the order given
    #   alltext_array: one list of pages per file
    #   lens:          number of pages in each file
    alltext = []
    alltext_array = []
    lens = []
    for pkl in pickles:
        with open(pkl, 'rb') as handle:
            wiki_pages_i = pickle.load(handle)
        lens.append(len(wiki_pages_i))
        alltext_array.append(wiki_pages_i)
        alltext.extend(wiki_pages_i)
    return alltext, alltext_array, lens
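Runs that stop at the daily token limit leave several chunk files, and `load_pickles` concatenates them in whatever order they are listed. A hypothetical alternative, `merge_chunk_pickles`, reads the row range from each file name (assuming the `..._{start}-{end}.pkl` naming used above, with an inclusive end row), sorts chunks numerically, and checks that each chunk has one page per row and that consecutive chunks leave no gaps.

```python
import os
import pickle
import re
import tempfile

CHUNK_NAME = re.compile(r"_(\d+)-(\d+)\.pkl$")  # e.g. wiki_pages_text_2-330.pkl

def merge_chunk_pickles(paths):
    """Hypothetical helper: merge pkl chunks named like 'wiki_pages_text_{start}-{end}.pkl'.

    Returns (merged list of pages, first row, last row).
    """
    if not paths:
        raise ValueError("no chunk files given")
    chunks = []
    for path in paths:
        m = CHUNK_NAME.search(os.path.basename(path))
        if m is None:
            raise ValueError(f"cannot read a row range from {path}")
        start, end = int(m.group(1)), int(m.group(2))
        with open(path, "rb") as handle:
            pages = pickle.load(handle)
        # the end row is inclusive, so a chunk holds end - start + 1 pages
        if len(pages) != end - start + 1:
            raise ValueError(f"{path}: {len(pages)} pages for rows {start}-{end}")
        chunks.append((start, end, pages))
    chunks.sort(key=lambda chunk: chunk[0])  # numeric order, so 10-20 comes after 2-9
    for (_, prev_end, _), (next_start, _, _) in zip(chunks, chunks[1:]):
        if next_start != prev_end + 1:
            raise ValueError(f"rows {prev_end} and {next_start} are not contiguous")
    merged = [page for _, _, pages in chunks for page in pages]
    return merged, chunks[0][0], chunks[-1][1]

# Usage: two chunks written out of order into a temporary folder
with tempfile.TemporaryDirectory() as tmp:
    for name, pages in [("wiki_pages_text_5-6.pkl", ["p5", "p6"]),
                        ("wiki_pages_text_2-4.pkl", ["p2", "p3", "p4"])]:
        with open(os.path.join(tmp, name), "wb") as handle:
            pickle.dump(pages, handle)
    paths = [os.path.join(tmp, n) for n in os.listdir(tmp)]
    merged, first, last = merge_chunk_pickles(paths)
print(merged, first, last)  # ['p2', 'p3', 'p4', 'p5', 'p6'] 2 6
```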


# Now we can send our prompt to GPT-4!

In [49]:
i_start = 2
print(f"Starting at i={i_start}")
new_responses = []
# set testing=False to run all rows
testing = True

if testing:
    max_rows = 1
else:
    max_rows = len(df)

# Read the API key from the OPENAI_API_KEY environment variable; never hard-code it in a notebook
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY", "")

wiki_pages_text_new, i_end, pklname_new = journal2wiki_2shot(df, i_start, OPENAI_API_KEY, max_rows)
if len(pklname_new) > 0:
    newtext, newtext_arr, newtext_len = load_pickles([pklname_new])
    new_responses.extend(newtext)
    # name the output file by the first and last row it covers
    pklname_out = f'wiki_pages_{i_start}-{i_start + len(new_responses) - 1}.pkl'
    with open(pklname_out, 'wb') as handle:
        pickle.dump(new_responses, handle, protocol=pickle.HIGHEST_PROTOCOL)
    print(f"wrote {pklname_out}")
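`journal2wiki_2shot` saves and stops at the first exception, returning the row it stopped at. For short-lived failures such as per-minute rate limits, a sketch of an alternative is to retry the call with exponential backoff before giving up. `call_with_backoff` is a hypothetical helper; it retries on any exception, which in practice you would likely narrow to the SDK's rate-limit error (e.g. `openai.RateLimitError`). A daily token quota will not recover within a few retries, so stopping and saving, as the notebook does, is still needed for that case.

```python
import random
import time

def call_with_backoff(fn, max_tries=5, base_delay=1.0, max_delay=60.0, sleep=time.sleep):
    """Hypothetical helper: call fn(), retrying on exceptions with exponential backoff.

    The wait after failed attempt k (0-based) is min(max_delay, base_delay * 2**k)
    plus up to 10% random jitter. The last exception is re-raised if every try fails.
    """
    for attempt in range(max_tries):
        try:
            return fn()
        except Exception:
            if attempt == max_tries - 1:
                raise
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, 0.1 * delay))

# Usage with a stand-in function that fails twice before succeeding
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("rate limited")
    return "wiki text"

waits = []  # record the waits instead of actually sleeping
result = call_with_backoff(flaky, sleep=waits.append)
print(result, len(waits))  # wiki text 2
```

Inside the notebook's loop this would wrap the request, e.g. `call_with_backoff(lambda: client.chat.completions.create(model=model, messages=messages))`.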

Starting at i=2
"Cangrelor during Percutaneous Coronary Intervention".The New England Journal of Medicine. (Published date not provided).

**Clinical Question**
Does intravenous cangrelor reduce ischemic complications during percutaneous coronary intervention (PCI), and how does it compare with clopidogrel in terms of safety and efficacy?

**Bottom Line**
Cangrelor significantly reduced the rate of ischemic events, including stent thrombosis, during PCI without a significant increase in severe bleeding compared to clopidogrel.

**Major Points**
- Cangrelor, an intravenous, fast-acting antiplatelet drug, has rapid onset and reversible effects beneficial for urgent or periprocedural treatment in patients undergoing PCI.
- In this double-blind trial, cangrelor was associated with a 22% reduction in peri-procedural ischemic complications compared with clopidogrel.
- Severe bleeding did not differ significantly between the cangrelor and clopidogrel groups. However, cangrelor was linked with

# Load previously generated responses


In [68]:
pickles_merged = ["wiki_pages_text_2-330.pkl",]
responses, alltext_array_merged, lens_merged= load_pickles(pickles_merged)
len(responses)

329

# Save previously generated responses to txt files


In [69]:
# This code writes every GPT-generated wiki page to its own txt file
cwd = os.getcwd()
dirname = cwd  # or e.g. "journal2wiki" to write into a subfolder of cwd
d = os.path.join(cwd, dirname, "text")
os.makedirs(d, exist_ok=True)
fnames = []
titles = []

for i, text in enumerate(responses):
    # some responses were saved as lists of strings; join them into one string
    if isinstance(text, list):
        text = " ".join(text)
    # Try progressively looser patterns to find the article title:
    # text before the journal name, a quoted string, text up to a quote, the first line
    title = re.findall(r'^(.*)The New England Journal of Medicine', text)
    if len(title) < 1:
        title = re.findall(r'\"(.*)\"', text)
    if len(title) < 1:
        title = re.findall(r'^(.*)\"', text)
    if len(title) < 1:
        title = re.findall(r'^(.*)\n', text)
    if len(title) < 1:
        # no pattern matched; report it and fall back to a placeholder title
        print(f"No title found for response {i}: {text[:80]}")
        title = [f"article{i+2}"]
    title = title[0]
    title = re.sub("\"", '', title)
    title = re.sub("\'", '', title)
    titles.append(title)
    # response i corresponds to row i+2 of df (rows 0 and 1 were the prompt examples)
    fname = os.path.join(d, f"article{i+2}.txt")
    with open(fname, 'w', encoding='utf-8') as f:
        f.write(text)
    fnames.append(fname)
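The title-matching cascade in the loop above can be pulled into a small function that is easy to test on its own. `extract_title` and `TITLE_PATTERNS` are hypothetical names; the patterns are the same ones, tried in the same order. Unlike the loop, this sketch also trims the trailing period and spaces that the first pattern leaves before the journal name, and it skips a match that is empty after cleaning.

```python
import re

# Patterns tried in order, loosest last (same order as the loop above)
TITLE_PATTERNS = [
    r"^(.*)The New England Journal of Medicine",  # text before the journal name
    r"\"(.*)\"",                                  # text between double quotes
    r"^(.*)\"",                                   # text up to a double quote
    r"^(.*)\n",                                   # the first line
]

def extract_title(text, fallback="untitled"):
    """Hypothetical helper: pull an article title out of a generated wiki page."""
    if isinstance(text, list):  # some responses were stored as lists of strings
        text = " ".join(text)
    for pattern in TITLE_PATTERNS:
        found = re.findall(pattern, text)
        if found:
            title = found[0].replace('"', "").replace("'", "").strip(" .")
            if title:
                return title
    return fallback

print(extract_title('"Cangrelor During PCI".The New England Journal of Medicine. 2013.'))
# Cangrelor During PCI
```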

In [54]:
# spot checking
responses[0]

['"Cangrelor During PCI".The New England Journal of Medicine. 2013. 368:1303-1313.PubMed•Full text•PDF\n\nContents\n1 Clinical Question\n2 Bottom Line\n3 Major Points\n4 Guidelines\n5 Design\n6 Population\n6.1 Inclusion Criteria\n6.2 Exclusion Criteria\n6.3 Baseline Characteristics\n7 Interventions\n8 Outcomes\n8.1 Primary Outcome\n8.2 Secondary Outcomes\n9 Criticisms\n10 Funding\n11 Further Reading\n\nClinical Question\nIn patients undergoing PCI and receiving guideline-recommended therapy, does cangrelor compared to clopidogrel reduce periprocedural ischemic complications without increasing the risk of severe bleeding?\n\nBottom Line\nIn patients undergoing PCI, cangrelor reduced ischemic events, including stent thrombosis during PCI, with no significant increase in severe bleeding compared to clopidogrel.\n\nMajor Points\nCangrelor is a rapid-onset, reversible intravenous ADP P2Y12 inhibitor which may provide a benefit over clopidogrel in reducing ischemic complications during PCI. 

In [55]:
df["Wiki"][2]

' "Effect of Platelet Inhibition with Cangrelor during PCI on Ischemic Events".The New England Journal of Medicine. 2013. 368(14):1303-1313.PubMed•Full text•PDFContents1Clinical Question2Bottom Line3Major Points4Guidelines5Design6Population6.1Inclusion Criteria6.2Exclusion Criteria6.3Baseline Characteristics7Interventions8Outcomes8.1Primary Outcomes8.2Secondary Outcomes8.3Other Outcomes8.4Subgroup Analysis8.5Adverse Events8.6Additional Analyses9Criticisms10Funding11Further ReadingClinical QuestionAmong patients undergoing urgent or elective PCI, does cangrelor reduce the risk of ischemic events, as compared to clopidogrel?Bottom LineAmong patients undergoing urgent or elective PCI, cangrelor reduces the risk of ischemic events, as compared to clopidogrel.Major PointsThe effectiveness of P2Y12-receptor antagonist (eg, clopidogrel, prasugrel, ticagrelor) in ACS has been demonstrated in multiple trials including theCURE,COMMIT,TRITON-TIMI 38, andPLATO. Periprocedural administration of the

In [56]:
df["Article"][2]

"  Letters Abstract BACKGROUND The intensity of antiplatelet therapy during percutaneous coronary intervention (PCI) is an important determinant of PCI-related ischemic complications. Cangrelor is a potent intravenous adenosine diphosphate (ADP)–receptor antagonist that acts rapidly and has quickly reversible effects. METHODS In a double-blind, placebo-controlled trial, we randomly assigned 11,145 patients who were undergoing either urgent or elective PCI and were receiving guideline-recommended therapy to receive a bolus and infusion of cangrelor or to receive a loading dose of 600 mg or 300 mg of clopidogrel. The primary efficacy end point was a composite of death, myocardial infarction, ischemia-driven revascularization, or stent thrombosis at 48 hours after randomization; the key secondary end point was stent thrombosis at 48 hours. The primary safety end point was severe bleeding at 48 hours. RESULTS The rate of the primary efficacy end point was 4.7% in the cangrelor group and 5.

In [57]:
df['Acronym'][2]

'CHAMPION PHOENIX'
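The spot check above compares a generated page with the reference `df["Wiki"][2]` by eye. The notebook imports `evaluate` and lists `rouge_score` among its installs, presumably for a scoring step not shown in this section. As a dependency-free illustration only, here is a simplified ROUGE-1 F1 (unigram overlap between candidate and reference); `rouge1_f1` is a hypothetical name, and for reported numbers the library implementation, with its own tokenization and optional stemming, is the one to use.

```python
import re
from collections import Counter

def _unigrams(text):
    # lowercase word/number tokens; punctuation is dropped
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def rouge1_f1(candidate, reference):
    """Simplified ROUGE-1 F1 (unigram overlap); no stemming or stopword handling."""
    cand, ref = _unigrams(candidate), _unigrams(reference)
    overlap = sum((cand & ref).values())  # shared words, each counted at most min(count) times
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

print(round(rouge1_f1("the cat sat", "the cat ran"), 3))  # 0.667
```

Since `responses[0]` above is a list of strings, join it first: `rouge1_f1(" ".join(responses[0]), df["Wiki"][2])`.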

# Other code tidbits

In [None]:
def ask_gpt(model="gpt-4-1106-preview", messages=None, OPENAI_API_KEY="", format="text"):
    # Send a list of chat messages to the OpenAI API and return the reply text.
    # model: e.g. "gpt-4-1106-preview" or "gpt-3.5-turbo-1106"
    # messages: list of {"role": ..., "content": ...} dicts for GPT
    # OPENAI_API_KEY: your OpenAI API key
    # format: "json_object" or "text"
    client = OpenAI(api_key=OPENAI_API_KEY)
    response = client.chat.completions.create(
        model=model,
        response_format={"type": format},
        messages=messages if messages is not None else [],
    )
    return response.choices[0].message.content
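With `format="json_object"` the reply content is a JSON string rather than wiki text. A sketch of turning such a reply into a dict and checking it against the eleven sections named in the system prompt follows. `parse_wiki_json` and `WIKI_SECTIONS` are hypothetical names, and the sketch assumes the model used the section names as top-level keys: JSON mode is meant to produce valid JSON, but it does not enforce any particular keys.

```python
import json

WIKI_SECTIONS = [
    "Clinical Question", "Bottom Line", "Major Points", "Guidelines", "Design",
    "Population", "Interventions", "Outcomes", "Criticisms", "Funding", "Further Reading",
]

def parse_wiki_json(content):
    """Hypothetical helper: parse a json_object reply and list the missing sections."""
    data = json.loads(content)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    missing = [s for s in WIKI_SECTIONS if s not in data]
    return data, missing

reply = '{"Clinical Question": "Does X help?", "Bottom Line": "Yes."}'
wiki, missing = parse_wiki_json(reply)
print(wiki["Bottom Line"], len(missing))  # Yes. 9
```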


In [124]:
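def index_containing_substring(the_list, substring):
    # Return the positions of all items that contain substring (case-insensitive).
    # Items that are lists of strings are joined first; other non-strings
    # (e.g. NaN) are converted with str().
    matches = []
    for i, s in enumerate(the_list):
        if isinstance(s, list):
            s = " ".join(s)
        if substring.lower() in str(s).lower():
            matches.append(i)
    return matches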

In [199]:
ind = index_containing_substring(responses, "sepsis")
print(ind)
responses[ind[0]]

[47, 49, 161, 185, 187, 188]


'"Mild Therapeutic Hypothermia to Improve the Neurologic Outcome after Cardiac Arrest". The New England Journal of Medicine. Published date not stated.\n\nContents\n\n1. Clinical Question\n2. Bottom Line\n3. Major Points\n4. Guidelines\n5. Design\n6. Population\n6.1 Inclusion Criteria\n6.2 Exclusion Criteria\n6.3 Baseline Characteristics\n7. Interventions\n8. Outcomes\n8.1 Primary Outcome\n8.2 Secondary Outcomes\n9. Funding\n10. Further Reading\n\nClinical Question\n\nDoes mild systemic hypothermia increase the rate of neurologic recovery after resuscitation from cardiac arrest due to ventricular fibrillation?\n\nBottom Line\n\nMild systemic hypothermia (32°C to 34°C) for 24 hours improves neurologic outcomes and reduces mortality at six months among patients successfully resuscitated from cardiac arrest due to ventricular fibrillation.\n\nMajor Points\n\nThe study demonstrated the effectiveness of therapeutic hypothermia in patients who had a cardiac arrest due to ventricular fibrilla

In [202]:
ind = index_containing_substring(df["Wiki"], "sepsis")
ind2 = index_containing_substring(df["Wiki"], "septic")
ind = np.unique(np.asarray(ind+ind2))

print(ind)


[  9  49  51  53  69  74 104 110 122 131 162 163 164 165 166 167 168 169
 170 171 172 173 174 175 176 177 180 181 182 183 185 186 187 188 189 190
 191 192 193 194 195 196 197 198 199 200 201 207 208 209 214 221 222 227
 229 243 248 253 266 268 269 287 305]


In [203]:
ind = index_containing_substring(df["Article"], "sepsis")
ind2 = index_containing_substring(df["Article"], "septic")
ind = np.unique(np.asarray(ind+ind2))

print(ind)


[  4   9  19  24  42  49  51  69 104 122 131 161 162 163 164 168 169 170
 172 173 174 176 177 180 181 182 183 186 187 188 189 190 192 193 194 195
 196 197 198 199 200 201 207 208 212 214 215 218 224 243 252 254 255 266
 269 303 305 307 309 312 322]
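The two cells above each search one column for two spellings and merge the hits with `np.unique`. Comparing the results as sets then shows which rows mention the topic in only one column: row 9, for example, appears in both lists above, while row 4 appears only in the Article list. `rows_mentioning` is a hypothetical standard-library helper that takes several terms at once; the small lists below are made-up stand-ins for `df["Wiki"]` and `df["Article"]`.

```python
def rows_mentioning(texts, terms):
    """Hypothetical helper: sorted positions of items containing any of the terms
    (case-insensitive), like calling index_containing_substring once per term
    and taking np.unique of the combined result."""
    lowered_terms = [t.lower() for t in terms]
    hits = set()
    for i, s in enumerate(texts):
        if isinstance(s, list):  # responses may be stored as lists of strings
            s = " ".join(s)
        text = str(s).lower()
        if any(t in text for t in lowered_terms):
            hits.add(i)
    return sorted(hits)

wikis = ["Severe sepsis trial", "Cardiac arrest", "Septic shock care"]
articles = ["sepsis cohort", "septic patients", "heart failure"]
in_wiki = set(rows_mentioning(wikis, ["sepsis", "septic"]))
in_article = set(rows_mentioning(articles, ["sepsis", "septic"]))
print(sorted(in_wiki), sorted(in_article))  # [0, 2] [0, 1]
print(sorted(in_wiki - in_article))         # [2]  (mentioned in the wiki but not the article)
```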


In [214]:
print(df["Wiki"][191])


 "Effect of hydrocortisone on development of shock among patients with severe sepsis".Journal of the American Medical Association. 2016. 316(17):1775-1785.PubMed•Full text•PDFContents1Clinical Question2Bottom Line3Major Points4Guidelines5Design6Population6.1Inclusion Criteria6.2Exclusion Criteria6.3Baseline Characteristics7Interventions8Outcomes8.1Primary Outcomes8.2Secondary Outcomes8.3Subgroup Analysis8.4Adverse Events9Criticisms10Funding11Further ReadingClinical QuestionIn patients with severe sepsis, does early hydrocortisone therapy reduce progression to septic shock compared to placebo?Bottom LineIn patients with severe sepsis, hydrocortisone in the first 48 hours did not reduce progression to septic shock at 14 days compared to placebo. Hydrocortisone therapy also failed to result in improvement in time to septic shock or short- or intermediate-term mortality. Hydrocortisone was associated with a 10% absolute increase in hyperglycemia and an absolute 13% reduction in delirium.Ma

In [210]:
df["Wiki"][192]


' "A randomized trial of protocol-based care for early septic shock".The New England Journal of Medicine. 2014. 370(10):1683-1693.PubMed•Full text•PDFContents1Clinical Question2Bottom Line3Major Points4Guidelines5Design6Population6.1Inclusion Criteria6.1.1For Research Sites6.1.2For Participants6.2Exclusion Criteria6.2.1For Participants6.3Baseline Characteristics7Interventions8Outcomes8.1Primary Outcome8.2Secondary Outcomes8.3Additional Analyses8.4Additional Analyses8.5Subgroup Analysis8.6Adverse Events9Criticisms10Funding11Further ReadingClinical QuestionAmong patients with early septic shock, is early goal-directed therapy or a novel protocol-based therapy superior to usual care in reducing all-cause in-hospital mortality at 60 days?Bottom LineAmong patients with early septic shock, there was no difference in all-cause in-hospital mortality at 60 days with management driven by early goal-directed therapy, a novel protocol-based therapy, or usual care.Major PointsA dramatic shift in th

In [None]:
wiki_max = max(df.Wiki.map(str).apply(len))
print(f"wiki_max = {wiki_max}")


wiki_max = 18160


In [None]:
article_max = max(df.Article.map(str).apply(len))
print(f"article_max = {article_max}")


article_max = 47729
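These maxima bound the prompt size. Because the two examples are always rows 0 and 1, a real prompt is usually smaller, but a quick upper bound shows whether the longest inputs could approach the model's context window (OpenAI gives 128k tokens for gpt-4-1106-preview). The sketch uses the rough rule of thumb of about 4 characters per token for English text, which is an assumption; an exact count would need the model's tokenizer, for example via the `tiktoken` package. `estimate_tokens` and `worst_case_prompt_tokens` are hypothetical names, and the instruction text is ignored unless passed as `overhead_chars`.

```python
import math

def estimate_tokens(n_chars, chars_per_token=4.0):
    """Rough token estimate from a character count (assumes ~4 characters per token)."""
    return math.ceil(n_chars / chars_per_token)

def worst_case_prompt_tokens(article_max, wiki_max, n_examples=2, overhead_chars=0):
    # n_examples (article + wiki) pairs in the prompt, plus one input article
    chars = n_examples * (article_max + wiki_max) + article_max + overhead_chars
    return estimate_tokens(chars)

# Values printed by the two cells above
print(worst_case_prompt_tokens(article_max=47729, wiki_max=18160))  # 44877
```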
