## Running prompts on OpenAI

In [5]:
%pip install markdown
%pip install annoy
%pip install openai

Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.


In [1]:
from openai import OpenAI # OpenAI API
import json
import requests # to download some resources
import os # file operations
import numpy as np # linear algebra
import pandas as pd # data processing
from markdown import markdown # to render markdown
from IPython.display import Markdown
import annoy # Approximate Nearest Neighbors Oh Yeah for fast searching
import pickle
from annoy import AnnoyIndex

In [2]:
# read in our api key
with open('../api-keys/our_api_key.txt', 'r') as file:
    api_key = file.read().replace('\n', '')

# read in finns api key (the one we'll use for testing)
with open('../api-keys/finns_api_key.txt', 'r') as file:
    finns_api_key = file.read().replace('\n', '')

In [3]:
# Setting the OpenAI
client = OpenAI(api_key=finns_api_key) # using finns for testing

#### Sample of countries

In [4]:
# Sample of countries
sample_countries = ['Finland','Netherlands','Denmark','Norway', 'Chile', 'Bangladesh']

# Folder with raw PDFs
folder_mds = "../data/3-naps-md"

# Importing MDs files
for file in os.listdir(folder_mds):
    for country in sample_countries:
        if country in file:  
            with open(os.path.join(folder_mds, file), "r", encoding="utf-8") as md_file:
                content = md_file.read()
            globals()[country.lower()] = content # Saving the MD file in lowercase


### 1. Using the WHOLE document

#### * Asking OpenAI -- Example: Norway ||| Question: About policies/objectives

In [5]:
country_test = netherlands
country_name = "netherlands"

In [6]:
system_prompt = """
You are an assistant that extracts and summarizes national action plans for antimicrobial resistance (NAPs). Your task is to answer the following questions concisely while relying strictly on the document content. 

For each question, provide:
1. A 'yes' or 'no' answer.
2. The specific text from the document that supports the answer.

Here are the questions:

1. What is the period considered for the NAP? (Provide the exact years or date range)
2. Does the NAP include any strategy/policy/goal/action related to the priority sector of "Human Health"? (yes/no + supporting text)
3. Does the NAP include any strategy/policy/goal/action related to the priority sector of "Animal Health"? (yes/no + supporting text)
4. Does the NAP include any strategy/policy/goal/action related to the priority sector of "Environment"? (yes/no + supporting text)
5. Does the NAP include any strategy/policy/goal/action related to the priority sector of "Agriculture/Food Security"? (yes/no + supporting text)
6. Does the NAP include any mechanisms for progress reporting to track how its objectives are being met, such as an annual or semi-annual progress report, a dashboard displaying the status, or similar documents/tools? (yes/no + supporting text)
7. Does the NAP include any specific, time-bound targets to track progress? (yes/no + supporting text)
8. Does the NAP specify any budget allocation assigned for the strategies/policies/goals/actions directed at tackling AMR? (yes/no + supporting text)
9. Does the NAP establish a multisectoral committee or task force for AMR coordination? (yes/no + supporting text)
10. Does the NAP include any strategy/policy/goal/action related to training and educational initiatives to increase awareness about AMR – in school curricula or professional training programs for doctors and pharmacists? (yes/no + supporting text)
11. Does the NAP include any strategy/policy/goal/action related to public awareness campaigns in combating AMR, such as media involvement and community engagement? (yes/no + supporting text)
12. Does the NAP include any strategy/policy/goal/action related to international collaboration and support for AMR initiatives, such as partnerships with organizations like WHO, FAO, and OIE? (yes/no + supporting text)
13. Does the NAP include any strategy/policy/goal/action related to data collection and analysis for AMR surveillance, such as developing surveillance systems, data collection tools, setting up laboratories for AMR monitoring, or other similar actions? (yes/no + supporting text)
14. Does the NAP include any strategy/policy/goal/action related to developing the capacity to detect and report newly emerged resistance that may constitute a public health emergency of international concern, as required by the International Health Regulations (2005)? (yes/no + supporting text)
15. Does the NAP include any strategy/policy/goal/action related to specific prevention strategies such as sanitation, hygiene measures, and infection control? (yes/no + supporting text)
16. Does the NAP include any strategy/policy/goal/action related to implementing the standards outlined in the OIE Terrestrial and Aquatic Animal Health Codes or the FAO/WHO Codex Alimentarius Code of Practice to Minimize and Contain Antimicrobial Resistance? (yes/no + supporting text)
17. Does the NAP include any strategy/policy/goal/action related to regulating the distribution, prescription, and dispensation of antibiotics, such as developing or maintaining essential medicine lists for antibiotics? (yes/no + supporting text)
18. Does the NAP include any strategy/policy/goal/action related to strengthening the legislative and regulatory framework for AMR, including laws and regulations for antimicrobial use? (yes/no + supporting text)
19. Does the NAP include any strategy/policy/goal/action related to requiring the rational use of antibiotics through regulations/laws? (yes/no + supporting text)
20. Does the NAP include any strategy/policy/goal/action related to participating in international collaborative research to support the development of new medicines, diagnostic tools, and vaccines? (yes/no + supporting text)

Format the response as a JSON object:
{
  "period": "XXXX-YYYY",
  "questions": [
    {
      "question": "Does the NAP include any strategy/policy/goal/action related to the priority sector of 'Human Health'?",
      "answer": "yes",
      "supporting_text": "The NAP mentions specific actions in the human health sector, including..."
    },
    ...
  ]
}
"""

In [7]:
response = client.chat.completions.create(
    model="gpt-4o-mini",
    temperature=0,  # Ensures deterministic output
    top_p=0.5,  # Controls diversity
    response_format={"type": "json_object"},
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": country_test}
    ]
)

# Print the structured JSON response
print(response.choices[0].message.content)

{
  "period": "2015-2019",
  "questions": [
    {
      "question": "Does the NAP include any strategy/policy/goal/action related to the priority sector of 'Human Health'?",
      "answer": "yes",
      "supporting_text": "The approach to antibiotic resistance addresses all domains where human health is threatened by antibiotic-resistant bacteria; healthcare, animals, food and environment."
    },
    {
      "question": "Does the NAP include any strategy/policy/goal/action related to the priority sector of 'Animal Health'?",
      "answer": "yes",
      "supporting_text": "The goal is reduction and prudent use of antibiotics in animal farming in order to limit the development of resistance where possible."
    },
    {
      "question": "Does the NAP include any strategy/policy/goal/action related to the priority sector of 'Environment'?",
      "answer": "yes",
      "supporting_text": "Knowledge about the development of, the occurrence of and the spread of antibiotic resistance in p

In [8]:
import json

# Extract response content
response_json = json.loads(response.choices[0].message.content)

# Save response as JSON
output_path = f"../data/4-summary-responses-json/summary_response_{country_name}.json"
with open(output_path, "w") as f:
    json.dump(response_json, f, indent=4)

print(f"Saved response to {output_path}")

Saved response to ../data/4-summary-responses-json/summary_response_netherlands.json


In [17]:
# Load the annotations
annotations_df = pd.read_excel('../data/5-annotations/annotations.xlsx')

# Filter for the current country
annotations_df = annotations_df[annotations_df['country'] == country_name]

In [18]:
annotations_df

Unnamed: 0,country,question,answer,comment,Supporting text (True answer is Yes but prediction is No)
63,netherlands,year_star,2015,,
64,netherlands,year_end,2019,,
65,netherlands,1,yes,,Actors agree to the following goals: The numbe...
66,netherlands,2,yes,,The reduction of antibiotic use has flattened....
67,netherlands,3,yes,,"Despite these uncertainties, the research by Z..."
68,netherlands,4,no,,
69,netherlands,5,yes,,We will regularly inform Parliament of the pro...
70,netherlands,6,yes,,The number of avoidable healthcare-associated ...
71,netherlands,7,no,,
72,netherlands,8,no,,


In [19]:
# Extract period and questions from the model response
model_answers = [{"country": country_name, "question": "period_start", "answer": response_json["period"].split("-")[0]}]
model_answers.append({"country": country_name, "question": "period_end", "answer": response_json["period"].split("-")[1]})

for q in response_json["questions"]:
    model_answers.append({
        "country": country_name,
        "question": q["question"],
        "answer": q["answer"]
    })

# Convert to DataFrame
model_answers_df = pd.DataFrame(model_answers)


In [20]:
model_answers_df

Unnamed: 0,country,question,answer
0,netherlands,period_start,2015
1,netherlands,period_end,2019
2,netherlands,Does the NAP include any strategy/policy/goal/...,yes
3,netherlands,Does the NAP include any strategy/policy/goal/...,yes
4,netherlands,Does the NAP include any strategy/policy/goal/...,yes
5,netherlands,Does the NAP include any strategy/policy/goal/...,yes
6,netherlands,Does the NAP include any mechanisms for progre...,yes
7,netherlands,"Does the NAP include any specific, time-bound ...",yes
8,netherlands,Does the NAP specify any budget allocation ass...,no
9,netherlands,Does the NAP establish a multisectoral committ...,no


In [23]:
question_mapping = {
    "Does the NAP include any strategy/policy/goal/action related to the priority sector of 'Human Health'?": "1",
    "Does the NAP include any strategy/policy/goal/action related to the priority sector of 'Animal Health'?": "2",
    "Does the NAP include any strategy/policy/goal/action related to the priority sector of 'Environment'?": "3",
    "Does the NAP include any strategy/policy/goal/action related to the priority sector of 'Agriculture/Food Security'?": "4",
    "Does the NAP include any mechanisms for progress reporting to track how its objectives are being met, such as an annual or semi-annual progress report, a dashboard displaying the status, or similar documents/tools?": "5",
    "Does the NAP include any specific, time-bound targets to track progress?": "6",
    "Does the NAP specify any budget allocation assigned for the strategies/policies/goals/actions directed at tackling AMR?": "7",
    "Does the NAP establish a multisectoral committee or task force for AMR coordination?": "8",
    "Does the NAP include any strategy/policy/goal/action related to training and educational initiatives to increase awareness about AMR – in school curricula or professional training programs for doctors and pharmacists?": "9",
    "Does the NAP include any strategy/policy/goal/action related to public awareness campaigns in combating AMR, such as media involvement and community engagement?": "10",
    "Does the NAP include any strategy/policy/goal/action related to international collaboration and support for AMR initiatives, such as partnerships with organizations like WHO, FAO, and OIE?": "11",
    "Does the NAP include any strategy/policy/goal/action related to data collection and analysis for AMR surveillance, such as developing surveillance systems, data collection tools, setting up laboratories for AMR monitoring, or other similar actions?": "12",
    "Does the NAP include any strategy/policy/goal/action related to developing the capacity to detect and report newly emerged resistance that may constitute a public health emergency of international concern, as required by the International Health Regulations (2005)?": "13",
    "Does the NAP include any strategy/policy/goal/action related to specific prevention strategies such as sanitation, hygiene measures, and infection control?": "14",
    "Does the NAP include any strategy/policy/goal/action related to implementing the standards outlined in the OIE Terrestrial and Aquatic Animal Health Codes or the FAO/WHO Codex Alimentarius Code of Practice to Minimize and Contain Antimicrobial Resistance?": "15",
    "Does the NAP include any strategy/policy/goal/action related to regulating the distribution, prescription, and dispensation of antibiotics, such as developing or maintaining essential medicine lists for antibiotics?": "16",
    "Does the NAP include any strategy/policy/goal/action related to strengthening the legislative and regulatory framework for AMR, including laws and regulations for antimicrobial use?": "17",
    "Does the NAP include any strategy/policy/goal/action related to requiring the rational use of antibiotics through regulations/laws?": "18",
    "Does the NAP include any strategy/policy/goal/action related to participating in international collaborative research to support the development of new medicines, diagnostic tools, and vaccines?": "19"
}

model_answers_df["question"] = model_answers_df["question"].replace(question_mapping)

In [24]:
# Merge based on country and question
merged_df = pd.merge(annotations_df, model_answers_df, on=["country", "question"], suffixes=('_gs', '_llm'))


In [25]:
merged_df

Unnamed: 0,country,question,answer_gs,comment,Supporting text (True answer is Yes but prediction is No),answer_llm
0,netherlands,1,yes,,Actors agree to the following goals: The numbe...,yes
1,netherlands,2,yes,,The reduction of antibiotic use has flattened....,yes
2,netherlands,3,yes,,"Despite these uncertainties, the research by Z...",yes
3,netherlands,4,no,,,yes
4,netherlands,5,yes,,We will regularly inform Parliament of the pro...,yes
5,netherlands,6,yes,,The number of avoidable healthcare-associated ...,yes
6,netherlands,7,no,,,no
7,netherlands,8,no,,,no
8,netherlands,9,yes,,"For professionals, the strategy consists of sh...",yes
9,netherlands,10,yes,,This strategy includes:\n· Increasing public a...,yes


In [26]:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Convert categorical yes/no answers to binary (1 for yes, 0 for no)
merged_df['gs_binary'] = merged_df['answer_gs'].apply(lambda x: 1 if x == 'yes' else (0 if x == 'no' else None))
merged_df['llm_binary'] = merged_df['answer_llm'].apply(lambda x: 1 if x == 'yes' else (0 if x == 'no' else None))

# Drop rows with NaN values in either gs_binary or llm_binary for accurate calculations
yesno_df = merged_df[~merged_df['question'].isin(["period_start", "period_end"])]
yesno_df = yesno_df.dropna(subset=['gs_binary', 'llm_binary'])

# Performance for yes/no questions
accuracy = accuracy_score(yesno_df['gs_binary'], yesno_df['llm_binary'])
precision = precision_score(yesno_df['gs_binary'], yesno_df['llm_binary'])
recall = recall_score(yesno_df['gs_binary'], yesno_df['llm_binary'])
f1 = f1_score(yesno_df['gs_binary'], yesno_df['llm_binary'])

# Performance for period extraction (period_start, period_end)
period_matches = merged_df[merged_df['question'].isin(["period_start", "period_end"])]

# For period extraction, we don't need binary conversion, so we directly compare the answers
period_matches['period_match'] = period_matches['answer_gs'] == period_matches['answer_llm']

# Calculate period extraction accuracy as the percentage of exact matches
period_accuracy = period_matches['period_match'].mean()  # Exact match percentage

# Questions where the model failed
questions_failed = merged_df[merged_df['gs_binary'] != merged_df['llm_binary']]['question'].unique()

# Print results
print(f"Accuracy (Yes/No Questions): {accuracy:.2f}")
print(f"Precision (Yes/No Questions): {precision:.2f}")
print(f"Recall (Yes/No Questions): {recall:.2f}")
print(f"F1 Score (Yes/No Questions): {f1:.2f}")
print(f"Period Extraction Accuracy: {period_accuracy:.2f}")
print(f"Questions where the model failed: {questions_failed}")

Accuracy (Yes/No Questions): 0.79
Precision (Yes/No Questions): 0.73
Recall (Yes/No Questions): 1.00
F1 Score (Yes/No Questions): 0.85
Period Extraction Accuracy: nan


In [115]:
# see how many tokens that last query used
response.usage

CompletionUsage(completion_tokens=1346, prompt_tokens=4538, total_tokens=5884, completion_tokens_details=CompletionTokensDetails(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0), prompt_tokens_details=PromptTokensDetails(audio_tokens=0, cached_tokens=0))