# What do LLMs know about Plan Net IG without any additional context?

The purpose of this notebook is to test what the LLMs hosted on **MITRE's AI Platform (AIP)** endpoints know about the PlanNet Implementation Guide (IG) when given no additional context. This baseline experiment demonstrates the models' inherent knowledge and serves as a control for our other experiments.

Were the models trained on the PlanNet IG? To find out, we ask each model nine questions about the PlanNet IG and record how it responds.

For more information on MITRE's AIP:

https://ai-platform.pages.mitre.org/resources/llm-endpoints/#using-the-endpoint-api

The best available models for use are:
- Mixtral-8x22B-Instruct-v0.1
- Meta-Llama-3.1-70B-Instruct
- Codestral-22B-v0.1 (Mistral-based)
- CohereForAI/c4ai-command-r-plus

Be sure to check the outages page to see which models are available:

https://uptime.k8s.aip.mitre.org/status/aip
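The four models listed above map to the endpoint URLs used later in this notebook. A minimal sketch of keeping that mapping in one place follows; `AIP_ENDPOINTS` and `endpoint_for` are hypothetical names introduced here for illustration, and the URLs are copied from the client-creation cells below.

```python
# Hypothetical lookup table: model names from the list above,
# endpoint URLs as they appear in the client-creation cells of this notebook.
AIP_ENDPOINTS = {
    "Mixtral-8x22B-Instruct-v0.1": "https://mixtral-8x22b.k8s.aip.mitre.org",
    "Meta-Llama-3.1-70B-Instruct": "https://llama3-70b.k8s.aip.mitre.org",
    "Codestral-22B-v0.1": "https://codestral-22b.k8s.aip.mitre.org",
    "CohereForAI/c4ai-command-r-plus": "https://command-rplus.k8s.aip.mitre.org",
}


def endpoint_for(model_name):
    """Return the endpoint URL for a model, with a readable error if unknown."""
    try:
        return AIP_ENDPOINTS[model_name]
    except KeyError:
        known = ", ".join(sorted(AIP_ENDPOINTS))
        raise KeyError(f"Unknown model {model_name!r}; known models: {known}") from None
```

For example, `endpoint_for("Codestral-22B-v0.1")` returns the Codestral URL, while a misspelled name fails immediately instead of producing a connection error later.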

In [1]:
import os
import google.generativeai as gemini
from anthropic import Anthropic
from openai import OpenAI
import io, threading, time, re, json
import pandas as pd
from dotenv import load_dotenv
import httpx
from IPython.display import display, Markdown
from datetime import datetime
from huggingface_hub import InferenceClient

In [2]:
load_dotenv()

claude_api_key = os.getenv('ANTHROPIC_API_KEY')
gemini_api_key = os.getenv('GEMINI_API_KEY')
openai_api_key = os.getenv('OPENAI_API_KEY')

Each of the following questions is sent to a model on its own, with no shared conversation history:

In [3]:
questions = [
    "What is the purpose of the FHIR DaVinci PlanNet Implementation Guide?",
    "Who are the intended users and actors of the FHIR DaVinci PlanNet Implementation Guide?",
    "Are there one or more workflows defined in the FHIR DaVinci PDex Plan Net Implementation Guide? Please use all the information you know.",
    "What data is being exchanged in the FHIR DaVinci PDex Plan Net Implementation Guide and why?",
    "How is that data represented by the resources and profiles in the FHIR DaVinci PDex Plan Net Implementation Guide?",
    "What actions (REST/CRUD) or operations can be used in the FHIR DaVinci PDex Plan Net Implementation Guide?",
    "What are all the mandatory requirements and rules from the DaVinci PDex Plan Net Implementation Guide for compliant implementations?",
    "What are all the optional requirements and rules from the DaVinci PDex Plan Net Implementation Guide for compliant implementations?",
    "How would you create a test plan for the FHIR DaVinci PDex Plan Net Implementation Guide?"
]


These functions are used to run queries and display results within this notebook.

In [4]:
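# Progress reporter: prints elapsed time while a long-running request is in flight
def heartbeat(stop_event, start_time):
    """Prints elapsed time periodically until stopped."""
    while not stop_event.is_set():
        elapsed = time.time() - start_time
        print(f"... still processing ({elapsed:.1f}s elapsed)")
        # Wait on the event rather than sleeping, so the loop exits promptly once stopped
        stop_event.wait(5)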
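The cells shown here define `heartbeat` but never start it. One possible way to use it is to run it in a background thread around a slow call. The sketch below restates `heartbeat` with an added `interval` parameter so the block is self-contained; `run_with_heartbeat` and `interval` are hypothetical additions, not part of the original notebook.

```python
import threading
import time


def heartbeat(stop_event, start_time, interval=5.0):
    """Print elapsed time every `interval` seconds until `stop_event` is set."""
    while not stop_event.is_set():
        elapsed = time.time() - start_time
        print(f"... still processing ({elapsed:.1f}s elapsed)")
        # Event.wait returns early when the event is set.
        stop_event.wait(interval)


def run_with_heartbeat(task, interval=5.0):
    """Run `task()` while a background thread reports progress (hypothetical helper)."""
    stop_event = threading.Event()
    reporter = threading.Thread(
        target=heartbeat,
        args=(stop_event, time.time(), interval),
        daemon=True,
    )
    reporter.start()
    try:
        return task()
    finally:
        # Always stop the reporter, even if the task raises.
        stop_event.set()
        reporter.join()
```

A call such as `run_with_heartbeat(lambda: ask_mixtral(client, question))` would then print progress lines while the request is pending and return the model's response once it arrives.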

In [5]:
def display_qa(json_file_path):
    """
    Display just the questions and answers in a clean format
    """
    with open(json_file_path, 'r') as f:
        data = json.load(f)
    
    for question, response in data['results'].items():
        display(Markdown(f"## {question}\n"))
        display(Markdown(f"{response}\n"))
        display(Markdown("---\n"))
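`display_qa` renders whatever was stored under each question. If the stored value is a full chat-completion record rather than plain text, the answer has to be pulled out first. The sketch below assumes an OpenAI-style layout (`choices[0].message.content`); that layout is an assumption about the saved JSON, not something the notebook confirms, and `extract_answer_text` is a hypothetical helper.

```python
import json


def extract_answer_text(response):
    """Return the answer text from a saved response (assumed shapes only)."""
    # Plain strings are already the answer.
    if isinstance(response, str):
        return response
    # Assumed OpenAI-style layout: {"choices": [{"message": {"content": "..."}}]}
    try:
        content = response["choices"][0]["message"]["content"]
    except (KeyError, IndexError, TypeError):
        content = None
    if isinstance(content, str):
        return content
    # Unknown shape: show readable JSON instead of guessing at a field.
    return json.dumps(response, indent=2)
```

Inside `display_qa`, the second `display` call could then render `extract_answer_text(response)` instead of the raw value.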

## Mistralai/Mixtral-8x22B-Instruct-v0.1

https://mixtral-8x22b.k8s.aip.mitre.org

In [31]:
output_dir = "Mixtral_responses"
os.makedirs(output_dir, exist_ok=True)

In [7]:
from huggingface_hub import InferenceClient

In [32]:
def create_mixtral_client():
    return InferenceClient(model="https://mixtral-8x22b.k8s.aip.mitre.org")


In [33]:
def ask_mixtral(client, question, max_retries=3):
    """Ask Mixtral a question with retry logic"""
    for attempt in range(max_retries):
        try:
            messages = [{"role": "user", "content": question}]
            response = client.chat_completion(messages, max_tokens=60000)
            return response
        except Exception as e:
            if attempt == max_retries - 1:
                raise e
            print(f"Attempt {attempt + 1} failed. Retrying...")
            time.sleep(2 ** attempt)
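The retry loop above waits `2 ** attempt` seconds between attempts, so with `max_retries=3` it pauses 1 s after the first failure, 2 s after the second, and re-raises on the third. The same pattern can be sketched independently of any model client; `call_with_retries` and its injectable `sleep` parameter are hypothetical names used only for this illustration.

```python
import time


def call_with_retries(fn, max_retries=3, sleep=time.sleep):
    """Call `fn()` with exponential backoff; re-raise after the last attempt."""
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of attempts: surface the original error
            print(f"Attempt {attempt + 1} failed. Retrying...")
            sleep(2 ** attempt)  # 1 s, 2 s, 4 s, ...


# Demonstration with a function that fails twice and then succeeds.
# The delays are recorded instead of slept so the example runs instantly.
recorded_delays = []
call_count = {"n": 0}


def flaky():
    call_count["n"] += 1
    if call_count["n"] < 3:
        raise RuntimeError("temporary failure")
    return "ok"


outcome = call_with_retries(flaky, sleep=recorded_delays.append)
```

After this runs, `outcome` is `"ok"` and `recorded_delays` is `[1, 2]`. Passing `sleep` as a parameter keeps the backoff schedule easy to check without waiting.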

In [35]:
def analyze_all_questions_mixtral():
    """Run analysis for all questions and save results with timestamp"""    
    results = {}
    client = create_mixtral_client()
        
    # Generate timestamp for the filename
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    output_file = os.path.join(output_dir, f"mixtral_plannet_analysis_{timestamp}.json")
    
    for question in questions:
        print(f"\nAnalyzing question: {question}")
        response = ask_mixtral(client, question)
        results[question] = response
        
        # Save results after each question
        with open(output_file, 'w') as f:
            final_output = {
                "metadata": {
                    "timestamp": timestamp,
                    "generation_date": datetime.now().isoformat()
                },
                "results": results
            }
            json.dump(final_output, f, indent=2)
            
        time.sleep(2)  # Small delay between questions
        
    print(f"\nResults saved to: {output_file}")
    return results
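The per-model analysis functions in this notebook differ only in the client, the output directory, and the filename prefix. A model-agnostic version might look like the sketch below; `analyze_all_questions`, `ask_fn`, `prefix`, and `pause` are hypothetical names, and the saved JSON layout mirrors the one used above.

```python
import json
import os
import time
from datetime import datetime


def analyze_all_questions(ask_fn, questions, output_dir, prefix, pause=2.0):
    """Ask every question via `ask_fn(question)` and checkpoint results to JSON."""
    os.makedirs(output_dir, exist_ok=True)
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    output_file = os.path.join(output_dir, f"{prefix}_plannet_analysis_{timestamp}.json")

    results = {}
    for question in questions:
        print(f"\nAnalyzing question: {question}")
        results[question] = ask_fn(question)

        # Rewrite the file after each question so partial runs are not lost.
        with open(output_file, "w") as f:
            json.dump(
                {
                    "metadata": {
                        "timestamp": timestamp,
                        "generation_date": datetime.now().isoformat(),
                    },
                    "results": results,
                },
                f,
                indent=2,
            )
        time.sleep(pause)  # small delay between questions

    print(f"\nResults saved to: {output_file}")
    return results, output_file
```

With this shape, the Mixtral run could be expressed as `analyze_all_questions(lambda q: ask_mixtral(client, q), questions, "Mixtral_responses", "mixtral")`, and the other models would change only the three arguments.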

In [36]:
results = analyze_all_questions_mixtral()


Analyzing question: What is the purpose of the FHIR DaVinci PlanNet Implementation Guide?

Analyzing question: Who are the intended users and actors of the FHIR DaVinci PlanNet Implementation Guide?

Analyzing question: Are there one or more workflows defined in the FHIR DaVinci PDex Plan Net Implementation Guide? Please use all the information you know.

Analyzing question: What data is being exchanged in the FHIR DaVinci PDex Plan Net Implementation Guide and why?

Analyzing question: How is that data represented by the resources and profiles in the FHIR DaVinci PDex Plan Net Implementation Guide?

Analyzing question: What actions (REST/CRUD) or operations can be used in the FHIR DaVinci PDex Plan Net Implementation Guide?

Analyzing question: What are all the mandatory requirements and rules from the DaVinci PDex Plan Net Implementation Guide for compliant implementations?

Analyzing question: What are all the optional requirements and rules from the DaVinci PDex Plan Net Implement

## meta-llama/Llama-3.1-70B-Instruct

In [39]:
output_dir = "Llama_responses"
os.makedirs(output_dir, exist_ok=True)

In [37]:
def create_llama_client():
    return InferenceClient(model="https://llama3-70b.k8s.aip.mitre.org")

In [38]:
def ask_llama(client, question, max_retries=3):
    """Ask Llama a question with retry logic"""
    for attempt in range(max_retries):
        try:
            messages = [{"role": "user", "content": question}]
            response = client.chat_completion(messages, max_tokens=4192)
            return response
        except Exception as e:
            if attempt == max_retries - 1:
                raise e
            print(f"Attempt {attempt + 1} failed. Retrying...")
            time.sleep(2 ** attempt)

In [40]:
def analyze_all_questions_llama():
    """Run analysis for all questions and save results with timestamp"""    
    results = {}
    client = create_llama_client()
        
    # Generate timestamp for the filename
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    output_file = os.path.join(output_dir, f"llama_plannet_analysis_{timestamp}.json")
    
    for question in questions:
        print(f"\nAnalyzing question: {question}")
        response = ask_llama(client, question)
        results[question] = response
        
        # Save results after each question
        with open(output_file, 'w') as f:
            final_output = {
                "metadata": {
                    "timestamp": timestamp,
                    "generation_date": datetime.now().isoformat()
                },
                "results": results
            }
            json.dump(final_output, f, indent=2)
            
        time.sleep(2)  # Small delay between questions
        
    print(f"\nResults saved to: {output_file}")
    return results

In [41]:
results = analyze_all_questions_llama()


Analyzing question: What is the purpose of the FHIR DaVinci PlanNet Implementation Guide?

Analyzing question: Who are the intended users and actors of the FHIR DaVinci PlanNet Implementation Guide?

Analyzing question: Are there one or more workflows defined in the FHIR DaVinci PDex Plan Net Implementation Guide? Please use all the information you know.

Analyzing question: What data is being exchanged in the FHIR DaVinci PDex Plan Net Implementation Guide and why?

Analyzing question: How is that data represented by the resources and profiles in the FHIR DaVinci PDex Plan Net Implementation Guide?

Analyzing question: What actions (REST/CRUD) or operations can be used in the FHIR DaVinci PDex Plan Net Implementation Guide?

Analyzing question: What are all the mandatory requirements and rules from the DaVinci PDex Plan Net Implementation Guide for compliant implementations?

Analyzing question: What are all the optional requirements and rules from the DaVinci PDex Plan Net Implement

## mistralai/Codestral-22B-v0.1

In [42]:
output_dir = "Codestral_responses"
os.makedirs(output_dir, exist_ok=True)

In [43]:
def create_codestral_client():
    return InferenceClient(model="https://codestral-22b.k8s.aip.mitre.org")

In [44]:
def ask_codestral(client, question, max_retries=3):
    """Ask Codestral a question with retry logic"""
    for attempt in range(max_retries):
        try:
            messages = [{"role": "user", "content": question}]
            response = client.chat_completion(messages, max_tokens=4192)
            return response
        except Exception as e:
            if attempt == max_retries - 1:
                raise e
            print(f"Attempt {attempt + 1} failed. Retrying...")
            time.sleep(2 ** attempt)

In [45]:
def analyze_all_questions_codestral():
    """Run analysis for all questions and save results with timestamp"""    
    results = {}
    client = create_codestral_client()
        
    # Generate timestamp for the filename
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    output_file = os.path.join(output_dir, f"codestral_plannet_analysis_{timestamp}.json")
    
    for question in questions:
        print(f"\nAnalyzing question: {question}")
        response = ask_codestral(client, question)
        results[question] = response
        
        # Save results after each question
        with open(output_file, 'w') as f:
            final_output = {
                "metadata": {
                    "timestamp": timestamp,
                    "generation_date": datetime.now().isoformat()
                },
                "results": results
            }
            json.dump(final_output, f, indent=2)
            
        time.sleep(2)  # Small delay between questions
        
    print(f"\nResults saved to: {output_file}")
    return results

In [46]:
results = analyze_all_questions_codestral()


Analyzing question: What is the purpose of the FHIR DaVinci PlanNet Implementation Guide?

Analyzing question: Who are the intended users and actors of the FHIR DaVinci PlanNet Implementation Guide?

Analyzing question: Are there one or more workflows defined in the FHIR DaVinci PDex Plan Net Implementation Guide? Please use all the information you know.

Analyzing question: What data is being exchanged in the FHIR DaVinci PDex Plan Net Implementation Guide and why?

Analyzing question: How is that data represented by the resources and profiles in the FHIR DaVinci PDex Plan Net Implementation Guide?

Analyzing question: What actions (REST/CRUD) or operations can be used in the FHIR DaVinci PDex Plan Net Implementation Guide?

Analyzing question: What are all the mandatory requirements and rules from the DaVinci PDex Plan Net Implementation Guide for compliant implementations?

Analyzing question: What are all the optional requirements and rules from the DaVinci PDex Plan Net Implement

## CohereForAI/c4ai-command-r-plus

In [47]:
output_dir = "Cohere_responses"
os.makedirs(output_dir, exist_ok=True)

In [48]:
def create_cohere_client():
    return InferenceClient(model="https://command-rplus.k8s.aip.mitre.org")

In [49]:
def ask_cohere(client, question, max_retries=3):
    """Ask Cohere a question with retry logic"""
    for attempt in range(max_retries):
        try:
            messages = [{"role": "user", "content": question}]
            response = client.chat_completion(messages, max_tokens=4192)
            return response
        except Exception as e:
            if attempt == max_retries - 1:
                raise e
            print(f"Attempt {attempt + 1} failed. Retrying...")
            time.sleep(2 ** attempt)

In [50]:
def analyze_all_questions_cohere():
    """Run analysis for all questions and save results with timestamp"""    
    results = {}
    client = create_cohere_client()
        
    # Generate timestamp for the filename
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    output_file = os.path.join(output_dir, f"cohere_plannet_analysis_{timestamp}.json")
    
    for question in questions:
        print(f"\nAnalyzing question: {question}")
        response = ask_cohere(client, question)
        results[question] = response
        
        # Save results after each question
        with open(output_file, 'w') as f:
            final_output = {
                "metadata": {
                    "timestamp": timestamp,
                    "generation_date": datetime.now().isoformat()
                },
                "results": results
            }
            json.dump(final_output, f, indent=2)
            
        time.sleep(2)  # Small delay between questions
        
    print(f"\nResults saved to: {output_file}")
    return results

In [51]:
results = analyze_all_questions_cohere()


Analyzing question: What is the purpose of the FHIR DaVinci PlanNet Implementation Guide?

Analyzing question: Who are the intended users and actors of the FHIR DaVinci PlanNet Implementation Guide?

Analyzing question: Are there one or more workflows defined in the FHIR DaVinci PDex Plan Net Implementation Guide? Please use all the information you know.

Analyzing question: What data is being exchanged in the FHIR DaVinci PDex Plan Net Implementation Guide and why?

Analyzing question: How is that data represented by the resources and profiles in the FHIR DaVinci PDex Plan Net Implementation Guide?

Analyzing question: What actions (REST/CRUD) or operations can be used in the FHIR DaVinci PDex Plan Net Implementation Guide?

Analyzing question: What are all the mandatory requirements and rules from the DaVinci PDex Plan Net Implementation Guide for compliant implementations?

Analyzing question: What are all the optional requirements and rules from the DaVinci PDex Plan Net Implement