<a href="https://colab.research.google.com/github/paulokuriki/agentic-researcher-flashcards/blob/main/agentic_researcher_flashcards.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 🤖🩺 **AI on Scrubs**

## **Agentic AI for Radiology Research & Education**

Welcome to this interactive notebook on **Agentic AI**, where we explore how intelligent agents can collaborate autonomously to solve complex, domain-specific tasks—in this case, within **radiology**.

---

### 🎯 **What is Agentic AI?**

Agentic AI refers to systems composed of autonomous agents, each with a specific role, that work together to accomplish multi-step objectives. These agents can:

- Search for information
- Analyze complex data
- Generate structured outputs
- Coordinate sequential workflows

This framework is especially powerful in **medical research, education, and automation**.
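
As a rough illustration of the sequential coordination listed above, the sketch below chains three plain Python functions, each standing in for one agent. The `Step` type, the three agent functions, and the canned strings are hypothetical and involve no LLM or web access; they only show how each step's output becomes the next step's input.

```python
from typing import Callable, List

# Hypothetical stand-in: an "agent" is just a function from input text to output text.
Step = Callable[[str], str]

def search_agent(topic: str) -> str:
    return f"articles about {topic}"

def analysis_agent(found: str) -> str:
    return f"analysis of {found}"

def synthesis_agent(analysis: str) -> str:
    return f"review based on {analysis}"

def run_sequential(steps: List[Step], initial_input: str) -> str:
    """Run each step in order, feeding every output into the next step."""
    context = initial_input
    for step in steps:
        context = step(context)
    return context

result = run_sequential([search_agent, analysis_agent, synthesis_agent], "lung nodules")
print(result)
```

In the notebook itself this role is played by CrewAI agents and tasks rather than bare functions.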

---

## 🛠️ Tasks We’ll Perform in This Notebook:

### 📚 **Part 1: Literature Review with Agentic AI**

We will use a team of intelligent agents to:
- 🔍 Search PubMed for recent and relevant studies on a selected radiology topic.
- 🧪 Analyze each study in depth (methods, results, limitations).
- 🧠 Synthesize a **comprehensive literature review** with insights and references.
- 💾 Export the result to a structured Word document.

---

### 🧾 **Part 2: Flashcard Generation for Board Review**

Another agentic crew will:
- 🔎 Search Radiopaedia for high-yield educational articles on a radiology topic.
- 📋 Extract board-relevant information from each article.
- ❓ Generate 10–15 **high-quality flashcards** in Q&A format.
- 📽️ Compile the flashcards into a downloadable PowerPoint deck.

---

# 🔐 **Before We Start:** Setting Up API Keys

This notebook uses **Agentic AI** to automate radiology research and education workflows. To function properly, it requires **two API keys**:

1. [**APIFY API Token**](https://apify.com/) – for intelligent web browsing and scraping from sites like PubMed and Radiopaedia  
2. [**Gemini API Key**](https://aistudio.google.com/app/apikey) – for running the language model that powers your AI agents  

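> 🔒 The keys are held only in this notebook session's memory and are sent only to the Apify and Gemini services when the agents run. They are **not saved in the notebook file**.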

---

### ❓ Why These Keys?

- **APIFY** lets your agents simulate browser sessions to extract full article content from the web  
- **Gemini** provides the LLM intelligence for reasoning, summarizing, analyzing, and generating content

---

### ⚙️ Get Your **APIFY API Token**

1. Go to [Apify Console](https://console.apify.com/)
2. Log in or sign up  
3. Click **Settings** → **API & Integrations**  
4. Copy your **Personal API Token**  
5. Store it somewhere safe  

---

### 🤖 Get Your **Gemini API Key**

1. Visit [Google AI Studio](https://aistudio.google.com/app/apikey)  
2. Log in with your Google account  
3. Click **Get API Key** and agree to terms  
4. Copy the generated key  

> 💡 Free-tier usage should be enough for this notebook.

---

### 🔁 Use the Keys in the Notebook

Paste your keys when prompted in the **Enter Your API Keys** cell further down.

---

## 📦 Install Required Packages

Run the following cell to install all necessary libraries for this notebook.

> ⚠️ You may see a long list of packages and dependency warnings; this is expected.  
> 🔄 After installation, **restart the runtime** (kernel) so the newly installed package versions are loaded.


In [None]:
%pip install "crewai[tools]" langchain-apify python-docx python-pptx markdown beautifulsoup4 langchain-google-genai

## 🔑 Enter Your API Keys

Paste the API keys you created earlier when prompted below. The input is hidden as you type and is not saved with the notebook.

> 🔒 These inputs are hidden for your privacy.
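
Because hidden input gives no visual feedback, an empty or whitespace-only paste is easy to miss. The sketch below shows one possible check to run on each collected value; `require_key` is a hypothetical helper that is not part of the notebook, and the sample value is a placeholder rather than a real credential.

```python
def require_key(name: str, value: str) -> str:
    """Return the stripped key, or raise if it is empty or whitespace-only."""
    cleaned = (value or "").strip()
    if not cleaned:
        raise ValueError(f"{name} is empty; re-run the cell and paste the key.")
    return cleaned

# Example with a placeholder value (not a real credential)
print(require_key("APIFY_API_TOKEN", "  example-token  "))
```

Applied to the values returned by `getpass.getpass`, a check like this would fail early instead of surfacing later as an authentication error.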

In [None]:
import getpass

# Securely collect APIFY_API_TOKEN
APIFY_API_TOKEN = getpass.getpass("Enter your APIFY API Token: ")

# Securely collect GEMINI_API_KEY
GEMINI_API_KEY = getpass.getpass("Enter your GEMINI API Key: ")

# Confirm collection without echoing the key values
print("API keys collected securely.")


## 📚 Import Libraries & Set API Keys

Import all required libraries and set your API keys for use.

In [None]:
# --------------------------
# 📦 Import necessary libraries
# --------------------------

import warnings

# Suppress all warnings globally (e.g., Pydantic deprecation messages)
warnings.filterwarnings("ignore")

import os
import re
import time
import random
import json

from typing import List, Dict, Any

from crewai import Agent, Task, Crew, Process, LLM
#from langchain.agents import Tool
from crewai.tools import tool
from crewai_tools import ApifyActorsTool

from IPython.display import Markdown, display
import markdown
from bs4 import BeautifulSoup
from docx import Document
from docx.enum.text import WD_PARAGRAPH_ALIGNMENT
# Pt is imported from pptx.util below; importing it from docx.shared as well would be shadowed
from docx.shared import RGBColor as DocxRGBColor
from docx.oxml.ns import qn

from pptx import Presentation
from pptx.util import Inches, Pt
from pptx.dml.color import RGBColor as PptxRGBColor

# --------------------------
# 🔐 Set API Keys securely
# --------------------------

# These should already be defined using getpass earlier in the notebook
# Avoid hardcoding sensitive information
os.environ["APIFY_API_TOKEN"] = APIFY_API_TOKEN
os.environ["GEMINI_API_KEY"] = GEMINI_API_KEY

# --------------------------
# ⚙️ Define system constants
# --------------------------

# Max requests per minute (to respect API rate limits)
MAX_RPM = 20

# Max retry attempts for failed API calls
MAX_RETRY_LIMIT = 10
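
The two constants above cap how often requests are sent and how many times a failed call is retried. The sketch below illustrates both ideas in plain Python; it is not how CrewAI enforces `max_rpm` or `max_retry_limit` internally, and `call_with_retries` and `min_interval_seconds` are hypothetical names introduced only for this example.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def call_with_retries(
    func: Callable[[], T],
    max_retries: int = 10,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on any exception with exponentially growing delays."""
    for attempt in range(max_retries + 1):
        try:
            return func()
        except Exception:
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            sleep(base_delay * (2 ** attempt))

def min_interval_seconds(max_rpm: int) -> float:
    """Smallest spacing between requests that stays within max_rpm."""
    return 60.0 / max_rpm

print(min_interval_seconds(20))  # 3.0
```

With a limit of 20 requests per minute, requests need to be at least three seconds apart on average.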


## 🧰 Utility Functions

Helper functions to display markdown, save Word docs, and generate flashcard PowerPoints.
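
Agent output is free text, so the flashcard JSON has to be located and checked before it can be turned into slides. The sketch below is one hedged way to do that: `parse_flashcards` is a hypothetical helper, separate from the notebook's own `extract_json_from_output`, that also drops entries missing a `question` or `answer` string.

```python
import json
import re
from typing import Dict, List

# Three backticks, built programmatically to avoid a literal fence inside this example
FENCE = "`" * 3

def parse_flashcards(raw_output: str) -> List[Dict[str, str]]:
    """Pull a JSON array out of raw text and keep only well-formed cards."""
    # Prefer the contents of a fenced json block; otherwise scan the whole text
    fenced = re.search(FENCE + r"json\s*([\s\S]*?)\s*" + FENCE, raw_output)
    candidate = fenced.group(1) if fenced else raw_output
    start, end = candidate.find("["), candidate.rfind("]")
    if start == -1 or end <= start:
        return []
    try:
        data = json.loads(candidate[start:end + 1])
    except json.JSONDecodeError:
        return []
    if not isinstance(data, list):
        return []
    return [
        {"question": card["question"], "answer": card["answer"]}
        for card in data
        if isinstance(card, dict)
        and isinstance(card.get("question"), str)
        and isinstance(card.get("answer"), str)
    ]

sample = (
    f"Here are the cards:\n{FENCE}json\n"
    '[{"question": "Q1", "answer": "A1"}, {"question": "no answer"}]\n'
    f"{FENCE}"
)
print(parse_flashcards(sample))
```

Filtering incomplete entries up front avoids a `KeyError` later when slides read `card['question']` and `card['answer']`.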

In [None]:
# === Display and Export Utilities ===

def display_markdown(text):
    """
    Render markdown text in a Jupyter or Colab notebook.

    Falls back to printing the raw markdown when IPython is not installed
    (for example, in a plain Python script).
    """
    try:
        from IPython.display import Markdown, display
        display(Markdown(text))
    except ImportError:
        print("IPython not available. This function is intended for Jupyter notebooks only.")
        print("Preview of markdown:\n", text)

def save_to_word(markdown_text, filename):
    """
    Convert markdown text to a Word document with proper formatting.

    Parameters:
    markdown_text (str): Markdown formatted text
    filename (str): Output filename (without extension)

    Returns:
    None: Saves the document to disk
    """
    # Convert markdown to HTML with extensions
    html = markdown.markdown(markdown_text, extensions=['tables', 'fenced_code'])

    # Parse HTML using BeautifulSoup
    soup = BeautifulSoup(html, 'html.parser')

    # Create a Word document
    doc = Document()

    # Process the HTML elements recursively
    process_elements(soup, doc)

    # Save to Word file
    doc.save(f"{filename}.docx")
    print(f"✅ Saved as '{filename}.docx'")

def process_elements(parent, doc, current_paragraph=None):
    """
    Recursively process HTML elements and add them to the Word document.

    Parameters:
    parent: BeautifulSoup element or NavigableString
    doc: Document object
    current_paragraph: Optional paragraph object for inline elements

    Returns:
    None: Modifies the document in place
    """
    # If the parent is a string (NavigableString), just add it to the current paragraph
    if isinstance(parent, str) or not hasattr(parent, 'children'):
        if current_paragraph:
            current_paragraph.add_run(str(parent))
        else:
            doc.add_paragraph(str(parent))
        return

    # Process all children
    for element in parent.children:
        if element.name is None:  # This is a text node
            if current_paragraph and str(element).strip():
                current_paragraph.add_run(str(element))
            elif str(element).strip():
                current_paragraph = doc.add_paragraph(str(element))
            continue

        if element.name == 'h1':
            doc.add_heading(element.get_text(), level=1)
        elif element.name == 'h2':
            doc.add_heading(element.get_text(), level=2)
        elif element.name == 'h3':
            doc.add_heading(element.get_text(), level=3)
        elif element.name == 'p':
            # Create a new paragraph
            p = doc.add_paragraph()
            p.alignment = WD_PARAGRAPH_ALIGNMENT.LEFT
            # Process all children of the paragraph
            process_elements(element, doc, p)
        elif element.name == 'strong' or element.name == 'b':
            # If we're in a paragraph, add a bold run
            if current_paragraph:
                run = current_paragraph.add_run(element.get_text())
                run.bold = True
            else:
                p = doc.add_paragraph()
                run = p.add_run(element.get_text())
                run.bold = True
        elif element.name == 'em' or element.name == 'i':
            # If we're in a paragraph, add an italic run
            if current_paragraph:
                run = current_paragraph.add_run(element.get_text())
                run.italic = True
            else:
                p = doc.add_paragraph()
                run = p.add_run(element.get_text())
                run.italic = True
        elif element.name == 'ul':
            for li in element.find_all('li', recursive=False):
                p = doc.add_paragraph(style='List Bullet')
                process_elements(li, doc, p)
        elif element.name == 'ol':
            for li in element.find_all('li', recursive=False):
                p = doc.add_paragraph(style='List Number')
                process_elements(li, doc, p)
        elif element.name == 'hr':
            doc.add_paragraph('_' * 50)
        elif element.name == 'a':
            # Style the link text. Assigning run.hyperlink does not create a real
            # link in python-docx, so the URL is appended in parentheses instead.
            target = current_paragraph if current_paragraph else doc.add_paragraph()
            link_text = element.get_text()
            href = element.get('href')
            if href and href != link_text:
                link_text = f"{link_text} ({href})"
            run = target.add_run(link_text)
            run.underline = True
            run.font.color.rgb = DocxRGBColor(0x00, 0x00, 0xFF)
        elif element.name == 'blockquote':
            p = doc.add_paragraph()
            p.style = 'Quote'
            process_elements(element, doc, p)
        else:
            # Recursively process other elements
            process_elements(element, doc, current_paragraph)

def create_flashcard_ppt(json_file_path, topic):
    """
    Create a PowerPoint presentation from flashcard data in JSON format.

    Args:
        json_file_path (str): Path to the JSON file containing flashcards
        topic (str): Topic name, used in the subtitle and the output filename

    Returns:
        str: Path to the created PowerPoint file
    """
    clean_topic = topic.replace(" ", "_").lower()
    output_pptx_name = f"{clean_topic}_flashcards.pptx"

    # Load the flashcards from JSON
    with open(json_file_path, 'r') as f:
        flashcards = json.load(f)

    # Create a presentation
    prs = Presentation()

    # Define slide layouts
    title_layout = prs.slide_layouts[0]  # Title Slide
    content_layout = prs.slide_layouts[1]  # Title and Content

    # Create title slide
    title_slide = prs.slides.add_slide(title_layout)
    title = title_slide.shapes.title
    subtitle = title_slide.placeholders[1]

    title.text = "Radiology Flashcards"
    subtitle.text = f"Topic: {topic.title()}\n\nCore Exam Preparation"

    # Add instructions slide
    instruction_slide = prs.slides.add_slide(content_layout)
    instruction_slide.shapes.title.text = "How to Use These Flashcards"
    content = instruction_slide.placeholders[1]
    content.text = (
        "1. Each flashcard consists of two slides: a Question followed by an Answer\n\n"
        "2. Try to answer the question before moving to the next slide\n\n"
        "3. These cards are designed to reinforce critical concepts for Core Exam\n\n"
    )

    # Create flashcard slides
    for i, card in enumerate(flashcards, 1):
        # Question slide
        q_slide = prs.slides.add_slide(content_layout)
        q_slide.shapes.title.text = f"Question {i}"
        q_content = q_slide.placeholders[1]
        q_content.text = card['question']

        # Style question text
        for paragraph in q_content.text_frame.paragraphs:
            paragraph.alignment = 2  # 2 is PP_ALIGN.CENTER; the value 1 would left-align
            for run in paragraph.runs:
                run.font.size = Pt(28)
                run.font.bold = True

        # Answer slide
        a_slide = prs.slides.add_slide(content_layout)
        a_slide.shapes.title.text = f"Answer {i}"
        a_content = a_slide.placeholders[1]
        a_content.text = card['answer']

        # Style answer text
        for paragraph in a_content.text_frame.paragraphs:
            for run in paragraph.runs:
                run.font.size = Pt(24)

    # Save the presentation and return its path, as the docstring states
    prs.save(output_pptx_name)
    print(f"PowerPoint created: {output_pptx_name}")
    return output_pptx_name



# Extract flashcard JSON from raw agent output
def extract_json_from_output(output_text):
    """
    Attempt to extract a JSON array of flashcards from raw agent output.

    Args:
        output_text (str): Raw string containing potential JSON.

    Returns:
        list: Parsed list of flashcards, or empty list if none found.
    """
    # Look for JSON array pattern
    json_pattern = r'\[\s*\{\s*"question":.+\}\s*\]'
    match = re.search(json_pattern, output_text, re.DOTALL)

    if match:
        try:
            # Try parsing the matched JSON
            json_text = match.group(0)
            return json.loads(json_text)
        except json.JSONDecodeError:
            pass

    # Fallback: Try to extract based on code blocks
    json_block_pattern = r'```json\s*([\s\S]*?)\s*```'
    block_match = re.search(json_block_pattern, output_text)

    if block_match:
        try:
            return json.loads(block_match.group(1))
        except json.JSONDecodeError:
            pass

    # Return empty list if no valid JSON found
    return []

# === Initialize Web Browser Tool and LLM ===

# Web scraping tool powered by Apify for structured markdown content
rag_browser = ApifyActorsTool(
    actor_name="apify/rag-web-browser",
    actor_config={
        "outputFormats": ["markdown"],
        "requestTimeoutSecs": 45,
        "scrapingTool": "browser-playwright",
        "removeCookieWarnings": True
    }
)


# Google Gemini LLM instance for reasoning tasks
llm = LLM("gemini/gemini-2.0-flash-lite")

# ✅ Confirmation message
print("Utility functions loaded successfully 💪")

## 🔍 Literature Review

### ✅ Overview
A **three-agent pipeline** conducts a **comprehensive literature review** on a radiology topic of your choice:

### 🧠 Agents Involved:
1. **PubMed Researcher**
   - Searches PubMed for recent and relevant studies.
   - Returns at least 3 articles with full bibliographic details.

2. **Medical Research Analyst**
   - Retrieves and analyzes each article.
   - Summarizes objectives, methods, findings, limitations, and relevance.

3. **Research Synthesizer**
   - Integrates all analyses into a structured literature review.
   - Highlights current knowledge, gaps, contradictions, and future directions.

### 📄 Outputs:
- A comprehensive **Markdown-formatted literature review** displayed in the notebook.
- A **Word document (.docx)** version of the review saved locally as:
  ```
  Literature_Review.docx
  ```
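
The PubMed search in the next cell works by sending a site-restricted query through the web browsing tool rather than calling a PubMed API. The sketch below shows how such a query string can be composed; `build_site_query` is a hypothetical helper, and whether the `after:` date filter is honored depends on the underlying search engine, which is an assumption here.

```python
from datetime import date
from typing import Optional

def build_site_query(site: str, query: str, since_year: Optional[int] = None) -> str:
    """Compose a search query limited to one site, optionally with a year filter."""
    parts = [f"site:{site}", query.strip()]
    if since_year is not None:
        parts.append(f"after:{since_year}")
    return " ".join(parts)

# Derive the cutoff from today's date instead of hardcoding a year
recent = build_site_query(
    "pubmed.ncbi.nlm.nih.gov", "breast MRI screening", date.today().year - 2
)
print(recent)
print(build_site_query("radiopaedia.org", "pulmonary embolism"))
```

Computing the cutoff year at run time keeps "recent" meaningful as the notebook ages.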


In [None]:
# -------------------------------------------------------------------
# Tools
# -------------------------------------------------------------------

@tool("GetFullArticle")
def get_full_article(url: str) -> str:
    """
    Retrieve the full content of an article from a given URL.

    Args:
        url (str): The URL of the article to retrieve.

    Returns:
        str: The full content of the article in markdown format, or an error message if retrieval fails.
    """
    try:
        # Validate URL format
        if not url.startswith("http"):
            return "Please provide a valid URL starting with http:// or https://"

        # Fetch content using Apify's RAG Web Browser
        article_content = rag_browser.run(
            run_input={"query": url, "maxResults": 1}
            )

        return article_content
    except Exception as e:
        # Handle and return the error message
        return f"Error retrieving article content: {str(e)}"


@tool("PubMedSearch")
def search_pubmed(query: str, max_results: int = 5) -> str:
    """
    Search PubMed for medical research articles using Apify's RAG Web Browser.

    Args:
        query (str): The search term to query PubMed.
        max_results (int): Maximum number of search results to return (default is 5).

    Returns:
        str: A list of search results in markdown format, or an error message if the search fails.
    """
    try:
        # Construct the PubMed-specific search query
        pubmed_query = f"site:pubmed.ncbi.nlm.nih.gov {query} after:2024"

        # Execute the search using Apify's RAG Web Browser
        results = rag_browser.run(run_input={
            "query": pubmed_query,
            "maxResults": max_results
        })
        return results
    except Exception as e:
        # Handle and return the error message
        return f"Error searching PubMed: {str(e)}"


# -------------------------------------------------------------------
# PubMed Researcher Agent
# -------------------------------------------------------------------
pubmed_researcher = Agent(
    role="PubMed Researcher",
    goal="Locate and compile a curated list of current, pertinent studies on [RADIOLOGY_TOPIC], focusing on those that offer significant contributions to the field.",
    backstory="""You are an expert medical research specialist with deep knowledge of radiology.
You excel at navigating PubMed to find the most relevant and recent research for any given radiology topic,
emphasizing high-impact and high-quality studies. You have developed advanced strategies for searching
PubMed over many years, quickly filtering out low-impact or outdated articles and homing in on
publications that meaningfully advance understanding and clinical practice in radiology.""",
    verbose=True,
    allow_delegation=False,
    tools=[search_pubmed],
    llm=llm,
    max_rpm=MAX_RPM,
    max_retry_limit=MAX_RETRY_LIMIT
)

pubmed_search_task = Task(
    description="""
1. Conduct a targeted PubMed search on [RADIOLOGY_TOPIC].
2. Identify at least 3 recent, high-quality papers.
3. For each paper, provide:
   - Title
   - Authors
   - Publication date
   - Journal
   - URL or citation link
4. Ensure the curated list includes studies that collectively reflect the major
   developments and perspectives on [RADIOLOGY_TOPIC].
""",
    agent=pubmed_researcher,
    expected_output="A list of at least 3 relevant research articles with full bibliographic details and URLs."
)

# -------------------------------------------------------------------
# Medical Research Analyst Agent
# -------------------------------------------------------------------
pubmed_analyst = Agent(
    role="Medical Research Analyst",
    goal="Deliver structured, in-depth analyses of each retrieved article, focusing on the objectives, methods, results, conclusions, and an assessment of each study’s reliability and importance.",
    backstory="""You are a medical research analyst with specialized expertise in radiology.
You excel at dissecting and evaluating research publications—particularly their methods,
results, and overall quality. You have a keen eye for detail and extensive background
in research methodology, making you adept at unpacking complex study designs and interpreting
data rigorously.""",
    verbose=True,
    allow_delegation=False,
    tools=[get_full_article],
    llm=llm,
    max_rpm=MAX_RPM,
    max_retry_limit=MAX_RETRY_LIMIT
)

pubmed_analysis_task = Task(
    description="""
For each paper provided by the PubMed Researcher:
1. Try to retrieve the abstract.
2. Identify and summarize:
   - The research question or objective
   - Study design and methodology
   - Key findings or results
   - Limitations acknowledged by the authors
   - Conclusions and implications for the field
3. Offer a brief assessment of the paper’s overall quality, significance, and relevance.
4. Present the analysis in a structured, consistent format to facilitate subsequent synthesis.
""",
    agent=pubmed_analyst,
    expected_output="A detailed yet concise analysis of each research paper, capturing its core contributions, strengths, and weaknesses."
)

# -------------------------------------------------------------------
# Research Synthesizer Agent
# -------------------------------------------------------------------
pubmed_synthesizer = Agent(
    role="Research Synthesizer",
    goal="Merge the analyses from all reviewed papers into a clear, comprehensive literature review that illuminates the current state of knowledge on [RADIOLOGY_TOPIC] and guides further research.",
    backstory="""You are a research synthesis expert specializing in radiology.
You excel at integrating findings from multiple studies into a cohesive narrative that clarifies
current understanding, notes contradictions or gaps, and suggests directions for future investigation.
You have years of experience writing academic and clinical reviews and can reconcile divergent findings
by comparing methodologies and study populations.""",
    verbose=True,
    allow_delegation=False,
    llm=llm,
    max_rpm=MAX_RPM,
    max_retry_limit=MAX_RETRY_LIMIT
)

pubmed_synthesis_task = Task(
    description="""
Based on the analyses provided:
1. Integrate findings: Compare and combine the key points from each analysis to form a unified overview of the research landscape on [RADIOLOGY_TOPIC].
2. Identify common themes and contradictions: Highlight consistent results, discuss conflicting data, and propose potential reasons for disparities.
3. Evaluate overall strength of evidence: Note methodological limitations or areas lacking robust data, explaining how these affect the validity of conclusions.
4. Summarize current knowledge: Clearly state what is broadly accepted in the field, as well as any ongoing controversies.
5. Highlight gaps and future directions: Suggest where additional or alternative research might be warranted and what questions remain unanswered.
6. Provide a bibliography: List all referenced studies in a consistent citation format.
7. Produce a structured literature review: Present a concise yet thorough synthesis that can serve as a stand-alone overview of the topic.
""",
    agent=pubmed_synthesizer,
    expected_output="""
A well-organized, comprehensive literature review on [RADIOLOGY_TOPIC] that clearly integrates
all analyzed studies and includes suggestions for future research, along with a bibliography
of all cited papers.
"""
)

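# Original task descriptions, captured on first use so every call starts from the template
_PUBMED_TASK_TEMPLATES = {}

def generate_literature_review(topic):
    # Fill [RADIOLOGY_TOPIC] from the cached templates. Replacing in place would bake
    # the first topic into the descriptions, so later calls would ignore a new topic.
    for task in (pubmed_search_task, pubmed_synthesis_task):
        template = _PUBMED_TASK_TEMPLATES.setdefault(id(task), task.description)
        task.description = template.replace("[RADIOLOGY_TOPIC]", topic)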

    # Create the crew
    pubmed_review_crew = Crew(
        agents=[pubmed_researcher, pubmed_analyst, pubmed_synthesizer],
        tasks=[pubmed_search_task, pubmed_analysis_task, pubmed_synthesis_task],
        process=Process.sequential,
        max_rpm=MAX_RPM,
        max_retry_limit=MAX_RETRY_LIMIT,
        output_log_file=True,
        verbose=True)

    # Execute the crew
    literature_review = pubmed_review_crew.kickoff()

    # Display in Jupyter Notebook
    display_markdown(literature_review.raw)

    # Save to Word
    save_to_word(literature_review.raw, "Literature_Review")

    print('-'*50)
    print("\nLiterature Review is complete.\n")
    print("<<<--- Open the Files panel on the left and look for the Literature_Review.docx file.\n")
    print('-'*50)

## 🧠 Time to run the Agentic Literature Reviewer

This section will launch a team of three AI agents to generate a structured literature review on a radiology topic of your choice.

---

### 🔍 Choose Your Topic:
Update the `topic` variable with **any radiology topic** you want to explore:

---

### 📝 Outputs:
- 📄 Review displayed directly in the notebook
- 💾 Automatically saved as a Word document: `Literature_Review.docx`
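
The chosen topic reaches the agents by substituting it for the `[RADIOLOGY_TOPIC]` placeholder in the task descriptions. The sketch below shows that substitution in isolation; `fill_topic` and `SEARCH_TEMPLATE` are hypothetical names, and the template text is a shortened stand-in for the notebook's real task description.

```python
PLACEHOLDER = "[RADIOLOGY_TOPIC]"

# Shortened, hypothetical template; the notebook's real task descriptions are longer
SEARCH_TEMPLATE = "Conduct a targeted PubMed search on [RADIOLOGY_TOPIC]."

def fill_topic(template: str, topic: str) -> str:
    """Return a copy of the template with every placeholder replaced by the topic."""
    return template.replace(PLACEHOLDER, topic)

first = fill_topic(SEARCH_TEMPLATE, "artificial intelligence in breast cancer")
second = fill_topic(SEARCH_TEMPLATE, "pulmonary embolism")
print(first)
print(second)
```

Filling from an unchanged template each time means a second run with a different topic is not affected by the first.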


In [None]:
# Replace with your radiology topic of interest
topic = "artificial intelligence in breast cancer"

# Run the literature review pipeline
generate_literature_review(topic)


## 📚 Agentic Flashcard Generator

### ✅ Overview
Another **three-agent pipeline** will run to produce **board-style flashcards** for radiology education.

### 🧠 Agents Involved:
1. **Radiopaedia Researcher**
   - Searches Radiopaedia for high-yield educational articles.
   - Returns 5+ curated links with brief justifications.

2. **Radiology Content Extractor**
   - Extracts structured, board-relevant content from the articles.
   - Focuses on:
     - Pathology
     - Imaging features
     - Differential diagnosis
     - Clinical presentation
     - Epidemiology

3. **Flashcard Generator**
   - Creates 10–15 Q&A flashcards.
   - Formats output as a valid JSON array.

### 📁 Output:
- Flashcards compiled into a **PowerPoint (.pptx)** presentation named after the topic (lowercased, with spaces replaced by underscores), for example:
  ```
  pulmonary_embolism_flashcards.pptx
  ```
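
The deck's filename is derived from the topic string. The sketch below mirrors the naming rule used by `create_flashcard_ppt`; the wrapper name `flashcard_deck_filename` is hypothetical and exists only for this illustration.

```python
def flashcard_deck_filename(topic: str) -> str:
    """Mirror the naming rule: spaces become underscores, letters are lowercased."""
    clean_topic = topic.replace(" ", "_").lower()
    return f"{clean_topic}_flashcards.pptx"

print(flashcard_deck_filename("Pulmonary Embolism"))  # pulmonary_embolism_flashcards.pptx
```

Only spaces are replaced, so a topic containing characters such as `/` would still need extra sanitizing before being used as a filename.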


In [None]:
# -------------------------------------------------------------------
# Tools
# -------------------------------------------------------------------

@tool("search_radiopaedia")
def search_radiopaedia(query: str, max_results: int = 5) -> str:
    """
    Search Radiopaedia for radiology-related articles using Apify's RAG Web Browser.
    Args:
        query (str): The search term to query Radiopaedia.
        max_results (int): Maximum number of search results to return (default is 5).
    Returns:
        str: A list of search results in markdown format, or an error message if the search fails.
    """
    try:
        # Construct the Radiopaedia-specific search query
        radiopaedia_query = f"site:radiopaedia.org {query}"
        # Execute the search using Apify's RAG Web Browser
        results = rag_browser.run(run_input={
            "query": radiopaedia_query,
            "maxResults": max_results
        })
        return results
    except Exception as e:
        # Handle and return the error message
        return f"Error searching Radiopaedia: {str(e)}"

@tool("get_full_article")
def get_full_article(url: str) -> str:
    """
    Retrieve the full content of an article from a given URL.
    Args:
        url (str): The URL of the article to retrieve.
    Returns:
        str: The full content of the article in markdown format, or an error message if retrieval fails.
    """
    try:
        # Validate URL format
        if not url.startswith("http"):
            return "Please provide a valid URL starting with http:// or https://"
        # Fetch content using Apify's RAG Web Browser
        article_content = rag_browser.run(run_input={
            "query": url,
            "maxResults": 1
        })
        return article_content
    except Exception as e:
        # Handle and return the error message
        return f"Error retrieving article content: {str(e)}"


# -------------------------------------------------------------------
# Radiopaedia Researcher Agent
# -------------------------------------------------------------------
radiopaedia_researcher = Agent(
    role="Radiopaedia Researcher",
    goal="Locate and compile a curated list of pertinent Radiopaedia articles on [RADIOLOGY_TOPIC], focusing on those that offer significant educational value for radiology board exam preparation.",
    backstory="""You are an expert radiology educator with deep knowledge of board exam requirements.
You excel at navigating Radiopaedia to find the most relevant articles for any given radiology topic,
emphasizing high-educational-value articles with clear descriptions of imaging findings, differential
diagnoses, and clinical correlations that are commonly tested on board exams.""",
    verbose=True,
    allow_delegation=False,
    tools=[search_radiopaedia],
    llm=llm,
    max_rpm=MAX_RPM,
    max_retry_limit=MAX_RETRY_LIMIT
)

radiopaedia_search_task = Task(
    description="""
1. Conduct a targeted Radiopaedia search on [RADIOLOGY_TOPIC].
2. Identify at least 5 high-quality educational articles.
3. For each article, provide:
   - Title
   - URL
   - Brief description of why it's valuable for board exam preparation
4. Ensure the curated list includes articles that collectively cover the major
   aspects of [RADIOLOGY_TOPIC] relevant to board exams.
""",
    agent=radiopaedia_researcher,
    expected_output="A list of at least 5 relevant Radiopaedia articles with URLs and brief descriptions of their educational value."
)

# -------------------------------------------------------------------
# Content Extractor Agent
# -------------------------------------------------------------------
content_extractor = Agent(
    role="Radiology Content Extractor",
    goal="Extract comprehensive educational content from Radiopaedia articles to create high-quality board review materials.",
    backstory="""You are a specialized radiology educator who excels at identifying and extracting
board-relevant content from educational resources. You know exactly what information is most
likely to appear on board exams and how to organize it effectively for learning. You have
years of experience preparing residents for their board exams and know what level of detail
is appropriate.""",
    verbose=True,
    allow_delegation=False,
    tools=[get_full_article],
    llm=llm,
    max_rpm=MAX_RPM,
    max_retry_limit=MAX_RETRY_LIMIT
)

content_extraction_task = Task(
    description="""
For each Radiopaedia article provided by the researcher:
1. Retrieve the full article content using the URL.
2. Extract key educational information relevant to board exams:
   - Pathology name and classification
   - Key imaging findings (for different modalities if available)
   - Epidemiology and demographics
   - Clinical presentation
   - Differential diagnoses
   - Treatment and management (if available)
3. Organize the extracted information in a clear, structured format.
""",
    agent=content_extractor,
    expected_output="Organized, board-relevant content extracted from each Radiopaedia article."
)

# -------------------------------------------------------------------
# Flashcard Generator Agent
# -------------------------------------------------------------------
flashcard_generator = Agent(
    role="Radiology Flashcard Generator",
    goal="Create high-quality question-answer pairs for radiology board exam preparation.",
    backstory="""You are an expert in radiology education who specializes in creating effective
board review materials. You have a deep understanding of what makes a good flashcard - clear,
focused questions with comprehensive yet concise answers. You know exactly what information
is most commonly tested on board exams and how to frame questions to enhance learning and
retention. Your flashcards are consistently rated as the most helpful study resources by
radiology residents.""",
    verbose=True,
    allow_delegation=False,
    llm=llm,
    max_rpm=MAX_RPM,
    max_retry_limit=MAX_RETRY_LIMIT
)

flashcard_generation_task = Task(
    description="""
Based on the extracted content from Radiopaedia articles:
1. Create 10-15 high-quality question-answer pairs suitable for board exam preparation on [RADIOLOGY_TOPIC].
2. For each question-answer pair:
   - Formulate a clear, specific question that tests understanding of key concepts
   - Provide a comprehensive yet concise answer that includes the most board-relevant information
   - Ensure questions cover different aspects (imaging findings, differential diagnoses, epidemiology, etc.)
3. Format the output as a valid JSON array with each flashcard as an object with "question" and "answer" fields:
   ```json
   [
     {
       "question": "What are the classic MRI findings of pilocytic astrocytoma?",
       "answer": "Cystic lesion with enhancing mural nodule; T1 hypointense, T2 hyperintense; avid enhancement of solid component; commonly in cerebellum, optic pathways, brain stem."
     },
     {
       "question": "...",
       "answer": "..."
     }
   ]
   ```
4. Ensure the JSON is properly formatted and valid.
""",
    agent=flashcard_generator,
    expected_output="A JSON array of 10-15 high-quality question-answer pairs suitable for radiology board exam preparation."
)


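# Unmodified task descriptions, captured on first use so the [RADIOLOGY_TOPIC]
# placeholder survives repeated calls with different topics.
_description_templates = {}


def generate_flashcards(topic):
    """Run the flashcard crew for `topic` and save the results as JSON and PowerPoint."""
    # Substitute the topic into the original templates. Replacing in place on the
    # live description would consume the placeholder after the first call.
    for task in (radiopaedia_search_task, flashcard_generation_task):
        template = _description_templates.setdefault(id(task), task.description)
        task.description = template.replace("[RADIOLOGY_TOPIC]", topic)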

    # Create the crew
    flashcard_crew = Crew(
        agents=[radiopaedia_researcher, content_extractor, flashcard_generator],
        tasks=[radiopaedia_search_task, content_extraction_task, flashcard_generation_task],
        process=Process.sequential,
        max_rpm=MAX_RPM,
        max_retry_limit=MAX_RETRY_LIMIT,
        output_log_file=True,
        verbose=True)

    # Execute the crew and get results
    result = flashcard_crew.kickoff()

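    # Extract and save the flashcards
    flashcards = extract_json_from_output(result.raw)
    if not flashcards:
        raise ValueError("No flashcards could be parsed from the crew output.")

    qa_bank_json = "qa_bank.json"
    with open(qa_bank_json, "w") as f:
        json.dump(flashcards, f, indent=2)

    print(f"Generated {len(flashcards)} flashcards for '{topic}'")

    # Build the PowerPoint deck from the saved question bank
    create_flashcard_ppt(qa_bank_json, topic)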

    print('-'*50)
    print("\nFlashcard generation complete.\n")
    print("<<<--- Open the left tab and look for your flashcard ppt file.\n")
    print('-'*50)

## 🧠 Time to Run the Agentic Flashcard Generator

This section will launch a team of three AI agents to generate high-quality, board-style radiology flashcards based on Radiopaedia content.

---

### 🔍 Choose Your Topic:
Update the `topic` variable with **any radiology topic** you want to use for flashcard generation.  
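
The topic is inserted into the task descriptions wherever the `[RADIOLOGY_TOPIC]` placeholder appears. A minimal sketch of that substitution follows. `SEARCH_TEMPLATE` and `render_description` are hypothetical names used only for illustration and are not part of the notebook's code.

```python
# Hypothetical stand-in for a task description; the real descriptions live on
# the CrewAI Task objects defined earlier in the notebook.
SEARCH_TEMPLATE = "Search Radiopaedia for high-yield articles on [RADIOLOGY_TOPIC]."


def render_description(template: str, topic: str) -> str:
    """Return a copy of the template with the topic placeholder filled in."""
    topic = topic.strip()
    if not topic:
        raise ValueError("topic must be a non-empty string")
    # str.replace returns a new string, so the template itself is left untouched
    return template.replace("[RADIOLOGY_TOPIC]", topic)


first = render_description(SEARCH_TEMPLATE, "CNS lesions in HIV")
second = render_description(SEARCH_TEMPLATE, "Pediatric posterior fossa tumors")
print(first)
print(second)
```

Rendering from an unmodified template each time means a changed `topic` is always picked up on the next run.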

---

### 📝 Outputs:
- 📽️ Flashcards compiled into a PowerPoint deck: `flashcards.pptx`
- 🗂️ Raw question-answer pairs saved as JSON: `qa_bank.json`
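
The question-answer pairs saved in `qa_bank.json` can be checked before you study from them. The sketch below assumes the file holds a JSON array of objects with `question` and `answer` fields, as requested in the flashcard task. `load_flashcards` is a hypothetical helper, not a function from the notebook, and the demo writes its own throwaway file so it runs on its own.

```python
import json
import os
import tempfile


def load_flashcards(path):
    """Load a flashcard file and keep only well-formed cards (hypothetical helper)."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of flashcards")

    cards = []
    for item in data:
        if not isinstance(item, dict):
            continue
        question = item.get("question")
        answer = item.get("answer")
        # Keep a card only when both fields are non-empty strings
        if isinstance(question, str) and isinstance(answer, str):
            if question.strip() and answer.strip():
                cards.append({"question": question.strip(), "answer": answer.strip()})
    return cards


# Demo data: one valid card and two malformed entries that should be dropped
sample = [
    {"question": "What are the classic MRI findings of pilocytic astrocytoma?",
     "answer": "Cystic lesion with enhancing mural nodule."},
    {"question": "", "answer": "dropped: empty question"},
    {"answer": "dropped: no question field"},
]

with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "qa_bank.json")
    with open(demo_path, "w", encoding="utf-8") as f:
        json.dump(sample, f)
    cards = load_flashcards(demo_path)

for number, card in enumerate(cards, start=1):
    print(f"Q{number}: {card['question']}")
    print(f"A{number}: {card['answer']}")
```

To inspect a real run, point `load_flashcards` at the `qa_bank.json` file in the notebook's working directory instead of the demo path.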


In [None]:
# Replace with the radiology topic of interest
topic = "CNS lesions in HIV"

# Run the flashcards pipeline
generate_flashcards(topic)


## ✅ Notebook Complete – Powered by Agentic AI

You’ve just completed an end-to-end demonstration of **Agentic AI in Radiology**, showcasing how multi-agent systems can streamline research, automate content extraction, and generate high-yield educational tools.

---

### 🙌 Acknowledgments

This notebook was made possible through the integration of multiple cutting-edge platforms and technologies:

- **[PubMed](https://pubmed.ncbi.nlm.nih.gov/)** – for sourcing peer-reviewed medical literature  
- **[Radiopaedia](https://radiopaedia.org/)** – for open-access radiology education content  
- **[Google Gemini](https://aistudio.google.com/)** – for LLM-powered reasoning and synthesis  
- **[Apify](https://apify.com/)** – for browser-based web scraping with markdown output  
- **[CrewAI](https://github.com/joaomdmoura/crewAI)** – for coordinating intelligent agent workflows  

---

### 👨‍⚕️ Authored by

**Paulo Kuriki, MD**  
Assistant Professor & Director, AI Lab / AIR-Hub  
**UT Southwestern Medical Center**  

🔗 [Connect on LinkedIn](https://www.linkedin.com/in/paulokuriki/)  
💻 [GitHub](https://github.com/paulokuriki)

---

### 💬 Stay Curious. Keep Building.

If this notebook inspired you to bring agentic workflows into your clinical, research, or educational practice—**you're just getting started**.

Let’s continue building the future of radiology, together. 🧠💥

---