Medical Diagnosis Extraction Demo

A practical demonstration of structured output generation from natural language medical texts using Google's Gemini AI API and Pydantic models.

Overview

This project extracts structured medical diagnoses from unstructured clinical text, transforming narrative medical reports into JSON-formatted data with diagnostic terms, context, and temporal information.

Features

Text Preprocessing: Cleans and normalizes medical text files
AI-Powered Extraction: Uses Gemini 2.5 Flash for intelligent diagnosis identification
Structured Output: Returns JSON with term, context, and temporal aspects
Pydantic Validation: Ensures data quality and type safety
Beautiful Terminal Output: Color-coded results with emojis for better readability
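
The README does not show what preprocess/preprocess.py contains, so the following is only a minimal sketch of what "cleans and normalizes" could mean for a clinical text file. The function name clean_text and the specific normalization steps are assumptions for illustration, not the project's actual code.

```python
import re
import unicodedata


def clean_text(raw: str) -> str:
    """Hypothetical cleaner: normalize Unicode and tidy whitespace."""
    # Fold compatibility characters (e.g. non-breaking spaces) to plain forms.
    text = unicodedata.normalize("NFKC", raw)
    # Unify Windows and old Mac line endings.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Drop control characters except newline and tab.
    text = "".join(
        ch for ch in text if ch in "\n\t" or unicodedata.category(ch) != "Cc"
    )
    # Collapse runs of spaces and tabs inside a line.
    text = re.sub(r"[ \t]+", " ", text)
    # Trim every line, then squeeze three or more newlines into one blank line.
    text = "\n".join(line.strip() for line in text.split("\n"))
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()


print(clean_text("BP  120/80\r\n\r\n\r\n\r\nHR\t72  "))
```

Keeping paragraph breaks (a single blank line) while removing other noise may help the model keep sections of a report apart, but that is a design guess rather than something the project documents.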

Project Structure

demo_basic_structured_output/
├── agents/
│   ├── __init__.py
│   └── agent.py           # Main extraction logic
├── preprocess/
│   ├── __init__.py
│   └── preprocess.py      # Text cleaning utilities
├── texts/
│   └── case1.txt          # Sample medical case
├── output/                # Generated JSON results
├── .env                   # API keys (not in repo)
├── pyproject.toml         # Dependencies
└── README.md

Quick Start

1. Prerequisites

Python 3.12+
uv package manager
Google Gemini API key

2. Setup

# Clone or download the project
cd demo_basic_structured_output

# Install dependencies
uv sync

# Create environment file
cp .env.example .env  # or create manually

3. Configure API Key

Create a .env file in the project root:

GEMINI_API_KEY=your_actual_api_key_here

Get your API key: Visit Google AI Studio to generate a free Gemini API key.
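
A quick way to fail early on a missing key is to check the environment before calling the API. The sketch below assumes the .env file has already been loaded into the process environment (for example with python-dotenv's load_dotenv()); the helper name require_api_key is hypothetical.

```python
import os
from typing import Mapping


def require_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return GEMINI_API_KEY or raise a readable error (hypothetical helper)."""
    key = env.get("GEMINI_API_KEY", "").strip()
    # Treat the README's placeholder value as "not configured".
    if not key or key == "your_actual_api_key_here":
        raise RuntimeError(
            "GEMINI_API_KEY is missing or still set to the placeholder; "
            "check the .env file in the project root."
        )
    return key
```

Passing the environment as a parameter keeps the check easy to exercise with a plain dict, without touching real credentials.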

4. Run the Demo

# Execute the extraction agent
uv run python agents/agent.py

Expected Output

The demo will:

Preprocess the medical text from texts/case1.txt
Extract diagnoses using Gemini AI
Display results in the terminal with formatting
Save structured JSON to output/diagnosis_extraction.json

Example output structure:

{
  "diagnostics": [
    {
      "term": "ventricular fibrillation",
      "context": "patient shocked for ventricular fibrillation",
      "temporal": "upon arrival at emergency room"
    }
  ]
}
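
For downstream use, the saved file can be read back with the standard library alone. This sketch assumes the JSON has exactly the shape shown above (a top-level "diagnostics" list whose items carry term, context, and temporal); iter_diagnoses is a hypothetical helper, and the inline sample string stands in for the contents of output/diagnosis_extraction.json.

```python
import json
from typing import Iterator

# Stand-in for the contents of output/diagnosis_extraction.json.
sample = """
{
  "diagnostics": [
    {
      "term": "ventricular fibrillation",
      "context": "patient shocked for ventricular fibrillation",
      "temporal": "upon arrival at emergency room"
    }
  ]
}
"""


def iter_diagnoses(payload: str) -> Iterator[tuple[str, str, str]]:
    """Yield (term, context, temporal) for each extracted diagnosis."""
    data = json.loads(payload)
    for item in data.get("diagnostics", []):
        yield item["term"], item["context"], item["temporal"]


for term, context, temporal in iter_diagnoses(sample):
    print(f"{term} | {context} | {temporal}")
```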

Usage with uv

# Install/update dependencies
uv sync

# Run the main script
uv run python agents/agent.py

# Add new dependencies
uv add package-name

# Run with the dev dependency group included
uv run --dev python agents/agent.py

# Check installed packages
uv pip list

Customization

Adding New Medical Cases

Place text files in the texts/ directory
Modify the case_file path in agents/agent.py
Run the extraction
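
If several cases accumulate in texts/, editing case_file by hand for each run gets tedious. The sketch below shows one possible way to enumerate case files and derive a per-case output name. Both helpers are hypothetical, and the one-JSON-file-per-case naming is an assumption; the demo as described writes a single output/diagnosis_extraction.json.

```python
from pathlib import Path


def list_case_files(texts_dir: str = "texts") -> list[Path]:
    """Return all .txt case files in texts_dir, sorted by name."""
    return sorted(Path(texts_dir).glob("*.txt"))


def output_path_for(case_file: Path, output_dir: str = "output") -> Path:
    """Map texts/case1.txt to output/case1.json (assumed naming scheme)."""
    return Path(output_dir) / f"{case_file.stem}.json"


for case in list_case_files():
    print(case, "->", output_path_for(case))
```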

Modifying Output Schema

Edit the Pydantic models in agents/agent.py:

from pydantic import BaseModel

class Diagnosis(BaseModel):
    term: str
    context: str
    temporal: str
    # Add new fields here, for example:
    severity: str
    icd_code: str
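
Declaring new fields as plain str makes them required, so validation fails whenever a response omits them. One alternative, sketched here, is to give them a default of None. The DiagnosisList wrapper with a diagnostics field is inferred from the example output and the response_schema name; it is an assumption about the real model, not a copy of it.

```python
from typing import Optional

from pydantic import BaseModel


class Diagnosis(BaseModel):
    term: str
    context: str
    temporal: str
    # Optional, so records without these fields still validate.
    severity: Optional[str] = None
    icd_code: Optional[str] = None


class DiagnosisList(BaseModel):
    # Field name taken from the example JSON in this README.
    diagnostics: list[Diagnosis]


result = DiagnosisList(
    diagnostics=[
        {
            "term": "ventricular fibrillation",
            "context": "patient shocked for ventricular fibrillation",
            "temporal": "upon arrival at emergency room",
        }
    ]
)
print(result.diagnostics[0].term, result.diagnostics[0].severity)
```

Whether the model actually fills in severity or an ICD code depends on the prompt and the input text; the schema only determines which shapes are accepted.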

Adjusting AI Parameters

Modify the GenerateContentConfig in agents/agent.py:

config=types.GenerateContentConfig(
    response_mime_type="application/json",
    response_schema=DiagnosisList,
    temperature=0.1,  # Lower = more deterministic
    max_output_tokens=2048
)

Dependencies

google-genai: Google Gemini AI API client
pydantic: Data validation and settings management
python-dotenv: Environment variable management

Troubleshooting

API Key Issues:

Ensure the .env file exists in the project root and contains a valid GEMINI_API_KEY
Check API key permissions and quotas

Import Errors:

Run uv sync to install dependencies
Verify Python version compatibility (3.12+)

Text Processing:

Ensure input files are UTF-8 encoded
Check file paths in the agent configuration
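
Both text-processing checks can be folded into one small reader that reports which problem occurred. This is a sketch using only the standard library; read_utf8 is a hypothetical helper, not part of the project.

```python
from pathlib import Path


def read_utf8(path: str) -> str:
    """Read a text file, raising a descriptive error on a bad path or encoding."""
    p = Path(path)
    if not p.is_file():
        raise FileNotFoundError(f"Input file not found: {p.resolve()}")
    try:
        return p.read_text(encoding="utf-8")
    except UnicodeDecodeError as exc:
        # exc.start is the byte offset of the first undecodable byte.
        raise ValueError(
            f"{p} is not valid UTF-8 (problem at byte offset {exc.start})"
        ) from exc
```

A file exported from a legacy system in Latin-1 or Windows-1252 would typically trip the ValueError branch; re-saving it as UTF-8 in an editor is usually enough.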

License

This is a demonstration project for educational purposes.
