Testing framework for biomedical RAG applications via FastAPI endpoints. Generate questions from study abstracts and evaluate API performance.
- Generate test questions from biomedical abstracts CSV
- Test DugBot/BDCBot APIs via HTTP endpoints
- Performance evaluation and reporting
- RAGAS evaluation for answer quality (context recall, faithfulness, etc.)
- Question types: factual, analytical, comparative, unanswerable
```bash
pip install -r requirements.txt
```

To configure the application, copy .env-template to a new .env file and modify the appropriate variables. The .env file is loaded automatically when the program starts.
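The automatic .env loading can be pictured with a minimal sketch. This is an illustration only (the framework may use a library such as python-dotenv instead); the `load_env` helper name is an assumption:

```python
import os

def load_env(path=".env"):
    """Minimal .env loader sketch: parse KEY=VALUE lines into os.environ.

    Skips blank lines and comments; existing environment variables win
    (setdefault), mirroring typical dotenv behavior. Illustrative only.
    """
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip().strip('"'))
```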
- Load Abstracts → CSV with study abstracts
- Generate Questions → 4 types using configurable LLM (factual, analytical, comparative, unanswerable)
- Test API → Send questions to DUGBot endpoint, track performance
- Store Results → Raw API responses with timing and status
- Compute Basic Metrics → Success rate, response time, error analysis
- Run RAGAS → Answer quality evaluation using OpenAI or Ollama (faithfulness, context recall, etc.)
- Generate Report → Combined performance + quality assessment
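The "Test API → Store Results" steps above can be sketched as a simple loop that times each request and records status and errors. The function and field names below are assumptions for illustration; the HTTP call is injected via `send_fn` so the loop logic is independent of any live DUGBot endpoint:

```python
import time

def run_api_tests(questions, send_fn):
    """Send each question via send_fn(text) -> (status_code, answer).

    Records per-question timing, status, and any error, matching the
    "raw API responses with timing and status" step. Illustrative sketch;
    the real framework's api_tester.py may differ.
    """
    results = []
    for q in questions:
        start = time.perf_counter()
        try:
            status, answer = send_fn(q["question"])
            error = None
        except Exception as exc:  # network failures, timeouts, etc.
            status, answer, error = None, None, str(exc)
        results.append({
            "id": q["id"],
            "question": q["question"],
            "status": status,
            "answer": answer,
            "error": error,
            "response_time": time.perf_counter() - start,
        })
    return results
```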
```bash
# From JSON file
python main.py generate documents.json -o questions.json -n 40

# From CSV file
python main.py generate abstracts.csv -o questions.json -n 40

# Test the API
python main.py test questions.json -o test_results -r

# Evaluate results
python main.py evaluate test_results_results.json -o evaluation
python main.py evaluate test_results_results.json --with-ragas  # enable RAGAS evaluation

# Compare multiple test runs
python main.py compare test1_results.json test2_results.json -o comparison
```

```
testing_framework/
├── main.py              # CLI interface
├── config.py            # Configuration
├── qa_generator.py      # Question generation from abstracts
├── api_tester.py        # API testing (DugBot and BDCBot)
├── evaluator.py         # Results evaluation
├── data_processor.py    # Data loading/saving
├── format_converter.py  # Dataset format conversion
├── requirements.txt     # Dependencies
└── results/             # Generated results
```
```json
[
  {
    "ID": "study_001",
    "CONTEXT": "This study examines the ..."
  },
  {
    "ID": "study_002",
    "CONTEXT": "The C4R study ..."
  }
]
```

- Question datasets in JSON format
- API test results with response times and success rates
- Evaluation reports with performance metrics
- Comparison analysis across multiple tests, useful for tracking performance over a given period
Question generation uses a configurable LLM (default: Ollama/Llama) to produce the biomedical question types listed above.
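The "Compute Basic Metrics" output listed above (success rate, response time, error analysis) can be derived from the raw result records. A minimal sketch, assuming result dictionaries with `status`, `response_time`, and optional `error` fields (field names are assumptions, not the framework's actual schema):

```python
def basic_metrics(results):
    """Compute success rate, mean response time, and an error breakdown
    from a list of raw API test results. Illustrative sketch only."""
    total = len(results)
    ok = [r for r in results if r.get("status") == 200]
    errors = {}
    for r in results:
        if r.get("status") != 200:
            key = r.get("error") or f"HTTP {r.get('status')}"
            errors[key] = errors.get(key, 0) + 1
    return {
        "total": total,
        "success_rate": len(ok) / total if total else 0.0,
        "avg_response_time": sum(r["response_time"] for r in ok) / len(ok) if ok else 0.0,
        "errors": errors,
    }
```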
Simple document list:

```json
[
  {"ID": "doc1", "CONTEXT": "The C4R studies are ..."},
  {"ID": "doc2", "CONTEXT": "The Covid studies..."}
]
```

With metadata wrapper:

```json
{
  "documents": [
    {"ID": "doc1", "CONTEXT": "The C4R studies are..."},
    {"ID": "doc2", "CONTEXT": "The Covid studies..."}
  ]
}
```

- Default: Ollama with Llama 3.1 or gemma3:12b
- Configurable: any Ollama-compatible model, subject to the available GPU resources on the Sterling cluster
- Purpose: Generate test questions from abstracts
- Option 1: OpenAI GPT-4
- Option 2: Ollama with Llama 3.1/Gemma3:12b
- Purpose: Evaluate answer quality with RAGAS metrics
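The two JSON input formats shown above (a bare document list, or a `documents` wrapper) can be handled by a single loader. A sketch, assuming the `load_documents` helper name (not necessarily what data_processor.py calls it):

```python
import json

def load_documents(path):
    """Load documents from JSON, accepting either a bare list of
    {ID, CONTEXT} records or a {"documents": [...]} metadata wrapper.
    Illustrative sketch only."""
    with open(path) as fh:
        data = json.load(fh)
    docs = data["documents"] if isinstance(data, dict) else data
    for doc in docs:
        if "ID" not in doc or "CONTEXT" not in doc:
            raise ValueError(f"Document missing ID/CONTEXT: {doc}")
    return docs
```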
OpenAI for RAGAS:

```
RAGAS_EVALUATION_LLM_PROVIDER = "openai"
RAGAS_EVALUATION_LLM_API_KEY = "your-openai-key"
```

Ollama for RAGAS:

```
RAGAS_EVALUATION_LLM_PROVIDER = "ollama"
RAGAS_EVALUATION_LLM_MODEL = "llama3.1:latest"  # or "gemma3:12b"
```
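Selecting the RAGAS evaluation backend from these environment variables can be sketched as a small resolver. The variable names come from the settings above; the `ragas_llm_config` function and the returned dictionary shape are assumptions for illustration:

```python
import os

def ragas_llm_config():
    """Resolve RAGAS evaluation LLM settings from the environment.

    Defaults to Ollama; requires an API key when the OpenAI provider is
    selected. Illustrative sketch, not the framework's actual config.py.
    """
    provider = os.environ.get("RAGAS_EVALUATION_LLM_PROVIDER", "ollama")
    if provider == "openai":
        key = os.environ.get("RAGAS_EVALUATION_LLM_API_KEY")
        if not key:
            raise ValueError("RAGAS_EVALUATION_LLM_API_KEY is required for the openai provider")
        return {"provider": "openai", "api_key": key}
    return {"provider": "ollama",
            "model": os.environ.get("RAGAS_EVALUATION_LLM_MODEL", "llama3.1:latest")}
```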