Prompt Optimization Platform

A machine learning platform for iteratively refining LLM prompts through a 5-step ML-driven workflow.

Architecture Overview

                   ┌───────────────────────────┐
                   │    FastAPI Application    │
                   │    (Main Entry Point)     │
                   └───────────┬───────────────┘
                               │
                ┌──────────────┴──────────────┐
                │                             │
    ┌───────────▼────────────┐  ┌─────────────▼─────────────┐
    │     Flask Dashboard    │  │      Prefect Flows        │
    │  (Monitoring & Admin)  │  │  (Workflow Orchestration) │
    └───────────┬────────────┘  └─────────────┬─────────────┘
                │                             │
                └──────────────┬──────────────┘
                               │
                    ┌──────────▼──────────┐
                    │   Shared Services   │
                    │ (Auth, DB, Caching) │
                    └─────────────────────┘

Features

5-Step ML-Driven Workflow: A structured approach to prompt optimization
Prefect Flow Integration: Orchestrated workflow management
Flask Dashboard: Monitoring and visualization of experiments and costs
FastAPI Backend: High-performance REST API
Authentication: API key validation for secure access
Cost Tracking: Monitor token usage and associated costs
Caching: Reduce API costs with response caching
Unified Logging: Structured JSON logs across all components
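Caching and cost tracking interact: a cached response should cost nothing, while a fresh model call adds its token usage to the running total. The following is a minimal sketch of that idea under stated assumptions, not the platform's implementation. `CachedLLMClient`, the `LLMCall` signature (returning the text plus prompt and completion token counts), and the per-1,000-token prices are hypothetical placeholders. Real Gemini or Vertex AI client calls and pricing are not shown.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

# Hypothetical model-call signature: (model, prompt) -> (text, prompt_tokens, completion_tokens).
LLMCall = Callable[[str, str], Tuple[str, int, int]]


@dataclass
class CachedLLMClient:
    """Illustrative wrapper: caches responses and tallies token usage and cost."""

    call: LLMCall
    # Placeholder USD prices per 1,000 tokens, not actual Vertex AI pricing.
    input_price_per_1k: float = 0.001
    output_price_per_1k: float = 0.002
    cache: Dict[str, str] = field(default_factory=dict)
    total_tokens: int = 0
    total_cost: float = 0.0
    cache_hits: int = 0

    def _key(self, model: str, prompt: str) -> str:
        # Identical (model, prompt) pairs share one cache entry.
        return hashlib.sha256(f"{model}\x00{prompt}".encode("utf-8")).hexdigest()

    def generate(self, model: str, prompt: str) -> str:
        key = self._key(model, prompt)
        if key in self.cache:
            self.cache_hits += 1  # cache hit: no tokens consumed, no cost added
            return self.cache[key]
        text, prompt_tokens, completion_tokens = self.call(model, prompt)
        self.total_tokens += prompt_tokens + completion_tokens
        self.total_cost += (
            prompt_tokens / 1000 * self.input_price_per_1k
            + completion_tokens / 1000 * self.output_price_per_1k
        )
        self.cache[key] = text
        return text


def fake_llm(model: str, prompt: str) -> Tuple[str, int, int]:
    # Stand-in for a real model call: echoes the prompt in upper case.
    return prompt.upper(), 1000, 500


client = CachedLLMClient(call=fake_llm)
client.generate("primary", "hello")
client.generate("primary", "hello")  # served from the cache
```

Keying the cache on a hash of the model name and the full prompt means any prompt edit produces a cache miss, which suits an optimization loop: only exact repeats are free.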

Getting Started

Install dependencies:

pip install -r requirements.txt

Start the application:

uvicorn src.app.main:app --host 0.0.0.0 --port 5000 --reload

Access the application:
- Dashboard: http://localhost:5000/dashboard
- API Documentation: http://localhost:5000/api/docs

Dashboard

The Flask dashboard provides visualization and monitoring for:

Experiment tracking
Cost monitoring
Workflow status
System health

API Documentation

The API documentation is available at /api/docs and includes endpoints for:

Prompt management
Optimization workflows
Experiment tracking
Dataset management
Cost reporting

Configuration

Configure the application through environment variables or a .env file:

ENVIRONMENT: Deployment environment (development or production)
DEBUG: Enable debug mode (1 or 0)
API_KEY: API key for authentication
VERTEX_PROJECT_ID: Google Cloud Vertex AI project ID
PRIMARY_MODEL: Primary LLM model name
OPTIMIZER_MODEL: Optimizer LLM model name
PREFECT_ENABLED: Enable Prefect integration (1 or 0)
PREFECT_API_URL: Prefect API URL
PREFECT_API_KEY: Prefect API key
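As a rough illustration of how these variables might be consumed, the sketch below parses them from a mapping (defaulting to `os.environ`) into a typed settings object and uses `API_KEY` for a constant-time key check with `hmac.compare_digest`. Loading the `.env` file itself is not shown. `Settings`, `load_settings`, and `api_key_valid` are hypothetical names, and the `development` default and the rejection of other `ENVIRONMENT` values are assumptions of this sketch.

```python
import hmac
import os
from dataclasses import dataclass
from typing import Mapping, Optional


def _flag(value: Optional[str]) -> bool:
    # Boolean switches are documented as "1" (on) or "0" (off); unset is treated as off here.
    return value is not None and value.strip() == "1"


@dataclass(frozen=True)
class Settings:
    environment: str
    debug: bool
    api_key: Optional[str]
    vertex_project_id: Optional[str]
    primary_model: Optional[str]
    optimizer_model: Optional[str]
    prefect_enabled: bool
    prefect_api_url: Optional[str]
    prefect_api_key: Optional[str]


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    # Assumed default: "development" when ENVIRONMENT is unset.
    environment = env.get("ENVIRONMENT", "development")
    if environment not in ("development", "production"):
        raise ValueError(f"unsupported ENVIRONMENT: {environment!r}")
    return Settings(
        environment=environment,
        debug=_flag(env.get("DEBUG")),
        api_key=env.get("API_KEY"),
        vertex_project_id=env.get("VERTEX_PROJECT_ID"),
        primary_model=env.get("PRIMARY_MODEL"),
        optimizer_model=env.get("OPTIMIZER_MODEL"),
        prefect_enabled=_flag(env.get("PREFECT_ENABLED")),
        prefect_api_url=env.get("PREFECT_API_URL"),
        prefect_api_key=env.get("PREFECT_API_KEY"),
    )


def api_key_valid(provided: Optional[str], settings: Settings) -> bool:
    # Reject when either side is missing; otherwise compare in constant time.
    if not provided or not settings.api_key:
        return False
    return hmac.compare_digest(provided.encode("utf-8"), settings.api_key.encode("utf-8"))
```

Passing a plain dict instead of `os.environ` keeps the parser easy to test.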

License

MIT License

MLprompt - Prompt Engineering ML Platform

Overview

A web-based UI for iteratively testing and refining LLM prompts using a machine learning approach with Google's Gemini API. This platform allows users to experiment with different prompt variations, evaluate responses against expected outputs, track metrics over time, and visualize the effectiveness of prompt refinements.

Key Features

Interactive Training Interface: ML-style training flow for prompt engineering
Three-LLM Architecture: Primary LLM, Evaluation Engine, and Optimizer LLM
Multiple Optimization Strategies: Full rewrite, targeted edit, example addition
Comprehensive Experiment Tracking: History views and metrics visualization
User-Friendly Design: Modern styling with contextual help tooltips
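The three-LLM architecture can be read as a training loop: the primary LLM answers each example, the evaluation engine scores the answers, and the optimizer LLM proposes a revised prompt from the failures. The sketch below assumes that reading. All type aliases and function names are hypothetical, toy stand-ins replace real Gemini calls, and accepting a revision only when the validation score improves is a design choice made here rather than documented behavior.

```python
from typing import Callable, List, Sequence, Tuple

Example = Tuple[str, str]          # (input, expected_output)
Failure = Tuple[str, str, str]     # (input, expected_output, actual_response)

# Hypothetical signatures for the three roles.
PrimaryLLM = Callable[[str, str], str]              # (system_prompt, input) -> response
Evaluator = Callable[[str, str], float]             # (response, expected) -> score in [0, 1]
OptimizerLLM = Callable[[str, List[Failure]], str]  # (prompt, failures) -> revised prompt


def score_prompt(prompt: str, examples: Sequence[Example],
                 primary: PrimaryLLM, evaluate: Evaluator) -> Tuple[float, List[Failure]]:
    """Run the primary LLM on every example; return the mean score and the imperfect cases."""
    if not examples:
        raise ValueError("need at least one example")
    scores: List[float] = []
    failures: List[Failure] = []
    for user_input, expected in examples:
        response = primary(prompt, user_input)
        score = evaluate(response, expected)
        scores.append(score)
        if score < 1.0:
            failures.append((user_input, expected, response))
    return sum(scores) / len(scores), failures


def optimize(prompt: str, train: Sequence[Example], validation: Sequence[Example],
             primary: PrimaryLLM, evaluate: Evaluator, optimizer: OptimizerLLM,
             iterations: int = 3) -> Tuple[str, float, List[Tuple[str, float]]]:
    """Propose revisions from training failures; keep one only if validation improves."""
    best_prompt = prompt
    best_val, _ = score_prompt(prompt, validation, primary, evaluate)
    history = [(best_prompt, best_val)]
    for _step in range(iterations):
        _train_mean, failures = score_prompt(best_prompt, train, primary, evaluate)
        if not failures:
            break  # nothing left to learn from on the training set
        candidate = optimizer(best_prompt, failures)
        val_score, _ = score_prompt(candidate, validation, primary, evaluate)
        history.append((candidate, val_score))
        if val_score > best_val:
            best_prompt, best_val = candidate, val_score
    return best_prompt, best_val, history


# Toy stand-ins so the loop runs without any API calls.
def toy_primary(system_prompt: str, user_input: str) -> str:
    return user_input.upper() if "uppercase" in system_prompt else user_input


def exact_match(response: str, expected: str) -> float:
    return 1.0 if response == expected else 0.0


def toy_optimizer(prompt: str, failures: List[Failure]) -> str:
    # Mimics a "targeted edit" strategy: append one instruction instead of rewriting.
    return prompt + " Respond in uppercase."


best_prompt, best_val, history = optimize(
    "Echo the input.",
    train=[("abc", "ABC"), ("hi", "HI")],
    validation=[("ok", "OK")],
    primary=toy_primary, evaluate=exact_match, optimizer=toy_optimizer,
)
```

Scoring candidates on held-out validation examples, rather than on the training failures the optimizer just saw, guards against prompts that only fit the training set.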

Technical Stack

Flask backend with JavaScript frontend
Google Gemini API integration
Experiment tracking and versioning
Data visualization using Chart.js

Getting Started

Clone this repository
Install dependencies with pip install -r requirements.txt
Set up your Google API key in environment variables
Run the application with python main.py

Usage

Enter the system prompt and output prompt for the LLM
Add test cases with input-output pairs
Run evaluation and view metrics
Use the optimizer to refine prompts based on results
Track experiment history and compare iterations
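Adding test cases and running an evaluation amount to scoring each input-output pair and summarizing the run. The README does not specify the evaluation metric, so this sketch substitutes character-level similarity from Python's `difflib.SequenceMatcher`. `evaluate`, `similarity`, and the 0.8 pass threshold are hypothetical, and a lambda over canned answers stands in for the LLM.

```python
from difflib import SequenceMatcher
from statistics import mean
from typing import Callable, Dict, Sequence, Tuple


def similarity(response: str, expected: str) -> float:
    """Stand-in metric: character-level similarity in [0, 1] after normalizing case and spacing."""
    a = " ".join(response.lower().split())
    b = " ".join(expected.lower().split())
    return SequenceMatcher(None, a, b).ratio()


def evaluate(test_cases: Sequence[Tuple[str, str]], generate: Callable[[str], str],
             threshold: float = 0.8) -> Dict[str, float]:
    """Score each (input, expected_output) pair and summarize the run."""
    if not test_cases:
        raise ValueError("add at least one test case")
    scores = [similarity(generate(inp), expected) for inp, expected in test_cases]
    return {
        "mean_score": mean(scores),
        "pass_rate": sum(s >= threshold for s in scores) / len(scores),
        "cases": len(scores),
    }


# Canned answers stand in for the LLM so the example is deterministic.
answers = {"2+2": "4", "capital of France": "paris", "capital of Spain": "Barcelona"}
report = evaluate(
    [("2+2", "4"), ("capital of France", "Paris"), ("capital of Spain", "Madrid")],
    generate=lambda q: answers[q],
)
```

Reporting both a mean score and a pass rate helps when comparing iterations: a prompt can raise the average while still failing the same hard cases.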

Medical Case Study Feature

This application comes pre-loaded with NEJM (New England Journal of Medicine) case studies:

159 medical cases split roughly 50/50 between training and validation sets
Enhanced similarity evaluation that checks whether the ground-truth diagnosis appears in the LLM response
Ideal for training diagnostic reasoning prompts
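The README does not say how the split is made or how "appears in" is matched, so the sketch below makes its own choices: a seeded shuffle that gives the odd case to training (80/79 for 159 cases), and a whole-word match after lowercasing and stripping punctuation. `split_cases` and `diagnosis_found` are hypothetical names, not functions from this repository.

```python
import random
import re
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_cases(cases: Sequence[T], seed: int = 42) -> Tuple[List[T], List[T]]:
    """Seeded shuffle, then halve; an odd count gives the extra case to training."""
    shuffled = list(cases)
    random.Random(seed).shuffle(shuffled)
    mid = (len(shuffled) + 1) // 2  # 159 cases -> 80 training, 79 validation
    return shuffled[:mid], shuffled[mid:]


def _normalize(text: str) -> str:
    # Lowercase and keep only alphanumeric tokens, so "Lyme-disease" matches "lyme disease".
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def diagnosis_found(response: str, ground_truth: str) -> bool:
    """True if the normalized diagnosis occurs as whole words in the normalized response."""
    target = _normalize(ground_truth)
    return bool(target) and f" {target} " in f" {_normalize(response)} "
```

A plain substring check would credit a ground truth of "sarcoid" inside a response that says "sarcoidosis"; matching on word boundaries avoids such near misses, at the cost of still missing synonyms and abbreviations.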

gsiegel14/MLprompt
