This repository contains a system for selecting optimal prompts for Large Language Models (LLMs) using a coalition game theory approach. The system can evaluate prompt performance, train utility models, and select the best prompt or combination of prompts for specific tasks.
Traditional LLM prompt engineering relies on manually crafted prompts for specific tasks. This project provides an automated approach that:
- Evaluates prompt performance across different task categories
- Trains a utility function model to predict prompt effectiveness
- Uses coalition game theory to identify optimal combinations of prompts
- Selects the best prompt or prompt coalition for a given task in real time
Key features:

- Prompt Repository Management: Load and manage a diverse collection of system prompts
- Task Example Generation: Create evaluation examples from real datasets across multiple categories
- Performance Evaluation: Assess prompt performance on various tasks with or without API calls
- Utility Model Training: Train models to predict prompt effectiveness for different tasks
- Coalition Game Theory: Find optimal prompt combinations using Shapley values
- Interactive Metrics Dashboard: Visualize and analyze prompt performance data
- Command Line Interface: Easy-to-use commands for all system functions
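The coalition idea can be made concrete with a toy example. Below is a minimal sketch of exact Shapley value computation over a two-prompt set; the `utility` function here is hypothetical (the real system learns one from evaluation data), and the prompt names and scores are illustrative only:

```python
from itertools import combinations
from math import factorial

def shapley_values(players, utility):
    """Exact Shapley values: each player's average marginal
    contribution across all orders of coalition formation."""
    n = len(players)
    values = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            for coalition in combinations(others, k):
                s = set(coalition)
                # Probability that exactly this coalition forms before p joins.
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += weight * (utility(s | {p}) - utility(s))
        values[p] = total
    return values

# Hypothetical utility: a coding prompt and a reviewing prompt complement each other.
scores = {
    frozenset(): 0.0,
    frozenset({"coder"}): 0.6,
    frozenset({"reviewer"}): 0.3,
    frozenset({"coder", "reviewer"}): 1.0,
}
utility = lambda s: scores[frozenset(s)]

print(shapley_values(["coder", "reviewer"], utility))  # coder: 0.65, reviewer: 0.35
```

Note the efficiency property: the Shapley values sum to the grand coalition's utility, which is what makes them a principled way to attribute credit to individual prompts.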
```bash
# Clone the repository
git clone https://github.com/yourusername/llm-prompt-selection.git
cd llm-prompt-selection

# Install dependencies
pip install -r requirements.txt
```

Requirements:

- Python 3.8+
- pandas
- numpy
- scikit-learn
- sentence-transformers
- tqdm
- openai (optional, for real evaluation)
- matplotlib
- seaborn
- dash (for interactive dashboard)
- plotly
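Because `openai` is optional, evaluation code typically guards the import and falls back to simulation. A minimal sketch of that pattern (not the project's actual code):

```python
# Guarded import: fall back to simulated evaluation when the
# openai package is not installed.
try:
    import openai  # noqa: F401
    HAS_OPENAI = True
except ImportError:
    HAS_OPENAI = False

mode = "real API evaluation" if HAS_OPENAI else "simulated evaluation"
print(f"Running with: {mode}")
```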
Process your prompts and create evaluation datasets:
```bash
python main.py process_data --csv System_Prompt.csv --output data
```

You can either simulate evaluations or use a real LLM API:
```bash
# Simulate evaluations
python main.py simulate --data_dir data

# Or with real LLM (requires API key)
python main.py pipeline --eval --api_key YOUR_OPENAI_API_KEY
```

Train the utility function model using evaluation results:
```bash
python main.py train --data_dir data --model_dir models
```

Use the model to select optimal prompts for specific tasks:
```bash
# Select a single prompt
python main.py select --task "Write a function to find duplicates in an array" --csv System_Prompt.csv --data_dir data --model_dir models

# Select a coalition of prompts
python main.py select --task "Write a function to find duplicates in an array" --csv System_Prompt.csv --data_dir data --model_dir models --coalition --max_size 3
```

Analyze and visualize the performance improvements from using prompt coalitions:
```bash
# Quick analysis without API calls
python analyze_coalitions.py --csv System_Prompt.csv --data_dir data --model_dir models --examples 5

# Comprehensive evaluation with visualization
python coalition_metrics.py --csv System_Prompt.csv --data_dir data --model_dir models --test_size 20

# Interactive dashboard
python coalition_dashboard.py --results_dir results
```

Run the complete end-to-end pipeline:
```bash
python main.py pipeline --csv System_Prompt.csv --data_dir data --model_dir models
```

Project structure:

- `data_preparation.py`: Dataset loading and preparation
- `prompt_evaluator.py`: Prompt performance evaluation
- `utility_model.py`: Utility function model and coalition game logic
- `prompt_selection_system.py`: End-to-end system for prompt selection
- `main.py`: Command line interface
- `coalition_metrics.py`: Performance analysis for prompt coalitions
- `analyze_coalitions.py`: Quick coalition analysis tool
- `coalition_dashboard.py`: Interactive metrics dashboard
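At its core, prompt selection ranks candidate prompts by how well they match the task. The sketch below illustrates that idea with simple bag-of-words cosine similarity standing in for the sentence-transformer embeddings the project uses; the prompt IDs and descriptions are hypothetical:

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    num = sum(a[t] * b[t] for t in a)
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def select_prompt(task, prompts):
    """Return the prompt ID whose description is most similar to the task."""
    task_vec = Counter(task.lower().split())
    scored = [(cosine(task_vec, Counter(desc.lower().split())), pid)
              for pid, desc in prompts.items()]
    return max(scored)[1]

# Hypothetical prompt repository: prompt ID -> intended function.
prompts = {
    "CG-01": "write python functions and debug code",
    "SM-02": "summarize long articles into short summaries",
}
print(select_prompt("Write a function to find duplicates in an array", prompts))  # CG-01
```

The real system replaces the word-count vectors with semantic embeddings and the raw similarity with a trained utility model, but the selection loop has the same shape.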
Your System_Prompt.csv should have the following columns:
- Prompt ID: Unique identifier (e.g., CG-01)
- Prompt Name: Descriptive name
- Full Prompt Text: The complete system prompt
- Intended Function: Description of the prompt's purpose
- Example Inputs: Sample tasks the prompt is designed to handle
- Expected Output Style: Description of expected outputs
- Strengths: The prompt's strengths
- Limitations: Known limitations
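A quick way to sanity-check your file is to load it with pandas and verify the expected columns are present. The snippet below uses an in-memory stand-in for `System_Prompt.csv` with one illustrative row; in real use you would call `pd.read_csv("System_Prompt.csv")` directly:

```python
import io
import pandas as pd

REQUIRED_COLUMNS = [
    "Prompt ID", "Prompt Name", "Full Prompt Text", "Intended Function",
    "Example Inputs", "Expected Output Style", "Strengths", "Limitations",
]

# Stand-in for System_Prompt.csv; the row content is illustrative only.
csv_text = """Prompt ID,Prompt Name,Full Prompt Text,Intended Function,Example Inputs,Expected Output Style,Strengths,Limitations
CG-01,Code Generator,You are an expert programmer...,Generate code,Write a sort function,Commented code,Precise,Verbose
"""
df = pd.read_csv(io.StringIO(csv_text))

missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
assert not missing, f"missing columns: {missing}"
print(f"Loaded {len(df)} prompts")
```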
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.
- The project utilizes sentence transformers for semantic embeddings
- Coalition game theory concepts are implemented for prompt combinations
- Evaluation uses benchmark datasets from various sources